{ "info": { "author": "David Powell", "author_email": "BitLooter@users.noreply.github.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Intended Audience :: End Users/Desktop", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Libraries", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: System :: Archiving", "Topic :: Text Processing :: Markup :: HTML" ], "description": "HTMLArk\n=======\n\n.. image:: https://img.shields.io/github/downloads/BitLooter/htmlark/total.svg\n :target: https://github.com/BitLooter/htmlark\n.. image:: https://img.shields.io/pypi/v/HTMLArk.svg\n :target: https://pypi.python.org/pypi/HTMLArk\n.. image:: https://img.shields.io/pypi/l/HTMLArk.svg\n :target: https://raw.githubusercontent.com/BitLooter/htmlark/master/LICENSE.txt\n\nEmbed images, CSS, and JavaScript into an HTML file. Through the magic of `data URIs `_, HTMLArk can save these external dependencies inline right in the HTML. No more keeping around those \"reallycoolwebpage_files\" directories alongside the HTML files, everything is self-contained.\n\nNote that this will only work with static pages. If an image or other resource is loaded with JavaScript, HTMLArk won't even know it exists.\n\nInstallation and Requirements\n-----------------------------\nPython 3.5 or greater is required for HTMLArk.\n\nInstall HTMLArk with ``pip`` like so:\n\n.. code-block:: bash\n\n pip install htmlark\n\nTo use the `lxml `_ (recommended) or `html5lib `_ parsers, you will need to install the lxml and/or html5lib Python libraries as well. HTMLArk can also get resources from the web, to enable this functionality you need `Requests `_ installed. You can install HTMLArk with all optional dependencies with this command:\n\n.. 
code-block:: bash\n\n pip install htmlark[http,parsers]\n\n\nIf you want to install it manually, the only hard dependency HTMLArk has is `Beautiful Soup 4 `_.\n\n\nCommand-line usage\n------------------\nYou can also get this information with ``htmlark --help``.\n\n::\n\n usage: htmlark [-h] [-o OUTPUT] [-E] [-I] [-C] [-J]\n [-p {html.parser,lxml,html5lib,auto}] [-v] [--version]\n [webpage]\n\n Converts a webpage including external resources into a single HTML file. Note\n that resources loaded with JavaScript will not be handled by this program, it\n will only work properly with static pages.\n\n positional arguments:\n webpage URL or path of webpage to convert. If not specified,\n read from STDIN.\n\n optional arguments:\n -h, --help show this help message and exit\n -o OUTPUT, --output OUTPUT\n File to write output. Defaults to STDOUT.\n -E, --ignore-errors Ignores unreadable resources\n -I, --ignore-images Ignores images during conversion\n -C, --ignore-css Ignores stylesheets during conversion\n -J, --ignore-js Ignores external JavaScript during conversion\n -p {html.parser,lxml,html5lib,auto}, --parser {html.parser,lxml,html5lib,auto}\n Select HTML parser. If not specifed, htmlark tries to\n use lxml, html5lib, and html.parser in that order (the\n 'auto' option). See documentation for more\n information.\n -v, --verbose Prints information during conversion\n --version Displays version information\n\n\nUsing HTMLArk as a module\n-------------------------\nYou can also integrate HTMLArk into your own scripts, by importing it and calling ``convert_page``. Example:\n\n.. 
code-block:: python\n\n import htmlark\n packed_html = htmlark.convert_page(\"samplepage.html\", ignore_errors=True)\n\nDetails::\n\n def convert_page(page_path: str, parser: str='auto',\n callback: Callable[[str, str, str], None]=lambda *_: None,\n ignore_errors: bool=False, ignore_images: bool=False,\n ignore_css: bool=False, ignore_js: bool=False) -> str\n\n Takes an HTML file or URL and outputs new HTML with resources as data URIs.\n\n Parameters:\n page_path (str): URL or path of web page to convert.\n Keyword Arguments:\n parser (str): HTML Parser for Beautiful Soup 4 to use. See\n `BS4's docs. `_\n Default: 'auto' - Not an actual parser, but tells the library to\n automatically choose a parser.\n ignore_errors (bool): If ``True`` do not abort on unreadable resources.\n Unprocessable tags (e.g. broken links) will simply be skipped.\n Default: ``False``\n ignore_images (bool): If ``True`` do not process ``<img>`` tags.\n Default: ``False``\n ignore_css (bool): If ``True`` do not process ``<link>`` (stylesheet) tags.\n Default: ``False``\n ignore_js (bool): If ``True`` do not process ``