{ "info": { "author": "Taher Shihadeh", "author_email": "taher@unixwars.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Topic :: Documentation", "Topic :: Utilities" ], "description": "*wnget*: web novel getter\n=========================\n\nRarely will web novel sites provide any means to read their contents off\nline, and this is precisely why *wnget* came to be. It is a tool to\nscrape web novels from blogs, and optionally convert them to epub\nformat. It provides several options to configure the exact behaviour,\nwhile at the same time trying to provide sane defaults. The strings for\nthe next/previous navigation links, as well as CSS class for\ntitle/content containers can be configured, among other settings.\n\nBesides the main *wnget* utility, the suite include *wnbook*, to\ngenerate ebooks from the downloaded contents, and *wnlocal*, to rewrite\nthe links of html files that can be resolved to a local file using the\nsame rules applied by *wnget*.\n\nInstallation\n------------\n\n.. code-block:: shell\n\n $ pip install wnget\n\nIf you happened to have cloned the repo and are playing with the code,\nyou probably want to install *wnget* in \"editable\" mode while you\u2019re\nworking on it. This is so it becomes both installed and editable in\nproject form.\n\nAssuming you\u2019re in the root of the project, just run:\n\n.. code-block:: shell\n\n $ pip install -e .\n\nConfiguration\n-------------\n\nAny argument that starts with '--' can also be set in a config file\n(./wnget.conf or ~/.wnget.conf or specified via -s). The\nrecognized syntax for setting (key, value) pairs is based on the INI and YAML\nformats (e.g. key=value or foo=TRUE). If an argument is specified in more than\none place, then commandline values override config file values which override\ndefaults.\n\nThe default configuration looks like this:\n\n.. code-block:: yaml\n\n keeplinks: no\n firsttitle: no\n next: \"Next Chapter\"\n previous: \"Previous Chapter\"\n titleclass: entry-title\n contentclass: entry-content\n epub: default_ebook\n limit: 0 # No limit if < 1\n\nSince every web novel site will have a unique theme, it is unlikely the\ndefault configurations shipped with _wnget_ will work out of the box (unless\nyou happen to be addicted to the same ones as yours truly). Feel free to\nextend data/sites.yaml for your favorite sites and issue a pull request.\n\nUsage\n-----\n\n.. code-block:: shell\n\n $ wnget --help\n usage: wnget [-h] [-s file] [-k] [-f] [-n caption] [-p caption] [-t css_class]\n [-c css_class] [-e title] [-l N] [-v]\n first_url [last_url]\n\n positional arguments:\n first_url first URL to crawl\n last_url optional last URL to crawl (stops after reaching)\n\n optional arguments:\n -h, --help show this help message and exit\n -s file, --settings file\n config file path\n -k, --keeplinks rewrite and keep navigation links in HTML content\n -f, --firsttitle keep first title match instead of trying to be smart\n about it\n -n caption, --next caption\n next link caption (default: 'Next Chapter')\n -p caption, --previous caption\n previous link caption (default: 'Previous Chapter')\n -t css_class, --titleclass css_class\n title container class (default: 'entry-title')\n -c css_class, --contentclass css_class\n content container class (default: 'entry-content')\n -e title, --epub title\n create Epub with this title (will use cover.jpg/png if\n found)\n -l N, --limit N crawl at most N pages\n -v, --version show program's version number and exit\n\nUsage examples\n--------------\n\nInvoke each command without arguments to display help information.\n\nThe scraper looks for a given CSS class in the title and content containers,\nand those can be set manually to suit your web novel site of choice. It also\nlooks for links with default strings to find the next and previous chapters,\nand this can also be set by hand.\n\nTo scrape all chapters of a given web novel, following links, and saving\neach chapter in a different html file in the current directory:\n\n.. code-block:: shell\n\n $ wnget http://example.com/first_chapter_link\n\nOr, for more advanced uses, downloading all chapters until a given link\nis retrieved, and generate an EPUB with the loot:\n\n.. code-block:: shell\n\n $ wnget -e \"My Web Novel\" \\\n http://example.com/first_chapter_link \\\n http://example.com/first_chapter_link\n\nAdditionally, the ebook functionality can be used directly through the\n*wnbook* standalone utility. Just provide the HTML index file and a\nname for the book, and it will generate an ebook with all referenced\nresources in the working directory:\n\n.. code-block:: shell\n\n $ wnbook index.html \"My Web Novel\"\n\nAlso, if a cover.png or cover.jpg file is present, it will be used as\ncover page. Its use as standalone command will often prove more\nflexible, as it exposes features not normally used by the main\napplication, while allowing some manual tweaking of the downloaded\ncontents and index files.\n\nHere, generating a book with relative paths, and custom filename, cover\nimage, and language/author metadata:\n\n.. code-block:: shell\n\n $ wnbook ../index.html \"I Shall Seal the Heavens (\u6211\u6b32\u5c01\u5929)\" \\\n --filename=issth.epub --language=zh --author=\"Ergen (\u8033\u6839)\" \\\n --cover ~/images/MengHao.png\n\nAnd finally, to rewrite links within a file so that they point to\nalready downloaded resources, *wnlocal* can be used.\n\nEither to print the converted file to stdout:\n\n.. code-block:: shell\n\n $ wnlocal introduction.html\n\nOr to write it back to disk:\n\n.. code-block:: shell\n\n $ wnlocal introduction.html newfile.html\n\n\nTODO\n----\n\n+ Make content selection more flexible: by tag, class, caption, or xpath.\n+ Add option for elements to be removed during the parsing stage.\n+ Add interactive title selection mode (and ability to repeat choice).\n+ In-content image support (download, and store locally).", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/unixwars/wnget", "keywords": "webscraper epub ebook webnovel", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "wnget", "package_url": "https://pypi.org/project/wnget/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/wnget/", "project_urls": { "Homepage": "https://github.com/unixwars/wnget" }, "release_url": "https://pypi.org/project/wnget/0.3.1/", "requires_dist": [ "requests (>=2.9)", "lxml (>=3.5)", "eventlet (>=0.18)", "ebooklib (>=0.15)", "configargparse (>=0.10)", "pyyaml (>=3)" ], "requires_python": "", "summary": "web novel getter, a web scraping and ebook generation tool.", "version": "0.3.1" }, "last_serial": 1949248, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": "b4664643eed3c61b34711cdbae72b46f", "sha256": "dfa106060977b489a69563cd99ce86e47ac0d69858d7000c7035adfb0ee221f3" }, "downloads": -1, "filename": "wnget-0.2.0-py2-none-any.whl", "has_sig": false, "md5_digest": "b4664643eed3c61b34711cdbae72b46f", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 10937, "upload_time": "2016-01-31T18:08:45", "url": "https://files.pythonhosted.org/packages/37/c8/aa05e73f99b8cf844b7f1a9d97a13ddb373d41c5a8ce35a64f7d23f386ad/wnget-0.2.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9c7263fd68bc2021ae183e8e123c4ba3", "sha256": "8054bf44865fcbba1ad85cdef7d59a490fd14f41dff2b6ee8972057833a0a3e6" }, "downloads": -1, "filename": "wnget-0.2.0.tar.gz", "has_sig": false, "md5_digest": "9c7263fd68bc2021ae183e8e123c4ba3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8922, "upload_time": "2016-01-31T18:08:57", "url": "https://files.pythonhosted.org/packages/ad/33/fef8b788e8cc2c65e022ae6eefac77defcaaea721903ec71bbf7fd4a0387/wnget-0.2.0.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "a46c1d3d01d7a8c00a1fffaf94a2d6ff", "sha256": "af2201bfa2c75182dfbe7a94e8ea37eead78b7c0e4c1c8656ab9ecc509ecf2c3" }, "downloads": -1, "filename": "wnget-0.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a46c1d3d01d7a8c00a1fffaf94a2d6ff", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 11053, "upload_time": "2016-02-02T21:40:27", "url": "https://files.pythonhosted.org/packages/34/c6/e54b505251dad297960399df9d43b073c4cc25101b54568603ec2f4badce/wnget-0.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a1c86bb58f41248da3955bd015f3cfee", "sha256": "270226aaa2ce647fd14c72e2460613cb6ce73d2ef94e4dfb2489c56780c58f13" }, "downloads": -1, "filename": "wnget-0.3.0.tar.gz", "has_sig": false, "md5_digest": "a1c86bb58f41248da3955bd015f3cfee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10328, "upload_time": "2016-02-02T21:40:32", "url": "https://files.pythonhosted.org/packages/a7/19/85abd15355602770cff4bf6be588eec21f67cb80603e671392b44a6e8818/wnget-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "058dbb8f6bb780ced82512f5148c4305", "sha256": "1c3943ed16633d8d10905ddf5b905b36c708a9568affff8af7e3fbd61103e9e2" }, "downloads": -1, "filename": "wnget-0.3.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "058dbb8f6bb780ced82512f5148c4305", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15160, "upload_time": "2016-02-10T13:54:25", "url": "https://files.pythonhosted.org/packages/47/60/bd5c051d83f71be8f5534cdce21f3e711d4d66b17ac93da9d74f897306f4/wnget-0.3.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f9536d569a123fbb65aafe7e2be9a573", "sha256": "b347f4bf7289451935f034af763c542a4141b88c15002d51a3a54b1a77f29907" }, "downloads": -1, "filename": "wnget-0.3.1.tar.gz", "has_sig": false, "md5_digest": "f9536d569a123fbb65aafe7e2be9a573", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14401, "upload_time": "2016-02-10T13:54:35", "url": "https://files.pythonhosted.org/packages/7c/b9/cc592973acc5c1a1a11f9e187f8e5b2a21f8107c761678d55f3e584072a4/wnget-0.3.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "058dbb8f6bb780ced82512f5148c4305", "sha256": "1c3943ed16633d8d10905ddf5b905b36c708a9568affff8af7e3fbd61103e9e2" }, "downloads": -1, "filename": "wnget-0.3.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "058dbb8f6bb780ced82512f5148c4305", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15160, "upload_time": "2016-02-10T13:54:25", "url": "https://files.pythonhosted.org/packages/47/60/bd5c051d83f71be8f5534cdce21f3e711d4d66b17ac93da9d74f897306f4/wnget-0.3.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f9536d569a123fbb65aafe7e2be9a573", "sha256": "b347f4bf7289451935f034af763c542a4141b88c15002d51a3a54b1a77f29907" }, "downloads": -1, "filename": "wnget-0.3.1.tar.gz", "has_sig": false, "md5_digest": "f9536d569a123fbb65aafe7e2be9a573", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14401, "upload_time": "2016-02-10T13:54:35", "url": "https://files.pythonhosted.org/packages/7c/b9/cc592973acc5c1a1a11f9e187f8e5b2a21f8107c761678d55f3e584072a4/wnget-0.3.1.tar.gz" } ] }