{ "info": { "author": "Pawe\u0142 Adamczak", "author_email": "pawel.ad@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: End Users/Desktop", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Utilities" ], "description": "simple-site-crawler\n===================\n\n|Build status| |Test coverage| |PyPI version| |Python versions|\n|License|\n\nSimple website crawler that asynchronously crawls a website and all\nsubpages that it can find, along with static content that they rely on.\nYou can either use it as a library inside your Python project, or check\nout the provided CLI, which can currently show you the crawled data\n(links, images, CSS and JavaScript files) for each found site and create\na ``sitemap.xml`` file.\n\nCreated primarily to play with ``asyncio``, ``aiohttp`` and the new\n``async/await`` syntax, so:\n\n- it requires Python 3.5 or higher\n- new features are not planned at the moment; feel free to suggest them\n though, as I'm happy to implement them if someone will actually use\n them ;-)\n\nFull disclosure - halfway through the project I found\n`this `__\narticle (and code) which does pretty much exactly what I wanted and is\nco-written by the BDFL himself. Oh well. I still finished the project\nand didn't copy anything explicitly but it did influence some of my\nchoices. 
After all, if it's good enough for the creator of the language\nI'm using, it's probably good enough for me.\n\nInstallation\n------------\n\nFrom PyPI:\n\n::\n\n $ pip3 install simple-site-crawler\n\nWith git clone:\n\n::\n\n $ git clone https://github.com/pawelad/simple-site-crawler\n $ pip3 install -r simple-site-crawler/requirements.txt\n $ cd simple-site-crawler/bin\n\nUsage\n-----\n\n::\n\n $ simple-site-crawler --help \n Usage: simple-site-crawler [OPTIONS] URL\n\n Simple website crawler that generates its sitemap and can either print it\n (and its static content) or export it to standard XML format.\n\n See https://github.com/pawelad/simple-site-crawler for more info.\n\n Options:\n -t, --max-tasks INTEGER Maximum allowed number of async tasks.\n -e, --export-to-xml Export sitemap to XML file.\n -s, --suppress Suppress printing output to stdout.\n --help Show this message and exit.\n\nAPI\n---\n\nThere's no proper documentation as of now, but the code is commented and\n*should* be pretty straightforward to use.\n\nThat said - feel free to ask me either via\n`email `__ or `GitHub\nissues `__ if\nanything is unclear.\n\nTests\n-----\n\nPackage was tested with the help of ``py.test`` and ``tox`` on Python\n3.5 and 3.6 (see ``tox.ini``).\n\nCode coverage is available at\n`Coveralls `__.\n\nTo run tests yourself you need to run ``tox`` inside the repository:\n\n.. code:: shell\n\n $ pip install -r requirements/dev.txt\n $ tox\n\nContributions\n-------------\n\nPackage source code is available at\n`GitHub `__.\n\nFeel free to use, ask, fork, star, report bugs, fix them, suggest\nenhancements, add functionality and point out any mistakes. Thanks!\n\nAuthors\n-------\n\nDeveloped and maintained by `Pawe\u0142\nAdamczak `__.\n\nReleased under `MIT\nLicense `__.\n\n.. |Build status| image:: https://img.shields.io/travis/pawelad/simple-site-crawler.svg\n :target: https://travis-ci.org/pawelad/simple-site-crawler\n.. 
|Test coverage| image:: https://img.shields.io/coveralls/pawelad/simple-site-crawler.svg\n :target: https://coveralls.io/github/pawelad/simple-site-crawler\n.. |PyPI version| image:: https://img.shields.io/pypi/v/simple-site-crawler.svg\n :target: https://pypi.python.org/pypi/simple-site-crawler\n.. |Python versions| image:: https://img.shields.io/pypi/pyversions/simple-site-crawler.svg\n :target: https://pypi.python.org/pypi/simple-site-crawler\n.. |License| image:: https://img.shields.io/github/license/pawelad/simple-site-crawler.svg\n :target: https://github.com/pawelad/simple-site-crawler/blob/master/LICENSE\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/pawelad/simple-site-crawler/releases/latest", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pawelad/simple-site-crawler", "keywords": "website crawler sitemap", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "simple-site-crawler", "package_url": "https://pypi.org/project/simple-site-crawler/", "platform": "", "project_url": "https://pypi.org/project/simple-site-crawler/", "project_urls": { "Download": "https://github.com/pawelad/simple-site-crawler/releases/latest", "Homepage": "https://github.com/pawelad/simple-site-crawler" }, "release_url": "https://pypi.org/project/simple-site-crawler/0.1.1/", "requires_dist": [ "aiodns (>=1.1.1)", "aiohttp (>=1.2.0)", "beautifulsoup4 (>=4.5.3)", "cchardet (>=1.1.2)", "click (>=6.7)", "html5lib (>=1.0b10)", "pytest; extra == 'testing'" ], "requires_python": "", "summary": "Simple website crawler that asynchronously crawls a website and all subpages that it can find, along with static content that they rely on.", "version": "0.1.1" }, "last_serial": 2607621, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "37152544d3a07b9279db4e63ab2548bb", "sha256": "7e4317fa3d85675560d56c05083904b361a0c92b00e3b7ca3c2ad036a5943b44" }, 
"downloads": -1, "filename": "simple_site_crawler-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "37152544d3a07b9279db4e63ab2548bb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14505, "upload_time": "2017-01-18T23:41:01", "url": "https://files.pythonhosted.org/packages/e1/62/0eafac1c281d00feeb3cca3c48263571694da2f5bf11ca622b03c59651e5/simple_site_crawler-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "96c39c1f7e9892df66607cbb4b946098", "sha256": "3009f6de271528996f34b9421c2ef41fae2c1c584967bd8665cbee52425005f4" }, "downloads": -1, "filename": "simple-site-crawler-0.1.0.tar.gz", "has_sig": false, "md5_digest": "96c39c1f7e9892df66607cbb4b946098", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12069, "upload_time": "2017-01-18T23:41:02", "url": "https://files.pythonhosted.org/packages/af/63/8a33754eb08318edceccd2d0d03f39fb70f38b942ca3e917bf3603e8f889/simple-site-crawler-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "eba58b782dbcc23c1283ac1207df50d8", "sha256": "523ac3b8ff62bf695864e8c7cebbddc9fdf868c56d33c4dfd621010397afb33d" }, "downloads": -1, "filename": "simple_site_crawler-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "eba58b782dbcc23c1283ac1207df50d8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14858, "upload_time": "2017-01-30T21:04:36", "url": "https://files.pythonhosted.org/packages/4f/37/5d1f6af51327ba580468c23600a00af01b59b51ff652e05b984547a7353b/simple_site_crawler-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fea01c46736e64371de1a0a1b4d9e5f1", "sha256": "28089a9c70d322e289a8a6d8added7c0c65cd2faee124731bb421be0c8fc9255" }, "downloads": -1, "filename": "simple-site-crawler-0.1.1.tar.gz", "has_sig": false, "md5_digest": "fea01c46736e64371de1a0a1b4d9e5f1", "packagetype": "sdist", "python_version": "source", "requires_python": null, 
"size": 12227, "upload_time": "2017-01-30T21:04:38", "url": "https://files.pythonhosted.org/packages/cc/06/eb99ecea4e45d52761a525ee84b75282fb18d109b3442465185bc0c7d407/simple-site-crawler-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "eba58b782dbcc23c1283ac1207df50d8", "sha256": "523ac3b8ff62bf695864e8c7cebbddc9fdf868c56d33c4dfd621010397afb33d" }, "downloads": -1, "filename": "simple_site_crawler-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "eba58b782dbcc23c1283ac1207df50d8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14858, "upload_time": "2017-01-30T21:04:36", "url": "https://files.pythonhosted.org/packages/4f/37/5d1f6af51327ba580468c23600a00af01b59b51ff652e05b984547a7353b/simple_site_crawler-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fea01c46736e64371de1a0a1b4d9e5f1", "sha256": "28089a9c70d322e289a8a6d8added7c0c65cd2faee124731bb421be0c8fc9255" }, "downloads": -1, "filename": "simple-site-crawler-0.1.1.tar.gz", "has_sig": false, "md5_digest": "fea01c46736e64371de1a0a1b4d9e5f1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12227, "upload_time": "2017-01-30T21:04:38", "url": "https://files.pythonhosted.org/packages/cc/06/eb99ecea4e45d52761a525ee84b75282fb18d109b3442465185bc0c7d407/simple-site-crawler-0.1.1.tar.gz" } ] }