{ "info": { "author": "Grzeoorz Szczudlik", "author_email": "2914011+szczad@users.noreply.github.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.2" ], "description": "Night Crawler\n=============\n\nDescription\n-----------\n\nThe NightCrawler is a site crawling/spider tool that gathers links at the given domain by walking through\nthe whole site and generating a simple sitemap.\n\nLimitations\n-----------\n\nThis tool is just a demo. It's a single-threaded script that walks every page it gets and it's\nnot optimized for speed.\n\nThe script sticks to the URL provided and does not dive into subdomains of the given domain\neven if it encounters an internal redirect like `example.com` -> `www.example.com`\n\nPossible enhancements\n---------------------\n\n* Use multi-threading with thread pools\n* Use generators to lower memory footprint and gain a bit more speed\n* Make a preliminary HEAD request to distinguish between text and binary files\n* Check Content-Type and exclude files that are not HTML\n* Add matchers and sitemap generators for additional sitemap flavours (images, videos, etc.)\n* More tests (the included tests cover only the most critical classes)\n\nInstallation\n------------\n\n1. Requirements\n~~~~~~~~~~~~~~~\n\n1. Python >= 3.2\n2. PIP\n\n2a. Installation without virtualenv\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nRun the following command in shell:\n\n.. code-block:: bash\n\n pip install NightCrawler\n\n2b. Installation in virtualenv\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRun the following command in shell:\n\n.. code-block:: bash\n\n virtualenv .env\n . .env/bin/activate\n pip install NightCrawler\n\n2c. Installation from source (development)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo install the package from source, one has to create a virtualenv after cloning the repository\n\n.. 
code-block:: bash\n\n git clone https://github.com/szczad/NightCrawler.git\n cd NightCrawler\n virtualenv .env\n . .env/bin/activate\n pip install -e ./\n\n3. (optional) Testing\n~~~~~~~~~~~~~~~~~~~~~\n\nWhen installed from sources in development mode the script can be tested with the following command\n\n.. code-block:: bash\n\n . .env/bin/activate\n python setup.py test\n\nRunning the script\n------------------\n\n0. Help\n~~~~~~~\n\n.. code-block:: bash\n\n nightcrawler --help\n\n1. Running the script installed globally\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: bash\n\n nightcrawler \n\n2. Running the script installed in virtualenv\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: bash\n\n /bin/nightcrawler \n\nor\n\n.. code-block:: bash\n\n . .env/bin/activate\n nightcrawler \n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/szczad/NightCrawler", "keywords": "crawler spider website", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "NightCrawler", "package_url": "https://pypi.org/project/NightCrawler/", "platform": "", "project_url": "https://pypi.org/project/NightCrawler/", "project_urls": { "Homepage": "https://github.com/szczad/NightCrawler" }, "release_url": "https://pypi.org/project/NightCrawler/0.1.6/", "requires_dist": [ "beautifulsoup4", "requests", "lxml", "validators" ], "requires_python": "", "summary": "Website crawling bot", "version": "0.1.6" }, "last_serial": 4909712, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "4a76383d534dfaf0ad07ff4783dd3768", "sha256": "c546005f5dd7f1c40ee35fd7625de707d842478bbee2adc949ef69f443ea13d9" }, "downloads": -1, "filename": "NightCrawler-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4a76383d534dfaf0ad07ff4783dd3768", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 
8271, "upload_time": "2019-03-06T12:06:31", "url": "https://files.pythonhosted.org/packages/a5/22/4147a1de85b401a2c2b90a0a0ed2841ac983619e0aaa06c8156f1d6edd34/NightCrawler-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a1b8e0154f1ce9508b0944f7fc73364a", "sha256": "87d86a0f581fcc7d83d391c8b3275bba011ba47e99bbf95eb9a4fd3fbc195e73" }, "downloads": -1, "filename": "NightCrawler-0.1.1.tar.gz", "has_sig": false, "md5_digest": "a1b8e0154f1ce9508b0944f7fc73364a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4975, "upload_time": "2019-03-06T12:07:44", "url": "https://files.pythonhosted.org/packages/b7/72/988fb314c6491f1f7a840b09d5a13823c167afaba06035c7f358bf53a7e8/NightCrawler-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "1090609ffcf747dfce06ef4db3233e83", "sha256": "08265a5474a56c02f654c22d693fef0da03a50db25a4cab9291967ef5c9607cd" }, "downloads": -1, "filename": "NightCrawler-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "1090609ffcf747dfce06ef4db3233e83", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8263, "upload_time": "2019-03-06T12:13:58", "url": "https://files.pythonhosted.org/packages/d1/0e/1c92e677dc8d5f57d6ff63a421253eb73d381c8994f13215ab7cd694ca69/NightCrawler-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6bcb98037d23bbdbece08135d2fe3761", "sha256": "833e9b5ef2ed41a8e36f0ce19598dc1c2d39a01e516188c02c6c5756ef890933" }, "downloads": -1, "filename": "NightCrawler-0.1.2.tar.gz", "has_sig": false, "md5_digest": "6bcb98037d23bbdbece08135d2fe3761", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4975, "upload_time": "2019-03-06T12:13:59", "url": "https://files.pythonhosted.org/packages/50/af/e8d9cb722d69e5ec679c41e414a326eb2a714dabbf6580d7b6ef051e03da/NightCrawler-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "a232e59bd40355e98565d2e99a8f8ba3", 
"sha256": "f52007adb02123bf317b62d3f5754d2108d439266573661148cc8e0680c46f9b" }, "downloads": -1, "filename": "NightCrawler-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "a232e59bd40355e98565d2e99a8f8ba3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8269, "upload_time": "2019-03-06T12:19:38", "url": "https://files.pythonhosted.org/packages/31/70/26b9d9293a4913d469dcde048db3742de5a17dc3c8a29c3db4fda5a2d9f8/NightCrawler-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "adbafee7898f6dba37472a06332b1624", "sha256": "326cea59402f6244bea0dfbc3b5b0befbb3be3a3fbec719103eb207474bf7393" }, "downloads": -1, "filename": "NightCrawler-0.1.3.tar.gz", "has_sig": false, "md5_digest": "adbafee7898f6dba37472a06332b1624", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4974, "upload_time": "2019-03-06T12:19:39", "url": "https://files.pythonhosted.org/packages/9a/8a/88f1a0ddf87f1d7e8f5326ec3f908df5984c44d508660669a8c4b618afe4/NightCrawler-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "331eb1eccad3bbce374ff0ffd9bb48d4", "sha256": "4d4c7f67a8017d1dbfcdab3b1a1daef7252d0ef82670ce10fb932f313244cc00" }, "downloads": -1, "filename": "NightCrawler-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "331eb1eccad3bbce374ff0ffd9bb48d4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8260, "upload_time": "2019-03-06T12:21:02", "url": "https://files.pythonhosted.org/packages/9a/11/0a2d94cb049de1b321cd02946885b07551e7bc7249c07bf3faa1d1f8458a/NightCrawler-0.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bb5b0830aff58aaec5418441a0755a6d", "sha256": "f27bbf7f1390fa68baee283f271633c78d14e3432d2ea0d1816fbea547b8d5e8" }, "downloads": -1, "filename": "NightCrawler-0.1.4.tar.gz", "has_sig": false, "md5_digest": "bb5b0830aff58aaec5418441a0755a6d", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 4967, "upload_time": "2019-03-06T12:21:03", "url": "https://files.pythonhosted.org/packages/36/67/b1485af98e4949272c19cf9ac10b7fe00e29aaf510ef3cb30c2cbe975f51/NightCrawler-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "ae8059e7a4a7dd40d84e85590704e970", "sha256": "b48b392bcc80ef5b971d2141c7d8aba1ec290a3586a181bbb029d0543423e0b0" }, "downloads": -1, "filename": "NightCrawler-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "ae8059e7a4a7dd40d84e85590704e970", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8262, "upload_time": "2019-03-07T10:31:36", "url": "https://files.pythonhosted.org/packages/b1/66/906bd228118650ef53ad828ad6a1df1f162d52479416c79551a0a51dd6ed/NightCrawler-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1a654451a056e0183cc96ac136e62f5e", "sha256": "a3081be82fb909c95eae7a76293db70a30a9a50e9bd7a801c9f15754afa2284c" }, "downloads": -1, "filename": "NightCrawler-0.1.5.tar.gz", "has_sig": false, "md5_digest": "1a654451a056e0183cc96ac136e62f5e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4975, "upload_time": "2019-03-07T10:31:38", "url": "https://files.pythonhosted.org/packages/43/dd/5b0248d467d3ae8ff7668f768edc165caccbd7f2951a1c64590dfa2e2429/NightCrawler-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "4b4c296921da39941b268435ace11227", "sha256": "8f7ef6472322297c6faa4cee16e74688bd791844dfc8ed924c1e53638dd1b11c" }, "downloads": -1, "filename": "NightCrawler-0.1.6-py3-none-any.whl", "has_sig": false, "md5_digest": "4b4c296921da39941b268435ace11227", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8449, "upload_time": "2019-03-07T11:35:35", "url": "https://files.pythonhosted.org/packages/76/1c/3f936abbd8adda5863dbcf7d2e432ccd9d544481e8a0a661521599e47014/NightCrawler-0.1.6-py3-none-any.whl" }, { "comment_text": "", 
"digests": { "md5": "d741471fa100b57519f521aff7cfc62f", "sha256": "85bbcaeeb7817c88c542278047ce7d92628f29905a19b8b1b1cf0798e5488ef9" }, "downloads": -1, "filename": "NightCrawler-0.1.6.tar.gz", "has_sig": false, "md5_digest": "d741471fa100b57519f521aff7cfc62f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5141, "upload_time": "2019-03-07T11:35:37", "url": "https://files.pythonhosted.org/packages/e0/e7/e87f6beee9f249e9ddcae247c835762b598685607036f6ccec61fc41efbc/NightCrawler-0.1.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4b4c296921da39941b268435ace11227", "sha256": "8f7ef6472322297c6faa4cee16e74688bd791844dfc8ed924c1e53638dd1b11c" }, "downloads": -1, "filename": "NightCrawler-0.1.6-py3-none-any.whl", "has_sig": false, "md5_digest": "4b4c296921da39941b268435ace11227", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8449, "upload_time": "2019-03-07T11:35:35", "url": "https://files.pythonhosted.org/packages/76/1c/3f936abbd8adda5863dbcf7d2e432ccd9d544481e8a0a661521599e47014/NightCrawler-0.1.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d741471fa100b57519f521aff7cfc62f", "sha256": "85bbcaeeb7817c88c542278047ce7d92628f29905a19b8b1b1cf0798e5488ef9" }, "downloads": -1, "filename": "NightCrawler-0.1.6.tar.gz", "has_sig": false, "md5_digest": "d741471fa100b57519f521aff7cfc62f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5141, "upload_time": "2019-03-07T11:35:37", "url": "https://files.pythonhosted.org/packages/e0/e7/e87f6beee9f249e9ddcae247c835762b598685607036f6ccec61fc41efbc/NightCrawler-0.1.6.tar.gz" } ] }