{ "info": { "author": "Linas Valiukas, Hal Roberts, Media Cloud project", "author_email": "linas@media.mit.edu, hroberts@cyber.law.harvard.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3 :: Only", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Internet :: WWW/HTTP :: Indexing/Search", "Topic :: Text Processing :: Indexing", "Topic :: Text Processing :: Markup :: XML" ], "description": ".. image:: https://travis-ci.org/berkmancenter/mediacloud-ultimate_sitemap_parser.svg?branch=develop\n :target: https://travis-ci.org/berkmancenter/mediacloud-ultimate_sitemap_parser\n :alt: Build Status\n\n.. image:: https://readthedocs.org/projects/ultimate-sitemap-parser/badge/?version=latest\n :target: https://ultimate-sitemap-parser.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\n.. image:: https://coveralls.io/repos/github/berkmancenter/mediacloud-ultimate_sitemap_parser/badge.svg?branch=develop\n :target: https://coveralls.io/github/berkmancenter/mediacloud-ultimate_sitemap_parser?branch=develop\n :alt: Coverage Status\n\n.. image:: https://badge.fury.io/py/ultimate-sitemap-parser.svg\n :target: https://badge.fury.io/py/ultimate-sitemap-parser\n :alt: PyPI package\n\n\nWebsite sitemap parser for Python 3.5+.\n\n\nFeatures\n========\n\n- Supports all sitemap formats:\n\n - `XML sitemaps `_\n - `Google News sitemaps `_\n - `plain text sitemaps `_\n - `RSS 2.0 / Atom 0.3 / Atom 1.0 sitemaps `_\n - `Sitemaps linked from robots.txt `_\n\n- Field-tested with ~1 million URLs as part of the `Media Cloud project `_\n- Error-tolerant with more common sitemap bugs\n- Tries to find sitemaps not listed in ``robots.txt``\n- Uses fast and memory efficient Expat XML parsing\n- Doesn't consume much memory even with massive sitemap hierarchies\n- Provides a generated sitemap tree as easy to use object tree\n- Supports using a custom web client\n- Uses a small number of actively maintained third-party modules\n- Reasonably tested\n\n\nInstallation\n============\n\n.. code:: sh\n\n pip install ultimate_sitemap_parser\n\n\nUsage\n=====\n\n.. code:: python\n\n from usp.tree import sitemap_tree_for_homepage\n\n tree = sitemap_tree_for_homepage('https://www.nytimes.com/')\n print(tree)\n\n``sitemap_tree_for_homepage()`` will return a tree of ``AbstractSitemap`` subclass objects that represent the sitemap\nhierarchy found on the website; see a `reference of AbstractSitemap subclasses `_.\n\nIf you'd like to just list all the pages found in all of the sitemaps within the website, consider using ``all_pages()`` method:\n\n.. code:: python\n\n # all_pages() returns an Iterator\n for page in tree.all_pages():\n print(page)\n\n``all_pages()`` method will return an iterator yielding ``SitemapPage`` objects; see a `reference of SitemapPage `_.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser", "keywords": "sitemap sitemap-xml parser", "license": "GPLv3+", "maintainer": "", "maintainer_email": "", "name": "ultimate-sitemap-parser", "package_url": "https://pypi.org/project/ultimate-sitemap-parser/", "platform": "", "project_url": "https://pypi.org/project/ultimate-sitemap-parser/", "project_urls": { "Homepage": "https://github.com/berkmancenter/mediacloud-ultimate_sitemap_parser" }, "release_url": "https://pypi.org/project/ultimate-sitemap-parser/0.5/", "requires_dist": [ "python-dateutil (<3.0.0,>=2.1)", "requests (>=2.2.1)", "requests-mock (<2.0,>=1.6.0) ; extra == 'test'", "pytest (>=2.8) ; extra == 'test'" ], "requires_python": ">=3.5", "summary": "Ultimate Sitemap Parser", "version": "0.5" }, "last_serial": 5612859, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "692a04b92c908b92ede07de796606f49", "sha256": "52a9da9de8cbd9c04e3b1e001dfd21b0620954062070f8d3ea2992f3228f06e6" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "692a04b92c908b92ede07de796606f49", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 15996, "upload_time": "2018-11-29T18:10:18", "url": "https://files.pythonhosted.org/packages/f1/fb/a7c2d10935fdba98fa2b940fae9840db95c53744a8881bd1a33b58942e02/ultimate_sitemap_parser-0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4a89ff1aa74e4ec8437d17815bd16af9", "sha256": "43241b5538cd48297dc34e01aa168d10d99386e3771a5959d80238a8b56f7086" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.1.tar.gz", "has_sig": false, "md5_digest": "4a89ff1aa74e4ec8437d17815bd16af9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 13500, "upload_time": "2018-11-29T18:10:20", "url": "https://files.pythonhosted.org/packages/dc/c9/83a28a62dbbf8e1ef86d4d5db76204fc9d4bd062b2af3750596ea44fc537/ultimate_sitemap_parser-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "2fca11af6eda5392c7fa31bd1a70821c", "sha256": "10be1de9dd2e9869f7f5a79922689f5dd518809252e2b286743f0b5584f60a14" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2fca11af6eda5392c7fa31bd1a70821c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 16014, "upload_time": "2019-07-16T12:50:11", "url": "https://files.pythonhosted.org/packages/e3/ae/82cfe45970c11629dfc25440d9d8338e3fba12e399b9aceba85e5081dff4/ultimate_sitemap_parser-0.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "59bf10af69949ffe1a9af72ac0b2c744", "sha256": "9135b901bfa23274df2e7957af92cb165da78552a4bb5157ca008a39099021e6" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.2.tar.gz", "has_sig": false, "md5_digest": "59bf10af69949ffe1a9af72ac0b2c744", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 13804, "upload_time": "2019-07-16T12:50:13", "url": "https://files.pythonhosted.org/packages/80/6a/8a448face917dd965475a26974da003725a2a9e10f4a8b2b995708822986/ultimate_sitemap_parser-0.2.tar.gz" } ], "0.3": [ { "comment_text": "", "digests": { "md5": "1a89cc65d4c6bbafe1b37fbbc86183ba", "sha256": "6cc351ca21613da1e9e8d2c3eed6dcc52267c6a9c7f3b9df9e9209177f6b4704" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "1a89cc65d4c6bbafe1b37fbbc86183ba", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 18061, "upload_time": "2019-07-17T15:07:20", "url": "https://files.pythonhosted.org/packages/fa/98/8f9277d607c6edf8cb3ac99ffaac6c954d1662e310d2c35d2240cbf68e61/ultimate_sitemap_parser-0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ff77d1fcc6e697e7a324f015ee670b11", "sha256": "7efde5726354e908e97f733c0fa3e7bc6cf7ca260e8775cb11eddd9ac1ef1434" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.3.tar.gz", "has_sig": false, "md5_digest": "ff77d1fcc6e697e7a324f015ee670b11", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 16039, "upload_time": "2019-07-17T15:07:22", "url": "https://files.pythonhosted.org/packages/ba/56/b55f5ed4373e551e1895e7e1f38cd4c7c2e38830f21304e30208ab309560/ultimate_sitemap_parser-0.3.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "b71a0dbe6ad974a245bda18575a98aaf", "sha256": "937f5d6e568fa5f3c655d932ab57c359bafde686b5f160af6ef6e268cbbbffb5" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "b71a0dbe6ad974a245bda18575a98aaf", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 21134, "upload_time": "2019-07-18T16:33:48", "url": "https://files.pythonhosted.org/packages/1b/ca/f98d7366f6fbeafe49494cf3e259681320e0c5c1bfbb0331479b9e3e69c4/ultimate_sitemap_parser-0.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3a72a0c2ef4023fa661c97ea451bece0", "sha256": "8ea602a6b43964aab5707d57b2fee0d0dbbec74638f64e27f6e3cba7c95bcc9d" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.4.tar.gz", "has_sig": false, "md5_digest": "3a72a0c2ef4023fa661c97ea451bece0", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 18759, "upload_time": "2019-07-18T16:33:50", "url": "https://files.pythonhosted.org/packages/bc/83/91b87b3edde59c4b930bd376bc64f8193cd61f18c2de44aee1eeb3d0e7cc/ultimate_sitemap_parser-0.4.tar.gz" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "5479eb21fc1626a54642dc06ae9613de", "sha256": "806e723eeb0293c38e111822d651e987b1494ae9c08be82e73172ade667418a6" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "5479eb21fc1626a54642dc06ae9613de", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 23208, "upload_time": "2019-07-31T11:15:46", "url": "https://files.pythonhosted.org/packages/ee/58/a6394d980bda84c44b442a3bab5ceb49626d01d4b17fbc7fe6d41b90c496/ultimate_sitemap_parser-0.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "362e6e5d4b993d6e89eb4a259ccd029e", "sha256": "9825fefcdf515e2748addc7ec5dcdb6430dfdd4ef5de4a54e39de1e7613d0ece" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.5.tar.gz", "has_sig": false, "md5_digest": "362e6e5d4b993d6e89eb4a259ccd029e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 20229, "upload_time": "2019-07-31T11:15:47", "url": "https://files.pythonhosted.org/packages/21/44/04eada3b1b1f825eb18b93e385ff652778c96902788b87a9b1e0a141ccff/ultimate_sitemap_parser-0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5479eb21fc1626a54642dc06ae9613de", "sha256": "806e723eeb0293c38e111822d651e987b1494ae9c08be82e73172ade667418a6" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "5479eb21fc1626a54642dc06ae9613de", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5", "size": 23208, "upload_time": "2019-07-31T11:15:46", "url": "https://files.pythonhosted.org/packages/ee/58/a6394d980bda84c44b442a3bab5ceb49626d01d4b17fbc7fe6d41b90c496/ultimate_sitemap_parser-0.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "362e6e5d4b993d6e89eb4a259ccd029e", "sha256": "9825fefcdf515e2748addc7ec5dcdb6430dfdd4ef5de4a54e39de1e7613d0ece" }, "downloads": -1, "filename": "ultimate_sitemap_parser-0.5.tar.gz", "has_sig": false, "md5_digest": "362e6e5d4b993d6e89eb4a259ccd029e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 20229, "upload_time": "2019-07-31T11:15:47", "url": "https://files.pythonhosted.org/packages/21/44/04eada3b1b1f825eb18b93e385ff652778c96902788b87a9b1e0a141ccff/ultimate_sitemap_parser-0.5.tar.gz" } ] }