{ "info": { "author": "Colin Carroll", "author_email": "ccarroll@mit.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6" ], "description": "===========\nFeed Seeker\n===========\n*It slant rhymes with \"heat seeker\"*\n\n|Build Status| |Coverage|\n\nA library for finding Atom, RSS, RDF, and XML feeds on web pages. Produced at the `mediacloud `_ project. It is an incremental improvement over `feedfinder2 `_, which was itself based on `feedfinder `_, written by Mark Pilgrim and maintained by Aaron Swartz until his untimely death.\n\n\nInstallation\n------------\n\nThe library is available on `PyPI `_:\n\n.. code-block:: bash\n\n pip install feed_seeker\n\nQuickstart\n----------\nBy default, the library uses :code:`requests` to fetch a page's HTML and inspect it for the most\nlikely feed URL:\n\n.. code-block:: python\n\n >>> from feed_seeker import find_feed_url\n >>> find_feed_url('https://github.com/mitmedialab/feed_seeker')\n 'https://github.com/mitmedialab/feed_seeker/commits/master.atom'\n\n\nTo do a more thorough search, use :code:`generate_feed_urls`, which returns more likely candidates first.\n\n.. code-block:: python\n\n >>> from feed_seeker import generate_feed_urls\n >>> for url in generate_feed_urls('https://xkcd.com'):\n ...     print(url)\n ...\n https://xkcd.com/atom.xml\n https://xkcd.com/rss.xml\n\n\nFor the most thorough search, add a :code:`spider` argument to do depth-first spidering of URLs on the same hostname. Note that the call below takes nearly four minutes, compared to 0.5 seconds for :code:`find_feed_url`.\n\n\n.. code-block:: python\n\n >>> for url in generate_feed_urls('https://github.com/mitmedialab/feed_seeker', spider=1):\n ...     print(url)\n ... 
\n\thttps://github.com/mitmedialab/feed_seeker/commits/master.atom,\n\thttps://github.com/mitmedialab/feed_seeker/commits/95cf320796c487df8b70f9c42281d8f26452cc31.atom,\n\thttps://github.com/mitmedialab/feed_seeker/commits/3e93490cb91f7652325c2fe41ef29a5be4558d6a.atom,\n\thttps://github.com/mitmedialab/feed_seeker/commits/659311b8853c4c4a67e3b4bc67a78461d825a064.atom,\n\thttps://github.com/mitmedialab/feed_seeker/commits/a8f7b86eac2cedd9209ac5d2ddcceb293d2404c9.atom,\n\thttps://github.com/index.atom,\n\thttps://github.com/articles.atom,\n\thttps://github.com/dfm/feedfinder2/commits/master.atom,\n\thttps://github.com/blog.atom,\n\thttps://github.com/blog/all.atom,\n\thttps://github.com/blog/broadcasts.atom,\n\thttps://github.com/ColCarroll.atom\n\nIn a hurry?\n-----------\n\nIf you have a long list of URLs, you might want to set a timeout with :code:`max_time`:\n\n.. code-block:: python\n\n\t>>> for url in ('https://httpstat.us/200?sleep=5000', 'https://github.com/mitmedialab/feed_seeker'):\n\t...     try:\n\t...         print('found feed:\\t{}'.format(find_feed_url(url, max_time=3)))\n\t...     except TimeoutError:\n\t...         print('skipping {}'.format(url))\n\tskipping https://httpstat.us/200?sleep=5000\n\tfound feed:\thttps://github.com/mitmedialab/feed_seeker/commits/master.atom\n\n\nDifferences from :code:`feedfinder2`\n====================================\nThe biggest difference is that all functions are implemented as generators and are evaluated lazily. Candidate feed links are actually accessed and inspected to determine whether each one is a feed, which can be quite time-consuming. We expose a function to find the most likely feed link, and another to lazily generate links in rough order from most prominent to least.\n\nThere are also a few more heuristics based on our experience at `mediacloud `_.\n\n.. |Build Status| image:: https://travis-ci.org/mitmedialab/feed_seeker.png?branch=master\n :target: https://travis-ci.org/mitmedialab/feed_seeker\n.. 
|Coverage| image:: https://coveralls.io/repos/github/mitmedialab/feed_seeker/badge.svg?branch=master\n :target: https://coveralls.io/github/mitmedialab/feed_seeker?branch=master\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mitmedialab/feed_seeker", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "feed-seeker", "package_url": "https://pypi.org/project/feed-seeker/", "platform": "", "project_url": "https://pypi.org/project/feed-seeker/", "project_urls": { "Homepage": "https://github.com/mitmedialab/feed_seeker" }, "release_url": "https://pypi.org/project/feed-seeker/1.0.0/", "requires_dist": [ "beautifulsoup4 (>=4.6.0)", "lxml (>=4.1.1)", "requests (>=2.18.4)" ], "requires_python": "", "summary": "Extract rss, atom, and other feeds from webpages", "version": "1.0.0" }, "last_serial": 4694813, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "660b7254fd3b59324edca81f318cdd0b", "sha256": "b90ac6a0b25e94df865d24dc04d88e9ccbda708405b9ff41a1ff6bbe257e2c76" }, "downloads": -1, "filename": "feed_seeker-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "660b7254fd3b59324edca81f318cdd0b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9634, "upload_time": "2018-01-17T23:48:48", "url": "https://files.pythonhosted.org/packages/31/8d/136edb21436b54c56224aebc6cfdf62ac2b0d1e66ee70abbcfab385e387d/feed_seeker-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf9db97133cf26bb6bef25ff96113fb7", "sha256": "a70e38e72e7c4aa69101d7d00f61060dcf756ae48d261530a482f4a55ab61ded" }, "downloads": -1, "filename": "feed_seeker-0.0.1.tar.gz", "has_sig": false, "md5_digest": "bf9db97133cf26bb6bef25ff96113fb7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8641, "upload_time": "2018-01-17T23:48:50", "url": 
"https://files.pythonhosted.org/packages/cb/0e/17b911821c6406ae66aace1c7ab7fa163cf0487e57ef6fc118c8a03cdcef/feed_seeker-0.0.1.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "25d07d4a9b175577dcd09c12acc45a32", "sha256": "0fb4769d9081dd64b6262b4bc362a9b651b0e61c5adc045d3b27e9f2654656e7" }, "downloads": -1, "filename": "feed_seeker-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "25d07d4a9b175577dcd09c12acc45a32", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9580, "upload_time": "2018-01-18T20:51:30", "url": "https://files.pythonhosted.org/packages/c9/9a/f415118b580c4c759e637baa94c062ee8f9fc624fa3e6c383e020450eb0c/feed_seeker-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "02d874a653cd60eb6eeb21006d9dff03", "sha256": "9bb3bd705a21772124013b74b5012abd6558f641c4a4c267daa0196e75dbdb88" }, "downloads": -1, "filename": "feed_seeker-1.0.0.tar.gz", "has_sig": false, "md5_digest": "02d874a653cd60eb6eeb21006d9dff03", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8840, "upload_time": "2018-01-18T20:51:31", "url": "https://files.pythonhosted.org/packages/fa/5e/ec1666a581b15829bbf2d4f83013e47f8ad89d99de82f905c73f13befe15/feed_seeker-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "25d07d4a9b175577dcd09c12acc45a32", "sha256": "0fb4769d9081dd64b6262b4bc362a9b651b0e61c5adc045d3b27e9f2654656e7" }, "downloads": -1, "filename": "feed_seeker-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "25d07d4a9b175577dcd09c12acc45a32", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9580, "upload_time": "2018-01-18T20:51:30", "url": "https://files.pythonhosted.org/packages/c9/9a/f415118b580c4c759e637baa94c062ee8f9fc624fa3e6c383e020450eb0c/feed_seeker-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "02d874a653cd60eb6eeb21006d9dff03", "sha256": 
"9bb3bd705a21772124013b74b5012abd6558f641c4a4c267daa0196e75dbdb88" }, "downloads": -1, "filename": "feed_seeker-1.0.0.tar.gz", "has_sig": false, "md5_digest": "02d874a653cd60eb6eeb21006d9dff03", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8840, "upload_time": "2018-01-18T20:51:31", "url": "https://files.pythonhosted.org/packages/fa/5e/ec1666a581b15829bbf2d4f83013e47f8ad89d99de82f905c73f13befe15/feed_seeker-1.0.0.tar.gz" } ] }