{ "info": { "author": "Ivan Begtin", "author_email": "ivan@begtin.tech", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy" ], "description": "====================================================\nnewsworker -- automatic news extractor using HTML scraping\n====================================================\n\n.. image:: https://img.shields.io/travis/ivbeg/qddate/master.svg?style=flat-square\n :target: https://travis-ci.org/ivbeg/qddate\n :alt: travis build status\n\n.. image:: https://img.shields.io/pypi/v/qddate.svg?style=flat-square\n :target: https://pypi.python.org/pypi/qddate\n :alt: pypi version\n\n.. image:: https://readthedocs.org/projects/qddate/badge/?version=latest\n :target: http://qddate.readthedocs.org/en/latest/?badge=latest\n :alt: Documentation Status\n\n.. image:: https://codecov.io/gh/scrapinghub/dateparser/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/ivbeg/qddate\n :alt: Code Coverage\n\n.. image:: https://badges.gitter.im/scrapinghub/dateparser.svg\n :alt: Join the chat at https://gitter.im/ivbeg/qddate\n :target: https://gitter.im/ivbeg/qddate?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge\n\n\n`newsworker` is a Python 3 lib that extracts feeds from html pages. 
It's useful when you need to subscribe to news\nfrom a website that doesn't publish RSS/Atom feeds and you don't want to rely on page change monitoring tools because they are not\naccurate enough.\n\n\nUsage examples\n---------------\n\nExtract news from HTML pages on the Bulgarian government website and the EIB website\n\n >>> from newsworker.extractor import FeedExtractor\n >>> f = FeedExtractor(filtered_text_length=150)\n >>> feed, session = f.get_feed(url=\"http://government.bg/bg/prestsentar/novini\")\n >>> feed\n ...\n\n\n >>> feed, session = f.get_feed(url=\"http://www.eib.org/en/index.htm?lang=en\")\n >>> feed\n {'title': 'European Investment Bank (EIB)', 'language': 'en', 'link': 'http://www.eib.org/en/index.htm?lang=en', 'description': 'European Investment Bank (EIB)', 'items': [{'title': 'Blockchain Challenge: coders at the EIB', 'description': 'Blockchain Challenge: coders at the EIB', 'pubdate': datetime.datetime(2018, 6, 18, 0, 0), 'unique_id': 'f9d359f76118076c5331ffec3cdb82eb', 'raw_html': b'
', 'extra': {'links': ['https://www.youtube.com/watch?v=YlKa2LZgxhE?autoplay=1'], 'images': ['http://www.eib.org/img/site/play.png']}, 'link': 'https://www.youtube.com/watch?v=YlKa2LZgxhE?autoplay=1'}, {'title': 'A brighter life for Kenyan women', 'description': 'Jujuy Verde \u2013 new horizons for women waste-pickers in Argentina', 'pubdate': datetime.datetime(2018, 6, 5, 0, 0), 'unique_id': '9caef61535352d2734d122c0e092b011', 'raw_html': b'