{ "info": { "author": "Grant Atkins", "author_email": "gatkins@cs.odu.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Topic :: Text Processing :: Markup :: HTML" ], "description": "# Top-news-selectors (tns)\n\nA static HTML site parser for parsing the top story titles and URIs for the following websites:\n\n- https://www.washingtonpost.com/\n- http://www.foxnews.com\n- http://abcnews.go.com/\n- https://www.nytimes.com/\n- https://www.usatoday.com/\n- https://www.cbsnews.com/\n- http://www.chicagotribune.com/\n- https://www.nbcnews.com/\n- http://www.latimes.com/\n- https://www.npr.org/\n- https://www.wsj.com/\n\nThis parser is built on [Beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) based on the CSS selectors found inside the respective HTML documents.\nThese selectors were chosen based on the selectors that were present during the 30 days for the month of **November 2016**.\n\nThis package has not been tested beyond the month of November 2016, 11/01/2016 - 11/30/2016.\nUse at your own risk when going beyond that range.\n\n## Install\n\nThis package is available via pip:\n\n```\n$ pip install tns\n```\n\n## Usage\n\n```python\n>>> from tns import SiteParser\n>>> html = open('npr.html').read()\n>>> parser = SiteParser(html)\n>>> parser.npr()\n{\n'hero_text': \"foo\",\n'hero_link': 'http://bar.com',\n'headlines': [{'splash_title': 'baz', 'link': 'http://qux.com'}]\n}\n>>> parser.washingtonpost()\n{'hero_text': '', 'hero_link': '', 'headlines': []}\n```\n\nThe keys `hero_text` and `hero_link` refer to the top most identified news post, often identified by a enlarged picture or text across the top of the page.\nHeadlines refer to the subsequent headlines after the hero article, where `splash_title` refers to the title found on the home page of the site not the actual title of the article.\n\nYou can see that the second function call with parser, `parser.washingtonpost()`, has zero entries because the document passed to SiteParser was not intended for that parser.\n\nEach of the sites are paired to a function:\n\n```python\n\"http://abcnews.go.com/\": parser.abcnews()\n\"https://www.cbsnews.com/\": parser.cbsnews()\n\"https://www.nbcnews.com/\": parser.nbcnews()\n\"https://www.washingtonpost.com/\": parser.washingtonpost()\n\"https://www.npr.org/\": parser.npr()\n\"http://www.latimes.com/\": parser.latimes()\n\"https://www.usatoday.com/\": parser.usatoday()\n\"https://www.wsj.com/\": parser.wsj()\n\"https://www.nytimes.com/\": parser.nytimes()\n\"http://www.foxnews.com\": parser.foxnews()\n\"http://www.chicagotribune.com/\": parser.chicagotribune()\n```\n\n## Debugging\n\nWhen the parser fails to find specific headlines or returns no headlines at all, this could be due to:\n\n- Iframes loading content dynamically\n- Headlines being injected via Javascript\n- The wrong parser is being used for the specified site\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/oduwsdl/top-news-selectors", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/oduwsdl/top-news-selectors", "keywords": "html web tns odu memento", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "tns", "package_url": "https://pypi.org/project/tns/", "platform": "", "project_url": "https://pypi.org/project/tns/", "project_urls": { "Download": "https://github.com/oduwsdl/top-news-selectors", "Homepage": "https://github.com/oduwsdl/top-news-selectors" }, "release_url": "https://pypi.org/project/tns/1.0.0/", "requires_dist": [ "beautifulsoup4" ], "requires_python": "", "summary": "Top News Selectors (tns): Top news parsing from select websites", "version": "1.0.0" }, "last_serial": 3549500, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "8229415658d549f3377dfd0acc435867", "sha256": "a745237ef5a5662af7b9d3e0299a81c36afaeb85718982a8bec1b67c01cec0bf" }, "downloads": -1, "filename": "tns-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8229415658d549f3377dfd0acc435867", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7388, "upload_time": "2018-02-04T02:03:41", "url": "https://files.pythonhosted.org/packages/76/61/64cc2486f92de0051560147b8df7b2b8ec4469e72096b6c9476a4419d788/tns-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c7f5d5691175ec6e40bf86126a1616fd", "sha256": "7bd90549e71231a95675b2d76760bfc43ca9fb4b335fd762db8ddef17a51f4c9" }, "downloads": -1, "filename": "tns-1.0.0.tar.gz", "has_sig": false, "md5_digest": "c7f5d5691175ec6e40bf86126a1616fd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5297, "upload_time": "2018-02-04T02:03:42", "url": "https://files.pythonhosted.org/packages/36/a8/d19acc83571840969a39fb8e18c1f8b444755bbebd7afb81c191d0787228/tns-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8229415658d549f3377dfd0acc435867", "sha256": "a745237ef5a5662af7b9d3e0299a81c36afaeb85718982a8bec1b67c01cec0bf" }, "downloads": -1, "filename": "tns-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8229415658d549f3377dfd0acc435867", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7388, "upload_time": "2018-02-04T02:03:41", "url": "https://files.pythonhosted.org/packages/76/61/64cc2486f92de0051560147b8df7b2b8ec4469e72096b6c9476a4419d788/tns-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c7f5d5691175ec6e40bf86126a1616fd", "sha256": "7bd90549e71231a95675b2d76760bfc43ca9fb4b335fd762db8ddef17a51f4c9" }, "downloads": -1, "filename": "tns-1.0.0.tar.gz", "has_sig": false, "md5_digest": "c7f5d5691175ec6e40bf86126a1616fd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5297, "upload_time": "2018-02-04T02:03:42", "url": "https://files.pythonhosted.org/packages/36/a8/d19acc83571840969a39fb8e18c1f8b444755bbebd7afb81c191d0787228/tns-1.0.0.tar.gz" } ] }