{ "info": { "author": "Colin Carroll", "author_email": "ccarroll@mit.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6" ], "description": "Date Guesser\n============\n\n|Build Status| |Coverage| \n\nA library to extract a publication date from a web page, along with a measure of its accuracy.\nThis was produced as a part of the `mediacloud project `_, in order to accurately extract dates from content. \n\nInstallation\n------------\n\nThe library is available `on PyPI `_, and may be installed with:\n\n.. code-block:: bash\n\n pip install date_guesser\n\nQuickstart\n----------\nThe date guesser works from both the URL and the HTML of a page, and uses heuristics to decide which of the many possible dates is most likely the publication date.\n\n.. code-block:: python\n\n from date_guesser import guess_date, Accuracy\n\n # Uses url slugs when available\n guess = guess_date(url='https://www.nytimes.com/2017/10/13/some_news.html',\n html='')\n\n # Returns a Guess object with three properties\n guess.date # datetime.datetime(2017, 10, 13, 0, 0, tzinfo=)\n guess.accuracy # Accuracy.DATE\n guess.method # 'Found /2017/10/13/ in url'\n\nWhen there are two trustworthy sources of dates, :code:`date_guesser` prefers the more accurate one:\n\n.. code-block:: python\n\n html = ''' \n \n \n '''\n guess = guess_date(url='https://www.nytimes.com/2017/10/some_news.html',\n html=html)\n guess.date # datetime.datetime(2017, 10, 13, 4, 56, 54, tzinfo=tzoffset(None, -14400))\n guess.accuracy is Accuracy.DATETIME # True\n\nBut :code:`date_guesser` is not led astray by more accurate but less trustworthy sources of information:\n\n..
code-block:: python\n\n html = ''' \n \n \n '''\n guess = guess_date(url='https://www.nytimes.com/2017/10/some_news.html',\n html=html)\n guess.date # datetime.datetime(2017, 10, 15, 0, 0, tzinfo=)\n guess.accuracy is Accuracy.PARTIAL # True \n\n\nFuture Work\n-----------\n\nLanguages\n^^^^^^^^^\n\nThe code does quite poorly on non-English news sources. This page is in Ukrainian and has a date on it that \na reader who does not know Ukrainian could identify, but it is not extracted:\n\n.. code-block:: python\n\n import requests\n from date_guesser import guess_date, Accuracy\n\n # Define the url first so it can be both fetched and passed to guess_date\n url = 'https://www.dw.com/uk/\u043a\u043e\u043c\u0435\u043d\u0442\u0430\u0440-\u043d\u0430\u0446\u0456\u043e\u043d\u0430\u043b\u0456\u0437\u043c-\u0440\u043e\u0434\u043e\u043c-\u0437\u0456-\u0441\u0445\u0456\u0434\u043d\u043e\u0457-\u0454\u0432\u0440\u043e\u043f\u0438/a-42081385'\n guess = guess_date(url=url,\n html=requests.get(url).text)\n guess.date # None\n guess.accuracy is Accuracy.NONE # True\n guess.method == 'Did not find anything' # True\n\n\nReckless Mode\n^^^^^^^^^^^^^\n\nWe keep track of the accuracy of each extracted date, but not of our confidence that the extracted \ndate is correct. Tracking confidence may allow more tuning for a particular use case. For example, one\nstrategy we do *not* employ is a regex for all the date patterns we recognize, since that was far too\nerror-prone. Such an approach might be preferable to returning :code:`None` in certain cases.\n\n\nPerformance\n-----------\nWe benchmarked the accuracy against the wonderful :code:`newspaper` library, using one hundred URLs gathered from each of four very different topics in the :code:`mediacloud` system. This includes blogs and news articles, as well as many URLs that have no date (in which case a guess is marked correct only if it returns :code:`None`). 
\n\nVaccines\n^^^^^^^^\n\n+---------+--------------+------------+\n|         | date_guesser | newspaper  |\n+=========+==============+============+\n| 1 day   | **57**       | 48         |\n+---------+--------------+------------+\n| 7 days  | **61**       | 51         |\n+---------+--------------+------------+\n| 15 days | **66**       | 53         |\n+---------+--------------+------------+\n\nAadhar Card in India\n^^^^^^^^^^^^^^^^^^^^\n\n+---------+--------------+------------+\n|         | date_guesser | newspaper  |\n+=========+==============+============+\n| 1 day   | **73**       | 44         |\n+---------+--------------+------------+\n| 7 days  | **74**       | 44         |\n+---------+--------------+------------+\n| 15 days | **74**       | 44         |\n+---------+--------------+------------+\n\nDonald Trump in 2017\n^^^^^^^^^^^^^^^^^^^^\n\n+---------+--------------+------------+\n|         | date_guesser | newspaper  |\n+=========+==============+============+\n| 1 day   | **79**       | 60         |\n+---------+--------------+------------+\n| 7 days  | **83**       | 61         |\n+---------+--------------+------------+\n| 15 days | **85**       | 61         |\n+---------+--------------+------------+\n\nRecipes for desserts and chocolate\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n+---------+--------------+------------+\n|         | date_guesser | newspaper  |\n+=========+==============+============+\n| 1 day   | **83**       | 65         |\n+---------+--------------+------------+\n| 7 days  | **85**       | 69         |\n+---------+--------------+------------+\n| 15 days | **87**       | 69         |\n+---------+--------------+------------+\n\n\n\n.. |Build Status| image:: https://travis-ci.org/mitmedialab/date_guesser.png?branch=master\n :target: https://travis-ci.org/mitmedialab/date_guesser\n.. 
|Coverage| image:: https://coveralls.io/repos/github/mitmedialab/date_guesser/badge.svg?branch=master\n :target: https://coveralls.io/github/mitmedialab/date_guesser?branch=master\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mitmedialab/date_guesser", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "date-guesser", "package_url": "https://pypi.org/project/date-guesser/", "platform": "", "project_url": "https://pypi.org/project/date-guesser/", "project_urls": { "Homepage": "https://github.com/mitmedialab/date_guesser" }, "release_url": "https://pypi.org/project/date-guesser/2.1.4/", "requires_dist": [ "arrow (>=0.12.0)", "beautifulsoup4 (>=4.6.0)", "lxml (>=4.1.1)", "pytz (>=2017.3)" ], "requires_python": "", "summary": "Extract publication dates from web pages", "version": "2.1.4" }, "last_serial": 5672687, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "f340b440384160858dcfc66cb8ad3621", "sha256": "4756aeb682f5075ef334b20b37374ce1d298de2eda1a1235f0bb12ff7b73ae97" }, "downloads": -1, "filename": "date_guesser-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f340b440384160858dcfc66cb8ad3621", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11737, "upload_time": "2018-01-16T15:46:19", "url": "https://files.pythonhosted.org/packages/83/bc/bc8ade8ff3ec93f6b22b55158f44d86c8fd1178b960e6ceaeb44fae4f996/date_guesser-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9ef3f9b4808f4a618f847f7cbfc5a37a", "sha256": "77d0949f155cdfb21625675f97631a4d234b8942cc84349c51181029c64ad2a3" }, "downloads": -1, "filename": "date_guesser-0.0.1.tar.gz", "has_sig": false, "md5_digest": "9ef3f9b4808f4a618f847f7cbfc5a37a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11379, "upload_time": 
"2018-01-16T15:46:23", "url": "https://files.pythonhosted.org/packages/9a/14/d3a060d453b1c6a13766d52dc6b8fc535b7820489ad4fe600a4a1c5c9596/date_guesser-0.0.1.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "55b7db4e538fa6a66063c50d394d63ba", "sha256": "d3d830e2a7ef0ada8d9ef4f1746a69560117d80abf004ae71c302d965202240e" }, "downloads": -1, "filename": "date_guesser-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "55b7db4e538fa6a66063c50d394d63ba", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11033, "upload_time": "2018-01-16T19:30:15", "url": "https://files.pythonhosted.org/packages/d1/cd/5f2fd6e601b48b52ba2a1715afe2c410d1e0c3479acb98cbb8c8ea2ad352/date_guesser-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bfde2bcac714eb69ccad069457b67fd3", "sha256": "f6389bf9b218871605a00ccfd70c43727a8564b5f3dea90058c28a48be0cb602" }, "downloads": -1, "filename": "date_guesser-1.0.0.tar.gz", "has_sig": false, "md5_digest": "bfde2bcac714eb69ccad069457b67fd3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10638, "upload_time": "2018-01-16T19:30:16", "url": "https://files.pythonhosted.org/packages/84/ab/e3b2e1fae0e9cbca0e4809b4678177322a6f24b9780ed4b743cfc65c3efc/date_guesser-1.0.0.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "2bb4cb30673d5861c7dcb0b598677ff5", "sha256": "fabcfacd648b973e12bf366c9c8adf25deb4e1d45a5d7874dd8a533452715569" }, "downloads": -1, "filename": "date_guesser-1.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "2bb4cb30673d5861c7dcb0b598677ff5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12085, "upload_time": "2018-01-16T20:42:37", "url": "https://files.pythonhosted.org/packages/2f/a7/9026a624d1d98720736b440e524fe0110bb3c8f0c19209d9f7b50704f790/date_guesser-1.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c4a1ee636c646a7c20c90d2faf7b307b", 
"sha256": "00f5868e5f1b4ad7a1be48f066636b083d73f7941b7cd59f157879e26d9f6fc6" }, "downloads": -1, "filename": "date_guesser-1.1.0.tar.gz", "has_sig": false, "md5_digest": "c4a1ee636c646a7c20c90d2faf7b307b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11775, "upload_time": "2018-01-16T20:42:39", "url": "https://files.pythonhosted.org/packages/a7/b3/38b3147b96d4108923d203b4e65975c2bcf1a6b3f81f6120e01827c8128c/date_guesser-1.1.0.tar.gz" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "b2311e54e293bf7df873127a44a5a189", "sha256": "203d5b4e3d3ed57505b9f4c23cf3133d32eb9ec5b55af117e3b824a911f37488" }, "downloads": -1, "filename": "date_guesser-2.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "b2311e54e293bf7df873127a44a5a189", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12246, "upload_time": "2018-01-25T03:40:26", "url": "https://files.pythonhosted.org/packages/98/25/547213f17e48bf9b17414039229ea1d2aab895b29fc1de5eeb90601d0d99/date_guesser-2.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c2eced8fa868cdf36d290f1a371d68b1", "sha256": "8848dc40352735b1c54665107f44f684134d6b0e2d32d62b00aa2f347e09dd62" }, "downloads": -1, "filename": "date_guesser-2.0.0.tar.gz", "has_sig": false, "md5_digest": "c2eced8fa868cdf36d290f1a371d68b1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11928, "upload_time": "2018-01-25T03:40:28", "url": "https://files.pythonhosted.org/packages/51/72/506a23f1eeeeb82935f34097193689fbeb000747dc8f6b9ab2af80c779be/date_guesser-2.0.0.tar.gz" } ], "2.1.0": [ { "comment_text": "", "digests": { "md5": "82908b9227b1c8568d14ee836c366057", "sha256": "d462ef4979566413d4c98ecb1b5cbd10658ee727f36ebb34d2acd954cc2701a0" }, "downloads": -1, "filename": "date_guesser-2.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "82908b9227b1c8568d14ee836c366057", "packagetype": "bdist_wheel", "python_version": "py3", 
"requires_python": null, "size": 12294, "upload_time": "2018-01-27T16:27:08", "url": "https://files.pythonhosted.org/packages/c6/f6/f958a5dfb882fbf44a4726081b585afda8f25e09d37c0d5dc562778cbc79/date_guesser-2.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e28bd0c37a051bab02af9534ecf51c3c", "sha256": "d960ab90064be7f3e0a57b30fd15b025635de30b07a02071051ce681bb42b40a" }, "downloads": -1, "filename": "date_guesser-2.1.0.tar.gz", "has_sig": false, "md5_digest": "e28bd0c37a051bab02af9534ecf51c3c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12182, "upload_time": "2018-01-27T16:27:10", "url": "https://files.pythonhosted.org/packages/4f/d3/3b6309fc9c4fad7056b026d70a6093349066ca4fcddb0347ff5fdcfc57a0/date_guesser-2.1.0.tar.gz" } ], "2.1.1": [ { "comment_text": "", "digests": { "md5": "82934914acd1de0ef0074decf8462f9d", "sha256": "1947a0ee8fa0f2ad2d5950f0470939bc8cae6364b13b535f6ec816d63181b2b0" }, "downloads": -1, "filename": "date_guesser-2.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "82934914acd1de0ef0074decf8462f9d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12290, "upload_time": "2018-01-27T17:59:36", "url": "https://files.pythonhosted.org/packages/81/62/9472d4d93b5fbef6143dd4295586c29d01d00a6b228fa6689255225d7d25/date_guesser-2.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2c3b7e0b7099288f70db16403c611e1a", "sha256": "fc91643d667ce7ebcf0f0fe42b97459adab6d7c6d11976b3c9b51a1b56a623cb" }, "downloads": -1, "filename": "date_guesser-2.1.1.tar.gz", "has_sig": false, "md5_digest": "2c3b7e0b7099288f70db16403c611e1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12164, "upload_time": "2018-01-27T17:59:36", "url": "https://files.pythonhosted.org/packages/38/41/70ddf579bb8627111bbb0c64af6d62da3c71beacfee84ffb61960178db18/date_guesser-2.1.1.tar.gz" } ], "2.1.2": [ { "comment_text": "", "digests": { 
"md5": "8ca670cae83d4482d766efd053dda517", "sha256": "f2d8b1b3c32de0c1ebf3b9b234ff49f2533092722c5d3d7d1aaca7c868723ee6" }, "downloads": -1, "filename": "date_guesser-2.1.2-py2.7.egg", "has_sig": false, "md5_digest": "8ca670cae83d4482d766efd053dda517", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 17692, "upload_time": "2019-08-02T22:12:36", "url": "https://files.pythonhosted.org/packages/b0/67/08f9f1ae7623eda8111a0aaac40323cb1d4c5d90d2c9a1a2fbf8dd8e5672/date_guesser-2.1.2-py2.7.egg" } ], "2.1.3": [ { "comment_text": "", "digests": { "md5": "308bdda12a984a6384e4d741960b56bd", "sha256": "e31d7e86069ee8d177fbe313528d911c8b2534824c784a46f48626018a45d75f" }, "downloads": -1, "filename": "date_guesser-2.1.3-py2-none-any.whl", "has_sig": false, "md5_digest": "308bdda12a984a6384e4d741960b56bd", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 12271, "upload_time": "2019-08-02T22:12:34", "url": "https://files.pythonhosted.org/packages/db/20/1505fc30e9041e12194c2be5cc312898557bb055d4928da2966b4b3aeaa4/date_guesser-2.1.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "01cd7fcc80f7a4dd568b78431a4ac8b1", "sha256": "45c72def6f359db301ae6d77516b7ffc4df0d2a5617ad56ab88e7acb25ab51ac" }, "downloads": -1, "filename": "date_guesser-2.1.3.tar.gz", "has_sig": false, "md5_digest": "01cd7fcc80f7a4dd568b78431a4ac8b1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11582, "upload_time": "2019-08-02T22:12:37", "url": "https://files.pythonhosted.org/packages/7c/49/b310d1b262cb9f2783013f5b2d1b8651a2ff68775810c24bb1ab21713211/date_guesser-2.1.3.tar.gz" } ], "2.1.4": [ { "comment_text": "", "digests": { "md5": "dcc2caa8244a6b0cf621be46e51d9746", "sha256": "18ae2bd52ba4201c093f26822d702c92b610212f5aa2aeb4bc381b96193599cf" }, "downloads": -1, "filename": "date_guesser-2.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "dcc2caa8244a6b0cf621be46e51d9746", 
"packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10322, "upload_time": "2019-08-13T16:48:23", "url": "https://files.pythonhosted.org/packages/73/40/e7936042280e0c648acb84ced42b500f28865f1ffc81a842753a5ffd067b/date_guesser-2.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a73b347409f6fb00fe338dbe8d77326a", "sha256": "4ad354f447a2c4f4bd65d1882baf9c0aad0bf84b5ec3324bf936c736d095bb93" }, "downloads": -1, "filename": "date_guesser-2.1.4.tar.gz", "has_sig": false, "md5_digest": "a73b347409f6fb00fe338dbe8d77326a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11683, "upload_time": "2019-08-13T16:48:27", "url": "https://files.pythonhosted.org/packages/ba/3b/1dc91e03e58697e0167145f7f738047105e6901d65072994eff0d8e1980a/date_guesser-2.1.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "dcc2caa8244a6b0cf621be46e51d9746", "sha256": "18ae2bd52ba4201c093f26822d702c92b610212f5aa2aeb4bc381b96193599cf" }, "downloads": -1, "filename": "date_guesser-2.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "dcc2caa8244a6b0cf621be46e51d9746", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10322, "upload_time": "2019-08-13T16:48:23", "url": "https://files.pythonhosted.org/packages/73/40/e7936042280e0c648acb84ced42b500f28865f1ffc81a842753a5ffd067b/date_guesser-2.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a73b347409f6fb00fe338dbe8d77326a", "sha256": "4ad354f447a2c4f4bd65d1882baf9c0aad0bf84b5ec3324bf936c736d095bb93" }, "downloads": -1, "filename": "date_guesser-2.1.4.tar.gz", "has_sig": false, "md5_digest": "a73b347409f6fb00fe338dbe8d77326a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11683, "upload_time": "2019-08-13T16:48:27", "url": "https://files.pythonhosted.org/packages/ba/3b/1dc91e03e58697e0167145f7f738047105e6901d65072994eff0d8e1980a/date_guesser-2.1.4.tar.gz" 
} ] }