{
"info": {
"author": "Colin Carroll",
"author_email": "ccarroll@mit.edu",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.6"
],
"description": "Date Guesser\n============\n\n|Build Status| |Coverage| \n\nA library to extract a publication date from a web page, along with a measure of the accuracy.\nThis was produced as a part of the `mediacloud project `_, in order to accurately extract dates from content. \n\nInstallation\n------------\n\nThe library is available `on PyPI `_, and may be installed with \n\n.. code-block:: bash\n\n pip install date_guesser\n\nQuickstart\n----------\nThe date guesser uses both the url and the html to work, and uses some heuristics to decide which of many possible dates might be the best one.\n\n.. code-block:: python\n\n from date_guesser import guess_date, Accuracy\n\n # Uses url slugs when available\n guess = guess_date(url='https://www.nytimes.com/2017/10/13/some_news.html', \n html='')\n\n # Returns a Guess object with three properties\n guess.date # datetime.datetime(2017, 10, 13, 0, 0, tzinfo=)\n guess.accuracy # Accuracy.DATE\n guess.method # 'Found /2017/10/13/ in url'\n\nIn case there are two trustworthy sources of dates, :code:`date_guesser` prefers the more accurate one\n\n.. code-block:: python\n\n html = ''' \n \n \n '''\n guess = guess_date(url='https://www.nytimes.com/2017/10/some_news.html',\n html=html)\n guess.date # datetime.datetime(2017, 10, 13, 4, 56, 54, tzinfo=tzoffset(None, -14400))\n guess.accuracy is Accuracy.DATETIME # True\n\nBut :code:`date_guesser` is not led astray by more accurate, less trustworthy sources of information\n\n.. code-block:: python\n\n html = ''' \n \n \n '''\n guess = guess_date(url='https://www.nytimes.com/2017/10/some_news.html',\n html=html)\n guess.date # datetime.datetime(2017, 10, 15, 0, 0, tzinfo=)\n guess.accuracy is Accuracy.PARTIAL # True \n\n\nFuture Work\n-----------\n\nLanguages\n^^^^^^^^^\n\nThe code does quite poorly on foreign news sources. This page is Ukranian and has a date on it that \na non-Ukranian could identify, but it is not extracted:\n\n.. code-block:: python\n\n import requests\n\n guess = guess_date(url='https://www.dw.com/uk/\u043a\u043e\u043c\u0435\u043d\u0442\u0430\u0440-\u043d\u0430\u0446\u0456\u043e\u043d\u0430\u043b\u0456\u0437\u043c-\u0440\u043e\u0434\u043e\u043c-\u0437\u0456-\u0441\u0445\u0456\u0434\u043d\u043e\u0457-\u0454\u0432\u0440\u043e\u043f\u0438/a-42081385',\n html=requests.get(url).text)\n guess.date # None\n guess.accuracy is Accuracy.NONE # True\n guess.method == 'Did not find anything' # True\n\n\nReckless Mode\n^^^^^^^^^^^^^\n\nWe keep track of the accuracy of extracted dates, but we do not keep track of the confidence of extracted \ndates being accurate. This may be a way to do more tuning given a particular use case. For example, one\nstrategy we do *not* employ is a regex for all the date patterns we recognize, since that was far too\nerror-prone. Such an approach might be preferable to returning :code:`None` in certain cases.\n\n\nPerformance\n-----------\nWe benchmarked the accuracy against the wonderful :code:`newspaper` library, using one hundred urls gathered from each of four very different topics in the :code:`mediacloud` system. This includes blogs and news articles, as well as many urls that have no date (in which case a guess is marked correct only if it returns :code:`None`). \n\nVaccines\n^^^^^^^^\n\n+---------+--------------+------------+\n| | date_guesser | newspaper |\n+=========+==============+============+\n| 1 days | **57** | 48 |\n+---------+--------------+------------+\n| 7 days | **61** | 51 |\n+---------+--------------+------------+\n| 15 days | **66** | 53 |\n+---------+--------------+------------+\n\nAadhar Card in India\n^^^^^^^^^^^^^^^^^^^^\n\n+---------+--------------+------------+\n| | date_guesser | newspaper |\n+=========+==============+============+\n| 1 days | **73** | 44 |\n+---------+--------------+------------+\n| 7 days | **74** | 44 |\n+---------+--------------+------------+\n| 15 days | **74** | 44 |\n+---------+--------------+------------+\n\nDonald Trump in 2017\n^^^^^^^^^^^^^^^^^^^^\n\n+---------+--------------+------------+\n| | date_guesser | newspaper |\n+=========+==============+============+\n| 1 days | **79** | 60 |\n+---------+--------------+------------+\n| 7 days | **83** | 61 |\n+---------+--------------+------------+\n| 15 days | **85** | 61 |\n+---------+--------------+------------+\n\nRecipes for desserts and chocolate\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n+---------+--------------+------------+\n| | date_guesser | newspaper |\n+=========+==============+============+\n| 1 days | **83** | 65 |\n+---------+--------------+------------+\n| 7 days | **85** | 69 |\n+---------+--------------+------------+\n| 15 days | **87** | 69 |\n+---------+--------------+------------+\n\n\n\n.. |Build Status| image:: https://travis-ci.org/mitmedialab/date_guesser.png?branch=master\n :target: https://travis-ci.org/mitmedialab/date_guesser\n.. |Coverage| image:: https://coveralls.io/repos/github/mitmedialab/date_guesser/badge.svg?branch=master\n :target: https://coveralls.io/github/mitmedialab/date_guesser?branch=master\n\n\n",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/mitmedialab/date_guesser",
"keywords": "",
"license": "MIT",
"maintainer": "",
"maintainer_email": "",
"name": "date-guesser",
"package_url": "https://pypi.org/project/date-guesser/",
"platform": "",
"project_url": "https://pypi.org/project/date-guesser/",
"project_urls": {
"Homepage": "https://github.com/mitmedialab/date_guesser"
},
"release_url": "https://pypi.org/project/date-guesser/2.1.4/",
"requires_dist": [
"arrow (>=0.12.0)",
"beautifulsoup4 (>=4.6.0)",
"lxml (>=4.1.1)",
"pytz (>=2017.3)"
],
"requires_python": "",
"summary": "Extract publication dates from web pages",
"version": "2.1.4"
},
"last_serial": 5672687,
"releases": {
"0.0.1": [
{
"comment_text": "",
"digests": {
"md5": "f340b440384160858dcfc66cb8ad3621",
"sha256": "4756aeb682f5075ef334b20b37374ce1d298de2eda1a1235f0bb12ff7b73ae97"
},
"downloads": -1,
"filename": "date_guesser-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f340b440384160858dcfc66cb8ad3621",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 11737,
"upload_time": "2018-01-16T15:46:19",
"url": "https://files.pythonhosted.org/packages/83/bc/bc8ade8ff3ec93f6b22b55158f44d86c8fd1178b960e6ceaeb44fae4f996/date_guesser-0.0.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "9ef3f9b4808f4a618f847f7cbfc5a37a",
"sha256": "77d0949f155cdfb21625675f97631a4d234b8942cc84349c51181029c64ad2a3"
},
"downloads": -1,
"filename": "date_guesser-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "9ef3f9b4808f4a618f847f7cbfc5a37a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11379,
"upload_time": "2018-01-16T15:46:23",
"url": "https://files.pythonhosted.org/packages/9a/14/d3a060d453b1c6a13766d52dc6b8fc535b7820489ad4fe600a4a1c5c9596/date_guesser-0.0.1.tar.gz"
}
],
"1.0.0": [
{
"comment_text": "",
"digests": {
"md5": "55b7db4e538fa6a66063c50d394d63ba",
"sha256": "d3d830e2a7ef0ada8d9ef4f1746a69560117d80abf004ae71c302d965202240e"
},
"downloads": -1,
"filename": "date_guesser-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "55b7db4e538fa6a66063c50d394d63ba",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 11033,
"upload_time": "2018-01-16T19:30:15",
"url": "https://files.pythonhosted.org/packages/d1/cd/5f2fd6e601b48b52ba2a1715afe2c410d1e0c3479acb98cbb8c8ea2ad352/date_guesser-1.0.0-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "bfde2bcac714eb69ccad069457b67fd3",
"sha256": "f6389bf9b218871605a00ccfd70c43727a8564b5f3dea90058c28a48be0cb602"
},
"downloads": -1,
"filename": "date_guesser-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "bfde2bcac714eb69ccad069457b67fd3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10638,
"upload_time": "2018-01-16T19:30:16",
"url": "https://files.pythonhosted.org/packages/84/ab/e3b2e1fae0e9cbca0e4809b4678177322a6f24b9780ed4b743cfc65c3efc/date_guesser-1.0.0.tar.gz"
}
],
"1.1.0": [
{
"comment_text": "",
"digests": {
"md5": "2bb4cb30673d5861c7dcb0b598677ff5",
"sha256": "fabcfacd648b973e12bf366c9c8adf25deb4e1d45a5d7874dd8a533452715569"
},
"downloads": -1,
"filename": "date_guesser-1.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2bb4cb30673d5861c7dcb0b598677ff5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12085,
"upload_time": "2018-01-16T20:42:37",
"url": "https://files.pythonhosted.org/packages/2f/a7/9026a624d1d98720736b440e524fe0110bb3c8f0c19209d9f7b50704f790/date_guesser-1.1.0-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "c4a1ee636c646a7c20c90d2faf7b307b",
"sha256": "00f5868e5f1b4ad7a1be48f066636b083d73f7941b7cd59f157879e26d9f6fc6"
},
"downloads": -1,
"filename": "date_guesser-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "c4a1ee636c646a7c20c90d2faf7b307b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11775,
"upload_time": "2018-01-16T20:42:39",
"url": "https://files.pythonhosted.org/packages/a7/b3/38b3147b96d4108923d203b4e65975c2bcf1a6b3f81f6120e01827c8128c/date_guesser-1.1.0.tar.gz"
}
],
"2.0.0": [
{
"comment_text": "",
"digests": {
"md5": "b2311e54e293bf7df873127a44a5a189",
"sha256": "203d5b4e3d3ed57505b9f4c23cf3133d32eb9ec5b55af117e3b824a911f37488"
},
"downloads": -1,
"filename": "date_guesser-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b2311e54e293bf7df873127a44a5a189",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12246,
"upload_time": "2018-01-25T03:40:26",
"url": "https://files.pythonhosted.org/packages/98/25/547213f17e48bf9b17414039229ea1d2aab895b29fc1de5eeb90601d0d99/date_guesser-2.0.0-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "c2eced8fa868cdf36d290f1a371d68b1",
"sha256": "8848dc40352735b1c54665107f44f684134d6b0e2d32d62b00aa2f347e09dd62"
},
"downloads": -1,
"filename": "date_guesser-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "c2eced8fa868cdf36d290f1a371d68b1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11928,
"upload_time": "2018-01-25T03:40:28",
"url": "https://files.pythonhosted.org/packages/51/72/506a23f1eeeeb82935f34097193689fbeb000747dc8f6b9ab2af80c779be/date_guesser-2.0.0.tar.gz"
}
],
"2.1.0": [
{
"comment_text": "",
"digests": {
"md5": "82908b9227b1c8568d14ee836c366057",
"sha256": "d462ef4979566413d4c98ecb1b5cbd10658ee727f36ebb34d2acd954cc2701a0"
},
"downloads": -1,
"filename": "date_guesser-2.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "82908b9227b1c8568d14ee836c366057",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12294,
"upload_time": "2018-01-27T16:27:08",
"url": "https://files.pythonhosted.org/packages/c6/f6/f958a5dfb882fbf44a4726081b585afda8f25e09d37c0d5dc562778cbc79/date_guesser-2.1.0-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "e28bd0c37a051bab02af9534ecf51c3c",
"sha256": "d960ab90064be7f3e0a57b30fd15b025635de30b07a02071051ce681bb42b40a"
},
"downloads": -1,
"filename": "date_guesser-2.1.0.tar.gz",
"has_sig": false,
"md5_digest": "e28bd0c37a051bab02af9534ecf51c3c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12182,
"upload_time": "2018-01-27T16:27:10",
"url": "https://files.pythonhosted.org/packages/4f/d3/3b6309fc9c4fad7056b026d70a6093349066ca4fcddb0347ff5fdcfc57a0/date_guesser-2.1.0.tar.gz"
}
],
"2.1.1": [
{
"comment_text": "",
"digests": {
"md5": "82934914acd1de0ef0074decf8462f9d",
"sha256": "1947a0ee8fa0f2ad2d5950f0470939bc8cae6364b13b535f6ec816d63181b2b0"
},
"downloads": -1,
"filename": "date_guesser-2.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "82934914acd1de0ef0074decf8462f9d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 12290,
"upload_time": "2018-01-27T17:59:36",
"url": "https://files.pythonhosted.org/packages/81/62/9472d4d93b5fbef6143dd4295586c29d01d00a6b228fa6689255225d7d25/date_guesser-2.1.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "2c3b7e0b7099288f70db16403c611e1a",
"sha256": "fc91643d667ce7ebcf0f0fe42b97459adab6d7c6d11976b3c9b51a1b56a623cb"
},
"downloads": -1,
"filename": "date_guesser-2.1.1.tar.gz",
"has_sig": false,
"md5_digest": "2c3b7e0b7099288f70db16403c611e1a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12164,
"upload_time": "2018-01-27T17:59:36",
"url": "https://files.pythonhosted.org/packages/38/41/70ddf579bb8627111bbb0c64af6d62da3c71beacfee84ffb61960178db18/date_guesser-2.1.1.tar.gz"
}
],
"2.1.2": [
{
"comment_text": "",
"digests": {
"md5": "8ca670cae83d4482d766efd053dda517",
"sha256": "f2d8b1b3c32de0c1ebf3b9b234ff49f2533092722c5d3d7d1aaca7c868723ee6"
},
"downloads": -1,
"filename": "date_guesser-2.1.2-py2.7.egg",
"has_sig": false,
"md5_digest": "8ca670cae83d4482d766efd053dda517",
"packagetype": "bdist_egg",
"python_version": "2.7",
"requires_python": null,
"size": 17692,
"upload_time": "2019-08-02T22:12:36",
"url": "https://files.pythonhosted.org/packages/b0/67/08f9f1ae7623eda8111a0aaac40323cb1d4c5d90d2c9a1a2fbf8dd8e5672/date_guesser-2.1.2-py2.7.egg"
}
],
"2.1.3": [
{
"comment_text": "",
"digests": {
"md5": "308bdda12a984a6384e4d741960b56bd",
"sha256": "e31d7e86069ee8d177fbe313528d911c8b2534824c784a46f48626018a45d75f"
},
"downloads": -1,
"filename": "date_guesser-2.1.3-py2-none-any.whl",
"has_sig": false,
"md5_digest": "308bdda12a984a6384e4d741960b56bd",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 12271,
"upload_time": "2019-08-02T22:12:34",
"url": "https://files.pythonhosted.org/packages/db/20/1505fc30e9041e12194c2be5cc312898557bb055d4928da2966b4b3aeaa4/date_guesser-2.1.3-py2-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "01cd7fcc80f7a4dd568b78431a4ac8b1",
"sha256": "45c72def6f359db301ae6d77516b7ffc4df0d2a5617ad56ab88e7acb25ab51ac"
},
"downloads": -1,
"filename": "date_guesser-2.1.3.tar.gz",
"has_sig": false,
"md5_digest": "01cd7fcc80f7a4dd568b78431a4ac8b1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11582,
"upload_time": "2019-08-02T22:12:37",
"url": "https://files.pythonhosted.org/packages/7c/49/b310d1b262cb9f2783013f5b2d1b8651a2ff68775810c24bb1ab21713211/date_guesser-2.1.3.tar.gz"
}
],
"2.1.4": [
{
"comment_text": "",
"digests": {
"md5": "dcc2caa8244a6b0cf621be46e51d9746",
"sha256": "18ae2bd52ba4201c093f26822d702c92b610212f5aa2aeb4bc381b96193599cf"
},
"downloads": -1,
"filename": "date_guesser-2.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dcc2caa8244a6b0cf621be46e51d9746",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 10322,
"upload_time": "2019-08-13T16:48:23",
"url": "https://files.pythonhosted.org/packages/73/40/e7936042280e0c648acb84ced42b500f28865f1ffc81a842753a5ffd067b/date_guesser-2.1.4-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "a73b347409f6fb00fe338dbe8d77326a",
"sha256": "4ad354f447a2c4f4bd65d1882baf9c0aad0bf84b5ec3324bf936c736d095bb93"
},
"downloads": -1,
"filename": "date_guesser-2.1.4.tar.gz",
"has_sig": false,
"md5_digest": "a73b347409f6fb00fe338dbe8d77326a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11683,
"upload_time": "2019-08-13T16:48:27",
"url": "https://files.pythonhosted.org/packages/ba/3b/1dc91e03e58697e0167145f7f738047105e6901d65072994eff0d8e1980a/date_guesser-2.1.4.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "dcc2caa8244a6b0cf621be46e51d9746",
"sha256": "18ae2bd52ba4201c093f26822d702c92b610212f5aa2aeb4bc381b96193599cf"
},
"downloads": -1,
"filename": "date_guesser-2.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dcc2caa8244a6b0cf621be46e51d9746",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 10322,
"upload_time": "2019-08-13T16:48:23",
"url": "https://files.pythonhosted.org/packages/73/40/e7936042280e0c648acb84ced42b500f28865f1ffc81a842753a5ffd067b/date_guesser-2.1.4-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "a73b347409f6fb00fe338dbe8d77326a",
"sha256": "4ad354f447a2c4f4bd65d1882baf9c0aad0bf84b5ec3324bf936c736d095bb93"
},
"downloads": -1,
"filename": "date_guesser-2.1.4.tar.gz",
"has_sig": false,
"md5_digest": "a73b347409f6fb00fe338dbe8d77326a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11683,
"upload_time": "2019-08-13T16:48:27",
"url": "https://files.pythonhosted.org/packages/ba/3b/1dc91e03e58697e0167145f7f738047105e6901d65072994eff0d8e1980a/date_guesser-2.1.4.tar.gz"
}
]
}