{ "info": { "author": "Paul Solbach", "author_email": "p@psolbach.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Web Environment", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 3.5", "Topic :: Internet :: WWW/HTTP" ], "description": "# Metadoc\n[![Build Status](https://travis-ci.org/psolbach/metadoc.svg?branch=master)](https://travis-ci.org/psolbach/metadoc)\n[![Coverage Status](https://coveralls.io/repos/github/psolbach/metadoc/badge.svg?branch=master)](https://coveralls.io/github/psolbach/metadoc?branch=master)\n\nMetadoc is a post-truth era news article metadata retrieval service. It does social media activity lookup, source authenticity rating, checksum creation, json-ld and metatag parsing as well as information extraction for named entities, pullquotes, fulltext and other useful things based off of arbitrary article URLs. Also, Metadoc is built to be relatively fast.\n\n## Example\n\nYou just throw it any news article URL, and Metadoc will yield.\n```python\nfrom metadoc import Metadoc\nurl = \"https://theintercept.com/2016/11/17/iphones-secretly-send-call-history-to-apple-security-firm-says\"\nmetadoc = Metadoc(url=url)\nres = metadoc.query()\n```\n=>\n```python\n{\n '__version__': '0.9.0',\n 'authors': ['Kim Zetter'],\n 'canonical_url': 'https://theintercept.com/2016/11/17/iphones-secretly-send-call-history-to-apple-security-firm-says/',\n 'domain': {\n 'credibility': {\n 'fake_confidence': '0.00',\n 'is_blacklisted': False\n },\n 'date_registered': None,\n 'favicon': 'https://logo.clearbit.com/theintercept.com?size=200',\n 'name': 'theintercept.com'},\n 'entities': {\n 'keywords': [\n 'cellebrite',\n 'fbi',\n 'skype',\n 'intercept'\n ...\n ]\n }\n },\n 'image': 'https://theintercept.imgix.net/wp-uploads/sites/1/2016/11/GettyImages-578052668-s.jpg?auto=compress%2Cformat&q=90&fit=crop&w=1200&h=800',\n 
'language': 'en',\n 'modified_date': None,\n 'published_date': '2016-11-17T11:00:36+00:00',\n 'scraped_date': '2018-07-10T12:13:46+00:00',\n 'social': [{\n 'metrics': [{\n 'count': 7340, 'label': 'sharecount'\n }],\n 'provider': 'facebook'\n }],\n 'text': {\n 'contenthash': '940a62c70db255b4aec378529ae7a2c8',\n 'fulltext': 'a guardian of user privacy this year after fighting FBI\n demands to help crack into San Bernardino shooter Syed ...',\n 'reading_time': 439,\n 'summary': 'Your call logs get sent to Apple\u2019s servers whenever iCloud is on \u2014 something Apple does not disclose.'\n },\n 'title': 'iPhones Secretly Send Call\\xa0History to Apple, Security Firm Says',\n 'url': 'https://theintercept.com/2016/11/17/iphones-secretly-send-call-history-to-apple-security-firm-says'\n}\n```\n\n## Trustworthiness Check\nMetadoc does a basic background check on article sources. This means a simple blacklist-lookup via `whois` data on the domain. Blacklists taken into account include the controversial [PropOrNot](http://www.propornot.com/p/the-list.html). Because such lists are contested, we only spit out a `fake_confidence` of 1 if a domain is found on every blacklist. The resulting metadata should be taken with a grain of salt.\n\n## Part-of-speech tagging\nFor speed and simplicity, we decided against `nltk` and instead rely on the Averaged Perceptron as imagined by Matthew Honnibal [@explosion](https://github.com/explosion). The pip install comes pre-trained with a [CoNLL 2000](http://www.cnts.ua.ac.be/conll2000/) training set, which works reasonably well to detect proper nouns. Since training is non-deterministic, unwanted stopwords might slip through. If you want to try out other datasets, simply replace `metadoc/extract/data/training_set.txt` with your own and run `metadoc.extract.pos.do_train`.\n\n## Purpose\nThis library is used in the context of a news-related software undertaking called [Praise](https://praise.press). 
We're building the first social network dedicated to quality journalism recommendations, synthesizing what we dub \"audience-evaluated content\" with automated metadata. If you're intrigued and might want to work with us, feel free to drop a line to [a@praise.press](mailto:a@praise.press). \n\n## Install\nRequires Python 3.5.\n\n#### Using pip\n```shell\npip install metadoc\n```\n\n## Develop\n\n#### Mac OS\n```shell\nbrew install python3 libxml2 libxslt libtiff libjpeg webp little-cms2\n```\n#### Ubuntu\n```shell\napt-get install -y python3 libxml2-dev libxslt-dev libtiff-dev libjpeg-dev webp whois\n```\n#### Fedora/Redhat\n```shell\ndnf install libxml2-devel libxslt-devel libtiff-devel libjpeg-devel libjpeg-turbo-devel libwebp whois\n```\n#### Then\n```shell\npip3 install -r requirements-dev.txt\npython serve.py  # => serving @ 6060\n```\n\n## Test\n```shell\npy.test -v tests\n```\nIf you happen to run into an error with OSX 10.11 concerning a lazy bound library in PIL, \njust remove `/PIL/.dylibs/liblzma.5.dylib`.\n\n## Todo\n* Page concatenation is needed in order to properly calculate wordcount and reading time.\n* Authenticity heuristic with sharecount deviance detection (requires state).\n* ~~Perf: Worst offender is nltk's pos tagger. Roll own w/ Average Perceptron.~~\n* ~~Newspaper's summarize produces pullquotes, fulltext takes a while. Move to libextract?~~\n\n## Contributors\n[Martin Borho](https://github.com/mborho) \n[Paul Solbach](https://github.com/___paul) \n\n---\n\nMetadoc is a software product of Praise Internet UG, Hamburg. \nMetadoc stems from a pedigree of nice libraries like [goose3](https://github.com/goose3/goose3/tree/master/goose3), [langdetect](https://github.com/Mimino666/langdetect) and [nltk](https://github.com/nltk/nltk). \nMetadoc leans on [this](https://github.com/hankcs/AveragedPerceptronPython) perceptron implementation inspired by Matthew Honnibal. \nMetadoc is a work-in-progress. 
\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/praise-internet/metadoc", "keywords": "scraping,metadata,news article", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "metadoc", "package_url": "https://pypi.org/project/metadoc/", "platform": "", "project_url": "https://pypi.org/project/metadoc/", "project_urls": { "Homepage": "https://github.com/praise-internet/metadoc" }, "release_url": "https://pypi.org/project/metadoc/0.10.5/", "requires_dist": [ "aiohttp (==1.1.5)", "bottle (==0.12.10)", "python-dateutil (==2.6.1)", "jmespath (==0.9.0)", "langdetect (==1.0.7)", "goose3 (==3.0.9)", "nltk (==3.2.1)", "numpy (==1.13.3)", "requests (==2.18.4)", "tldextract (==2.0.2)", "whois (==0.7)" ], "requires_python": "", "summary": "Post-truth era news article metadata service.", "version": "0.10.5" }, "last_serial": 4240821, "releases": { "0.10.0": [ { "comment_text": "", "digests": { "md5": "187b2d335f8ad27371f348cd73ed98bf", "sha256": "cb6f1ac5799875f9b8648425fe127eaf3f92f427fb45c2d8aec2067d128fcec2" }, "downloads": -1, "filename": "metadoc-0.10.0.tar.gz", "has_sig": false, "md5_digest": "187b2d335f8ad27371f348cd73ed98bf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 41888284, "upload_time": "2018-09-05T08:51:09", "url": "https://files.pythonhosted.org/packages/70/2f/3dc0fe656f8e9a7b71e8576680010b8f9bd9ce4b48071509616f688bc8a8/metadoc-0.10.0.tar.gz" } ], "0.10.4": [ { "comment_text": "", "digests": { "md5": "77ae8f2f79102c1d9b488240684c25f9", "sha256": "db15036e0441409378601beb1112f349b4508e029047ee137029b714c1840f37" }, "downloads": -1, "filename": "metadoc-0.10.4.tar.gz", "has_sig": false, "md5_digest": "77ae8f2f79102c1d9b488240684c25f9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 41885137, "upload_time": "2018-09-05T10:39:23", 
"url": "https://files.pythonhosted.org/packages/21/cf/2523913484d6f4ac5a359bc13419afd0e5c223bb52bd0759e23810d3474c/metadoc-0.10.4.tar.gz" } ], "0.10.5": [ { "comment_text": "", "digests": { "md5": "19adfa37482db0719ca99196f3400158", "sha256": "74885374c83aa8a8624f5e3ef4613379757e8287b3a970004eb75e536f7b1b25" }, "downloads": -1, "filename": "metadoc-0.10.5-py3-none-any.whl", "has_sig": false, "md5_digest": "19adfa37482db0719ca99196f3400158", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 42631473, "upload_time": "2018-09-05T12:11:28", "url": "https://files.pythonhosted.org/packages/84/08/86fbb4a63b942000a4e29301e02d2830b90f47bfc9e7eabc71a6c5a62ab4/metadoc-0.10.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "113aa2c7ea266bf6ed86d6355c5c8d82", "sha256": "960acab7dc692295a23676f02f584656fb9799ed6dc8354db243ee956f5159a0" }, "downloads": -1, "filename": "metadoc-0.10.5.tar.gz", "has_sig": false, "md5_digest": "113aa2c7ea266bf6ed86d6355c5c8d82", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 41935497, "upload_time": "2018-09-05T12:12:52", "url": "https://files.pythonhosted.org/packages/e8/86/cf35f2e843e8054da3173e0d18faf4cbb1e23fb1bf9c30ade0121e522512/metadoc-0.10.5.tar.gz" } ], "0.2.21": [ { "comment_text": "", "digests": { "md5": "5c559c7d141573dba18fbbe90b631ea1", "sha256": "bb4598b8aac70b36babb9a95f5a9685d75e20bbe44f4b3dbe63dae41a1df9a94" }, "downloads": -1, "filename": "metadoc-0.2.21.tar.gz", "has_sig": false, "md5_digest": "5c559c7d141573dba18fbbe90b631ea1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9807, "upload_time": "2016-12-11T20:42:55", "url": "https://files.pythonhosted.org/packages/f0/77/fa44bc35d8b04dadb06d6d26cb2a96e592a4de3365ea5e3e51445d4a9b12/metadoc-0.2.21.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "5df2f60cd4fc36003656d5b5c1cb6990", "sha256": 
"8e28bc8fde90eb8165b88d1d1a8eb3ea49ef4d954e7ca4f58850a917d2e2d5ba" }, "downloads": -1, "filename": "metadoc-0.3.1.tar.gz", "has_sig": false, "md5_digest": "5df2f60cd4fc36003656d5b5c1cb6990", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13203, "upload_time": "2016-12-12T20:13:15", "url": "https://files.pythonhosted.org/packages/c0/44/4df720e04e2c3285e071d9d1bed44ee3d11c3105648b4c187dc97b8036c3/metadoc-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "d8ae71c36c8d1f36eb500d0da98307fe", "sha256": "b791e803acaa4ee4f6ed29f02ed11216a867486e539e8457ce3aae886d13e8d0" }, "downloads": -1, "filename": "metadoc-0.3.2.tar.gz", "has_sig": false, "md5_digest": "d8ae71c36c8d1f36eb500d0da98307fe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13214, "upload_time": "2016-12-13T19:36:27", "url": "https://files.pythonhosted.org/packages/46/14/183636b69028243df29d4d2752570b0d9a73c7f2599731f0c087d5e09dd1/metadoc-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "7b459d924580fa135a37e88e94fa0e78", "sha256": "db4df3b86579f4d502ba402e63f1f22c5713729ba84f25cd5dac0139aaff8928" }, "downloads": -1, "filename": "metadoc-0.3.3.tar.gz", "has_sig": false, "md5_digest": "7b459d924580fa135a37e88e94fa0e78", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 717605, "upload_time": "2016-12-14T12:01:06", "url": "https://files.pythonhosted.org/packages/54/5e/120cdddc7c75a907a563cd7519395488083fcd2e1ee0bd629302ab93a975/metadoc-0.3.3.tar.gz" } ], "0.3.4": [ { "comment_text": "", "digests": { "md5": "c92dff9b41d478ccdbdd481619d6b079", "sha256": "3f26d7bb8a29f68bdcac708d466196db9388ad3381b04d4f0daffd97147a7daa" }, "downloads": -1, "filename": "metadoc-0.3.4.tar.gz", "has_sig": false, "md5_digest": "c92dff9b41d478ccdbdd481619d6b079", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 131002, "upload_time": 
"2016-12-14T12:41:19", "url": "https://files.pythonhosted.org/packages/a0/8d/d1f3ea62f8384d4f64633b5cca1eb1b3302537dd99c624f0d899ec8ab591/metadoc-0.3.4.tar.gz" } ], "0.3.5": [ { "comment_text": "", "digests": { "md5": "02b884e72cd99206eb1c78a74b0b677b", "sha256": "c0241cb92cc2f2af29fecba5e3f0d295d4330a24f5a13aa3568354d2fb7b42dd" }, "downloads": -1, "filename": "metadoc-0.3.5.tar.gz", "has_sig": false, "md5_digest": "02b884e72cd99206eb1c78a74b0b677b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 131117, "upload_time": "2017-01-02T15:28:50", "url": "https://files.pythonhosted.org/packages/12/59/2e3639702a0574e43f4fb581af08277a472307ab0348599e8f9ec213748f/metadoc-0.3.5.tar.gz" } ], "0.3.6": [ { "comment_text": "", "digests": { "md5": "45278ec9291cca53a0a89eaf18b288a9", "sha256": "347dd3248c3bed8fa0ab6aa24695bb8bca5188931862f759a9c7b4d3bf60b106" }, "downloads": -1, "filename": "metadoc-0.3.6.tar.gz", "has_sig": false, "md5_digest": "45278ec9291cca53a0a89eaf18b288a9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 131121, "upload_time": "2017-01-02T15:51:46", "url": "https://files.pythonhosted.org/packages/a8/fc/3f6e89cf5435115d95474f75441f7a781df60712450cef13e890140bad7b/metadoc-0.3.6.tar.gz" } ], "0.9.0": [ { "comment_text": "", "digests": { "md5": "76b5a4038477a5b1cf05517191494f02", "sha256": "9bd216b81779af8ea8555f854a88a1f93e5036ea1e22fd60d19e75af84fc8758" }, "downloads": -1, "filename": "metadoc-0.9.0.tar.gz", "has_sig": false, "md5_digest": "76b5a4038477a5b1cf05517191494f02", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 136726, "upload_time": "2018-07-10T13:26:55", "url": "https://files.pythonhosted.org/packages/20/e1/133993fba73401857ae9397f39c36a9473b2b994f8c0f50113d7559f298e/metadoc-0.9.0.tar.gz" } ], "0.9.1": [ { "comment_text": "", "digests": { "md5": "b3d74cda818cad5572477f45a8161f68", "sha256": 
"df6f88116fb1366326687a77d0c8405ff43ed31b0b51ccceb61fd586ffc5f34f" }, "downloads": -1, "filename": "metadoc-0.9.1-py3-none-any.whl", "has_sig": false, "md5_digest": "b3d74cda818cad5572477f45a8161f68", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 42630548, "upload_time": "2018-07-11T13:47:20", "url": "https://files.pythonhosted.org/packages/f0/36/484ffc0e3be26d3a4a4ff1123e0658fad90291c75a59937c44124669931e/metadoc-0.9.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "820838bda617cb2c2edcb29008850ee9", "sha256": "2bee72cf774bb96f77b7f3985388c7bb49869d90a024a4927b805678d25ffb52" }, "downloads": -1, "filename": "metadoc-0.9.1.tar.gz", "has_sig": false, "md5_digest": "820838bda617cb2c2edcb29008850ee9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 41935572, "upload_time": "2018-07-11T13:47:43", "url": "https://files.pythonhosted.org/packages/db/93/4782321ff00022876ce2146335d3abe82820a17de77e926fdb5e0febba53/metadoc-0.9.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "19adfa37482db0719ca99196f3400158", "sha256": "74885374c83aa8a8624f5e3ef4613379757e8287b3a970004eb75e536f7b1b25" }, "downloads": -1, "filename": "metadoc-0.10.5-py3-none-any.whl", "has_sig": false, "md5_digest": "19adfa37482db0719ca99196f3400158", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 42631473, "upload_time": "2018-09-05T12:11:28", "url": "https://files.pythonhosted.org/packages/84/08/86fbb4a63b942000a4e29301e02d2830b90f47bfc9e7eabc71a6c5a62ab4/metadoc-0.10.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "113aa2c7ea266bf6ed86d6355c5c8d82", "sha256": "960acab7dc692295a23676f02f584656fb9799ed6dc8354db243ee956f5159a0" }, "downloads": -1, "filename": "metadoc-0.10.5.tar.gz", "has_sig": false, "md5_digest": "113aa2c7ea266bf6ed86d6355c5c8d82", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
41935497, "upload_time": "2018-09-05T12:12:52", "url": "https://files.pythonhosted.org/packages/e8/86/cf35f2e843e8054da3173e0d18faf4cbb1e23fb1bf9c30ade0121e522512/metadoc-0.10.5.tar.gz" } ] }