{ "info": { "author": "Robert Stojnic", "author_email": "robert@atlasml.io", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Natural Language :: English", "Operating System :: POSIX", "Programming Language :: Python :: 3" ], "description": "# Automatic SOTA (state-of-the-art) extraction\n\nAggregate public SOTA tables that are shared under **free licences**. \n\nDownload the scraped data or run the scrapers yourself to get the latest data. \n\nIn the future, we are planning to automate the process of extracting tasks, datasets and results from papers. \n\n## Getting the data\n\nThe data is kept in the [data](data/) directory. All data is shared under the [CC-BY-SA-4](https://creativecommons.org/licenses/by-sa/4.0/) licence. \n\nThe data has been parsed into a consistent JSON format, described below. \n\n## JSON format description\n\nThe format consists of five primary data types: `Task`, `Dataset`, `Sota`, `SotaRow` and `Link`. \n\nA valid JSON file is a list of `Task` objects. You can see examples in the [data/tasks](https://github.com/atlasml/sota-extractor/tree/master/data/tasks) folder.\n\n#### `Task`\n\nA `Task` consists of the following fields:\n- `task` - name of the task (string)\n- `description` - short description of the task, in markdown (string)\n- `subtasks` - a list of zero or more `Task` objects that are children of this task (list)\n- `datasets` - a list of zero or more `Dataset` objects on which the task is evaluated (list)\n- `source_link` - an optional `Link` object to the original source of the task\n\n#### `Dataset`\n\nA `Dataset` consists of the following fields:\n- `dataset` - name of the dataset (string)\n- `description` - a short description in markdown (string)\n- `subdatasets` - zero or more child `Dataset` objects (e.g. 
dataset subsets or dataset partitions) (list)\n- `dataset_links` - zero or more `Link` objects, representing the links to the dataset download page or any other relevant external pages (list)\n- `dataset_citations` - zero or more `Link` objects, representing the papers that are the primary citations for the dataset (list)\n- `sota` - the `Sota` object representing the state-of-the-art table on this dataset\n\n#### `Link`\n\nA `Link` object describes a URL, and has these two fields:\n- `title` - title of the link, i.e. anchor text (string)\n- `url` - target URL (string)\n\n#### `Sota`\n\nA `Sota` object represents one state-of-the-art table, with these fields:\n- `metrics` - a list of metric names used to evaluate the methods (list of strings)\n- `rows` - the rows of the SOTA table, as `SotaRow` objects (list)\n\n#### `SotaRow`\n\nA `SotaRow` object represents one line of the SOTA table and has these fields:\n- `model_name` - name of the model evaluated (string)\n- `paper_title` - primary paper's title (string)\n- `paper_url` - primary paper's URL (string)\n- `paper_date` - paper's date of publication, if available (string)\n- `code_links` - a list of zero or more `Link` objects, with links to relevant code implementations (list)\n- `model_links` - a list of zero or more `Link` objects, with links to relevant pretrained model files (list)\n- `metrics` - a dictionary of values, where the keys are strings from the parent `Sota.metrics` list, and the values are the measured performance (dictionary)\n\n## Running the scrapers\n\n### Installation\n\nRequires Python 3.6+.\n\n```bash\npip install -r requirements.txt\n```\n\n### NLP-progress\n\n[NLP-progress](https://github.com/sebastianruder/NLP-progress) is a hand-annotated collection of SOTA results from NLP tasks. 
\n\nThe scraper [is part of the NLP-progress project](https://github.com/sebastianruder/NLP-progress/pull/186).\n\nLicence: MIT\n\n### EFF \n\nEFF has annotated a set of SOTA results on a small number of tasks, and produced this [great report](https://www.eff.org/ai/metrics).\n\nTo convert the current content run:\n\n```bash\npython -m scrapers.eff\n```\n\nLicence: CC-BY-SA-4\n\n### SQuAD\n\nThe [Stanford Question Answering Dataset](https://rajpurkar.github.io/SQuAD-explorer/) is an active project for evaluating the question answering task using a hidden test set. \n\nTo scrape the current content run:\n\n```bash\npython -m scrapers.squad\n```\n\nLicence: CC-BY-SA-4\n\n### RedditSota\n\nThe [RedditSota repository](https://github.com/RedditSota/state-of-the-art-result-for-machine-learning-problems) lists the best method for a variety of tasks across all of ML. \n\nTo scrape the current content run:\n\n```bash\npython -m scrapers.redditsota\n```\n\nLicence: Apache-2\n\n### SNLI\n\nThe [Stanford Natural Language Inference (SNLI) Corpus](https://nlp.stanford.edu/projects/snli/) is an active project for Natural Language Inference. \n\nTo scrape the current content run:\n\n```bash\npython -m scrapers.snli\n```\n\nLicence: CC-BY-SA\n\n### Cityscapes\n\n[Cityscapes](https://www.cityscapes-dataset.com/benchmarks/#pixel-level-results) is a benchmark for semantic segmentation.\n\nTo scrape the current content run:\n\n```bash\npython -m scrapers.cityscapes\n```\n\n## Evaluating the SOTA extraction performance\n\nIn the future, this repository will also contain the automatic SOTA extraction pipeline. The aim is to automatically extract tasks, datasets and results from papers. 
\n\nTo evaluate the current prediction performance for all tasks:\n\n```bash\npython -m extractor.eval_all\n```\n\nThe most current report can be seen here: [eval_all_report.csv](data/eval_all_report.csv).\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/paperswithcode/sota-extractor", "keywords": "sota table", "license": "Apache-2.0", "maintainer": "Viktor Kerkez", "maintainer_email": "viktor@atlasml.io", "name": "sota-extractor", "package_url": "https://pypi.org/project/sota-extractor/", "platform": "Windows", "project_url": "https://pypi.org/project/sota-extractor/", "project_urls": { "Homepage": "https://github.com/paperswithcode/sota-extractor" }, "release_url": "https://pypi.org/project/sota-extractor/0.0.9/", "requires_dist": [ "beautifulsoup4", "click", "dataclasses", "lxml", "marshmallow (==3.0.0rc4)", "nltk", "pandas", "requests", "soupsieve" ], "requires_python": "", "summary": "Automatic SOTA (state-of-the-art) extraction.", "version": "0.0.9" }, "last_serial": 5827302, "releases": { "0.0.7": [ { "comment_text": "", "digests": { "md5": "769270e8a991d8c1b29d8fefbd9271e0", "sha256": "52106e370c9b92d5f516344241cb18a1a72479df4b66503fb4cdd3debb568972" }, "downloads": -1, "filename": "sota_extractor-0.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "769270e8a991d8c1b29d8fefbd9271e0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 24331, "upload_time": "2019-08-26T16:15:07", "url": "https://files.pythonhosted.org/packages/d5/6f/2b3dbd50169111dd1dca6f645528e0cf7c862fcb12a7df049c268294edcd/sota_extractor-0.0.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0b1092e973a7144801c6854acf8574c7", "sha256": "e4703ca7db37a66db9764c51aec47ca802edec5b40cc45da854bbc7cd0769d2a" }, "downloads": -1, "filename": "sota-extractor-0.0.7.tar.gz", "has_sig": false, 
"md5_digest": "0b1092e973a7144801c6854acf8574c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16525, "upload_time": "2019-08-26T16:15:09", "url": "https://files.pythonhosted.org/packages/fa/68/98a1715db99570905f51f634f9438e7e42b07800b85f0caa318895d9aa75/sota-extractor-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "e753f5a81bd2a5d84e98f5607a5bec83", "sha256": "dbd8bbb51a8e331336fa50aa6af5e5dc4180f508641e5ee27b7c1c724072cfd4" }, "downloads": -1, "filename": "sota_extractor-0.0.8-py3-none-any.whl", "has_sig": false, "md5_digest": "e753f5a81bd2a5d84e98f5607a5bec83", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 24276, "upload_time": "2019-08-26T16:19:48", "url": "https://files.pythonhosted.org/packages/ec/57/4d62aea786270671e625f63932392ed0433fd8342baf3467790e65483fe9/sota_extractor-0.0.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "60ce897864a99d114fa06c6c83d4b257", "sha256": "45283c64ad2a7709b1c024b8704d029f7796dd793bc573270e4da246dbe8be88" }, "downloads": -1, "filename": "sota-extractor-0.0.8.tar.gz", "has_sig": false, "md5_digest": "60ce897864a99d114fa06c6c83d4b257", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16475, "upload_time": "2019-08-26T16:19:50", "url": "https://files.pythonhosted.org/packages/8c/a6/ffca65073d93e26acad7706b8de6c0c1797e2a200f9a4cd7360b69981905/sota-extractor-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "2e599dbfe88aba7840c6996aa950eed2", "sha256": "4656032942c04601c86193308f5255be6441c0604cb79be8ba9b149aae16d549" }, "downloads": -1, "filename": "sota_extractor-0.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "2e599dbfe88aba7840c6996aa950eed2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 25930, "upload_time": "2019-09-13T18:55:01", "url": 
"https://files.pythonhosted.org/packages/1c/fc/59b50c482c3f346a57874ac93d27e909bf3137bd382c81b3ce023b5e01d0/sota_extractor-0.0.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9a52233467b456228a2489f0b8467e1a", "sha256": "4b1cc6f3c5fa7cbed1fbf1aed5d65fa8a31cef16f68eff22849e05f28cb72b49" }, "downloads": -1, "filename": "sota-extractor-0.0.9.tar.gz", "has_sig": false, "md5_digest": "9a52233467b456228a2489f0b8467e1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17343, "upload_time": "2019-09-13T18:55:02", "url": "https://files.pythonhosted.org/packages/eb/13/acca49d0e587cece2416d757daedc78b1b623131e99615ca59de9b55ab32/sota-extractor-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2e599dbfe88aba7840c6996aa950eed2", "sha256": "4656032942c04601c86193308f5255be6441c0604cb79be8ba9b149aae16d549" }, "downloads": -1, "filename": "sota_extractor-0.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "2e599dbfe88aba7840c6996aa950eed2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 25930, "upload_time": "2019-09-13T18:55:01", "url": "https://files.pythonhosted.org/packages/1c/fc/59b50c482c3f346a57874ac93d27e909bf3137bd382c81b3ce023b5e01d0/sota_extractor-0.0.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9a52233467b456228a2489f0b8467e1a", "sha256": "4b1cc6f3c5fa7cbed1fbf1aed5d65fa8a31cef16f68eff22849e05f28cb72b49" }, "downloads": -1, "filename": "sota-extractor-0.0.9.tar.gz", "has_sig": false, "md5_digest": "9a52233467b456228a2489f0b8467e1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17343, "upload_time": "2019-09-13T18:55:02", "url": "https://files.pythonhosted.org/packages/eb/13/acca49d0e587cece2416d757daedc78b1b623131e99615ca59de9b55ab32/sota-extractor-0.0.9.tar.gz" } ] }