{ "info": { "author": "Kevin Jacobs", "author_email": "kevin91nl@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# CiteXtract\n\n[![Read the Docs](https://img.shields.io/readthedocs/citextract.svg)](https://citextract.readthedocs.io/en/latest/)\n[![CircleCI](https://img.shields.io/circleci/build/github/kmjjacobs/citextract/master.svg)](https://circleci.com/gh/kmjjacobs/citextract/tree/master)\n[![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/kmjjacobs/citextract.svg)](https://cloud.docker.com/repository/docker/kmjjacobs/citextract)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/citextract.svg)](https://img.shields.io/pypi/pyversions/citextract.svg)\n\n[CiteXtract](https://www.citextract.com/) - Bringing structure to the papers on ArXiv.\n\n## Getting started\n\nIn order to install CiteXtract, run the following command:\n\n```bash\npip install citextract\n```\n\n### Extracting references\n\nThen, one can extract references from a text using the RefXtract model:\n\n```python\nfrom citextract.models.refxtract import RefXtractor\n\nrefxtractor = RefXtractor().load()\ntext = \"\"\"This is a test sentence.\\n[1] Jacobs, K. 2019. This is a test title. In Proceedings of Some Journal.\"\"\"\nrefs = refxtractor(text)\nprint(refs)\n```\n\nIt gives the following output:\n\n```python\n['[1] Jacobs, K. 2019. This is a test title. In Proceedings of Some Journal.']\n```\n\nUnder the hood, a trained neural network extracts reference boundaries and extracts the references by using these boundaries.\n\n### Extracting titles\n\nUsing the found references, titles can be extracted by using the TitleXtract model:\n\n```python\nfrom citextract.models.titlextract import TitleXtractor\n\ntitlextractor = TitleXtractor().load()\nref = \"\"\"[1] Jacobs, K. 2019. This is a test title. In Proceedings of Some Journal.\"\"\"\ntitle = titlextractor(ref)\nprint(title)\n```\n\nIt gives the following output:\n\n```python\n'This is a test title.'\n```\n\nHere, a trained neural network extracts the titles from the given reference.\n\n### Converting an arXiv PDF to text\n\nThere is a utility available which takes an arXiv URL and converts it to text:\n\n```python\nfrom citextract.utils.pdf import convert_pdf_url_to_text\n\npdf_url = 'https://arxiv.org/pdf/some_file.pdf'\ntext = convert_pdf_url_to_text(pdf_url)\nprint(text)\n```\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://www.citextract.com/", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "citextract", "package_url": "https://pypi.org/project/citextract/", "platform": "", "project_url": "https://pypi.org/project/citextract/", "project_urls": { "Homepage": "https://www.citextract.com/" }, "release_url": "https://pypi.org/project/citextract/0.0.4/", "requires_dist": null, "requires_python": "", "summary": "CiteXtract - Bringing structure to the papers on ArXiv.", "version": "0.0.4" }, "last_serial": 5535118, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "d3ff63b4decec36169ce74b4920dc0d3", "sha256": "3c632f9a7e54c007ea94a3e270058d3b5c03db4e35990cacede9a6a302aeb3bd" }, "downloads": -1, "filename": "citextract-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "d3ff63b4decec36169ce74b4920dc0d3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2238, "upload_time": "2019-07-14T13:19:06", "url": "https://files.pythonhosted.org/packages/d2/1a/239425b391dc8c479ea12d1900f33ddba349b0107406c9e72082e1afdac4/citextract-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6e6f55ce38e8127263abdc1219f91413", "sha256": "e9eb781273a025746a72984b3f19e192524fce99b4f5187d9b8b6dbe176aa29e" }, "downloads": -1, "filename": "citextract-0.0.1.tar.gz", "has_sig": false, "md5_digest": "6e6f55ce38e8127263abdc1219f91413", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1024, "upload_time": "2019-07-14T13:19:08", "url": "https://files.pythonhosted.org/packages/a1/95/a2424783e371e4b47482d0999b3850be1da4945b0e1f47ac3617b31b1129/citextract-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "ca88346c0f605bf8db4f70e132360402", "sha256": "465b9d6efe84cecba8f70e38528c4c4012c924dca27d6820ea1f0de1c8f9a4e3" }, "downloads": -1, "filename": "citextract-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ca88346c0f605bf8db4f70e132360402", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11848, "upload_time": "2019-07-15T09:30:38", "url": "https://files.pythonhosted.org/packages/f0/bd/d8f7f95ce582a8c9ed372d7670032838b8944c595207356339f1b984ec67/citextract-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3cd26f82f89eff7fdea61cf2a000488b", "sha256": "c883dab3ca368179de23720f71e373bbb33510c99e24e72d3edf3d2b480dd6ac" }, "downloads": -1, "filename": "citextract-0.0.2.tar.gz", "has_sig": false, "md5_digest": "3cd26f82f89eff7fdea61cf2a000488b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8015, "upload_time": "2019-07-15T09:30:39", "url": "https://files.pythonhosted.org/packages/62/0d/b792ed13b625edea4f4a908ae7dc605aed946686431b1eb4345d8df60026/citextract-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "296a106fddbb6c8b3c2fbfe259bf9341", "sha256": "06f346c06135af11e72b531597727359114c0054c04a3817ff25c669f6e009fa" }, "downloads": -1, "filename": "citextract-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "296a106fddbb6c8b3c2fbfe259bf9341", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11926, "upload_time": "2019-07-15T10:21:05", "url": "https://files.pythonhosted.org/packages/68/1a/4da368bc416c7444f410c39de805a5e4eb1ade4c2b546fd613010cc61e2b/citextract-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "984ac4790e877ff0969e4c5079e42815", "sha256": "03b24ca61b39dbb88fb2f23f1e1201ee92b3f7815e01ce37392506a3902f375a" }, "downloads": -1, "filename": "citextract-0.0.3.tar.gz", "has_sig": false, "md5_digest": "984ac4790e877ff0969e4c5079e42815", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8095, "upload_time": "2019-07-15T10:21:07", "url": "https://files.pythonhosted.org/packages/d3/f2/b38f11078f4b37af95fcb7cb98c46c12f1b4581f300c031175d75548da03/citextract-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "e0bb7c218cde941fec9fd3e452081481", "sha256": "7713eab1198908b888421265f1965c2bb95511ea3898220033c7d7d5055e3b24" }, "downloads": -1, "filename": "citextract-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "e0bb7c218cde941fec9fd3e452081481", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11954, "upload_time": "2019-07-15T14:08:33", "url": "https://files.pythonhosted.org/packages/c2/86/d983b816b5cef8622becc773fde91ba841beb46a1ff15f30e5c6057003bb/citextract-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2e872991e689e0e8a614fc0557df70c2", "sha256": "4959e72b34a88ad9744edbc5e266daef49dc324ff75da5703d834ca3002da0c6" }, "downloads": -1, "filename": "citextract-0.0.4.tar.gz", "has_sig": false, "md5_digest": "2e872991e689e0e8a614fc0557df70c2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8088, "upload_time": "2019-07-15T14:08:35", "url": "https://files.pythonhosted.org/packages/2a/3c/268cfb67cb6a86ed880ab0687b5ba9c84d3d9dfda9d06e5a26c42826b43c/citextract-0.0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e0bb7c218cde941fec9fd3e452081481", "sha256": "7713eab1198908b888421265f1965c2bb95511ea3898220033c7d7d5055e3b24" }, "downloads": -1, "filename": "citextract-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "e0bb7c218cde941fec9fd3e452081481", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11954, "upload_time": "2019-07-15T14:08:33", "url": "https://files.pythonhosted.org/packages/c2/86/d983b816b5cef8622becc773fde91ba841beb46a1ff15f30e5c6057003bb/citextract-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2e872991e689e0e8a614fc0557df70c2", "sha256": "4959e72b34a88ad9744edbc5e266daef49dc324ff75da5703d834ca3002da0c6" }, "downloads": -1, "filename": "citextract-0.0.4.tar.gz", "has_sig": false, "md5_digest": "2e872991e689e0e8a614fc0557df70c2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8088, "upload_time": "2019-07-15T14:08:35", "url": "https://files.pythonhosted.org/packages/2a/3c/268cfb67cb6a86ed880ab0687b5ba9c84d3d9dfda9d06e5a26c42826b43c/citextract-0.0.4.tar.gz" } ] }