{ "info": { "author": "Chris Hager", "author_email": "chris@linuxuser.at", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: End Users/Desktop", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Scientific/Engineering", "Topic :: Software Development :: Build Tools", "Topic :: Software Development :: Libraries", "Topic :: Utilities" ], "description": "====\nPDFx\n====\n\n.. image:: https://badge.fury.io/py/pdfx.svg\n :target: https://pypi.python.org/pypi/pdfx\n\n.. image:: https://travis-ci.org/metachris/pdfx.svg?branch=master\n :target: https://travis-ci.org/metachris/pdfx\n\n.. image:: https://img.shields.io/badge/license-Apache-blue.svg\n :target: https://github.com/metachris/pdfx/blob/master/LICENSE\n\nIntroduction\n============\n\nExtract references (pdf, url, doi) and metadata from a PDF. 
Optionally download all referenced PDFs and check for broken links.\n\n**Features**\n\n* Extract references and metadata from a given PDF\n* Detect pdf, url, arxiv and doi references\n* **Fast, parallel download of all referenced PDFs**\n* **Check for broken links** (using the ``-c`` flag)\n* Output as text or JSON (using the ``-j`` flag)\n* Extract the PDF text (using the ``--text`` flag)\n* Use as command-line tool or Python package\n* Compatible with Python 2 and 3\n* Works with local and online pdfs\n\n\nGetting Started\n===============\n\nGrab a copy of the code with ``easy_install`` or ``pip``, and run it::\n\n $ sudo easy_install -U pdfx\n ...\n $ pdfx \n\nRun ``pdfx -h`` to see the help output::\n\n $ pdfx -h\n usage: pdfx [-h] [-d OUTPUT_DIRECTORY] [-c] [-j] [-v] [-t] [-o OUTPUT_FILE]\n [--version]\n pdf\n\n Extract metadata and references from a PDF, and optionally download all\n referenced PDFs. Visit https://www.metachris.com/pdfx for more information.\n\n positional arguments:\n pdf Filename or URL of a PDF file\n\n optional arguments:\n -h, --help show this help message and exit\n -d OUTPUT_DIRECTORY, --download-pdfs OUTPUT_DIRECTORY\n Download all referenced PDFs into specified directory\n -c, --check-links Check for broken links\n -j, --json Output infos as JSON (instead of plain text)\n -v, --verbose Print all references (instead of only PDFs)\n -t, --text Only extract text (no metadata or references)\n -o OUTPUT_FILE, --output-file OUTPUT_FILE\n Output to specified file instead of console\n --version show program's version number and exit\n\n\nExamples\n========\n\nLet's take a look at this paper: https://weakdh.org/imperfect-forward-secrecy.pdf::\n\n $ pdfx https://weakdh.org/imperfect-forward-secrecy.pdf\n Document infos:\n - CreationDate = D:20150821110623-04'00'\n - Creator = LaTeX with hyperref package\n - ModDate = D:20150821110805-04'00'\n - PTEX.Fullbanner = This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013/Debian) kpathsea 
version 6.1.1\n - Pages = 13\n - Producer = pdfTeX-1.40.14\n - Title = Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice\n - Trapped = False\n - dc = {'title': {'x-default': 'Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice'}, 'creator': [None], 'description': {'x-default': None}, 'format': 'application/pdf'}\n - pdf = {'Keywords': None, 'Producer': 'pdfTeX-1.40.14', 'Trapped': 'False'}\n - pdfx = {'PTEX.Fullbanner': 'This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013/Debian) kpathsea version 6.1.1'}\n - xap = {'CreateDate': '2015-08-21T11:06:23-04:00', 'ModifyDate': '2015-08-21T11:08:05-04:00', 'CreatorTool': 'LaTeX with hyperref package', 'MetadataDate': '2015-08-21T11:08:05-04:00'}\n - xapmm = {'InstanceID': 'uuid:4e570f88-cd0f-4488-85ad-03f4435a4048', 'DocumentID': 'uuid:98988d37-b43d-4c1a-965b-988dfb2944b6'}\n\n References: 36\n - URL: 18\n - PDF: 18\n\n PDF References:\n - http://www.spiegel.de/media/media-35533.pdf\n - http://www.spiegel.de/media/media-35513.pdf\n - http://www.spiegel.de/media/media-35509.pdf\n - http://www.spiegel.de/media/media-35529.pdf\n - http://www.spiegel.de/media/media-35527.pdf\n - http://cr.yp.to/factorization/smoothparts-20040510.pdf\n - http://www.spiegel.de/media/media-35517.pdf\n - http://www.spiegel.de/media/media-35526.pdf\n - http://www.spiegel.de/media/media-35519.pdf\n - http://www.spiegel.de/media/media-35522.pdf\n - http://cryptome.org/2013/08/spy-budget-fy13.pdf\n - http://www.spiegel.de/media/media-35515.pdf\n - http://www.spiegel.de/media/media-35514.pdf\n - http://www.hyperelliptic.org/tanja/SHARCS/talks06/thorsten.pdf\n - http://www.spiegel.de/media/media-35528.pdf\n - http://www.spiegel.de/media/media-35671.pdf\n - http://www.spiegel.de/media/media-35520.pdf\n - http://www.spiegel.de/media/media-35551.pdf\n\nYou can use the ``-v`` flag to output all references instead of just the PDFs.\n\n**Download all referenced pdfs** with ``-d`` (for ``download-pdfs``) to the specified 
directory (e.g. to ``/tmp/``)::\n\n $ pdfx https://weakdh.org/imperfect-forward-secrecy.pdf -d /tmp/\n ...\n\nTo **extract text**, you can use the ``-t`` flag::\n\n # Extract text to console\n $ pdfx https://weakdh.org/imperfect-forward-secrecy.pdf -t\n\n # Extract text to file\n $ pdfx https://weakdh.org/imperfect-forward-secrecy.pdf -t -o pdf-text.txt\n\nTo **check for broken links** use the ``-c`` flag::\n\n $ pdfx https://weakdh.org/imperfect-forward-secrecy.pdf -c\n\nExample video of checking for broken links: http://recordit.co/PsigiMaooH\n\n\nUsage as Python library\n=======================\n\n::\n\n >>> import pdfx\n >>> pdf = pdfx.PDFx(\"filename-or-url.pdf\")\n >>> metadata = pdf.get_metadata()\n >>> references_list = pdf.get_references()\n >>> references_dict = pdf.get_references_as_dict()\n >>> pdf.download_pdfs(\"target-directory\")\n\n\nVarious\n=======\n\n* Author: Chris Hager \n* Homepage: https://www.metachris.com/pdfx\n* License: Apache\n\nFeedback, ideas and pull requests are welcome!", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://www.metachris.com/pdfx", "keywords": "pdf extract download urls", "license": "Apache", "maintainer": "", "maintainer_email": "", "name": "pdfx", "package_url": "https://pypi.org/project/pdfx/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/pdfx/", "project_urls": { "Homepage": "http://www.metachris.com/pdfx" }, "release_url": "https://pypi.org/project/pdfx/1.3.0/", "requires_dist": [ "pdfminer2", "chardet", "check-manifest; extra == 'dev'", "tox; extra == 'test'" ], "requires_python": "", "summary": "Extract metadata and URLs from PDF files, and download all referenced PDFs", "version": "1.3.0" }, "last_serial": 2015308, "releases": { "1.0.0": [], "1.0.1": [ { "comment_text": "", "digests": { "md5": "3c864e0b16cca6a0a1d9a6e663fb4852", "sha256": 
"00e78ea4e358edf85260ca27d1cd0ee4e221d68d6db93bc9f514e63fd386c27f" }, "downloads": -1, "filename": "pdfx-1.0.1-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "3c864e0b16cca6a0a1d9a6e663fb4852", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 70499, "upload_time": "2015-10-25T21:34:03", "url": "https://files.pythonhosted.org/packages/f9/98/876e2b81c161c512d712e1b2fccc95f94ade40fa9395699c58a5f4fa5fba/pdfx-1.0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1cff4051027dfa8aa1b98798cb5495f0", "sha256": "a48c003bf3d0f7246673afbcf11e5e0d20b73c3e1f00403f171b4b11a92bdd86" }, "downloads": -1, "filename": "pdfx-1.0.1.tar.gz", "has_sig": true, "md5_digest": "1cff4051027dfa8aa1b98798cb5495f0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 62944, "upload_time": "2015-10-25T21:34:17", "url": "https://files.pythonhosted.org/packages/97/68/d6e49f017abfc1aceb6d7138e4bb0639a96db539d4ef3667f02531b1878f/pdfx-1.0.1.tar.gz" } ], "1.0.3": [ { "comment_text": "", "digests": { "md5": "21266dcf4e375f8c19555492e22b70ca", "sha256": "dbecfe0dde848102c553d90a61eb8dd7c67c31b6a30eb03522d8e0f8414b6594" }, "downloads": -1, "filename": "pdfx-1.0.3-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "21266dcf4e375f8c19555492e22b70ca", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 71189, "upload_time": "2015-10-29T08:00:05", "url": "https://files.pythonhosted.org/packages/a0/16/3ec7632e9b55836bfada853def146611187d7b352ca1f344c73b6df56b13/pdfx-1.0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "54d275dd1ca3284ee7a7634eaeee0d6b", "sha256": "dde28c55b398e9b0e2631d244191696cdc1808e8d754c50225c4280073639bc7" }, "downloads": -1, "filename": "pdfx-1.0.3.tar.gz", "has_sig": true, "md5_digest": "54d275dd1ca3284ee7a7634eaeee0d6b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 63394, 
"upload_time": "2015-10-29T08:00:30", "url": "https://files.pythonhosted.org/packages/a1/ef/c08915f0cde0571d5f3c32c290abc470c16dbee6784a582735cefb6e30e0/pdfx-1.0.3.tar.gz" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "e679620a389e521be0d8f00ba61e9e86", "sha256": "832ff8364e84789efd0dff624b41c2595a13ec112cfbbfac86309cc1bf2d0895" }, "downloads": -1, "filename": "pdfx-1.2.0-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "e679620a389e521be0d8f00ba61e9e86", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 76950, "upload_time": "2015-12-05T23:55:43", "url": "https://files.pythonhosted.org/packages/23/f9/c950730c86db5101c1435304cafd9dbd41e365b1c4f5d8de235d5fe15af4/pdfx-1.2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b7d798454a325616e46841d8fee1fb16", "sha256": "f817619746a0b1e98452ed0a4aa10f752a84bfc0fd76c7c8cc0f062e22554755" }, "downloads": -1, "filename": "pdfx-1.2.0.tar.gz", "has_sig": true, "md5_digest": "b7d798454a325616e46841d8fee1fb16", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12924, "upload_time": "2015-12-05T23:55:51", "url": "https://files.pythonhosted.org/packages/98/eb/f44d71b094db4e929c9f65cccc47ed829c851b356ca765048bce6c9b484b/pdfx-1.2.0.tar.gz" } ], "1.2.1": [ { "comment_text": "", "digests": { "md5": "e298ab31ff378bbb791e82f9c289c77c", "sha256": "85417e0c25f44d42dd1a0edb39871bc92e0114d4ca2333bc1b0a65a4f6c229a9" }, "downloads": -1, "filename": "pdfx-1.2.1-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "e298ab31ff378bbb791e82f9c289c77c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 76955, "upload_time": "2015-12-06T00:11:25", "url": "https://files.pythonhosted.org/packages/75/ae/7f73abfa22b24f86863ca1370ea0a6b8b58a243ded2de5e2f4625b6b0aa7/pdfx-1.2.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "eec59cd029df7c137b2f9c0bc685c3ae", "sha256": 
"e1ee0c4042a64a48eb851ed61bf404467a9a3b201e517e22d2b147b540c39c4c" }, "downloads": -1, "filename": "pdfx-1.2.1.tar.gz", "has_sig": true, "md5_digest": "eec59cd029df7c137b2f9c0bc685c3ae", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12926, "upload_time": "2015-12-06T00:11:36", "url": "https://files.pythonhosted.org/packages/26/1b/e8acd1b3f2636949e960f9200f15a553e28545141241b722a3e4d61610a9/pdfx-1.2.1.tar.gz" } ], "1.2.4": [ { "comment_text": "", "digests": { "md5": "a3cfc2f4849536c44da632dfcb84d591", "sha256": "6d650ac6c2e1b9dfd0e7dfa91e7d5a9b9fe7d8613cb12773216bf74e2b548624" }, "downloads": -1, "filename": "pdfx-1.2.4-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "a3cfc2f4849536c44da632dfcb84d591", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 77080, "upload_time": "2015-12-17T15:02:32", "url": "https://files.pythonhosted.org/packages/47/c3/cb0dcaf1cc18887257446fff0077a3701ef97fbc7cb23ab92eab1c8c3913/pdfx-1.2.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9ab630c120a19a67ba8fe14649c28a87", "sha256": "f82a8d03030cee0f1ddc1a472269f04747c728aa5789e390802c453707a21b2e" }, "downloads": -1, "filename": "pdfx-1.2.4.tar.gz", "has_sig": true, "md5_digest": "9ab630c120a19a67ba8fe14649c28a87", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12993, "upload_time": "2015-12-17T15:02:45", "url": "https://files.pythonhosted.org/packages/03/9d/9a4cd25c8318b62b83be51ea647ded5917bd1b24442e67b381367de77dd4/pdfx-1.2.4.tar.gz" } ], "1.2.6": [ { "comment_text": "", "digests": { "md5": "65d391a0a187d13dc91f95c82dde46a6", "sha256": "9bb81e7cedd42ebe046e63c12f212ae08189dee9311a224846775d91af1045ef" }, "downloads": -1, "filename": "pdfx-1.2.6-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "65d391a0a187d13dc91f95c82dde46a6", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 77295, 
"upload_time": "2016-03-16T10:11:01", "url": "https://files.pythonhosted.org/packages/2b/4e/2eb36ccb0f858613e3778f469c74019945d5640276437c1b381decf744b8/pdfx-1.2.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2ad35f9428d0f15eda31a37d461100d4", "sha256": "6a1bf373197286163a9f4721fc7a37c64f50698360f071d17e86f64f6cc97c38" }, "downloads": -1, "filename": "pdfx-1.2.6.tar.gz", "has_sig": true, "md5_digest": "2ad35f9428d0f15eda31a37d461100d4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13230, "upload_time": "2016-03-16T10:13:04", "url": "https://files.pythonhosted.org/packages/c7/1c/3da166dd48d94894a4b30d817d3181a26df8faef6269b4d1b13bd5c57e94/pdfx-1.2.6.tar.gz" } ], "1.2.7": [ { "comment_text": "", "digests": { "md5": "adf2edbae634aefe47c489bf9df617cd", "sha256": "134d7c0f7a6c6f60b6fb10f2338c333b70bb82c2d1ef3352bb0665ef3546bcaa" }, "downloads": -1, "filename": "pdfx-1.2.7-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "adf2edbae634aefe47c489bf9df617cd", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 80646, "upload_time": "2016-03-19T00:02:25", "url": "https://files.pythonhosted.org/packages/49/87/1f96c0758d0c89af2003e7806a7d67d933e743204a7e56da6948848205bd/pdfx-1.2.7-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b980517622b8a0d9a2cd0c6338a03c98", "sha256": "700010754544cb3ab079c17b2fed9db52e3597b0812249a3d79f5980a6844449" }, "downloads": -1, "filename": "pdfx-1.2.7.tar.gz", "has_sig": true, "md5_digest": "b980517622b8a0d9a2cd0c6338a03c98", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14966, "upload_time": "2016-03-19T00:02:49", "url": "https://files.pythonhosted.org/packages/f5/b5/d34d1cb9da59122851d7ea2342789a94c4f245fb19e35ec7ee61590dc4da/pdfx-1.2.7.tar.gz" } ], "1.3.0": [ { "comment_text": "", "digests": { "md5": "f196b360234a7f33053c03ae1c1bdaf0", "sha256": 
"26d35f1f05a5a272a7fc3de5ec524acb5ac36931ed1df459530dab7166f3d1ac" }, "downloads": -1, "filename": "pdfx-1.3.0-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "f196b360234a7f33053c03ae1c1bdaf0", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 80715, "upload_time": "2016-03-19T00:06:12", "url": "https://files.pythonhosted.org/packages/2f/82/926a2eef8023114c0f55f42cd5335c33676a8d8cfcd76571fd1cd1fc7189/pdfx-1.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b0e4629d553679157327d761c20c515a", "sha256": "e3b296491879e4cf074fc42b50e9f86f6f8e1ab2628969520837ad348668d8b3" }, "downloads": -1, "filename": "pdfx-1.3.0.tar.gz", "has_sig": true, "md5_digest": "b0e4629d553679157327d761c20c515a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15011, "upload_time": "2016-03-19T00:06:18", "url": "https://files.pythonhosted.org/packages/a5/17/607291a65fae00859ea87e23687fc2f190bc67817ef2ec14ff39e6bd1e05/pdfx-1.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f196b360234a7f33053c03ae1c1bdaf0", "sha256": "26d35f1f05a5a272a7fc3de5ec524acb5ac36931ed1df459530dab7166f3d1ac" }, "downloads": -1, "filename": "pdfx-1.3.0-py2.py3-none-any.whl", "has_sig": true, "md5_digest": "f196b360234a7f33053c03ae1c1bdaf0", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 80715, "upload_time": "2016-03-19T00:06:12", "url": "https://files.pythonhosted.org/packages/2f/82/926a2eef8023114c0f55f42cd5335c33676a8d8cfcd76571fd1cd1fc7189/pdfx-1.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b0e4629d553679157327d761c20c515a", "sha256": "e3b296491879e4cf074fc42b50e9f86f6f8e1ab2628969520837ad348668d8b3" }, "downloads": -1, "filename": "pdfx-1.3.0.tar.gz", "has_sig": true, "md5_digest": "b0e4629d553679157327d761c20c515a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15011, 
"upload_time": "2016-03-19T00:06:18", "url": "https://files.pythonhosted.org/packages/a5/17/607291a65fae00859ea87e23687fc2f190bc67817ef2ec14ff39e6bd1e05/pdfx-1.3.0.tar.gz" } ] }