{ "info": { "author": "WeAreDevelopers", "author_email": "liad@wearedevelopers.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# extractpdf\nA Python package focused on extracting content out of PDF files.\n\nThere seem to be [many options out there](https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file), but no single solution that is easy to install, even on Windows, and focuses specifically on PDF files. So we created this extractpdf package.\n\nIt is based on the structure of [Textract](https://github.com/deanmalmgren/textract), but focuses on PDF only and also adds other tools to the pipeline, such as [PyPDF2](https://pythonhosted.org/PyPDF2/) and [Camelot](https://camelot-py.readthedocs.io/en/master/).\n\n\n# Usage:\nTo use this package, install it from PyPI using:\n```\npip install extractpdf\n```\n\nThen use it like so:\n```python\nimport extractpdf as epdf\n\n# local file\ncontent = epdf.process('my_file.pdf')\n# url:\ncontent = epdf.process('http://www.example.com/some_file.pdf')\n```\n\n# Advanced Usage:\nTo control more features, one can use the PDFExtractor itself:\n```python\nfrom extractpdf import PDFExtractor\nepdf = PDFExtractor()\ncontent = epdf.get_content('http://www.example.com/some_file.pdf', keep_download=True)\nf = epdf.filename # f = some_file.pdf\nepdf.delete_file()\n```\n\n# Development\nWe warmly welcome contributors!\n\nTo run this project locally, you first need to install the dependency packages.\nTo install them, you can use [pipenv](https://docs.pipenv.org/):\n\n#### Installation using pipenv (which combines virtualenv with pip)\n\nInstall pipenv\n\n```bash\n# if you haven't installed pip\nsudo easy_install pip\n\n# install pipenv\npip install pipenv\n```\n\nOn macOS, you can use Homebrew:\n```\nbrew install pipenv\n```\n\nSet the pipenv virtual environment to be local to the project:\nOn 
Windows:\n```bash\nset PIPENV_VENV_IN_PROJECT=true \n```\n\nOn Mac/Linux:\n```bash\nexport PIPENV_VENV_IN_PROJECT=true \n```\n\n... and then, install the packages and run the server\n```\n # install all packages\npipenv install\n```\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/WeAreDevelopers-com/extractpdf", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "extractpdf", "package_url": "https://pypi.org/project/extractpdf/", "platform": "", "project_url": "https://pypi.org/project/extractpdf/", "project_urls": { "Homepage": "https://github.com/WeAreDevelopers-com/extractpdf" }, "release_url": "https://pypi.org/project/extractpdf/0.0.4/", "requires_dist": null, "requires_python": "", "summary": "A tool to extract text from PDF files.", "version": "0.0.4" }, "last_serial": 4469744, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "ba91f97f54fac98c07b3b4fe74b1c507", "sha256": "53015a72a6c6aaacf4a471e1fae3e738814ae1c3d2faef0acf4bfa244dd12867" }, "downloads": -1, "filename": "extractpdf-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ba91f97f54fac98c07b3b4fe74b1c507", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19899, "upload_time": "2018-10-30T11:07:09", "url": "https://files.pythonhosted.org/packages/4e/d8/bb334e4c9f50a724b75d746b99c34d58fd49f779f5442a834ab6fe83a0fa/extractpdf-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5699f623f4b1fb35555ae527eefa91a0", "sha256": "9b42c82277f1445767ab9c31226ec51da4735ab1345aa884678b977553efc730" }, "downloads": -1, "filename": "extractpdf-0.0.2.tar.gz", "has_sig": false, "md5_digest": "5699f623f4b1fb35555ae527eefa91a0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7245, "upload_time": "2018-10-30T11:07:10", "url": 
"https://files.pythonhosted.org/packages/92/fd/a09026cc6e69c15252f26d44d81dd3aed63507028f0ade898dc57da8fb42/extractpdf-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "e86f4c6fb96d3700b19df0d14988fb92", "sha256": "1cdf131b0678c9e3406c91d85df0f6049fe49984de2e60351ad0b2415679fb50" }, "downloads": -1, "filename": "extractpdf-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "e86f4c6fb96d3700b19df0d14988fb92", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20128, "upload_time": "2018-11-06T14:42:26", "url": "https://files.pythonhosted.org/packages/dd/e7/9156445c53e36355eeecff91fb4668214546cb1313334b1714abc96c0dd9/extractpdf-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6fc89fb2621552501049274de24c1e5f", "sha256": "bd0cf13d259b477318b0a4c69892c9523e5386673a724f707a6180e89757d251" }, "downloads": -1, "filename": "extractpdf-0.0.3.tar.gz", "has_sig": false, "md5_digest": "6fc89fb2621552501049274de24c1e5f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7424, "upload_time": "2018-11-06T14:42:28", "url": "https://files.pythonhosted.org/packages/51/b8/61c491ae41967079880a66aa962b615abc49963aff59a19ec770c96e81b6/extractpdf-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "df83014e1ad537291bb680f0100c1c5f", "sha256": "08ce7a29bbd88a2c4bcfd175c690ac2ad2cd0a76d0884ef4e11bea80916eb1c8" }, "downloads": -1, "filename": "extractpdf-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "df83014e1ad537291bb680f0100c1c5f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 22610, "upload_time": "2018-11-09T14:45:31", "url": "https://files.pythonhosted.org/packages/8b/b6/e5d89dd613136096631bc0ed47513025e431c767d69a516900939a685436/extractpdf-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5f4c6d83d8b693a6ea38bf58e54e5be0", "sha256": 
"6a94e12dea1ce7b33e3016e0f4d00f2150b9850952cf107e0db441844b442c59" }, "downloads": -1, "filename": "extractpdf-0.0.4.tar.gz", "has_sig": false, "md5_digest": "5f4c6d83d8b693a6ea38bf58e54e5be0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7689, "upload_time": "2018-11-09T14:45:32", "url": "https://files.pythonhosted.org/packages/ce/2b/ac1cd6ddd8a6a6e9c606bf9b83cf06e95598b7b8d54a740eccc5ecf937ca/extractpdf-0.0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "df83014e1ad537291bb680f0100c1c5f", "sha256": "08ce7a29bbd88a2c4bcfd175c690ac2ad2cd0a76d0884ef4e11bea80916eb1c8" }, "downloads": -1, "filename": "extractpdf-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "df83014e1ad537291bb680f0100c1c5f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 22610, "upload_time": "2018-11-09T14:45:31", "url": "https://files.pythonhosted.org/packages/8b/b6/e5d89dd613136096631bc0ed47513025e431c767d69a516900939a685436/extractpdf-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5f4c6d83d8b693a6ea38bf58e54e5be0", "sha256": "6a94e12dea1ce7b33e3016e0f4d00f2150b9850952cf107e0db441844b442c59" }, "downloads": -1, "filename": "extractpdf-0.0.4.tar.gz", "has_sig": false, "md5_digest": "5f4c6d83d8b693a6ea38bf58e54e5be0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7689, "upload_time": "2018-11-09T14:45:32", "url": "https://files.pythonhosted.org/packages/ce/2b/ac1cd6ddd8a6a6e9c606bf9b83cf06e95598b7b8d54a740eccc5ecf937ca/extractpdf-0.0.4.tar.gz" } ] }