{ "info": { "author": "Vinayak Mehta", "author_email": "vmehta94@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "

\n \n

\n\n# Camelot: PDF Table Extraction for Humans\n\n[![Build Status](https://travis-ci.org/camelot-dev/camelot.svg?branch=master)](https://travis-ci.org/camelot-dev/camelot) [![Documentation Status](https://readthedocs.org/projects/python-camelot/badge/?version=master)](https://python-camelot.readthedocs.io/en/master/)\n [![codecov.io](https://codecov.io/github/camelot-dev/camelot/badge.svg?branch=master&service=github)](https://codecov.io/github/camelot-dev/camelot?branch=master)\n [![image](https://img.shields.io/pypi/v/python-camelot.svg)](https://pypi.org/project/python-camelot/) [![image](https://img.shields.io/pypi/l/python-camelot.svg)](https://pypi.org/project/python-camelot/) [![image](https://img.shields.io/pypi/pyversions/python-camelot.svg)](https://pypi.org/project/python-camelot/) [![Gitter chat](https://badges.gitter.im/camelot-dev/Lobby.png)](https://gitter.im/camelot-dev/Lobby)\n\n**Camelot** is a Python library that makes it easy for *anyone* to extract tables from PDF files!\n\n**Note:** You can also check out [Excalibur](https://github.com/camelot-dev/excalibur), which is a web interface for Camelot!\n\n---\n\n**Here's how you can extract tables from PDF files.** Check out the PDF used in this example [here](https://github.com/camelot-dev/camelot/blob/master/docs/_static/pdf/foo.pdf).\n\n
\n>>> import camelot\n>>> tables = camelot.read_pdf('foo.pdf')\n>>> tables\n<TableList n=1>\n>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html, sqlite\n>>> tables[0]\n<Table shape=(7, 7)>\n>>> tables[0].parsing_report\n{\n    'accuracy': 99.02,\n    'whitespace': 12.24,\n    'order': 1,\n    'page': 1\n}\n>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html, to_sqlite\n>>> tables[0].df # get a pandas DataFrame!\n
\n\n| Cycle Name | KI (1/km) | Distance (mi) | Percent Fuel Savings | | | |\n|------------|-----------|---------------|----------------------|-----------------|-----------------|----------------|\n| | | | Improved Speed | Decreased Accel | Eliminate Stops | Decreased Idle |\n| 2012_2 | 3.30 | 1.3 | 5.9% | 9.5% | 29.2% | 17.4% |\n| 2145_1 | 0.68 | 11.2 | 2.4% | 0.1% | 9.5% | 2.7% |\n| 4234_1 | 0.59 | 58.7 | 8.5% | 1.3% | 8.5% | 3.3% |\n| 2032_2 | 0.17 | 57.8 | 21.7% | 0.3% | 2.7% | 1.2% |\n| 4171_1 | 0.07 | 173.9 | 58.1% | 1.6% | 2.1% | 0.5% |\n\nThere's a [command-line interface](https://python-camelot.readthedocs.io/en/master/user/cli.html) too!\n\n**Note:** Camelot only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), \"If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based\".)\n\n## Why Camelot?\n\n- **You are in control.**: Unlike other libraries and tools which either give a nice output or fail miserably (with no in-between), Camelot gives you the power to tweak table extraction. (This is important since everything in the real world, including PDF table extraction, is fuzzy.)\n- *Bad* tables can be discarded based on **metrics** like accuracy and whitespace, without ever having to manually look at each table.\n- Each table is a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873).\n- **Export** to multiple formats, including JSON, Excel, HTML and Sqlite.\n\nSee [comparison with other PDF table extraction libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).\n\n## Installation\n\n### Using conda\n\nThe easiest way to install Camelot is to install it with [conda](https://conda.io/docs/), which is a package manager and environment management system for the [Anaconda](http://docs.continuum.io/anaconda/) distribution.\n\n
\n$ conda install -c conda-forge python-camelot\n
\n\n### Using pip\n\nAfter [installing the dependencies](https://python-camelot.readthedocs.io/en/master/user/install-deps.html) ([tk](https://packages.ubuntu.com/trusty/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can simply use pip to install Camelot:\n\n
\n$ pip install python-camelot[cv]\n
\n\n### From the source code\n\nAfter [installing the dependencies](https://python-camelot.readthedocs.io/en/master/user/install.html#using-pip), clone the repo using:\n\n
\n$ git clone https://www.github.com/camelot-dev/camelot\n
\n\nand install Camelot using pip:\n\n
\n$ cd camelot\n$ pip install \".[cv]\"\n
\n\n## Documentation\n\nGreat documentation is available at [http://python-camelot.readthedocs.io/](http://python-camelot.readthedocs.io/).\n\n## Development\n\nThe [Contributor's Guide](https://python-camelot.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.\n\n### Source code\n\nYou can check the latest sources with:\n\n
\n$ git clone https://www.github.com/camelot-dev/camelot\n
\n\n### Setting up a development environment\n\nYou can install the development dependencies easily, using pip:\n\n
\n$ pip install python-camelot[dev]\n
\n\n### Testing\n\nAfter installation, you can run tests using:\n\n
\n$ python setup.py test\n
\n\n## Versioning\n\nCamelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/camelot/blob/master/HISTORY.md).\n\n## License\n\nThis project is licensed under the MIT License, see the [LICENSE](https://github.com/camelot-dev/camelot/blob/master/LICENSE) file for details.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://python-camelot.readthedocs.io/", "keywords": "", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "python-camelot", "package_url": "https://pypi.org/project/python-camelot/", "platform": "", "project_url": "https://pypi.org/project/python-camelot/", "project_urls": { "Homepage": "http://python-camelot.readthedocs.io/" }, "release_url": "https://pypi.org/project/python-camelot/0.7.2/", "requires_dist": [ "chardet (>=3.0.4)", "click (>=6.7)", "numpy (>=1.13.3)", "openpyxl (>=2.5.8)", "pandas (>=0.23.4)", "pdfminer.six (>=20170720)", "PyPDF2 (>=1.26.0)", "opencv-python (>=3.4.2.17) ; extra == 'all'", "matplotlib (>=2.2.3) ; extra == 'all'", "opencv-python (>=3.4.2.17) ; extra == 'cv'", "codecov (>=2.0.15) ; extra == 'dev'", "pytest (>=3.8.0) ; extra == 'dev'", "pytest-cov (>=2.6.0) ; extra == 'dev'", "pytest-mpl (>=0.10) ; extra == 'dev'", "pytest-runner (>=4.2) ; extra == 'dev'", "Sphinx (>=1.7.9) ; extra == 'dev'", "opencv-python (>=3.4.2.17) ; extra == 'dev'", "matplotlib (>=2.2.3) ; extra == 'dev'", "matplotlib (>=2.2.3) ; extra == 'plot'" ], "requires_python": "", "summary": "PDF Table Extraction for Humans.", "version": "0.7.2" }, "last_serial": 5475629, "releases": { "0.7.2": [ { "comment_text": "", "digests": { "md5": "4bb257ded4ebd01d230dfa51f5f9ff5a", "sha256": "a9849eae39d925272d11853e41769966ecd9773ed474235728528d1d57f3bf13" }, "downloads": -1, "filename": "python_camelot-0.7.2-py3-none-any.whl", "has_sig": false, "md5_digest": "4bb257ded4ebd01d230dfa51f5f9ff5a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 42236, "upload_time": "2019-07-02T08:07:19", "url": "https://files.pythonhosted.org/packages/38/b8/8eb0ab996a49954248658968cca84505ee15b36623b19274c9fa34448623/python_camelot-0.7.2-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4bb257ded4ebd01d230dfa51f5f9ff5a", "sha256": "a9849eae39d925272d11853e41769966ecd9773ed474235728528d1d57f3bf13" }, "downloads": -1, "filename": "python_camelot-0.7.2-py3-none-any.whl", "has_sig": false, "md5_digest": "4bb257ded4ebd01d230dfa51f5f9ff5a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 42236, "upload_time": "2019-07-02T08:07:19", "url": "https://files.pythonhosted.org/packages/38/b8/8eb0ab996a49954248658968cca84505ee15b36623b19274c9fa34448623/python_camelot-0.7.2-py3-none-any.whl" } ] }