{ "info": { "author": "Vinayak Mehta", "author_email": "vmehta94@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "

\n \n

\n\n# Excalibur: A web interface to extract tabular data from PDFs\n\n[![Documentation Status](https://readthedocs.org/projects/excalibur-py/badge/?version=master)](https://excalibur-py.readthedocs.io/en/master/) [![image](https://img.shields.io/pypi/v/excalibur-py.svg)](https://pypi.org/project/excalibur-py/) [![image](https://img.shields.io/pypi/l/excalibur-py.svg)](https://pypi.org/project/excalibur-py/) [![image](https://img.shields.io/pypi/pyversions/excalibur-py.svg)](https://pypi.org/project/excalibur-py/) [![Gitter chat](https://badges.gitter.im/camelot-dev/Lobby.png)](https://gitter.im/camelot-dev/Lobby)\n\n**Excalibur** is a web interface to extract tabular data from PDFs, written in **Python 3**! It is powered by [Camelot](https://camelot-py.readthedocs.io/).\n\n**Note:** Excalibur only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), \"If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based\".)\n\n## Using Excalibur\n\n**Note:** You need to [install ghostscript](https://camelot-py.readthedocs.io/en/master/user/install-deps.html) before moving forward.\n\nAfter [installing Excalibur with pip](https://excalibur-py.readthedocs.io/en/master/user/install.html), you need to initialize the metadata database using:\n\n
\n$ excalibur initdb\n
\n\nAnd then start the webserver using:\n\n
\n$ excalibur webserver\n
\n\nThat's it! Now you can go to http://localhost:5000 and start extracting tabular data from your PDFs.\n\n\n1. **Upload** a PDF and enter the page numbers you want to extract tables from.\n\n2. Go to each page and select the table by drawing a box around it. (You can choose to skip this step since Excalibur can automatically detect tables on its own. Click on \"**Autodetect tables**\" to see what Excalibur sees.)\n\n3. Choose a flavor (Lattice or Stream) from \"**Advanced**\".\n\n a. **Lattice**: For tables formed with lines.\n\n b. **Stream**: For tables formed with whitespaces.\n\n4. Click on \"**View and download data**\" to see the extracted tables.\n\n5. Select your favorite format (CSV/Excel/JSON/HTML) and click on \"**Download**\"!\n\n**Note:** You can also download executables for Windows and Linux from the [releases page](https://github.com/camelot-dev/excalibur/releases) and run them directly!\n\n![usage.gif](https://excalibur-py.readthedocs.io/en/master/_images/usage.gif)\n\n## Why Excalibur?\n\n- Extracting tables from PDFs is hard. A simple copy-and-paste from a PDF into an Excel doesn't preserve table structure. **Excalibur makes PDF table extraction very easy**, by automatically detecting tables in PDFs and letting you save them into CSVs and Excel files.\n- Excalibur uses [Camelot](https://camelot-py.readthedocs.io/) under the hood, which gives you additional settings to tweak table extraction and get the best results. You can see how it performs better than other open-source tools and libraries [in this comparison](https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).\n- You can save table extraction [settings](https://excalibur-py.readthedocs.io/en/master/user/faq.html#faq) (like table areas) for a PDF once, and apply them on new PDFs to extract tables with similar structures.\n- You get complete control over your data. All file storage and processing happens on your own local or remote machine.\n- Excalibur can be configured with MySQL and Celery for parallel and distributed workloads. By default, sqlite and multiprocessing are used for sequential workloads.\n\n## Installation\n\n### Using pip\n\nAfter installing [ghostscript](https://www.ghostscript.com/), which is one of the requirements for Camelot (See [install instructions](https://camelot-py.readthedocs.io/en/master/user/install-deps.html)), you can simply use pip to install Excalibur:\n\n
\n$ pip install excalibur-py\n
\n\n### From the source code\n\nAfter installing ghostscript, clone the repo using:\n\n
\n$ git clone https://www.github.com/camelot-dev/excalibur\n
\n\nand install Excalibur using pip:\n\n
\n$ cd excalibur\n$ pip install .\n
\n\n## Documentation\n\nFantastic documentation is available at [http://excalibur-py.readthedocs.io/](http://excalibur-py.readthedocs.io/).\n\n## Development\n\nThe [Contributor's Guide](https://excalibur-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.\n\n### Source code\n\nYou can check the latest sources with:\n\n
\n$ git clone https://www.github.com/camelot-dev/excalibur\n
\n\n### Setting up a development environment\n\nYou can install the development dependencies easily, using pip:\n\n
\n$ pip install excalibur-py[dev]\n
\n\n### Testing (soon)\n\nAfter installation, you can run tests using:\n\n
\n$ python setup.py test\n
\n\n## Versioning\n\nExcalibur uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/excalibur/blob/master/HISTORY.md).\n\n## License\n\nThis project is licensed under the MIT License, see the [LICENSE](https://github.com/camelot-dev/excalibur/blob/master/LICENSE) file for details.\n\n## Support the development\n\nYou can support our work on Excalibur with a one-time or monthly donation [on OpenCollective](https://opencollective.com/excalibur). Organizations who use Excalibur can also sponsor the project for an acknowledgement on [our official site](https://www.tryexcalibur.com/) and this README.\n\nSpecial thanks to all the users and organizations that support Excalibur!\n\n\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://excalibur-py.readthedocs.io/", "keywords": "", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "excalibur-py", "package_url": "https://pypi.org/project/excalibur-py/", "platform": "", "project_url": "https://pypi.org/project/excalibur-py/", "project_urls": { "Homepage": "https://excalibur-py.readthedocs.io/" }, "release_url": "https://pypi.org/project/excalibur-py/0.4.2/", "requires_dist": [ "camelot-py[cv] (>=0.7.1)", "celery (>=4.1.1)", "Click (>=7.0)", "configparser (<3.6.0,>=3.5.0)", "Flask (>=1.0.2)", "SQLAlchemy (>=1.2.12)", "camelot-py[cv] (>=0.7.1); extra == 'all'", "celery (>=4.1.1); extra == 'all'", "Click (>=7.0); extra == 'all'", "configparser (<3.6.0,>=3.5.0); extra == 'all'", "Flask (>=1.0.2); extra == 'all'", "SQLAlchemy (>=1.2.12); extra == 'all'", "mysqlclient (>=1.3.6); extra == 'all'", "codecov (>=2.0.15); extra == 'dev'", "pytest (>=3.8.0); extra == 'dev'", "pytest-cov (>=2.6.0); extra == 'dev'", "pytest-runner (>=4.2); extra == 'dev'", "Sphinx (>=1.8.1); extra == 'dev'", "mysqlclient (>=1.3.6); extra == 'mysql'" ], "requires_python": "", "summary": "A web interface to extract tabular data from PDFs.", "version": "0.4.2" }, "last_serial": 4676099, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "f4a63da5c5774e665e044ee7c60a159b", "sha256": "236d1a1dd0a009f0423a419f9a416260fde020694a2a7eca6cb95271cb2fe5c0" }, "downloads": -1, "filename": "excalibur_py-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "f4a63da5c5774e665e044ee7c60a159b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 1597835, "upload_time": "2018-10-21T03:20:20", "url": "https://files.pythonhosted.org/packages/5d/89/3f466afc88ce2f51e08f73a0d0e7892e2bbe340ba331d6858005b818e2e0/excalibur_py-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "07c2cb5d15ed2ee5f1762063814c1c41", "sha256": "0edb00f1a12f4a8e4e2d66023fd170e1676bcb9ee81060ef3311880e75b97e65" }, "downloads": -1, "filename": "excalibur-py-0.1.0.tar.gz", "has_sig": false, "md5_digest": "07c2cb5d15ed2ee5f1762063814c1c41", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1572644, "upload_time": "2018-10-21T03:20:23", "url": "https://files.pythonhosted.org/packages/83/71/1915694852ae3d6a7b63f20f06e186ebac75ecfa08412eac8485f0f39945/excalibur-py-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "b964eb4f78fb53ed11f72a72f7075399", "sha256": "cb97c1c5db6bb6270d8c93403e9fb7c6f98cceb08c8df4703a783ef07145c6cc" }, "downloads": -1, "filename": "excalibur_py-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "b964eb4f78fb53ed11f72a72f7075399", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 180417, "upload_time": "2018-10-22T11:32:24", "url": "https://files.pythonhosted.org/packages/2e/ee/d03197da76432827504347bbde71ea6efac03654f923bf392cad317b630b/excalibur_py-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "321bd0a55054410edb1ab69caee2bc64", "sha256": "6d9c6995807214cd93ef462922fdc765b9b8433b8ffefe2d500c1b4cb139b01f" }, "downloads": -1, "filename": "excalibur-py-0.1.1.tar.gz", "has_sig": false, "md5_digest": "321bd0a55054410edb1ab69caee2bc64", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 163965, "upload_time": "2018-10-22T11:32:26", "url": "https://files.pythonhosted.org/packages/98/af/96ffc7a648c1498ac641133921361463eb6a5a8556bfeb7a465ee93a986b/excalibur-py-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "b29a22c72409d4aeb8df11d9487abdde", "sha256": "c44df8957f2c08ae5355829378c425b105c5b0c6feb4628b2665145e20e47ae3" }, "downloads": -1, "filename": "excalibur_py-0.2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "b29a22c72409d4aeb8df11d9487abdde", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 186835, "upload_time": "2018-11-05T02:18:02", "url": "https://files.pythonhosted.org/packages/44/54/ee0bef9b4e1ba52856109978389e98ef7dd97bea19178e1079909413d13a/excalibur_py-0.2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4749d370c04067fb598c478395d2d045", "sha256": "740b2760df6e2c98f4cab749845a844e8e8cccb58a917d02b861c44f8e3afc7c" }, "downloads": -1, "filename": "excalibur-py-0.2.0.tar.gz", "has_sig": false, "md5_digest": "4749d370c04067fb598c478395d2d045", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 168094, "upload_time": "2018-11-05T02:18:06", "url": "https://files.pythonhosted.org/packages/3c/da/0d3622f20b94180f888c5fb5b26010acb2dda6e407ad3d103e65eca745a9/excalibur-py-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "0c09fd2edaa59dfe69ee44688d3f2de3", "sha256": "3f766d527564a2f95d5c1eb26ffced9571a323eaee580cbf85adb66e7b7c3014" }, "downloads": -1, "filename": "excalibur_py-0.2.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0c09fd2edaa59dfe69ee44688d3f2de3", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 186981, "upload_time": "2018-11-06T23:56:45", "url": "https://files.pythonhosted.org/packages/92/87/f93da9eef0dcafbdb939a16ee860b16ab519f1103a2c8ddfc8ff98917254/excalibur_py-0.2.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d6534dc04060326602e75e6b3eb9b538", "sha256": "ea6cf84e3f4366a2beb8d53956720f33bae84484fbaa49d7bcbcc28cdd4db44f" }, "downloads": -1, "filename": "excalibur-py-0.2.1.tar.gz", "has_sig": false, "md5_digest": "d6534dc04060326602e75e6b3eb9b538", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 168259, "upload_time": "2018-11-06T23:56:47", "url": "https://files.pythonhosted.org/packages/6c/0e/9d7391502e35182d70b87b6e84b6c7a7a48156a5551450805455bd618632/excalibur-py-0.2.1.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "05b93a07c5f5991d9043d40137c45084", "sha256": "938a1cdb1ab0bddbe50f8c39ad2c0b26706f378d74242ac8c55fa1b7de797ed8" }, "downloads": -1, "filename": "excalibur_py-0.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "05b93a07c5f5991d9043d40137c45084", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 190233, "upload_time": "2018-11-12T01:48:19", "url": "https://files.pythonhosted.org/packages/37/5d/f588c96c32208f64d4394d3198e7e6f5bc552e89c1b7e5db20c4489fb86f/excalibur_py-0.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d030bfefa035676935831159bb760437", "sha256": "a0e6ba743a3bbdc910a3b1212b7248fd8d5811129e6a1c1869670a818871834e" }, "downloads": -1, "filename": "excalibur-py-0.3.0.tar.gz", "has_sig": false, "md5_digest": "d030bfefa035676935831159bb760437", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 170261, "upload_time": "2018-11-12T01:48:21", "url": "https://files.pythonhosted.org/packages/07/ba/15c1d964e29d376cfaf62baf58e4dafe8dae93f2ae2fdf7dc5fce09b107a/excalibur-py-0.3.0.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "a9dc3a3d794b14aa3eb86da197950275", "sha256": "01966854f1b64170b1c4018059214c9bd96b21a1b1022acc8e9f4f60d8ebf125" }, "downloads": -1, "filename": "excalibur_py-0.4.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a9dc3a3d794b14aa3eb86da197950275", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 1494754, "upload_time": "2018-11-25T18:46:32", "url": "https://files.pythonhosted.org/packages/9e/fe/6d60ad37075c89136e614cefa494aff4701e97e5121193c09a70d3c827b0/excalibur_py-0.4.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "625a52ec716086e83dfc0a126529a4d9", "sha256": "a3f43dcb9b0d830d17a59fb369ca31e999cbb416e69b5c343acbd3096a007d1e" }, "downloads": -1, "filename": "excalibur-py-0.4.0.tar.gz", "has_sig": false, "md5_digest": "625a52ec716086e83dfc0a126529a4d9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1475264, "upload_time": "2018-11-25T18:46:36", "url": "https://files.pythonhosted.org/packages/3c/0a/85ad8d834d0511208b5b774d25b85f33990760f91216edf0da80cb8f9804/excalibur-py-0.4.0.tar.gz" } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "fe672222a57a8ad71a3e15c1504bf0aa", "sha256": "318c8e4bd58805388fb71de8518d9c4ff07d591370aaa09cca041204ee43f183" }, "downloads": -1, "filename": "excalibur_py-0.4.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fe672222a57a8ad71a3e15c1504bf0aa", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 2331415, "upload_time": "2018-12-27T15:07:19", "url": "https://files.pythonhosted.org/packages/6a/ec/f2035076e58ce37724c059375d0f23f0ae1d4487827bcf74f000e383c7e2/excalibur_py-0.4.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c4e87fbf244675699f35051f604befd6", "sha256": "730b7293bbd6c1db62811b1c095c9252ffb408bcc8f38bad01eb34eed3ac96f5" }, "downloads": -1, "filename": "excalibur-py-0.4.1.tar.gz", "has_sig": false, "md5_digest": "c4e87fbf244675699f35051f604befd6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2302278, "upload_time": "2018-12-27T15:07:23", "url": "https://files.pythonhosted.org/packages/05/22/24eb1981f92b3fab8cc41b43734c561924810aeb4f22eedc32a02f70470d/excalibur-py-0.4.1.tar.gz" } ], "0.4.2": [ { "comment_text": "", "digests": { "md5": "cfac58601d83a216cbb5297dc352f5e1", "sha256": "15330f59fca6b9da304e312e4350e1c48b197f7f5b0af7d286bc853cc592fd2c" }, "downloads": -1, "filename": "excalibur_py-0.4.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "cfac58601d83a216cbb5297dc352f5e1", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15020898, "upload_time": "2019-01-09T09:49:31", "url": "https://files.pythonhosted.org/packages/35/46/47bc9d4f1cd551832cba0a2de75c36d47fff1e684d031addd5c4674a0428/excalibur_py-0.4.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "845ee93c337178762319880806de1032", "sha256": "084dfb038d066545d70c619e3689205861a5ac93b032cb853a329c38995cfc6d" }, "downloads": -1, "filename": "excalibur-py-0.4.2.tar.gz", "has_sig": false, "md5_digest": "845ee93c337178762319880806de1032", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14962716, "upload_time": "2019-01-09T09:49:58", "url": "https://files.pythonhosted.org/packages/f9/52/ff23291a522fe1a17dc276b6de780dd1fa59a36ee99c30c9cae7ea79ee19/excalibur-py-0.4.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cfac58601d83a216cbb5297dc352f5e1", "sha256": "15330f59fca6b9da304e312e4350e1c48b197f7f5b0af7d286bc853cc592fd2c" }, "downloads": -1, "filename": "excalibur_py-0.4.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "cfac58601d83a216cbb5297dc352f5e1", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15020898, "upload_time": "2019-01-09T09:49:31", "url": "https://files.pythonhosted.org/packages/35/46/47bc9d4f1cd551832cba0a2de75c36d47fff1e684d031addd5c4674a0428/excalibur_py-0.4.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "845ee93c337178762319880806de1032", "sha256": "084dfb038d066545d70c619e3689205861a5ac93b032cb853a329c38995cfc6d" }, "downloads": -1, "filename": "excalibur-py-0.4.2.tar.gz", "has_sig": false, "md5_digest": "845ee93c337178762319880806de1032", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14962716, "upload_time": "2019-01-09T09:49:58", "url": "https://files.pythonhosted.org/packages/f9/52/ff23291a522fe1a17dc276b6de780dd1fa59a36ee99c30c9cae7ea79ee19/excalibur-py-0.4.2.tar.gz" } ] }