{ "info": { "author": "Vinayak Mehta", "author_email": "vmehta94@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "
\n
\n
\n$ excalibur initdb\n\n\nAnd then start the webserver using:\n\n
\n$ excalibur webserver\n\n\nThat's it! Now you can go to http://localhost:5000 and start extracting tabular data from your PDFs.\n\n\n1. **Upload** a PDF and enter the page numbers you want to extract tables from.\n\n2. Go to each page and select the table by drawing a box around it. (You can choose to skip this step since Excalibur can automatically detect tables on its own. Click on \"**Autodetect tables**\" to see what Excalibur sees.)\n\n3. Choose a flavor (Lattice or Stream) from \"**Advanced**\".\n\n a. **Lattice**: For tables formed with lines.\n\n b. **Stream**: For tables formed with whitespace.\n\n4. Click on \"**View and download data**\" to see the extracted tables.\n\n5. Select your favorite format (CSV/Excel/JSON/HTML) and click on \"**Download**\"!\n\n**Note:** You can also download executables for Windows and Linux from the [releases page](https://github.com/camelot-dev/excalibur/releases) and run them directly!\n\n\n\n## Why Excalibur?\n\n- Extracting tables from PDFs is hard. A simple copy-and-paste from a PDF into an Excel sheet doesn't preserve table structure. **Excalibur makes PDF table extraction very easy** by automatically detecting tables in PDFs and letting you save them into CSVs and Excel files.\n- Excalibur uses [Camelot](https://camelot-py.readthedocs.io/) under the hood, which gives you additional settings to tweak table extraction and get the best results. You can see how it performs better than other open-source tools and libraries [in this comparison](https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).\n- You can save table extraction [settings](https://excalibur-py.readthedocs.io/en/master/user/faq.html#faq) (like table areas) for a PDF once, and apply them to new PDFs to extract tables with similar structures.\n- You get complete control over your data. 
All file storage and processing happen on your own local or remote machine.\n- Excalibur can be configured with MySQL and Celery for parallel and distributed workloads. By default, SQLite and multiprocessing are used for sequential workloads.\n\n## Installation\n\n### Using pip\n\nAfter installing [ghostscript](https://www.ghostscript.com/), which is one of the requirements for Camelot (see [install instructions](https://camelot-py.readthedocs.io/en/master/user/install-deps.html)), you can simply use pip to install Excalibur:\n\n
\n$ pip install excalibur-py\n\n\n### From the source code\n\nAfter installing ghostscript, clone the repo using:\n\n
\n$ git clone https://www.github.com/camelot-dev/excalibur\n\n\nand install Excalibur using pip:\n\n
\n$ cd excalibur\n$ pip install .\n\n\n## Documentation\n\nFantastic documentation is available at [http://excalibur-py.readthedocs.io/](http://excalibur-py.readthedocs.io/).\n\n## Development\n\nThe [Contributor's Guide](https://excalibur-py.readthedocs.io/en/master/dev/contributing.html) has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.\n\n### Source code\n\nYou can check out the latest sources with:\n\n
\n$ git clone https://www.github.com/camelot-dev/excalibur\n\n\n### Setting up a development environment\n\nYou can install the development dependencies easily, using pip:\n\n
\n$ pip install excalibur-py[dev]\n\n\n### Testing (soon)\n\nAfter installation, you can run tests using:\n\n
\n$ python setup.py test\n\n\n## Versioning\n\nExcalibur uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/excalibur/blob/master/HISTORY.md).\n\n## License\n\nThis project is licensed under the MIT License; see the [LICENSE](https://github.com/camelot-dev/excalibur/blob/master/LICENSE) file for details.\n\n## Support the development\n\nYou can support our work on Excalibur with a one-time or monthly donation [on OpenCollective](https://opencollective.com/excalibur). Organizations that use Excalibur can also sponsor the project for an acknowledgement on [our official site](https://www.tryexcalibur.com/) and this README.\n\nSpecial thanks to all the users and organizations that support Excalibur!\n\n