{ "info": { "author": "Akshowhini", "author_email": "brain@extracttable.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "[![image](https://i.imgur.com/YIHmXue.png?1)](https://extracttable.com?ref=github-TP)\n\n# TabulaPro: Pro-version of Tabula-py \n[![image](https://img.shields.io/pypi/v/tabulapro.svg?maxAge=3600)](https://pypi.org/project/tabulapro/) [![image](https://img.shields.io/github/license/extracttable/tabulapro)]() [![image](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue)]() \n\n**TabulaPro** is a layer on the tabula-py library to extract tables from **scanned PDFs and images**. \n\n\n## TabulaPro vs Tabula\n\nCoding with **TabulaPro** is no different from coding with the original Tabula. Make your current tabula-py code TabulaPro-compatible by passing **`flavor=\"TabulaPro\"`** or **`tabulapro=True`** to `read_pdf()` to process images or scanned PDFs.\n\n\n## Installation \n> \ud83d\udca1 ***ProTip**: [ExtractTable-py](https://github.com/ExtractTable/ExtractTable-py) is the official library; it is FASTER than this wrapper and has NO software dependencies.* \n\n\nBecause this library depends on Tabula, which has its own software dependencies, the developer is expected to install [them](https://github.com/chezou/tabula-py#requirements) in order to use the regular Tabula flavors *(\"stream\", \"lattice\")* along with \"TabulaPro\". \n\n\n### Using pip \nAfter installing the software dependencies, you can simply use pip to install TabulaPro: \n\n $ pip install -U TabulaPro \n\n\n## Prerequisites\n\nThe developer needs an **api_key** ([free credits here](https://extracttable.com/tabulapro.html)) to use TabulaPro. 
Each image file or PDF page processed consumes one credit.\n\n**api_key** should be passed through `pro_kwargs`, a `dict` type argument that accepts *api_key*, *job_id*, *dup_check*, *wait_for_output* as keys and can be used as below:\n\n {\n \"api_key\": str,\n Mandatory, to trigger \"TabulaPro\" flavor, to process scanned PDFs and images, also text PDF files\n\n \"job_id\": str,\n optional, if processing a new file\n Mandatory, to retrieve the result of the already submitted file\n\n \"dup_check\": bool, default: False - to bypass the duplicate check\n Useful to handle duplicate requests, check based on the FileName\n\n \"max_wait_time\": int, default: 300\n Checks for the output every 15 seconds until successfully processed or for a maximum of 300 seconds.\n }\n\n\n\n## Let's code\n\n**Quickly validate the API key and see the number of credits attached to it**\n```python\nfrom tabula_pro import check_usage\n\napi_key = \"YOUR_API_KEY_HERE\"  # replace with your own API key\nprint(check_usage(api_key))\n```\n*If the above snippet runs without an error, the API key is valid.*\n\n\n**Here's how you can extract tables from image files.** \n\n\nThe example image (*tabula-data-page-1.**PNG***) used in the code below can be found [here](https://github.com/extracttable/tabulapro/blob/master/samples/tabula-data-page-1.PNG). 
Notice that *tabula-data-page-1.PNG* is the image version of the first page of Tabula's PDF example, [data.pdf](https://github.com/chezou/tabula-py/blob/master/examples/data.pdf).\n\n```python\nfrom tabula_pro import read_pdf\npro_tables = read_pdf(\n 'foo-image.jpg', \n flavor=\"tabulapro\", \n pro_kwargs={\"api_key\": api_key}\n)\n```\n\n`pro_tables` is a list of dataframes that are found in the file\n\n```python\npro_tables[0]\n```\n\n| mpg \t| cyl \t| disp \t| hp \t| drat \t| wt \t| gsec \t| VS \t| am \t| gear \t| carb \t| \t|\n|---------------------\t|------\t|------\t|-------\t|------\t|------\t|-------\t|-------\t|----\t|------\t|------\t|---\t|\n| Mazda RX4 \t| 21.0 \t| 6 \t| 160.0 \t| 110 \t| 3.90 \t| 2.620 \t| 16.46 \t| 0 \t| 1 \t| 4 \t| 4 \t|\n| Mazda RX4 Wag \t| 21.0 \t| 6 \t| 160.0 \t| 110 \t| 3.90 \t| 2.875 \t| 17.02 \t| 0 \t| 1 \t| 4 \t| 4 \t|\n| Datsun 710 \t| 22.8 \t| 4 \t| 108.0 \t| 93 \t| 3.85 \t| 2.320 \t| 18.61 \t| 1 \t| 1 \t| 4 \t| 1 \t|\n| Hornet 4 Drive \t| 21.4 \t| 6 \t| 258.0 \t| 110 \t| 3.08 \t| 3.215 \t| 19.44 \t| L \t| 0 \t| 3 \t| L \t|\n| Hornet Sportabout \t| 18.7 \t| 8 \t| 360.0 \t| 175 \t| 3.15 \t| 3.440 \t| 17.02 \t| 0 \t| 0 \t| 3 \t| 2 \t|\n| Valiant \t| 18.1 \t| 6 \t| 225.0 \t| 105 \t| 2.76 \t| 3.460 \t| 20.22 \t| 1 \t| 0 \t| 3 \t| 1 \t|\n| Duster 360 \t| 14.3 \t| 8 \t| 360.0 \t| 245 \t| 3.21 \t| 3.570 \t| 15.84 \t| 0 \t| 0 \t| 3 \t| 4 \t|\n| Mere 240D \t| 24.4 \t| 4 \t| 146.7 \t| 62 \t| 3.69 \t| 3.190 \t| 20.00 \t| 1 \t| 0 \t| 4 \t| 2 \t|\n| Mere 230 \t| 22.8 \t| 4 \t| 140.8 \t| 95 \t| 3.92 \t| 3.150 \t| 22.90 \t| 1 \t| 0 \t| 4 \t| 2 \t|\n| Merc 280 \t| 19.2 \t| 6 \t| 167.6 \t| 123 \t| 3.92 \t| 3.440 \t| 18.30 \t| 1 \t| 0 \t| 4 \t| 4 \t|\n| Merc 280C \t| 17.8 \t| 6 \t| 167.6 \t| 123 \t| 3.92 \t| 3.440 \t| 18.90 \t| 1 \t| 0 \t| 4 \t| 4 \t|\n| Mere 450SE \t| 16.4 \t| 8 \t| 275.8 \t| 180 \t| 3.07 \t| 4.070 \t| 17.40 \t| 0 \t| 0 \t| 3 \t| 3 \t|\n| Merc 450SL \t| 17.3 \t| 8 \t| 275.8 \t| 180 \t| 3.07 \t| 3.730 \t| 17.60 \t| 0 \t| 0 
\t| 3 \t| 3 \t|\n| Merc 450SLC \t| 15.2 \t| 8 \t| 275.8 \t| 180 \t| 3.07 \t| 3.780 \t| 18.00 \t| 0 \t| 0 \t| 3 \t| 3 \t|\n| Cadillac Fleetwood \t| 10.4 \t| 8 \t| 472.0 \t| 205 \t| 2.93 \t| 5.250 \t| 17.98 \t| 0 \t| 0 \t| 3 \t| 4 \t|\n| Lincoln Continental \t| 10.4 \t| 8 \t| 460.0 \t| 215 \t| 3.00 \t| 5.424 \t| 17.82 \t| 0 \t| 0 \t| 3 \t| 4 \t|\n| Chrysler Imperial \t| 14.7 \t| 8 \t| 440.0 \t| 230 \t| 3.23 \t| 5.345 \t| 17.42 \t| 0 \t| 0 \t| 3 \t| 4 \t|\n| Fiat 128 \t| 32.4 \t| 4 \t| 78.7 \t| 66 \t| 4.08 \t| 2.200 \t| 19.47 \t| 1 \t| 1 \t| 4 \t| 1 \t|\n| Honda Civic \t| 30.4 \t| 4 \t| 75.7 \t| 52 \t| 4.93 \t| 1.615 \t| 18.52 \t| 1 \t| 1 \t| 4 \t| 2 \t|\n| Toyota Corolla \t| 33.9 \t| 4 \t| 71.1 \t| 65 \t| 4.22 \t| 1.835 \t| 19.90 \t| L \t| 1 \t| 4 \t| L \t|\n| Toyota Corona \t| 21.5 \t| 4 \t| 120.1 \t| 97 \t| 3.70 \t| 2.465 \t| 20.01 \t| 1 \t| 0 \t| 3 \t| 1 \t|\n| Dodge Challenger \t| 15.5 \t| 8 \t| 318.0 \t| 150 \t| 2.76 \t| 3.520 \t| 16.87 \t| 0 \t| 0 \t| 3 \t| 2 \t|\n| AMC Javelin \t| 15.2 \t| 8 \t| 304.0 \t| 150 \t| 3.15 \t| 3.435 \t| 17.30 \t| 0 \t| 0 \t| 3 \t| 2 \t|\n| Camaro 728 \t| 13.3 \t| 8 \t| 350.0 \t| 245 \t| 3.73 \t| 3.840 \t| 15.41 \t| 0 \t| 0 \t| 3 \t| 4 \t|\n| Pontiac Firebird \t| 19.2 \t| 8 \t| 400.0 \t| 175 \t| 3.08 \t| 3.845 \t| 17.05 \t| 0 \t| 0 \t| 3 \t| 2 \t|\n| Fiat X1-9 \t| 27.3 \t| 4 \t| 79.0 \t| 66 \t| 4.08 \t| 1.935 \t| 18.90 \t| 1 \t| 1 \t| 4 \t| 1 \t|\n| Porsche 914-2 \t| 26.0 \t| 4 \t| 120.3 \t| 91 \t| 4.43 \t| 2.140 \t| 16.70 \t| 0 \t| 1 \t| 5 \t| 2 \t|\n| Lotus Europa \t| 30.4 \t| 4 \t| 95.1 \t| 113 \t| 3.77 \t| 1.513 \t| 16.90 \t| L \t| 1 \t| 5 \t| 2 \t|\n| Ford Pantera L \t| 15.8 \t| 8 \t| 351.0 \t| 264 \t| 4.22 \t| 3.170 \t| 14.50 \t| 0 \t| 1 \t| 5 \t| 4 \t|\n| Ferrari Dino \t| 19.7 \t| 6 \t| 145.0 \t| 175 \t| 3.62 \t| 2.770 \t| 15.50 \t| 0 \t| 1 \t| 5 \t| 6 \t|\n| Maserati Bora \t| 15.0 \t| 8 \t| 301.0 \t| 335 \t| 3.54 \t| 3.570 \t| 14.60 \t| 0 \t| 1 \t| 5 \t| 8 \t|\n| Volyo 142F \t| 21.4 \t| 4 \t| 121.0 \t| 109 \t| 4.11 \t| 2.780 
\t| 18.60 \t| 1 \t| 1 \t| 4 \t| 2 \t|\n\n\nMost image files are processed in under 5 seconds. A blurry, large, or poor-quality image may take up to 15 seconds, and the processing time for a PDF file depends on its page count. In these cases, the process checks the job status every 15 seconds, for a maximum of 300 seconds, until the job ends successfully and the final response is returned.\n\n\n> ***ProTip**: For more control over the process wait time, check out [ExtractTable-py](https://github.com/ExtractTable/ExtractTable-py)*\n\n\n## Pull Requests & Rewards\n\nPull requests are most welcome and are rewarded with API credits. \n\n\n## License \n\nThis project is licensed under the Apache License 2.0; see the [LICENSE](https://github.com/extracttable/tabulapro/blob/master/LICENSE) file for details.\n\n\n## Credits\n\nLast but not least, we want to thank the contributors of [tabula-py](https://github.com/chezou/tabula-py#contributors).\n\n\n## Social Media\nFollow us on social media for library updates and free credits.\n\n[![Image](https://cdn3.iconfinder.com/data/icons/socialnetworking/32/linkedin.png)](https://www.linkedin.com/company/extracttable)\n    \n[![Image](https://abs.twimg.com/favicons/twitter.ico)](https://twitter.com/extracttable)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ExtractTable/tabulapro", "keywords": "", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "TabulaPro", "package_url": "https://pypi.org/project/TabulaPro/", "platform": "", "project_url": "https://pypi.org/project/TabulaPro/", "project_urls": { "Homepage": "https://github.com/ExtractTable/tabulapro" }, "release_url": "https://pypi.org/project/TabulaPro/1.2.0/", "requires_dist": [ "ExtractTable (>=1.2.0)", "tabula-py (>=1.4.1)" ], "requires_python": "", "summary": "TabulaPro is a layer on tabula-py 
library to extract tables from Scan PDFs and Images.", "version": "1.2.0" }, "last_serial": 6004899, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "9e5210604cd04156cd1bbb2557847eca", "sha256": "65b73f8ca9b1a3d16afea7972fa79f056b7a719245a1e072d9c8051b88225cbd" }, "downloads": -1, "filename": "TabulaPro-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "9e5210604cd04156cd1bbb2557847eca", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12944, "upload_time": "2019-10-20T17:53:34", "url": "https://files.pythonhosted.org/packages/26/94/8e44d8c01c321087646f879526e014355e79328bc927532da4e260afdcc8/TabulaPro-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1807c1a6916da3c4bf6f45e2d2d08342", "sha256": "3dffc0586b51f0bcf7ab4c2a51b19aa70da9a7cc68e445be4b07e54d3c00099d" }, "downloads": -1, "filename": "TabulaPro-0.0.1.tar.gz", "has_sig": false, "md5_digest": "1807c1a6916da3c4bf6f45e2d2d08342", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8032, "upload_time": "2019-10-20T17:53:39", "url": "https://files.pythonhosted.org/packages/01/c9/54f08a5f541aad139bba9ae6d3b4c743966f2001f95074fd91fd2e899ab1/TabulaPro-0.0.1.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "bdcb792739e40ad4b58061bafdac22ac", "sha256": "406116a065bc11b2ca6a32e0061ff09221ce81cde196959362d7039db8232763" }, "downloads": -1, "filename": "TabulaPro-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "bdcb792739e40ad4b58061bafdac22ac", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12941, "upload_time": "2019-10-20T17:59:02", "url": "https://files.pythonhosted.org/packages/8b/61/222a5c89726ef2beffe59880266adf0221dc1148691683613aaaffbbff09/TabulaPro-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c6ad94b51218da0c4c663a5fe5b6368c", "sha256": "f335acc7ab6c5386240e8252771261c1c20f88e165199ad6dcb6ffbe6a91d67a" 
}, "downloads": -1, "filename": "TabulaPro-1.0.0.tar.gz", "has_sig": false, "md5_digest": "c6ad94b51218da0c4c663a5fe5b6368c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8031, "upload_time": "2019-10-20T17:59:03", "url": "https://files.pythonhosted.org/packages/8f/bf/b8cc388d0deb4d148585dc2357cc18c2d10642000241436d3d1a532f3c6d/TabulaPro-1.0.0.tar.gz" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "16bd472f708008e7c8f4c8fa2051b8dc", "sha256": "d9fde0850e27023b042b17c097a33667678a89b47b1b1b09eb74b280e5fa0912" }, "downloads": -1, "filename": "TabulaPro-1.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "16bd472f708008e7c8f4c8fa2051b8dc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12988, "upload_time": "2019-10-20T22:41:18", "url": "https://files.pythonhosted.org/packages/b2/02/d0653f30fd9b3ec0759f64380dc821c80f8ffe576477e41956c7fe2a376d/TabulaPro-1.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fb41c5fd2aea051017dc6d9492b19e47", "sha256": "595984b83b97c08f45e1a318619226877901f8e045d6cfd2e35da6fdaef20ef2" }, "downloads": -1, "filename": "TabulaPro-1.2.0.tar.gz", "has_sig": false, "md5_digest": "fb41c5fd2aea051017dc6d9492b19e47", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6622, "upload_time": "2019-10-20T22:41:19", "url": "https://files.pythonhosted.org/packages/bd/8f/afe060511e5023d7a6d2c240e2a1eafdf18fa498dc4e7400cacd977dcdee/TabulaPro-1.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "16bd472f708008e7c8f4c8fa2051b8dc", "sha256": "d9fde0850e27023b042b17c097a33667678a89b47b1b1b09eb74b280e5fa0912" }, "downloads": -1, "filename": "TabulaPro-1.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "16bd472f708008e7c8f4c8fa2051b8dc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12988, "upload_time": "2019-10-20T22:41:18", "url": 
"https://files.pythonhosted.org/packages/b2/02/d0653f30fd9b3ec0759f64380dc821c80f8ffe576477e41956c7fe2a376d/TabulaPro-1.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fb41c5fd2aea051017dc6d9492b19e47", "sha256": "595984b83b97c08f45e1a318619226877901f8e045d6cfd2e35da6fdaef20ef2" }, "downloads": -1, "filename": "TabulaPro-1.2.0.tar.gz", "has_sig": false, "md5_digest": "fb41c5fd2aea051017dc6d9492b19e47", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6622, "upload_time": "2019-10-20T22:41:19", "url": "https://files.pythonhosted.org/packages/bd/8f/afe060511e5023d7a6d2c240e2a1eafdf18fa498dc4e7400cacd977dcdee/TabulaPro-1.2.0.tar.gz" } ] }