{ "info": { "author": "Matthias Lee", "author_email": "pytesseract@madmaze.net", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "Python Tesseract\n================\n\n.. image:: https://travis-ci.org/madmaze/pytesseract.svg\n :target: https://travis-ci.org/madmaze/pytesseract\n :alt: Travis build status\n\n.. image:: https://img.shields.io/pypi/pyversions/pytesseract.svg\n :target: https://pypi.python.org/pypi/pytesseract\n :alt: Python versions\n\n.. image:: https://img.shields.io/pypi/v/pytesseract.svg\n :target: https://pypi.python.org/pypi/pytesseract\n :alt: PyPI release\n\n.. image:: \thttps://img.shields.io/github/release/madmaze/pytesseract.svg\n :target: https://github.com/madmaze/pytesseract/releases\n :alt: Github release\n\n.. image:: https://anaconda.org/conda-forge/pytesseract/badges/version.svg\n :target: https://anaconda.org/conda-forge/pytesseract\n :alt: Conda release\n\nPython-tesseract is an optical character recognition (OCR) tool for python.\nThat is, it will recognize and \"read\" the text embedded in images.\n\nPython-tesseract is a wrapper for `Google's Tesseract-OCR Engine `_.\nIt is also useful as a stand-alone invocation script to tesseract, as it can read all image types\nsupported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff,\nand others. Additionally, if used as a script, Python-tesseract will print the recognized\ntext instead of writing it to a file.\n\nUSAGE\n-----\n\n**Quickstart**\n\n*Note*: Test images are located in the ``tests/data`` folder of the Git repo.\n\n.. code-block:: python\n\n try:\n from PIL import Image\n except ImportError:\n import Image\n import pytesseract\n\n # If you don't have tesseract executable in your PATH, include the following:\n pytesseract.pytesseract.tesseract_cmd = r''\n # Example tesseract_cmd = r'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'\n\n # Simple image to string\n print(pytesseract.image_to_string(Image.open('test.png')))\n\n # French text image to string\n print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))\n\n # In order to bypass the image conversions of pytesseract, just use relative or absolute image path\n # NOTE: In this case you should provide tesseract supported images or tesseract will return error\n print(pytesseract.image_to_string('test.png'))\n\n # Batch processing with a single file containing the list of multiple image file paths \n print(pytesseract.image_to_string('images.txt'))\n\n # Get bounding box estimates\n print(pytesseract.image_to_boxes(Image.open('test.png')))\n\n # Get verbose data including boxes, confidences, line and page numbers\n print(pytesseract.image_to_data(Image.open('test.png')))\n\n # Get information about orientation and script detection\n print(pytesseract.image_to_osd(Image.open('test.png')))\n\n # Get a searchable PDF\n pdf = pytesseract.image_to_pdf_or_hocr('test.png', extension='pdf')\n\n # Get HOCR output\n hocr = pytesseract.image_to_pdf_or_hocr('test.png', extension='hocr')\n\nSupport for OpenCV image/NumPy array objects\n\n.. code-block:: python\n\n import cv2\n\n img = cv2.imread(r'//digits.png')\n print(pytesseract.image_to_string(img))\n # OR explicit beforehand converting\n print(pytesseract.image_to_string(Image.fromarray(img))\n\nIf you need custom configuration like `oem`/`psm`, use the **config** keyword. \n\n.. code-block:: python\n\n # Example of adding any additional options.\n custom_oem_psm_config = r'--oem 3 --psm 6'\n pytesseract.image_to_string(image, config=custom_oem_psm_config)\n\nAdd the following config, if you have tessdata error like: \"Error opening data file...\"\n\n.. code-block:: python\n\n # Example config: r'--tessdata-dir \"C:\\Program Files (x86)\\Tesseract-OCR\\tessdata\"'\n # It's important to add double quotes around the dir path.\n tessdata_dir_config = r'--tessdata-dir \"\"'\n pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)\n\n**Functions**\n\n* **get_tesseract_version** Returns the Tesseract version installed in the system.\n\n* **image_to_string** Returns the result of a Tesseract OCR run on the image to string\n\n* **image_to_boxes** Returns result containing recognized characters and their box boundaries\n\n* **image_to_data** Returns result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, please check the `Tesseract TSV documentation `_\n\n* **image_to_osd** Returns result containing information about orientation and script detection.\n\n* **run_and_get_output** Returns the raw output from Tesseract OCR. Gives a bit more control over the parameters that are sent to tesseract.\n\n**Parameters**\n\n``image_to_data(image, lang=None, config='', nice=0, output_type=Output.STRING, timeout=0)``\n\n* **image** Object, PIL Image/NumPy array of the image to be processed by Tesseract\n\n* **lang** String, Tesseract language code string\n\n* **config** String, Any additional configurations as a string, ex: ``config='--psm 6'``\n\n* **nice** Integer, modifies the processor priority for the Tesseract run. Not supported on Windows. Nice adjusts the niceness of unix-like processes.\n\n* **output_type** Class attribute, specifies the type of the output, defaults to ``string``. For the full list of all supported types, please check the definition of `pytesseract.Output `_ class.\n\n* **timeout** Integer or Float, duration in seconds for the OCR processing, after which, pytesseract will terminate and raise RuntimeError.\n\n\nINSTALLATION\n------------\n\nPrerequisites:\n\n- Python-tesseract requires Python 2.7 or Python 3.5+\n- You will need the Python Imaging Library (PIL) (or the `Pillow `_ fork).\n Under Debian/Ubuntu, this is the package **python-imaging** or **python3-imaging**.\n- Install `Google Tesseract OCR `_\n (additional info how to install the engine on Linux, Mac OSX and Windows).\n You must be able to invoke the tesseract command as *tesseract*. If this\n isn't the case, for example because tesseract isn't in your PATH, you will\n have to change the \"tesseract_cmd\" variable ``pytesseract.pytesseract.tesseract_cmd``.\n Under Debian/Ubuntu you can use the package **tesseract-ocr**.\n For Mac OS users. please install homebrew package **tesseract**.\n\n| Installing via pip:\n\nCheck the `pytesseract package page `_ for more information.\n\n.. code-block:: bash\n\n $ (env)> pip install pytesseract\n\n| Or if you have git installed:\n\n.. code-block:: bash\n\n $ (env)> pip install -U git+https://github.com/madmaze/pytesseract.git\n\n| Installing from source:\n\n.. code-block:: bash\n\n $> git clone https://github.com/madmaze/pytesseract.git\n $ (env)> cd pytesseract && pip install -U .\n\n| Install with conda (via `conda-forge `_):\n\n.. code-block:: bash\n\n $> conda install -c conda-forge pytesseract\n\nTESTING\n-------\n\nTo run this project's test suite, install and run ``tox``. Ensure that you have ``tesseract``\ninstalled and in your PATH.\n\n.. code-block:: bash\n\n $ (env)> pip install tox\n $ (env)> tox\n\nLICENSE\n-------\nPython-tesseract is released under the GPL v3.\n\nCONTRIBUTORS\n------------\n- Originally written by `Samuel Hoffstaetter `_\n- `Juarez Bochi `_\n- `Matthias Lee `_\n- `Lars Kistner `_\n- `Ryan Mitchell `_\n- `Emilio Cecchini `_\n- `John Hagen `_\n- `Darius Morawiec `_\n- `Eddie Bedada `_\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/madmaze/python-tesseract", "keywords": "python-tesseract OCR Python", "license": "GPLv3", "maintainer": "", "maintainer_email": "", "name": "pytesseract", "package_url": "https://pypi.org/project/pytesseract/", "platform": "", "project_url": "https://pypi.org/project/pytesseract/", "project_urls": { "Homepage": "https://github.com/madmaze/python-tesseract" }, "release_url": "https://pypi.org/project/pytesseract/0.3.0/", "requires_dist": null, "requires_python": "", "summary": "Python-tesseract is a python wrapper for Google's Tesseract-OCR", "version": "0.3.0" }, "last_serial": 5718106, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "fb0a1e146f88b74fd9a88786e4cc8ef4", "sha256": "24812aee6df24149b0f048fff2bc886e3beb292f475d622c096824b9bf9e7d02" }, "downloads": -1, "filename": "pytesseract-0.1.tar.gz", "has_sig": false, "md5_digest": "fb0a1e146f88b74fd9a88786e4cc8ef4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3321, "upload_time": "2014-02-06T04:31:37", "url": "https://files.pythonhosted.org/packages/62/e3/84698f88eb404c6c6f8d185830543ba857fd68c4bc5c991c960cd59dde9a/pytesseract-0.1.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "712eef2669f3e19abda39a7639faa5a9", "sha256": "3d8c1484d215573466f95aa4c311e9b0c120183be2d9c842708a6d73889f7f8f" }, "downloads": -1, "filename": "pytesseract-0.1.3-py2.7.egg", "has_sig": false, "md5_digest": "712eef2669f3e19abda39a7639faa5a9", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 151754, "upload_time": "2014-08-04T21:31:39", "url": "https://files.pythonhosted.org/packages/87/42/4332790f20c68464ecb8c9db00cef33eff259a094da94ef4480a79b4fc6e/pytesseract-0.1.3-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "4c56b70e8d0ddabdb1237d39ed3e300b", "sha256": "5bdc9890b9a1fe2302c70a93ce26a9c6efc900c4057f39b65ba14ac863039b8a" }, "downloads": -1, "filename": "pytesseract-0.1.3.tar.gz", "has_sig": false, "md5_digest": "4c56b70e8d0ddabdb1237d39ed3e300b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2825, "upload_time": "2014-08-04T21:31:21", "url": "https://files.pythonhosted.org/packages/87/14/e044c1e0259cd43f28a92c82fc23b15091d72c4c50a9b2411a132fee754f/pytesseract-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "7d7785a5a42870819854a89b24937593", "sha256": "b9e8a081398b921d42cae60cbc938e8bbfae657b707ee02322ad60fba1b2d995" }, "downloads": -1, "filename": "pytesseract-0.1.4.tar.gz", "has_sig": false, "md5_digest": "7d7785a5a42870819854a89b24937593", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 148907, "upload_time": "2014-08-11T16:52:01", "url": "https://files.pythonhosted.org/packages/59/8a/0b1daf9f98950a659b630da42d07cce046741c5842f0a0943c2ede8f7374/pytesseract-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "10f0aed10ed675c7b9d9be7f2be36020", "sha256": "cbb4cde174e1ea809f4b4d1634c4b0c66317ed5fa9f5f77ff130cd008735eacc" }, "downloads": -1, "filename": "pytesseract-0.1.5.tar.gz", "has_sig": false, "md5_digest": "10f0aed10ed675c7b9d9be7f2be36020", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 149034, "upload_time": "2014-08-14T14:08:23", "url": "https://files.pythonhosted.org/packages/80/36/20a3caf4afbb25b5ff5746e5c24cd311eec4e6fe52e143ea957a8b2a425f/pytesseract-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "b7fb44c04d76cfa1c3c9bee1b481f4f3", "sha256": "319c3a47114d32813c305cf61463106fc57baf4e0d84318b938a640d1e8550cb" }, "downloads": -1, "filename": "pytesseract-0.1.6.tar.gz", "has_sig": false, "md5_digest": "b7fb44c04d76cfa1c3c9bee1b481f4f3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 149805, "upload_time": "2015-03-19T18:36:06", "url": "https://files.pythonhosted.org/packages/fc/0c/f4d4124d648484005f9f29ed21c986fe9efcce8c90604a6e9fb0de4dda5e/pytesseract-0.1.6.tar.gz" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "764bb4a0c7e9f85c0ee27beffb65065a", "sha256": "9cf5f088349ec29ac4770658a5c150b48f93cbdbdc7405a51d778ee7e81a1426" }, "downloads": -1, "filename": "pytesseract-0.1.7.tar.gz", "has_sig": false, "md5_digest": "764bb4a0c7e9f85c0ee27beffb65065a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 150596, "upload_time": "2017-05-29T14:59:08", "url": "https://files.pythonhosted.org/packages/29/2f/7b73fa55f3c36160d54ec08b6bdddc1d8385fe0aa1acf8dac3ef1e635516/pytesseract-0.1.7.tar.gz" } ], "0.1.8": [ { "comment_text": "", "digests": { "md5": "f73c3339891875cc73f13312b8729f7a", "sha256": "af2570f582e541843499b1ba9e4eca6baf704e49a5da4c492fc688bf2880e8ee" }, "downloads": -1, "filename": "pytesseract-0.1.8.tar.gz", "has_sig": false, "md5_digest": "f73c3339891875cc73f13312b8729f7a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 152993, "upload_time": "2018-01-21T02:41:19", "url": "https://files.pythonhosted.org/packages/6b/b1/223c46b7023b35054195e5cdba67a53c74b203d02691ee38dc8ceaa565e0/pytesseract-0.1.8.tar.gz" } ], "0.1.9": [ { "comment_text": "", "digests": { "md5": "93cb0324156ef7c320b0753a5eb25a77", "sha256": "df6ac578897471ca460f11dd1a8b49d32c1ffc2d1940dbeb3996fa19de8cb07a" }, "downloads": -1, "filename": "pytesseract-0.1.9.tar.gz", "has_sig": false, "md5_digest": "93cb0324156ef7c320b0753a5eb25a77", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 154879, "upload_time": "2018-01-31T01:16:51", "url": "https://files.pythonhosted.org/packages/36/60/26054cb04e6b22fb64719640cb25a22ae11665bec731670420338141d700/pytesseract-0.1.9.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "7b49c283971b0ec58869f555df3a184a", "sha256": "35035b0a69a789224d1e981d95453d644c95f8bf869372787f2164b9872d597f" }, "downloads": -1, "filename": "pytesseract-0.2.0.tar.gz", "has_sig": false, "md5_digest": "7b49c283971b0ec58869f555df3a184a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 154904, "upload_time": "2018-01-31T01:16:59", "url": "https://files.pythonhosted.org/packages/7c/19/93a003a1f514dd0584eba13afc1e7093c135964766c56998bc3a2efc331f/pytesseract-0.2.0.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "f92ff854713601350c5ca1e858f1e220", "sha256": "c73a0c774f7f4487a573e9793257afdf1c504ac8a2c4c8e6b0e77176ddc5a539" }, "downloads": -1, "filename": "pytesseract-0.2.2.tar.gz", "has_sig": false, "md5_digest": "f92ff854713601350c5ca1e858f1e220", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 168847, "upload_time": "2018-05-31T01:28:52", "url": "https://files.pythonhosted.org/packages/c8/bb/62ac168973155ee3277971594b8739ef9873d07fdb0835ab3da43d43541a/pytesseract-0.2.2.tar.gz" } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": "d58084683ef765699900d07e2a108bc1", "sha256": "9a9fae6331084f588c0cf2ad9ed50fca47e20429407e63389eb42d4e64460013" }, "downloads": -1, "filename": "pytesseract-0.2.4.tar.gz", "has_sig": false, "md5_digest": "d58084683ef765699900d07e2a108bc1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 169103, "upload_time": "2018-07-20T03:39:31", "url": "https://files.pythonhosted.org/packages/13/56/befaafbabb36c03e4fdbb3fea854e0aea294039308a93daf6876bf7a8d6b/pytesseract-0.2.4.tar.gz" } ], "0.2.5": [ { "comment_text": "", "digests": { "md5": "c334052fed66795c06b8528c30f7614b", "sha256": "d54d370d3c1032c7fc097b2e88bb899368a7979550e5fd6f6e0ccdf0d60e9f72" }, "downloads": -1, "filename": "pytesseract-0.2.5.tar.gz", "has_sig": false, "md5_digest": "c334052fed66795c06b8528c30f7614b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 169231, "upload_time": "2018-10-05T01:47:17", "url": "https://files.pythonhosted.org/packages/f9/4d/0cc26dbb2298080ed0f1ca848c06a1b68ab041e809f3583fe8642ee228cc/pytesseract-0.2.5.tar.gz" } ], "0.2.6": [ { "comment_text": "", "digests": { "md5": "64331868bd26bf71026b5fb859c47b36", "sha256": "11c20321595b6e2e904b594633edf1a717212b13bac7512986a2d807b8849770" }, "downloads": -1, "filename": "pytesseract-0.2.6.tar.gz", "has_sig": false, "md5_digest": "64331868bd26bf71026b5fb859c47b36", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 169452, "upload_time": "2018-12-16T02:37:50", "url": "https://files.pythonhosted.org/packages/71/5a/d7600cad26276d991feecb27f3627ae2d0ee89aa1e3065fa4f9f1f2defbc/pytesseract-0.2.6.tar.gz" } ], "0.2.7": [ { "comment_text": "", "digests": { "md5": "a2c45c38a55d98b752b4f7682bc54751", "sha256": "46363b300d6890d24782852e020c06e96344529fead98f3b9b8506c82c37db6f" }, "downloads": -1, "filename": "pytesseract-0.2.7.tar.gz", "has_sig": false, "md5_digest": "a2c45c38a55d98b752b4f7682bc54751", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 169581, "upload_time": "2019-06-19T02:40:12", "url": "https://files.pythonhosted.org/packages/1d/40/3f72d13d0f347bf688ff189b6d6bb369125c0bed9ed4b15e7f20c65123a8/pytesseract-0.2.7.tar.gz" } ], "0.2.8": [ { "comment_text": "", "digests": { "md5": "525ff765bfc145c310f8623a237d1d28", "sha256": "8de670c2b705325ce28b76b0c3d2d322d4af1a36f71b6a6529b941d86e96513c" }, "downloads": -1, "filename": "pytesseract-0.2.8.tar.gz", "has_sig": false, "md5_digest": "525ff765bfc145c310f8623a237d1d28", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20466, "upload_time": "2019-08-16T00:46:49", "url": "https://files.pythonhosted.org/packages/92/8b/c5f06d60b3c6ba284d7c04315b124462db68a49ff0ce89799c4af3078d72/pytesseract-0.2.8.tar.gz" } ], "0.2.9": [ { "comment_text": "", "digests": { "md5": "30c49323af3a8970c3f53f83842ca747", "sha256": "6fc79282411f4aef2981c5da96ebe9efc22dec7548a7d90fd58d7947d41d59d6" }, "downloads": -1, "filename": "pytesseract-0.2.9.tar.gz", "has_sig": false, "md5_digest": "30c49323af3a8970c3f53f83842ca747", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20564, "upload_time": "2019-08-16T00:47:00", "url": "https://files.pythonhosted.org/packages/79/16/27c638611384a1b7a6f8b8bb2c54c3a5a493000bc7914efec77e855938be/pytesseract-0.2.9.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "6c64625d382b346de543b462ae203145", "sha256": "ae1dce01413d1f8eb0614fd65d831e26e649dc1a31699b7275455c57aa563b59" }, "downloads": -1, "filename": "pytesseract-0.3.0.tar.gz", "has_sig": false, "md5_digest": "6c64625d382b346de543b462ae203145", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20790, "upload_time": "2019-08-23T01:04:04", "url": "https://files.pythonhosted.org/packages/47/e5/892d78db0d26372aa376fc1b127e9cd4cc158727a76e0802069115fcbd6e/pytesseract-0.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6c64625d382b346de543b462ae203145", "sha256": "ae1dce01413d1f8eb0614fd65d831e26e649dc1a31699b7275455c57aa563b59" }, "downloads": -1, "filename": "pytesseract-0.3.0.tar.gz", "has_sig": false, "md5_digest": "6c64625d382b346de543b462ae203145", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20790, "upload_time": "2019-08-23T01:04:04", "url": "https://files.pythonhosted.org/packages/47/e5/892d78db0d26372aa376fc1b127e9cd4cc158727a76e0802069115fcbd6e/pytesseract-0.3.0.tar.gz" } ] }