{ "info": { "author": "Fayez Zouheiry", "author_email": "iamfayez@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: POSIX", "Programming Language :: Cython", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Multimedia :: Graphics :: Capture :: Scanners", "Topic :: Multimedia :: Graphics :: Graphics Conversion", "Topic :: Scientific/Engineering :: Image Recognition" ], "description": "=========\ntesserocr\n=========\n\nA simple, |Pillow|_-friendly,\nwrapper around the ``tesseract-ocr`` API for Optical Character Recognition\n(OCR).\n\n.. image:: https://travis-ci.org/sirfz/tesserocr.svg?branch=master\n :target: https://travis-ci.org/sirfz/tesserocr\n :alt: TravisCI build status\n\n.. image:: https://img.shields.io/pypi/v/tesserocr.svg?maxAge=2592000\n :target: https://pypi.python.org/pypi/tesserocr\n :alt: Latest version on PyPi\n\n.. image:: https://img.shields.io/pypi/pyversions/tesserocr.svg?maxAge=2592000\n :alt: Supported python versions\n\n**tesserocr** integrates directly with Tesseract's C++ API using Cython\nwhich allows for a simple Pythonic and easy-to-read source code. It\nenables real concurrent execution when used with Python's ``threading``\nmodule by releasing the GIL while processing an image in tesseract.\n\n**tesserocr** is designed to be |Pillow|_-friendly but can also be used\nwith image files instead.\n\n.. |Pillow| replace:: ``Pillow``\n.. _Pillow: http://python-pillow.github.io/\n\nRequirements\n============\n\nRequires libtesseract (>=3.04) and libleptonica (>=1.71).\n\nOn Debian/Ubuntu:\n\n::\n\n $ apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config\n\nYou may need to `manually compile tesseract`_ for a more recent version. Note that you may need\nto update your ``LD_LIBRARY_PATH`` environment variable to point to the right library versions in\ncase you have multiple tesseract/leptonica installations.\n\n|Cython|_ (>=0.23) is required for building and optionally |Pillow|_ to support ``PIL.Image`` objects.\n\n.. _manually compile tesseract: https://github.com/tesseract-ocr/tesseract/wiki/Compiling\n.. |Cython| replace:: ``Cython``\n.. _Cython: http://cython.org/\n\nInstallation\n============\nLinux and BSD/MacOS\n-------------------\n::\n\n $ pip install tesserocr\n\nThe setup script attempts to detect the include/library dirs (via |pkg-config|_ if available) but you\ncan override them with your own parameters, e.g.:\n\n::\n\n $ CPPFLAGS=-I/usr/local/include pip install tesserocr\n\nor\n\n::\n\n $ python setup.py build_ext -I/usr/local/include\n\nTested on Linux and BSD/MacOS\n\n.. |pkg-config| replace:: **pkg-config**\n.. _pkg-config: https://pkgconfig.freedesktop.org/\n\nWindows\n-------\n\nThe proposed downloads consist of stand-alone packages containing all the Windows libraries needed for execution. This means that no additional installation of tesseract is required on your system.\n\nConda\n`````\n\nYou can use the channel `simonflueckiger `_ to install from Conda:\n\n::\n\n > conda install -c simonflueckiger tesserocr\n\nor to get **tesserocr** compiled with **tesseract 4.0.0**:\n\n::\n\n > conda install -c simonflueckiger/label/tesseract-4.0.0-master tesserocr\n\npip\n```\n\nDownload the wheel file corresponding to your Windows platform and Python installation from `simonflueckiger/tesserocr-windows_build/releases `_ and install them via:\n\n::\n\n > pip install .whl\n\nUsage\n=====\n\nInitialize and re-use the tesseract API instance to score multiple\nimages:\n\n.. code:: python\n\n from tesserocr import PyTessBaseAPI\n\n images = ['sample.jpg', 'sample2.jpg', 'sample3.jpg']\n\n with PyTessBaseAPI() as api:\n for img in images:\n api.SetImageFile(img)\n print(api.GetUTF8Text())\n print(api.AllWordConfidences())\n # api is automatically finalized when used in a with-statement (context manager).\n # otherwise api.End() should be explicitly called when it's no longer needed.\n\n``PyTessBaseAPI`` exposes several tesseract API methods. Make sure you\nread their docstrings for more info.\n\nBasic example using available helper functions:\n\n.. code:: python\n\n import tesserocr\n from PIL import Image\n\n print(tesserocr.tesseract_version()) # print tesseract-ocr version\n print(tesserocr.get_languages()) # prints tessdata path and list of available languages\n\n image = Image.open('sample.jpg')\n print(tesserocr.image_to_text(image)) # print ocr text from image\n # or\n print(tesserocr.file_to_text('sample.jpg'))\n\n``image_to_text`` and ``file_to_text`` can be used with ``threading`` to\nconcurrently process multiple images which is highly efficient.\n\nAdvanced API Examples\n---------------------\n\nGetComponentImages example:\n```````````````````````````\n\n.. code:: python\n\n from PIL import Image\n from tesserocr import PyTessBaseAPI, RIL\n\n image = Image.open('/usr/src/tesseract/testing/phototest.tif')\n with PyTessBaseAPI() as api:\n api.SetImage(image)\n boxes = api.GetComponentImages(RIL.TEXTLINE, True)\n print('Found {} textline image components.'.format(len(boxes)))\n for i, (im, box, _, _) in enumerate(boxes):\n # im is a PIL image object\n # box is a dict with x, y, w and h keys\n api.SetRectangle(box['x'], box['y'], box['w'], box['h'])\n ocrResult = api.GetUTF8Text()\n conf = api.MeanTextConf()\n print(u\"Box[{0}]: x={x}, y={y}, w={w}, h={h}, \"\n \"confidence: {1}, text: {2}\".format(i, conf, ocrResult, **box))\n\nOrientation and script detection (OSD):\n```````````````````````````````````````\n\n.. code:: python\n\n from PIL import Image\n from tesserocr import PyTessBaseAPI, PSM\n\n with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:\n image = Image.open(\"/usr/src/tesseract/testing/eurotext.tif\")\n api.SetImage(image)\n api.Recognize()\n\n it = api.AnalyseLayout()\n orientation, direction, order, deskew_angle = it.Orientation()\n print(\"Orientation: {:d}\".format(orientation))\n print(\"WritingDirection: {:d}\".format(direction))\n print(\"TextlineOrder: {:d}\".format(order))\n print(\"Deskew angle: {:.4f}\".format(deskew_angle))\n\nor more simply with ``OSD_ONLY`` page segmentation mode:\n\n.. code:: python\n\n from tesserocr import PyTessBaseAPI, PSM\n\n with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:\n api.SetImageFile(\"/usr/src/tesseract/testing/eurotext.tif\")\n\n os = api.DetectOS()\n print(\"Orientation: {orientation}\\nOrientation confidence: {oconfidence}\\n\"\n \"Script: {script}\\nScript confidence: {sconfidence}\".format(**os))\n\nmore human-readable info with tesseract 4+ (demonstrates LSTM engine usage):\n\n.. code:: python\n\n from tesserocr import PyTessBaseAPI, PSM, OEM\n\n with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:\n api.SetImageFile(\"/usr/src/tesseract/testing/eurotext.tif\")\n\n os = api.DetectOrientationScript()\n print(\"Orientation: {orient_deg}\\nOrientation confidence: {orient_conf}\\n\"\n \"Script: {script_name}\\nScript confidence: {script_conf}\".format(**os))\n\nIterator over the classifier choices for a single symbol:\n`````````````````````````````````````````````````````````\n\n.. code:: python\n\n from __future__ import print_function\n\n from tesserocr import PyTessBaseAPI, RIL, iterate_level\n\n with PyTessBaseAPI() as api:\n api.SetImageFile('/usr/src/tesseract/testing/phototest.tif')\n api.SetVariable(\"save_blob_choices\", \"T\")\n api.SetRectangle(37, 228, 548, 31)\n api.Recognize()\n\n ri = api.GetIterator()\n level = RIL.SYMBOL\n for r in iterate_level(ri, level):\n symbol = r.GetUTF8Text(level) # r == ri\n conf = r.Confidence(level)\n if symbol:\n print(u'symbol {}, conf: {}'.format(symbol, conf), end='')\n indent = False\n ci = r.GetChoiceIterator()\n for c in ci:\n if indent:\n print('\\t\\t ', end='')\n print('\\t- ', end='')\n choice = c.GetUTF8Text() # c == ci\n print(u'{} conf: {}'.format(choice, c.Confidence()))\n indent = True\n print('---------------------------------------------')", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sirfz/tesserocr", "keywords": "Tesseract,tesseract-ocr,OCR,optical character recognition,PIL,Pillow,Cython", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "tesserocr", "package_url": "https://pypi.org/project/tesserocr/", "platform": "", "project_url": "https://pypi.org/project/tesserocr/", "project_urls": { "Homepage": "https://github.com/sirfz/tesserocr" }, "release_url": "https://pypi.org/project/tesserocr/2.4.1/", "requires_dist": null, "requires_python": "", "summary": "A simple, Pillow-friendly, Python wrapper around tesseract-ocr API using Cython", "version": "2.4.1" }, "last_serial": 5721506, "releases": { "1.2.1rc2": [ { "comment_text": "", "digests": { "md5": "9c1f57d72adf762d6cb4ff1cf4212bff", "sha256": "a11152100583492c697edee5ae239d66d856ff42aa253704455781412305b347" }, "downloads": -1, "filename": "tesserocr-1.2.1rc2.tar.gz", "has_sig": false, "md5_digest": "9c1f57d72adf762d6cb4ff1cf4212bff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57788, "upload_time": "2015-12-29T13:55:44", "url": "https://files.pythonhosted.org/packages/c4/3f/ee98bf1bd27fda8d2d2a4685cb8c27045cbb06f7dc22afdb50f71969f109/tesserocr-1.2.1rc2.tar.gz" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "68840541331670c01a2067f8a854822f", "sha256": "8f0ef4058c51b0ba7de7d14b0357fd00eb970762b56fb5308016bcb5382af1da" }, "downloads": -1, "filename": "tesserocr-2.0.0.tar.gz", "has_sig": false, "md5_digest": "68840541331670c01a2067f8a854822f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 147749, "upload_time": "2016-01-10T23:07:28", "url": "https://files.pythonhosted.org/packages/00/da/312caf79300fa0add6278a8ab8c133a70ac8912a287055a5fa37d13a42c9/tesserocr-2.0.0.tar.gz" } ], "2.1.1": [ { "comment_text": "", "digests": { "md5": "e5fd52582b2f89c11c1a373f80d25fbe", "sha256": "f7edf01c67667d660c2423b085d50e7d84683b68e234b76f4a5e3a06a1a59c26" }, "downloads": -1, "filename": "tesserocr-2.1.1.tar.gz", "has_sig": false, "md5_digest": "e5fd52582b2f89c11c1a373f80d25fbe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47040, "upload_time": "2016-06-03T21:23:32", "url": "https://files.pythonhosted.org/packages/6e/7b/1962b65567c1643f457f1255f862fdb50c99eb33aabd91c2d8e2c223ab16/tesserocr-2.1.1.tar.gz" } ], "2.1.2": [ { "comment_text": "", "digests": { "md5": "e2de9d5e6f32e12a8c42c5038d877f59", "sha256": "93748442c9b3fc499ac5061b7e188b89c37212ad07f348064e4b3d079d80bd75" }, "downloads": -1, "filename": "tesserocr-2.1.2.tar.gz", "has_sig": false, "md5_digest": "e2de9d5e6f32e12a8c42c5038d877f59", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 47277, "upload_time": "2016-06-08T12:01:51", "url": "https://files.pythonhosted.org/packages/48/9c/16ee28e2cd47dd3016eab35f131175baac5fdbf0e50ba7f7b07bac226f8f/tesserocr-2.1.2.tar.gz" } ], "2.1.3": [ { "comment_text": "", "digests": { "md5": "9fb6e8e6d1e1a7a5faa660b12d1b18fa", "sha256": "3a75c702f6e787ea4f16f26a9909910b731bc388b415b3b54d42e766b4f481ae" }, "downloads": -1, "filename": "tesserocr-2.1.3.tar.gz", "has_sig": false, "md5_digest": "9fb6e8e6d1e1a7a5faa660b12d1b18fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 49415, "upload_time": "2016-11-12T14:03:13", "url": "https://files.pythonhosted.org/packages/ee/4b/a1627425460855f5aa83d072094349a2fe4550443faabcfdefccb212236c/tesserocr-2.1.3.tar.gz" } ], "2.2.0": [ { "comment_text": "", "digests": { "md5": "89d381b549ad6f2979f01713004ec960", "sha256": "989fa2541c55370d3779fcb0b3646aa3b43045b4a72799b6ccc44f72517f99ce" }, "downloads": -1, "filename": "tesserocr-2.2.0.tar.gz", "has_sig": false, "md5_digest": "89d381b549ad6f2979f01713004ec960", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 52905, "upload_time": "2017-05-28T15:36:18", "url": "https://files.pythonhosted.org/packages/dc/30/dd254d6e13ed097d4330bf4bec6d1f038af4deeec62488f65fb5b74ad15f/tesserocr-2.2.0.tar.gz" } ], "2.2.1": [ { "comment_text": "", "digests": { "md5": "62e80889bfb616538296dd8196c8207e", "sha256": "1805e9285246fec02e45ff76c7b0c13c4de388fa0d900d364c99e3928edf061c" }, "downloads": -1, "filename": "tesserocr-2.2.1.tar.gz", "has_sig": false, "md5_digest": "62e80889bfb616538296dd8196c8207e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 52923, "upload_time": "2017-05-31T19:48:43", "url": "https://files.pythonhosted.org/packages/c9/bc/f10f88a2d8a421ff565624a2759de16078980b9384da59198ebf4b0444ac/tesserocr-2.2.1.tar.gz" } ], "2.2.2": [ { "comment_text": "", "digests": { "md5": "e6c9c8f6f6720e16cd612146e20e7feb", "sha256": "f245c85643acaa6c740885c2cb18c6ca2ddb569756fbdeb10bd85a08c0a9fff2" }, "downloads": -1, "filename": "tesserocr-2.2.2.tar.gz", "has_sig": false, "md5_digest": "e6c9c8f6f6720e16cd612146e20e7feb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 53567, "upload_time": "2017-07-26T19:02:14", "url": "https://files.pythonhosted.org/packages/cf/0d/9e554f041962b8dd7acd978330535fed879452bb0af257c287ca4ae9c525/tesserocr-2.2.2.tar.gz" } ], "2.3.0": [ { "comment_text": "", "digests": { "md5": "38ae68684969b164509f84fb1bd0dbdb", "sha256": "de1c57808f4a3ca59540750114537246685f447a11b0be1abb15b51680189719" }, "downloads": -1, "filename": "tesserocr-2.3.0.tar.gz", "has_sig": false, "md5_digest": "38ae68684969b164509f84fb1bd0dbdb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 54946, "upload_time": "2018-06-26T15:53:10", "url": "https://files.pythonhosted.org/packages/49/fc/f5d3ca8606566073919a5964118591734e91ee8e42adcc0cc8fcefc3953a/tesserocr-2.3.0.tar.gz" } ], "2.3.1": [ { "comment_text": "", "digests": { "md5": "99e2001affe861ae3a5aa2e9f233e2d7", "sha256": "72ac9b6bd2c6b76c9c512494824d3d741bc314abbf2ae9b91d128ba81fdffd11" }, "downloads": -1, "filename": "tesserocr-2.3.1.tar.gz", "has_sig": false, "md5_digest": "99e2001affe861ae3a5aa2e9f233e2d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 54996, "upload_time": "2018-08-13T17:35:57", "url": "https://files.pythonhosted.org/packages/f8/6d/4e81e041f33a4419e59edcb1dbdf3c56e9393f60f5ef531381bd67a1339b/tesserocr-2.3.1.tar.gz" } ], "2.4.0": [ { "comment_text": "", "digests": { "md5": "399d92e6be925c40a2ec7072cbb43851", "sha256": "ba60466fc91835656d4b4f93deb4ba7543d1588722ec4e0b84cc694500373b85" }, "downloads": -1, "filename": "tesserocr-2.4.0.tar.gz", "has_sig": false, "md5_digest": "399d92e6be925c40a2ec7072cbb43851", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56278, "upload_time": "2018-12-05T14:43:10", "url": "https://files.pythonhosted.org/packages/92/2d/05a7f8387e93c192919b508e4f4936f232bd3d2ca388b9130ae538a9f9ad/tesserocr-2.4.0.tar.gz" } ], "2.4.1": [ { "comment_text": "", "digests": { "md5": "4bcbe85411c0f6de5f3f178a27592118", "sha256": "8becfe0542a1a118fe92f9b2c1fc0ded9c90ce12a8374b3de33a9e20179fa546" }, "downloads": -1, "filename": "tesserocr-2.4.1.tar.gz", "has_sig": false, "md5_digest": "4bcbe85411c0f6de5f3f178a27592118", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56844, "upload_time": "2019-08-23T16:06:35", "url": "https://files.pythonhosted.org/packages/97/14/2297039bd74baf2ce970549d4fec383fb6e435d0a48f79813f195834937f/tesserocr-2.4.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4bcbe85411c0f6de5f3f178a27592118", "sha256": "8becfe0542a1a118fe92f9b2c1fc0ded9c90ce12a8374b3de33a9e20179fa546" }, "downloads": -1, "filename": "tesserocr-2.4.1.tar.gz", "has_sig": false, "md5_digest": "4bcbe85411c0f6de5f3f178a27592118", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56844, "upload_time": "2019-08-23T16:06:35", "url": "https://files.pythonhosted.org/packages/97/14/2297039bd74baf2ce970549d4fec383fb6e435d0a48f79813f195834937f/tesserocr-2.4.1.tar.gz" } ] }