{ "info": { "author": "Fayez Zouheiry", "author_email": "iamfayez@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: POSIX", "Programming Language :: Cython", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Multimedia :: Graphics :: Capture :: Scanners", "Topic :: Multimedia :: Graphics :: Graphics Conversion", "Topic :: Scientific/Engineering :: Image Recognition" ], "description": "=========\ntesserocr\n=========\n\nA simple, |Pillow|_-friendly\nwrapper around the ``tesseract-ocr`` API for Optical Character Recognition\n(OCR).\n\n.. image:: https://travis-ci.org/sirfz/tesserocr.svg?branch=master\n :target: https://travis-ci.org/sirfz/tesserocr\n :alt: TravisCI build status\n\n.. image:: https://img.shields.io/pypi/v/tesserocr.svg?maxAge=2592000\n :target: https://pypi.python.org/pypi/tesserocr\n :alt: Latest version on PyPI\n\n.. image:: https://img.shields.io/pypi/pyversions/tesserocr.svg?maxAge=2592000\n :alt: Supported python versions\n\n**tesserocr** integrates directly with Tesseract's C++ API using Cython,\nwhich allows for simple, Pythonic, easy-to-read source code. It\nenables real concurrent execution when used with Python's ``threading``\nmodule by releasing the GIL while processing an image in tesseract.\n\n**tesserocr** is designed to be |Pillow|_-friendly but can also be used\nwith image files instead.\n\n.. |Pillow| replace:: ``Pillow``\n.. 
_Pillow: http://python-pillow.github.io/\n\nRequirements\n============\n\nRequires libtesseract (>=3.04) and libleptonica (>=1.71).\n\nOn Debian/Ubuntu:\n\n::\n\n $ apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config\n\nYou may need to `manually compile tesseract`_ for a more recent version. Note that you may need\nto update your ``LD_LIBRARY_PATH`` environment variable to point to the right library versions in\ncase you have multiple tesseract/leptonica installations.\n\n|Cython|_ (>=0.23) is required for building and optionally |Pillow|_ to support ``PIL.Image`` objects.\n\n.. _manually compile tesseract: https://github.com/tesseract-ocr/tesseract/wiki/Compiling\n.. |Cython| replace:: ``Cython``\n.. _Cython: http://cython.org/\n\nInstallation\n============\nLinux and BSD/MacOS\n-------------------\n::\n\n $ pip install tesserocr\n\nThe setup script attempts to detect the include/library dirs (via |pkg-config|_ if available) but you\ncan override them with your own parameters, e.g.:\n\n::\n\n $ CPPFLAGS=-I/usr/local/include pip install tesserocr\n\nor\n\n::\n\n $ python setup.py build_ext -I/usr/local/include\n\nTested on Linux and BSD/MacOS\n\n.. |pkg-config| replace:: **pkg-config**\n.. _pkg-config: https://pkgconfig.freedesktop.org/\n\nWindows\n-------\n\nThe proposed downloads consist of stand-alone packages containing all the Windows libraries needed for execution. 
This means that no additional installation of tesseract is required on your system.\n\nConda\n`````\n\nYou can use the channel `simonflueckiger `_ to install from Conda:\n\n::\n\n > conda install -c simonflueckiger tesserocr\n\nor to get **tesserocr** compiled with **tesseract 4.0.0**:\n\n::\n\n > conda install -c simonflueckiger/label/tesseract-4.0.0-master tesserocr\n\npip\n```\n\nDownload the wheel file corresponding to your Windows platform and Python installation from `simonflueckiger/tesserocr-windows_build/releases `_ and install it via:\n\n::\n\n > pip install .whl\n\nUsage\n=====\n\nInitialize and re-use the tesseract API instance to score multiple\nimages:\n\n.. code:: python\n\n    from tesserocr import PyTessBaseAPI\n\n    images = ['sample.jpg', 'sample2.jpg', 'sample3.jpg']\n\n    with PyTessBaseAPI() as api:\n        for img in images:\n            api.SetImageFile(img)\n            print(api.GetUTF8Text())\n            print(api.AllWordConfidences())\n    # api is automatically finalized when used in a with-statement (context manager).\n    # otherwise api.End() should be explicitly called when it's no longer needed.\n\n``PyTessBaseAPI`` exposes several tesseract API methods. Make sure you\nread their docstrings for more info.\n\nBasic example using available helper functions:\n\n.. code:: python\n\n    import tesserocr\n    from PIL import Image\n\n    print(tesserocr.tesseract_version())  # print tesseract-ocr version\n    print(tesserocr.get_languages())  # prints tessdata path and list of available languages\n\n    image = Image.open('sample.jpg')\n    print(tesserocr.image_to_text(image))  # print ocr text from image\n    # or\n    print(tesserocr.file_to_text('sample.jpg'))\n\n``image_to_text`` and ``file_to_text`` can be used with ``threading`` to\nconcurrently process multiple images, which is highly efficient.\n\nAdvanced API Examples\n---------------------\n\nGetComponentImages example:\n```````````````````````````\n\n.. 
code:: python\n\n    from PIL import Image\n    from tesserocr import PyTessBaseAPI, RIL\n\n    image = Image.open('/usr/src/tesseract/testing/phototest.tif')\n    with PyTessBaseAPI() as api:\n        api.SetImage(image)\n        boxes = api.GetComponentImages(RIL.TEXTLINE, True)\n        print('Found {} textline image components.'.format(len(boxes)))\n        for i, (im, box, _, _) in enumerate(boxes):\n            # im is a PIL image object\n            # box is a dict with x, y, w and h keys\n            api.SetRectangle(box['x'], box['y'], box['w'], box['h'])\n            ocrResult = api.GetUTF8Text()\n            conf = api.MeanTextConf()\n            print(u\"Box[{0}]: x={x}, y={y}, w={w}, h={h}, \"\n                  \"confidence: {1}, text: {2}\".format(i, conf, ocrResult, **box))\n\nOrientation and script detection (OSD):\n```````````````````````````````````````\n\n.. code:: python\n\n    from PIL import Image\n    from tesserocr import PyTessBaseAPI, PSM\n\n    with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:\n        image = Image.open(\"/usr/src/tesseract/testing/eurotext.tif\")\n        api.SetImage(image)\n        api.Recognize()\n\n        it = api.AnalyseLayout()\n        orientation, direction, order, deskew_angle = it.Orientation()\n        print(\"Orientation: {:d}\".format(orientation))\n        print(\"WritingDirection: {:d}\".format(direction))\n        print(\"TextlineOrder: {:d}\".format(order))\n        print(\"Deskew angle: {:.4f}\".format(deskew_angle))\n\nor more simply with ``OSD_ONLY`` page segmentation mode:\n\n.. code:: python\n\n    from tesserocr import PyTessBaseAPI, PSM\n\n    with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:\n        api.SetImageFile(\"/usr/src/tesseract/testing/eurotext.tif\")\n\n        os = api.DetectOS()\n        print(\"Orientation: {orientation}\\nOrientation confidence: {oconfidence}\\n\"\n              \"Script: {script}\\nScript confidence: {sconfidence}\".format(**os))\n\nmore human-readable info with tesseract 4+ (demonstrates LSTM engine usage):\n\n.. 
code:: python\n\n    from tesserocr import PyTessBaseAPI, PSM, OEM\n\n    with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:\n        api.SetImageFile(\"/usr/src/tesseract/testing/eurotext.tif\")\n\n        os = api.DetectOrientationScript()\n        print(\"Orientation: {orient_deg}\\nOrientation confidence: {orient_conf}\\n\"\n              \"Script: {script_name}\\nScript confidence: {script_conf}\".format(**os))\n\nIterator over the classifier choices for a single symbol:\n`````````````````````````````````````````````````````````\n\n.. code:: python\n\n    from __future__ import print_function\n\n    from tesserocr import PyTessBaseAPI, RIL, iterate_level\n\n    with PyTessBaseAPI() as api:\n        api.SetImageFile('/usr/src/tesseract/testing/phototest.tif')\n        api.SetVariable(\"save_blob_choices\", \"T\")\n        api.SetRectangle(37, 228, 548, 31)\n        api.Recognize()\n\n        ri = api.GetIterator()\n        level = RIL.SYMBOL\n        for r in iterate_level(ri, level):\n            symbol = r.GetUTF8Text(level)  # r == ri\n            conf = r.Confidence(level)\n            if symbol:\n                print(u'symbol {}, conf: {}'.format(symbol, conf), end='')\n            indent = False\n            ci = r.GetChoiceIterator()\n            for c in ci:\n                if indent:\n                    print('\\t\\t ', end='')\n                print('\\t- ', end='')\n                choice = c.GetUTF8Text()  # c == ci\n                print(u'{} conf: {}'.format(choice, c.Confidence()))\n                indent = True\n            print('---------------------------------------------')", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sirfz/tesserocr", "keywords": "Tesseract,tesseract-ocr,OCR,optical character recognition,PIL,Pillow,Cython", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "ocrd-fork-tesserocr", "package_url": "https://pypi.org/project/ocrd-fork-tesserocr/", "platform": "", "project_url": "https://pypi.org/project/ocrd-fork-tesserocr/", "project_urls": { "Homepage": "https://github.com/sirfz/tesserocr" }, "release_url": 
"https://pypi.org/project/ocrd-fork-tesserocr/3.0.0rc2/", "requires_dist": null, "requires_python": "", "summary": "A simple, Pillow-friendly, Python wrapper around tesseract-ocr API using Cython", "version": "3.0.0rc2" }, "last_serial": 5549862, "releases": { "3.0.0rc1": [ { "comment_text": "", "digests": { "md5": "4b57512287241062aaca9712c0b8118d", "sha256": "e27271929cf7e3da4911bc79f49f1a08168bf2022ae505fcd5a29fb28c1c24e6" }, "downloads": -1, "filename": "ocrd-fork-tesserocr-3.0.0rc1.tar.gz", "has_sig": false, "md5_digest": "4b57512287241062aaca9712c0b8118d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 54954, "upload_time": "2018-06-22T00:01:36", "url": "https://files.pythonhosted.org/packages/ea/52/0d8d0d92344ecc15ab0243adccf74c4828c23261d380e3b368d10e9636d2/ocrd-fork-tesserocr-3.0.0rc1.tar.gz" } ], "3.0.0rc2": [ { "comment_text": "", "digests": { "md5": "8e083be1d73e175695fe4363cee171bc", "sha256": "6eccaf8b8eff897c09f9b4258410ba4c32c04e633d7d2d6f6170646321cc2b7f" }, "downloads": -1, "filename": "ocrd-fork-tesserocr-3.0.0rc2.tar.gz", "has_sig": false, "md5_digest": "8e083be1d73e175695fe4363cee171bc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56409, "upload_time": "2019-07-18T08:57:09", "url": "https://files.pythonhosted.org/packages/5b/dc/155dda28b9d8b61723ea4669ead95b6127e110a52c8125c74042c663c654/ocrd-fork-tesserocr-3.0.0rc2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8e083be1d73e175695fe4363cee171bc", "sha256": "6eccaf8b8eff897c09f9b4258410ba4c32c04e633d7d2d6f6170646321cc2b7f" }, "downloads": -1, "filename": "ocrd-fork-tesserocr-3.0.0rc2.tar.gz", "has_sig": false, "md5_digest": "8e083be1d73e175695fe4363cee171bc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56409, "upload_time": "2019-07-18T08:57:09", "url": 
"https://files.pythonhosted.org/packages/5b/dc/155dda28b9d8b61723ea4669ead95b6127e110a52c8125c74042c663c654/ocrd-fork-tesserocr-3.0.0rc2.tar.gz" } ] }