{
    "info": {
        "author": "Fayez Zouheiry",
        "author_email": "iamfayez@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 5 - Production/Stable",
            "Intended Audience :: Developers",
            "License :: OSI Approved :: MIT License",
            "Operating System :: POSIX",
            "Programming Language :: Cython",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5",
            "Programming Language :: Python :: 3.6",
            "Programming Language :: Python :: 3.7",
            "Programming Language :: Python :: Implementation :: CPython",
            "Programming Language :: Python :: Implementation :: PyPy",
            "Topic :: Multimedia :: Graphics :: Capture :: Scanners",
            "Topic :: Multimedia :: Graphics :: Graphics Conversion",
            "Topic :: Scientific/Engineering :: Image Recognition"
        ],
        "description": "=========\ntesserocr\n=========\n\nA simple, |Pillow|_-friendly,\nwrapper around the ``tesseract-ocr`` API for Optical Character Recognition\n(OCR).\n\n.. image:: https://travis-ci.org/sirfz/tesserocr.svg?branch=master\n    :target: https://travis-ci.org/sirfz/tesserocr\n    :alt: TravisCI build status\n\n.. image:: https://img.shields.io/pypi/v/tesserocr.svg?maxAge=2592000\n    :target: https://pypi.python.org/pypi/tesserocr\n    :alt: Latest version on PyPi\n\n.. image:: https://img.shields.io/pypi/pyversions/tesserocr.svg?maxAge=2592000\n    :alt: Supported python versions\n\n**tesserocr** integrates directly with Tesseract's C++ API using Cython\nwhich allows for a simple Pythonic and easy-to-read source code. It\nenables real concurrent execution when used with Python's ``threading``\nmodule by releasing the GIL while processing an image in tesseract.\n\n**tesserocr** is designed to be |Pillow|_-friendly but can also be used\nwith image files instead.\n\n.. |Pillow| replace:: ``Pillow``\n.. _Pillow: http://python-pillow.github.io/\n\nRequirements\n============\n\nRequires libtesseract (>=3.04) and libleptonica (>=1.71).\n\nOn Debian/Ubuntu:\n\n::\n\n    $ apt-get install tesseract-ocr libtesseract-dev libleptonica-dev pkg-config\n\nYou may need to `manually compile tesseract`_ for a more recent version. Note that you may need\nto update your ``LD_LIBRARY_PATH`` environment variable to point to the right library versions in\ncase you have multiple tesseract/leptonica installations.\n\n|Cython|_ (>=0.23) is required for building and optionally |Pillow|_ to support ``PIL.Image`` objects.\n\n.. _manually compile tesseract: https://github.com/tesseract-ocr/tesseract/wiki/Compiling\n.. |Cython| replace:: ``Cython``\n.. _Cython: http://cython.org/\n\nInstallation\n============\nLinux and BSD/MacOS\n-------------------\n::\n\n    $ pip install tesserocr\n\nThe setup script attempts to detect the include/library dirs (via |pkg-config|_ if available) but you\ncan override them with your own parameters, e.g.:\n\n::\n\n    $ CPPFLAGS=-I/usr/local/include pip install tesserocr\n\nor\n\n::\n\n    $ python setup.py build_ext -I/usr/local/include\n\nTested on Linux and BSD/MacOS\n\n.. |pkg-config| replace:: **pkg-config**\n.. _pkg-config: https://pkgconfig.freedesktop.org/\n\nWindows\n-------\n\nThe proposed downloads consist of stand-alone packages containing all the Windows libraries needed for execution. This means that no additional installation of tesseract is required on your system.\n\nConda\n`````\n\nYou can use the channel `simonflueckiger <https://anaconda.org/simonflueckiger/tesserocr>`_ to install from Conda:\n\n::\n\n    > conda install -c simonflueckiger tesserocr\n\nor to get **tesserocr** compiled with **tesseract 4.0.0**:\n\n::\n\n    > conda install -c simonflueckiger/label/tesseract-4.0.0-master tesserocr\n\npip\n```\n\nDownload the wheel file corresponding to your Windows platform and Python installation from `simonflueckiger/tesserocr-windows_build/releases <https://github.com/simonflueckiger/tesserocr-windows_build/releases>`_ and install them via:\n\n::\n\n    > pip install <package_name>.whl\n\nUsage\n=====\n\nInitialize and re-use the tesseract API instance to score multiple\nimages:\n\n.. code:: python\n\n    from tesserocr import PyTessBaseAPI\n\n    images = ['sample.jpg', 'sample2.jpg', 'sample3.jpg']\n\n    with PyTessBaseAPI() as api:\n        for img in images:\n            api.SetImageFile(img)\n            print(api.GetUTF8Text())\n            print(api.AllWordConfidences())\n    # api is automatically finalized when used in a with-statement (context manager).\n    # otherwise api.End() should be explicitly called when it's no longer needed.\n\n``PyTessBaseAPI`` exposes several tesseract API methods. Make sure you\nread their docstrings for more info.\n\nBasic example using available helper functions:\n\n.. code:: python\n\n    import tesserocr\n    from PIL import Image\n\n    print(tesserocr.tesseract_version())  # print tesseract-ocr version\n    print(tesserocr.get_languages())  # prints tessdata path and list of available languages\n\n    image = Image.open('sample.jpg')\n    print(tesserocr.image_to_text(image))  # print ocr text from image\n    # or\n    print(tesserocr.file_to_text('sample.jpg'))\n\n``image_to_text`` and ``file_to_text`` can be used with ``threading`` to\nconcurrently process multiple images which is highly efficient.\n\nAdvanced API Examples\n---------------------\n\nGetComponentImages example:\n```````````````````````````\n\n.. code:: python\n\n    from PIL import Image\n    from tesserocr import PyTessBaseAPI, RIL\n\n    image = Image.open('/usr/src/tesseract/testing/phototest.tif')\n    with PyTessBaseAPI() as api:\n        api.SetImage(image)\n        boxes = api.GetComponentImages(RIL.TEXTLINE, True)\n        print('Found {} textline image components.'.format(len(boxes)))\n        for i, (im, box, _, _) in enumerate(boxes):\n            # im is a PIL image object\n            # box is a dict with x, y, w and h keys\n            api.SetRectangle(box['x'], box['y'], box['w'], box['h'])\n            ocrResult = api.GetUTF8Text()\n            conf = api.MeanTextConf()\n            print(u\"Box[{0}]: x={x}, y={y}, w={w}, h={h}, \"\n                  \"confidence: {1}, text: {2}\".format(i, conf, ocrResult, **box))\n\nOrientation and script detection (OSD):\n```````````````````````````````````````\n\n.. code:: python\n\n    from PIL import Image\n    from tesserocr import PyTessBaseAPI, PSM\n\n    with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:\n        image = Image.open(\"/usr/src/tesseract/testing/eurotext.tif\")\n        api.SetImage(image)\n        api.Recognize()\n\n        it = api.AnalyseLayout()\n        orientation, direction, order, deskew_angle = it.Orientation()\n        print(\"Orientation: {:d}\".format(orientation))\n        print(\"WritingDirection: {:d}\".format(direction))\n        print(\"TextlineOrder: {:d}\".format(order))\n        print(\"Deskew angle: {:.4f}\".format(deskew_angle))\n\nor more simply with ``OSD_ONLY`` page segmentation mode:\n\n.. code:: python\n\n    from tesserocr import PyTessBaseAPI, PSM\n\n    with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:\n        api.SetImageFile(\"/usr/src/tesseract/testing/eurotext.tif\")\n\n        os = api.DetectOS()\n        print(\"Orientation: {orientation}\\nOrientation confidence: {oconfidence}\\n\"\n              \"Script: {script}\\nScript confidence: {sconfidence}\".format(**os))\n\nmore human-readable info with tesseract 4+ (demonstrates LSTM engine usage):\n\n.. code:: python\n\n    from tesserocr import PyTessBaseAPI, PSM, OEM\n\n    with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:\n        api.SetImageFile(\"/usr/src/tesseract/testing/eurotext.tif\")\n\n        os = api.DetectOrientationScript()\n        print(\"Orientation: {orient_deg}\\nOrientation confidence: {orient_conf}\\n\"\n              \"Script: {script_name}\\nScript confidence: {script_conf}\".format(**os))\n\nIterator over the classifier choices for a single symbol:\n`````````````````````````````````````````````````````````\n\n.. code:: python\n\n    from __future__ import print_function\n\n    from tesserocr import PyTessBaseAPI, RIL, iterate_level\n\n    with PyTessBaseAPI() as api:\n        api.SetImageFile('/usr/src/tesseract/testing/phototest.tif')\n        api.SetVariable(\"save_blob_choices\", \"T\")\n        api.SetRectangle(37, 228, 548, 31)\n        api.Recognize()\n\n        ri = api.GetIterator()\n        level = RIL.SYMBOL\n        for r in iterate_level(ri, level):\n            symbol = r.GetUTF8Text(level)  # r == ri\n            conf = r.Confidence(level)\n            if symbol:\n                print(u'symbol {}, conf: {}'.format(symbol, conf), end='')\n            indent = False\n            ci = r.GetChoiceIterator()\n            for c in ci:\n                if indent:\n                    print('\\t\\t ', end='')\n                print('\\t- ', end='')\n                choice = c.GetUTF8Text()  # c == ci\n                print(u'{} conf: {}'.format(choice, c.Confidence()))\n                indent = True\n            print('---------------------------------------------')",
        "description_content_type": "text/x-rst",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/sirfz/tesserocr",
        "keywords": "Tesseract,tesseract-ocr,OCR,optical character recognition,PIL,Pillow,Cython",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "ocrd-fork-tesserocr",
        "package_url": "https://pypi.org/project/ocrd-fork-tesserocr/",
        "platform": "",
        "project_url": "https://pypi.org/project/ocrd-fork-tesserocr/",
        "project_urls": {
            "Homepage": "https://github.com/sirfz/tesserocr"
        },
        "release_url": "https://pypi.org/project/ocrd-fork-tesserocr/3.0.0rc2/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "A simple, Pillow-friendly, Python wrapper around tesseract-ocr API using Cython",
        "version": "3.0.0rc2"
    },
    "last_serial": 5549862,
    "releases": {
        "3.0.0rc1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "4b57512287241062aaca9712c0b8118d",
                    "sha256": "e27271929cf7e3da4911bc79f49f1a08168bf2022ae505fcd5a29fb28c1c24e6"
                },
                "downloads": -1,
                "filename": "ocrd-fork-tesserocr-3.0.0rc1.tar.gz",
                "has_sig": false,
                "md5_digest": "4b57512287241062aaca9712c0b8118d",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 54954,
                "upload_time": "2018-06-22T00:01:36",
                "url": "https://files.pythonhosted.org/packages/ea/52/0d8d0d92344ecc15ab0243adccf74c4828c23261d380e3b368d10e9636d2/ocrd-fork-tesserocr-3.0.0rc1.tar.gz"
            }
        ],
        "3.0.0rc2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "8e083be1d73e175695fe4363cee171bc",
                    "sha256": "6eccaf8b8eff897c09f9b4258410ba4c32c04e633d7d2d6f6170646321cc2b7f"
                },
                "downloads": -1,
                "filename": "ocrd-fork-tesserocr-3.0.0rc2.tar.gz",
                "has_sig": false,
                "md5_digest": "8e083be1d73e175695fe4363cee171bc",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 56409,
                "upload_time": "2019-07-18T08:57:09",
                "url": "https://files.pythonhosted.org/packages/5b/dc/155dda28b9d8b61723ea4669ead95b6127e110a52c8125c74042c663c654/ocrd-fork-tesserocr-3.0.0rc2.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "8e083be1d73e175695fe4363cee171bc",
                "sha256": "6eccaf8b8eff897c09f9b4258410ba4c32c04e633d7d2d6f6170646321cc2b7f"
            },
            "downloads": -1,
            "filename": "ocrd-fork-tesserocr-3.0.0rc2.tar.gz",
            "has_sig": false,
            "md5_digest": "8e083be1d73e175695fe4363cee171bc",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 56409,
            "upload_time": "2019-07-18T08:57:09",
            "url": "https://files.pythonhosted.org/packages/5b/dc/155dda28b9d8b61723ea4669ead95b6127e110a52c8125c74042c663c654/ocrd-fork-tesserocr-3.0.0rc2.tar.gz"
        }
    ]
}