{ "info": { "author": "Leonard Chan", "author_email": "lchan1994@yahoo.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha" ], "description": "# OCR Project\nThe purpose of this repo is to construct an ocr for detecting keywords from\npeople's handwriting to make their notes searchable.\nWhole sentences do not need to be fully extracted, as long as a few unique\nkeywords are extracted such that the extracted set of words are seo-able.\n\n\n## Environment\nThis package was developed in the following environment.\n- Python 2.7.9\n- Mac OSX 10.11.4, Ubuntu 14.04\n- pip 8.1.1\n\n\n## Dependencies\nThis package is dependant on the packages provided in requirements.txt.\n- numpy\n- scipy\n- matplotlib\n- cv2\n\nOptional development packages to install are provided in dev-requirements.txt.\nPrecompiling the pure code with cython can allow for an extra boost in\nperformance.\n\n### OpenCV\nThis package is heavily dependant on opencv, so be sure to install it first\nbefore installing the cv2 package in requirements.txt.\n\n- [Installing on Ubuntu/Debian](http://milq.github.io/install-opencv-ubuntu-debian/)\n- [Installing on Mac OS](http://www.pyimagesearch.com/2015/06/15/install-opencv-3-0-and-python-2-7-on-osx/)\n\n### Scikit Learn\nIf using the multilayer perceptron, the latest dev version of scikit learn\n(0.18.dev0) is required since it already contains an implementation of a\nmultilayer perceptron that this package uses.\n\nAt the time of writing this, version 0.17.0 is provided on pypi, so this dev\nversion cannot be installed via pip and will have to be installed by cloning\nthe [repo](https://github.com/scikit-learn/scikit-learn) and just following\nthe instructions in the repo for installing it.\n\n### Troubleshooting\nIf you have trouble installing numpy, scipy, or cv2, try installing them\nindividually instead of together in the requirements.txt.\n\nIf you have trouble installing them in a virtualenv, try installing them on\nyour global pip instead.\n\n\n## Installing\nAfter cloning the repo, just run the following:\n\n```sh\n$ python setup.py develop # Create executables\n```\n\nTo make the ocr packageglobally availale on your computer via pip:\n```sh\n$ python setup.py bdist_wheel # Create binary distribution\n$ wheel install dist/ocr-{version}-py{2/3}-none-any.whl # Install created wheel\n$ wheel install-scripts ocr # Enable package scripts\n````\nThe first command with `develop` essentially does the same as the following\nthree commands, but these commands allow for the package to be available\nglobally through your computer in pip space like any other package installed\nvia pip.\n\n\n## Usage\nThe package can be used to create classifiers and use these classifiers to\nclassify characters extracted from an image.\n\nBefore extracting text from an image, you will need to create a classifier to classify\neach character extracted from the image.\n\n### K Nearest Neighbors\nCreate a knn classifier using handwritten training data in the given dir and save it\nin a file called classifier.p.\n```sh\n$ knn-create -t handwritten -d data/training/handwritten -s classifier.p\n53.2258064516 % # Success rate on testing data\n```\n\nExtract text from the given image using the classifier we just made. Resize the image\nto reduce computation time.\n```sh\n$ knn-extract data/test/sample3.jpg -p classifier.p --resize 0.25\n6hhVL tS L mUYh kYttrYYr 6b h qYhcY # Yeah, this isn't very good output\n```\n\n### Multilayer Perceptron\nCreate an mlp classifier using shrinked, handwritten training data in the given\ndir and save it in a file called classifier.p.\n```sh\n$ mlp-create -t shrinked -d data/shrinked20x20/ -s classifier.p\n63.3431085044 % # Success rate on testing data\n```\n\nExtract text from the given image using the classifier we just made. Resize the image\nto reduce computation time.\n```sh\n$ mlp-extract data/test/sample3.jpg -p classifier.p --resize 0.25\niYYij jY N CjYj iYjV1Y1L 1j Y YYLG1 # Yeah, this isn't very good output\n```\n\n\n## Development\nTo develop/test other classifiers, you would just need to create a classifier\nwrapper from `ocr.classifiers.base.Classifier` and create an extraction\nscript that uses this classifier to get text from an image.\n\nExample usages of this are in the KNN and MLP Classifiers in ocr/classifiers/\nand examples of implementing the extraction script are in scripts/.\n\n\n## Results\nThe package functionally works in that it can create and train classifiers, and use\nthese classifiers to extract text from images, but this package does not work in that\nthe correct text is extracted. The following are ways to improve performance:\n- Better sample data\n- Better features\n - Only the individual grayscale pixel values are used as featues. Alternative\n features that should be experimented with include:\n - Number of edges on each character\n - Number of circles on each character\n - Projecting each character onto the x and y axis and using the distributions\n along each axis as featues.\n- Smarter character region finding in an image.\n\nFor now, this package works best as a testbed for experimenting with different\nclassifiers and features.\n\n\n## TODO\n- Update unit tests.\n- Make presentation.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/PiJoules/ocr-project", "keywords": "ocr,knn,mlp", "license": "Unlicense", "maintainer": null, "maintainer_email": null, "name": "ocr-testbed", "package_url": "https://pypi.org/project/ocr-testbed/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/ocr-testbed/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/PiJoules/ocr-project" }, "release_url": "https://pypi.org/project/ocr-testbed/0.0.1/", "requires_dist": null, "requires_python": null, "summary": "OCR development and testing.", "version": "0.0.1" }, "last_serial": 2141375, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "423e60953ac3411eff9f6738a69c23c8", "sha256": "e3bfdb8ae22b058357dc2278148f7364b2fc90e688cee6829cefd549a8cb20f7" }, "downloads": -1, "filename": "ocr_testbed-0.0.1-py2-none-any.whl", "has_sig": false, "md5_digest": "423e60953ac3411eff9f6738a69c23c8", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 23923, "upload_time": "2016-05-30T20:51:02", "url": "https://files.pythonhosted.org/packages/3e/96/0b7aed2e6cb2dece3391b3b9db0254f3142c9fe62be8f22b392d51ec7125/ocr_testbed-0.0.1-py2-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "423e60953ac3411eff9f6738a69c23c8", "sha256": "e3bfdb8ae22b058357dc2278148f7364b2fc90e688cee6829cefd549a8cb20f7" }, "downloads": -1, "filename": "ocr_testbed-0.0.1-py2-none-any.whl", "has_sig": false, "md5_digest": "423e60953ac3411eff9f6738a69c23c8", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 23923, "upload_time": "2016-05-30T20:51:02", "url": "https://files.pythonhosted.org/packages/3e/96/0b7aed2e6cb2dece3391b3b9db0254f3142c9fe62be8f22b392d51ec7125/ocr_testbed-0.0.1-py2-none-any.whl" } ] }