{ "info": { "author": "breezedeus", "author_email": "breezedeus@163.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation", "Topic :: Software Development :: Libraries" ], "description": "For the Chinese version of this README, see the [Chinese README](./README_cn.md).\n\n\n\n# Update 2019.07.25: cnocr V1.0.0 released\n\n`cnocr` `v1.0.0` has been released, with more efficient prediction. **The new version of the model is not compatible with the previous version.** If you upgrade, please download the latest model files again. See below for details (the procedure is the same as before).\n\n\n\nMain changes:\n\n- **The new CRNN model supports prediction on variable-width images, which makes prediction more efficient.**\n- Support fine-tuning an existing model with domain-specific data.\n- Fix bugs, such as `train accuracy` always being `0`.\n- The `mxnet` dependency is upgraded from `1.3.1` to `1.4.1`.\n\n\n\n# cnocr\n\nA Python package for Chinese OCR that ships with pre-trained models, so it can be used directly after installation.\n\nThe accuracy of the current CRNN model is about `98.8%`.\n\nThe project originated from the internal needs of our company ([\u7231\u56e0\u4e92\u52a8 Ein+](https://einplus.cn)).\nThanks for the internal support.\n\n## Changes\n\nMost of the code is adapted from [crnn-mxnet-chinese-text-recognition](https://github.com/diaomin/crnn-mxnet-chinese-text-recognition).\nMany thanks to the author.\n\nSome changes are:\n\n* use MXNet's native CTC loss instead of WarpCTC loss, so no complicated installation is needed.\n* provide public pre-trained models for anyone. 
No more days of training.\n* add an online `predict` function and script. Easy to use.\n\n## Installation\n\n```bash\npip install cnocr\n```\n\n> Please use Python 3 (3.4, 3.5, and 3.6 should work). Python 2 is not tested.\n\n## Usage\n\nThe first time cnocr is used, the model files will be downloaded automatically from \n[Dropbox](https://www.dropbox.com/s/7w8l3mk4pvkt34w/cnocr-models-v1.0.0.zip?dl=0) to `~/.cnocr`. \n\nThe zip file will be extracted, and by default the resulting model files are placed in `~/.cnocr/models`.\nIf the automatic download fails, you can download the zip file manually \nfrom [Baidu NetDisk](https://pan.baidu.com/s/1DWV3H2UWmzOU6d48UbTYVw) with extraction code `ss81` and put it in `~/.cnocr`. The code will handle the rest.\n\n\n\n### Predict\n\nThree functions are provided for prediction.\n\n\n\n#### 1. `CnOcr.ocr(img_fp)`\n\nThe function `CnOcr.ocr(img_fp)` recognizes the text in an image containing multiple lines (or a single line) of text.\n\n\n\n**Function Description**\n\n- input parameter `img_fp`: an image file path, or a color image of type `mx.nd.NDArray` or `np.ndarray` with shape `(height, width, 3)` and channels in RGB order.\n- return: `List(List(Char))`, e.g. `[['\u7b2c', '\u4e00', '\u884c'], ['\u7b2c', '\u4e8c', '\u884c'], ['\u7b2c', '\u4e09', '\u884c']]`.\n\n\n\n\n**Usage Case**\n\n\n```python\nfrom cnocr import CnOcr\nocr = CnOcr()\n# pass an image file path directly\nres = ocr.ocr('examples/multi-line_cn1.png')\nprint(\"Predicted Chars:\", res)\n```\n\nor:\n\n```python\nimport mxnet as mx\nfrom cnocr import CnOcr\nocr = CnOcr()\nimg_fp = 'examples/multi-line_cn1.png'\n# flag=1: decode as a 3-channel color image (RGB by default)\nimg = mx.image.imread(img_fp, 1)\nres = ocr.ocr(img)\nprint(\"Predicted Chars:\", res)\n```\n\nThe code above recognizes the text in the image file [examples/multi-line_cn1.png](./examples/multi-line_cn1.png):\n\n![examples/multi-line_cn1.png](./examples/multi-line_cn1.png)\n\nThe OCR result should be:\n\n```bash\nPredicted Chars: [['\u7f51', '\u7edc', '\u652f', 
'\u4ed8', '\u5e76', '\u65e0', '\u672c', '\u8d28', '\u7684', '\u533a', '\u522b', '\uff0c', '\u56e0', '\u4e3a'],\n ['\u6bcf', '\u4e00', '\u4e2a', '\u624b', '\u673a', '\u53f7', '\u7801', '\u548c', '\u90ae', '\u4ef6', '\u5730', '\u5740', '\u80cc', '\u540e'],\n ['\u90fd', '\u4f1a', '\u5bf9', '\u5e94', '\u7740', '\u4e00', '\u4e2a', '\u8d26', '\u6237', '\u4e00', '\u2015', '\u8fd9', '\u4e2a', '\u8d26'],\n ['\u6237', '\u53ef', '\u4ee5', '\u662f', '\u4fe1', '\u7528', '\u5361', '\u8d26', '\u6237', '\u3001', '\u501f', '\u8bb0', '\u5361', '\u8d26'],\n ['\u6237', '\uff0c', '\u4e5f', '\u5305', '\u62ec', '\u90ae', '\u5c40', '\u6c47', '\u6b3e', '\u3001', '\u624b', '\u673a', '\u4ee3'],\n ['\u6536', '\u3001', '\u7535', '\u8bdd', '\u4ee3', '\u6536', '\u3001', '\u9884', '\u4ed8', '\u8d39', '\u5361', '\u548c', '\u70b9', '\u5361'],\n ['\u7b49', '\u591a', '\u79cd', '\u5f62', '\u5f0f', '\u3002']]\n```\n\n#### 2. `CnOcr.ocr_for_single_line(img_fp)`\n\nIf you know that the image contains only one line of text, you can use the function `CnOcr.ocr_for_single_line(img_fp)` instead. Compared with `CnOcr.ocr()`, `CnOcr.ocr_for_single_line()` gives more reliable results because no line-splitting step is needed. \n\n\n\n**Function Description**\n\n- input parameter `img_fp`: an image file path, or an image of type `mx.nd.NDArray` or `np.ndarray` with shape `[height, width]` or `[height, width, channel]`. 
The optional channel dimension should be `1` (grayscale image) or `3` (color image).\n- return: `List(Char)`, e.g. `['\u4f60', '\u597d']`.\n\n\n\n**Usage Case**\n\n```python\nfrom cnocr import CnOcr\nocr = CnOcr()\n# pass an image file path directly\nres = ocr.ocr_for_single_line('examples/rand_cn1.png')\nprint(\"Predicted Chars:\", res)\n```\n\nor:\n\n```python\nimport mxnet as mx\nfrom cnocr import CnOcr\nocr = CnOcr()\nimg_fp = 'examples/rand_cn1.png'\n# flag=1: decode as a 3-channel color image (RGB by default)\nimg = mx.image.imread(img_fp, 1)\nres = ocr.ocr_for_single_line(img)\nprint(\"Predicted Chars:\", res)\n```\n\n\nThe code above recognizes the text in the image file [examples/rand_cn1.png](./examples/rand_cn1.png):\n\n![examples/rand_cn1.png](./examples/rand_cn1.png)\n\nThe OCR result should be:\n\n```bash\nPredicted Chars: ['\u7b20', '\u6de1', '\u563f', '\u9a85', '\u8c27', '\u9f0e', '\u81ed', '\u59da', '\u6b7c', '\u8822', '\u9a7c', '\u8033', '\u88d4', '\u631d', '\u6daf', '\u72d7', '\u84bd', '\u5b50', '\u72b7'] \n```\n\n#### 3. `CnOcr.ocr_for_single_lines(img_list)`\n\nThe function `CnOcr.ocr_for_single_lines(img_list)` predicts a batch of single-line text image arrays at once. In fact, `CnOcr.ocr(img_fp)` and `CnOcr.ocr_for_single_line(img_fp)` both call `CnOcr.ocr_for_single_lines(img_list)` internally.\n\n\n\n**Function Description**\n\n- input parameter `img_list`: a list of images, each of which should be a single-line image array of type `mx.nd.NDArray` or `np.ndarray`, with values ranging from `0` to `255` and shape `[height, width]` or `[height, width, channel]`. 
The optional channel dimension should be `1` (grayscale image) or `3` (color image).\n- return: `List(List(Char))`, e.g. `[['\u7b2c', '\u4e00', '\u884c'], ['\u7b2c', '\u4e8c', '\u884c'], ['\u7b2c', '\u4e09', '\u884c']]`.\n\n\n\n**Usage Case**\n\n```python\nimport mxnet as mx\nfrom cnocr import CnOcr\nfrom cnocr.line_split import line_split\nocr = CnOcr()\nimg_fp = 'examples/multi-line_cn1.png'\n# read as a color image and convert the MXNet NDArray to a NumPy array\nimg = mx.image.imread(img_fp, 1).asnumpy()\n# split the multi-line image into single-line images\nline_imgs = line_split(img, blank=True)\n# keep only the line image from each returned tuple\nline_img_list = [line_img for line_img, _ in line_imgs]\nres = ocr.ocr_for_single_lines(line_img_list)\nprint(\"Predicted Chars:\", res)\n```\n\nMore usage examples can be found in [tests/test_cnocr.py](./tests/test_cnocr.py).\n\n\n### Using the Script\n\n```bash\npython scripts/cnocr_predict.py --file examples/multi-line_cn1.png\n```\n\n\n\n### (Not Necessary) Train\n\nYou can use the package without any training. But if you do want to train your own models, run:\n\n```bash\npython scripts/cnocr_train.py --cpu 2 --num_proc 4 --loss ctc --dataset cn_ocr\n```\n\n\n\nFine-tuning an existing model with domain-specific data is also supported. 
Please refer to the following command:\n\n```bash\npython scripts/cnocr_train.py --cpu 2 --num_proc 4 --loss ctc --dataset cn_ocr --load_epoch 20\n```\n\n\n\nMore training examples can be found in [scripts/run_cnocr_train.sh](./scripts/run_cnocr_train.sh).\n\n\n\n## Future Work\n\n* [x] support multi-line text recognition (`Done`)\n* [x] CRNN model supports prediction on variable-width images (`Done`)\n* [x] Add Unit Tests (`Doing`)\n* [x] Bugfixes (`Doing`)\n* [ ] Support space recognition (tried, but not yet successful)\n* [ ] Try other models such as DenseNet and ResNet\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/breezedeus/cnocr", "keywords": "", "license": "Apache 2.0", "maintainer": "", "maintainer_email": "", "name": "cnocr", "package_url": "https://pypi.org/project/cnocr/", "platform": "Mac", "project_url": "https://pypi.org/project/cnocr/", "project_urls": { "Homepage": "https://github.com/breezedeus/cnocr" }, "release_url": "https://pypi.org/project/cnocr/1.0.0/", "requires_dist": [ "numpy (<1.15.0,>=1.14.0)", "pillow (>=5.3.0)", "mxnet (<1.5.0,>=1.4.1)", "gluoncv (<0.4.0,>=0.3.0)" ], "requires_python": "", "summary": "Package for Chinese OCR that can be used right after installation, without training your own OCR model", "version": "1.0.0" }, "last_serial": 5580940, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "17778ce84a31339b349a8bf41195efad", "sha256": "7d5754f9bdbd93e283e6893b9153f5b224fb07787f28ec2b887b51809fc57e41" }, "downloads": -1, "filename": "cnocr-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "17778ce84a31339b349a8bf41195efad", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 32078, "upload_time": "2019-03-27T15:22:15", "url": 
"https://files.pythonhosted.org/packages/f5/f8/4da355ec579d61b756ab1bd355b78cbc7697e1c4f5fc1b9dec8057737325/cnocr-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "48fce165f81dda0461a015c17d69c2ac", "sha256": "d58adb8d340c55a9bce7e54ed07985f3f3449b7880ad3bae4baf0a6d21ced58d" }, "downloads": -1, "filename": "cnocr-0.1.1.tar.gz", "has_sig": false, "md5_digest": "48fce165f81dda0461a015c17d69c2ac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16715, "upload_time": "2019-03-27T15:22:18", "url": "https://files.pythonhosted.org/packages/38/e9/84fc884b33b87ea8d376395db804c33149d9edc0887d78d623b50f6b796e/cnocr-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "5b90bc2afaf1f061e29fc300c466ec3e", "sha256": "dbc50f9c3bf5c594a666bd09ebd99f01d3d2d2dbc48011bc611c53fa13d205e4" }, "downloads": -1, "filename": "cnocr-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5b90bc2afaf1f061e29fc300c466ec3e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 35634, "upload_time": "2019-04-07T07:02:56", "url": "https://files.pythonhosted.org/packages/49/b7/a9d383d87e892721683042bcb18128eae7435bc5e379f7f13c3d1591c64e/cnocr-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "293806c10cb27dce17054b96d80da435", "sha256": "7f736b33a29cf7ccfd72e1c27e49fc68bfd9e2129480c08218f4f815a0b20f14" }, "downloads": -1, "filename": "cnocr-0.2.0.tar.gz", "has_sig": false, "md5_digest": "293806c10cb27dce17054b96d80da435", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19386, "upload_time": "2019-04-07T07:03:47", "url": "https://files.pythonhosted.org/packages/1a/d7/2156d29de187f00ec27551c8d6bd798f2b1c4e001f82a21e0c8fc2b1c489/cnocr-0.2.0.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "457c351be21949edf115cf60726fd87c", "sha256": "a6ba6fb8a94e851f847463e93107ef8ccffb431a26e1c734e71616fd7db5893f" }, "downloads": -1, 
"filename": "cnocr-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "457c351be21949edf115cf60726fd87c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 39255, "upload_time": "2019-07-25T03:09:31", "url": "https://files.pythonhosted.org/packages/8b/2a/86464f97dee48b691abc0c3e3f2c85602462645d4f5f062b0789087b4ea4/cnocr-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "192c2706f1a3808148ff14c3adfad6a0", "sha256": "95eaef5e83b4f49beea7a072c155ce34eddad3bd163372dc29e958bdb6e4b66a" }, "downloads": -1, "filename": "cnocr-1.0.0.tar.gz", "has_sig": false, "md5_digest": "192c2706f1a3808148ff14c3adfad6a0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23678, "upload_time": "2019-07-25T03:09:37", "url": "https://files.pythonhosted.org/packages/90/22/5b396ba294d947e3652ff7140def3660a9e60782c368541d063b3e9a944a/cnocr-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "457c351be21949edf115cf60726fd87c", "sha256": "a6ba6fb8a94e851f847463e93107ef8ccffb431a26e1c734e71616fd7db5893f" }, "downloads": -1, "filename": "cnocr-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "457c351be21949edf115cf60726fd87c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 39255, "upload_time": "2019-07-25T03:09:31", "url": "https://files.pythonhosted.org/packages/8b/2a/86464f97dee48b691abc0c3e3f2c85602462645d4f5f062b0789087b4ea4/cnocr-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "192c2706f1a3808148ff14c3adfad6a0", "sha256": "95eaef5e83b4f49beea7a072c155ce34eddad3bd163372dc29e958bdb6e4b66a" }, "downloads": -1, "filename": "cnocr-1.0.0.tar.gz", "has_sig": false, "md5_digest": "192c2706f1a3808148ff14c3adfad6a0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23678, "upload_time": "2019-07-25T03:09:37", "url": 
"https://files.pythonhosted.org/packages/90/22/5b396ba294d947e3652ff7140def3660a9e60782c368541d063b3e9a944a/cnocr-1.0.0.tar.gz" } ] }