{ "info": { "author": "Silas Bischoff", "author_email": "silas.bischoff@googlemail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Financial and Insurance Industry", "Intended Audience :: Legal Industry", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Image Recognition", "Topic :: Software Development :: Libraries", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "# basic-document-classifier\nA simple CNN for n-class classification of document images.\n\nIt doesn't take colour into account (it transforms to grayscale).\nFor small numbers of classes (2 to 4) this model can achieve > 90% accuracy with as little as 10 to 30 training images per class.\nTraining data can be provided in [any image format supported by *PIL*](https://pillow.readthedocs.io/en/5.1.x/handbook/image-file-formats.html).\n\n## Installation\n\n```pip install document-classifier```\nor\n```poetry add document-classifier```\n\n## Usage\n\n```python\nfrom document_classifier import CNN\n\n# Create a classification model for 3 document classes.\nclassifier = CNN(class_number=3)\n\n# Train the model based on images stored on the file system.\ntraining_metrics = classifier.train(\n batch_size=8,\n epochs=40,\n train_data_path=\"./train_data\",\n test_data_path=\"./test_data\"\n)\n# \"./train_data\" and \"./test_data\" have to contain a subfolder for\n# each document class, e.g. \"./train_data/letter\" or \"./train_data/report\".\n\n# View training metrics like the validation accuracy on the test data.\nprint(training_metrics.history[\"val_acc\"])\n\n# Save the trained model to the file system.\nclassifier.save(model_path=\"./my_model\")\n\n# Load the model from the file system.\nclassifier = CNN.load(model_path=\"./my_model\")\n\n# Predict the class of some document image stored in the file system.\nprediction = classifier.predict(image=\"./my_image.jpg\")\n# The image parameter also taks binary image data as a bytes object.\n```\n\nThe prediction result is a 2-tuple containing the document class label as a string and the confidence score as a float.\n\n## Changes\n\n### 0.1.2\n- Give every CNN instance its own isolated tensorflow graph and session\n\n### 0.1.1\n- Fix a bug that occured when using multiple model instances at the same time\n\n## TODO\n\nThe model architecture is fixed for now and geared towards smaller numbers of classes and training images.\nI'm working on automatic scaling for the CNN.\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sbischoff-ai/basic-document-classifier", "keywords": "document,image,classification,deep,learning", "license": "MIT", "maintainer": "Silas Bischoff", "maintainer_email": "silas.bischoff@googlemail.com", "name": "document-classifier", "package_url": "https://pypi.org/project/document-classifier/", "platform": "", "project_url": "https://pypi.org/project/document-classifier/", "project_urls": { "Homepage": "https://github.com/sbischoff-ai/basic-document-classifier", "Repository": "https://github.com/sbischoff-ai/basic-document-classifier" }, "release_url": "https://pypi.org/project/document-classifier/0.1.2/", "requires_dist": [ "keras (>=2.2,<3.0)", "pillow (>=6.1,<7.0)", "tensorflow-gpu (>=1.14,<2.0)", "numpy (>=1.17,<2.0)" ], "requires_python": ">=3.6,<4.0", "summary": "A simple CNN for n-class classification of document images.", "version": "0.1.2" }, "last_serial": 5681191, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "b561d30df46f6271bbc748612505c339", "sha256": "8408368ac41448853a84c450f0c8b297700ae8ddfa422edd3e5b58d040f66557" }, "downloads": -1, "filename": "document_classifier-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "b561d30df46f6271bbc748612505c339", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4.0", "size": 5212, "upload_time": "2019-08-09T10:02:17", "url": "https://files.pythonhosted.org/packages/18/16/f192f02508bf6afd324d5c8ed7dbacb521d804611a6677219be843752d49/document_classifier-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fdf8ba6ca0dd8ea6bcfef57cf702bb60", "sha256": "7293870651c9e02a1dc6dc2f4636c9125629765aa92d5d692e1275be44abf50b" }, "downloads": -1, "filename": "document-classifier-0.1.0.tar.gz", "has_sig": false, "md5_digest": "fdf8ba6ca0dd8ea6bcfef57cf702bb60", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 5058, "upload_time": "2019-08-09T10:02:15", "url": "https://files.pythonhosted.org/packages/35/01/3fa3cee3c460f90c0f08b561118a0c3c8148763e859fea71230c31e4f81c/document-classifier-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "eb8d3bdbd14e4bc69d9f89772bfc87ca", "sha256": "ece629f44883bc26878a277dbf2b7cb953a21b36482c71894402f4c35400d049" }, "downloads": -1, "filename": "document_classifier-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "eb8d3bdbd14e4bc69d9f89772bfc87ca", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4.0", "size": 5416, "upload_time": "2019-08-14T14:45:55", "url": "https://files.pythonhosted.org/packages/47/ef/68df59c4d849c2c749a259ce1cbf41e0793e8f1ac6a742dcec57d4fe213f/document_classifier-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e6caaf04e20ccd00ea0acaa40e35295d", "sha256": "7c2889dba345c80de72e116d6799cfa875b5325a446654153183a91bf7250542" }, "downloads": -1, "filename": "document-classifier-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e6caaf04e20ccd00ea0acaa40e35295d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 5284, "upload_time": "2019-08-14T14:45:53", "url": "https://files.pythonhosted.org/packages/67/4f/3b5ffedbbb68f391344c756b156be83cc9adab9b9a85ddb5a1be4bc4875c/document-classifier-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "018d188321db3e05edda157ccdda97d4", "sha256": "7c08159b70d52a697dfdf711d07dc1038d468662db42f5e4d8b204695fc4385d" }, "downloads": -1, "filename": "document_classifier-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "018d188321db3e05edda157ccdda97d4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4.0", "size": 5490, "upload_time": "2019-08-15T08:50:10", "url": "https://files.pythonhosted.org/packages/01/02/987b9f5d70442023b41e7b931346baf024aeb904dfca4c21b1f88b8ca33f/document_classifier-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b1bb719103d0b67c48411f5a11c06251", "sha256": "8ce1d133e0b5c08786d89d2faeace221f8e05c7daa7d60e7bf644695d8493fae" }, "downloads": -1, "filename": "document-classifier-0.1.2.tar.gz", "has_sig": false, "md5_digest": "b1bb719103d0b67c48411f5a11c06251", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 5372, "upload_time": "2019-08-15T08:50:08", "url": "https://files.pythonhosted.org/packages/d5/68/df9f7529410061fe15eb6e7150b6ab7c955366c7bbfcd070c9c06b5271c1/document-classifier-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "018d188321db3e05edda157ccdda97d4", "sha256": "7c08159b70d52a697dfdf711d07dc1038d468662db42f5e4d8b204695fc4385d" }, "downloads": -1, "filename": "document_classifier-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "018d188321db3e05edda157ccdda97d4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4.0", "size": 5490, "upload_time": "2019-08-15T08:50:10", "url": "https://files.pythonhosted.org/packages/01/02/987b9f5d70442023b41e7b931346baf024aeb904dfca4c21b1f88b8ca33f/document_classifier-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b1bb719103d0b67c48411f5a11c06251", "sha256": "8ce1d133e0b5c08786d89d2faeace221f8e05c7daa7d60e7bf644695d8493fae" }, "downloads": -1, "filename": "document-classifier-0.1.2.tar.gz", "has_sig": false, "md5_digest": "b1bb719103d0b67c48411f5a11c06251", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 5372, "upload_time": "2019-08-15T08:50:08", "url": "https://files.pythonhosted.org/packages/d5/68/df9f7529410061fe15eb6e7150b6ab7c955366c7bbfcd070c9c06b5271c1/document-classifier-0.1.2.tar.gz" } ] }