{ "info": { "author": "shmakovpn", "author_email": "shmakovpn@yandex.ru", "bugtrack_url": null, "classifiers": [], "description": "=================\nDjango-ocr-server\n=================\nDjango-ocr-server recognizes text in images and PDF files. It uses tesseract for this.\nhttps://github.com/tesseract-ocr/tesseract\n\nDjango-ocr-server saves the result in the database.\nTo prevent repeated recognition of the same file,\nit also saves the hash sum of the uploaded file.\nTherefore, when an already processed file is uploaded again, the result is returned immediately,\nbypassing the recognition process, which significantly reduces the load on the server.\n\nIf recognition produces non-empty text, a searchable PDF is created.\n\nA hash sum is calculated for the searchable PDF too.\nTherefore, if you upload a searchable PDF created by Django-ocr-server back to the server,\nthis file will not be recognized again, and the result will be returned immediately.\n\nThe server can process not only images but also PDFs.\nIt first checks whether the PDF already contains real text.\nIf so, this text is used and the file is not recognized,\nwhich reduces the load on the server and improves the quality of the output.\n\n .. image:: django_ocr_server.png\n\nStorage of uploaded files and created searchable PDFs can be disabled in the settings.\n\nFor uploaded files, created searchable PDFs,\nand the processing results as a whole,\nyou can specify in the settings a lifetime after which the data will be automatically deleted.\n\nTo interact with Django-ocr-server you can use the API or the admin interface.\n\nDocumentation\n=============\nhttp://django-ocr-server.readthedocs.org/en/latest\nThis open-source app is brought to you by Shmakovpn. 
(https://github.com/shmakovpn)\n\nInstallation\n============\nLinux Mint 19 (Ubuntu bionic)\n-----------------------------\n Installing packages\n | $sudo apt install g++ # needed to build pdftotext\n | $sudo apt install libpoppler-cpp-dev # needed to build pdftotext\n Installing tesseract\n | $sudo apt install tesseract-ocr\n | $sudo apt install tesseract-ocr-rus # install languages you want\n Installing python3.7\n | $sudo apt install python3.7\n | $sudo apt install python3.7-dev\n Installing pip\n $sudo apt install python-pip\n Installing virtualenv\n | $pip install --user virtualenv\n | $echo 'PATH=~/.local/bin:$PATH' >> ~/.bashrc\n | $source ~/.bashrc\n Installing virtualenvwrapper\n | $pip install --user setuptools\n | $pip install --user wheel\n | $pip install --user virtualenvwrapper\n | $echo 'source ~/.local/bin/virtualenvwrapper.sh' >> ~/.bashrc\n | $source ~/.bashrc\n Creating virtualenv for django_ocr_server\n $mkvirtualenv django_ocr_server -p /usr/bin/python3.7\n Installing django-ocr-server (on virtualenv django_ocr_server). It installs Django as a dependency\n $pip install django-ocr-server-1.0.tar.gz\n Create your Django project (on virtualenv django_ocr_server)\n $django-admin startproject ocr_server\n Go to project directory\n $cd ocr_server\n Edit ocr_server/settings.py\n Add applications to INSTALLED_APPS\n\n .. code-block::\n\n INSTALLED_APPS = [\n ...\n 'rest_framework',\n 'rest_framework.authtoken',\n 'django_ocr_server',\n 'rest_framework_swagger',\n ]\n\n\n Edit ocr_server/urls.py\n\n .. 
code-block::\n\n from django.contrib import admin\n from django.urls import path, include\n from rest_framework.documentation import include_docs_urls\n\n admin.site.site_header = 'OCR Server Administration'\n admin.site.site_title = 'Welcome to OCR Server Administration Portal'\n\n urlpatterns = [\n path('admin/', admin.site.urls, ),\n path('docs/', include_docs_urls(title='OCR Server API')),\n path('', include('django_ocr_server.urls'), ),\n ]\n\n Perform migrations (on virtualenv django_ocr_server)\n $python manage.py migrate\n Create superuser (on virtualenv django_ocr_server)\n $python manage.py createsuperuser\n Run server (on virtualenv django_ocr_server), then visit http://localhost:8000/\n $python manage.py runserver\n\nLinux Mint 19 (Ubuntu bionic) automatic installation\n-----------------------------------------------------\n Clone django_ocr_server from github\n $git clone https://github.com/shmakovpn/django_ocr_server.git\n Run the installation script using sudo\n $sudo {your_path}/django_ocr_server/install_ubuntu.sh\n\n The script creates an OS user named 'django_ocr_server', installs all needed packages,\n and creates the virtual environment.\n It installs django_ocr_server (from PyPI by default, but you can build the package from\n the cloned repository; see the topic 'Creation a distribution package' for how to do this).\n Then it creates the django project named 'ocr_server' in the home directory of the 'django_ocr_server' OS user.\n After that, the script edits settings.py and urls.py, which are located in ~django_ocr_server/ocr_server/ocr_server/.\n Finally it applies migrations and creates the superuser named 'admin' with the password 'admin'.\n\n Run the server as the OS user django_ocr_server, then change the 'admin' password on the http://localhost:your_port/admin/ page.\n | $sudo su\n | $su django_ocr_server\n | cd ~/ocr_server\n | workon django_ocr_server\n | python manage.py runserver\n\nCentos 7\n--------\n Install epel repository\n $sudo yum install epel-release\n 
Install python 3.6\n | $sudo yum install python36\n | $sudo yum install python36-devel\n Install gcc\n | $sudo yum install gcc\n | $sudo yum install gcc-c++\n Install dependencies\n $sudo yum install poppler-cpp-devel\n Install tesseract\n | $sudo yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/\n | $sudo bash -c \"echo 'gpgcheck=0' >> /etc/yum.repos.d/download.opensuse.org_repositories_home_Alexander_Pozdnyakov_CentOS_7*.repo\"\n | $sudo yum update\n | $sudo yum install tesseract\n | $sudo yum install tesseract-langpack-rus # install a language pack you need\n Install pip\n $sudo yum install python-pip\n Install virtualenv\n $sudo pip install virtualenv\n Create the virtual env for django_ocr_server\n $sudo virtualenv /var/www/ocr_server/venv -p /usr/bin/python36 --distribute\n Give rights to the project folder to your user\n $sudo chown -R {your_user} /var/www/ocr_server/\n Activate virtualenv\n $source /var/www/ocr_server/venv/bin/activate\n Install postgresql 11 (PostgreSQL 9.2, which Centos 7 installs by default, returns an error when applying migrations)\n | $sudo rpm -Uvh https://yum.postgresql.org/11/redhat/rhel-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm\n | $sudo yum install postgresql11-server\n | $sudo yum install postgresql-devel\n | $sudo /usr/pgsql-11/bin/postgresql-11-setup initdb\n | Edit /var/lib/pgsql/11/data/pg_hba.conf\n | host all all 127.0.0.1/32 md5\n | host all all ::1/128 md5\n | $sudo systemctl enable postgresql-11\n | $sudo systemctl start postgresql-11\n | $sudo -u postgres psql\n | # create database django_ocr_server encoding utf8;\n | # create user django_ocr_server with password 'django_ocr_server';\n | # alter database django_ocr_server owner to django_ocr_server;\n | # alter user django_ocr_server createdb; # if you want to run tests\n | # \\q\n | pip install psycopg2 # (on virtualenv django_ocr_server)\n Create django project (on virtualenv 
django_ocr_server)\n | $cd /var/www/ocr_server\n | $django-admin startproject ocr_server .\n\n Edit ocr_server/settings.py\n Add applications to INSTALLED_APPS\n\n .. code-block::\n\n INSTALLED_APPS = [\n ...\n 'rest_framework',\n 'rest_framework.authtoken',\n 'django_ocr_server',\n 'rest_framework_swagger',\n ]\n\n Configure database connection\n\n .. code-block::\n\n DATABASES = {\n 'default': {\n 'ENGINE': 'django.db.backends.postgresql_psycopg2',\n 'NAME': 'django_ocr_server',\n 'USER': 'django_ocr_server',\n 'PASSWORD': 'django_ocr_server',\n 'HOST': 'localhost',\n 'PORT': '',\n }\n }\n\n Edit ocr_server/urls.py\n .. code-block::\n\n from django.contrib import admin\n from django.urls import path, include\n from rest_framework.documentation import include_docs_urls\n\n admin.site.site_header = 'OCR Server Administration'\n admin.site.site_title = 'Welcome to OCR Server Administration Portal'\n\n urlpatterns = [\n path('admin/', admin.site.urls, ),\n path('docs/', include_docs_urls(title='OCR Server API')),\n path('', include('django_ocr_server.urls'), ),\n ]\n\n Apply migrations (on virtualenv django_ocr_server)\n $python manage.py migrate\n Create superuser (on virtualenv django_ocr_server)\n $python manage.py createsuperuser\n Run server (on virtualenv django_ocr_server), then visit http://localhost:8000/\n $python manage.py runserver\n\nRunning tests\n=============\n Run under your django_ocr_server virtual environment\n $python manage.py test django_ocr_server.tests\n\nAPI documentation\n=================\n Django-ocr-server provides API documentation using rest_framework.documentation and swagger.\n Visit http://localhost:8000/swagger and http://localhost:8000/docs/\n\nNote\n====\nIt may seem that Django-ocr-server does not work.\nOptical Character Recognition is a very demanding operation for a server,\nand it takes some time.\nHow long depends on the file you want to recognize and the parameters of your server.\nFor example my computer 'Ryzen 7 64 Gb 
RAM' needs 25\nminutes to recognize a 500-page book in PDF format without a text layer.\n\nLicense\n=======\n The code in this repository is licensed under the Apache License, Version 2.0 (the \"License\");\n you may not use this file except in compliance with the License.\n You may obtain a copy of the License at\n\n http://www.apache.org/licenses/LICENSE-2.0\n\n Unless required by applicable law or agreed to in writing, software\n distributed under the License is distributed on an \"AS IS\" BASIS,\n WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n See the License for the specific language governing permissions and\n limitations under the License.\n\n**NOTE**: This software depends on other packages that may be licensed under different open source licenses.\n\nCreation a distribution package\n===============================\n As mentioned earlier, the automatic installation script 'install_ubuntu.sh'\n uses the package from the PyPI repository by default. To change this behavior, or\n if you need your own distribution package, you can build it yourself.\n\n Run command\n | $cd path to cloned project from github\n | $python setup.py sdist\n\n Look in the 'dist' directory; your package was created there.\n\n You can then continue the automatic installation. The package will be used.\n\nDeploying to production\n=======================\nLinux Mint 19 (Ubuntu bionic)\n-----------------------------\n Installing nginx\n $sudo apt install nginx\n Installing uwsgi (on virtualenv django_ocr_server)\n $pip install uwsgi\n Create {path_to_your_project}/uwsgi.ini\n .. code-block::\n\n [uwsgi]\n chdir = {path_to_your_project} # e.g. /home/shmakovpn/ocr_server\n module = {your_project}.wsgi # e.g. ocr_server.wsgi\n home = {path_to_your_virtualenv} # e.g. /home/shmakovpn/.virtualenvs/django_ocr_server\n master = true\n processes = 10\n http = 127.0.0.1:8003\n vacuum = true\n\n Create /etc/nginx/sites-available/django_ocr_server.conf\n .. 
code-block::\n\n server {\n listen 80; # choose the port you want\n server_name _;\n charset utf-8;\n client_max_body_size 75M;\n location /static/rest_framework_swagger {\n alias {path_to_your_virtualenv}/lib/python3.7/site-packages/rest_framework_swagger/static/rest_framework_swagger;\n }\n location /static/rest_framework {\n alias {path_to_your_virtualenv}/lib/python3.7/site-packages/rest_framework/static/rest_framework;\n }\n location /static/admin {\n alias {path_to_your_virtualenv}/lib/python3.7/site-packages/django/contrib/admin/static/admin;\n }\n location / {\n proxy_pass http://127.0.0.1:8003;\n }\n }\n\n Enable the django_ocr_server site\n $sudo ln -s /etc/nginx/sites-available/django_ocr_server.conf /etc/nginx/sites-enabled/\n\n Remove the nginx default site\n $sudo rm /etc/nginx/sites-enabled/default\n\n Create the systemd service unit /etc/systemd/system/django-ocr-server.service\n .. code-block::\n\n [Unit]\n Description=uWSGI Django OCR Server\n After=syslog.service\n\n [Service]\n User={your user}\n Group={your group}\n Environment=\"PATH={path_to_your_virtualenv}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\"\n ExecStart={path_to_your_virtualenv}/bin/uwsgi --ini {path_to_your_project}/uwsgi.ini\n RuntimeDirectory=uwsgi\n Restart=always\n KillSignal=SIGQUIT\n Type=notify\n StandardError=syslog\n NotifyAccess=all\n\n [Install]\n WantedBy=multi-user.target\n\n Reload systemd\n $sudo systemctl daemon-reload\n Start the django-ocr-server service\n $sudo systemctl start django-ocr-server\n Enable the django-ocr-server service to start automatically after the server boots\n $sudo systemctl enable django-ocr-server\n Start nginx\n $sudo systemctl start nginx\n Enable the nginx service to start automatically after the server boots\n $sudo systemctl enable nginx\n Go to http://{your_server}:80\n You will be redirected to the admin page\n\nCentos 7\n--------\n Installing nginx\n $sudo yum install nginx\n Installing uwsgi (on virtualenv 
django_ocr_server)\n $pip install uwsgi\n Create /var/www/ocr_server/uwsgi.ini\n .. code-block::\n\n [uwsgi]\n chdir = /var/www/ocr_server\n module = ocr_server.wsgi\n home = /var/www/ocr_server/venv\n master = true\n processes = 10\n http = 127.0.0.1:8003\n vacuum = true\n\n Create the systemd service unit /etc/systemd/system/django-ocr-server.service\n .. code-block::\n\n [Unit]\n Description=uWSGI Django OCR Server\n After=syslog.service\n\n [Service]\n User=nginx\n Group=nginx\n Environment=\"PATH=/var/www/ocr_server/venv/bin:/sbin:/bin:/usr/sbin:/usr/bin\"\n ExecStart=/var/www/ocr_server/venv/bin/uwsgi --ini /var/www/ocr_server/uwsgi.ini\n RuntimeDirectory=uwsgi\n Restart=always\n KillSignal=SIGQUIT\n Type=notify\n StandardError=syslog\n NotifyAccess=all\n\n [Install]\n WantedBy=multi-user.target\n\n Reload systemd service\n $sudo systemctl daemon-reload\n Change the owner of /var/www/ocr_server to nginx\n $sudo chown -R nginx:nginx /var/www/ocr_server\n Start Django-ocr-server service\n $sudo systemctl start django-ocr-server\n Check that port is up\n $sudo netstat -anlpt \\| grep 8003\n | you should get something like this:\n | tcp 0 0 127.0.0.1:8003 0.0.0.0:* LISTEN 2825/uwsgi\n Enable Django-ocr-server uwsgi service\n $sudo systemctl enable django-ocr-server\n\n Edit /etc/nginx/nginx.conf\n .. code-block::\n\n server {\n listen 80 default_server;\n listen [::]:80 default_server;\n server_name _;\n charset utf-8;\n client_max_body_size 75M;\n location /static/rest_framework_swagger {\n alias /var/www/ocr_server/venv/lib/python3.6/site-packages/rest_framework_swagger/static/rest_framework_swagger;\n }\n location /static/rest_framework {\n alias /var/www/ocr_server/venv/lib/python3.6/site-packages/rest_framework/static/rest_framework;\n }\n location /static/admin {\n alias /var/www/ocr_server/venv/lib/python3.6/site-packages/django/contrib/admin/static/admin;\n }\n location / {\n proxy_pass http://127.0.0.1:8003;\n }\n }\n\n Configure selinux\n .. 
code-block::\n\n $sudo semanage port -a -t http_port_t -p tcp 8003\n $sudo semanage fcontext -a -t httpd_sys_content_t '/var/www/ocr_server/venv/lib/python3.6/site-packages/rest_framework_swagger/static/rest_framework_swagger(/.*)?'\n $sudo restorecon -Rv '/var/www/ocr_server/venv/lib/python3.6/site-packages/rest_framework_swagger/static/rest_framework_swagger/'\n $sudo semanage fcontext -a -t httpd_sys_content_t '/var/www/ocr_server/venv/lib/python3.6/site-packages/rest_framework/static/rest_framework(/.*)?'\n $sudo restorecon -Rv '/var/www/ocr_server/venv/lib/python3.6/site-packages/rest_framework/static/rest_framework/'\n $sudo semanage fcontext -a -t httpd_sys_content_t '/var/www/ocr_server/venv/lib/python3.6/site-packages/django/contrib/admin/static/admin(/.*)?'\n $sudo restorecon -Rv '/var/www/ocr_server/venv/lib/python3.6/site-packages/django/contrib/admin/static/admin/'\n\n Start nginx service\n $sudo systemctl start nginx\n Enable nginx service\n $sudo systemctl enable nginx\n Configure firewall\n | $sudo firewall-cmd --zone=public --add-service=http --permanent\n | $sudo firewall-cmd --reload\n Go to http://{your_server}:80\n You will be redirected to admin page\n\nUsage examples\n==============\n You can download all examples from https://github.com/shmakovpn/django_ocr_server/tree/master/usage_examples\n\ncurl\n----\n Use curl with '@' before the path of the uploading file\n .. code-block::\n\n #!/usr/bin/env bash\n curl -F \"file=@example.png\" localhost:8000/upload/\n\npython\n------\n Use requests.post function\n .. code-block::\n\n import requests\n\n\n with open(\"example.png\", 'rb') as fp:\n print(requests.post(\"http://localhost:8000/upload/\",\n files={'file': fp}, ).content)\n\nperl\n----\n Use LWP::UserAgent and HTTP::Request::Common\n .. 
code-block::\n\n #!/usr/bin/perl\n use strict;\n use warnings FATAL => 'all';\n use LWP::UserAgent;\n use HTTP::Request::Common;\n\n my $ua = LWP::UserAgent->new;\n my $url = \"http://localhost:8000/upload/\";\n my $fname = \"example.png\";\n\n my $req = POST($url,\n Content_Type => 'form-data',\n Content => [\n file => [ $fname ]\n ]);\n\n my $response = $ua->request($req);\n\n if ($response->is_success()) {\n print \"OK: \", $response->content;\n } else {\n print \"Failed: \", $response->as_string;\n }\n\nphp\n---\n Use curl with CURLFile\n .. code-block::\n\n new CURLFile($file, $mime, $name),\n );\n\n curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);\n\n // Execute the request\n $response = curl_exec( $ch);\n echo($response);\n\n curl_close ($ch);\n\n ?>\n\nConfiguration\n=============\n To change the behavior of django_ocr_server you can set\n several parameters in the settings.py of your django project.\n\n | OCR_STORE_FILES Set it to True (default) to enable storing uploaded files on the server\n | OCR_FILE_PREVIEW Set it to True (default) to enable showing previews of uploaded images in the admin interface\n | OCR_TESSERACT_LANG Sets the languages to use in priority order, defaults to 'rus+eng'\n | OCR_STORE_PDF Set it to True (default) to enable storing created searchable PDFs on the server\n | OCR_FILES_UPLOAD_TO Sets path for uploaded files\n | OCR_PDF_UPLOAD_TO Sets path for created searchable PDFs\n | OCR_FILES_TTL Sets the time to live for uploaded files; files older than this interval will be removed. Use python datetime.timedelta to set it or 0 (default) to disable.\n | OCR_PDF_TTL Sets the time to live for created searchable PDFs; PDFs older than this interval will be removed. Use python datetime.timedelta to set it or 0 (default) to disable.\n | OCR_TTL Sets the time to live for created models of OCRedFile; models older than this interval will be removed. Use python datetime.timedelta to set it or 0 (default) to disable.\n\nManagement Commands\n===================\n Run it to clean trash. 
It removes all uploaded files and PDFs that do not have related models in database.\n $python manage.py clean\n\n Run it to remove models, uploaded files and PDFs, whose time to live (TTL) has expired.\n $python manage.py ttl", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/shmakovpn/django_ocr_server/archive/1.3.zip", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/shmakovpn/django_ocr_server", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "django-ocr-server", "package_url": "https://pypi.org/project/django-ocr-server/", "platform": "", "project_url": "https://pypi.org/project/django-ocr-server/", "project_urls": { "Download": "https://github.com/shmakovpn/django_ocr_server/archive/1.3.zip", "Homepage": "https://github.com/shmakovpn/django_ocr_server" }, "release_url": "https://pypi.org/project/django-ocr-server/1.3/", "requires_dist": null, "requires_python": "", "summary": "", "version": "1.3" }, "last_serial": 5958132, "releases": { "1.2": [ { "comment_text": "", "digests": { "md5": "d456977a5bfa49d709bee9a2f416dcd1", "sha256": "c63c5d3bffa149377678e9992db6a329d153cdd5a3aecd8d7da6ff62ec139c96" }, "downloads": -1, "filename": "django-ocr-server-1.2.tar.gz", "has_sig": false, "md5_digest": "d456977a5bfa49d709bee9a2f416dcd1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 144213, "upload_time": "2019-10-11T02:53:52", "url": "https://files.pythonhosted.org/packages/bd/26/eb826451b06072be368fd888f4f6c4ef159a80fe86792be6d8d4e590c6dd/django-ocr-server-1.2.tar.gz" } ], "1.3": [ { "comment_text": "", "digests": { "md5": "c34723106654bb83748142e47c3ddc85", "sha256": "38295b6e4290c6ebe8e3c1efe93ad5353517c8008d9f7c0d108f15942c4ccf21" }, "downloads": -1, "filename": "django-ocr-server-1.3.tar.gz", "has_sig": false, "md5_digest": "c34723106654bb83748142e47c3ddc85", "packagetype": "sdist", "python_version": 
"source", "requires_python": null, "size": 144212, "upload_time": "2019-10-11T03:14:41", "url": "https://files.pythonhosted.org/packages/38/8f/b0e5b9f8ae91041c9666507bcf339e28121f6947db7d5b42cf87528be9cf/django-ocr-server-1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c34723106654bb83748142e47c3ddc85", "sha256": "38295b6e4290c6ebe8e3c1efe93ad5353517c8008d9f7c0d108f15942c4ccf21" }, "downloads": -1, "filename": "django-ocr-server-1.3.tar.gz", "has_sig": false, "md5_digest": "c34723106654bb83748142e47c3ddc85", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 144212, "upload_time": "2019-10-11T03:14:41", "url": "https://files.pythonhosted.org/packages/38/8f/b0e5b9f8ae91041c9666507bcf339e28121f6947db7d5b42cf87528be9cf/django-ocr-server-1.3.tar.gz" } ] }