{ "info": { "author": "Elvis MBONING", "author_email": "levismboning@ntealan.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: Academic Free License (AFL)", "License :: OSI Approved :: GNU Affero General Public License v3", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering :: Image Recognition", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Build Tools", "Topic :: Text Processing :: Linguistic", "Topic :: Text Processing :: Markup" ], "description": "MetaLex Tool (tool for lexicographers and metalexicographers)\n===============================================================\n\n**metalex** is a general tool, released under the AGPL license, for **lexicographic** and **metalexicographic** activities.\nFor the current development version of this tool, see `MetaLex/v2.0 `_\n\n\n\nUsage\n=====\n\n- metalex proceeds in this way (open the github project page to see the graphic)\n\n::\n\n 1-> Enhance image quality \n 2-> OCR \n 3-> Enhance quality of OCR text\n 4-> Parsing and codification of dictionary articles\n 5-> Text codification (XML, HTML, LMF, TEI)\n 6-> Tools for metalexicographic analysis \n\n\n- Here is an example of how this process is used with metalex \n\n::\n\n I am a metalexicographer or linguist and I have paper dictionaries. 
\n I want to perform a diachronic study of the evolution of the wording of \n definitions in a collection of dictionaries available from period A to period B.\n\n\n- Traditionally, or at best, the contemporary metalexicographer (in our view) would apply the following methodology :\n\n::\n\n 1- Scanning of the printed materials (Scan) and enhancement of their quality\n 2- Optical reading of the images (OCR) = extraction of the article content \n 3- Manual correction of errors in the article text \n 4- Markup of the articles according to a standard \n 5- Metalexicographical analysis / decoding of the articles \n\n\n- metalex, through its modules, operates in the same way by successively executing each of these tasks automatically.\n\n::\n\n 1) MetaLex enhances the quality of dictionary images \n **metalex.ocrtext.normalizeImage.EnhanceImages().filter(f.DETAIL)**\n 2) MetaLex extracts all dictionary articles from the dictionary images \n **metalex.ocrtext.make_ocr.image_to_text()**\n 3) MetaLex corrects dictionary articles \n **metalex.ocrtext.make_text_well()**\n 4) MetaLex marks up dictionary articles according to a standard \n **metalex.xmlised.put_xml('tei') or metalex.xmlised.put_xml('lmf')**\n 5) MetaLex generates metalexicographic analyses of part of the dictionary content \n **metalex.xmlised.handleStat()**\n\n\n- Other, more complex processes are also possible !\n\n\nRequirements\n============\n\nMetaLex is developed in a **Python 2.7** environment; the following packages are required :\n\n- You can install all extra package dependencies manually\n\n.. 
code-block:: shell\n\n sudo apt-get install build-essential libssl-dev libffi-dev python-dev\n sudo pip install Cython\n sudo apt-get install libtesseract-dev libleptonica-dev libjpeg-dev zlib1g-dev libpng-dev\n sudo apt-get install tesseract-ocr-all\n sudo apt-get install python-html5lib\n sudo apt-get install python-lxml\n sudo apt-get install python-bs4\n sudo pip install pillow\n sudo pip install --no-cache-dir -I pillow\n sudo pip install http://effbot.org/downloads/Imaging-1.1.7.tar.gz\n sudo pip install termcolor\n sudo CPPFLAGS=-I/usr/local/include pip install tesserocr\n\n\n- Or follow these steps from github project page\n\n.. code-block:: shell\n\n sudo ./config.sh # Install linux package dependencies\n\n sudo pip install -r requirements.txt # Install python module dependencies\n\n\n- or merely use (the preferred option is to begin with the extra linux packages)\n\n.. code-block:: shell\n\n sudo pip install metalex\n\n\nHow to run MetaLex ?\n====================\n\n- Run the help command \n\n.. code-block:: shell\n\n metalex -h\n\n\n.. 
code-block:: shell\n\n\n ---------------------------------------------------------------\n | * * * * * * * * * * * * * * * * ** ** |\n | * * * * * * * * * * * * * * |\n | * * * * * * * * * * * * * * ** ** |\n ---------------------------------------------------------------\n metalex is general tool for lexicographics and metalexicographics activities\n\n\n\n optional arguments:\n -h, --help show this help message and exit\n -v, --version show program's version number and exit\n -p PROJECTNAME, --project PROJECTNAME\n Defined metalex project name\n -c author comment contributors, --confproject author comment contributors\n Defined metalex configuration for the current project\n -i [IMAGEFILE], --dicimage [IMAGEFILE]\n Input one or multiple dictionary image(s) file(s) for\n current metalex project\n --dld DOWNLOAD Download ocropy model from Github for current metalex\n project\n -o {ocropy,tesserocr}, --ocrtype {ocropy,tesserocr}\n OCR type to use for current metalex project\n -m {modeldef,}, --model {modeldef,}\n OCR LSTM model to use for current metalex project\n -d IMAGESDIR, --imagedir IMAGESDIR\n Input folder name of dictionary image files for\n current metalex project\n --imgalg actiontype value\n Set algorithm for enhancing dictionary image files for\n current metalex project (actiontype must be : contrast\n or bright or filter)\n -r FILERULE, --filerule FILERULE\n Defined file rules that we use to enhance quality of\n OCR result\n -l LANG, --lang LANG Set language for optical characters recognition and\n others metalex treatment\n -x {xml,lmf,tei}, --xml {xml,lmf,tei}\n Defined output result treatment of metalex\n -s, --save Save output result of the current project in files\n -t, --terminal Show result of the current treatment in the terminal\n\n ------------------------------------------------------------------------------\n metalex project : special Thank to Bill for metalex-vagrant version\n 
------------------------------------------------------------------------------\n\n\n\n\n- Build the file rules of the project.\n\nMetaLex uses rule files with a specific structure to improve the OCR output text (from dictionary image files). **\\\\W** for word replacement, **\\\\C** for character replacement and **\\\\R** for regular expression replacement. The space between two headers is used to describe the replacements.\n\n::\n\n \\START\n \\MetaLex\\project_name\\type_of_project\\lang\\author\\date\n \\W \n /t'/t\n /{/f.\n /E./f.\n \\C\n /i'/i\n \\R\n /a-z+/ij\n \\END\n\n- If you want to use ocropy OCR, please download its models first. They are saved in **$home/metalex/models**.\n\n.. code-block:: shell\n\n metalex --dld modelDef\n\n\n- Run your project with the default parameters, apart from the dictionary image data, and save the results. You must create a folder containing dictionary image files, such as **test-files/images/**.\n\n.. code-block:: shell\n\n # [OCRopy OCR] Define a folder containing dictionary images for the current process\n\n metalex -d 'test-files/images' -o ocropy -m modeldef -s \n\n # [Tesserocr OCR] Or you can define a single dictionary image file\n\n metalex -i 'test-files/images/LarClasIll_1911_gay-Trouin.jpg' -o tesserocr -m modeldef -s \n\n\n- Run your project with your own set of parameters and save the results\n\n.. code-block:: shell\n\n metalex -p 'projectname' -c 'author' 'comment' 'contributors' -d 'test-files/images' -r 'test-files/file_Rule.dic' -l 'fra' -o tesserocr -m modeldef -s\n\n\n- **OUTPUT :** For the first command (without parameters), the console output looks like this. **NB :** With parameters, error and warning messages will disappear.\n\n.. code-block:: latex\n\n Open the github project page to see the graphic\n\n\nContributors\n============\n\nSpecial thanks to `Bill `_ for the `MetaLex-vagrant `_ version for Windows, Mac OS 6, Linux\n\n\nReference\n=========\n\nPlease don't forget to cite this work :\n\n.. 
code-block:: latex\n\n @article{Mboning-Elvis,\n title = {Quand le TAL s'empare de la m\u00e9talexicographie : conception d'un outil pour le m\u00e9talexicographe},\n author = {Mboning, Elvis},\n url = {https://github.com/Levis0045/MetaLex},\n date = {2017-06-20},\n school = {Universit\u00e9 de Lille 3},\n year = {2017},\n pages = {12},\n keywords = {m\u00e9talexicographie, TAL, fouille de donn\u00e9es, extraction d'information, lecture optique, lexicographie, Xmlisation, DTD}\n }\n\n\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/Levis0045/MetaLex", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Levis0045/MetaLex", "keywords": "OCR numerisation lexicographie linguistique TAL LSTM ML TAL", "license": "AGPL", "maintainer": "", "maintainer_email": "", "name": "metalex", "package_url": "https://pypi.org/project/metalex/", "platform": "", "project_url": "https://pypi.org/project/metalex/", "project_urls": { "Download": "https://github.com/Levis0045/MetaLex", "Homepage": "https://github.com/Levis0045/MetaLex" }, "release_url": "https://pypi.org/project/metalex/2.6.5/", "requires_dist": [ "termcolor", "tesserocr", "bs4", "lxml", "beautifulsoup4", "PIL", "html5lib", "Pillow", "Cython", "termcolor", "imagesize", "psutil", "metalex.test; extra == 'test'" ], "requires_python": "", "summary": "MetaLex is a tool for lexicographic and metalexicographic activities", "version": "2.6.5" }, "last_serial": 3836858, "releases": { "2.6.3": [ { "comment_text": "", "digests": { "md5": "567d29b7fe7db7679624bde567f428af", "sha256": "8b68405e446f97389cd90c48e1efd545a7c77e9d018f9b101caac6a9e2e3540f" }, "downloads": -1, "filename": "metalex-2.6.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "567d29b7fe7db7679624bde567f428af", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 154230, "upload_time": "2018-05-05T08:09:05", "url": 
"https://files.pythonhosted.org/packages/23/52/4d69fc606ba28f2b13c318da083829d9e4a12f2c74463999e6f1631e9a20/metalex-2.6.3-py2.py3-none-any.whl" } ], "2.6.4": [ { "comment_text": "", "digests": { "md5": "4b611596fc55733cf186ce37153eca53", "sha256": "1c1d855bfb33c62c515b635124c7596d44f36e957c99f7e6f3887f4927f51f2d" }, "downloads": -1, "filename": "metalex-2.6.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "4b611596fc55733cf186ce37153eca53", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 154234, "upload_time": "2018-05-05T08:13:28", "url": "https://files.pythonhosted.org/packages/f2/09/ba70632d3388630664ce38c9463eb0b7ec5ae2d9943c8e719ee1865d277e/metalex-2.6.4-py2.py3-none-any.whl" } ], "2.6.5": [ { "comment_text": "", "digests": { "md5": "23a55f914b020a641bb71dc9348d7763", "sha256": "57661b2a5daf57d04719be06f16c80276a5b44aa25fd178927d9f565060eb6b2" }, "downloads": -1, "filename": "metalex-2.6.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "23a55f914b020a641bb71dc9348d7763", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 155321, "upload_time": "2018-05-05T11:34:44", "url": "https://files.pythonhosted.org/packages/fd/a8/1e39f0fac4e4f249881456e341f3ae4dbb507799d47f4f7d7ad7c4cc434a/metalex-2.6.5-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "23a55f914b020a641bb71dc9348d7763", "sha256": "57661b2a5daf57d04719be06f16c80276a5b44aa25fd178927d9f565060eb6b2" }, "downloads": -1, "filename": "metalex-2.6.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "23a55f914b020a641bb71dc9348d7763", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 155321, "upload_time": "2018-05-05T11:34:44", "url": "https://files.pythonhosted.org/packages/fd/a8/1e39f0fac4e4f249881456e341f3ae4dbb507799d47f4f7d7ad7c4cc434a/metalex-2.6.5-py2.py3-none-any.whl" } ] }