{ "info": { "author": "Thibault Cl\u00e9rice", "author_email": "", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Topic :: Text Processing :: Linguistic" ], "description": "\n# Le Boucher d'Amsterdam\n\nBoudams, or \"Le boucher d'Amsterdam\", is a deep-learning tool built for tokenizing Latin or Medieval French languages.\n\n## How to cite\n\nAn article has been published about this work : https://hal.archives-ouvertes.fr/hal-02154122v1\n\n```text\n@unpublished{clerice:hal-02154122,\n TITLE = {{Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin}},\n AUTHOR = {Cl{\\'e}rice, Thibault},\n URL = {https://hal.archives-ouvertes.fr/hal-02154122},\n NOTE = {working paper or preprint},\n YEAR = {2019},\n MONTH = Jun,\n KEYWORDS = {convolutional network ; scripta continua ; tokenization ; Old French ; word segmentation},\n PDF = {https://hal.archives-ouvertes.fr/hal-02154122/file/Evaluating_Deep_Learning_Methods_for_Tokenization_of_Scripta_Continua_in_Old_French_and_Latin%284%29.pdf},\n HAL_ID = {hal-02154122},\n HAL_VERSION = {v1},\n}\n\n```\n\n## How to\n\nInstall the usual way you install python stuff: `python setup.py install` (**Python >= 3.6**)).\n\nThe config file can be kickstarted using `boudams template config.json`, we recommend using the following settings :\n\n- `linear-conv-no-pos` for the model, as it is not limited by the input size;\n- `normalize` and `lower` to `True` depending on your dataset size.\n\nThe initial dataset is pretty small but if you want to build with your own, it's fairly simple : you need data in the \nfollowing shape : `\"samesentencesame sentence\"` where the first element is the same than the second but with no\nspace and they are separated by tabs (`\\t`, marked here as ``).\n\n\n```json\n{\n \"name\": \"model\",\n \"max_sentence_size\": 150,\n \"network\": {\n \"emb_enc_dim\": 256,\n \"enc_n_layers\": 10,\n \"enc_kernel_size\": 3,\n \"enc_dropout\": 0.25\n },\n \"model\": \"linear-conv-no-pos\",\n \"learner\": {\n \"lr_grace_periode\": 2,\n \"lr_patience\": 2,\n \"lr\": 0.0001\n },\n \"label_encoder\": {\n \"normalize\": true,\n \"lower\": true\n },\n \"datasets\": {\n \"test\": \"./test.tsv\",\n \"train\": \"./train.tsv\",\n \"dev\": \"./dev.tsv\",\n \"random\": true\n }\n}\n```\n\nThe best architecture I find for medieval French was Conv to Linear without POS using the following setup:\n\n```json\n{\n \"network\": {\n \"emb_enc_dim\": 256,\n \"enc_n_layers\": 10,\n \"enc_kernel_size\": 5,\n \"enc_dropout\": 0.25\n },\n \"model\": \"linear-conv-no-pos\",\n \"batch_size\": 64,\n \"learner\": {\n \"lr_grace_periode\": 2,\n \"lr_patience\": 2,\n \"lr\": 0.00005,\n \"lr_factor\": 0.5\n }\n}\n```\n\n\n## Credits\n\nInspirations, bits of code and source for being able to understand how Seq2Seq words or write my own Torch module come \nboth from [Ben Trevett](https://github.com/bentrevett/pytorch-seq2seq) and [Enrique Manjavacas](https://github.com/emanjavacas/pie). \n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ponteineptique/boudams", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "boudams", "package_url": "https://pypi.org/project/boudams/", "platform": "", "project_url": "https://pypi.org/project/boudams/", "project_urls": { "Homepage": "https://github.com/ponteineptique/boudams" }, "release_url": "https://pypi.org/project/boudams/0.1.0/", "requires_dist": [ "torch (>=1.0.1.post2)", "tqdm (>=4.31.1)", "unidecode (>=1.0.23)", "leven (==1.0.4)", "click (<=8.0,>=7.0)", "regex (==2019.5.25)", "mufidecode" ], "requires_python": ">=3.6.0", "summary": "A framework and toolkit for automatic segmentation", "version": "0.1.0" }, "last_serial": 5963670, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "599f48648ae4fefab4d5af38d43fde98", "sha256": "939726ac993eb910786a9599f7b714d4773bda12272f67b5ed0b278911780642" }, "downloads": -1, "filename": "boudams-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "599f48648ae4fefab4d5af38d43fde98", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 3624, "upload_time": "2019-10-12T09:36:33", "url": "https://files.pythonhosted.org/packages/9e/6e/6fb5c75f8b14a6e1fb868d7c2790d4dfb587988cafc0152a195ba99e4990/boudams-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bcbef31a04ff697bfac7c8c11632c6d7", "sha256": "fac7046fb4ea673e0214c668555b9e98446d6cdaab61363d73207f2154fce311" }, "downloads": -1, "filename": "boudams-0.1.0.tar.gz", "has_sig": false, "md5_digest": "bcbef31a04ff697bfac7c8c11632c6d7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 3978, "upload_time": "2019-10-12T09:36:35", "url": "https://files.pythonhosted.org/packages/a2/39/5e9a04e601004ca6942aefa1872852264187d67f0f28025a88c8a0574500/boudams-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "599f48648ae4fefab4d5af38d43fde98", "sha256": "939726ac993eb910786a9599f7b714d4773bda12272f67b5ed0b278911780642" }, "downloads": -1, "filename": "boudams-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "599f48648ae4fefab4d5af38d43fde98", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 3624, "upload_time": "2019-10-12T09:36:33", "url": "https://files.pythonhosted.org/packages/9e/6e/6fb5c75f8b14a6e1fb868d7c2790d4dfb587988cafc0152a195ba99e4990/boudams-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bcbef31a04ff697bfac7c8c11632c6d7", "sha256": "fac7046fb4ea673e0214c668555b9e98446d6cdaab61363d73207f2154fce311" }, "downloads": -1, "filename": "boudams-0.1.0.tar.gz", "has_sig": false, "md5_digest": "bcbef31a04ff697bfac7c8c11632c6d7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 3978, "upload_time": "2019-10-12T09:36:35", "url": "https://files.pythonhosted.org/packages/a2/39/5e9a04e601004ca6942aefa1872852264187d67f0f28025a88c8a0574500/boudams-0.1.0.tar.gz" } ] }