{ "info": { "author": "Ivan Nesic", "author_email": "contact@droll.science", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Topic :: Software Development :: Libraries", "Topic :: Text Processing :: Linguistic" ], "description": "cwsplit\n=======\n\n|build status|\n\nCompound Word Splitter (cwsplit) for any language supported by\n`enchant `__.\n\nInstallation\n------------\n\nMake sure you have enchant dictionary installed.\n\nYou can check the list of installed packages by running:\n\n.. code:: python\n\n import enchant\n print(enchant.list_languages())\n\nCheck the `pyenchant `__ and\n`enchant `__ links for more\ninfo.\n\nUsage\n-----\n\nImport module:\n\n.. code:: python\n\n from cwsplit import split\n\nFor German (Default)\n\n.. code:: python\n\n split('Rindfleisch')\n # ['rind', 'fleisch']\n\nFor English:\n\n.. code:: python\n\n split('blackboard', 'en_en')\n # ['black', 'board']\n\nor\n\n.. code:: python\n\n from cwsplit import load_dict\n load_dict('en_en')\n split('blackboard')\n # ['black', 'board']\n\nSometimes the word is misspelled or just doesn\u2019t exist. By deafult the\nword will be split in characters until the longer word shows up.\n\nPositive effect of this behaviour is the connecting letters like \u2018s\u2019 in\n*\u00fcberwachungsaufgaben* will be isolated.\n\nOn the other hand, let\u2019s imagine we have a non-existing word\n*gibberishfleisch*, this will be decompounded into words *gib*, *b*,\n*e*, *r*, *i*, *s*, *h* and *fleisch*.\n\n.. code:: python\n\n split('gibberishfleisch', language='de_de')\n # ['gib', 'b', 'e', 'r', 'i', 's', 'h', 'fleisch']\n\nThis does not look good at all. This is why you can select the sortest\nword size, so all shorter consecutive words will be concatenated. For\nexample, let\u2019s define the shortest ward as 4 characters long:\n\n.. code:: python\n\n split('gibberishfleisch', language='de_de', min_word_size=4)\n # ['gibberish', 'fleisch']\n\nNow we get two words *gibberish* and *fleisch*, which is something you\nwould expect.\n\nThis will not affect the correct words that have a connecting \u2018s\u2019.\n\nFor example:\n\n.. code:: python\n\n split('\u00fcbertragungsgesetz', min_word_size=4)\n # ['\u00fcbertragung','s', 'gesetz']\n\nremains correct.\n\nAlgorithm\n---------\n\nThis is a very simple recursive algorithm that looks up for the longest\nword inside of the provided word, by checking if it exists in the\n`enchant `__ dictionary.\nThe output is always returned as a list of strings. If no shorter words\nare found, the input word will be return as a single element list.\n\nDevelopers\n----------\n\nUpload script uses `pandoc `__ to\nconvert README.md to README in *rst* fromat, needed in order to create\nthe package. Make sure you have it installed if you plan to use the\nscript.\n\n.. |build status| image:: http://img.shields.io/travis/username/repo/master.svg?style=flat\n :target: https://travis-ci.com/fatkaratekid/cwsplit", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://droll.science", "keywords": "compound word splitter language english german", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "cwsplit", "package_url": "https://pypi.org/project/cwsplit/", "platform": "", "project_url": "https://pypi.org/project/cwsplit/", "project_urls": { "Homepage": "http://droll.science" }, "release_url": "https://pypi.org/project/cwsplit/0.4.1/", "requires_dist": null, "requires_python": "", "summary": "Compund word splitter for enchant supported languages", "version": "0.4.1" }, "last_serial": 3720729, "releases": { "0.4.1": [ { "comment_text": "", "digests": { "md5": "7b504f02f2871d68354c8c806ab7885f", "sha256": "66f52f9fcc6f5095c3c3539cc3a0fc2a7998ac620eda0412158381afe76c6c4a" }, "downloads": -1, "filename": "cwsplit-0.4.1.tar.gz", "has_sig": false, "md5_digest": "7b504f02f2871d68354c8c806ab7885f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3205, "upload_time": "2018-03-30T20:46:55", "url": "https://files.pythonhosted.org/packages/28/99/05c7d0316fb3aed3065fb35b9a9182c4fb4242524668aafbbff9a6dac85b/cwsplit-0.4.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "7b504f02f2871d68354c8c806ab7885f", "sha256": "66f52f9fcc6f5095c3c3539cc3a0fc2a7998ac620eda0412158381afe76c6c4a" }, "downloads": -1, "filename": "cwsplit-0.4.1.tar.gz", "has_sig": false, "md5_digest": "7b504f02f2871d68354c8c806ab7885f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3205, "upload_time": "2018-03-30T20:46:55", "url": "https://files.pythonhosted.org/packages/28/99/05c7d0316fb3aed3065fb35b9a9182c4fb4242524668aafbbff9a6dac85b/cwsplit-0.4.1.tar.gz" } ] }