{ "info": { "author": "Simon Busard", "author_email": "simon.busard@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Topic :: Utilities" ], "description": "wagoner\n=======\n\n``wagoner`` is an anagram for rawogen, a random word generator.\n\n``wagoner`` generates random words based on a given text. A random word is\ngenerated incrementally, and the given text is used to know which character\nshould follow.\n\n\nUsage\n-----\n\n``wagoner`` generates random words from tables extracted from texts. You first\nneed to extract such a table from a text. To extract a table, use\n``wagoner.table``::\n\n python -m wagoner.table TEXT --output text.table\n\nThis command will extract the table corresponding to ``TEXT`` into the file\n``text.table``.\n\nThen, given the extracted table, ``wagoner.word`` can generate random words::\n\n python -m wagoner.word text.table\n\nThis command will generate ten words of ten characters, based on the\ninformation in ``text.table``. To generate another number of words, use the\n``--count`` option and to change their length, the ``--length`` option.\n\nRandom words can also be generated from trees. While the tables only tell\nthe generator which character should follow, trees also ensure that the\ngenerated word will start like a regular word (from the text it comes from) and\nwill end like a regular word (maybe a different one). Trees are built from a\ntable or from a text::\n\n python -m wagoner.tree CONTENT --output text.tree\n\nThis command will extract a tree from ``CONTENT`` and save it into the file\n``text.tree``.\n\nWarning: trees can be very large and expensive to build; to control their\ncomplexity, you can use the ``--prefix`` and ``--length`` options. 
See below\nfor more information.\n\n\nHow does it work\n----------------\n\nThe main idea behind wagoner is to use a text to guide the generation of\nrandom words. The words in the text show what usually follows a given prefix,\nand this information is used to choose the next character when building a word\nincrementally.\n\nFor example, given the text composed of the two words ``ambiguous`` and\n``gamblings``, and the prefix ``ig``, the first word tells us that the next\ncharacter can be ``u`` (because in ``ambiguous``, the sub-word ``ig`` is\nfollowed by ``u``), and the second word tells us that the next character can\nbe ``a`` or ``s`` (because in ``gamblings``, the prefix ``g`` is followed by\n``a`` or ``s``). The next character is chosen randomly among the possible next\ncharacters and the word is incrementally generated following the same rule.\n\nThis next character is chosen randomly among the possible next characters, but\nweights are used to favor frequent characters and longer\nprefixes. In other words, when choosing the next character for the prefix\n``ig``, more weight is given to ``u`` than to ``a`` and ``s`` because ``u``\nfollows ``ig`` in ``ambiguous`` while ``a`` and ``s`` only follow ``g`` in\n``gamblings``. Furthermore, if in the given text, a character follows a given\nprefix more frequently than another, the first one will have a higher\nchance to be chosen. This last characteristic of the generation can be turned\noff by using the ``--flatten`` option of ``wagoner.word``. If this option is\nactive, two characters following the same prefix will have the same chance to\nbe chosen; nevertheless, a character following a shorter prefix will have less\nchance to be chosen than a character following a longer prefix.\n\nThis information about the frequency of succeeding characters in the words of\nthe given text is stored in tables. 
Note that the ``--flatten`` option is\nalso present with ``wagoner.table``; this forces ``wagoner.table`` to\nextract a table in which the number of occurrences of a character following a\ngiven prefix is forgotten, giving the same result as the ``--flatten`` option\nof ``wagoner.word`` at generation.\n\nFinally, when incrementally generating a random word, the full prefix is used\nto choose the next character. In natural languages such as English, knowing the\nfull prefix is not necessary to generate a pronounceable word; instead, three\nor four characters are usually sufficient to choose a next character that will\nsound right. Restricting the generation of words to a bounded prefix\ninstead of the full word can be achieved by using the ``--prefix`` option of\n``wagoner.word``. Similarly, the same option can be used on ``wagoner.table``\nto directly generate (smaller) tables that will not propose next characters\nfor longer prefixes.\n\nTrees store more information than tables, but are also tailored to stricter\ngeneration. Trees store the information about which character should\nfollow a given prefix, but only for words of a given length. They also store\ninformation to generate words starting as words of the text and ending as them.\nBecause of this additional information, trees can be really complex, huge and\nslow to build; they can potentially store each word of a given length\nthat can be generated from the table. To tackle this complexity, trees can be limited to\nlooking at only a suffix of the current prefix by using the ``--prefix``\noption; furthermore, by building trees for shorter words (using the\n``--length`` option), these trees are smaller.\n\n\nExample\n-------\n\nLet's take the list of English words available on\nhttp://www-01.sil.org/linguistics/wordlists/english/. 
This list contains more\nthan 100000 English words and should be useful to generate random words that\nsound English.\n\nThe first thing to do before generating random words is to extract the table::\n\n python -m wagoner.table wordsEn.txt --output wordsEn.table\n\nThe table is extracted and saved in the ``wordsEn.table`` file. Then, we can\ngenerate random words::\n\n python -m wagoner.word wordsEn.table\n\nthat outputs, for example::\n\n joggeriest\n lvinistica\n cleiadicat\n zenerousne\n ulencering\n mmanencien\n zeratenesi\n keynoteric\n encientnes\n crappinesa\n\nYou can see that some words are very close to real words: ``crappinesa`` really\nlooks like ``crappiness``, a word of ``wordsEn.txt``. To avoid generating words\nthat are too close to real ones, we can ask wagoner to only consider bounded prefixes; this\navoids being trapped in the increasing probability of resembling the word\n``crappiness`` as the word is generated::\n\n python -m wagoner.word wordsEn.table --prefix=2\n\nthat outputs, for example::\n\n keyelittat\n retimcenve\n quedectrot\n fodcalitur\n xcedission\n queffliqui\n eshedlerad\n ficklapett\n quatersous\n sulationur\n\nIn this case, the words are still pronounceable, but are not words of the\n``wordsEn.txt`` file.\n\nIf you want to generate words that start and end the same way words of the\ntext start and end, you can generate a tree from the table::\n\n python -m wagoner.tree wordsEn.table --output=wordsEn.tree --prefix=3\n\nThe tree can then be used to generate words::\n\n python -m wagoner.word wordsEn.tree\n\nthat outputs, for example::\n\n sinationso\n disjudging\n titualimal\n avespolybd\n prophology\n japackersc\n nonneappra\n overedefra\n oxideodore\n wordshedro\n\n\nLibrary usage\n-------------\n\n``wagoner`` can also be used as a library. 
``wagoner.table.Table`` represents\ntables and a table can be extracted from a text file with::\n\n table = Table.from_words(wagoner.utils.extract_words(text_file))\n\n``Table.from_words`` accepts two optional arguments:\n\n* ``prefix``: if greater than 0, the length of the prefixes to take into\n account when generating the word (default is 0).\n* ``flatten``: if ``True``, the table is flattened and two succeeding\n characters for the same prefix will have the same weight (default is\n ``False``).\n\nFrom such a table, a random word can be extracted::\n\n word = table.random_word(word_length)\n\nwhere ``word_length`` is the length of the desired word. ``random_word``\naccepts several optional arguments:\n\n* ``prefix`` and ``flatten``, as above.\n* ``start``: if ``True``, the generated word starts as a word of the text\n the table is extracted from (default is ``False``).\n* ``end``: if ``True``, the generated word ends as a word of the text\n the table is extracted from (default is ``False``).\n Warning: this option should not be used because it is very time consuming.\n\nFurthermore, ``wagoner.tree.Tree`` represents trees and a tree can be\nextracted from a table with::\n\n tree = Tree.from_table(table, word_length)\n\nwhere ``table`` is a table like the one above and ``word_length`` is the\nlength of the words the tree will produce. Like ``Table.from_words``,\n``Tree.from_table`` supports two optional arguments: ``prefix`` and\n``flatten``. 
In this case, the ``prefix`` argument is very important\nbecause, if set to 0 (the default value), the tree can be really huge, even\ntoo large to fit in memory, and will need a lot of time to be built.\n\nFrom such a tree, a random word can be extracted::\n\n word = tree.random_word()\n\nUnlike the tables, trees contain the complete information to extract words\n(their length, whether the weights are flattened) and ``random_word`` takes no\noptional argument (technically, it accepts any arguments, but will ignore\nthem).\n\n\nChangelog\n---------\n\n* 1.1: support for Python 2.7.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sbusard/wagoner", "keywords": "random word generation", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "wagoner", "package_url": "https://pypi.org/project/wagoner/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/wagoner/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/sbusard/wagoner" }, "release_url": "https://pypi.org/project/wagoner/1.1/", "requires_dist": null, "requires_python": null, "summary": "A random word generator.", "version": "1.1" }, "last_serial": 1663573, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "adf08d6ab7f6ccac01a24ad6d0c7a8a5", "sha256": "8482219be6ee12387c859b7c37c6e0ae8ea8d7f49b5894bcd94ad3b6d1ae44f4" }, "downloads": -1, "filename": "wagoner-1.0.zip", "has_sig": false, "md5_digest": "adf08d6ab7f6ccac01a24ad6d0c7a8a5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19735, "upload_time": "2015-08-03T15:43:01", "url": "https://files.pythonhosted.org/packages/9f/35/9167975b5114558d95f3f81287e6b9098e9e89ed500cdb41df6adc59a136/wagoner-1.0.zip" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "03958c1e8fcbb6be48f37b8ec853036f", "sha256": 
"67acb0f14a66ae840125a6695912ff617ac88d1c87389e46df20bb19f1dcc789" }, "downloads": -1, "filename": "wagoner-1.1.zip", "has_sig": false, "md5_digest": "03958c1e8fcbb6be48f37b8ec853036f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20564, "upload_time": "2015-08-04T12:10:29", "url": "https://files.pythonhosted.org/packages/1a/b7/23ea29aec4d0f2df35b299755bc5f44dd5ce41d30f24a9464431750f253e/wagoner-1.1.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "03958c1e8fcbb6be48f37b8ec853036f", "sha256": "67acb0f14a66ae840125a6695912ff617ac88d1c87389e46df20bb19f1dcc789" }, "downloads": -1, "filename": "wagoner-1.1.zip", "has_sig": false, "md5_digest": "03958c1e8fcbb6be48f37b8ec853036f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20564, "upload_time": "2015-08-04T12:10:29", "url": "https://files.pythonhosted.org/packages/1a/b7/23ea29aec4d0f2df35b299755bc5f44dd5ce41d30f24a9464431750f253e/wagoner-1.1.zip" } ] }