{
    "info": {
        "author": "Cl\u00e9ment Choukroun, Alexandre Mourachko",
        "author_email": "clement.choukroun@ubisoft.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 1 - Planning",
            "Natural Language :: English",
            "Operating System :: OS Independent",
            "Programming Language :: Python",
            "Programming Language :: Python :: 3.6",
            "Topic :: Communications"
        ],
        "description": "cross-words\n==========================================\n\n`cross-words` is a python module that allows you to easily create a corpus of documents with parameterized entities.  \n\nThe main goal of `cross-words` is to offer an easy way to create either sentences or stories for use in chat bot training.\nAs of May 2018, it is mostly designed to be used with [Rasa NLU/Core](http://rasa.com/)\n\n1. [Installation](#install)\n2. [How to use this package](#usage)\n\n# 1. Installation<a name=\"install\"></a>\n\nYou can install it with pip:\n\n    pip install cross-words\n\nOr directly from github if you want the latest development version\n\n    pip install git+https://github.com/data-chirps/cross-words.git\n\n# 2. How to use this package<a name=\"usage\"></a>\n## cross-words DSL\n`cross-words` is based on a simple yet powerful Domain Specific Language.\nWhen used along with Rasa NLU/Core, it uses 3 concepts:\n\n- **intents:** the objective of the chatbot's user (e.g. ask to book a restaurant, confirm a chatbot inquiry etc.)\n- **entities:** specific parts of a sentence containing key information (e.g. 
which restaurant to book, how many people etc.)\n- **aliases:** lists of synonyms that can be used interchangeably\n\nMore details are available in the [Rasa NLU tutorial](https://nlu.rasa.com/tutorial.html).\n\nGiven a configuration file (.txt) containing all of the above, `cross-words` can generate many training sentences/conversations by combining sentence parts.\n\n`cross-words` configuration files look like this:\n\n```\nCould I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?\n\n\n@[time_filter]\n    this month\n    this year\n    LTD\n        life to date\n        up to date\n    since release\n        since launch\n    since beginning of fiscal year\n\n@[geo_filter]\n    France\n    Germany\n    US\n        United States\n        America\n    Canada\n    Italy\n\n@[subject_filter]\n    birds\n        parrots\n        owl\n    dogs\n    cats\n        persian\n\n\n~[owners]\n    owners\n    possessors\n```\n\nIf asked for sentences, `cross-words` generates a .md file whose first lines look like this:\n\n```\n- Could I have the number of [birds](subject_filter) possessors in [Canada](geo_filter) [life to date](time_filter)?\n- Could I have the number of [parrots](subject_filter) possessors in [United States](geo_filter) [since release](time_filter)?\n- Could I have the number of [owl](subject_filter) possessors in [Italy](geo_filter) [up to date](time_filter)?\n- Could I have the number of [owl](subject_filter) possessors in [Italy](geo_filter) [since release](time_filter)?\n- Could I have the number of [dogs](subject_filter) owners in [United States](geo_filter) [LTD](time_filter)?\n- Could I have the number of [dogs](subject_filter) owners in [Canada](geo_filter) [this year](time_filter)?\n- Could I have the number of [cats](subject_filter) owners in [France](geo_filter) [this year](time_filter)?\n- Could I have the number of [cats](subject_filter) owners in [US](geo_filter) [since release](time_filter)?\n- Could I have the 
number of [cats](subject_filter) owners in [America](geo_filter) [this month](time_filter)?\n- Could I have the number of [cats](subject_filter) owners in [Canada](geo_filter) [life to date](time_filter)?\n\n```\nThis file is then ready to use as training input to Rasa NLU.\n\nIf asked for stories:\n\n```\n## Genereated Story 815310784239368\n* acquisition{}\n    - utter_ask_time_filter\n* acquisition{\"time_filter\": \"since beginning of fiscal year\"}\n    - slot{\"time_filter\": \"since beginning of fiscal year\"}\n    - utter_ask_geo_filter\n* acquisition{\"geo_filter\": \"America\"}\n    - slot{\"geo_filter\": \"America\"}\n    - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"dogs\"}\n    - slot{\"subject_filter\": \"dogs\"}\n    - action_acquisition\n\n## Genereated Story 257661587723758\n* acquisition{\"time_filter\": \"since release\", \"geo_filter\": \"Germany\"}\n    - slot{\"time_filter\": \"since release\"}\n    - slot{\"geo_filter\": \"Germany\"}\n    - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"owl\"}\n    - slot{\"subject_filter\": \"owl\"}\n    - action_acquisition\n\n## Genereated Story 877699493192194\n* acquisition{\"subject_filter\": \"parrots\"}\n    - slot{\"subject_filter\": \"parrots\"}\n    - utter_ask_time_filter\n* acquisition{\"time_filter\": \"LTD\"}\n    - slot{\"time_filter\": \"LTD\"}\n    - utter_ask_geo_filter\n* acquisition{\"geo_filter\": \"France\"}\n    - slot{\"geo_filter\": \"France\"}\n    - action_acquisition\n```\nThis file is then ready to use for training with Rasa Core.\n\n## Generating files\n\n`cross-words` mainly comes with 2 functions: parse_input and generate. 
All other functions are implementation details.\n\n### generate(input_path, output_path=\"./xwords/outputs/\", intent_string=None, output_prefix='', training_ratio=1.0, for_story=False, n_sub=None)\nThis is the main function of `cross-words`.\n\nGiven an input configuration file, it outputs all combinations of intents x entities x aliases into a .md file ready for training.\n\nA few arguments let you tune its behavior:\n\n- **input_path:** path to the configuration file *(string)*\n- **output_path:** path to the output folder where train/test files will be written *(string)*\n- **intent_string:** string to specify the intent at the beginning of sentence files (for Rasa NLU) or inside generated stories (for Rasa Core) *(string)*\n- **output_prefix:** prefix for the names of the files that are written *(string)*\n- **training_ratio:** fraction of all generated combinations written to the training file. If 0.7, the remaining 30% will be reserved for a test file. If 1.0, no test file will be created. *(float)*\n- **for_story:** whether to generate sentences (for Rasa NLU) or stories (for Rasa Core) *(bool)*\n- **n_sub:** number of sentences/stories (incl. test) to be taken as a subsample of all possible combinations of intents x entities x aliases *(int)* (required when generating stories for Rasa Core)\n\n### parse_input(input_path)\nThis function is provided as a convenience for experimentation. It is the first function called by generate.\n\nGiven an input configuration file, it generates:\n\n- a list of intents in the form\n```\n    ['intent_sentence_0', 'intent_sentence_1', ...]\n\n    e.g. from above:\n    ['Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?']\n```\n- a dictionary of entities in the form\n```\n    {'entity_0': ['alternative_00', 'alternative_01', ...],\n     'entity_1': ['alternative_10', 'alternative_11', ...], ...}\n\n    e.g. 
from above:\n    {'time_filter': ['this month', 'this year', ...],\n     'geo_filter': ['France', 'Germany', ...], ...}\n```\n- a dictionary of synonyms in the form\n```\n    {'alias_0': ['alternative_00', 'alternative_01', ...],\n     'alias_1': ['alternative_10', 'alternative_11', ...], ...}\n\n    e.g. from above:\n    {'owners': ['owners', 'possessors']}\n```\n\n## Combination logic\n\n`cross-words` is designed to compute sentences by substituting every entity and alias alternative into every intent sentence.\n\nAs a rule of thumb, the overall maximum number of generated sentences is on the order of:\n\nnb<sub>intent sentences</sub> &times; avg. nb<sub>entity placeholders per intent sentence</sub> &times; avg. nb<sub>alternatives per entity</sub> &times; avg. nb<sub>alias placeholders per intent sentence</sub> &times; avg. nb<sub>alternatives per alias</sub>\n\nAs such, the generated training files grow very quickly as placeholders and alternatives are added, hence the *n_sub* parameter in **generate**.\n\nIn the specific case of stories (Rasa Core), `cross-words` will also use *information availability* as an additional combination dimension.\n\nFor example, the two stories below differ in the information that the user initially provides:\n\n```\n## Genereated Story 257661587723758\n* acquisition{\"time_filter\": \"since release\", \"geo_filter\": \"Germany\"}\n    - slot{\"time_filter\": \"since release\"}\n    - slot{\"geo_filter\": \"Germany\"}\n    - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"owl\"}\n    - slot{\"subject_filter\": \"owl\"}\n    - action_acquisition\n\n## Genereated Story 877699493192194\n* acquisition{\"time_filter\": \"since release\"}\n    - slot{\"time_filter\": \"since release\"}\n    - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"owl\"}\n    - slot{\"subject_filter\": \"owl\"}\n    - utter_ask_geo_filter\n* acquisition{\"geo_filter\": \"Germany\"}\n    - slot{\"geo_filter\": \"Germany\"}\n    - action_acquisition\n```\n\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/data-chirps/xwords",
        "keywords": "",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "cross-words",
        "package_url": "https://pypi.org/project/cross-words/",
        "platform": "",
        "project_url": "https://pypi.org/project/cross-words/",
        "project_urls": {
            "Homepage": "https://github.com/data-chirps/xwords"
        },
        "release_url": "https://pypi.org/project/cross-words/0.0.2/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "Chat bot sentences & story generator.",
        "version": "0.0.2"
    },
    "last_serial": 3931551,
    "releases": {
        "0.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b24b29f383fcc06d97f735e5b41221cf",
                    "sha256": "005794fa77f318f078376e4d9582f34e3abc46760c959f4cfaf4501e8cb29636"
                },
                "downloads": -1,
                "filename": "cross_words-0.0.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "b24b29f383fcc06d97f735e5b41221cf",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 14750,
                "upload_time": "2018-05-09T17:32:11",
                "url": "https://files.pythonhosted.org/packages/ab/99/ae5ab19feca62e8477b515a4587373e4ec1ff074a78597dc68a9bf223ae2/cross_words-0.0.1-py3-none-any.whl"
            }
        ],
        "0.0.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "5ce37ca41c80ef87e6598f4c896a0e76",
                    "sha256": "f54cfa676f5f7d5fe9bfe9241918086b57c1677f534f491b510d6a919137f0f8"
                },
                "downloads": -1,
                "filename": "cross_words-0.0.2-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "5ce37ca41c80ef87e6598f4c896a0e76",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 15438,
                "upload_time": "2018-06-05T10:04:53",
                "url": "https://files.pythonhosted.org/packages/63/fd/0af5f56f0dd7f499c54b1b309cb0cfcb234bf9e15592afe36a5204d5b351/cross_words-0.0.2-py2.py3-none-any.whl"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "5ce37ca41c80ef87e6598f4c896a0e76",
                "sha256": "f54cfa676f5f7d5fe9bfe9241918086b57c1677f534f491b510d6a919137f0f8"
            },
            "downloads": -1,
            "filename": "cross_words-0.0.2-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5ce37ca41c80ef87e6598f4c896a0e76",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 15438,
            "upload_time": "2018-06-05T10:04:53",
            "url": "https://files.pythonhosted.org/packages/63/fd/0af5f56f0dd7f499c54b1b309cb0cfcb234bf9e15592afe36a5204d5b351/cross_words-0.0.2-py2.py3-none-any.whl"
        }
    ]
}