{
"info": {
"author": "Cl\u00e9ment Choukroun, Alexandre Mourachko",
"author_email": "clement.choukroun@ubisoft.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 1 - Planning",
"Natural Language :: English",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3.6",
"Topic :: Communications"
],
"description": "cross-words\n==========================================\n\n`cross-words` is a python module that allows you to easily create a corpus of documents with parameterized entities. \n\nThe main goal of `cross-words` is to offer an easy way to create either sentences or stories for use in chat bot training.\nAs of May 2018, it is mostly designed to be used with [Rasa NLU/Core](http://rasa.com/)\n\n1. [Installation](#install)\n2. [How to use this package](#usage)\n\n# 1. Installation\n\nYou can install it with pip:\n\n pip install cross-words\n\nOr directly from github if you want the latest development version\n\n pip install git+https://github.com/data-chirps/cross-words.git\n\n# 2. How to use this package\n## cross-words DSL\n`cross-words` is based on a simple yet powerful Domain Specific Language.\nWhen used along with Rasa NLU/Core, it uses 3 concepts:\n\n- **intents:** the objective of the chatbot's user (e.g. ask to book a restaurant, confirm a chatbot inquiry etc.)\n- **entities:** specific parts of a sentence containing key information (e.g. which restaurant to book, how many people etc.)\n- **aliases:** lists of synonyms that can be used interchangeably\n\nMore details are available at [Rasa NLU](https://nlu.rasa.com/tutorial.html)\n\nGiven a configuration file (.txt) containing all of the above, `cross-words` is able to generate many training sentences/conversations using combinations of sentence parts.\n\n`cross-words` configuration files look like this:\n\n```\nCould I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?\n\n\n@[time_filter]\n this month\n this year\n LTD\n life to date\n up to date\n since release\n since launch\n since beginning of fiscal year\n\n@[geo_filter]\n France\n Germany\n US\n United States\n America\n Canada\n Italy\n\n@[subject_filter]\n birds\n parrots\n owl\n dogs\n cats\n persian\n\n\n~[owners]\n owners\n possessors\n```\n\nIf asked for sentences, `cross-words` will generate a .md file whose first lines will be :\n\n```\n- Could I have the number of [birds](subject_filter) possessors in [Canada](geo_filter) [life to date](time_filter)?\n- Could I have the number of [parrots](subject_filter) possessors in [United States](geo_filter) [since release](time_filter)?\n- Could I have the number of [owl](subject_filter) possessors in [Italy](geo_filter) [up to date](time_filter)?\n- Could I have the number of [owl](subject_filter) possessors in [Italy](geo_filter) [since release](time_filter)?\n- Could I have the number of [dogs](subject_filter) owners in [United States](geo_filter) [LTD](time_filter)?\n- Could I have the number of [dogs](subject_filter) owners in [Canada](geo_filter) [this year](time_filter)?\n- Could I have the number of [cats](subject_filter) owners in [France](geo_filter) [this year](time_filter)?\n- Could I have the number of [cats](subject_filter) owners in [US](geo_filter) [since release](time_filter)?\n- Could I have the number of [cats](subject_filter) owners in [America](geo_filter) [this month](time_filter)?\n- Could I have the number of [cats](subject_filter) owners in [Canada](geo_filter) [life to date](time_filter)?\n\n```\nThis file is then ready to use as training input to Rasa NLU.\n\nIf asked for stories:\n\n```\n## Genereated Story 815310784239368\n* acquisition{}\n - utter_ask_time_filter\n* acquisition{\"time_filter\": \"since beginning of fiscal year\"}\n - slot{\"time_filter\": \"since beginning of fiscal year\"}\n - utter_ask_geo_filter\n* acquisition{\"geo_filter\": \"America\"}\n - slot{\"geo_filter\": \"America\"}\n - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"dogs\"}\n - slot{\"subject_filter\": \"dogs\"}\n - action_acquisition\n\n## Genereated Story 257661587723758\n* acquisition{\"time_filter\": \"since release\", \"geo_filter\": \"Germany\"}\n - slot{\"time_filter\": \"since release\"}\n - slot{\"geo_filter\": \"Germany\"}\n - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"owl\"}\n - slot{\"subject_filter\": \"owl\"}\n - action_acquisition\n\n## Genereated Story 877699493192194\n* acquisition{\"subject_filter\": \"parrots\"}\n - slot{\"subject_filter\": \"parrots\"}\n - utter_ask_time_filter\n* acquisition{\"time_filter\": \"LTD\"}\n - slot{\"time_filter\": \"LTD\"}\n - utter_ask_geo_filter\n* acquisition{\"geo_filter\": \"France\"}\n - slot{\"geo_filter\": \"France\"}\n - action_acquisition\n```\nThis file is then ready to use for training with Rasa Core.\n\n## Generating files\n\n`cross-words` mainly comes with 2 functions: parse_input and generate. All other functions are implementation details.\n\n### generate(input_path, output_path=\"./xwords/outputs/\", intent_string=None, output_prefix='', training_ratio=1.0, for_story=False, n_sub=None)\nThis is the main function of `cross-words'.\n\nGiven an input configuration file, it outputs all combinations of intents x entities x aliases into a .md file ready for training.\n\nA few arguments allow to tune its behavior:\n\n- **input_path:** path to the configuration file *(string)*\n- **output_path:** path to the output folder where train/test files will be written *(string)*\n- **intent_string** string to specify intent at the beginning of sentence files (for Rasa NLU) or inside genereated stories (for Rasa Core) *(string)*\n- **output_prefix** string to specify beginning of names of files that are written *(string)*\n- **training_ratio:** ratio between train and test sets. If .7, 30% of all generated combinations will be reserved into a test file. If 1.0, no test file will be created. *(float)*\n- **for_story:** whether to generate sentences (for Rasa NLU) or stories (for Rasa Core) *(bool)*\n- **n_sub:** number of sentences/stories (incl. test) to be taken as a subsample of all possible combinations of intents x entities x aliases *(int)* (required when generating stories for Rasa Core)\n\n### parse_input(input_path)\nThis function is provided as a facilitator for experimentation purposes. It is the first function called by generate.\n\nGiven an input configuration file, generates:\n\n- a list of intents in the form\n```\n ['intent_sentence_0', 'intent_sentence_1', ...]\n\n e.g. from above:\n ['Could I have the number of @[subject_filter] ~[owners] in @[geo_filter] @[time_filter]?']\n```\n- a dictionnary of entitites in the form\n```\n {'entity_0': ['alternative_00', 'alternative_01', ...],\n 'entity_1': ['alternative_10', 'alternative_11', ...], ...}\n\n e.g. from above:\n {'time_filter': ['this month', 'this year', ...],\n 'geo_filter': ['France', 'Germany', ...], ...}\n```\n- a dictionnary of synonyms in the form\n```\n {'alias_0': ['alternative_00', 'alternative_01', ...],\n 'alias_1': ['alternative_10', 'alternative_11', ...], ...}\n\n e.g. from above:\n {'owners': ['owners', 'possessors']}\n```\n\n## Combination logic\n\n`cross-words` is designed to compute sentences by placing all entities and alias alternative into all intents.\n\nAs a rule of thumb, the overall maximum number of generated sentences is in the order of:\n\nnbintent sentences × avg. nbentity placeholders per intent sentence × avg. nbalternatives per entity × avg. nbalias placeholders per intent sentence × avg. nbalternatives per alias\n\nAs such, the created training files grow exponentially, hence the available *n_sub* parameter in **generate**\n\nIn the specific case of stories (Rasa Core), `cross-words` will also use *information availability* as an additional combination dimension.\n\nFor example, the two stories below are based on a different initially available information set given by the user:\n\n```\n## Genereated Story 257661587723758\n* acquisition{\"time_filter\": \"since release\", \"geo_filter\": \"Germany\"}\n - slot{\"time_filter\": \"since release\"}\n - slot{\"geo_filter\": \"Germany\"}\n - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"owl\"}\n - slot{\"subject_filter\": \"owl\"}\n - action_acquisition\n\n## Genereated Story 877699493192194\n* acquisition{\"time_filter\": \"since release\"}\n - slot{\"time_filter\": \"since release\"}\n - utter_ask_subject_filter\n* acquisition{\"subject_filter\": \"owl\"}\n - slot{\"subject_filter\": \"owl\"}\n - utter_ask_geo_filter\n* acquisition{\"geo_filter\": \"Germany\"}\n - slot{\"geo_filter\": \"Germany\"}\n - action_acquisition \n```\n\n",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/data-chirps/xwords",
"keywords": "",
"license": "MIT",
"maintainer": "",
"maintainer_email": "",
"name": "cross-words",
"package_url": "https://pypi.org/project/cross-words/",
"platform": "",
"project_url": "https://pypi.org/project/cross-words/",
"project_urls": {
"Homepage": "https://github.com/data-chirps/xwords"
},
"release_url": "https://pypi.org/project/cross-words/0.0.2/",
"requires_dist": null,
"requires_python": "",
"summary": "Chat bot sentences & story generator.",
"version": "0.0.2"
},
"last_serial": 3931551,
"releases": {
"0.0.1": [
{
"comment_text": "",
"digests": {
"md5": "b24b29f383fcc06d97f735e5b41221cf",
"sha256": "005794fa77f318f078376e4d9582f34e3abc46760c959f4cfaf4501e8cb29636"
},
"downloads": -1,
"filename": "cross_words-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b24b29f383fcc06d97f735e5b41221cf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 14750,
"upload_time": "2018-05-09T17:32:11",
"url": "https://files.pythonhosted.org/packages/ab/99/ae5ab19feca62e8477b515a4587373e4ec1ff074a78597dc68a9bf223ae2/cross_words-0.0.1-py3-none-any.whl"
}
],
"0.0.2": [
{
"comment_text": "",
"digests": {
"md5": "5ce37ca41c80ef87e6598f4c896a0e76",
"sha256": "f54cfa676f5f7d5fe9bfe9241918086b57c1677f534f491b510d6a919137f0f8"
},
"downloads": -1,
"filename": "cross_words-0.0.2-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "5ce37ca41c80ef87e6598f4c896a0e76",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 15438,
"upload_time": "2018-06-05T10:04:53",
"url": "https://files.pythonhosted.org/packages/63/fd/0af5f56f0dd7f499c54b1b309cb0cfcb234bf9e15592afe36a5204d5b351/cross_words-0.0.2-py2.py3-none-any.whl"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "5ce37ca41c80ef87e6598f4c896a0e76",
"sha256": "f54cfa676f5f7d5fe9bfe9241918086b57c1677f534f491b510d6a919137f0f8"
},
"downloads": -1,
"filename": "cross_words-0.0.2-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "5ce37ca41c80ef87e6598f4c896a0e76",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 15438,
"upload_time": "2018-06-05T10:04:53",
"url": "https://files.pythonhosted.org/packages/63/fd/0af5f56f0dd7f499c54b1b309cb0cfcb234bf9e15592afe36a5204d5b351/cross_words-0.0.2-py2.py3-none-any.whl"
}
]
}