{ "info": { "author": "aitrek", "author_email": "aitrek.zh@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# An Intent Classifier For Chatbot\n\n\n## Introduction \n    \nThe intent recognition is the very key component of a chatbot system. \nWe can recognize a man's intent by what a user speak and the dialog context. \nIt is a very easy daily activity for us human beings, however, it is a very \nhard task for computers. \n\n    \nThe intent recognition is treated as a process of multi-labels classification.\nConcretely speaking, we use words and context as our input, and the output is \nmulti-labels which means a user's words might have multi-intent. \n\n+ input: \n + words: \n Just a string of what user speak. \n Example: \"I wanna known what time is it and how is the weather?\"\n\n + context(optional): \n A json string with kinds of context information. \n Example: '{\"timestamp\": 1553154627, \"location\": \"Shanghai\"}'\n\n The words and context will be transformed to tfidf-vector and dict-vector \n respectively, and then the two vectors will be concatenated to form the final\n input vector.\n\n+ intent labels:\n + multi-labels: \n Labels string separated with \",\", such as \"time,weather\".\n\n + multi-levels: \n Similar intent labels will be put in the same category and form intent \n with multi-levels. \n Example: \n \"news/sport_news/football_news\", \n \"news/sport_news/basketball_news\",\n\n\n## Dataset \n### Intent Dataset\n    \nThe intent dataset can be in any storage, like mysql database or local csv file. \nTwo functions to load dataset from mysql and csv file have been implemented \nin dataset.py. They can be used simply by offering parameters, mysql connection \nconfigure or csv file path. \n    \nIf intent dataset is put in different storage, you could implement function \njust like utils.load_from_mysql/load_from_csv. Just remember that the result \nof the function should be an instance of DataBunch which confined the fields \nof the dataset: \n\n```python\ndef load_from_xxx(xxx_params) -> DataBunch:\n words = []\n contexts = []\n intents = []\n\n # get words, contexts, intents\n ... \n\n return DataBunch(words=np.array(words, dtype=np.str),\n contexts=np.array(contexts, dtype=np.str),\n intents=np.array(intents, dtype=np.str))\n\n```\n### Rules\n    \nRules is essentially a kind of dataset just like intent dataset. They can also be \nstored in any storage and fetched in the same way with intent dataset:\n```python\ndef load_from_xxx(xxx_params) -> DataBunch:\n words_rules = []\n context_rules = []\n intent_labels = []\n\n # get words, contexts, intents\n ... \n\n return RuleBunch(words_rules=words_rules, context_rules=context_rules,\n intent_labels=intent_labels)\n\n```\n\n## Classifiers\nThe classification mechanism consists of rule-based and model-based approaches:\n\n* rule-based classification: \nThe rule-based approach predict intent labels using hand-written regexps and \ncontext keys&values to match the words and context. Unlike model-based \nclassifier, we don't need to train the rule-based classifier but load rules \nfrom storage and then create instance directly with the rules:\n```python\nimport os\n\nfrom intent_classifier import RuleClassifier, ModelClassifier, IntentClassifier\nfrom intent_classifier.dataset import load_intents_from_mysql, load_rules_from_mysql\n\nconfigs_mysql_rule = {\"host\": \"xxx.xxx.xxx.xxx\", \"port\": xxx,\n \"user\": \"xxxx\", \"password\": \"xxxx\",\n \"db\": \"xxxx\", \"table\": \"xxxx\",\n \"customer\": \"xxxx\"}\nrule_bunch = load_rules_from_mysql(configs_mysql_rule)\n\nrule_classifier = RuleClassifier(rule_bunch)\n```\n\n* model-based classification: \nThe model-based approach predict intent labels using machine learning models \ntrainded from intent dataset. Suppose that we have trained and dumped models, \nwe can use the models in this way:\n```python\nfolder = \"xxx/xxx/xxx\"\nmodel_classifier = ModelClassifier(folder=folder, customer=\"xxx\", lang=\"en\", \n ner=None, n_jobs=-1)\nmodel_classifier.load(clf_id=None) # load models with maximum id\n```\n\n* IntentClassifier: \nIntentClassifier wraps RuleClassifier and ModelClassifier to offer a final\nintegration interface to predict intent labels. We can use just RuleClassifier \nor ModelClassifier to initialize IntentClassifer or use both them. If the two \nclassifiers are in use, the ModelClassifier might be skipped if we have already \ngot intent labels from the RuleClassifier. \nPlease note that we should try not use to many rules to predict the intent \nlabels, the best way of which is to use model-based classifier. The rule-based \nclassifier is an option only for very simple case no need of model or as a \ntemporary solution when model is not ready, for instance we need retrain the \nmodel to add some new intents.\n\n\n## Training \nLoad intent dataset, create an instance of ModelClassifier and then fit \nthe dataset. \n```python\n\nconfigs_mysql_model = {\"host\": \"xxx.xxx.xxx.xxx\", \"port\": xxx,\n \"user\": \"xxxx\", \"password\": \"xxxx\",\n \"db\": \"xxxx\", \"table\": \"xxxx\",\n \"customer\": \"xxxx\"}\ndata_bunch = load_intents_from_mysql(configs_mysql_model)\n\nfolder = os.path.join(os.getcwd(), \"models\")\nmodel_classifier = ModelClassifier(folder=folder, customer=\"xxx\", lang=\"en\", \n ner=None, n_jobs=-1)\nmodel_classifier.fit(data_bunch)\n```\nNote that the param \"ner\" in Intent is for named entity recognition, which is \noptional to offer entity information in words.\n\n\n## Save Models \nAfter finishing the fitting, run dump() to save the models in a sub-folder \nwith name from datatime in the specified model folder. The models and report \nwill be save in the sub-folder with name \"intent.model\" and \"report.txt\" \nrespectively. \n```python\nmodel_classifier.dump()\n```\n\n\n## Load Models \nRun load with specific clf_id or with default clf_id to load the most recent \nmodels.\n```python\nmodel_classifier.load(clf_id=\"20190321113421\") # if clf_id is None, the model \n # with maximum id will be loaded\n```\n\n## Create IntentClassifier\n```python\nintent_classifier = IntentClassifier(rule_classifier=rule_classifier,\n model_classifier=model_classifier)\n```\n\n\n## Predict \nPredict intent labels using predict() with words and contexts. The returned will \nbe as list of intent labels.\n```python\nintent_labels = intent_classifier.predict(\n word=\"I wanna known what time is it and how is the weather?\",\n context={\"timestamp\": 1553154627, \"location\": \"Shanghai\"}\n)\n```\n\n\n## Requirements \n+ Python 3.7\n+ numpy\n+ scipy\n+ pandas\n+ scikit-learn==0.20.3\n+ joblib\n+ pymysql (optional, for dataset from mysql)\n+ jieba (optional, for Chinese tokenization)\n\n## Installation \npip install -e git+https://github.com/aitrek/intent_classifier.git\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/aitrek/intent_classifier", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "intent-classifier", "package_url": "https://pypi.org/project/intent-classifier/", "platform": "", "project_url": "https://pypi.org/project/intent-classifier/", "project_urls": { "Homepage": "https://github.com/aitrek/intent_classifier" }, "release_url": "https://pypi.org/project/intent-classifier/0.2.1/", "requires_dist": [ "numpy", "scipy", "PyMySQL", "scikit-learn (==0.20.3)" ], "requires_python": "", "summary": "An Intent Classifier For Chatbot", "version": "0.2.1" }, "last_serial": 5001397, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "469323c638066f07544c3d0bec51d647", "sha256": "efbd747e143c8764d3a0bec19867ed05cc94b48fc330c395b2c7f0eb66284f36" }, "downloads": -1, "filename": "intent_classifier-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "469323c638066f07544c3d0bec51d647", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 15091, "upload_time": "2019-03-29T02:40:57", "url": "https://files.pythonhosted.org/packages/14/18/c6e7a7326fc16f8a1624fb42ae1900ac98c5b48488ae23b50d476c3dde52/intent_classifier-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e6bd3b29f45c31890863a2e3c0f7efc3", "sha256": "b97d8c8777c9145c6477194ad989ef25b22270b756c4fa48bd9b6e767a857762" }, "downloads": -1, "filename": "intent_classifier-0.1.0.tar.gz", "has_sig": false, "md5_digest": "e6bd3b29f45c31890863a2e3c0f7efc3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9068, "upload_time": "2019-03-29T02:40:59", "url": "https://files.pythonhosted.org/packages/64/7b/085296a9b160357557654f6267445944360eec3af871b02322cf84768a37/intent_classifier-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "c7f0127a126a994533ca41cbbf9bd1d9", "sha256": "83e5d9ef54001dda3184b5cf60a940932a498c9818464977f6e72d8759aaf1aa" }, "downloads": -1, "filename": "intent_classifier-0.2.0-py3.6.egg", "has_sig": false, "md5_digest": "c7f0127a126a994533ca41cbbf9bd1d9", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 36635, "upload_time": "2019-03-29T02:27:40", "url": "https://files.pythonhosted.org/packages/2f/26/bd93570a2e0ba2deea13f41977e7b98b17a1bc006b3f0aa3abff81eb63ea/intent_classifier-0.2.0-py3.6.egg" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "f28f97f76bd671a3e7ca63fafdc08173", "sha256": "519e87384bb51e9af0711a0782df5db52efc40f42d73b1bf9e49295a4c63e552" }, "downloads": -1, "filename": "intent_classifier-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f28f97f76bd671a3e7ca63fafdc08173", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17464, "upload_time": "2019-03-29T02:27:07", "url": "https://files.pythonhosted.org/packages/c4/10/10dd89c3d3f5aab639d0356ddc4394b3ab849c6f3e149732aa6a623f378b/intent_classifier-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c7613c8b3c91315f71e1ceb626a2709e", "sha256": "aae3805bf484927cc0bb9b5f37ffb59a7ece9cb1c3018557383e645b83104d1e" }, "downloads": -1, "filename": "intent_classifier-0.2.1.tar.gz", "has_sig": false, "md5_digest": "c7613c8b3c91315f71e1ceb626a2709e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13801, "upload_time": "2019-03-29T02:28:42", "url": "https://files.pythonhosted.org/packages/94/4f/2a8e1714017b8f83a10f46ea6a987859092300869d0b5434adea9d56fdae/intent_classifier-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f28f97f76bd671a3e7ca63fafdc08173", "sha256": "519e87384bb51e9af0711a0782df5db52efc40f42d73b1bf9e49295a4c63e552" }, "downloads": -1, "filename": "intent_classifier-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f28f97f76bd671a3e7ca63fafdc08173", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17464, "upload_time": "2019-03-29T02:27:07", "url": "https://files.pythonhosted.org/packages/c4/10/10dd89c3d3f5aab639d0356ddc4394b3ab849c6f3e149732aa6a623f378b/intent_classifier-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c7613c8b3c91315f71e1ceb626a2709e", "sha256": "aae3805bf484927cc0bb9b5f37ffb59a7ece9cb1c3018557383e645b83104d1e" }, "downloads": -1, "filename": "intent_classifier-0.2.1.tar.gz", "has_sig": false, "md5_digest": "c7613c8b3c91315f71e1ceb626a2709e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13801, "upload_time": "2019-03-29T02:28:42", "url": "https://files.pythonhosted.org/packages/94/4f/2a8e1714017b8f83a10f46ea6a987859092300869d0b5434adea9d56fdae/intent_classifier-0.2.1.tar.gz" } ] }