{ "info": { "author": "thu-coai", "author_email": "thu-coai-developer@googlegroups.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "\n# Conversational Toolkits\n\n[![CodeFactor](https://www.codefactor.io/repository/github/thu-coai/cotk/badge)](https://www.codefactor.io/repository/github/thu-coai/cotk)\n[![Coverage Status](https://coveralls.io/repos/github/thu-coai/cotk/badge.svg?branch=master)](https://coveralls.io/github/thu-coai/cotk?branch=master)\n[![Build Status](https://travis-ci.com/thu-coai/cotk.svg?branch=master)](https://travis-ci.com/thu-coai/cotk)\n[![codebeat badge](https://codebeat.co/badges/dc64db27-7e25-4fea-a231-3c9baac916f8)](https://codebeat.co/projects/github-com-thu-coai-cotk-master)\n\n``cotk`` is an open-source lightweight framework for model building and evaluation.\nWe provide standard datasets and evaluation suites in the domain of general language generation.\nIt is easy to use and lets you focus on designing your models!\n\nFeatures included:\n\n* Light-weight, easy to start. 
It does not restrict how you construct your models.\n* Predefined standard datasets, in the domain of language modeling, dialog generation and more.\n* Predefined evaluation suites, test your model with multiple metrics in a few lines.\n* A dashboard to show experiments and compare your models with others' fairly.\n* Long-term maintenance and consistent development.\n\nThis project is a part of ``dialtk`` (Toolkits for Dialog System by Tsinghua University). You can follow [dialtk](http://coai.cs.tsinghua.edu.cn/dialtk/) or [cotk](http://coai.cs.tsinghua.edu.cn/dialtk/cotk/) on our home page.\n\n**Quick links**\n\n* [Tutorial & Documents](https://thu-coai.github.io/cotk_docs/)\n* [Dashboard](http://coai.cs.tsinghua.edu.cn/dashboard/)\n\n**Index**\n\n- [Installation](#installation)\n - [Requirements](#requirements)\n - [Install from pip](#install-from-pip)\n - [Install from source code](#install-from-source-code)\n- [Quick Start](#quick-start)\n - [Dataloader](#dataloader)\n - [Metrics](#metrics)\n - [Publish Experiments](#publish-experiments)\n - [Reproduce Experiments](#reproduce-experiments)\n - [Predefined Models](#predefined-models)\n- [Issues](#issues)\n- [Contributions](#contributions)\n- [Team](#team)\n- [License](#license)\n\n\n\n## Installation\n\n### Requirements\n\n- Python 3\n- numpy >= 1.13\n- nltk >= 3.4\n- tqdm >= 4.30\n- checksumdir >= 1.1\n- pytorch >= 1.0.0 (optional)\n- pytorch-pretrained-bert (optional)\n\n### Install from pip\n\nYou can get the latest stable version from pip using\n\n```bash\n pip install cotk\n```\n\n### Install from source code\n\n* Clone the cotk repository\n\n```bash\n git clone https://github.com/thu-coai/cotk.git\n```\n\n* Install cotk via pip\n\n```bash\n cd cotk\n pip install -e .\n```\n\n* If you want to run the models in ``./models``, you have to additionally install [TensorFlow](https://www.tensorflow.org) or [PyTorch](https://pytorch.org/).\n\n\n\n## Quick Start\n\nLet us skim through the whole package to find what you want. 
\n\n### Dataloader\n\nLoad commonly used datasets and preprocess them for you:\n\n* Download online resources or import from local files\n* Split into training, development and test sets\n* Construct the vocabulary list\n\n```python\n >>> import cotk\n >>> # automatically download online resources\n >>> dataloader = cotk.dataloader.MSCOCO(\"resources://MSCOCO_small\")\n >>> # or download from a url\n >>> dl_url = cotk.dataloader.MSCOCO(\"http://cotk-data.s3-ap-northeast-1.amazonaws.com/mscoco_small.zip#MSCOCO\")\n >>> # or import from local file\n >>> dl_zip = cotk.dataloader.MSCOCO(\"./MSCOCO.zip#MSCOCO\")\n\n >>> print(\"Dataset is split into:\", dataloader.key_name)\n Dataset is split into: ['train', 'dev', 'test']\n```\n\nInspect the vocabulary list\n\n```python\n >>> print(\"Vocabulary size:\", dataloader.vocab_size)\n Vocabulary size: 2588\n >>> print(\"First 10 tokens in vocabulary:\", dataloader.vocab_list[:10])\n First 10 tokens in vocabulary: ['<pad>', '<unk>', '<go>', '<eos>', '.', 'a', 'A', 'on', 'of', 'in']\n```\n\nConvert between ids and strings\n\n```python\n >>> print(\"Convert string to ids\", \\\n ... dataloader.convert_tokens_to_ids([\"<go>\", \"hello\", \"world\", \"<eos>\"]))\n Convert string to ids [2, 1379, 1897, 3]\n >>> print(\"Convert ids to string\", \\\n ... dataloader.convert_ids_to_tokens([2, 1379, 1897, 3]))\n```\n\nIterate over batches\n\n```python\n >>> for data in dataloader.get_batch(\"train\", batch_size=1):\n ...     print(data)\n {'sent':\n array([[ 2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 1, 1099, 4, 3]]),\n # This is an old photo of people and a wagon.\n 'sent_allvocabs':\n array([[ 2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3]]),\n # This is an old photo of people and a horse-drawn wagon.\n 'sent_length': array([14])}\n ......\n```\n\nor use ``while`` if you like\n\n```python\n >>> dataloader.restart(\"train\", batch_size=1)\n >>> while True:\n ...     data = dataloader.get_next_batch(\"train\")\n ...     if data is None: break\n ...     
print(data)\n```\n\n\n**note**: If you want to know more about the dataloader, please refer to [docs](https://thu-coai.github.io/cotk_docs/index.html#model-zoo).\n\n### Metrics\n\nWe found that code released on GitHub contains different versions of the same metric,\nwhich leads to unfair comparisons between models. For example, in ``perplexity``,\nwhether ``unk`` is counted and whether the NLL is averaged across sentences or across tokens\ncan change the result **by several times** and **severely** harm the evaluation.\n\nWe provide a unified implementation of metrics for all models. The metric object\nreceives data in batches.\n\n```python\n >>> metric = cotk.metric.SelfBleuCorpusMetric(dataloader, gen_key=\"gen\")\n >>> metric.forward({\n ... \"gen\":\n ... [[2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3],\n ... [2, 46, 145, 500, 1764, 207, 11, 5, 93, 7, 31, 4, 3]]\n ... })\n >>> print(metric.close())\n {'self-bleu': 0.02206768072402293,\n 'self-bleu hashvalue': 'c206893c2272af489147b80df306ee703e71d9eb178f6bb06c73cb935f474452'}\n```\n\nWe also provide standard metrics for the selected dataloader.\n\n```python\n >>> metric = dataloader.get_inference_metric(gen_key=\"gen\")\n >>> metric.forward({\n ... \"gen\":\n ... [[2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3],\n ... [2, 46, 145, 500, 1764, 207, 11, 5, 93, 7, 31, 4, 3]]\n ... 
})\n >>> print(metric.close())\n {'self-bleu': 0.02206768072402293,\n 'self-bleu hashvalue': 'c206893c2272af489147b80df306ee703e71d9eb178f6bb06c73cb935f474452',\n 'fw-bleu': 0.3831004349785445, 'bw-bleu': 0.025958979254273006, 'fw-bw-bleu': 0.04862323612604027,\n 'fw-bw-bleu hashvalue': '530d449a096671d13705e514be13c7ecffafd80deb7519aa7792950a5468549e',\n 'gen': [\n ['<go>', 'This', 'is', 'an', 'old', 'photo', 'of', 'people', 'and', 'a', 'horse-drawn', 'wagon', '.'],\n ['<go>', 'An', 'old', 'stone', 'castle', 'tower', 'with', 'a', 'clock', 'on', 'it', '.']\n ]}\n```\n\n``Hash value`` is provided for checking whether the same dataset is used.\n\n\n**note**: If you want to know more about metrics, please refer to [docs](https://thu-coai.github.io/cotk_docs/metric.html).\n\n### Publish Experiments\n\nWe provide an online dashboard to manage your experiments.\n\nHere is a simple example.\n\nFirst, initialize a git repo from your command line.\n\n```bash\n git init\n```\n\nThen write your model with an entry function in ``main.py``.\n\n```python\n import cotk.dataloader\n import json\n\n def run():\n     dataloader = cotk.dataloader.MSCOCO(\"resources://MSCOCO_small\")\n     metric = dataloader.get_inference_metric()\n     metric.forward({\n         \"gen\":\n             [[2, 181, 13, 26, 145, 177, 8, 22, 12, 5, 3755, 1099, 4, 3],\n              [2, 46, 145, 500, 1764, 207, 11, 5, 93, 7, 31, 4, 3]]\n     })\n     # write the metric results to the file that ``cotk run`` collects\n     with open(\"result.json\", \"w\") as f:\n         json.dump(metric.close(), f)\n```\n\n\n**note**: The only requirement for your model is that it outputs a file named ``result.json``;\nbeyond that, you can do whatever you want (even load data without using ``cotk``).\n\n\nNext, commit your changes and set the upstream branch from your command line.\n\n```bash\n git add -A\n git commit -a -m \"init\"\n git remote add origin https://github.com/USERNAME/REPONAME.git\n git push -u origin master\n```\n\nFinally, type ``cotk run`` to run your model and upload the results to the cotk dashboard.\n\n``cotk`` will automatically collect your git repo, username, commit and 
``result.json``\nto the cotk dashboard (TO BE ONLINE). The dashboard is a website where you can manage\nyour experiments or share results with others.\n\nIf you don't want to use cotk's dashboard, you can also choose to upload your model\ndirectly to GitHub.\n\nIf you use ``cotk run --only-run`` instead of ``cotk run``, a ``.model_config.json`` file\nis generated. Commit the file and push it to GitHub; others can then automatically download\nyour model in the way described in the next section.\n\n\n**note**: Reproducibility should be maintained by the author. We only make sure that all the input\nis the same, but differences can be introduced by different random seeds, devices or other\nfactors. Before you upload, run ``cotk run --only-run`` twice and check whether the results\nare the same.\n\n### Reproduce Experiments\n\nYou can download others' models from the dashboard\nand try to reproduce their results.\n\n```bash\n cotk download ID\n```\n\nThe ``ID`` is the id shown on the dashboard.\n``cotk`` will download the code from the dashboard and tell you how to run the models.\n\n```none\nINFO: Fetching USERNAME/REPO/COMMIT\n13386B [00:00, 54414.25B/s]\nINFO: Codes from USERNAME/REPO/COMMIT fetched.\nINFO: Model running cmd written in run_model.sh\nModel running cmd: cd ./PATH && cotk run --only-run --entry main\n```\n\nTyping ``cotk run --only-run --entry main`` will reproduce the same experiment.\n\nYou can also download directly from GitHub if the maintainer has set up the ``.model_config.json``.\n\n```bash\n cotk download USER/REPO/COMMIT\n```\n\n``cotk`` will download the code from GitHub and generate commands from the config file.\n\n### Predefined Models\n\n\nWe provide some baselines for classical tasks; see [Model Zoo](https://thu-coai.github.io/cotk_docs/index.html#model-zoo) in the docs for details.\n\nYou can also use ``cotk download thu-coai/MODEL_NAME/master`` to get the code.\n\n## Issues\n\nYou are welcome to create an issue if you want to request a 
feature, report a bug or ask a general question.\n\n## Contributions\n\nWe welcome contributions from the community.\n\n* If you want to make a big change, we recommend first creating an issue with your design.\n* Small contributions can be made directly through a pull request.\n* If you would like to contribute to our library, see the issues to find what we need.\n\n## Team\n\n`cotk` is maintained and developed by the Tsinghua University Conversational AI group (THU-coai). Check our [main page](http://coai.cs.tsinghua.edu.cn/) (in Chinese).\n\n## License\n\nApache License 2.0\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/thu-coai/cotk", "keywords": "", "license": "Apache", "maintainer": "", "maintainer_email": "", "name": "cotk", "package_url": "https://pypi.org/project/cotk/", "platform": "", "project_url": "https://pypi.org/project/cotk/", "project_urls": { "Homepage": "https://github.com/thu-coai/cotk" }, "release_url": "https://pypi.org/project/cotk/0.0.1/", "requires_dist": [ "numpy (>=1.13)", "nltk (>=3.4)", "tqdm (>=4.30)", "checksumdir (>=1.1)", "requests", "torch (>=1.0.0) ; extra == 'develop'", "python-coveralls ; extra == 'develop'", "pytest-dependency ; extra == 'develop'", "pytest-mock ; extra == 'develop'", "requests-mock ; extra == 'develop'", "pytest (>=3.6.0) ; extra == 'develop'", "pytest-cov (==2.4.0) ; extra == 'develop'", "checksumdir ; extra == 'develop'", "pytorch-transformers (>=1.1.0) ; extra == 'develop'" ], "requires_python": ">=3.5", "summary": "Conversational Toolkits", "version": "0.0.1" }, "last_serial": 5730393, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "9b437310f8dcd539bf223e9ea39b7957", "sha256": "73ff5227b6d3e49a1b0ad4ec780e69f33c410e785da18b3955f904f9e8be7d0d" }, "downloads": -1, "filename": "cotk-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": 
"9b437310f8dcd539bf223e9ea39b7957", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 81037, "upload_time": "2019-08-26T12:03:12", "url": "https://files.pythonhosted.org/packages/d1/70/ee2fc09d9d50f243a77ece22090ea2bfe0857a09acf8bb260c0723246f58/cotk-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ce006f220c5bdbf5968ce9b3371998e3", "sha256": "a55eab79a82b28d8ffb0c3ad3958662ccd21559e9c89e364160efcc43cb80635" }, "downloads": -1, "filename": "cotk-0.0.1.tar.gz", "has_sig": false, "md5_digest": "ce006f220c5bdbf5968ce9b3371998e3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 60426, "upload_time": "2019-08-26T12:03:16", "url": "https://files.pythonhosted.org/packages/e7/33/76163f50654fdf6b579110b0ed1d449ec3e832b6f036b2eda1164ff783e3/cotk-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9b437310f8dcd539bf223e9ea39b7957", "sha256": "73ff5227b6d3e49a1b0ad4ec780e69f33c410e785da18b3955f904f9e8be7d0d" }, "downloads": -1, "filename": "cotk-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "9b437310f8dcd539bf223e9ea39b7957", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 81037, "upload_time": "2019-08-26T12:03:12", "url": "https://files.pythonhosted.org/packages/d1/70/ee2fc09d9d50f243a77ece22090ea2bfe0857a09acf8bb260c0723246f58/cotk-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ce006f220c5bdbf5968ce9b3371998e3", "sha256": "a55eab79a82b28d8ffb0c3ad3958662ccd21559e9c89e364160efcc43cb80635" }, "downloads": -1, "filename": "cotk-0.0.1.tar.gz", "has_sig": false, "md5_digest": "ce006f220c5bdbf5968ce9b3371998e3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 60426, "upload_time": "2019-08-26T12:03:16", "url": "https://files.pythonhosted.org/packages/e7/33/76163f50654fdf6b579110b0ed1d449ec3e832b6f036b2eda1164ff783e3/cotk-0.0.1.tar.gz" } ] }