{ "info": { "author": "wac", "author_email": "wuanch@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Data Augmentation for Chinese Text in Python 3\n\n## Usage\n### Two functions are available for Chinese text data augmentation\n\n### Install textda\nInstall with pip:\n\n```bash\npip install textda\n```\n\n1. Expand data with **data_expansion**:\n```python\nfrom textda.data_expansion import *\nprint(data_expansion('\u751f\u6d3b\u91cc\u7684\u60ec\u610f\uff0c\u65e0\u9700\u7b49\u5230\u6625\u6696\u82b1\u5f00')) \n\n```\nOutput:\n\n```python\n['\u751f\u6d3b\u91cc\u9762\u7684\u60ec\u610f\uff0c\u65e0\u9700\u7b49\u5230\u6625\u6696\u82b1\u5f00', \n'\u751f\u6d3b\u91cc\u7684\u7b49\u5230\u6625\u6696\u82b1\u5f00',\n'\u751f\u6d3b\u91cc\u65e0\u9700\u60ec\u610f\uff0c\u7684\u7b49\u5230\u6625\u6696\u82b1\u5f00', \n'\u751f\u6d3b\u91cc\u7684\u60ec\u610f\uff0c\u65e0\u9700\u7b49\u5230\u6625\u6696\u82b1\u5f00', \n'\u751f\u6d3b\u91cc\u7684\u60ec\u610f\uff0c\u5e76\u4e0d\u9700\u8981\u7b49\u5230\u6625\u6696\u82b1\u5f00', \n'\u751f\u6d3b\u65e0\u9700\u7684\u60ec\u610f\uff0c\u91cc\u7b49\u5230\u6625\u6696\u82b1\u5f00', \n'\u751f\u6d3b\u91cc\u7684\u60ec\u610f\uff0c\u7b49\u5230\u65e0\u9700\u6625\u6696\u82b1\u5f00']\n\n```\n\nParameters:\n\n :param sentence: input sentence text\n :param alpha_sr: Synonym replacement control parameter. A larger value means more words are replaced\n :param alpha_ri: Random insertion. A larger value means more words are inserted\n :param alpha_rs: Random swap. A larger value means more words are swapped\n :param p_rd: Random delete. 
bigger means more words are deleted\n :param num_aug: How many times each method is repeated\n\n- Use the parameters alpha_sr, alpha_ri, alpha_rs, p_rd, and num_aug to control the output.\n\n Set alpha_ri and alpha_rs to 0 when the data is for a **linear classifier**, which is insensitive to word position.\n\n For example:\n ```python\n\n from textda.data_expansion import *\n\n print(data_expansion('\u751f\u6d3b\u91cc\u7684\u60ec\u610f\uff0c\u65e0\u9700\u7b49\u5230\u6625\u6696\u82b1\u5f00', alpha_ri=0, alpha_rs=0))\n\n ```\n Output:\n\n ```python\n ['\u751f\u6d3b\u91cc\u7684\u60ec\u610f\uff0c\u65e0\u9700\u7b49\u5230\u6625\u6696\u82b1\u5f00', \n '\uff0c\u65e0\u9700\u6625\u6696\u82b1\u5f00', \n '\u751f\u6d3b\u91cc\u9762\u7684\u60ec\u610f\uff0c\u65e0\u9700\u7b49\u5230\u6625\u6696\u82b1\u5f00', \n '\u751f\u6d3b\u91cc\u7684\u60ec\u610f\uff0c\u9700\u7b49\u5230\u6625\u6696\u82b1\u5f00']\n\n ```\n\n\n\n2. Use **translate_batch** like this:\n\n```python\nimport os\nfrom textda.youdao_translate import *\ndata_dir = './data'\ntranslate_batch(os.path.join(data_dir, 'insurance_train'), batch_num=30)\n\n```\n\n```\n# translate results: chinese->english and english -> 
chinese\n\n\u989c\u8272\u78b0\u6389\u4e86\u4e00\u4e2a\u89d2\u4e0d\u5ef6\u8fdf,\u4f46\u4e8b\u60c5\u6216\u4ed6\u4eec\u4e0d\u8d60\u9001,\u6216\u53d1\u9001,\u7709\u7b14\u6253\u5f00\u5df2\u7ecf\u7834\u788e,\u78e8\u5c71\u6942,\u4e5f\u4e0d\u6253\u7834\u4e00\u53ea\u624b,\u8f7b\u8f7b\u5237\u6389,\u6301\u4e45\u6027\u4e0d\u957f,\n\u8fd9\u4e2a\u7528\u6237\u6ca1\u6709\u586b\u5199\u8bc4\u4ef7\u5185\u5bb9\n\u989c\u8272\u975e\u5e38\u4e0d\u559c\u6b22\u5b83\n\u4e0d\u8bf4\u8bdd,\u7f13\u6162\u7684\u65b0\u9886\u57df\n\u4e0d\u592a\u5bb9\u6613\u67d3\u597d\u9a91\u5417\n\u4e0d\u662f\u5f88\u597d\u6211\u559c\u6b22!\n\u6ca1\u6709\u989c\u8272\u7684\u773c\u5f71\n\u5e94\u8be5\u6709\u5927\u793c\u7269\u76d2\u773c\u5f71,\u793c\u7269\u4e0d\u793c\u7269\u76d2,\u6ca1\u6709\u4e00\u8d77\u7834\u788e\u7c89\u788e\u597d\u7684\u773c\u5f71\u4e0d\u4e70\u793c\u7269\u6e05\u6d01\u5242\u810f\u5c31\u50cf\u5546\u54c1\u662f\u538b\u529b\n\u6ca1\u6709\u751f\u4ea7\u65e5\u671f,\u6211\u4e0d\u77e5\u9053\u662f\u5426\u771f\u5b9e,\u603b\u662f\u89c9\u5f97\u6709\u70b9\u5947\u602a\n\u662f\u4e00\u4e2a\u5c0f\u98de\u7c89\u5417\n\u4f46\u662f\u4e00\u4e9b\u6df7\u5408\u7684\u989c\u8272\n\u6709\u51e0\u6b21,\u73b0\u5728\u8fd9\u4e2a\u4e1c\u897f,\u7b14\u662f\u7a7a\u7684\n\u773c\u5f71\u6709\u70b9\u5c0f,\u5c11\u4e00\u70b9\u3002\n\u4e0d\u597d\u7684\u989c\u8272,\u7c89\u7ea2\u8272\n\u660e\u661f\u4e0d\u60f3\u4e70,\u574f\u4e86,\u4e0d\u5bb9\u6613,\u4e0d\u8981\u5728\u4e4e\u592a\u591a!\n\u4e00\u5f00\u59cb\u6211\u5df2\u7ecf\u8054\u7cfb\u5feb\u9012,\u5feb\u9012\u4e00\u76f4\u62d6,\u8bf4\u4ed6\u5c06\u8fd4\u56de\u5c06\u8054\u7cfb\u5feb\u9012\u670d\u52a1\n\u753b\u4e0d\u662f,\u662f\u4e0d\u597d\u7684\n\u7269\u7406\u548c\u7167\u7247\u6709\u5f88\u5927\u7684\u533a\u522b\n\u4e0d\u8981\u628a\u773c\u5f71\u5237\u4e0d\u662f\u5f88\u65b9\u4fbf\n\u611f\u89c9\u597d\u5e72,\u989c\u8272\u66f4\u6697\n\u6253\u7834\u4e86\u5728\u8fd0\u8f93\u9014\u4e2d,\u6709\u70b9\u592a\u8106\u5f31\u2026\n\u76d2\u5b50\u6709\u70b9\u574f\u4e86,\u8fd8\u6ca1\u6709\u53d1\u9001\u3002\n\n```\n\nparam 
explain\uff1a\n\n :param file_path: source file path\n :param batch_num: default 30\n :param reWrite: default True, which overwrites the file; False appends the data to the end of the file.\n :param suffix: new file suffix\n\n\n\n## Reference:\n\nhttps://github.com/jasonwei20/eda_nlp\n\nCode for the ICLR 2019 Workshop paper: Easy data augmentation techniques for boosting performance on text classification tasks. https://arxiv.org/abs/1901.11196\n\n\n## License\n\n[MIT](./LICENSE)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/wac81/textda", "keywords": "classification,expansion,augmentation,addition,data,text,chinese", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "textda", "package_url": "https://pypi.org/project/textda/", "platform": "", "project_url": "https://pypi.org/project/textda/", "project_urls": { "Homepage": "https://github.com/wac81/textda" }, "release_url": "https://pypi.org/project/textda/0.1.0.6/", "requires_dist": [ "jieba", "synonyms" ], "requires_python": "", "summary": "this is data augmentation for chinese text", "version": "0.1.0.6" }, "last_serial": 5661184, "releases": { "0.1.0.5": [ { "comment_text": "", "digests": { "md5": "03fc5b3a2f57f0a5593e49033780cd79", "sha256": "aaa8b0b93ef5d540fe40a2843cf2251c596b1e0d027f3fc58a42fe220a9d7127" }, "downloads": -1, "filename": "textda-0.1.0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "03fc5b3a2f57f0a5593e49033780cd79", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 13989, "upload_time": "2019-05-29T06:29:03", "url": "https://files.pythonhosted.org/packages/d6/b8/ddba9f383e1d728a945b8c6371ff43662a5fb002d51eb551bc72c6d128b9/textda-0.1.0.5-py2.py3-none-any.whl" } ], "0.1.0.6": [ { "comment_text": "", "digests": { "md5": "8618be2a6192ad5cc0f776ec852d696b", "sha256": 
"28c6baabd9ca539648cb8c8cb68c34bf1dfdfaf4fdeb61638bb6adbd5da2fb34" }, "downloads": -1, "filename": "textda-0.1.0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "8618be2a6192ad5cc0f776ec852d696b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14007, "upload_time": "2019-08-11T06:53:07", "url": "https://files.pythonhosted.org/packages/b3/ed/091104cd0788ee166ecc8b6e4e90b4360a5397355052725e6d42937d97c4/textda-0.1.0.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2457625e6ba1c9cbc9539788420e335b", "sha256": "e3564367c85bd915eede083bcea2537559d209b85c3b1fa5ca6272e800298647" }, "downloads": -1, "filename": "textda-0.1.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "2457625e6ba1c9cbc9539788420e335b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13987, "upload_time": "2019-05-29T06:59:32", "url": "https://files.pythonhosted.org/packages/45/c3/28473db1835202ce6c2f16393273cef29662e84eef662cd108ac82611247/textda-0.1.0.6-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8618be2a6192ad5cc0f776ec852d696b", "sha256": "28c6baabd9ca539648cb8c8cb68c34bf1dfdfaf4fdeb61638bb6adbd5da2fb34" }, "downloads": -1, "filename": "textda-0.1.0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "8618be2a6192ad5cc0f776ec852d696b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14007, "upload_time": "2019-08-11T06:53:07", "url": "https://files.pythonhosted.org/packages/b3/ed/091104cd0788ee166ecc8b6e4e90b4360a5397355052725e6d42937d97c4/textda-0.1.0.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2457625e6ba1c9cbc9539788420e335b", "sha256": "e3564367c85bd915eede083bcea2537559d209b85c3b1fa5ca6272e800298647" }, "downloads": -1, "filename": "textda-0.1.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "2457625e6ba1c9cbc9539788420e335b", "packagetype": "bdist_wheel", 
"python_version": "py3", "requires_python": null, "size": 13987, "upload_time": "2019-05-29T06:59:32", "url": "https://files.pythonhosted.org/packages/45/c3/28473db1835202ce6c2f16393273cef29662e84eef662cd108ac82611247/textda-0.1.0.6-py3-none-any.whl" } ] }