{ "info": { "author": "cocodrips", "author_email": "cocodrips@gmail.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 3" ], "description": "# Negima\n\nNegima is a Python package that extracts phrases from Japanese text using part-of-speech-based rules that you define.\n\n*Negima\u306f\u65e5\u672c\u8a9e\u306e\u6587\u7ae0\u306e\u4e2d\u304b\u3089\u5b9a\u7fa9\u3057\u305f\u54c1\u8a5e\u306e\u30eb\u30fc\u30eb\u306b\u3042\u3066\u306f\u307e\u308b\u30d5\u30ec\u30fc\u30ba\u3092\u62bd\u51fa\u3059\u308bPython\u30d1\u30c3\u30b1\u30fc\u30b8\u3067\u3059\u3002*\n\n\n## Installing\n\nInstall and update using pip:\n\n```bash\n$ pip install -U negima\n```\n\nInstall using `setup.py`:\n\n```bash\n$ python setup.py install\n```\n\n\n## Dependencies\n\n- `mecab`: http://taku910.github.io/mecab/\n\n\n## A Simple Example\n\nsample.py\n\n```python\nfrom negima import MorphemeMerger\nmm = MorphemeMerger()\n# csv\nmm.set_rule_from_csv('rules/1_noun.csv')\n# tsv\n# mm.set_rule_from_csv('rules/1_noun.tsv', sep='\\t')\n# excel\n# mm.set_rule_from_excel('rules/rules.xlsx', sheet_name='1_noun')\n\nwords, _ = mm.get_rule_pattern('\u4eca\u65e5\u306f\u3044\u3044\u5929\u6c17')\nprint(words)\n```\n\n```bash\n$ python sample.py\n['\u4eca\u65e5', '\u5929\u6c17']\n```\n\n## Rule\n\nYou can define rules in CSV, TSV, or Excel format. \nA rule file requires the following 9 columns. \nEach row defines the part of speech of one morpheme.\n\n\n*\u30eb\u30fc\u30eb\u306fcsv, tsv, excel\u30d5\u30a1\u30a4\u30eb\u306e\u5f62\u5f0f\u3067\u5b9a\u7fa9\u3059\u308b\u3053\u3068\u304c\u3067\u304d\u307e\u3059\u3002 \n\u30eb\u30fc\u30eb\u306b\u306f\u4ee5\u4e0b\u306e9\u7a2e\u306e\u30ab\u30e9\u30e0\u304c\u5fc5\u8981\u306b\u306a\u308a\u307e\u3059\u3002\u307e\u305f\u30011\u884c\u306b\u306f1\u5f62\u614b\u7d20\u306e\u54c1\u8a5e\u306e\u60c5\u5831\u3092\u5b9a\u7fa9\u3057\u307e\u3059\u3002*\n\n\n- id\n - A non-empty id marks the start of a new rule. 
\n *id\u304c\u7a7a\u3067\u306a\u3051\u308c\u3070\u3001\u30eb\u30fc\u30eb\u306e\u30b9\u30bf\u30fc\u30c8\u3092\u793a\u3059*\n - Each id must be unique. \n *id\u306f\u30e6\u30cb\u30fc\u30af\u3067\u3042\u308b\u5fc5\u8981\u304c\u3042\u308b*\n - Rules are applied in ascending order of id (ids are compared as UTF-8 strings, not as byte arrays). \n e.g., id:000_XXX takes priority over id:999_ZZZ \n *id\u306f\u6587\u5b57\u5217\u3068\u3057\u3066sort\u3055\u308c\u3066\u5c0f\u3055\u3044\u9806\u306b\u305d\u306e\u30eb\u30fc\u30eb\u306e\u512a\u5148\u5ea6\u304c\u5b9a\u7fa9\u3055\u308c\u308b \n \u4f8b: id:000_XXX\u306e\u30eb\u30fc\u30eb\u306fid:999_ZZZ\u306e\u30eb\u30fc\u30eb\u3088\u308a\u3082\u512a\u5148\u5ea6\u304c\u9ad8\u3044*\n- min\n - Minimum number of repetitions. 0 makes the morpheme optional. \n *\u5f62\u614b\u7d20\u306e\u6700\u5c0f\u7e70\u308a\u8fd4\u3057\u56de\u6570\u30020\u306b\u8a2d\u5b9a\u3059\u308b\u3068\u305d\u306e\u30d1\u30fc\u30c4\u306f\u3042\u3063\u3066\u3082\u306a\u304f\u3066\u3082\u826f\u3044*\n - default=1\n- max\n - Maximum number of repetitions. \n *\u5f62\u614b\u7d20\u306e\u6700\u5927\u7e70\u308a\u8fd4\u3057\u56de\u6570*\n - default=1\n- pos0, pos1, pos2, pos3, pos4, pos5\n - Part-of-speech and conjugation names of morphemes, as output by MeCab. \n *mecab\u3067parse\u3055\u308c\u305f\u5f62\u614b\u7d20\u306e\u54c1\u8a5e\u3084\u6d3b\u7528\u306e\u540d\u524d*\n - pos0: \u8868\u5c64, top-level part of speech (e.g., \u540d\u8a5e, noun)\n - pos1: \u54c1\u8a5e1, part-of-speech subcategory 1 (e.g., \u526f\u8a5e\u53ef\u80fd, adverbial)\n - pos2: \u54c1\u8a5e2, part-of-speech subcategory 2\n - pos3: \u54c1\u8a5e3, part-of-speech subcategory 3\n - pos4: \u6d3b\u75281, conjugation 1\n - pos5: \u6d3b\u75282, conjugation 2\n - To express an OR condition, join parts of speech with `|` (e.g., `\u4e00\u822c|\u30b5\u5909\u63a5\u7d9a`). \n *`|`\u3067\u54c1\u8a5e\u3092\u63a5\u7d9a\u3059\u308b\u3053\u3068\u3067OR\u6761\u4ef6\u306e\u5b9a\u7fa9\u304c\u53ef\u80fd\u3067\u3042\u308b*\n\n\nYou can add arbitrary columns to your rule file. 
Any columns beyond these nine are ignored.\nFor example, `rule/3_independent_phrase.csv` adds an `example` column that holds a sample phrase matching each rule.\n\n*\u4e0a\u8a18\u4ee5\u5916\u306b\u3082\u4efb\u610f\u306e\u5217\u306e\u8ffd\u52a0\u304c\u53ef\u80fd\u3067\u3059\u3002 \n`rule/3_independent_phrase.csv`\u3067\u306f`example`\u3068\u3044\u3046\u5217\u3092\u8ffd\u52a0\u3057\u3001\u30eb\u30fc\u30eb\u306b\u3042\u3066\u306f\u307e\u308b\u30b5\u30f3\u30d7\u30eb\u3092\u8a18\u8ff0\u3057\u3066\u3044\u307e\u3059\u3002*\n\n\n\n### Simple rule (csv)\n\nA rule to extract compound nouns.\n*\u3053\u306e\u3088\u3046\u306a\u30eb\u30fc\u30eb\u3092\u5b9a\u7fa9\u3059\u308b\u3053\u3068\u3067\u3001\u8907\u5408\u540d\u8a5e\u3092\u62bd\u51fa\u3067\u304d\u307e\u3059*\n\n|id|min|max|pos0|pos1|pos2|pos3|pos4|pos5|\n|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n|1|0|2|\u63a5\u982d\u8a5e||||||\n| |1|4|\u540d\u8a5e|\u4e00\u822c\\|\u30b5\u5909\u63a5\u7d9a\\|\u6570|||||\n| |0|2|\u540d\u8a5e|\u63a5\u5c3e|||||\n\n\n**Caution**\n*Don't insert empty rows between rules.*\n\n\n**\u6ce8\u610f**\n*\u30eb\u30fc\u30eb\u540c\u58eb\u306e\u9593\u306b\u7a7a\u884c\u3092\u306f\u3055\u307e\u306a\u3044\u3088\u3046\u306b\u3059\u308b\u3053\u3068*\n\n### Rule samples\n\n#### rule/1_noun.csv\nExtract nouns. \n*\u540d\u8a5e\u306e\u62bd\u51fa* \n\n- `\u7d045000\u4eba\u304c\u56fd\u7acb\u7af6\u6280\u5834\u306b\u99c6\u3051\u3064\u3051\u305f` -> `5000` `\u4eba` `\u56fd\u7acb` `\u7af6\u6280` `\u5834`\n- `\u5834\u6240\u304c\u308f\u304b\u308a\u306b\u304f\u3044\u306e\u3067\u305f\u3069\u308a\u7740\u3051\u306a\u304b\u3063\u305f` -> `\u5834\u6240`\n\n#### rule/2_nouns.csv\nExtract compound nouns. 
\n*\u8907\u5408\u540d\u8a5e\u306e\u62bd\u51fa* \n\n- `\u7d045000\u4eba\u304c\u56fd\u7acb\u7af6\u6280\u5834\u306b\u99c6\u3051\u3064\u3051\u305f` -> `\u7d045000\u4eba` `\u56fd\u7acb\u7af6\u6280\u5834` \n- `\u5834\u6240\u304c\u308f\u304b\u308a\u306b\u304f\u3044\u306e\u3067\u305f\u3069\u308a\u7740\u3051\u306a\u304b\u3063\u305f` -> `\u5834\u6240`\n\n\n#### rule/3_independent_phrase.csv\nExtract slightly more complex phrases that include adjectives or the negation \u300c\u306a\u3044\u300d. \n*\u5f62\u5bb9\u8a5e\u3084\u5426\u5b9a\u306e\u300c\u306a\u3044\u300d\u3092\u542b\u3093\u3060\u5c11\u3057\u8907\u96d1\u306a\u30eb\u30fc\u30eb\u306e\u30d5\u30ec\u30fc\u30ba\u306e\u62bd\u51fa* \n\n- `\u65b0\u4eba\u7814\u4fee\u306e\u30ec\u30d9\u30eb\u306f\u9ad8\u3044` -> `\u65b0\u4eba\u7814\u4fee` `\u30ec\u30d9\u30eb\u306f\u9ad8\u3044`\n- `\u3042\u306e\u30b5\u30a4\u30c8\u306f\u30db\u30c6\u30eb\u306e\u6bd4\u8f03\u304c\u3057\u3084\u3059\u304f\u306a\u3044\u306e\u3067\u597d\u304d\u3067\u306f\u306a\u3044` -> `\u30b5\u30a4\u30c8` `\u30db\u30c6\u30eb` `\u6bd4\u8f03\u304c\u3057\u3084\u3059\u304f\u306a\u3044` `\u597d\u304d\u3067\u306f\u306a\u3044`\n\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cocodrips/negima", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "negima", "package_url": "https://pypi.org/project/negima/", "platform": "", "project_url": "https://pypi.org/project/negima/", "project_urls": { "Homepage": "https://github.com/cocodrips/negima" }, "release_url": "https://pypi.org/project/negima/0.1.3/", "requires_dist": [ "mecab-python3 (>=0.7)", "pandas (>=0.19)", "xlrd (>=1.1.0)", "pytest (>=3); extra == 'dev'" ], "requires_python": ">=3.4", "summary": "Extract phrases from Japanese text using part-of-speech-based rules that you define.", "version": "0.1.3" }, "last_serial": 4185390, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": 
"7de0b97fb925823619b5603fac0d73f0", "sha256": "c1e9eb3994589cc0c84d47602ed6b94df35088972099dd38c934055e8d882101" }, "downloads": -1, "filename": "negima-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "7de0b97fb925823619b5603fac0d73f0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5741, "upload_time": "2018-08-16T13:06:20", "url": "https://files.pythonhosted.org/packages/6e/89/fcb669564857d3eecbae880af70fd77da8ee52423b4099970a5a05b225ca/negima-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "161d620d2783d6de94ec97883a854402", "sha256": "16eed5af868b9b71b251e444fc7329e81da25ef8e92b44eb0582d5cfe8938cdf" }, "downloads": -1, "filename": "negima-0.1.1.tar.gz", "has_sig": false, "md5_digest": "161d620d2783d6de94ec97883a854402", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4822, "upload_time": "2018-08-16T13:06:22", "url": "https://files.pythonhosted.org/packages/ed/6a/42ef6449c72439595c420261875e3ce4ab30a30d37a9fcb0eea754918329/negima-0.1.1.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "59568855dcbb90416c16c2e19c70f6dc", "sha256": "51a226b88ab0038dae9679de3c5dd1aee02f7c0de3955357187aecc9e60aa0dd" }, "downloads": -1, "filename": "negima-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "59568855dcbb90416c16c2e19c70f6dc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 7358, "upload_time": "2018-08-19T14:42:30", "url": "https://files.pythonhosted.org/packages/c9/6e/8a416d9cb6c3f479404e3d48a0d32a326de09745550658ba0989aac8acfb/negima-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3a56cda731667064800cbed41fa807c3", "sha256": "70d13d89c63cde2d978349debaba69f1b2baac96f3c2d9e9600ed1fe15a2c350" }, "downloads": -1, "filename": "negima-0.1.3.tar.gz", "has_sig": false, "md5_digest": "3a56cda731667064800cbed41fa807c3", "packagetype": "sdist", "python_version": "source", 
"requires_python": ">=3.4", "size": 6726, "upload_time": "2018-08-19T14:42:32", "url": "https://files.pythonhosted.org/packages/dd/e2/920889fc985905e54c0dc5a4f51f87f71b743398714a126fcca70cc653a1/negima-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "59568855dcbb90416c16c2e19c70f6dc", "sha256": "51a226b88ab0038dae9679de3c5dd1aee02f7c0de3955357187aecc9e60aa0dd" }, "downloads": -1, "filename": "negima-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "59568855dcbb90416c16c2e19c70f6dc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 7358, "upload_time": "2018-08-19T14:42:30", "url": "https://files.pythonhosted.org/packages/c9/6e/8a416d9cb6c3f479404e3d48a0d32a326de09745550658ba0989aac8acfb/negima-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3a56cda731667064800cbed41fa807c3", "sha256": "70d13d89c63cde2d978349debaba69f1b2baac96f3c2d9e9600ed1fe15a2c350" }, "downloads": -1, "filename": "negima-0.1.3.tar.gz", "has_sig": false, "md5_digest": "3a56cda731667064800cbed41fa807c3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 6726, "upload_time": "2018-08-19T14:42:32", "url": "https://files.pythonhosted.org/packages/dd/e2/920889fc985905e54c0dc5a4f51f87f71b743398714a126fcca70cc653a1/negima-0.1.3.tar.gz" } ] }