{ "info": { "author": "haminiku", "author_email": "haminiku1129@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "TF-IDF calculator system of Japanese language\n====================================================\n\nA TF-IDF calculator for Japanese text, written in Python.\n\nJapanese documentation: `White Paper`_\n\nFeatures\n--------\n- TF-IDF from Web\n\nInstallation\n-----------------\n\n.. code-block:: bash\n\n $ pip install simple_tfidf_japanese\n\nSample Code\n-----------------\n\n.. code-block:: python\n\n # Get TF-IDF from text\n from simple_tfidf_japanese.tfidf import TFIDF\n text = \"\u8089\u30d5\u30a7\u30b9NIIGATA\u3067\u8089\u4e09\u6627\u306e\u591c\u3054\u306f\u3093\u2764\ufe0e\u30b9\u30c6\u30fc\u30ad\u30cf\u30a6\u30b9\u3042\u3065\u307e\u3055\u3093\u306e\u96ea\u5ba4\u719f\u6210\u65b0\u6f5f\u770c\u7523\u725b\u30b9\u30c6\u30fc\u30ad\u304a\u3044\u3057\u3044*\\(^o^)/*\u304a\u5869\u3067\u3082\u30ef\u30b5\u30d3\u3067\u3082\u3074\u3063\u305f\u308a\uff01\"\n tfidf1 = TFIDF.gen(text, enable_one_char=1)\n for key, value in tfidf1:\n print key, value\n\n >>> \u8089 0.0952380952381\n >>> \u30b9\u30c6\u30fc\u30ad 0.0952380952381\n >>> \u304a 0.047619047619\n >>> \u3054\u306f\u3093 0.047619047619\n >>> \u96ea 0.047619047619\n >>> \u65b0\u6f5f 0.047619047619\n >>> \u719f\u6210 0.047619047619\n ...\n\n # Get TF-IDF from a web page\n url = \"https://ja.wikipedia.org/wiki/%E6%B7%A1%E8%B7%AF%E3%83%93%E3%83%BC%E3%83%95\"\n tfidf2 = TFIDF.gen_web(url)\n for key, value in tfidf2:\n print key, value\n\n >>> \u6de1\u8def 0.0453257790368\n >>> \u30d3\u30fc\u30d5 0.0396600566572\n >>> \u4f46\u99ac 0.0198300283286\n >>> \u6de1\u8def\u5cf6 0.0169971671388\n >>> \u30da\u30fc\u30b8 0.0169971671388\n >>> \u8868\u793a 0.014164305949\n\n # Compute the cosine similarity of two TF-IDF vectors\n tfidf1 = 
[['Apple', 1], ['Orange', 2], ['Banana', 1], ['Kiwi', 0]]\n tfidf2 = [['Apple', 1], ['Orange', 0], ['Banana', 2], ['Kiwi', 1]]\n print TFIDF.similarity(tfidf1, tfidf2)\n >>> 0.5\n ...\n\nSample Code2\n-----------------\n\n.. code-block:: python\n\n from simple_tfidf_japanese.tfidf import TFIDF\n\n # \u5c71\u672c\u660c\n _base_url = \"https://ja.wikipedia.org/wiki/%E5%B1%B1%E6%9C%AC%E6%98%8C\"\n\n # Pages to compare against\n data = [\n ['\u30e4\u30af\u30eb\u30c8', 'https://ja.wikipedia.org/wiki/%E6%9D%B1%E4%BA%AC%E3%83%A4%E3%82%AF%E3%83%AB%E3%83%88%E3%82%B9%E3%83%AF%E3%83%AD%E3%83%BC%E3%82%BA'],\n ['\u5de8\u4eba', 'https://ja.wikipedia.org/wiki/%E8%AA%AD%E5%A3%B2%E3%82%B8%E3%83%A3%E3%82%A4%E3%82%A2%E3%83%B3%E3%83%84'],\n ['\u962a\u795e', 'https://ja.wikipedia.org/wiki/%E9%98%AA%E7%A5%9E%E3%82%BF%E3%82%A4%E3%82%AC%E3%83%BC%E3%82%B9'],\n ['\u5e83\u5cf6', 'https://ja.wikipedia.org/wiki/%E5%BA%83%E5%B3%B6%E6%9D%B1%E6%B4%8B%E3%82%AB%E3%83%BC%E3%83%97'],\n ['\u4e2d\u65e5', 'https://ja.wikipedia.org/wiki/%E4%B8%AD%E6%97%A5%E3%83%89%E3%83%A9%E3%82%B4%E3%83%B3%E3%82%BA'],\n ['\u6a2a\u6d5c', 'https://ja.wikipedia.org/wiki/%E6%A8%AA%E6%B5%9CDeNA%E3%83%99%E3%82%A4%E3%82%B9%E3%82%BF%E3%83%BC%E3%82%BA'],\n ['\u30bd\u30d5\u30d0\u30f3', 'https://ja.wikipedia.org/wiki/%E7%A6%8F%E5%B2%A1%E3%82%BD%E3%83%95%E3%83%88%E3%83%90%E3%83%B3%E3%82%AF%E3%83%9B%E3%83%BC%E3%82%AF%E3%82%B9'],\n ['\u65e5\u30cf\u30e0', 'https://ja.wikipedia.org/wiki/%E5%8C%97%E6%B5%B7%E9%81%93%E6%97%A5%E6%9C%AC%E3%83%8F%E3%83%A0%E3%83%95%E3%82%A1%E3%82%A4%E3%82%BF%E3%83%BC%E3%82%BA'],\n ['\u30ed\u30c3\u30c6', 'https://ja.wikipedia.org/wiki/%E5%8D%83%E8%91%89%E3%83%AD%E3%83%83%E3%83%86%E3%83%9E%E3%83%AA%E3%83%BC%E3%83%B3%E3%82%BA'],\n ['\u897f\u6b66', 'https://ja.wikipedia.org/wiki/%E5%9F%BC%E7%8E%89%E8%A5%BF%E6%AD%A6%E3%83%A9%E3%82%A4%E3%82%AA%E3%83%B3%E3%82%BA'],\n ['\u30aa\u30ea\u30c3\u30af\u30b9', 
'https://ja.wikipedia.org/wiki/%E3%82%AA%E3%83%AA%E3%83%83%E3%82%AF%E3%82%B9%E3%83%BB%E3%83%90%E3%83%95%E3%82%A1%E3%83%AD%E3%83%BC%E3%82%BA'],\n ['\u697d\u5929', 'https://ja.wikipedia.org/wiki/%E6%9D%B1%E5%8C%97%E6%A5%BD%E5%A4%A9%E3%82%B4%E3%83%BC%E3%83%AB%E3%83%87%E3%83%B3%E3%82%A4%E3%83%BC%E3%82%B0%E3%83%AB%E3%82%B9'],\n ['\u30b5\u30c3\u30ab\u30fc\u65e5\u672c\u4ee3\u8868', 'https://ja.wikipedia.org/wiki/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC%E6%97%A5%E6%9C%AC%E4%BB%A3%E8%A1%A8'],\n ]\n\n # Compute similarity\n result = TFIDF.some_similarity(_base_url, data)\n\n # Show results, highest similarity first\n result.sort(key=lambda x: x[2], reverse=True)\n for title, url, value in result:\n print title, value\n\n \"\"\"\n \u5de8\u4eba 0.437053886215\n \u30e4\u30af\u30eb\u30c8 0.399745780763\n \u962a\u795e 0.383247816027\n \u5e83\u5cf6 0.356147904333\n \u30ed\u30c3\u30c6 0.351312791912\n \u4e2d\u65e5 0.344772305253\n \u6a2a\u6d5c 0.334360056622\n \u65e5\u30cf\u30e0 0.326226324436\n \u30aa\u30ea\u30c3\u30af\u30b9 0.317250711462\n \u30bd\u30d5\u30d0\u30f3 0.285703674673\n \u897f\u6b66 0.283181229507\n \u697d\u5929 0.275111280558\n \u30b5\u30c3\u30ab\u30fc\u65e5\u672c\u4ee3\u8868 0.177026402257\n \"\"\"\n\n\n.. 
_`White Paper`: http://qiita.com/haminiku/items/4106395b9580fbd6edf2", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/subc/simple_tfidf_japanese", "keywords": "tfidf,TF-IDF,Japanese", "license": "MIT License", "maintainer": null, "maintainer_email": null, "name": "simple_tfidf_japanese", "package_url": "https://pypi.org/project/simple_tfidf_japanese/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/simple_tfidf_japanese/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/subc/simple_tfidf_japanese" }, "release_url": "https://pypi.org/project/simple_tfidf_japanese/0.1.3/", "requires_dist": null, "requires_python": null, "summary": "Japanese TF-IDF", "version": "0.1.3" }, "last_serial": 1777691, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9f150ebdaba338ab3a9049b798ea7aa2", "sha256": "79ddf8eeff336fe7e2167f5322136a1f37ed7407f83a6e93441f24e94b9bb5a7" }, "downloads": -1, "filename": "simple_tfidf_japanese-0.1.0.tar.gz", "has_sig": false, "md5_digest": "9f150ebdaba338ab3a9049b798ea7aa2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3848, "upload_time": "2015-10-20T07:51:34", "url": "https://files.pythonhosted.org/packages/24/9b/f6d83083ded54e654228ca86c93b1f5e6144fd3713c30c902e53aab6de12/simple_tfidf_japanese-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "04eb2e3f3e1d2b23a2481fa544cda9ac", "sha256": "0a81cbdce64f8fd50d130a5a35aa69351a57bd58719c4d2a2a1b4aafdbb7f7b4" }, "downloads": -1, "filename": "simple_tfidf_japanese-0.1.1.tar.gz", "has_sig": false, "md5_digest": "04eb2e3f3e1d2b23a2481fa544cda9ac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3853, "upload_time": "2015-10-20T07:59:53", "url": 
"https://files.pythonhosted.org/packages/7a/80/59993c0056afd613154b935d62e887b509e25601e50a164a640f2f6a7540/simple_tfidf_japanese-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "590187f6ac6a342152a00f85929267a4", "sha256": "3d007378ec9b5c5b4e907263c2d5546c1da58944979161f8de8e747ac82255ec" }, "downloads": -1, "filename": "simple_tfidf_japanese-0.1.2.tar.gz", "has_sig": false, "md5_digest": "590187f6ac6a342152a00f85929267a4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4889, "upload_time": "2015-10-20T08:07:19", "url": "https://files.pythonhosted.org/packages/b3/d1/33ec6e7affc97c978c59c7c86636760d25084424d37e1d46518699f9cab1/simple_tfidf_japanese-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "82a1089bd19d9bfe5757bcb8a3877432", "sha256": "46a32a42d4dfff96255cb88336c7d4c85b8c397dea19d53a9453e4f06e236180" }, "downloads": -1, "filename": "simple_tfidf_japanese-0.1.3.tar.gz", "has_sig": false, "md5_digest": "82a1089bd19d9bfe5757bcb8a3877432", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4892, "upload_time": "2015-10-20T08:45:33", "url": "https://files.pythonhosted.org/packages/8b/31/a15436f4e233a15e578f071f3c6a620ad70fd922f8dfe9a9606745a5f2c1/simple_tfidf_japanese-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "82a1089bd19d9bfe5757bcb8a3877432", "sha256": "46a32a42d4dfff96255cb88336c7d4c85b8c397dea19d53a9453e4f06e236180" }, "downloads": -1, "filename": "simple_tfidf_japanese-0.1.3.tar.gz", "has_sig": false, "md5_digest": "82a1089bd19d9bfe5757bcb8a3877432", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4892, "upload_time": "2015-10-20T08:45:33", "url": "https://files.pythonhosted.org/packages/8b/31/a15436f4e233a15e578f071f3c6a620ad70fd922f8dfe9a9606745a5f2c1/simple_tfidf_japanese-0.1.3.tar.gz" } ] }