{ "info": { "author": "Sho Yokoi", "author_email": "yokoi@ecei.tohoku.ac.jp", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: BSD License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# Pointwise Hilbert\u7ab6\u9244chmidt Independence Criterion (PHSIC)\n\nCompute *co-occurrence* between two objects utilizing *similarities*.\n\nFor example, given consistent sentence pairs:\n\n| X | Y |\n| ------------------------------------------------------------ | ------------------ |\n| They had breakfast at the hotel. | They are full now. |\n| They had breakfast at ten. | I'm full. |\n| She had breakfast with her friends. | She felt happy. |\n| They had breakfast with their friends at the Japanese restaurant. | They felt happy. |\n| He have trouble with his homework. | He cries. |\n| I have trouble associating with others. | I cry. |\n\nPHSIC can give high scores to consistent pairs in terms of the given pairs:\n\n| X | Y | score |\n| -------------------------------------------- | --------------------- | ------ |\n| They had breakfast at the hotel. | They are full now. | 0.1134 |\n| They had breakfast at an Italian restaurant. | They are stuffed now. | 0.0023 |\n| I have dinner. | I have dinner again. | 0.0023 |\n\n## Installation\n\n```\n$ pip install phsic\n```\n\nThis will install `phsic` command to your environment:\n\n```\n$ phsic --help\n```\n\n## Basic Usage\n\nDownload pre-trained wordvecs (e.g. fasttext):\n\n```\n$ wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M.vec.zip\n$ unzip crawl-300d-2M.vec.zip\n```\n\nPrepare dataset:\n\n```\n$ TAB=\"$(printf '\\t')\"\n$ cat << EOF > train.txt\nThey had breakfast at the hotel.${TAB}They are full now.\nThey had breakfast at ten.${TAB}I'm full.\nShe had breakfast with her friends.${TAB}She felt happy.\nThey had breakfast with their friends at the Japanese restaurant.${TAB}They felt happy.\nHe have trouble with his homework.${TAB}He cries.\nI have trouble associating with others.${TAB}I cry.\nEOF\n$ cut -f 1 train.txt > train_X.txt\n$ cut -f 2 train.txt > train_Y.txt\n$ cat << EOF > test.txt\nThey had breakfast at the hotel.${TAB}They are full now.\nThey had breakfast at an Italian restaurant.${TAB}They are stuffed now.\nI have dinner.${TAB}I have dinner again.\nEOF\n$ cut -f 1 test.txt > test_X.txt\n$ cut -f 2 test.txt > test_Y.txt\n```\n\nThen, train and predict:\n\n```\n$ phsic train_X.txt train_Y.txt --kernel1 Gaussian 1.0 --encoder1 SumBov FasttextEn --emb1 crawl-300d-2M.vec --kernel2 Gaussian 1.0 --encoder2 SumBov FasttextEn --emb2 crawl-300d-2M.vec --limit_words1 10000 --limit_words2 10000 --dim1 3 --dim2 3 --out_prefix toy --out_dir out --X_test test_X.txt --Y_test test_Y.txt\n$ cat toy.Gaussian-1.0-SumBov-FasttextEn.Gaussian-1.0-SumBov-FasttextEn.3.3.phsic\n1.134489336180434238e-01\n2.320408776101631244e-03\n2.321869174772554344e-03\n```\n\n## Citation\n\n```\n@InProceedings{D18-1203,\n author = \t\"Yokoi, Sho\n and Kobayashi, Sosuke\n and Fukumizu, Kenji\n and Suzuki, Jun\n and Inui, Kentaro\",\n title = \t\"Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions\",\n booktitle = \t\"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing\",\n year = \t\"2018\",\n publisher = \t\"Association for Computational Linguistics\",\n pages = \t\"1763--1775\",\n location = \t\"Brussels, Belgium\",\n url = \t\"http://aclweb.org/anthology/D18-1203\"\n}\n```\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cl-tohoku/phsic", "keywords": "", "license": "BSD-3-Clause", "maintainer": "Sho Yokoi", "maintainer_email": "yokoi@ecei.tohoku.ac.jp", "name": "phsic-cli", "package_url": "https://pypi.org/project/phsic-cli/", "platform": "", "project_url": "https://pypi.org/project/phsic-cli/", "project_urls": { "Homepage": "https://github.com/cl-tohoku/phsic", "Repository": "https://github.com/cl-tohoku/phsic" }, "release_url": "https://pypi.org/project/phsic-cli/0.1.0/", "requires_dist": [ "dill (>=0.2.8,<0.3.0)", "numpy (>=1.15,<2.0)", "gensim (>=3.6,<4.0)", "sklearn (>=0.0.0,<0.0.1)", "tensorflow (>=1.7,<2.0)", "tensorflow-hub (>=0.1.1,<0.2.0)", "scipy (>=1.1,<2.0)", "statsmodels (>=0.9.0,<0.10.0)", "pandas (>=0.23.4,<0.24.0)" ], "requires_python": ">=3.6,<4.0", "summary": "Pointwise Hilbert\u2013Schmidt Independence Criterion (PHSIC)", "version": "0.1.0" }, "last_serial": 4445726, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "c96834a010e903c0088984ffd23e8776", "sha256": "085579e96158ebe8e69ecb5f4252d101654c7fa65c4e8c8606d7cd6e585fa9db" }, "downloads": -1, "filename": "phsic_cli-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c96834a010e903c0088984ffd23e8776", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4.0", "size": 39204, "upload_time": "2018-11-02T18:48:36", "url": "https://files.pythonhosted.org/packages/68/a3/5c906dd2d6c40f9d4e8f047ca603824bf8428b484c707c963cde4b137d6e/phsic_cli-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "72990d7778b639290b68bde5b9aa88f7", "sha256": "42dace8336026d0dcfbe2f7e062fff644e4cf908f0d65abee558a10950a59096" }, "downloads": -1, "filename": "phsic-cli-0.1.0.tar.gz", "has_sig": false, "md5_digest": "72990d7778b639290b68bde5b9aa88f7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 11590, "upload_time": "2018-11-02T18:48:02", "url": "https://files.pythonhosted.org/packages/f9/e6/aa405ee91d870b99a5491f6fcf8026406029b067e2c739ecbe801bfc3104/phsic-cli-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c96834a010e903c0088984ffd23e8776", "sha256": "085579e96158ebe8e69ecb5f4252d101654c7fa65c4e8c8606d7cd6e585fa9db" }, "downloads": -1, "filename": "phsic_cli-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c96834a010e903c0088984ffd23e8776", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4.0", "size": 39204, "upload_time": "2018-11-02T18:48:36", "url": "https://files.pythonhosted.org/packages/68/a3/5c906dd2d6c40f9d4e8f047ca603824bf8428b484c707c963cde4b137d6e/phsic_cli-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "72990d7778b639290b68bde5b9aa88f7", "sha256": "42dace8336026d0dcfbe2f7e062fff644e4cf908f0d65abee558a10950a59096" }, "downloads": -1, "filename": "phsic-cli-0.1.0.tar.gz", "has_sig": false, "md5_digest": "72990d7778b639290b68bde5b9aa88f7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 11590, "upload_time": "2018-11-02T18:48:02", "url": "https://files.pythonhosted.org/packages/f9/e6/aa405ee91d870b99a5491f6fcf8026406029b067e2c739ecbe801bfc3104/phsic-cli-0.1.0.tar.gz" } ] }