{ "info": { "author": "Kyubyong Park, Dongwoo Kim, Yo Joong Choe", "author_email": "kbpark.linguist@gmail.com, kimdwkimdw@gmail.com, yjchoe33@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "[![image](https://img.shields.io/pypi/v/word2word.svg)](https://pypi.org/project/word2word/)\n[![image](https://img.shields.io/pypi/l/word2word.svg)](https://pypi.org/project/word2word/)\n[![image](https://img.shields.io/pypi/pyversions/word2word.svg)](https://pypi.org/project/word2word/)\n[![image](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/kimdwkimdw)\n\n# word2word\n\nEasy-to-use word-to-word translations for 3,564 language pairs.\n\n## Key Features\n\n* A large collection of freely & publicly available word-to-word translations \n **for 3,564 language pairs across 62 unique languages.** \n* Easy-to-use Python interface.\n* Constructed using an efficient approach that is quantitatively evaluated by \n proficient bilingual human labelers.\n\n## Usage\n\nFirst, install the package using `pip`:\n```bash\npip install word2word\n```\n\nOR\n\n```bash\ngit clone https://github.com/Kyubyong/word2word.git\npython setup.py install\n```\n\nThen, in Python, download the model and retrieve the top-5 translations \nof any given word in the desired language:\n```python\nfrom word2word import Word2word\nen2fr = Word2word(\"en\", \"fr\")\nprint(en2fr(\"apple\"))\n# out: ['pomme', 'pommes', 'pommier', 'tartes', 'fleurs']\n```\n\n![gif](./word2word.gif)\n\n## Supported Languages\n\nWe provide top-k word-to-word translations across all available pairs \n from 
[OpenSubtitles2018](http://opus.nlpl.eu/OpenSubtitles2018.php). \nThis amounts to a total of 3,564 language pairs across 62 unique languages. \n\nThe full list is provided [here](word2word/supporting_languages.txt).\n\n## Methodology\n\nOur approach computes the top-k word-to-word translations based on \nthe co-occurrence statistics between cross-lingual word pairs in a parallel corpus.\nWe additionally introduce a correction term that controls for any confounding effect\ncoming from other source words within the same sentence.\nThe resulting method is an efficient and scalable approach that allows us to\nconstruct large bilingual dictionaries from any given parallel corpus. \n\nFor more details, see the Methods section of [our paper draft](word2word-draft.pdf).\n\n\n## Comparisons with Existing Software\n\nA popular publicly available dataset of word-to-word translations is \n[`facebookresearch/MUSE`](https://github.com/facebookresearch/MUSE), which \nincludes 110 bilingual dictionaries that are built from Facebook's internal translation tool.\nIn comparison to MUSE, `word2word` does not rely on translation software\nand contains a much larger set of language pairs (3,564). \n`word2word` also provides the top-k word-to-word translations for up to 100k words \n(compared to 5~10k words in MUSE) and can be applied to any language pair\nfor which there is a parallel corpus. \n\nIn terms of quality, while a direct comparison between the two methods is difficult, \nwe did notice that MUSE's bilingual dictionaries involving non-European languages may not be as useful. \nFor English-Vietnamese, we found that 80% of the 1,500 word pairs in \nthe validation set had the same word twice as a pair\n(e.g. crimson-crimson, Suzuki-Suzuki, Randall-Randall). \n\nFor more details, see the Appendix of [our paper draft](word2word-draft.pdf). 
\n\n\n## References\n\nIf you use our software for research, please cite:\n```bibtex\n@misc{word2word2019,\n author = {Park, Kyubyong and Kim, Dongwoo and Choe, Yo Joong},\n title = {word2word},\n year = {2019},\n publisher = {GitHub},\n journal = {GitHub repository},\n howpublished = {\\url{https://github.com/Kyubyong/word2word}}\n}\n```\n(We may later update this bibtex with a reference to [our paper report](word2word-draft.pdf).)\n\nAll of our word-to-word translations were constructed from the publicly available\n [OpenSubtitles2018](http://opus.nlpl.eu/OpenSubtitles2018.php) dataset:\n```bibtex\n@article{opensubtitles2016,\n title={Opensubtitles2016: Extracting large parallel corpora from movie and tv subtitles},\n author={Lison, Pierre and Tiedemann, J{\\\"o}rg},\n year={2016},\n publisher={European Language Resources Association}\n}\n```\n\n## Authors\n\n[Kyubyong Park](https://github.com/Kyubyong), \n[Dongwoo Kim](https://github.com/kimdwkimdw), and \n[YJ Choe](https://github.com/yjchoe)\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Kyubyong/word2word", "keywords": "", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "word2word", "package_url": "https://pypi.org/project/word2word/", "platform": "", "project_url": "https://pypi.org/project/word2word/", "project_urls": { "Homepage": "https://github.com/Kyubyong/word2word" }, "release_url": "https://pypi.org/project/word2word/0.1.6/", "requires_dist": [ "requests", "wget" ], "requires_python": ">=3.6", "summary": "Word Translator for 3,564 Language Pairs", "version": "0.1.6" }, "last_serial": 5112387, "releases": { "0.1.6": [ { "comment_text": "", "digests": { "md5": "94ed33e6ad94684f8044d64918b81de8", "sha256": "d8cfe449d713010a3428c37bc2322fbc01b656b6223e65956a752518f27414c4" }, "downloads": -1, "filename": 
"word2word-0.1.6-py3-none-any.whl", "has_sig": false, "md5_digest": "94ed33e6ad94684f8044d64918b81de8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 17770, "upload_time": "2019-04-08T07:34:18", "url": "https://files.pythonhosted.org/packages/2f/51/0f28402ff8b92be33fd5d9d0a83f1d608b9298d6802bacabd3da283deac7/word2word-0.1.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2b39f34a1c4656f04ebe49706ebb6cb5", "sha256": "1e7b8b6076f8108c5b399215f6b16594645eb86b7e4f97cf60c8726767f611be" }, "downloads": -1, "filename": "word2word-0.1.6.tar.gz", "has_sig": false, "md5_digest": "2b39f34a1c4656f04ebe49706ebb6cb5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 15965, "upload_time": "2019-04-08T07:34:19", "url": "https://files.pythonhosted.org/packages/3d/57/0b228993a23958778b98ae51a0a8f089c62396e068186747545a0db1280f/word2word-0.1.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "94ed33e6ad94684f8044d64918b81de8", "sha256": "d8cfe449d713010a3428c37bc2322fbc01b656b6223e65956a752518f27414c4" }, "downloads": -1, "filename": "word2word-0.1.6-py3-none-any.whl", "has_sig": false, "md5_digest": "94ed33e6ad94684f8044d64918b81de8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 17770, "upload_time": "2019-04-08T07:34:18", "url": "https://files.pythonhosted.org/packages/2f/51/0f28402ff8b92be33fd5d9d0a83f1d608b9298d6802bacabd3da283deac7/word2word-0.1.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2b39f34a1c4656f04ebe49706ebb6cb5", "sha256": "1e7b8b6076f8108c5b399215f6b16594645eb86b7e4f97cf60c8726767f611be" }, "downloads": -1, "filename": "word2word-0.1.6.tar.gz", "has_sig": false, "md5_digest": "2b39f34a1c4656f04ebe49706ebb6cb5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 15965, "upload_time": "2019-04-08T07:34:19", "url": 
"https://files.pythonhosted.org/packages/3d/57/0b228993a23958778b98ae51a0a8f089c62396e068186747545a0db1280f/word2word-0.1.6.tar.gz" } ] }