{ "info": { "author": "Samiksha Manjunath", "author_email": "samiksha.manjunath@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "Wikipedia2Vec\n=============\n\n[![Fury badge](https://badge.fury.io/py/wikipedia2vec.png)](http://badge.fury.io/py/wikipedia2vec)\n[![CircleCI](https://circleci.com/gh/wikipedia2vec/wikipedia2vec.svg?style=svg)](https://circleci.com/gh/wikipedia2vec/wikipedia2vec)\n\nWikipedia2Vec is a tool used for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia.\nIt is developed and maintained by [Studio Ousia](http://www.ousia.jp).\n\nThis tool enables you to learn embeddings of words and entities simultaneously, and places similar words and entities close to one another in a continuous vector space.\nEmbeddings can be easily trained by a single command with a publicly available Wikipedia dump as input.\n\nThis tool implements the [conventional skip-gram model](https://en.wikipedia.org/wiki/Word2vec) to learn the embeddings of words, and its extension proposed in [Yamada et al. (2016)](https://arxiv.org/abs/1601.01343) to learn the embeddings of entities.\nThis tool has been used in several state-of-the-art NLP models such as [entity linking](https://arxiv.org/abs/1601.01343), [named entity recognition](http://www.aclweb.org/anthology/I17-2017), [knowledge graph completion](https://www.aaai.org/Papers/AAAI/2019/AAAI-ShahH.6029.pdf), [entity relatedness](https://arxiv.org/abs/1601.01343), and [question answering](https://arxiv.org/abs/1803.08652).\n\nThis tool has been tested on Linux, Windows, and macOS.\n\nAn empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available [here](https://arxiv.org/abs/1812.06280).\n\nDocumentation and pretrained embeddings for 12 languages (English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) are available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).\n\nBasic Usage\n-----------\n\nWikipedia2Vec can be installed via PyPI:\n\n```bash\n% pip install wikipedia2vec\n```\n\nWith this tool, embeddings can be learned by running a *train* command with a Wikipedia dump as input.\nFor example, the following commands download the latest English Wikipedia dump and learn embeddings from this dump:\n\n```bash\n% wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2\n% wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE\n```\n\nThen, the learned embeddings are written to *MODEL\\_FILE*.\nNote that this command can take many optional parameters.\nPlease refer to [our documentation](https://wikipedia2vec.github.io/wikipedia2vec/commands/) for further details.\n\nReference\n---------\n\nIf you use Wikipedia2Vec in a scientific publication, please cite the following paper:\n\nIkuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, [Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia](https://arxiv.org/abs/1812.06280).\n\n```text\n@article{yamada2018wikipedia2vec,\n title={Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia},\n author={Yamada, Ikuya and Asai, Akari and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},\n journal={arXiv preprint 1812.06280},\n year={2018}\n}\n```\n\nLicense\n-------\n\n[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/samikshm/wikipedia2vecsm/archive/0.2.3.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/samikshm/wikipedia2vecsm", "keywords": "wikipedia,embedding,wikipedia2vec", "license": "", "maintainer": "", "maintainer_email": "", "name": "wikipedia2vecsm", "package_url": "https://pypi.org/project/wikipedia2vecsm/", "platform": "", "project_url": "https://pypi.org/project/wikipedia2vecsm/", "project_urls": { "Download": "https://github.com/samikshm/wikipedia2vecsm/archive/0.2.3.tar.gz", "Homepage": "https://github.com/samikshm/wikipedia2vecsm" }, "release_url": "https://pypi.org/project/wikipedia2vecsm/0.2.3/", "requires_dist": [ "click", "jieba", "joblib", "lmdb", "marisa-trie", "mwparserfromhell", "numpy", "scipy", "six", "tqdm" ], "requires_python": "", "summary": "A tool for learning vector representations of words and entities from Wikipedia", "version": "0.2.3" }, "last_serial": 5147997, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "56cc47a28180686b7d6f354e82ae44f8", "sha256": "d218b0d6819b645cb299f6a8c827481cc91cd08ff84ea290a2d2231b77fddbd7" }, "downloads": -1, "filename": "wikipedia2vecsm-0.2.tar.gz", "has_sig": false, "md5_digest": "56cc47a28180686b7d6f354e82ae44f8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 996638, "upload_time": "2019-04-16T03:36:48", "url": "https://files.pythonhosted.org/packages/f3/22/448ef327f59c67485112be23612954c6bc6bc521e52c7c6d5209f90d1dea/wikipedia2vecsm-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "dce2987c3a1017fc091a6a35a0feda77", "sha256": "5415acb06705dcc161377d59bc8e0d76fd9234420f30a99ca30f267475c9a44f" }, "downloads": -1, "filename": "wikipedia2vecsm-0.2.1.tar.gz", "has_sig": false, "md5_digest": "dce2987c3a1017fc091a6a35a0feda77", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1180617, "upload_time": "2019-04-16T03:42:24", "url": "https://files.pythonhosted.org/packages/00/c9/9d71685718f21664e80ada78f4c11ad5b4856ea45ffc8f1ccf31e1918111/wikipedia2vecsm-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "01b47cba12aa0cdd5971931227f4636f", "sha256": "e1dbe6624313517ef02969b305dff83d69cad6ca5515c092c78fe548c335ef40" }, "downloads": -1, "filename": "wikipedia2vecsm-0.2.2.tar.gz", "has_sig": false, "md5_digest": "01b47cba12aa0cdd5971931227f4636f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1180636, "upload_time": "2019-04-16T03:46:26", "url": "https://files.pythonhosted.org/packages/89/3d/b15fe219bc829d1848f0b72d5de4a475f0dca7cb06a25c23207493876cae/wikipedia2vecsm-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "73b26477f24e3ff43bf013abee13722e", "sha256": "8b8fa6770aa4db82fe1b6d5f30b2de28fcafdefe0b4c1be7707572859ad2cdd6" }, "downloads": -1, "filename": "wikipedia2vecsm-0.2.3-py2-none-any.whl", "has_sig": false, "md5_digest": "73b26477f24e3ff43bf013abee13722e", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 1223714, "upload_time": "2019-04-16T04:00:09", "url": "https://files.pythonhosted.org/packages/d1/0d/0bf3bcf743043e6d010017024df8293e4c7a4ef8d92330967f955a88653c/wikipedia2vecsm-0.2.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bd879f48f4b801128fd48a284c75e697", "sha256": "5f4c86d860124d80606cba860886fda8bfe22e645e204359914ccf71ac1dae94" }, "downloads": -1, "filename": "wikipedia2vecsm-0.2.3.tar.gz", "has_sig": false, "md5_digest": "bd879f48f4b801128fd48a284c75e697", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1180644, "upload_time": "2019-04-16T04:00:16", "url": "https://files.pythonhosted.org/packages/25/fb/ac4ee10db4587f9edd6028839b51f7e04223d81a084dab2cba379733b8a9/wikipedia2vecsm-0.2.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "73b26477f24e3ff43bf013abee13722e", "sha256": "8b8fa6770aa4db82fe1b6d5f30b2de28fcafdefe0b4c1be7707572859ad2cdd6" }, "downloads": -1, "filename": "wikipedia2vecsm-0.2.3-py2-none-any.whl", "has_sig": false, "md5_digest": "73b26477f24e3ff43bf013abee13722e", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 1223714, "upload_time": "2019-04-16T04:00:09", "url": "https://files.pythonhosted.org/packages/d1/0d/0bf3bcf743043e6d010017024df8293e4c7a4ef8d92330967f955a88653c/wikipedia2vecsm-0.2.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bd879f48f4b801128fd48a284c75e697", "sha256": "5f4c86d860124d80606cba860886fda8bfe22e645e204359914ccf71ac1dae94" }, "downloads": -1, "filename": "wikipedia2vecsm-0.2.3.tar.gz", "has_sig": false, "md5_digest": "bd879f48f4b801128fd48a284c75e697", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1180644, "upload_time": "2019-04-16T04:00:16", "url": "https://files.pythonhosted.org/packages/25/fb/ac4ee10db4587f9edd6028839b51f7e04223d81a084dab2cba379733b8a9/wikipedia2vecsm-0.2.3.tar.gz" } ] }