{ "info": { "author": "Samiksha Manjunath", "author_email": "samiksha.manjunath@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "Wikipedia2Vec\n=============\n\n[![Fury badge](https://badge.fury.io/py/wikipedia2vec.png)](http://badge.fury.io/py/wikipedia2vec)\n[![CircleCI](https://circleci.com/gh/wikipedia2vec/wikipedia2vec.svg?style=svg)](https://circleci.com/gh/wikipedia2vec/wikipedia2vec)\n\nWikipedia2Vec is a tool used for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia) from Wikipedia.\nIt is developed and maintained by [Studio Ousia](http://www.ousia.jp).\n\nThis tool enables you to learn embeddings of words and entities simultaneously, and places similar words and entities close to one another in a continuous vector space.\nEmbeddings can be easily trained by a single command with a publicly available Wikipedia dump as input.\n\nThis tool implements the [conventional skip-gram model](https://en.wikipedia.org/wiki/Word2vec) to learn the embeddings of words, and its extension proposed in [Yamada et al. (2016)](https://arxiv.org/abs/1601.01343) to learn the embeddings of entities.\nThis tool has been used in several state-of-the-art NLP models such as [entity linking](https://arxiv.org/abs/1601.01343), [named entity recognition](http://www.aclweb.org/anthology/I17-2017), [knowledge graph completion](https://www.aaai.org/Papers/AAAI/2019/AAAI-ShahH.6029.pdf), [entity relatedness](https://arxiv.org/abs/1601.01343), and [question answering](https://arxiv.org/abs/1803.08652).\n\nThis tool has been tested on Linux, Windows, and macOS.\n\nAn empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is available [here](https://arxiv.org/abs/1812.06280).\n\nDocumentation and pretrained embeddings for 12 languages (English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) are available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).\n\nBasic Usage\n-----------\n\nWikipedia2Vec can be installed via PyPI:\n\n```bash\n% pip install wikipedia2vec\n```\n\nWith this tool, embeddings can be learned by running a *train* command with a Wikipedia dump as input.\nFor example, the following commands download the latest English Wikipedia dump and learn embeddings from this dump:\n\n```bash\n% wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2\n% wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE\n```\n\nThen, the learned embeddings are written to *MODEL\\_FILE*.\nNote that this command can take many optional parameters.\nPlease refer to [our documentation](https://wikipedia2vec.github.io/wikipedia2vec/commands/) for further details.\n\nReference\n---------\n\nIf you use Wikipedia2Vec in a scientific publication, please cite the following paper:\n\nIkuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, [Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia](https://arxiv.org/abs/1812.06280).\n\n```text\n@article{yamada2018wikipedia2vec,\n title={Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia},\n author={Yamada, Ikuya and Asai, Akari and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},\n journal={arXiv preprint 1812.06280},\n year={2018}\n}\n```\n\nLicense\n-------\n\n[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/samikshm/wikipedia2vec_SM002583/archive/0.2.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/samikshm/wikipedia2vec_SM002583", "keywords": "wikipedia,embedding,wikipedia2vec", "license": "", "maintainer": "", "maintainer_email": "", "name": "wikipedia2vec-SM002583", "package_url": "https://pypi.org/project/wikipedia2vec-SM002583/", "platform": "", "project_url": "https://pypi.org/project/wikipedia2vec-SM002583/", "project_urls": { "Download": "https://github.com/samikshm/wikipedia2vec_SM002583/archive/0.2.tar.gz", "Homepage": "https://github.com/samikshm/wikipedia2vec_SM002583" }, "release_url": "https://pypi.org/project/wikipedia2vec-SM002583/0.2/", "requires_dist": null, "requires_python": "", "summary": "A tool for learning vector representations of words and entities from Wikipedia", "version": "0.2" }, "last_serial": 5147166, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "db943dd6921813a01f3aa6ae1b96ebd8", "sha256": "9c54da11e2abdc6ac80731b8b9a1dbfee524121fafc01450d822e70fcf8006db" }, "downloads": -1, "filename": "wikipedia2vec_SM002583-0.1.tar.gz", "has_sig": false, "md5_digest": "db943dd6921813a01f3aa6ae1b96ebd8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1158771, "upload_time": "2019-04-15T19:49:47", "url": "https://files.pythonhosted.org/packages/1f/83/182e0fde4146dc85e81b83de701e9b4402956169058a131fb7711a949fd1/wikipedia2vec_SM002583-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "c74209546827eb819e22019b3b66baeb", "sha256": "5eb45c6b294d24653802d65e0ac2006942a77e6b2b34b77e156e679ea65cfddc" }, "downloads": -1, "filename": "wikipedia2vec_SM002583-0.2.tar.gz", "has_sig": false, "md5_digest": "c74209546827eb819e22019b3b66baeb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1158772, "upload_time": "2019-04-15T23:12:13", "url": "https://files.pythonhosted.org/packages/63/04/fafc781b7a119bda987908bfa956bdc800e06e4523a99b5995acd86f4c8e/wikipedia2vec_SM002583-0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c74209546827eb819e22019b3b66baeb", "sha256": "5eb45c6b294d24653802d65e0ac2006942a77e6b2b34b77e156e679ea65cfddc" }, "downloads": -1, "filename": "wikipedia2vec_SM002583-0.2.tar.gz", "has_sig": false, "md5_digest": "c74209546827eb819e22019b3b66baeb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1158772, "upload_time": "2019-04-15T23:12:13", "url": "https://files.pythonhosted.org/packages/63/04/fafc781b7a119bda987908bfa956bdc800e06e4523a99b5995acd86f4c8e/wikipedia2vec_SM002583-0.2.tar.gz" } ] }