{ "info": { "author": "Jan-Christoph Klie", "author_email": "git@mrklie.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3 :: Only", "Topic :: Internet :: WWW/HTTP :: Dynamic Content :: Wiki", "Topic :: Scientific/Engineering :: Human Machine Interfaces", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Libraries", "Topic :: Text Processing :: Linguistic" ], "description": "\nwikimapper\n==========\n\n.. image:: https://travis-ci.org/jcklie/wikimapper.svg?branch=master\n :target: https://travis-ci.org/jcklie/wikimapper\n\n.. image:: https://codecov.io/gh/jcklie/wikimapper/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/jcklie/wikimapper\n\n.. image:: https://img.shields.io/pypi/l/wikimapper.svg\n :alt: PyPI - License\n :target: https://pypi.org/project/wikimapper/\n\n.. image:: https://img.shields.io/pypi/pyversions/wikimapper.svg\n :alt: PyPI - Python Version\n :target: https://pypi.org/project/wikimapper/\n\n.. image:: https://img.shields.io/pypi/v/wikimapper.svg\n :alt: PyPI\n :target: https://pypi.org/project/wikimapper/\n\n.. image:: https://img.shields.io/badge/code%20style-black-000000.svg\n :target: https://github.com/ambv/black \n\nThis small Python library helps you to map Wikipedia page titles (e.g. `Manatee\n`_ to `Q42797 `_)\nand vice versa. This is done by creating an index of these mappings from a Wikipedia SQL dump.\nPrecomputed indices can be found under `Precomputed indices`_. Redirects are taken into account.\n\nInstallation\n------------\n\nThis package can be installed via ``pip``, the Python package manager.\n\n.. code:: bash\n\n pip install wikimapper\n\nIf all you want is just mapping, then you can also just download ``wikimapper/mapper.py`` and\nadd it to your project. It does not have any external dependencies.\n\nUsage\n-----\n\nUsing the mapping functionality requires a precomputed index. It is created from Wikipedia\nSQL dumps (see `Create your own index`_) or can be downloaded for certain languages\n(see `Precomputed indices`_). For the following to work, it is assumed that an index either\nhas been created or downloaded. Using the command line for batch mapping is not recommended,\nas it requires repeated opening and closing the database, leading to a speed penalty.\n\nMap Wikipedia page title to Wikidata id\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n from wikimapper import WikiMapper\n\n mapper = WikiMapper(\"index_enwiki-latest.db\")\n wikidata_id = mapper.title_to_id(\"Python_(programming_language)\")\n print(wikidata_id) # Q28865\n\nor from the command line via\n\n.. code:: bash\n\n $ wikimapper title2id index_enwiki-latest.db Germany\n Q183\n\nMap Wikipedia URL to Wikidata id\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n from wikimapper import WikiMapper\n\n mapper = WikiMapper(\"index_enwiki-latest.db\")\n wikidata_id = mapper.url_to_id(\"https://en.wikipedia.org/wiki/Python_(programming_language)\")\n print(wikidata_id) # Q28865\n\nor from the command line via\n\n.. code:: bash\n\n $ wikimapper url2id index_enwiki-latest.db https://en.wikipedia.org/wiki/Germany\n Q183\n\nIt is not checked whether the URL origins from the same Wiki as the index you created!\n\nMap Wikidata id to Wikipedia page title\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n from wikimapper import WikiMapper\n\n mapper = WikiMapper(\"index_enwiki-latest.db\")\n titles = mapper.id_to_titles(\"Q183\")\n print(titles) # Germany, Deutschland, ...\n\nor from the command line via\n\n.. code:: bash\n\n $ wikimapper id2titles data/index_enwiki-latest.db Q183\n Germany\n Bundesrepublik_Deutschland\n Land_der_Dichter_und_Denker\n Jerman\n ...\n\nMapping id to title can lead to more than one result, as some pages in Wikipedia are\nredirects, all linking to the same Wikidata item.\n\nCreate your own index\n~~~~~~~~~~~~~~~~~~~~~\n\nWhile some indices are precomupted, it is sometimes useful to create your own. The\nfollowing section describes the steps need. Regarding creation speed: The index creation\ncode works, but is not optimized. It takes around 10 minutes on my Notebook (T480s)\nto create it for English Wikipedia if the data is already downloaded.\n\n**1. Download the data**\n\nThe easiest way is to use the command line tool that ships with this package. It\ncan be e.g. invoked by\n\n.. code:: bash\n\n $ wikimapper download enwiki-latest --dir data\n\nUse ``wikimapper download --help`` for a full description of the tool.\n\nThe abbreviation for the Wiki of your choice can be found on `Wikipedia\n`_. Available SQL dumps can be\ne.g. found on `Wikimedia `_, you need to suffix\nthe Wiki name, e.g. ``https://dumps.wikimedia.org/dewiki/`` for the German one.\nIf possible, use a different mirror than the default in order to spread the resource usage.\n\n**2. Create the index**\n\nThe next step is to create an index from the downloaded dump. The easiest way is to use\nthe command line tool that ships with this package. It can be e.g. invoked by\n\n.. code:: bash\n\n $ wikimapper create enwiki-latest --dumpdir data --target data/index_enwiki-latest.db\n\nThis creates an index for the previously downloaded dump and saves it in ``data/index_enwiki-latest.db``.\nUse ``wikimapper create --help`` for a full description of the tool.\n\nPrecomputed indices\n-------------------\n\n.. _precomputed:\n\nSeveral precomputed indices can be found `here `_ .\n\nCommand line interface\n----------------------\n\nThis package comes with a command line interface that is automatically available\nwhen installing via ``pip``. It can be invoked by ``wikimapper`` from the command\nline.\n\n::\n\n $ wikimapper\n\n usage: wikimapper [-h] [--version]\n {download,create,title2id,url2id,id2titles} ...\n\n Map Wikipedia page titles to Wikidata IDs and vice versa.\n\n positional arguments:\n {download,create,title2id,url2id,id2titles}\n sub-command help\n download Download Wikipedia dumps for creating a custom index.\n create Use a previously downloaded Wikipedia dump to create a\n custom index.\n title2id Map a Wikipedia title to a Wikidata ID.\n url2id Map a Wikipedia URL to a Wikidata ID.\n id2titles Map a Wikidata ID to one or more Wikipedia titles.\n\n optional arguments:\n -h, --help show this help message and exit\n --version show program's version number and exit\n\nSee ``wikimapper ${sub-command} --help`` for more information.\n\nDevelopment\n-----------\n\nThe required dependencies are managed by **pip**. A virtual environment\ncontaining all needed packages for development and production can be\ncreated and activated by\n\n::\n\n virtualenv venv --python=python3 --no-site-packages\n source venv/bin/activate\n pip install -e \".[test, dev, doc]\"\n\nThe tests can be run in the current environment by invoking\n\n::\n\n make test\n\nor in a clean environment via\n\n::\n\n tox\n\nFAQ\n---\n\nHow does the parsing of the dump work?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n`jamesmishra `__ has noticed that\nSQL dumps from Wikipedia almost look like CSV. He provides some basic functions\nto parse insert statements into tuples. We then use the Wikipedia SQL page\ndump to get the mapping between title and internal id, page props to get\nthe Wikidata ID for a title and then the redirect dump in order to fill\ntitles that are only redirects and do not have an entry in the page props table.\n\nWhy do you not use the Wikidata SPARQL endpoint for that?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIt is possible to query the official Wikidata SPARQL endpoint to do the mapping:\n\n.. code:: sparql\n\n prefix schema: \n SELECT * WHERE {\n schema:about ?item .\n }\n\nThis has several issues: First, it uses the network, which is slow. Second, I try to use\nthat endpoint as infrequent as possible to save their resources (my use case is to map\ndata sets that have easily tens of thousands of entries). Third, I had coverage issues due\nto redirects in Wikipedia not being resolved (around ~20% of the time for some older data sets).\nSo I created this package to do the mapping offline instead.\n\nAcknowledgements\n----------------\n\nI am very thankful for `jamesmishra `__ to provide\n`mysqldump-to-csv `__ . Also,\n`mbugert `__ helped me tremendously understanding\nWikipedia dumps and giving me the idea on how to map.\n\n\n", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jcklie/wikimapper", "keywords": "wikidata wikipedia wiki kb knowledge-base", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "wikimapper", "package_url": "https://pypi.org/project/wikimapper/", "platform": "", "project_url": "https://pypi.org/project/wikimapper/", "project_urls": { "Bug Tracker": "https://github.com/jcklie/wikimapper/issues", "Documentation": "https://github.com/jcklie/wikimapper", "Homepage": "https://github.com/jcklie/wikimapper", "Source Code": "https://github.com/jcklie/wikimapper" }, "release_url": "https://pypi.org/project/wikimapper/0.1.5/", "requires_dist": [ "black ; extra == 'dev'", "twine ; extra == 'dev'", "pygments ; extra == 'dev'", "wheel ; extra == 'dev'", "tox ; extra == 'test'", "pytest ; extra == 'test'", "codecov ; extra == 'test'", "pytest-cov ; extra == 'test'" ], "requires_python": ">=3.5.0", "summary": "Mapping Wikidata and Wikipedia entities to each other", "version": "0.1.5" }, "last_serial": 5192950, "releases": { "0.1.5": [ { "comment_text": "", "digests": { "md5": "2fef9e0cad13976901b074ae3615eee7", "sha256": "3a1af26a44d4250af2226db858bb07dc9baeaad555c12c4bec5c133d636032a8" }, "downloads": -1, "filename": "wikimapper-0.1.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2fef9e0cad13976901b074ae3615eee7", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5.0", "size": 15562, "upload_time": "2019-04-26T13:34:51", "url": "https://files.pythonhosted.org/packages/89/74/7f645ca670c2dbf813c1ccb179fb55e79a96bd73beb067476bb025342c04/wikimapper-0.1.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4fa14e3ba23f1722412805d6c938cf5b", "sha256": "073cf14fa82acbf2f7ae7538c77e501399380b3758f9feaff2128aaf4c9c8930" }, "downloads": -1, "filename": "wikimapper-0.1.5.tar.gz", "has_sig": false, "md5_digest": "4fa14e3ba23f1722412805d6c938cf5b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 17919, "upload_time": "2019-04-26T13:34:54", "url": "https://files.pythonhosted.org/packages/b3/04/9c487c543c3e8f603f4cfb751f0389d6feccfc2aab51d320b376b0171679/wikimapper-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2fef9e0cad13976901b074ae3615eee7", "sha256": "3a1af26a44d4250af2226db858bb07dc9baeaad555c12c4bec5c133d636032a8" }, "downloads": -1, "filename": "wikimapper-0.1.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2fef9e0cad13976901b074ae3615eee7", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.5.0", "size": 15562, "upload_time": "2019-04-26T13:34:51", "url": "https://files.pythonhosted.org/packages/89/74/7f645ca670c2dbf813c1ccb179fb55e79a96bd73beb067476bb025342c04/wikimapper-0.1.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4fa14e3ba23f1722412805d6c938cf5b", "sha256": "073cf14fa82acbf2f7ae7538c77e501399380b3758f9feaff2128aaf4c9c8930" }, "downloads": -1, "filename": "wikimapper-0.1.5.tar.gz", "has_sig": false, "md5_digest": "4fa14e3ba23f1722412805d6c938cf5b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 17919, "upload_time": "2019-04-26T13:34:54", "url": "https://files.pythonhosted.org/packages/b3/04/9c487c543c3e8f603f4cfb751f0389d6feccfc2aab51d320b376b0171679/wikimapper-0.1.5.tar.gz" } ] }