{ "info": { "author": "Abel 'Akronix' Serrano Juste", "author_email": "akronix5@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "wiki dump parser\n================\n\nA simple but fast Python script that reads the XML dump of a wiki and\noutputs the processed data in a CSV file.\n\n`The full revision history of a MediaWiki wiki can be backed up as an XML\nfile, known as an XML\ndump. `__\nThis file is a record of all the edits made in a wiki, with all the\ncorresponding data regarding date, page, author and the full content\nof the edit.\n\nVery often we just want the metadata for the edit regarding date, author\nand page; therefore, we do not need the content of the edit, which is\nby far the longest piece of data.\n\nThis script converts this very long XML dump into much smaller CSV files\nthat are easier to read and work with. It takes care of\n\nUsage\n-----\n\nInstall the package using pip:\n\n``pip install wiki_dump_parser``\n\nThen, use it directly from the command line:\n\n``python -m wiki_dump_parser ``\n\nOr from Python code:\n\n.. code:: python\n\n import wiki_dump_parser as parser\n parser.xml_to_csv('dump.xml')\n\nThe output CSV files should be loaded using '\\|' as the quote character\nfor strings. An example of loading the output file \"dump.csv\"\ngenerated by this script with pandas would be:\n\n.. code:: python\n\n import pandas as pd\n df = pd.read_csv('dump.csv', quotechar='|', index_col = False)\n df['timestamp'] = pd.to_datetime(df['timestamp'],format='%Y-%m-%dT%H:%M:%SZ')\n\nDependencies\n------------\n\n- python 3\n\n*Yes, nothing more.*\n\nHow to get a wiki history dump\n------------------------------\n\nThere are several ways to get the wiki dump:\n\n- If you have access to the server, follow the `instructions in the\n mediawiki\n docs `__.\n- For **Wikia wikis** and `many other\n domains `__,\n you can use our in-house developed script made to accomplish this\n task. 
It is straightforward to use and very fast.\n- **Wikimedia project wikis**: For wikis belonging to the Wikimedia\n project, you already have a regularly updated repo with all the dumps\n here: http://dumps.wikimedia.org. `Select your target wiki from the\n list `__ and\n download the complete edit history dump, then uncompress it.\n- For **other wikis**, like self-hosted wikis, you should use the\n wikiteam's dumpgenerator.py script. There is a simple tutorial `in\n their\n wiki `__.\n Its usage is very straightforward and the script is well maintained.\n Remember to use the --xml option to download the full history dump.", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Grasia/wiki-scripts/tree/master/wiki_dump_parser", "keywords": "wiki dump parser Wikia xml csv pandas proccessing history data", "license": "AGPL-3.0", "maintainer": "", "maintainer_email": "", "name": "wiki-dump-parser", "package_url": "https://pypi.org/project/wiki-dump-parser/", "platform": "", "project_url": "https://pypi.org/project/wiki-dump-parser/", "project_urls": { "Homepage": "https://github.com/Grasia/wiki-scripts/tree/master/wiki_dump_parser" }, "release_url": "https://pypi.org/project/wiki-dump-parser/2.0.1/", "requires_dist": null, "requires_python": ">=3", "summary": "A simple but fast python script that reads the XML dump of a wiki and outputs the processed data in a CSV file.", "version": "2.0.1" }, "last_serial": 4694724, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "eb06bf5c229e60361d668c08ae41948c", "sha256": "3f1dfadb390ca27000ef877a080a3b9ec2552e8b8d0af3eaa1541f3a230e02c7" }, "downloads": -1, "filename": "wiki_dump_parser-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "eb06bf5c229e60361d668c08ae41948c", "packagetype": "bdist_wheel", "python_version": "3.5", "requires_python": null, "size": 15141, "upload_time": 
"2018-10-17T11:26:25", "url": "https://files.pythonhosted.org/packages/b7/25/266fa6407c83131f931922ca963d096d47861f3f5517f3fa717f3a26b19c/wiki_dump_parser-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5239fa63715ca2ff1770d78c27696947", "sha256": "f516738ad605b44495cacb31a36ec79c99e4c91c3051d4994713ddc73a5a537c" }, "downloads": -1, "filename": "wiki_dump_parser-1.0.0.tar.gz", "has_sig": false, "md5_digest": "5239fa63715ca2ff1770d78c27696947", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2620, "upload_time": "2018-10-17T11:26:27", "url": "https://files.pythonhosted.org/packages/a2/6c/d688294c355f96bcb365d12b1b32e758ff74d4ceeb591542cccb8d22a5e8/wiki_dump_parser-1.0.0.tar.gz" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "e5cfa3e037c838650554b7576c4dbc71", "sha256": "1cf7885408b7032510b79210271922d9e5bde357361fa86be0f84b50c33e7e43" }, "downloads": -1, "filename": "wiki_dump_parser-2.0.0.tar.gz", "has_sig": false, "md5_digest": "e5cfa3e037c838650554b7576c4dbc71", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4121, "upload_time": "2019-01-14T15:28:56", "url": "https://files.pythonhosted.org/packages/79/a9/e26a0e1077d641c0f7cf3018d1d5d4ce17657822b5ca1485e05539189527/wiki_dump_parser-2.0.0.tar.gz" } ], "2.0.1": [ { "comment_text": "", "digests": { "md5": "b3b982c2e2665e217d409c50fe5cbd34", "sha256": "05d6a6e2af0d7faf57b4d69f7155a6028d24991a80361b22c8f8848b580842f7" }, "downloads": -1, "filename": "wiki_dump_parser-2.0.1.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "b3b982c2e2665e217d409c50fe5cbd34", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 5628, "upload_time": "2019-01-14T15:42:27", "url": "https://files.pythonhosted.org/packages/79/ff/e06e6bfa775e6e2cffe1945ac85fbca22bf9c0c177cabd39109a9a5c11b6/wiki_dump_parser-2.0.1.linux-x86_64.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": 
"b3b982c2e2665e217d409c50fe5cbd34", "sha256": "05d6a6e2af0d7faf57b4d69f7155a6028d24991a80361b22c8f8848b580842f7" }, "downloads": -1, "filename": "wiki_dump_parser-2.0.1.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "b3b982c2e2665e217d409c50fe5cbd34", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 5628, "upload_time": "2019-01-14T15:42:27", "url": "https://files.pythonhosted.org/packages/79/ff/e06e6bfa775e6e2cffe1945ac85fbca22bf9c0c177cabd39109a9a5c11b6/wiki_dump_parser-2.0.1.linux-x86_64.tar.gz" } ] }