{ "info": { "author": "Erik Rose, Peter Potrowl", "author_email": "grinch@grinchcentral.com, peter.potrowl@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python", "Topic :: Software Development", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing" ], "description": ".. image:: https://travis-ci.org/peter17/mediawiki-parser.svg?branch=master\n :alt: Build Status\n :target: https://travis-ci.org/peter17/mediawiki-parser\n\nPresentation\n============\n\nThis is a parser for MediaWiki's (MW) syntax. It's goal is to transform wikitext into an abstract syntax tree (AST) and then render this AST into various formats such as plain text and HTML.\n\nIt is an original work by Peter Potrowl and his mentor Erik Rose achieved during the Google Summer of Code 2011.\n\n\nRequirements\n============\n\nThis parser relies on Pijnu. You must install the latest version of Pijnu, available at: https://github.com/peter17/pijnu\n\nDo not use the version from http://spir.wikidot.com, which is outdated.\n\nFor basic and simple installation, just try:\n\n::\n\n pip install mediawiki-parser\n\nHow it works\n============\n\nTwo files, preprocessor.pijnu and mediawiki.pijnu describe the MW syntax using patterns that form a grammar. Another Python tool called Pijnu will interpret those grammars and use them to match the wikitext content and build the AST.\n\nThen, specific Python functions will render the leaves of the AST into the wanted format.\n\nThe reason why we use two grammars is that we will first substitute the templates in the wikitext with a preprocessor before actually parsing the content of the page.\n\nBuilding the parsers\n====================\n\nThe preprocessor and mediwiki parsers must be built from the Pijnu\ngrammars before you can use mediawiki-parser. You can build them through\nsetup.py, possibly setting the PYTHONPATH to point at pijnu:\n\n::\n\n cd /PATH/TO/mediawiki-parser/\n env PYTHONPATH=/PATH/TO/pijnu python setup.py build_parsers\n\nHow to test\n===========\n\nThe current simplest way to test the tool is to put wikitext inside the wikitext.txt file. Then, run:\n\n::\n\n python parser.py\n\nand the wikitext will be rendered as HTML in the article.htm file.\n\nOther ways might be implemented in the future.\n\nUnit tests\n----------\n\nInstall nose and run:\n\n::\n\n cd /PATH/TO/mediawiki-parser/\n env PYTHONPATH=/PATH/TO/pijnu/ nosetests tests\n\nHow to use in a program\n=======================\n\nExample for HTML\n----------------\nIn order to use this tool to render wikitext into HTML in a Python program, you can use the following lines:\n\n::\n\n templates = {}\n allowed_tags = []\n allowed_self_closing_tags = []\n allowed_attributes = []\n interwiki = {}\n namespaces = {}\n\n from mediawiki_parser.preprocessor import make_parser\n preprocessor = make_parser(templates)\n\n from mediawiki_parser.html import make_parser\n parser = make_parser(allowed_tags, allowed_self_closing_tags, allowed_attributes, interwiki, namespaces)\n\n preprocessed_text = preprocessor.parse(source)\n output = parser.parse(preprocessed_text.leaves())\n\nThe `output` string will contain the rendered HTML. You should describe the behavior you expect by filling the variables of the first lines:\n * if the wikitext calls foreign templates, put their names and content in the `templates` dict (e.g.: `{'my template': 'my template content'}`)\n * if some HTML tags are allowed on your wiki, list them in the `allowed_tags` list (e.g.: `['center', 'big', 'small', 'span']`; avoid `'script'` and some others, for security reasons)\n * if some self-closing HTML tags are allowed on your wiki, list them in the `allowed_self_closing_tags` list (e.g.: `['br', 'hr']`; avoid `'script'` and some others, for security reasons)\n * if some HTML tags are allowed on your wiki, list the attributes they can use the `allowed_attributes` list (e.g.: `['style', 'class']`; avoid `'onclick'` and some others, for security reasons)\n * if you want to be able to use interwiki links, list the foreign wikis in the `interwiki` dict (e.g.: `{'fr': 'http://fr.wikipedia.org/wiki/'}`)\n * if you want to be able to distinguish between standard links, file inclusions or categories, list the namespaces of your wiki in the `namespaces` dict (e.g.: `{'Template': 10, 'Category': 14, 'File': 6}` where the numbers are the namespace codes used in MW)\n\nExample for text\n----------------\nIn order to use this tool to render wikitext into text in a Python program, you can use the following lines:\n\n::\n\n templates = {}\n\n from mediawiki_parser.preprocessor import make_parser\n preprocessor = make_parser(templates)\n\n from mediawiki_parser.text import make_parser\n parser = make_parser()\n\n preprocessed_text = preprocessor.parse(source)\n output = parser.parse(preprocessed_text.leaves())\n\nThe `output` string will contain the rendered text.\nIf the wikitext calls foreign templates, put their names and content in the `templates` dict (e.g.: `{'my template': 'my template content'}`)\n\nExample for templates substitution\n----------------------------------\nIf you just want to replace the templates in a given wikitext, you can just call the preprocessor and no rendering postprocessor:\n\n::\n\n templates = {}\n\n from mediawiki_parser.preprocessor import make_parser\n preprocessor = make_parser(templates)\n\n output = preprocessor.parse(source)\n\nThe `output` string will contain the rendered wikitext.\nPut the templates names and content in the `templates` dict (e.g.: `{'my template': 'my template content'}`)\n\nPostprocessors\n--------------\n\nThe parser produces an AST. In order to provide human readable output, three postprocessors are provided:\n * html.py, for HTML output\n * text.py, for text output\n * raw.py, for raw output\n\nFor now, we mainly focused on HTML postprocessor. The text output might not be as cleaned as expected.\n\nYou can adapt them according to your needs.\n\nKnown bugs\n==========\n\nThis tool should be able to render any wikitext page into text or HTML.\n\nHowever, it does not intent to be bug-for-bug compatible with MW. For instance, using HTML entities in template calls (e.g.: `'{{temp©late}}`') is currently not supported.\n\nPlease don't hesitate to report bugs that you may find when using this tool.\n\nSpecial thanks\n==============\n * To Nicholas Burlett for his directory restructure, performance improvements and other fixes", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/peter17/mediawiki-parser", "keywords": "MediaWiki,parser,syntax", "license": "GPL v3", "maintainer": "", "maintainer_email": "", "name": "mediawiki-parser", "package_url": "https://pypi.org/project/mediawiki-parser/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/mediawiki-parser/", "project_urls": { "Homepage": "https://github.com/peter17/mediawiki-parser" }, "release_url": "https://pypi.org/project/mediawiki-parser/0.4.1/", "requires_dist": null, "requires_python": "", "summary": "A parser for the MediaWiki syntax, based on Pijnu.", "version": "0.4.1" }, "last_serial": 2610377, "releases": { "0.3.0": [], "0.3.1": [ { "comment_text": "", "digests": { "md5": "0757194cd0ab902034f835ccd731d0f9", "sha256": "bf3680d12fd13266df3c418903dfbc4b65d1bd4ffabbaf113e9fbb23ca059d94" }, "downloads": -1, "filename": "mediawiki-parser-0.3.1.tar.gz", "has_sig": false, "md5_digest": "0757194cd0ab902034f835ccd731d0f9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30146, "upload_time": "2015-04-29T22:07:34", "url": "https://files.pythonhosted.org/packages/49/21/83d8037f8066a22f859a70c3710d7912e0d5e2ebffe2c09384aaa9636f8f/mediawiki-parser-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "69f03bd501790000d419b4e68c88f507", "sha256": "58a65b6fc9c1526ef7fdf62e6e0b1785860d5d16cf2e8df20a9a073e2c03fd80" }, "downloads": -1, "filename": "mediawiki-parser-0.3.2.tar.gz", "has_sig": false, "md5_digest": "69f03bd501790000d419b4e68c88f507", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34030, "upload_time": "2015-04-30T20:07:07", "url": "https://files.pythonhosted.org/packages/98/b2/88ff0e008234e65e6192caf4de2d52b11c9a54ab35296b8975beef42ca4b/mediawiki-parser-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "10f89dd4883000e3e0ad961d4165a7df", "sha256": "b3519cc653288344af701bcbb195864fdbd68578dda6e06b1b4f028b2684c54b" }, "downloads": -1, "filename": "mediawiki-parser-0.3.3.tar.gz", "has_sig": false, "md5_digest": "10f89dd4883000e3e0ad961d4165a7df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34137, "upload_time": "2015-04-30T20:34:18", "url": "https://files.pythonhosted.org/packages/2f/78/253f477d08cd3cb943208a879586428ea6db8fecbe8bdd3facd102e77dfe/mediawiki-parser-0.3.3.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "815fc7fa8a7096a8c14656a57adc517c", "sha256": "476aa4e33dacb5f276aac7c79a96352aa3a8883d840c9aea0cc8de3ac30dbb87" }, "downloads": -1, "filename": "mediawiki-parser-0.4.0.tar.gz", "has_sig": false, "md5_digest": "815fc7fa8a7096a8c14656a57adc517c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 35136, "upload_time": "2016-07-27T22:01:02", "url": "https://files.pythonhosted.org/packages/a6/b4/5e757443d29bd9c0c757130063ccf9737deedb23a8461024c67de8306479/mediawiki-parser-0.4.0.tar.gz" } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "4fb971ec525e5438365949016a5a5b3a", "sha256": "1bb0119fa9a77e4bc72804f93299047d85eb6d25e7aad8fcccb24574b212d0df" }, "downloads": -1, "filename": "mediawiki-parser-0.4.1.tar.gz", "has_sig": false, "md5_digest": "4fb971ec525e5438365949016a5a5b3a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34478, "upload_time": "2017-01-31T22:44:59", "url": "https://files.pythonhosted.org/packages/46/96/84d6cfc99aa21663af8ffccd6618481a0da21a4f18daba51c727f716ad42/mediawiki-parser-0.4.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4fb971ec525e5438365949016a5a5b3a", "sha256": "1bb0119fa9a77e4bc72804f93299047d85eb6d25e7aad8fcccb24574b212d0df" }, "downloads": -1, "filename": "mediawiki-parser-0.4.1.tar.gz", "has_sig": false, "md5_digest": "4fb971ec525e5438365949016a5a5b3a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34478, "upload_time": "2017-01-31T22:44:59", "url": "https://files.pythonhosted.org/packages/46/96/84d6cfc99aa21663af8ffccd6618481a0da21a4f18daba51c727f716ad42/mediawiki-parser-0.4.1.tar.gz" } ] }