{ "info": { "author": "Joshua Hruzik", "author_email": "joshua.hruzik@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Text Processing" ], "description": "# pybundestag\nSince the 19th plenary period of the German Bundestag, all protocols are published as highly structured xml files. While all former protocols are also available as xml files, there is very little structure in them, making them nearly equivalent to unstructured txt files. \n\nAlso, the Bundestag publishes information on all current and former members (MdB) in a highly structured XML file, containing personal information, a short CV, and period specific infos.\n\nThis Python CLI tool is designed to convert protocols and the file on all MdBs into single csv or json files. This way, data scientists can use the output for further processing.\n\n### Installation\nYou can install pybundestag via pip.\n\n```bash\npip install pybundestag\n```\n\nAfter installation has finished, you have the choice to use pybundestag as a CLI tool or input its parsers into one of your scripts.\n\n### Usage\n\nYou can use a consistent syntax within pybundestag for converting both protocols and the MdB XML file. The general syntax in CLI mode is like this:\n\n```bash\npybundestag entity input output\n```\n\nentity can either be 'protocol' or 'mdb', depending on the task at hand. If you want to parse protocols, choose 'protocol'. If you want to convert the MdB XML file, use 'mdb'. 'input' should be the path of your protocol(s) or the MdB XML file, while output is the desired path of your output. When parsing Bundestag protocols, you can also specify a folder and pybundestag will parse all XML files in that folder. Both 'protocol' and 'mdb' support several optional arguments.\n\n#### Parsing of Protcols\nMake sure that you download all the protocols you want to convert as XML files. Keep in mind that you can convert only those documents that were created during or after the 19th parliamentary period. If you want to convert multiple protocols at once, put them into a single folder.\nAfter you've downloaded all the XML files that you would like to convert, open you operating system's command line tool and go into where pybundestag resides.\n\nYou can use the CLI interface as discussed above. However, you can also use the following optional arguments:\n* -m [--meta]: If present, pybundestag will add meta information to every speech (Date, Location, Plenary Period, and Plenary Session).\n* -s [--seperator]: A custom seperator for your csv file (defaults to \",\"). Make sure that you put quotation marks around your seperator.\n\nAssume that you want to convert a single file in */home/MaxMustermann/rede.xml* and you want to convert it into a csv file under */home/MaxMustermann/output.csv* without meta data and using the default seperator. You can use pybundestag like so:\n\n```bash\npybundestag protocol /home/MaxMustermann/rede.xml /home/MaxMustermann/output.csv\n```\n\nAfter the script is done, you will get a message that your output was written to the desired path.\n\nIf we assume that there are multiple XML files under */home/MaxMustermann/reden/* and you want to convert those files into a single csv file with meta data and a semicolon as a seperator, you can invoke pybundestag like so:\n\n```bash\npybundestag protocol /home/MaxMustermann/reden/ /home/MaxMustermann/output.csv -m -s \";\"\n```\n\nYou will see the current progress of the program printed to the screen and you will receive a message that your output was written to the desired path. The resulting csv file will contain the speeches' unique Id, Date, Faction of speaker, Location, Parliamentary Period, Role of Speaker, Session, Name of Speaker, ID of Speaker, and the raw text (stripped of comments).\n\nOf course, you can change the name of your output file from *output.csv* to *output.json* if you prefer to write to a json file. Note, that your choice of a seperator will be ignored then.\n\n#### Parsing MdB XML\nIf you would like to convert the XML file containing information on all MdBs, you can use 'mdb' as the first postional argument to pybundestag. The input must be a XML file as provided by the Bundestag. Just as you would do when converting protocols, specify your output as either a csv or json file. \n\nThere are also some optional arguments you can use:\n* -p [--period]: You can specify a certain parliamentary period. pybundestag will only convert and export members of that parliamentary period and add information specific to that period (e.g. type of mandate and electoral district). This should be an integer.\n* -i [--institutions]: Here you can insert multiple institution names (e.g. committees or factions). pybundestag will check if the MdB was also a member of that institution. If there is more than one institution to check, you must seperate them by using a \";\". Note that you must also specify a period if you use this option. In your final result, you will get a boolean variable for every institution you put in. True will imply that the MdB was part of that institution during the period specified. False would imply otherwise. There is no sanity check, so make sure that your spelling is correct.\n\nIf you simply want to convert all MdBs in */home/MaxMustermann/mdbs.xml* to */home/MaxMustermann/mdbs.csv*, you would use pybundestag like so:\n```bash\npybundestag mdb /home/MaxMustermann/mdbs.xml /home/MaxMustermann/mdbs.csv\n```\n\nThis will yield a csv file containing personal information (e.g. name, and date of birth, gender, etc.) as well as a short cv and a unique ID.\n\nA more complex example would be if you want to extract all MdBs of the 19th parliamentary period to a json file, while also checking if a given MdB was member of the 'Verteidigungsausschuss' and/or 'Ausschuss f\u00fcr Arbeit und Soziales':\n\n```Bash\npybundestag mdb /home/MaxMustermann/mdbs.xml /home/MaxMustermann/mdbs.json -p 19 -i \"Verteidigungsausschuss;Ausschuss f\u00fcr Arbeit und Soziales\"\n```\n\nThis file will not only contain personal information, a unique ID, and a short CV but also period specific information and two dummy variables 'member_Verteidigungsausschuss' and 'member_Ausschuss f\u00fcr Arbeit und Soziales'.\n\n### Links\nYou can find all the protocols of the German Bundestag and data on all MdBs (former and current) as XML files at the [official website](https://www.bundestag.de/services/opendata).\nThere is a GitHub organization centered around the German Bundestag called [bundestag](https://github.com/bundestag). There you can find many more repositories for Python and other languages.\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Jhruzik/pybundestag", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "pybundestag", "package_url": "https://pypi.org/project/pybundestag/", "platform": "", "project_url": "https://pypi.org/project/pybundestag/", "project_urls": { "Homepage": "https://github.com/Jhruzik/pybundestag" }, "release_url": "https://pypi.org/project/pybundestag/0.0.21/", "requires_dist": [ "pandas", "bs4", "lxml" ], "requires_python": "", "summary": "Package to parse Bundestag data", "version": "0.0.21" }, "last_serial": 5337750, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "62e4a274ac161c7dd0b0ee475d583d5b", "sha256": "7282f6ccfc46d311c26cb5639c80ec5bbdc30aeb5428a5e2df460ffe80c85189" }, "downloads": -1, "filename": "pybundestag-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "62e4a274ac161c7dd0b0ee475d583d5b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13090, "upload_time": "2019-05-25T17:38:07", "url": "https://files.pythonhosted.org/packages/df/66/bf2f717d52c6b1e89ab2178276582fb58c4554b264695a2ab0b10fed9a39/pybundestag-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e5b7b51346f907802204bfb2a9b62169", "sha256": "223312e6373a015ce15ccd9874fb201496d1036a3dd63258cc3ac7ea20afec0e" }, "downloads": -1, "filename": "pybundestag-0.0.2.tar.gz", "has_sig": false, "md5_digest": "e5b7b51346f907802204bfb2a9b62169", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12277, "upload_time": "2019-05-25T17:38:09", "url": "https://files.pythonhosted.org/packages/09/66/7ebdba71273a4950248a02129a861fb1c31b72c7aaaec823ec96393a132e/pybundestag-0.0.2.tar.gz" } ], "0.0.21": [ { "comment_text": "", "digests": { "md5": "dc176aac880e06dc65fce68e09636ced", "sha256": "2fc0069b6d1ed306e88f33f365a9978a35fa0345c2e1cee00e1eda0ed5f0c1e0" }, "downloads": -1, "filename": "pybundestag-0.0.21-py3-none-any.whl", "has_sig": false, "md5_digest": "dc176aac880e06dc65fce68e09636ced", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14125, "upload_time": "2019-05-30T15:02:14", "url": "https://files.pythonhosted.org/packages/fe/30/c8a5764201d9a7a6a87f1367f94fb3b91e8c500b0c91bf9ca0b430f64d7b/pybundestag-0.0.21-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1d3c936d34b79edace2336707912fcef", "sha256": "962cfa03b459754e6ad06bada5e9558c26dd4680035fa5f3dad1b9a7cf0f663e" }, "downloads": -1, "filename": "pybundestag-0.0.21.tar.gz", "has_sig": false, "md5_digest": "1d3c936d34b79edace2336707912fcef", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12823, "upload_time": "2019-05-30T15:02:15", "url": "https://files.pythonhosted.org/packages/e9/4e/1c45b340e4f60488673103651b52c90adff3cbcb3e37c1d5010a4777625a/pybundestag-0.0.21.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "dc176aac880e06dc65fce68e09636ced", "sha256": "2fc0069b6d1ed306e88f33f365a9978a35fa0345c2e1cee00e1eda0ed5f0c1e0" }, "downloads": -1, "filename": "pybundestag-0.0.21-py3-none-any.whl", "has_sig": false, "md5_digest": "dc176aac880e06dc65fce68e09636ced", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14125, "upload_time": "2019-05-30T15:02:14", "url": "https://files.pythonhosted.org/packages/fe/30/c8a5764201d9a7a6a87f1367f94fb3b91e8c500b0c91bf9ca0b430f64d7b/pybundestag-0.0.21-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1d3c936d34b79edace2336707912fcef", "sha256": "962cfa03b459754e6ad06bada5e9558c26dd4680035fa5f3dad1b9a7cf0f663e" }, "downloads": -1, "filename": "pybundestag-0.0.21.tar.gz", "has_sig": false, "md5_digest": "1d3c936d34b79edace2336707912fcef", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12823, "upload_time": "2019-05-30T15:02:15", "url": "https://files.pythonhosted.org/packages/e9/4e/1c45b340e4f60488673103651b52c90adff3cbcb3e37c1d5010a4777625a/pybundestag-0.0.21.tar.gz" } ] }