{ "info": { "author": "B-Souty", "author_email": "benjamin.souty@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6" ], "description": "\n#### \u26a0Warning: This script is not ready for production use.\u26a0\n*Not all tables are parseable yet. Please refer to the \"Capabilities\" section for a list of supported table types.*\n\n# Html2Dict\n\nSimple html tables extractor.\n\n## Prerequisites\n\n* Python 3.6+\n* Python modules:\n * [lxml](https://lxml.de/)\n * [requests](http://docs.python-requests.org/en/master/)\n\n## Installing\n\nCreate and activate a new Python virtual environment, then install this dev branch with: \n * `pip3 install git+https://github.com/B-Souty/html2dict@wip/issue2/main` \n\n## Capabilities\n\nList of table types currently supported:\n * Basic table without headers. \n * Basic table with headers.\n * Complex tables with merged headers.\n\nList of table types **not** currently supported:\n * Any tables embedded in iframes.\n * Tables with vertical headers (scope=\u201ccol\u201d)\n * Tables with a new header row after the first set of data.\n * Tables with merged tables across multiple levels\n\nThis project is still very new. If the type of table you are parsing is not in this list, please let me know the outcome.\n\n## Usage\n\nStart by importing the desired type of extractor. (Only one available currently). 
\n```Python\nfrom html2dict.extractors import BasicTableExtractor\n``` \n\nThen instantiate an object with one of the 3 constructors provided\n```python\nmy_extractor = BasicTableExtractor.from_html_string(html_string=)\n\n# or \n\nmy_extractor = BasicTableExtractor.from_html_file(html_file=)\n\n# or\n\nmy_extractor = BasicTableExtractor.from_url(url=)\n``` \n\nYou can access the extracted tables from the basic_tables attribute.\n\n```python\nmy_extractor.basic_tables\n```\n\nFinally, the data of the table can be accessed from the attributes data_rows or rows.\n\n```python\nmy_extractor.basic_tables[].rows\n```\n\n## Examples\n\n* for https://www.python.org/downloads/release/python-370/\n\n```python\nmy_extractor = BasicTableExtractor.from_url(url=\"https://www.python.org/downloads/release/python-370/\")\nmy_extractor.basic_tables\n\n{'table_0': }\n\npprint(my_extractor.basic_tables['table_0'].rows)\n\n{'data': [{'Description': 'n/a',\n 'File Size': '22745726',\n 'GPG': 'SIG',\n 'MD5 Sum': '41b6595deb4147a1ed517a7d9a580271',\n 'Operating System': 'Source release',\n 'Version': 'Gzipped source tarball'},\n {'Description': 'n/a',\n 'File Size': '16922100',\n 'GPG': 'SIG',\n 'MD5 Sum': 'eb8c2a6b1447d50813c02714af4681f3',\n 'Operating System': 'Source release',\n 'Version': 'XZ compressed source tarball'},\n {'Description': 'for Mac OS X 10.6 and later',\n 'File Size': '34274481',\n 'GPG': 'SIG',\n 'MD5 Sum': 'ca3eb84092d0ff6d02e42f63a734338e',\n 'Operating System': 'Mac OS X',\n 'Version': 'macOS 64-bit/32-bit installer'},\n {'Description': 'for OS X 10.9 and later',\n 'File Size': '27651276',\n 'GPG': 'SIG',\n 'MD5 Sum': 'ae0717a02efea3b0eb34aadc680dc498',\n 'Operating System': 'Mac OS X',\n 'Version': 'macOS 64-bit installer'},\n {'Description': 'n/a',\n 'File Size': '8547689',\n 'GPG': 'SIG',\n 'MD5 Sum': '46562af86c2049dd0cc7680348180dca',\n 'Operating System': 'Windows',\n 'Version': 'Windows help file'},\n {'Description': 'for AMD64/EM64T/x64',\n 'File Size': 
'6946082',\n 'GPG': 'SIG',\n 'MD5 Sum': 'cb8b4f0d979a36258f73ed541def10a5',\n 'Operating System': 'Windows',\n 'Version': 'Windows x86-64 embeddable zip file'},\n {'Description': 'for AMD64/EM64T/x64',\n 'File Size': '26262280',\n 'GPG': 'SIG',\n 'MD5 Sum': '531c3fc821ce0a4107b6d2c6a129be3e',\n 'Operating System': 'Windows',\n 'Version': 'Windows x86-64 executable installer'},\n {'Description': 'for AMD64/EM64T/x64',\n 'File Size': '1327160',\n 'GPG': 'SIG',\n 'MD5 Sum': '3cfdaf4c8d3b0475aaec12ba402d04d2',\n 'Operating System': 'Windows',\n 'Version': 'Windows x86-64 web-based installer'},\n {'Description': 'n/a',\n 'File Size': '6395982',\n 'GPG': 'SIG',\n 'MD5 Sum': 'ed9a1c028c1e99f5323b9c20723d7d6f',\n 'Operating System': 'Windows',\n 'Version': 'Windows x86 embeddable zip file'},\n {'Description': 'n/a',\n 'File Size': '25506832',\n 'GPG': 'SIG',\n 'MD5 Sum': 'ebb6444c284c1447e902e87381afeff0',\n 'Operating System': 'Windows',\n 'Version': 'Windows x86 executable installer'},\n {'Description': 'n/a',\n 'File Size': '1298280',\n 'GPG': 'SIG',\n 'MD5 Sum': '779c4085464eb3ee5b1a4fffd0eabca4',\n 'Operating System': 'Windows',\n 'Version': 'Windows x86 web-based installer'}],\n 'headers': [['Version',\n 'Operating System',\n 'Description',\n 'MD5 Sum',\n 'File Size',\n 'GPG']]}\n\n```\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/B-Souty/html2dict", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "html2dict", "package_url": "https://pypi.org/project/html2dict/", "platform": "", "project_url": "https://pypi.org/project/html2dict/", "project_urls": { "Homepage": "https://github.com/B-Souty/html2dict" }, "release_url": "https://pypi.org/project/html2dict/0.2/", "requires_dist": [ "lxml", "requests" ], "requires_python": ">=3.6.0", "summary": "Simple html tables extractor.", "version": "0.2" }, 
"last_serial": 4183640, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "4cde9a5c13f85836a6c96215dfd0bc05", "sha256": "00b89a3799b3c03eedc0f1bb176787f6eb4413062e61d30317350c4ba2904db6" }, "downloads": -1, "filename": "html2dict-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4cde9a5c13f85836a6c96215dfd0bc05", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 3616, "upload_time": "2018-07-22T23:00:03", "url": "https://files.pythonhosted.org/packages/63/f0/b99d1146a1c8ca650f2b53ba8df9e3ed526f5a968f70f26b24818a46040f/html2dict-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3a7dbd9f2afac92a09fc75b9f45fd5ee", "sha256": "63494c97dc9de1706b226e0926be0100a4fc88a0f001bc3f6a93d99fd7388a11" }, "downloads": -1, "filename": "html2dict-0.1.tar.gz", "has_sig": false, "md5_digest": "3a7dbd9f2afac92a09fc75b9f45fd5ee", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 3558, "upload_time": "2018-07-22T23:00:04", "url": "https://files.pythonhosted.org/packages/45/6b/6aad6645271327ba9a453028e291349846b8787d9b19837164c967b681c4/html2dict-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "6d3988af6b7a8b61f9566e5853994bdf", "sha256": "9c0114cc7b8f74dc9bdc4fa1bc7997e6d1f66c5bb817c751a152213369b38cc1" }, "downloads": -1, "filename": "html2dict-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "6d3988af6b7a8b61f9566e5853994bdf", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 3670, "upload_time": "2018-07-23T13:35:14", "url": "https://files.pythonhosted.org/packages/29/98/da740cd81e7f6b431ea22d9afa7f2efc78e0c7475a5e846d53550fbed942/html2dict-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0b153f10897c594028aeb777948083a2", "sha256": "06a79cef9d638a40ed6718a056eb1a39af992cf75ea1744c3023a1003af22bb5" }, "downloads": -1, "filename": "html2dict-0.1.1.tar.gz", "has_sig": false, 
"md5_digest": "0b153f10897c594028aeb777948083a2", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 3588, "upload_time": "2018-07-23T13:35:15", "url": "https://files.pythonhosted.org/packages/b0/74/b1e23037a7a1558bf94684812b8df29e47fae3c0b0b9fbb9f532c7328862/html2dict-0.1.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "e5c059516149e84356e8fdcd53e16fd8", "sha256": "f693cdde39898abb3f8fdec09957774ff8dfcba0471cb5e4649e25a547d6c028" }, "downloads": -1, "filename": "html2dict-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e5c059516149e84356e8fdcd53e16fd8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 8679, "upload_time": "2018-08-18T18:48:09", "url": "https://files.pythonhosted.org/packages/5c/12/a9ffbdc855dc92dede2fb889ce01ae21fb2127769b26c839a1a7ffaa085d/html2dict-0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4182a6b5dc8ba13e809b4fbf927377b2", "sha256": "8c76e36ab53ab3f042ebf111f8dbfd3c6ddd087bd72108284619e158020e8e0d" }, "downloads": -1, "filename": "html2dict-0.2.tar.gz", "has_sig": false, "md5_digest": "4182a6b5dc8ba13e809b4fbf927377b2", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 6164, "upload_time": "2018-08-18T18:48:11", "url": "https://files.pythonhosted.org/packages/14/98/d0de4ad52fa9f63fb7623a6e9dcc5f06a6f7a743d202ad12aff95f989fee/html2dict-0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e5c059516149e84356e8fdcd53e16fd8", "sha256": "f693cdde39898abb3f8fdec09957774ff8dfcba0471cb5e4649e25a547d6c028" }, "downloads": -1, "filename": "html2dict-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e5c059516149e84356e8fdcd53e16fd8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 8679, "upload_time": "2018-08-18T18:48:09", "url": 
"https://files.pythonhosted.org/packages/5c/12/a9ffbdc855dc92dede2fb889ce01ae21fb2127769b26c839a1a7ffaa085d/html2dict-0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4182a6b5dc8ba13e809b4fbf927377b2", "sha256": "8c76e36ab53ab3f042ebf111f8dbfd3c6ddd087bd72108284619e158020e8e0d" }, "downloads": -1, "filename": "html2dict-0.2.tar.gz", "has_sig": false, "md5_digest": "4182a6b5dc8ba13e809b4fbf927377b2", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 6164, "upload_time": "2018-08-18T18:48:11", "url": "https://files.pythonhosted.org/packages/14/98/d0de4ad52fa9f63fb7623a6e9dcc5f06a6f7a743d202ad12aff95f989fee/html2dict-0.2.tar.gz" } ] }