{
"info": {
"author": "Justin Li",
"author_email": "yuanxu.lee@gmail.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 2",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.5"
],
"description": "# HTML Table Extractor\n[](https://travis-ci.org/yuanxu-li/html-table-extractor)\n\n_HTML Table Extractor is a python library that uses [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) to extract data from complicated and messy html table_\n\n## Important links\n* Repository: https://github.com/yuanxu-li/html-table-extractor\n* Issues: https://github.com/yuanxu-li/html-table-extractor/issues\n\n## Installation\n\n```bash\npip install 'beautifulsoup4==4.5.3'\npip install html-table-extractor\n```\n\n## Usage\n\n### Example 1 - Simple\n\n
\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n\n\"\"\"\nextractor = Extractor(table_doc)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2'], [u'3', u'4']]\n```\n\n### Example 2 - Transformer\n\n\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n\n\"\"\"\nextractor = Extractor(table_doc, transformer=int)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[1, 2], [3, 4]]\n```\n\n### Example 3 - Pass BS4 Tag\n\n\n\n```python\nfrom html_table_extractor.extractor import Extractor\nfrom bs4 import BeautifulSoup\ntable_doc = \"\"\"\n\n\"\"\"\nsoup = BeautifulSoup(table_doc, 'html.parser')\nextractor = Extractor(soup, id_='wanted')\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2'], [u'3', u'4']]\n```\n\n### Example 4 - Complex\n\n\n \n | 1 | \n 2 | \n 3 | \n
\n \n | 4 | \n
\n \n | 5 | \n
\n
\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n\n \n | 1 | \n 2 | \n 3 | \n
\n \n | 4 | \n
\n \n | 5 | \n
\n
\n\"\"\"\nextractor = Extractor(table_doc)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2', u'3'], [u'1', u'4', u'4'], [u'5', u'5', u'5']]\n```\n\n### Example 5 - Conflicted\n\n\n \n | 1 | \n 2 | \n 3 | \n
\n \n | 4 | \n
\n \n | 5 | \n
\n
\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n\n \n | 1 | \n 2 | \n 3 | \n
\n \n | 4 | \n
\n \n | 5 | \n
\n
\n\"\"\"\nextractor = Extractor(table_doc)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2', u'3'], [u'1', u'4', u'3'], [u'5', u'5', u'3']]\n```\n\n### Example 6 - Write to file\n\n\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n\n\"\"\"\nextractor = Extractor(table_doc).parse()\nextractor.write_to_csv(path='.')\n```\nIt will write to the given path and create a new CSV file called `output.csv`:\n```\n1,2\n3,4\n\n```\n\n## Team\n\n* [@yuanxu-li](https://github.com/yuanxu-li)\n\n## Errors/Bugs\n\nIf something is not working correctly, or if you have any suggestions for improvement, [report it here](https://github.com/yuanxu-li/table-extractor/issues).\n\n## Copyright\n\nCopyright (c) 2017 Justin Li. Released under the [MIT License](https://github.com/yuanxu-li/html-table-extractor/blob/master/README.md).\n\nThird-party copyright in this distribution is noted where applicable.\n\n\n",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/yuanxu-li/html-table-extractor",
"keywords": "html table beautifulsoup crawler scrape",
"license": "MIT",
"maintainer": "",
"maintainer_email": "",
"name": "html-table-extractor",
"package_url": "https://pypi.org/project/html-table-extractor/",
"platform": "",
"project_url": "https://pypi.org/project/html-table-extractor/",
"project_urls": {
"Homepage": "https://github.com/yuanxu-li/html-table-extractor"
},
"release_url": "https://pypi.org/project/html-table-extractor/1.4.0/",
"requires_dist": [
"beautifulsoup4 (==4.5.3)"
],
"requires_python": "",
"summary": "A python library for extracting data from html table",
"version": "1.4.0"
},
"last_serial": 4859595,
"releases": {
"1.0.0": [
{
"comment_text": "",
"digests": {
"md5": "0607da7efd994c51f830d69c96ebd92c",
"sha256": "13ebb924564d7f2132a0912e6f748f792119bed2d87b489766c6acb2bbae9390"
},
"downloads": -1,
"filename": "html_table_extractor-1.0.0-py2-none-any.whl",
"has_sig": false,
"md5_digest": "0607da7efd994c51f830d69c96ebd92c",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 3760,
"upload_time": "2017-04-25T02:57:39",
"url": "https://files.pythonhosted.org/packages/0e/a5/7f9c3000872e6c46533aa42a64d6ee9b18ff2cc7a38c4c3a488875e09ea4/html_table_extractor-1.0.0-py2-none-any.whl"
}
],
"1.0.1": [
{
"comment_text": "",
"digests": {
"md5": "e6a2d6d05ed0608867eb1a2efe57df5c",
"sha256": "47024e8328c02d3e71f33b00fbd2b17cecd2a47692171d94c439389d4b47eaf6"
},
"downloads": -1,
"filename": "html_table_extractor-1.0.1-py2-none-any.whl",
"has_sig": false,
"md5_digest": "e6a2d6d05ed0608867eb1a2efe57df5c",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 4714,
"upload_time": "2017-05-02T01:56:46",
"url": "https://files.pythonhosted.org/packages/57/f8/8ecc659429f0128503e7940823d8450472e961c05f69c0d19a58160aeffa/html_table_extractor-1.0.1-py2-none-any.whl"
}
],
"1.1.0": [
{
"comment_text": "",
"digests": {
"md5": "d90d80d9127843b4a353fe161cf42662",
"sha256": "46d4e7298f0d21cc181a89f2f83c700e04f183ef34f1e05fa1b8ff546438ecfb"
},
"downloads": -1,
"filename": "html_table_extractor-1.1.0-py2-none-any.whl",
"has_sig": false,
"md5_digest": "d90d80d9127843b4a353fe161cf42662",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 5047,
"upload_time": "2017-05-02T01:59:28",
"url": "https://files.pythonhosted.org/packages/00/62/c2810cd13348a35c81544f11de56cbf4f9f8b41359fa95f0c79741eb298c/html_table_extractor-1.1.0-py2-none-any.whl"
}
],
"1.2.0": [
{
"comment_text": "",
"digests": {
"md5": "255587bfe3a6ac73e6323639b9cd2e76",
"sha256": "d8b9a6510d7e0b7b0d363036b20dc7a25f11f5ea2db32e033acbc2f439c24070"
},
"downloads": -1,
"filename": "html_table_extractor-1.2.0-py2-none-any.whl",
"has_sig": false,
"md5_digest": "255587bfe3a6ac73e6323639b9cd2e76",
"packagetype": "bdist_wheel",
"python_version": "py2",
"requires_python": null,
"size": 5529,
"upload_time": "2017-05-09T04:52:42",
"url": "https://files.pythonhosted.org/packages/ac/b6/e483a56af55c749bcb8f28b2d9eb24b08ce0bac4201a26dca081192ddaf8/html_table_extractor-1.2.0-py2-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "351e30fca6094834cff81b9e523691a7",
"sha256": "5bb1e427dfd8ddddc51ec4f3e3dcf9eba09b25b44f3723f150a153c339958266"
},
"downloads": -1,
"filename": "html-table-extractor-1.2.0.tar.gz",
"has_sig": false,
"md5_digest": "351e30fca6094834cff81b9e523691a7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3138,
"upload_time": "2019-02-21T06:57:28",
"url": "https://files.pythonhosted.org/packages/02/37/6bea43e3400de9cb8c9238bd923a365f9b9e42b86fb96725651f4cf8ea1b/html-table-extractor-1.2.0.tar.gz"
}
],
"1.3.0": [
{
"comment_text": "",
"digests": {
"md5": "9d8989d816846eb8e1361b8701356700",
"sha256": "1b3244d13bc9e65355c54853d5b0795105c0d726d43bdaab1dafc8a95e440c37"
},
"downloads": -1,
"filename": "html_table_extractor-1.3.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "9d8989d816846eb8e1361b8701356700",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 5561,
"upload_time": "2017-05-10T03:27:23",
"url": "https://files.pythonhosted.org/packages/5b/df/f6db8d825eb524e45d1043122127a74f8e80790fc8afed1535e2056ef319/html_table_extractor-1.3.0-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "877f9077818240ee0f4337cef7db8117",
"sha256": "c91891a9b77e44d2282b4b3cdf50aa5f925bb7837bca47263b5d16bc1dc67573"
},
"downloads": -1,
"filename": "html-table-extractor-1.3.0.tar.gz",
"has_sig": false,
"md5_digest": "877f9077818240ee0f4337cef7db8117",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3151,
"upload_time": "2019-02-21T06:57:29",
"url": "https://files.pythonhosted.org/packages/c1/0c/fc754c24b1a298c3300ed73beb0ca5349e1fda0bb8b039be208e3762d0c0/html-table-extractor-1.3.0.tar.gz"
}
],
"1.3.1": [
{
"comment_text": "",
"digests": {
"md5": "d32aeed80cb18e1c650befd85071a928",
"sha256": "4fa86ac077e5567473d7dc7b75855a5bdb06aa804aea3a399ed77e3ac1546248"
},
"downloads": -1,
"filename": "html_table_extractor-1.3.1-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "d32aeed80cb18e1c650befd85071a928",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 4661,
"upload_time": "2019-02-21T06:57:27",
"url": "https://files.pythonhosted.org/packages/6c/f7/3888db506caf6ec8c75d7b5b354bda92da43ff8db29ac9a3e4f0b2cb1e23/html_table_extractor-1.3.1-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "81dc535adf8708c60043dc0bcf6e6065",
"sha256": "65dd4d77aff447b3ad25118e56dcdbac9033d0a64df6897b85c0ba21ef003374"
},
"downloads": -1,
"filename": "html-table-extractor-1.3.1.tar.gz",
"has_sig": false,
"md5_digest": "81dc535adf8708c60043dc0bcf6e6065",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3616,
"upload_time": "2019-02-21T06:57:31",
"url": "https://files.pythonhosted.org/packages/7a/e7/52b287f561bb5dddde999f12ee97e250869357299b3d1e44b0416c23dc62/html-table-extractor-1.3.1.tar.gz"
}
],
"1.4.0": [
{
"comment_text": "",
"digests": {
"md5": "57dd72a62509ed7626246e21a877fb69",
"sha256": "840cdf3d3a2d9a41b27ca54b95355e934b8efcaa341b2cc1f013e0f98c2ce0fb"
},
"downloads": -1,
"filename": "html_table_extractor-1.4.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "57dd72a62509ed7626246e21a877fb69",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 4752,
"upload_time": "2019-02-24T00:26:01",
"url": "https://files.pythonhosted.org/packages/79/a4/2845f07c4034cd95c15630cdc77c9430ae763ebde41454b2249852cbbcea/html_table_extractor-1.4.0-py2.py3-none-any.whl"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "57dd72a62509ed7626246e21a877fb69",
"sha256": "840cdf3d3a2d9a41b27ca54b95355e934b8efcaa341b2cc1f013e0f98c2ce0fb"
},
"downloads": -1,
"filename": "html_table_extractor-1.4.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "57dd72a62509ed7626246e21a877fb69",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 4752,
"upload_time": "2019-02-24T00:26:01",
"url": "https://files.pythonhosted.org/packages/79/a4/2845f07c4034cd95c15630cdc77c9430ae763ebde41454b2249852cbbcea/html_table_extractor-1.4.0-py2.py3-none-any.whl"
}
]
}