{ "info": { "author": "Justin Li", "author_email": "yuanxu.lee@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5" ], "description": "# HTML Table Extractor\n[![Build Status](https://travis-ci.org/yuanxu-li/html-table-extractor.svg?branch=master)](https://travis-ci.org/yuanxu-li/html-table-extractor)\n\n_HTML Table Extractor is a Python library that uses [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) to extract data from complicated and messy HTML tables._\n\n## Important links\n* Repository: https://github.com/yuanxu-li/html-table-extractor\n* Issues: https://github.com/yuanxu-li/html-table-extractor/issues\n\n## Installation\n\n```bash\npip install 'beautifulsoup4==4.5.3'\npip install html-table-extractor\n```\n\n## Usage\n\n### Example 1 - Simple\n\n
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
\n\"\"\"\nextractor = Extractor(table_doc)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2'], [u'3', u'4']]\n```\n\n### Example 2 - Transformer\n\n
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
\n\"\"\"\nextractor = Extractor(table_doc, transformer=int)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[1, 2], [3, 4]]\n```\n\n### Example 3 - Pass BS4 Tag\n\n
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
\n\n```python\nfrom html_table_extractor.extractor import Extractor\nfrom bs4 import BeautifulSoup\ntable_doc = \"\"\"\n
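<table id='wanted'><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table><table><tr><td>not wanted</td></tr></table>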
\n\"\"\"\nsoup = BeautifulSoup(table_doc, 'html.parser')\nextractor = Extractor(soup, id_='wanted')\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2'], [u'3', u'4']]\n```\n\n### Example 4 - Complex\n\n
<table>\n  <tr>\n    <td rowspan='2'>1</td>\n    <td>2</td>\n    <td>3</td>\n  </tr>\n  <tr>\n    <td colspan='2'>4</td>\n  </tr>\n  <tr>\n    <td colspan='3'>5</td>\n  </tr>\n</table>
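\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n
<table>\n  <tr>\n    <td rowspan='2'>1</td>\n    <td>2</td>\n    <td>3</td>\n  </tr>\n  <tr>\n    <td colspan='2'>4</td>\n  </tr>\n  <tr>\n    <td colspan='3'>5</td>\n  </tr>\n</table>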
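\n\"\"\"\nextractor = Extractor(table_doc)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2', u'3'], [u'1', u'4', u'4'], [u'5', u'5', u'5']]\n```\n\n### Example 5 - Conflicted\n\n
<table>\n  <tr>\n    <td rowspan='2'>1</td>\n    <td>2</td>\n    <td rowspan='3'>3</td>\n  </tr>\n  <tr>\n    <td colspan='2'>4</td>\n  </tr>\n  <tr>\n    <td colspan='3'>5</td>\n  </tr>\n</table>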
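\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n
<table>\n  <tr>\n    <td rowspan='2'>1</td>\n    <td>2</td>\n    <td rowspan='3'>3</td>\n  </tr>\n  <tr>\n    <td colspan='2'>4</td>\n  </tr>\n  <tr>\n    <td colspan='3'>5</td>\n  </tr>\n</table>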
\n\"\"\"\nextractor = Extractor(table_doc)\nextractor.parse()\nextractor.return_list()\n```\nIt will print out:\n```python\n[[u'1', u'2', u'3'], [u'1', u'4', u'3'], [u'5', u'5', u'3']]\n```\n\n### Example 6 - Write to file\n\n
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
\n\n```python\nfrom html_table_extractor.extractor import Extractor\ntable_doc = \"\"\"\n
<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>
\n\"\"\"\nextractor = Extractor(table_doc).parse()\nextractor.write_to_csv(path='.')\n```\nThis writes a new CSV file called `output.csv` to the given path:\n```\n1,2\n3,4\n\n```\n\n## Team\n\n* [@yuanxu-li](https://github.com/yuanxu-li)\n\n## Errors/Bugs\n\nIf something is not working correctly, or if you have any suggestions for improvement, [report it here](https://github.com/yuanxu-li/html-table-extractor/issues).\n\n## Copyright\n\nCopyright (c) 2017 Justin Li. Released under the [MIT License](https://github.com/yuanxu-li/html-table-extractor/blob/master/README.md)\n\nThird-party copyright in this distribution is noted where applicable.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/yuanxu-li/html-table-extractor", "keywords": "html table beautifulsoup crawler scrape", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "html-table-extractor", "package_url": "https://pypi.org/project/html-table-extractor/", "platform": "", "project_url": "https://pypi.org/project/html-table-extractor/", "project_urls": { "Homepage": "https://github.com/yuanxu-li/html-table-extractor" }, "release_url": "https://pypi.org/project/html-table-extractor/1.4.0/", "requires_dist": [ "beautifulsoup4 (==4.5.3)" ], "requires_python": "", "summary": "A python library for extracting data from html table", "version": "1.4.0" }, "last_serial": 4859595, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "0607da7efd994c51f830d69c96ebd92c", "sha256": "13ebb924564d7f2132a0912e6f748f792119bed2d87b489766c6acb2bbae9390" }, "downloads": -1, "filename": "html_table_extractor-1.0.0-py2-none-any.whl", "has_sig": false, "md5_digest": "0607da7efd994c51f830d69c96ebd92c", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 3760, "upload_time": "2017-04-25T02:57:39", "url": 
"https://files.pythonhosted.org/packages/0e/a5/7f9c3000872e6c46533aa42a64d6ee9b18ff2cc7a38c4c3a488875e09ea4/html_table_extractor-1.0.0-py2-none-any.whl" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "e6a2d6d05ed0608867eb1a2efe57df5c", "sha256": "47024e8328c02d3e71f33b00fbd2b17cecd2a47692171d94c439389d4b47eaf6" }, "downloads": -1, "filename": "html_table_extractor-1.0.1-py2-none-any.whl", "has_sig": false, "md5_digest": "e6a2d6d05ed0608867eb1a2efe57df5c", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 4714, "upload_time": "2017-05-02T01:56:46", "url": "https://files.pythonhosted.org/packages/57/f8/8ecc659429f0128503e7940823d8450472e961c05f69c0d19a58160aeffa/html_table_extractor-1.0.1-py2-none-any.whl" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "d90d80d9127843b4a353fe161cf42662", "sha256": "46d4e7298f0d21cc181a89f2f83c700e04f183ef34f1e05fa1b8ff546438ecfb" }, "downloads": -1, "filename": "html_table_extractor-1.1.0-py2-none-any.whl", "has_sig": false, "md5_digest": "d90d80d9127843b4a353fe161cf42662", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 5047, "upload_time": "2017-05-02T01:59:28", "url": "https://files.pythonhosted.org/packages/00/62/c2810cd13348a35c81544f11de56cbf4f9f8b41359fa95f0c79741eb298c/html_table_extractor-1.1.0-py2-none-any.whl" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "255587bfe3a6ac73e6323639b9cd2e76", "sha256": "d8b9a6510d7e0b7b0d363036b20dc7a25f11f5ea2db32e033acbc2f439c24070" }, "downloads": -1, "filename": "html_table_extractor-1.2.0-py2-none-any.whl", "has_sig": false, "md5_digest": "255587bfe3a6ac73e6323639b9cd2e76", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 5529, "upload_time": "2017-05-09T04:52:42", "url": "https://files.pythonhosted.org/packages/ac/b6/e483a56af55c749bcb8f28b2d9eb24b08ce0bac4201a26dca081192ddaf8/html_table_extractor-1.2.0-py2-none-any.whl" }, { 
"comment_text": "", "digests": { "md5": "351e30fca6094834cff81b9e523691a7", "sha256": "5bb1e427dfd8ddddc51ec4f3e3dcf9eba09b25b44f3723f150a153c339958266" }, "downloads": -1, "filename": "html-table-extractor-1.2.0.tar.gz", "has_sig": false, "md5_digest": "351e30fca6094834cff81b9e523691a7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3138, "upload_time": "2019-02-21T06:57:28", "url": "https://files.pythonhosted.org/packages/02/37/6bea43e3400de9cb8c9238bd923a365f9b9e42b86fb96725651f4cf8ea1b/html-table-extractor-1.2.0.tar.gz" } ], "1.3.0": [ { "comment_text": "", "digests": { "md5": "9d8989d816846eb8e1361b8701356700", "sha256": "1b3244d13bc9e65355c54853d5b0795105c0d726d43bdaab1dafc8a95e440c37" }, "downloads": -1, "filename": "html_table_extractor-1.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9d8989d816846eb8e1361b8701356700", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 5561, "upload_time": "2017-05-10T03:27:23", "url": "https://files.pythonhosted.org/packages/5b/df/f6db8d825eb524e45d1043122127a74f8e80790fc8afed1535e2056ef319/html_table_extractor-1.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "877f9077818240ee0f4337cef7db8117", "sha256": "c91891a9b77e44d2282b4b3cdf50aa5f925bb7837bca47263b5d16bc1dc67573" }, "downloads": -1, "filename": "html-table-extractor-1.3.0.tar.gz", "has_sig": false, "md5_digest": "877f9077818240ee0f4337cef7db8117", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3151, "upload_time": "2019-02-21T06:57:29", "url": "https://files.pythonhosted.org/packages/c1/0c/fc754c24b1a298c3300ed73beb0ca5349e1fda0bb8b039be208e3762d0c0/html-table-extractor-1.3.0.tar.gz" } ], "1.3.1": [ { "comment_text": "", "digests": { "md5": "d32aeed80cb18e1c650befd85071a928", "sha256": "4fa86ac077e5567473d7dc7b75855a5bdb06aa804aea3a399ed77e3ac1546248" }, "downloads": -1, "filename": 
"html_table_extractor-1.3.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d32aeed80cb18e1c650befd85071a928", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 4661, "upload_time": "2019-02-21T06:57:27", "url": "https://files.pythonhosted.org/packages/6c/f7/3888db506caf6ec8c75d7b5b354bda92da43ff8db29ac9a3e4f0b2cb1e23/html_table_extractor-1.3.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "81dc535adf8708c60043dc0bcf6e6065", "sha256": "65dd4d77aff447b3ad25118e56dcdbac9033d0a64df6897b85c0ba21ef003374" }, "downloads": -1, "filename": "html-table-extractor-1.3.1.tar.gz", "has_sig": false, "md5_digest": "81dc535adf8708c60043dc0bcf6e6065", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3616, "upload_time": "2019-02-21T06:57:31", "url": "https://files.pythonhosted.org/packages/7a/e7/52b287f561bb5dddde999f12ee97e250869357299b3d1e44b0416c23dc62/html-table-extractor-1.3.1.tar.gz" } ], "1.4.0": [ { "comment_text": "", "digests": { "md5": "57dd72a62509ed7626246e21a877fb69", "sha256": "840cdf3d3a2d9a41b27ca54b95355e934b8efcaa341b2cc1f013e0f98c2ce0fb" }, "downloads": -1, "filename": "html_table_extractor-1.4.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "57dd72a62509ed7626246e21a877fb69", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 4752, "upload_time": "2019-02-24T00:26:01", "url": "https://files.pythonhosted.org/packages/79/a4/2845f07c4034cd95c15630cdc77c9430ae763ebde41454b2249852cbbcea/html_table_extractor-1.4.0-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "57dd72a62509ed7626246e21a877fb69", "sha256": "840cdf3d3a2d9a41b27ca54b95355e934b8efcaa341b2cc1f013e0f98c2ce0fb" }, "downloads": -1, "filename": "html_table_extractor-1.4.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "57dd72a62509ed7626246e21a877fb69", "packagetype": "bdist_wheel", "python_version": "py2.py3", 
"requires_python": null, "size": 4752, "upload_time": "2019-02-24T00:26:01", "url": "https://files.pythonhosted.org/packages/79/a4/2845f07c4034cd95c15630cdc77c9430ae763ebde41454b2249852cbbcea/html_table_extractor-1.4.0-py2.py3-none-any.whl" } ] }