{ "info": { "author": "jhyao", "author_email": "yaojinhonggg@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Template Parser HTML\nThis tool helps extract useful data from HTML web pages. It parses an HTML page using a template file that marks the data you need with special attributes. The template locates the HTML blocks that contain the data and describes the types, names, and structure of that data. You can create a template by modifying a copy of an example page, or by writing a minimal HTML structure that locates your data. I suggest the first method: the tool deletes irrelevant parts and organizes the HTML tree automatically.\n## install\n```\npip install tp_html\n```\n## How to use\n```python\nfrom tp_html import Template, ThtmlParser\n\n# load a template from a file\ntemplate = Template(template_file='samples/basic_template.html')\n\n# save the template to a file\ntemplate.save('samples/basic_template.min.html')\n\n# create a parser (three alternative ways)\nparser = ThtmlParser(template_file='samples/basic_template.html')\nparser = ThtmlParser(template_text='...')\nparser = ThtmlParser(template=template)\n\n# parse a page from a file, a URL, or raw text\ndata = parser.parse(page_file='samples/basic_sample.html', encoding='utf-8')\ndata = parser.parse(page_url='http://.....')\ndata = parser.parse(page_text='.....')\n```\n## Template file\n### string\nExtracts a string from the content or an attribute of an element.\n```html\nlink\n```\nTo get the content. This yields the data {'name': 'link'}\n```html\n\n```\nTo get the href attribute. This yields the data {'name': '...'}\n```html\n\n```\n### list\nFor HTML\n```html\n
\n```\ntemplate:\n```html\n\n```\ndata:\n```json\n{\n \"images\": [\n \"/image/1\",\n \"/image/2\",\n \"/image/3\",\n \"/image/4\"\n ]\n}\n```\nIn a list template, the element marked with p-type=list must contain exactly one child p-value node, which selects the data for each item. If a list item is itself a dict or a list, a nested structure inside the item is also allowed.\n### dict\nFor HTML\n```html\n\n```\ntemplate:\n```html\n\n```\ndata:\n```json\n{\n \"user_link\": {\n \"name\": \"xxx\",\n \"link\": \"/user/13456\",\n \"title\": \"user xxx\",\n \"age\": \"20\",\n \"fans_num\": \"10\",\n \"follow_num\": \"20\"\n }\n}\n```\nIn a dict template, p-name is required as the dictionary key. Multiple p-item values are allowed, separated by spaces: \"string\" means the content of the element, and any other item is an attribute name.\n## complex nesting\nFor HTML\n```html\n\n\n\n \n