{ "info": { "author": "Thomas Amland", "author_email": "thomas.amland@googlemail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Markup :: HTML", "Topic :: Text Processing :: Markup :: XML" ], "description": "parsedom\n=========\n\nThis is a fork of `Common Functions and ParseDOM `_ for use outside of XBMC.\n\n\nGetting element content.\n-------------------------\n\n.. code:: python\n\n from parsedom import parseDOM\n link_html = \"Link Test\"\n ret = parseDOM(link_html, \"a\")\n print repr(ret) # Prints ['Link Test']\n\n\n\nGetting an element attribute.\n-----------------------------\n\n.. code:: python\n\n link_html = \"Link Test\"\n ret = parseDOM(link_html, \"a\", ret = \"href\")\n print repr(ret) # Prints ['bla.html']\n\n\nGet element with matching attribute.\n---------------------------------------\n\n.. code:: python\n\n link_html = \"Link Test1Link Test2Link Test3\"\n ret1 = parseDOM(link_html, \"a\", attrs = { \"id\": \"link1\" }, ret = \"href\")\n ret2 = parseDOM(link_html, \"a\", attrs = { \"id\": \"link2\" })\n ret3 = parseDOM(link_html, \"a\", attrs = { \"id\": \"link3\" }, ret = \"id\")\n print repr(ret1) # Prints ['bla1.html']\n print repr(ret2) # Prints ['Link Test2']\n print repr(ret3) # Prints ['link3']\n\nWhen scraping sites it is prudent to scrape in steps, since real websites are often complicated.\n\nTake this example where you want to get all the user uploads.\n\n.. code:: html\n\n <div id=\"content\">\n <div id=\"sidebar\">\n <div id=\"latest\">\n Miley Cyrus - When I Look At You>br /<\n Puppet theater<br />\n VBLOG #42<br />\n Fourth upload<br />\n </div>\n </div>\n <div id=\"user\">\n <div id=\"uploads\">\n First upload<br />\n Second upload<br />\n Third upload<br />\n Fourth upload<br />\n </div>\n </div>\n </div>\n\n\nThe first step is to limit your search to the correct area.\n\nOne should always find the inner most DOM element that contains the needed data.\n\n.. code:: python\n\n ret = parseDOM(html, \"div\", attrs = { \"id\": \"uploads\" })\n\n\nThe variable ret now contains\n\n.. code:: python\n\n ['First upload<br />\n Second upload<br />\n Third upload<br />\n Fourth upload<br />']\n\nAnd now we get the video url.\n\n.. code:: python\n\n videos = parseDOM(ret, \"a\", ret = \"href\")\n print repr(videos) # Prints [ \"video?12\", \"video?23\", \"video?34\", \"video?41\" ]", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tamland/parsedom", "keywords": null, "license": "GPLv3", "maintainer": null, "maintainer_email": null, "name": "parsedom", "package_url": "https://pypi.org/project/parsedom/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/parsedom/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/tamland/parsedom" }, "release_url": "https://pypi.org/project/parsedom/1.0.0/", "requires_dist": null, "requires_python": null, "summary": "A fast DOM parser", "version": "1.0.0" }, "last_serial": 1039274, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "4247bc3bab09a6166773cf55d398ce2c", "sha256": "09c15a77c9115127d38b330bc8d688506d5282b4ed0aaa910604587f23ca43b8" }, "downloads": -1, "filename": "parsedom-1.0.0.tar.gz", "has_sig": false, "md5_digest": "4247bc3bab09a6166773cf55d398ce2c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16789, "upload_time": "2014-03-24T12:56:40", "url": "https://files.pythonhosted.org/packages/b2/cb/dd97f8e212095cb947b40cbc3748d05a003b0bfe1ca85a1a4a24548305ae/parsedom-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4247bc3bab09a6166773cf55d398ce2c", "sha256": "09c15a77c9115127d38b330bc8d688506d5282b4ed0aaa910604587f23ca43b8" }, "downloads": -1, "filename": "parsedom-1.0.0.tar.gz", "has_sig": false, "md5_digest": "4247bc3bab09a6166773cf55d398ce2c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16789, "upload_time": "2014-03-24T12:56:40", "url": "https://files.pythonhosted.org/packages/b2/cb/dd97f8e212095cb947b40cbc3748d05a003b0bfe1ca85a1a4a24548305ae/parsedom-1.0.0.tar.gz" } ] }