{ "info": { "author": "Roman Koblov", "author_email": "pingu.g@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "Leaf\n====\n\n.. image:: https://travis-ci.org/penpen/leaf.png?branch=master\n :target: https://travis-ci.org/penpen/leaf\n\n.. image:: https://coveralls.io/repos/penpen/leaf/badge.png?branch=master\n :target: https://coveralls.io/r/penpen/leaf?branch=master\n\n.. image:: https://pypip.in/d/leaf/badge.png\n :target: https://pypi.python.org/pypi/leaf/\n :alt: Downloads\n\n.. image:: https://pypip.in/v/leaf/badge.png\n :target: https://pypi.python.org/pypi/leaf/\n :alt: Latest Version\n\n.. image:: https://pypip.in/license/leaf/badge.png\n :target: https://pypi.python.org/pypi/leaf/\n :alt: License\n\nWhat is this?\n-------------\n\nThis is a simple wrapper around `lxml `_ that adds a few\nfeatures to make working with lxml more convenient. This library covers all my\nHTML parsing needs.\n\nDependencies\n------------\n\n`lxml `_ (obviously :3) and ``cssselect``\n\nFeatures\n--------\n\n* Nice jQuery-like CSS selectors\n* Simple access to element attributes\n* An easy way to convert HTML to other formats (BBCode, Markdown, etc.)\n* A few handy functions for working with text\n* And, of course, all the original features of lxml\n\nDescription\n-----------\n\nThe main function of the module (for my purposes) is ``leaf.parse``.
\nThis function takes an HTML string as its argument and returns a ``leaf.Parser``\nobject that wraps an lxml object.\n\nWith this object you can do anything you want, for example::\n\n    document = leaf.parse(sample)\n    # get the links from the div with id 'menu' using CSS selectors\n    links = document('div#menu a')\n\nOr you can do this::\n\n    # get the first link, or None if there is no match\n    link = document.get('div#menu a')\n\nAnd you can read attributes from these results like this::\n\n    print(link.onclick)\n\nYou can also use standard lxml methods such as ``object.xpath``;\ntheir results are returned as ``leaf.Parser`` objects too.\n\nMy favorite feature is converting HTML into BBCode (Markdown, etc.)::\n\n    # Let's define a simple formatter, which passes text through\n    # and wraps links into [url][/url] (like BBCode)\n    def omgcode_formatter(element, children):\n        # Replace <br> tag with line break\n        if element.tag == 'br':\n            return '\\n'\n        # Wrap links into [url][/url]\n        if element.tag == 'a':\n            return u\"[url={link}]{text}[/url]\".format(link=element.href, text=children)\n        # Return children only for other elements.\n        if children:\n            return children\n\nThe formatter is called recursively with each element and ``children``, a\nstring containing the already-formatted result of that element's children.\n\nNow call ``parse`` with this formatter on any ``leaf.Parser`` object::\n\n    document.parse(omgcode_formatter)\n\nMore detailed examples are available in the tests.\n\nFinally, this library has a few handy functions for working with text:\n\n``to_unicode``\n Convert a string to a unicode string\n\n``strip_accents``\n Strip accents from a string\n\n``strip_symbols``\n Strip ugly unicode symbols from a string\n\n``strip_spaces``\n Strip excess spaces from a string\n\n``strip_linebreaks``\n Strip excess line breaks from a string\n\nChange log\n==========\n\n1.0.1\n-----\n - 100% test coverage\n - fixed bug in result wrapping (``etree._Element`` has ``__iter__`` too!)\n\n1.0\n---\n - add python3 support\n - first production release\n\n0.4.4\n-----\n - fix inner_html method\n - added ``**kwargs`` to the parse function, added inner_html method to the Parser class\n - cssselect in deps\n\n0.4.2\n-----\n - Node attribute modification via node.href = '/blah'\n - Custom default value for get: document.get(selector, default=None)\n - Get element by index: document.get(selector, index)\n\n0.4.1\n-----\n - bool(node) returns True if element exists and False if element is None\n\n0.4\n---\n - First public version", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/penpen/leaf", "keywords": "html,parsing,web scraping", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "leaf", "package_url": "https://pypi.org/project/leaf/", "platform": "UNKNOWN", "project_url": 
"https://pypi.org/project/leaf/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/penpen/leaf" }, "release_url": "https://pypi.org/project/leaf/1.0.3/", "requires_dist": null, "requires_python": null, "summary": "Simple Python library for HTML parsing", "version": "1.0.3" }, "last_serial": 1228229, "releases": { "0.4": [ { "comment_text": "", "digests": { "md5": "3b3491cb8e3a571a9c5f22c7cef5649f", "sha256": "5ce9d3fd142c0c083dd4dd66f2bea127ce24b3c9ab76a043b0a4b2b848506afb" }, "downloads": -1, "filename": "leaf-0.4.tar.gz", "has_sig": false, "md5_digest": "3b3491cb8e3a571a9c5f22c7cef5649f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4750, "upload_time": "2011-03-08T21:00:07", "url": "https://files.pythonhosted.org/packages/c6/74/6d26008be3be905abe8364a2d02fa007e713c51c1a755a9929f589a53caf/leaf-0.4.tar.gz" } ], "0.4.1": [], "0.4.2": [ { "comment_text": "", "digests": { "md5": "8a7ed906402e2ee3d16852b5f6e1f942", "sha256": "6aa0cc488be4d06a2c57ccc18377848316a47e3a2af54ef2db7c3c0770847f53" }, "downloads": -1, "filename": "leaf-0.4.2.tar.gz", "has_sig": false, "md5_digest": "8a7ed906402e2ee3d16852b5f6e1f942", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4921, "upload_time": "2011-05-15T14:05:04", "url": "https://files.pythonhosted.org/packages/17/c1/0187a41ccb6c73cd8a33d7df6a04868ed7052ef465b50161c55a73e0e625/leaf-0.4.2.tar.gz" } ], "0.4.3": [ { "comment_text": "", "digests": { "md5": "fc190ca703f0c4da796ee6e727eb2553", "sha256": "8babfd710db8485623b3bef157414e6ba53f3a59789fb05e229b83b016635480" }, "downloads": -1, "filename": "leaf-0.4.3.tar.gz", "has_sig": false, "md5_digest": "fc190ca703f0c4da796ee6e727eb2553", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4997, "upload_time": "2013-01-18T12:05:06", "url": 
"https://files.pythonhosted.org/packages/30/bd/23549ee8c56cbf2bc91f05104b7e04c75b13a85cf296492035986b866421/leaf-0.4.3.tar.gz" } ], "0.4.4": [ { "comment_text": "", "digests": { "md5": "0082c70f6670e802ab9301d582087b15", "sha256": "9c8ff5c54c2b34e8e70dec76669f971936411fad4cf59e50ea44ca2e814c4574" }, "downloads": -1, "filename": "leaf-0.4.4.tar.gz", "has_sig": false, "md5_digest": "0082c70f6670e802ab9301d582087b15", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4991, "upload_time": "2013-01-18T12:05:22", "url": "https://files.pythonhosted.org/packages/66/52/e37ab7470d75dcb5cc6bf44c3f6fbba46a9e1243f4bd80a5974686dd60cb/leaf-0.4.4.tar.gz" } ], "0.4.5": [ { "comment_text": "", "digests": { "md5": "37527a859571c13c24c4cc4ee3b4c804", "sha256": "60e469da0ea1b195816f8ef5d78966c3475a15ce8e11668a01d1ecf28aa9c84d" }, "downloads": -1, "filename": "leaf-0.4.5.tar.gz", "has_sig": false, "md5_digest": "37527a859571c13c24c4cc4ee3b4c804", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5060, "upload_time": "2013-08-16T09:44:13", "url": "https://files.pythonhosted.org/packages/81/0a/6f13a08e3ff0fc24832abc4f958626254b04c77ac7bbf2ca0aced46979fc/leaf-0.4.5.tar.gz" } ], "1.0": [ { "comment_text": "", "digests": { "md5": "b26df96abc209313ac10249edede6daa", "sha256": "7fd309af6e812eba3951875ee9d2ff15a28c49db5c288b9d9bce94bd4fabb051" }, "downloads": -1, "filename": "leaf-1.0.tar.gz", "has_sig": false, "md5_digest": "b26df96abc209313ac10249edede6daa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5373, "upload_time": "2014-03-10T18:42:54", "url": "https://files.pythonhosted.org/packages/c0/1e/3aed1d5eb572c7c9dfe57fe58aa76dea03521a790674d833b6b7593833c0/leaf-1.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "0176e7f21347743a14c2aa5bdeb8fc2e", "sha256": "8525ee519931f6707985d2e46b26827d416a849da1951d06b0063b01292eb6a8" }, "downloads": -1, "filename": 
"leaf-1.0.1.tar.gz", "has_sig": false, "md5_digest": "0176e7f21347743a14c2aa5bdeb8fc2e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5501, "upload_time": "2014-03-12T17:54:07", "url": "https://files.pythonhosted.org/packages/3f/f2/aa32f400d8ea0cc3051916bee9b8e682547e165ec9120107b50f9c5c51be/leaf-1.0.1.tar.gz" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "b927214615296fe961600d919a8eb0a9", "sha256": "5976ea6faac1155cd3636a831be22c364372c65c23d312691630bd456f33b478" }, "downloads": -1, "filename": "leaf-1.0.2.tar.gz", "has_sig": false, "md5_digest": "b927214615296fe961600d919a8eb0a9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5499, "upload_time": "2014-03-13T10:18:17", "url": "https://files.pythonhosted.org/packages/ef/20/2f45dfc965f2bdbaa534c9c6e877781ffd06f3f9abbd11b7a9e3e5dec708/leaf-1.0.2.tar.gz" } ], "1.0.3": [ { "comment_text": "", "digests": { "md5": "3ddbb6aa229ced930ab700845eb1c110", "sha256": "582ad3a2a5a0e2d650562386b37bf4f64efd6192641f741214ed50e0cf07ef33" }, "downloads": -1, "filename": "leaf-1.0.3.tar.gz", "has_sig": false, "md5_digest": "3ddbb6aa229ced930ab700845eb1c110", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5698, "upload_time": "2014-09-17T17:15:29", "url": "https://files.pythonhosted.org/packages/fc/c0/156fd0cd8c074c7440dbfa36e25b6255cb3d3533601fc2d44ffe4079a05d/leaf-1.0.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3ddbb6aa229ced930ab700845eb1c110", "sha256": "582ad3a2a5a0e2d650562386b37bf4f64efd6192641f741214ed50e0cf07ef33" }, "downloads": -1, "filename": "leaf-1.0.3.tar.gz", "has_sig": false, "md5_digest": "3ddbb6aa229ced930ab700845eb1c110", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5698, "upload_time": "2014-09-17T17:15:29", "url": 
"https://files.pythonhosted.org/packages/fc/c0/156fd0cd8c074c7440dbfa36e25b6255cb3d3533601fc2d44ffe4079a05d/leaf-1.0.3.tar.gz" } ] }