{ "info": { "author": "Taras Gaidukov", "author_email": "kemaweyan@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "=========================================================================\n EasyHTML :: A package that provides an easy access to elements\n of HTML and XHTML documents through the Document Object Model.\n=========================================================================\n\nHomepage: http://easyhtml.sourceforge.net/\n\nAuthor: Taras Gaidukov\n\n================================\n Installation Instructions:\n================================\n\nDependencies:\n\n Python 3.4+\n\nBuild and install by running:\n\n $ python setup.py build\n $ sudo python setup.py install\n\n============\n Overview:\n============\n\nThe package contains a module easyhtml.parser that provides a class\neasyhtml.parser.DOMParser (a subclass of html.parser.HTMLParser).\nThis class has a method get_dom() that returns a DOM of parsed document\nwhich is an instance of easyhtml.dom.HTMLDocument class:\n\nfrom easyhtml import parser\n\ndom_parser = parser.DOMParser()\ndom_parser.feed(''\n ' '\n '

Hello world!

'\n '
'\n '

First paragraph

'\n '

Second paragraph

'\n '

Third paragraph

'\n '
'\n ' '\n '')\ndocument = dom_parser.get_dom()\n\nThe HTMLDocument class provides an API for access to all elements, their\nattributes and contents. Each element of the DOM is an instance of one of\nthe following classes:\n\nHTMLDocument - a whole document\nHTMLTag - a tag element\nTextNode - a text element, including HTML entities\nHTMLComment - a comment in the HTML code\nDoctypeDeclaration - a doctype declaration of the document\n\nEach HTML element provides two views of itself: its raw HTML code and its\n\"clear\" text representation (as shown in web browsers). The raw HTML code is\navailable as the raw_html property of the element:\n\ndocument.raw_html # returns the HTML code of the document\n\nThe text representation is just the str version of the element:\n\nstr(document) # returns a text representation of the document\n\nText representations of comments and of the doctype declaration are empty,\nsince these elements are invisible on the page.\n\nThe following methods give access to nested elements:\n\nget_tags_by_name(name) - returns all tags with the specified name\nget_children(query) - returns all tags that match a query*\nget_element_by_id(id) - returns the tag with the specified id\n\n* A query is a string of conditions separated by semicolons. A condition is\na pair of an attribute name and its value joined by an equals sign:\n\"attr1=value1; attr2=value2\".\n\nThe first two methods return an instance of the easyhtml.dom.HTMLCollection\nclass that contains the found tags. An HTMLCollection object may be empty if\nno tags satisfy the conditions. The get_element_by_id() method returns a\nsingle tag, since the id is assumed to be unique within the document. 
If the tag is not found, None is returned.\n\nThe get_tags_by_name() method is also exposed through the magic method\n__getattr__, which lets you get elements by name as if they were attributes:\ndocument.div is equivalent to document.get_tags_by_name('div')\n\nThe HTMLCollection class is a list-based collection that contains DOM elements\nor other HTMLCollection objects. Like a list, the collection gives access to\nits elements by index through the get_element() method, and it can be\niterated in a for loop:\n\ncollection.get_element(0) # returns the first element of the collection\n                          # or None if it does not exist\nfor element in collection:\n    # do something with each element\n\nThe get_element() method is also exposed through the magic method __getitem__,\nwhich provides a simpler syntax using square brackets, so collection[0] is\nequivalent to collection.get_element(0)\n\nThe methods of the HTMLCollection class are similar to those of HTMLTag and\nHTMLDocument:\n\nget_tags_by_name(name) - returns all tags with the specified name\nget_children(query) - returns all tags that match a query\nget_element_by_id(id) - returns the tag with the specified id\n\nIMPORTANT: Since an HTMLCollection is not an element of the DOM and does not\nkeep the hierarchy of the contained elements, each element in the collection\nis treated as an independent search result. When a new search is run on a\ncollection, the result is a new HTMLCollection that contains a separate\ncollection for each element of the source HTMLCollection object. For\ninstance, say the document contains the following code:\n\n
\n

\n
\n
\n

\n

\n
\n\nthen a call to document.get_tags_by_name('div') returns a collection with two\ntags:\n\nCollection {\n
\n
\n}\n\nThen a call to collection.get_tags_by_name('p') returns a collection that\ncontains two collections with

tags in each one.\n\nCollection {\n Collection {\n

\n }\n Collection {\n

\n

\n }\n}\n\nThis behavior is equivalent to iterating over the collection in a for loop\nand running a new search on each element separately:\n\ncollection = document.get_tags_by_name('div')\nfor element in collection:\n    sub_collection = element.get_tags_by_name('p')\n    # do something with found tags\n\nYou get the same result if you run the new search on the collection directly\nand iterate over the result in a for loop:\n\ncollection = document.get_tags_by_name('div')\ncollection = collection.get_tags_by_name('p')\nfor sub_collection in collection:\n    # do something with found tags\n\nOr even shorter:\n\nfor sub_collection in document.div.p:\n    # do something with found tags\n\nPlease note that the get_element_by_id() method of a collection returns a\nsingle tag, just as the same method of HTMLTag and HTMLDocument objects does,\nsince the id is assumed to be unique and only one tag with that id can exist\nin the document, even if the hierarchy of its elements has been lost.\n\nIn addition, the HTMLCollection class provides a method that helps to refine\nthe request:\n\nfilter_tags_by_attrs(query) - filters found elements by the specified query\n\nThis method is also exposed through the magic method __call__, so tags can be\nfiltered using a simpler call syntax:\n\ndocument.div(\"class=someclass\") # returns a collection of div elements with\n                                # the class \"someclass\"\n\nNote that the filter_tags_by_attrs() method does not create a new level of\nnested collections.\n\n===========================\n Package API reference:\n===========================\n\nclass easyhtml.parser.DOMParser()\n\n Creates a parser instance. The DOMParser is a subclass of the\n html.parser.HTMLParser class. 
For more details about HTMLParser usage see\n the official documentation of HTMLParser at Python's website.\n\nDOMParser Methods:\n\nDOMParser.get_dom()\n\n Returns an instance of HTMLDocument that is the root object of the\n Document Object Model.\n\n\nclass easyhtml.dom.DoctypeDeclaration(decl)\n\n A doctype declaration of the document. Used as an attribute of\n easyhtml.dom.HTMLDocument objects. It is not visible in the str version of\n the document, but it's present in the HTML code.\n\n :decl: the text of the declaration, type str\n\nDoctypeDeclaration Methods:\n\nDoctypeDeclaration.raw_html\n\n A property that returns the raw HTML code of the declaration. It consists\n of the text passed into the constructor between the <! and > symbols.\n\nDoctypeDeclaration.__str__()\n\n Returns a str version of the declaration. It's implicitly called in the\n str context. Always returns an empty string since the doctype declaration\n is invisible on the page.\n\n\nclass easyhtml.dom.HTMLComment(text)\n\n A comment in the HTML code. It's not visible in the str version of the\n document, but it's present in the HTML code.\n\n :text: the text of the comment, type str\n\nHTMLComment Methods:\n\nHTMLComment.raw_html\n\n A property that returns the raw HTML code of the comment. It consists of\n the text passed into the constructor between the <!-- and --> symbols.\n\nHTMLComment.__str__()\n\n Returns a str version of the comment. It's implicitly called in the str\n context. Always returns an empty string since comments are invisible on\n the page.\n\n\nclass easyhtml.dom.TextNode()\n\n A text data element in the document. It's a container for any visible\n text data on the page. It can contain PlainText, NamedEntity and\n NumEntity objects.\n\nTextNode Methods:\n\nTextNode.raw_html\n\n A property that returns the raw HTML code of all contained objects.\n\nTextNode.__str__()\n\n Returns a str version of the text data. It's implicitly called in the str\n context. 
Includes corresponding characters instead of contained HTML\n entities.\n\nTextNode.append(element)\n\n Adds an element to the end of the list.\n\n :element: an element to append, type easyhtml.dom.HTMLText (a superclass\n of the PlainText, NamedEntity and NumEntity classes).\n\n\nclass easyhtml.dom.PlainText(text)\n\n A plain text on the page (without HTML entities). It's used inside the\n TextNode element only.\n\n :text: the text of the element, type str\n\nPlainText Methods:\n\nPlainText.raw_html\n\n A property that returns all characters of the text as they appear in the\n HTML document.\n\nPlainText.__str__()\n\n Returns a str version of the text. It's implicitly called in the str\n context. Replaces sequences of white space characters with single spaces.\n\n\nclass easyhtml.dom.NamedEntity(name)\n\n A named HTML entity. It's used inside the TextNode element only. If no\n entity with the specified name exists, a KeyError is raised.\n\n :name: the name of the entity, type str\n\nNamedEntity Methods:\n\nNamedEntity.raw_html\n\n A property that returns the HTML code of the entity. It consists of the\n name passed into the constructor between the & and ; symbols.\n\nNamedEntity.__str__()\n\n Returns a str version of the entity. It's implicitly called in the str\n context. For instance, it returns the \"<\" character for the entity with\n the code &lt;\n\n\nclass easyhtml.dom.NumEntity(num)\n\n A numeric HTML entity specified by a decimal or hexadecimal code. It's used\n inside the TextNode element only. If no entity with the specified numeric\n code exists, a KeyError is raised.\n\n :num: the numeric code of the entity, type str\n\nNumEntity Methods:\n\nNumEntity.raw_html\n\n A property that returns the HTML code of the entity. It consists of the\n numeric code passed into the constructor between the &# and ; symbols.\n\nNumEntity.__str__()\n\n Returns a str version of the entity. It's implicitly called in the str\n context. 
For instance, it returns the \"<\" character for the entity with\n the code &#60; or &#x3C;\n\n\nclass easyhtml.dom.HTMLTag(name, attrs)\n\n An HTML tag. Could be single such as
or complex such as

...

.\n Complex tags can contain other elements (TextNode, HTMLComment or\n HTMLTag).\n\n :name: the name of the tag, type str\n :attrs: a list of tuples with the attributes of the tag,\n format [(attr1, value1), (attr2, value2)]\n\nHTMLTag Methods:\n\nHTMLTag.single\n\n A property that indicates whether the tag is single, i.e. does not require\n an end tag. The result depends on the name of the tag: the package keeps a\n list of single tags, and if the name matches any item in that list, the\n tag is considered single.\n\nHTMLTag.raw_html\n\n A property that returns the HTML code of the tag. It consists of the\n start tag, the inner HTML code and the end tag.\n\nHTMLTag.__str__()\n\n Returns a str version of the tag. It's implicitly called in the str\n context. It consists of the str versions of all contained elements.\n For instance, it returns \"Hello, world!\" for the tag\n

Hello, world!

\n\nHTMLTag.start_tag\n\n A property that returns the HTML code of the start tag. It consists of the\n name and attributes of the tag in angle brackets.\n\nHTMLTag.end_tag\n\n A property that returns the HTML code of the end tag. It consists of the\n name preceded by a slash, in angle brackets.\n\nHTMLTag.inner_html\n\n A property that returns the HTML code of all contained elements.\n\nHTMLTag.get_attributes()\n\n Returns a dictionary of attributes.\n\nHTMLTag.get_attr(name)\n\n Returns the value of the attribute with the specified name, or None if no\n such attribute exists.\n\n :name: the name of the attribute, type str\n\nHTMLTag.check_attr(name, value)\n\n Returns True if the value of the attribute with the specified name matches\n the specified value. Otherwise returns False.\n\n Note that the \"class\" attribute in HTML can be defined as a list of\n several CSS classes separated by white space characters. The method\n returns True if the specified value of the \"class\" attribute matches one\n of those classes in the HTML. For instance, if the tag is defined as\n
, then check_attr(\"class\", \"foo\") returns True and\n check_attr(\"class\", \"bar\") returns True as well.\n\n :name: the name of the attribute, type str\n :value: the value of the attribute, type str\n\nHTMLTag.check_attrs(query)\n\n Returns True if the attributes of the tag and their values match every\n condition in the specified query. Otherwise returns False. The query is a\n string of attr=value pairs separated by semicolons and any number of\n spaces. For instance,\n \"attr1=value1; attr2=value2\"\n\n :query: a query string, type str\n\nHTMLTag.append(element)\n\n Adds an element to the end of the list.\n\n :element: an element to append, allowed types:\n easyhtml.dom.HTMLTextNode\n easyhtml.dom.HTMLTag\n easyhtml.dom.HTMLComment\n\nHTMLTag.tags\n\n A property that returns a generator object that yields all tags directly\n contained in the tag.\n\nHTMLTag.get_all_tags()\n\n Returns a generator object that yields all tags contained in the tag,\n including all their nested tags recursively.\n\nHTMLTag.get_tags_by_name(name)\n\n Returns an easyhtml.dom.HTMLCollection object that contains the tags with\n the specified name, including nested tags. The same functionality is\n provided by accessing the tag's name as an attribute:\n tag.get_tags_by_name(\"name\") is equivalent to tag.name\n\n :name: the name of the tags to search for, type str\n\nHTMLTag.get_children(query)\n\n Returns an easyhtml.dom.HTMLCollection object that contains the tags whose\n attributes match the query. The query is a string of attr=value pairs\n separated by semicolons and any number of spaces. For instance,\n \"attr1=value1; attr2=value2\"\n\n :query: a query string, type str\n\nHTMLTag.get_element_by_id(e_id)\n\n Returns the tag with the specified id. If no such tag exists, returns None.\n\n :e_id: the ID of the tag, type str\n\nHTMLTag.filter_tags_by_attrs(query)\n\n Returns the tag itself if it matches the query, otherwise returns None.\n The query is a string of attr=value pairs separated by semicolons and any\n number of spaces. 
For instance, \"attr1=value1; attr2=value2\"\n\n :query: a query string, type str\n\n\nclass easyhtml.dom.HTMLDocument()\n\n A root document object. Contains all HTML elements and provides an API to\n access them.\n\nHTMLDocument Methods:\n\nHTMLDocument.raw_html\n\n A property that returns the HTML code of the document.\n\nHTMLDocument.__str__()\n\n Returns a str version of the document. It's implicitly called in the str\n context. It consists of the str versions of all contained elements.\n\nHTMLDocument.doctype\n\n A property that contains the DoctypeDeclaration object of the document.\n The property is writable, so a new doctype can be assigned to it.\n\nHTMLDocument.inner_html\n\n A property that returns the HTML code of all contained elements.\n\nHTMLDocument.append(element)\n\n Adds an element to the end of the list.\n\n :element: an element to append, allowed types:\n easyhtml.dom.HTMLTextNode\n easyhtml.dom.HTMLTag\n easyhtml.dom.HTMLComment\n\nHTMLDocument.tags\n\n A property that returns a generator object that yields all tags directly\n contained in the document.\n\nHTMLDocument.get_all_tags()\n\n Returns a generator object that yields all tags contained in the\n document, including all their nested tags recursively.\n\nHTMLDocument.get_tags_by_name(name)\n\n Returns an easyhtml.dom.HTMLCollection object that contains the tags with\n the specified name, including nested tags. The same functionality is\n provided by accessing the tag's name as an attribute:\n doc.get_tags_by_name(\"name\") is equivalent to doc.name\n\n :name: the name of the tags to search for, type str\n\nHTMLDocument.get_children(query)\n\n Returns an easyhtml.dom.HTMLCollection object that contains the tags whose\n attributes match the query. The query is a string of attr=value pairs\n separated by semicolons and any number of spaces. For instance,\n \"attr1=value1; attr2=value2\"\n\n :query: a query string, type str\n\nHTMLDocument.get_element_by_id(e_id)\n\n Returns the tag with the specified id. 
If no such tag exists, returns None.\n\n :e_id: the ID of the tag, type str\n\n\nclass easyhtml.dom.HTMLCollection(items)\n\n A result object returned by the get_* methods. A collection is an object\n that contains found tags or collections of tags.\n\n :items: an iterable object that holds the content of the collection\n\n The collection object is iterable and provides the __getitem__ method to\n access its items using the [] operator. It also provides the __len__\n method, so len(collection) returns the actual number of elements in the\n collection.\n\nHTMLCollection Methods:\n\nHTMLCollection.get_element(index)\n\n Returns the element with the specified index. If no such element exists,\n None is returned. The same functionality is provided by the [] operator.\n\n :index: the index of the element, type int\n\nHTMLCollection.get_tags_by_name(name)\n\n Returns an easyhtml.dom.HTMLCollection object that contains collections\n with the results of running this search on each contained element. The\n same functionality is provided by accessing the tag's name as an attribute:\n collection.get_tags_by_name(\"name\") is equivalent to collection.name\n\n :name: the name of the tags to search for, type str\n\nHTMLCollection.get_children(query)\n\n Returns an easyhtml.dom.HTMLCollection object that contains collections\n with the results of running this search on each contained element. The\n query is a string of attr=value pairs separated by semicolons and any\n number of spaces. For instance, \"attr1=value1; attr2=value2\"\n\n :query: a query string, type str\n\nHTMLCollection.filter_tags_by_attrs(query)\n\n Filters the contained tags, or the tags in contained collections, keeping\n only those that match the specified query. If no tags match, the\n collection is empty. The query is a string of attr=value pairs separated\n by semicolons and any number of spaces. 
For instance, \"attr1=value1; attr2=value2\"\n\n The same functionality is provided by the __call__ method, so \n collection.filter_tags_by_attrs(\"attr=value\") = collection(\"attr=value\")\n\n Note that since the return value of search methods are the HTMLCollection\n object, you could use just document.div(\"class=foo\") instead of \n document.div.filter_tags_by_attrs(\"class=foo\")\n\n :query: a query string, type str\n\nHTMLCollection.get_element_by_id(e_id)\n\n Returns a tag with specified id. If such tag does not exist returns None.\n\n :e_id: an ID of the tag, type str\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://easyhtml.sourceforge.net/", "keywords": "html dom parser", "license": "GPLv3", "maintainer": "", "maintainer_email": "", "name": "easyhtml", "package_url": "https://pypi.org/project/easyhtml/", "platform": "", "project_url": "https://pypi.org/project/easyhtml/", "project_urls": { "Homepage": "http://easyhtml.sourceforge.net/" }, "release_url": "https://pypi.org/project/easyhtml/1.2.0/", "requires_dist": null, "requires_python": "", "summary": "A package that provides an API to create a DOM of HTML documents and access to its elements", "version": "1.2.0" }, "last_serial": 2947784, "releases": { "1.2.0": [ { "comment_text": "", "digests": { "md5": "de95b683fd1f3d2143254dc63f61a745", "sha256": "03b0d08aa5692a0c5ab99e59450170cb571ab46b383431dbb13d7da2f2dca248" }, "downloads": -1, "filename": "easyhtml-1.2.0.tar.gz", "has_sig": false, "md5_digest": "de95b683fd1f3d2143254dc63f61a745", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16077, "upload_time": "2017-06-13T19:02:48", "url": "https://files.pythonhosted.org/packages/b1/9a/7d6c3d4de920d71ebc2a427964700c210333d24f91778c20bc0e6d675a50/easyhtml-1.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": 
"de95b683fd1f3d2143254dc63f61a745", "sha256": "03b0d08aa5692a0c5ab99e59450170cb571ab46b383431dbb13d7da2f2dca248" }, "downloads": -1, "filename": "easyhtml-1.2.0.tar.gz", "has_sig": false, "md5_digest": "de95b683fd1f3d2143254dc63f61a745", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16077, "upload_time": "2017-06-13T19:02:48", "url": "https://files.pythonhosted.org/packages/b1/9a/7d6c3d4de920d71ebc2a427964700c210333d24f91778c20bc0e6d675a50/easyhtml-1.2.0.tar.gz" } ] }