{ "info": { "author": "Konstantin Lopukhin", "author_email": "kostia.lopuhin@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "============\nHTML to Text\n============\n\n\n.. image:: https://img.shields.io/pypi/v/html-text.svg\n :target: https://pypi.python.org/pypi/html-text\n :alt: PyPI Version\n\n.. image:: https://img.shields.io/travis/TeamHG-Memex/html-text.svg\n :target: https://travis-ci.org/TeamHG-Memex/html-text\n :alt: Build Status\n\n.. image:: http://codecov.io/github/TeamHG-Memex/soft404/coverage.svg?branch=master\n :target: http://codecov.io/github/TeamHG-Memex/html-text?branch=master\n :alt: Code Coverage\n\nExtract text from HTML\n\n* Free software: MIT license\n\nHow is html_text different from ``.xpath('//text()')`` from LXML\nor ``.get_text()`` from Beautiful Soup?\n\n* Text extracted with ``html_text`` does not contain inline styles,\n javascript, comments and other text that is not normally visible to users;\n* ``html_text`` normalizes whitespace, but in a way smarter than\n ``.xpath('normalize-space()')``, adding spaces around inline elements\n (which are often used as block elements in html markup), and trying to\n avoid adding extra spaces for punctuation;\n* ``html-text`` can add newlines (e.g. 
after headers or paragraphs), so\n that the output text looks more like how it is rendered in browsers.\n\nInstall\n-------\n\nInstall with pip::\n\n pip install html-text\n\nThe package depends on lxml, so you might need to install additional\npackages: http://lxml.de/installation.html\n\n\nUsage\n-----\n\nExtract text from HTML::\n\n >>> import html_text\n >>> html_text.extract_text('