{ "info": { "author": "John Riebold", "author_email": "jmriebold@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Utilities" ], "description": "# BoilerPy3\n\n\n## About\n\nBoilerPy3 is a native Python [port](https://github.com/natural/java2python) of Christian Kohlsch\u00fctter's [Boilerpipe](https://github.com/kohlschutter/boilerpipe) library, released under the Apache 2.0 Licence.\n\nThis package is based on [sammyer's](https://github.com/sammyer) [BoilerPy](https://github.com/sammyer/BoilerPy), specifically [mercuree's](https://github.com/mercuree) [Python3-compatible fork](https://github.com/mercuree/BoilerPy). This fork updates the codebase to be more Pythonic (proper attribute access, docstrings, type hinting, snake case, etc.) and makes use of Python 3.6 features (f-strings), in addition to switching testing frameworks from Unittest to PyTest.\n\n**Note**: This package is based on Boilerpipe 1.2 (at or before [this commit](https://github.com/kohlschutter/boilerpipe/tree/b0816590340f4317f500c64565b23beb4fb9a827)), as that's when the code was originally ported to Python. I experimented with updating the code to match Boilerpipe 1.3; however, because it performed worse in my tests, I ultimately decided to leave it at 1.2-equivalent.\n\n\n## Installation\n\nTo install the latest version from PyPI, execute:\n\n```shell\npip install boilerpy3\n```\n\nIf you'd like to try out any unreleased features, you can install directly from GitHub like so:\n\n```shell\npip install git+https://github.com/jmriebold/BoilerPy\n```\n\n\n## Usage\n\nThe top-level interfaces are the Extractors. 
Use the `get_content()` methods to extract the filtered text.\n\n```python\nfrom boilerpy3 import extractors\n\nextractor = extractors.ArticleExtractor()\n\n# From a URL\ncontent = extractor.get_content_from_url('http://www.example.com/')\n\n# From a file\ncontent = extractor.get_content_from_file('tests/test.html')\n\n# From raw HTML\ncontent = extractor.get_content('