{
    "info": {
        "author": "Aleksandr Smechov",
        "author_email": "aleks@smechov.com",
        "bugtrack_url": null,
        "classifiers": [
            "Intended Audience :: Developers",
            "Natural Language :: English",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3"
        ],
        "description": "===============================================\nVellichor: a succinct article text extractor\n===============================================\n\n*Vellichor (n): the strange wistfulness of used bookstores*\n\n`Vellichor <http://www.dictionaryofobscuresorrows.com/post/57250260260/vellichor>`_'s aims aren't ambitious. It does its duty relatively well, living a simple package's life, sustaining itself solely on URL or HTML strings. Provide it with these basic comforts and you shall receive a lean, healthy block of article text.\n\nQuickstart\n==========\n\nDependencies\n------------\n\nDespite its simple purpose, Vellichor has a few dependencies, as it uses a random forest model to classify each candidate HTML node as relevant or not. These will be installed automatically if you don't already have them: **urlvalidator**, **requests**, **commonregex**, **lxml**, **beautifulsoup4**, **scipy**, **scikit-learn**, **numpy**. *The library was tested with Python 3.6 only*.\n\n\nInstallation\n------------\n\nOf course, `virtualenv`_ would be a nice idea, considering you may want a few of those important dependencies untouched::\n\n    virtualenv test_env --python=python3.6\n\n.. _virtualenv: http://www.virtualenv.org\n\nYou can use ``pip`` to install Vellichor::\n\n    pip install vellichor\n\n\nUsage\n-----\n\nVellichor extracts relevant text from an article URL or HTML string. 
To begin, import the ``Extract`` class::\n\n    from vellichor.extract import Extract\n\nYou can then create an instance of ``Extract`` and feed a URL or HTML string to several methods::\n\n    url = \"http://www.example.com/you-wont-believe-these-examples\"\n    html = \"<html><p>Example</p></html>\"\n\n    extract = Extract()\n\n    # Main method\n    article_text = extract.article_text_from(url)\n    # OR extract.article_text_from(html=html)\n\n    # Extract raw text directly from the retrieved HTML\n    raw_text = extract.raw_text_from(url)\n\n    # Extract the HTML only - URL parameter only\n    html_only = extract.html_from(url)\n\n    # Outputs a Beautiful Soup object from the retrieved HTML\n    soup = extract.soup_from(url)\n\nTo extract text from a sea of article URLs, be sure to instantiate ``Extract`` for every new URL.\n\nNot satisfied with just a clean block of text? Vellichor comes with a few methods for extracting some basic details::\n\n    extract.article_details()\n\n    # outputs a list of author candidates: [\"Dr. Exampleton\"]\n    extract.author\n\n    # outputs the site name: \"Example\"\n    extract.site_name\n\n    # outputs the article title: \"You Won't Believe these Examples!\"\n    extract.article_title\n\nA few things to note. First, running the ``article_text_from()`` method on an instance of ``Extract`` automatically gives access to the following attributes: ``html``, ``article_text``, ``soup``, and ``soup_blocks`` (a collection of candidate nodes, or ``<p>`` tags, that were used for deciding the final output text).\n\nSecond, there is a bit of hierarchy built in. Running the ``get_soup_blocks()`` method also gives access to the ``soup`` and ``html`` attributes. Running ``get_soup()`` on your instance also gets you the ``html`` attribute. 
\n\n``raw_text`` is only available when the ``raw_text_from()`` method is called on an instance of ``Extract`` (the URL or HTML parameter is required if this will be the first method you call).\n\nThat's all, folks.\n\n...\n\n*I have always imagined that Paradise will be a kind of library.*\n\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/aleksandr-smechov/vellichor.git",
        "keywords": "",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "vellichor",
        "package_url": "https://pypi.org/project/vellichor/",
        "platform": "",
        "project_url": "https://pypi.org/project/vellichor/",
        "project_urls": {
            "Homepage": "https://github.com/aleksandr-smechov/vellichor.git"
        },
        "release_url": "https://pypi.org/project/vellichor/0.0.1/",
        "requires_dist": [
            "urlvalidator",
            "requests",
            "commonregex",
            "beautifulsoup4",
            "scipy",
            "scikit-learn",
            "numpy",
            "lxml"
        ],
        "requires_python": "",
        "summary": "A succinct article text extractor.",
        "version": "0.0.1"
    },
    "last_serial": 4545098,
    "releases": {
        "0.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "95d1f5673bda5525f525a40b4a3dc814",
                    "sha256": "c45a65712f8c0a8bd37bf0b4534008b7692d525d144ec89660008501964587d3"
                },
                "downloads": -1,
                "filename": "vellichor-0.0.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "95d1f5673bda5525f525a40b4a3dc814",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 245671,
                "upload_time": "2018-11-30T01:07:22",
                "url": "https://files.pythonhosted.org/packages/d2/ed/34f90288595e35c08c63c1c0f3790c870943d790ea1b2d01fcda1365b75d/vellichor-0.0.1-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "d8315e71e252c5198b820cd518062d61",
                    "sha256": "1d64aa6f945c68f15471a4a17b403ca7564c22c2eda29bc90e115b20c3cfee8a"
                },
                "downloads": -1,
                "filename": "vellichor-0.0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "d8315e71e252c5198b820cd518062d61",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 234802,
                "upload_time": "2018-11-30T01:07:26",
                "url": "https://files.pythonhosted.org/packages/fb/8a/0fad7aea7ef0e7cb9d32f43223cf73761e26a0f630ecbda378a4bb41cd1e/vellichor-0.0.1.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "95d1f5673bda5525f525a40b4a3dc814",
                "sha256": "c45a65712f8c0a8bd37bf0b4534008b7692d525d144ec89660008501964587d3"
            },
            "downloads": -1,
            "filename": "vellichor-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "95d1f5673bda5525f525a40b4a3dc814",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 245671,
            "upload_time": "2018-11-30T01:07:22",
            "url": "https://files.pythonhosted.org/packages/d2/ed/34f90288595e35c08c63c1c0f3790c870943d790ea1b2d01fcda1365b75d/vellichor-0.0.1-py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "d8315e71e252c5198b820cd518062d61",
                "sha256": "1d64aa6f945c68f15471a4a17b403ca7564c22c2eda29bc90e115b20c3cfee8a"
            },
            "downloads": -1,
            "filename": "vellichor-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d8315e71e252c5198b820cd518062d61",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 234802,
            "upload_time": "2018-11-30T01:07:26",
            "url": "https://files.pythonhosted.org/packages/fb/8a/0fad7aea7ef0e7cb9d32f43223cf73761e26a0f630ecbda378a4bb41cd1e/vellichor-0.0.1.tar.gz"
        }
    ]
}