{ "info": { "author": "David Kuryakin", "author_email": "dkuryakin@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Environment :: X11 Applications :: Qt", "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 2", "Topic :: Internet :: WWW/HTTP :: Browsers", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "===========\nconstractor\n===========\n\nConstractor (derived from \"Content Extractor\") lets you use machine learning to extract content from web pages.\nThe library provides the following functionality:\n\n* An extensible features API.\n* GUI tools for simple training-set creation.\n* A simple training and testing process.\n* Simple use of trained models.\n* Model dumping.\n* etc.\n\nInstallation\n============\n\nNOTE: The project was developed and tested under Ubuntu 12.04. Other operating systems may require changes to the library.\n\nUbuntu >=12.04 instructions:\n\n* Install the apt dependencies: ``sudo apt-get install python-pip gcc g++ python-dev python-qt4``\n\n* Install constractor: ``sudo pip install constractor``\n\nUsage\n=====\n\nThe following code runs the GUI training helper::\n\n    #!/usr/bin/env python\n    from constractor.train import GuiTrainer\n\n    if __name__ == '__main__':\n        GuiTrainer()\n\nThe following code prints the HTML of each predicted element in the DOM::\n\n    #!/usr/bin/env python\n    from constractor.parser import Parser\n\n    if __name__ == '__main__':\n        predicted = Parser('https://pypi.python.org', model_file='model.txt').predicted\n        for element in predicted:\n            print unicode(element.toInnerXml()).encode('utf-8')\n\nContribution\n============\n\nThe project is completely open to contributions.\nSee more in the Bitbucket repo: https://bitbucket.org/dkuryakin/constractor", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/constractor/", "keywords": "", "license": "LICENSE.txt", "maintainer": "", "maintainer_email": "", "name": "constractor", "package_url": "https://pypi.org/project/constractor/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/constractor/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/constractor/" }, "release_url": "https://pypi.org/project/constractor/0.1.0/", "requires_dist": null, "requires_python": null, "summary": "Smart web page content extractor.", "version": "0.1.0" }, "last_serial": 910366, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "76ad980092fc193de30eebff1975c3ac", "sha256": "fc5ed87c49f9f334ffbe19f773881c919f8843dde9ce1e49a5d2d41b6ccec345" }, "downloads": -1, "filename": "constractor-0.1.0.tar.gz", "has_sig": false, "md5_digest": "76ad980092fc193de30eebff1975c3ac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10145, "upload_time": "2013-11-03T23:14:54", "url": "https://files.pythonhosted.org/packages/d6/76/9ca8b9a73de04f0ec53cccff7e648df8186198e2f197bba04692e6052738/constractor-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "76ad980092fc193de30eebff1975c3ac", "sha256": "fc5ed87c49f9f334ffbe19f773881c919f8843dde9ce1e49a5d2d41b6ccec345" }, "downloads": -1, "filename": "constractor-0.1.0.tar.gz", "has_sig": false, "md5_digest": "76ad980092fc193de30eebff1975c3ac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10145, "upload_time": "2013-11-03T23:14:54", "url": "https://files.pythonhosted.org/packages/d6/76/9ca8b9a73de04f0ec53cccff7e648df8186198e2f197bba04692e6052738/constractor-0.1.0.tar.gz" } ] }