{ "info": { "author": "Mikhail Korobov", "author_email": "kmike84@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "=======================\nscrapinghub-autoextract\n=======================\n\n.. image:: https://img.shields.io/pypi/v/scrapinghub-autoextract.svg\n :target: https://pypi.python.org/pypi/scrapinghub-autoextract\n :alt: PyPI Version\n\n.. image:: https://img.shields.io/pypi/pyversions/scrapinghub-autoextract.svg\n :target: https://pypi.python.org/pypi/scrapinghub-autoextract\n :alt: Supported Python Versions\n\n.. image:: https://travis-ci.org/scrapinghub/scrapinghub-autoextract.svg?branch=master\n :target: https://travis-ci.org/scrapinghub/scrapinghub-autoextract\n :alt: Build Status\n\n.. image:: https://codecov.io/github/scrapinghub/scrapinghub-autoextract/coverage.svg?branch=master\n :target: https://codecov.io/gh/scrapinghub/scrapinghub-autoextract\n :alt: Coverage report\n\n\nPython client libraries for the `Scrapinghub AutoExtract API`_.\nThey allow you to extract product and article information from any website.\n\nBoth synchronous and asyncio wrappers are provided by this package.\n\nThe license is BSD 3-clause.\n\n.. _Scrapinghub AutoExtract API: https://scrapinghub.com/autoextract\n\n\nInstallation\n============\n\n::\n\n pip install scrapinghub-autoextract\n\nscrapinghub-autoextract requires Python 3.6+ for the CLI tool and for\nthe asyncio API; the basic, synchronous API works with Python 3.5.\n\nUsage\n=====\n\nFirst, make sure you have an API key. 
To avoid passing it in the ``api_key``\nargument with every call, you can set the ``SCRAPINGHUB_AUTOEXTRACT_KEY``\nenvironment variable to the key.\n\nCommand-line interface\n----------------------\n\nThe most basic way to use the client is from the command line.\nFirst, create a file with URLs, one URL per line (e.g. ``urls.txt``).\nSecond, set the ``SCRAPINGHUB_AUTOEXTRACT_KEY`` environment variable to your\nAutoExtract API key (you can also pass the API key as the ``--api-key`` script\nargument).\n\nThen run the script to get the results::\n\n python -m autoextract urls.txt --page-type article > res.jl\n\nRun ``python -m autoextract --help`` to get a description of all supported\noptions.\n\nSynchronous API\n---------------\n\nThe synchronous API provides an easy way to try AutoExtract in a script.\nFor production usage, the asyncio API is strongly recommended.\n\nYou can send requests as described in the `API docs`_::\n\n from autoextract.sync import request_raw\n query = [{'url': 'http://example.com/foo', 'pageType': 'article'}]\n results = request_raw(query)\n\nNote that if there are several URLs in the query, results can be returned in\narbitrary order.\n\nThere is also an ``autoextract.sync.request_batch`` helper, which accepts URLs\nand a page type, and ensures results are in the same order as the requested URLs::\n\n from autoextract.sync import request_batch\n urls = ['http://example.com/foo', 'http://example.com/bar']\n results = request_batch(urls, page_type='article')\n\n.. note::\n Currently ``request_batch`` is limited to 100 URLs at a time.\n\n.. 
_API docs: https://doc.scrapinghub.com/autoextract.html\n\n\nasyncio API\n-----------\n\nBasic usage is similar to the sync API (``request_raw``),\nbut the asyncio event loop is used::\n\n    from autoextract.aio import request_raw\n\n    async def foo():\n        results1 = await request_raw(query)\n        # ...\n\nThere is also a ``request_parallel`` function, which allows processing\nmany URLs in parallel, using both batching and multiple connections::\n\n    import sys\n    from autoextract.aio import request_parallel, create_session\n\n    async def foo():\n        async with create_session() as session:\n            res_iter = request_parallel(urls, page_type='article',\n                                        n_conn=10, batch_size=3,\n                                        session=session)\n            for f in res_iter:\n                try:\n                    batch_result = await f\n                    for res in batch_result:\n                        # do something with a result\n                        ...\n                except ApiError as e:\n                    print(e, file=sys.stderr)\n                    raise\n\nThe ``request_parallel`` and ``request_raw`` functions handle throttling\n(HTTP 429 errors) and network errors, retrying a request in these cases.\n\nThe CLI implementation (``autoextract/__main__.py``) can serve\nas a usage example.\n\nContributing\n============\n\n* Source code: https://github.com/scrapinghub/scrapinghub-autoextract\n* Issue tracker: https://github.com/scrapinghub/scrapinghub-autoextract/issues\n\nUse tox_ to run tests with different Python versions::\n\n tox\n\nThe command above also runs type checks; we use mypy.\n\n.. 
_tox: https://tox.readthedocs.io\n\n\nChanges\n=======\n\nTBA\n---\n\nInitial release.\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/scrapinghub/scrapinghub-autoextract", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "scrapinghub-autoextract", "package_url": "https://pypi.org/project/scrapinghub-autoextract/", "platform": "", "project_url": "https://pypi.org/project/scrapinghub-autoextract/", "project_urls": { "Homepage": "https://github.com/scrapinghub/scrapinghub-autoextract" }, "release_url": "https://pypi.org/project/scrapinghub-autoextract/0.1/", "requires_dist": [ "requests", "tenacity ; python_version >= \"3.6\"", "aiohttp (>=3.6.0) ; python_version >= \"3.6\"", "tqdm ; python_version >= \"3.6\"" ], "requires_python": "", "summary": "Python interface to Scrapinghub Automatic Extraction API", "version": "0.1" }, "last_serial": 5951388, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "731b2061c99f5b9ef99cf61bed9d2b19", "sha256": "f0a9e69c49e5f1e3d1cdfa6069c322b1d9fa8d10c59a422295aa34cf74c14672" }, "downloads": -1, "filename": "scrapinghub_autoextract-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "731b2061c99f5b9ef99cf61bed9d2b19", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11964, "upload_time": "2019-10-09T18:16:23", "url": "https://files.pythonhosted.org/packages/30/ef/71ab8223947762163e062a0c79ce5019cce474831e77cf19d1fafd97e2d2/scrapinghub_autoextract-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "18ad64552554031e4bc6b67efc4d3677", "sha256": "672e67b9443aa5ab78345de212b273f92031c95688474b58b0b3fe46ba2d13fa" }, "downloads": -1, "filename": "scrapinghub-autoextract-0.1.tar.gz", "has_sig": false, "md5_digest": "18ad64552554031e4bc6b67efc4d3677", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 11042, "upload_time": "2019-10-09T18:16:27", "url": "https://files.pythonhosted.org/packages/81/1c/826a9aa957870fc84f1306ecc3b7d71a9eb4a57b254eb31b5e0813985d1c/scrapinghub-autoextract-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "731b2061c99f5b9ef99cf61bed9d2b19", "sha256": "f0a9e69c49e5f1e3d1cdfa6069c322b1d9fa8d10c59a422295aa34cf74c14672" }, "downloads": -1, "filename": "scrapinghub_autoextract-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "731b2061c99f5b9ef99cf61bed9d2b19", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11964, "upload_time": "2019-10-09T18:16:23", "url": "https://files.pythonhosted.org/packages/30/ef/71ab8223947762163e062a0c79ce5019cce474831e77cf19d1fafd97e2d2/scrapinghub_autoextract-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "18ad64552554031e4bc6b67efc4d3677", "sha256": "672e67b9443aa5ab78345de212b273f92031c95688474b58b0b3fe46ba2d13fa" }, "downloads": -1, "filename": "scrapinghub-autoextract-0.1.tar.gz", "has_sig": false, "md5_digest": "18ad64552554031e4bc6b67efc4d3677", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11042, "upload_time": "2019-10-09T18:16:27", "url": "https://files.pythonhosted.org/packages/81/1c/826a9aa957870fc84f1306ecc3b7d71a9eb4a57b254eb31b5e0813985d1c/scrapinghub-autoextract-0.1.tar.gz" } ] }