{ "info": { "author": "David Gwartney", "author_email": "david.gwartney@import.io", "bugtrack_url": null, "classifiers": [], "description": "\n\nIMPORT.IO GOOGLE SHEETS EXTRACTOR INTEGRATION\n\n\nThis repository includes a script solution that uses URLs taken from a\nGoogle sheet and copies to an existing Import.io Extractor, then lastly\nexecutes the Extractor to collect data from the supplied URLs.\n\nThe command script is implemented using Python, which can be executed on\nany operating system that supports a Python version 3.5 or later.\n\n\nInstallation\n\nThe command requires the installation of Python 3.5 or later with some\nadditional third-party packages that can be installed using pip.\n\n $ pip install importio_gsei\n\n\nConfiguration\n\nBefore the integration can be used you need to setup the Google Sheets\nAPI and Import.io key.\n\nImport.io Authentication\n\nThe Import.io API needs to be made available via an environment variable\nthat can be configured as follows:\n\n $ export IMPORT_IO_API_KEY=\n\nGoogle Sheets API Setup\n\nGenerate Credentials\n\nNOTE: You must have Google Account to use the Google Sheets API, you can\nsign up here.\n\n1. Use this wizard to create or select a project in the Google\n Developers Console and automatically turn on the API. Click\n CONTINUE, then GO TO CREDENTIALS.\n\n2. On the ADD CREDENTIALS TO YOUR PROJECT page, click the\n Cancel button.\n\n3. At the top of the page, select the OAuth consent screen tab.\n\n4. Select an Email address, enter a Product name if not already set,\n and click the Save button.\n\n5. Select the CREDENTIALS tab, click the CREATE CREDENTIALS button and\n select OAUTH CLIENT ID.\n\n6. Select the application type OTHER, enter the name \"Google Sheets\n Extractor Integration\", and click the Create button.\n\n7. Click OK to dismiss the resulting dialog.\n\n8. Click the file_download (Download JSON) button to the right of the\n client ID.\n\n9. Move this file to a known directory on your computer and rename to\n client-secret.json\n\nAuthorize Access\n\n1. Authorize access to the Google Sheets API by running the following\n command:\n\n $ gextractor authorize -f client-secret.json -a 'Google Sheets Extractor Integration'\n\n2. This will launch a browser and display an authorization page\n (see below).\n\n3. Click on the ALLOW button.\n\n4. Close the browser window.\n\n[Google Authorization Page]\n\n\nOperation\n\nThe command has 7 different sub-commands that are invoked similiarly as\nfollows:\n\n $ gsextractor \n\nwhere sub-command is one of the following:\n\n- _authorize_ - Performs the Oauth handshake to allow permit access to\n Google Sheets API\n- _copy-urls_ - Copies URLs from a specified Google sheet to\n designated Extractor.\n- _extractor-start_ - Initiates a crawl-run of an Extractor.\n- _extractor-status_ - Provides a status of the crawl-run(s) of\n an Extractor.\n- _extractor-urls_ - Displays the list of URLs associated with\n an Extractor.\n- _extract_ - Performs the complete extraction operation from copying\n the URLs from the Google sheet to running the Extractor to extract\n data from web pages.\n- _sheet-urls_ - Displays a list of the URLs in a Google sheet given\n the sheet id and range.\n\nHelp can be displayed which shows the available sub-commands as follows:\n\n gsextractor -h\n usage: gsextractor [-h] [-v]\n {authorize,copy-urls,extract,extractor-start,extractor-status,extractor-urls,sheet-urls}\n ...\n\n Import.io Google Sheets Extractor Integration\n\n positional arguments:\n {authorize,copy-urls,extract,extractor-start,extractor-status,extractor-urls,sheet-urls}\n commands\n authorize Configures authentication Google Sheets API\n copy-urls Copies URLs from google sheet to an extractor\n extract Runs the full extraction process\n extractor-start Starts an extractor\n extractor-status Displays the status of recent craw runs\n extractor-urls Displays the URLs from an extractor\n sheet-urls Displays the URLs from a google sheet\n\n optional arguments:\n -h, --help show this help message and exit\n -v, --version show program's version number and exit\n\nDetail help on each of the commands is provided by the -h with the\ncorresponding sub-command:\n\n $ gsextractor -h\n\nThe version of the program can be displayed by running the command with\nthe -v option:\n\n $ gsextractor -v\n 0.4.0\n\nauth\n\nConfigures OAuth\n\n $ gsextractor auth -c \n\ncopy-urls\n\nCopy the URLs in the Google Sheet to the Extractor URLs\n\n $ gsextractor copy-urls -i -r -e \n\nNOTE: _spreadsheet_range_ needs to be surrounded by single quotes to\nprevent a range of the\n\nextractor-start\n\nInitiate a crawl-run on a specific Extractor\n\n $ gsextractor extractor-start -e \n\nextractor-status\n\nDisplay the status of a crawl-run(s) on a specific Extractor\n\n $ gsextractor extractor-status -e \n\nextractor-urls\n\nDisplays the URLs associated with a specific Extractor\n\n $ gsextractor extractor-urls -e \n\nextract\n\nRuns the complete operation of copying URLs from a Google sheet to an\nExtractor, and starting the Extractor\n\n $ gsextractor extract -i -r -e \n\nsheet-urls\n\nDisplays the URLs from a specifice Google sheet and range\n\n $ gsextractor sheeturls -i -r \n\n\nProgrammatic Execution\n\nThe same operations as listed above can performed programmatically from\nPython similar to the following example:\n\n from importio_gsei import GsExtractorUrls\n\n g = GsExtractorUrls()\n g.extractor_start(extractor_id)\n\ncopy_urls()\n\n from importio_gsei import GsExtractorUrls\n\n g = GsExtractorUrls()\n g.copy_urls(spread_sheet_id, spread_sheet_range, extractor_id)\n\nextractor_start()\n\n from importio_gsei GsExtractorUrls\n\n g = GsExtractorUrls()\n g.extractor_start(extractor_id)\n\nextractor_status()\n\n from importio_gsei import GsExtractorUrls\n\n g = GsExtractorUrls()\n crawl_runs = g.extractor_status(extractor_id)\n for crawl_run in crawl_runs:\n print(crawl_run)\n\nextractor_urls()\n\n from importio_gsei import GsExtractorUrls\n\n g = GsExtractorUrls()\n urls = g.extractor_urls(extractor_id)\n for url in urls:\n print(url)\n\nextract\n\n from importio_gsei import GsExtractorUrls\n\n g = GsExtractorUrls()\n g.extract(spread_sheet_id, spread_sheet_range, extractor_id)\n\nsheet_urls\n\n from importio_gsei import GsExtractorUrls\n\n g = GsExtractorUrls()\n urls = g.extract(spread_sheet_id, spread_sheet_range)\n for url in urls:\n print(url)\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.io/import.io/google-sheets-extractor-integration", "keywords": "", "license": "LICENSE", "maintainer": "", "maintainer_email": "", "name": "importio_gsei", "package_url": "https://pypi.org/project/importio_gsei/", "platform": "", "project_url": "https://pypi.org/project/importio_gsei/", "project_urls": { "Homepage": "http://github.io/import.io/google-sheets-extractor-integration" }, "release_url": "https://pypi.org/project/importio_gsei/0.4.0/", "requires_dist": null, "requires_python": "", "summary": "Command line to feed URLs from a Google Sheet into an Import.io Extractor", "version": "0.4.0" }, "last_serial": 3918692, "releases": { "0.3.0": [ { "comment_text": "", "digests": { "md5": "aa6c82674be47aa40d34e89040370ddc", "sha256": "60ccc5716cd62d1ea99359e6d56147b3a4e2ad765db8ed8df1c6f6c1677c1222" }, "downloads": -1, "filename": "importio_gsei-0.3.0.tar.gz", "has_sig": false, "md5_digest": "aa6c82674be47aa40d34e89040370ddc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7483, "upload_time": "2017-02-08T02:11:29", "url": "https://files.pythonhosted.org/packages/57/a7/68472578a024b278a0585a8f32ca01947c014e2ee914434a34397e46a4ea/importio_gsei-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "47ff7a402b4634e11f33a29767da5a4b", "sha256": "b44a40688b17ef0cc24804d9f8e579e68a73f9d2f4517080b13327988ddb8c14" }, "downloads": -1, "filename": "importio_gsei-0.3.1.tar.gz", "has_sig": false, "md5_digest": "47ff7a402b4634e11f33a29767da5a4b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7478, "upload_time": "2017-02-08T04:34:29", "url": "https://files.pythonhosted.org/packages/b2/89/f2a23fb4dfc03ef64ff977bb4794bdabf6aa55ec75d551804dca861bbd8a/importio_gsei-0.3.1.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "1e279f59bb16b9dfc5edbe157747280d", "sha256": "fa85117133ee790203088178b1aaae87396d5182211f228c53dc928fcc30a41a" }, "downloads": -1, "filename": "importio_gsei-0.4.0.tar.gz", "has_sig": false, "md5_digest": "1e279f59bb16b9dfc5edbe157747280d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8931, "upload_time": "2017-02-09T07:34:03", "url": "https://files.pythonhosted.org/packages/0d/d2/d4b78245a0666764918aa23607c64a2fca9e202408bb708e8a6bbcb0d959/importio_gsei-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1e279f59bb16b9dfc5edbe157747280d", "sha256": "fa85117133ee790203088178b1aaae87396d5182211f228c53dc928fcc30a41a" }, "downloads": -1, "filename": "importio_gsei-0.4.0.tar.gz", "has_sig": false, "md5_digest": "1e279f59bb16b9dfc5edbe157747280d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8931, "upload_time": "2017-02-09T07:34:03", "url": "https://files.pythonhosted.org/packages/0d/d2/d4b78245a0666764918aa23607c64a2fca9e202408bb708e8a6bbcb0d959/importio_gsei-0.4.0.tar.gz" } ] }