{ "info": { "author": "TransferWise", "author_email": "", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU Affero General Public License v3", "Programming Language :: Python :: 3 :: Only" ], "description": "# pipelinewise-tap-s3-csv\n\n[![PyPI version](https://badge.fury.io/py/pipelinewise-tap-s3-csv.svg)](https://badge.fury.io/py/pipelinewise-tap-s3-csv)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pipelinewise-tap-s3-csv.svg)](https://pypi.org/project/pipelinewise-tap-s3-csv/)\n[![License: GPLv3](https://img.shields.io/badge/License-GPLv3-yellow.svg)](https://opensource.org/licenses/GPL-3.0)\n\nThis is a [Singer](https://singer.io) tap that reads data from files located inside a given S3 bucket and produces JSON-formatted data following the [Singer spec](https://github.com/singer-io/getting-started/blob/master/SPEC.md).\n\nThis is a [PipelineWise](https://transferwise.github.io/pipelinewise) compatible tap connector.\n\n## How to use it\n\nThe recommended method of running this tap is to use it from [PipelineWise](https://transferwise.github.io/pipelinewise). When running it from PipelineWise you don't need to configure this tap with JSON files and most things are automated. Please check the related documentation at [Tap S3 CSV](https://transferwise.github.io/pipelinewise/connectors/taps/s3_csv.html).\n\nIf you want to run this [Singer Tap](https://singer.io) independently, please read further.\n\n### Install and Run\n\nFirst, make sure Python 3 is installed on your system or follow these\ninstallation instructions for [Mac](http://docs.python-guide.org/en/latest/starting/install3/osx/) or\n[Ubuntu](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).\n\nIt's recommended to use a virtualenv:\n\n```bash\n python3 -m venv venv\n . venv/bin/activate\n pip install pipelinewise-tap-s3-csv\n```\n\nor, to install from a local checkout of the source:\n\n```bash\n python3 -m venv venv\n . 
venv/bin/activate\n pip install --upgrade pip\n pip install .\n```\n\n### Configuration\n\nHere is an example of a basic config, followed by a rundown of each property:\n\n ```json\n {\n \"aws_access_key_id\": \"ACCESS_KEY\",\n \"aws_secret_access_key\": \"SECRET_ACCESS_KEY\",\n \"start_date\": \"2000-01-01T00:00:00Z\",\n \"bucket\": \"tradesignals-crawler\",\n \"tables\": [{\n \"search_prefix\": \"feeds\",\n \"search_pattern\": \".csv\",\n \"table_name\": \"my_table\",\n \"key_properties\": [\"id\"],\n \"delimiter\": \",\"\n }]\n }\n ```\n- **aws_access_key_id**: AWS access key ID\n- **aws_secret_access_key**: AWS secret access key\n- **aws_endpoint_url**: (Optional) The complete URL to use for the constructed client. Normally, botocore automatically constructs the appropriate URL to use when communicating with a service. You can specify a complete URL (including the \"http/https\" scheme) to override this behavior, for example `https://nyc3.digitaloceanspaces.com`.\n- **start_date**: The datetime that the tap uses to look for newly updated or created files, based on each file's modified timestamp.\n- **bucket**: The name of the bucket to search for files in.\n- **tables**: JSON array of objects that the tap uses to search for files and to emit records as \"tables\" from those files. \n\nThe `tables` field consists of one or more objects that describe how to find files and emit records. 
A more detailed (and unescaped) example is shown below:\n\n```\n[\n {\n \"search_prefix\": \"exports\",\n \"search_pattern\": \"my_table\\\\/.*\\\\.csv\",\n \"table_name\": \"my_table\",\n \"key_properties\": [\"id\"],\n \"date_overrides\": [\"created_at\"],\n \"delimiter\": \",\"\n },\n ...\n]\n```\n\n- **search_prefix**: This is a prefix to apply after the bucket, but before the file search pattern, to allow you to find files in \"directories\" below the bucket.\n- **search_pattern**: This is an escaped regular expression that the tap uses to find files under the bucket and prefix. Because this is an escaped string inside an escaped string, any backslashes in the regular expression need to be double-escaped.\n- **table_name**: This value is a string of your choosing, and is used to name the stream under which records from matching files are emitted.\n- **key_properties**: These are the \"primary keys\" of the CSV files, to be used by the target for deduplication and primary key definitions downstream in the destination.\n- **date_overrides**: Specifies field names in the files that are supposed to be parsed as a datetime. 
The tap doesn't attempt to automatically determine if a field is a datetime, so this will make it explicit in the discovered schema.\n- **delimiter**: This allows you to specify a custom delimiter, such as `\\t` or `|`, if that applies to your files.\n\nA sample configuration is available inside [config.sample.json](config.sample.json)\n\n---\n\nBased on Stitch documentation", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/transferwise/pipelinewise-tap-s3-csv", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "pipelinewise-tap-s3-csv", "package_url": "https://pypi.org/project/pipelinewise-tap-s3-csv/", "platform": "", "project_url": "https://pypi.org/project/pipelinewise-tap-s3-csv/", "project_urls": { "Homepage": "https://github.com/transferwise/pipelinewise-tap-s3-csv" }, "release_url": "https://pypi.org/project/pipelinewise-tap-s3-csv/1.0.5/", "requires_dist": null, "requires_python": "", "summary": "Singer.io tap for extracting CSV files from S3 - PipelineWise compatible", "version": "1.0.5" }, "last_serial": 5810915, "releases": { "1.0.3": [ { "comment_text": "", "digests": { "md5": "089c78b756e883c34dd7df31f68fbecb", "sha256": "016bd649a689cfb6c67b3b23a02a9530a028091ef30708e642704ec62ae995b4" }, "downloads": -1, "filename": "pipelinewise_tap_s3_csv-1.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "089c78b756e883c34dd7df31f68fbecb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19784, "upload_time": "2019-05-28T22:20:59", "url": "https://files.pythonhosted.org/packages/4e/95/d0d5af993d66d198a5a61e2c240aeba0f16484eb18470cf051447ae92216/pipelinewise_tap_s3_csv-1.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0663dc28f8f910260beb24ccb9be0fb7", "sha256": "a820cc8b798ca8c88dc8bc0f8a3a3b21ac6071b8e752688bfd33a8974a08f162" }, 
"downloads": -1, "filename": "pipelinewise-tap-s3-csv-1.0.3.tar.gz", "has_sig": false, "md5_digest": "0663dc28f8f910260beb24ccb9be0fb7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8174, "upload_time": "2019-05-28T22:21:01", "url": "https://files.pythonhosted.org/packages/a3/e1/63ec4f46ec59fafea12e2184598dd110e304beddc7ffd2b5c94bcb1ad59e/pipelinewise-tap-s3-csv-1.0.3.tar.gz" } ], "1.0.4": [ { "comment_text": "", "digests": { "md5": "fefc15cabbcf89f08408aeedefb130de", "sha256": "66d7099b92c989a1d832a099438497d1554d5d51e236c9d4bedd546fbcd45a35" }, "downloads": -1, "filename": "pipelinewise-tap-s3-csv-1.0.4.tar.gz", "has_sig": false, "md5_digest": "fefc15cabbcf89f08408aeedefb130de", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8342, "upload_time": "2019-08-16T15:53:28", "url": "https://files.pythonhosted.org/packages/da/72/36af756b27cd49a9486319786a7339fecc6c16074d6a3f49447ef70411a4/pipelinewise-tap-s3-csv-1.0.4.tar.gz" } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "41c53750bb03efaf5b244adf833776d3", "sha256": "92967c1aed8a895f9d55baad3c939f4ee0c8fdd30c10907e74bde96b96aa9d47" }, "downloads": -1, "filename": "pipelinewise-tap-s3-csv-1.0.5.tar.gz", "has_sig": false, "md5_digest": "41c53750bb03efaf5b244adf833776d3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8684, "upload_time": "2019-09-10T19:48:27", "url": "https://files.pythonhosted.org/packages/48/4d/6a3431efd75a1b3bd066977f70621ec9de23cb7f4ff97579ea79cf855727/pipelinewise-tap-s3-csv-1.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "41c53750bb03efaf5b244adf833776d3", "sha256": "92967c1aed8a895f9d55baad3c939f4ee0c8fdd30c10907e74bde96b96aa9d47" }, "downloads": -1, "filename": "pipelinewise-tap-s3-csv-1.0.5.tar.gz", "has_sig": false, "md5_digest": "41c53750bb03efaf5b244adf833776d3", "packagetype": "sdist", "python_version": "source", "requires_python": null, 
"size": 8684, "upload_time": "2019-09-10T19:48:27", "url": "https://files.pythonhosted.org/packages/48/4d/6a3431efd75a1b3bd066977f70621ec9de23cb7f4ff97579ea79cf855727/pipelinewise-tap-s3-csv-1.0.5.tar.gz" } ] }