{ "info": { "author": "Nick Buker", "author_email": "nick.buker@nordstrom.net", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8" ], "description": "# Nordata\n\n## Author:\nNick Buker\n\n## Introduction:\nNordata is a small collection of utility functions for accessing AWS S3 and AWS Redshift. It was written by a data scientist on the Nordstrom Analytics Team. The goal of Nordata is to be a simple, robust package to ease data workflow. It is not intended to handle every possible need (for example, credential management is largely left to the user), but it is designed to streamline common tasks.\n\n## Table of contents:\n\n### Installing Nordata:\n- [Installation instructions](#pip-installing-nordata)\n\n### Setting up credentials for Nordata:\n- [Credentials instructions](#nordata-credentials)\n\n### How to use Nordata:\n- [Nordata instructions](#using-nordata)\n\n Redshift:\n\n - [Importing nordata Redshift functions](#redshift-import)\n - [Reading a SQL script into Python as a string](#read-sql)\n - [Executing a SQL query that does not return data](#redshift-execute-sql-no-return)\n - [Executing a SQL query that returns data](#redshift-execute-sql-return)\n - [Executing a SQL query that returns data for pandas](#redshift-execute-sql-return-dict)\n - [Creating a connection object (experienced users)](#redshift-get-conn)\n\n S3:\n\n - [Importing S3 functions](#s3-import)\n - [Downloading a single file from S3](#s3-download-single)\n - [Downloading with a profile name](#s3-download-profile-name)\n - [Downloading a list of files from S3](#s3-download-list)\n - [Downloading files matching a pattern from S3](#s3-download-pattern)\n - [Downloading all files in a directory from S3](#s3-download-all)\n - [Uploading a single file to S3](#s3-upload-single)\n - [Uploading with a 
profile name](#s3-upload-profile-name)\n - [Uploading a list of files to S3](#s3-upload-list)\n - [Uploading files matching a pattern to S3](#s3-upload-pattern)\n - [Uploading all files in a directory to S3](#s3-upload-all)\n - [Deleting a single file in S3](#s3-delete-single)\n - [Deleting with a profile name](#s3-delete-profile-name)\n - [Deleting a list of files in S3](#s3-delete-list)\n - [Deleting files matching a pattern in S3](#s3-delete-pattern)\n - [Deleting all files in a directory in S3](#s3-delete-all)\n - [Creating a bucket object (experienced users)](#get-bucket)\n\n Boto3 (experienced users):\n\n - [Importing boto3 functions](#boto-import)\n - [Getting boto3 credentials](#boto-creds)\n - [Creating a boto3 session object](#boto-session)\n\n Transferring data between Redshift and S3:\n\n - [Transferring data from Redshift to S3](#redshift-unload)\n - [Transferring data from S3 to Redshift](#redshift-copy)\n\n### Testing:\n\n- [Testing Nordata](#nordata-testing)\n\n\n## Installing Nordata:\nNordata can be installed via pip. As always, use of a project-level virtual environment is recommended.\n\n **Nordata requires Python >= 3.6.**\n\n```bash\n$ pip install nordata\n```\n\n\n## Setting up credentials for Nordata:\n\n### Redshift:\nNordata is designed to ingest your Redshift credentials as an environment variable in the below format. This method allows the user freedom to handle credentials in a number of ways. As always, best practices are advised. Your credentials should never be placed in the code of your project such as in a `Dockerfile` or `.env` file. Instead, you may wish to place them in your `.bash_profile` locally or take advantage of a key management service such as the one offered by AWS.\n\n```bash\n'host=my_hostname dbname=my_dbname user=my_user password=my_password port=1234'\n```\n\n### S3:\nIf the user is running locally, their `Home` directory should contain a `.aws/` directory with a `credentials` file. 
The `credentials` file should look similar to the example below, where the profile name is in brackets. Note that the specific values and region may vary. If the user is running on an EC2 instance, permission to access S3 is handled by the IAM role for the instance.\n\n```bash\n[default]\naws_access_key_id=MYAWSACCESSKEY\naws_secret_access_key=MYAWSSECRETACCESS\naws_session_token=\"long_string_of_random_characters==\"\naws_security_token=\"another_string_of_random_characters==\"\nregion=us-west-2\n```\n\nNote the profile name in brackets. If the profile name differs in your credentials file, you will likely need to pass this profile name to the S3 functions as an argument.\n\n\n## How to use Nordata:\n\n### Redshift:\n\n\nImporting nordata Redshift functions:\n\n\n```python\nfrom nordata import read_sql, redshift_execute_sql, redshift_get_conn\n```\n\n\nReading a SQL script into Python as a string:\n\n\n```python\nsql = read_sql(sql_filename='../sql/my_script.sql')\n```\n\n\nExecuting a SQL query that does not return data:\n\n\n```python\nredshift_execute_sql(\n sql=sql,\n env_var='REDSHIFT_CREDS',\n return_data=False,\n return_dict=False)\n```\n\n\nExecuting a SQL query that returns data as a list of tuples and column names as a list of strings:\n\n\n```python\ndata, columns = redshift_execute_sql(\n sql=sql,\n env_var='REDSHIFT_CREDS',\n return_data=True,\n return_dict=False)\n```\n\nExecuting a SQL query that returns data as a dict for easy ingestion into a pandas DataFrame:\n\n\n```python\nimport pandas as pd\n\ndf = pd.DataFrame(**redshift_execute_sql(\n sql=sql,\n env_var='REDSHIFT_CREDS',\n return_data=True,\n return_dict=True))\n```\n\n\nCreating a connection object that can be manipulated directly by experienced users:\n\n```python\nconn = redshift_get_conn(env_var='REDSHIFT_CREDS')\n```\n\n### S3:\n\nImporting S3 functions:\n\n```python\nfrom nordata import s3_download, s3_upload, s3_delete, create_session, s3_get_bucket\n```\n\n\nDownloading a single 
file from S3:\n\n```python\ns3_download(\n bucket='my_bucket',\n s3_filepath='tmp/my_file.csv',\n local_filepath='../data/my_file.csv')\n```\n\n\nDownloading with a profile name:\n\n```python\ns3_download(\n bucket='my_bucket',\n profile_name='my-profile-name',\n s3_filepath='tmp/my_file.csv',\n local_filepath='../data/my_file.csv')\n```\n\n\nDownloading a list of files from S3 (will not download contents of subdirectories):\n\n```python\ns3_download(\n bucket='my_bucket',\n s3_filepath=['tmp/my_file1.csv', 'tmp/my_file2.csv', 'img.png'],\n local_filepath=['../data/my_file1.csv', '../data/my_file2.csv', '../img.png'])\n```\n\n\nDownloading files matching a pattern from S3 (will not download contents of subdirectories):\n\n```python\ns3_download(\n bucket='my_bucket',\n s3_filepath='tmp/*.csv',\n local_filepath='../data/')\n```\n\n\nDownloading all files in a directory from S3 (will not download contents of subdirectories):\n\n```python\ns3_download(\n bucket='my_bucket',\n s3_filepath='tmp/*',\n local_filepath='../data/')\n```\n\n\nUploading a single file to S3:\n\n```python\ns3_upload(\n bucket='my_bucket',\n local_filepath='../data/my_file.csv',\n s3_filepath='tmp/my_file.csv')\n```\n\n\nUploading with a profile name:\n\n```python\ns3_upload(\n bucket='my_bucket',\n profile_name='my-profile-name',\n local_filepath='../data/my_file.csv',\n s3_filepath='tmp/my_file.csv')\n```\n\n\nUploading a list of files to S3 (will not upload contents of subdirectories):\n\n```python\ns3_upload(\n bucket='my_bucket',\n local_filepath=['../data/my_file1.csv', '../data/my_file2.csv', '../img.png'],\n s3_filepath=['tmp/my_file1.csv', 'tmp/my_file2.csv', 'img.png'])\n```\n\n\nUploading files matching a pattern to S3 (will not upload contents of subdirectories):\n\n```python\ns3_upload(\n bucket='my_bucket',\n local_filepath='../data/*.csv',\n s3_filepath='tmp/')\n```\n\n\nUploading all files in a directory to S3 (will not upload contents of subdirectories):\n\n```python\ns3_upload(\n 
bucket='my_bucket',\n local_filepath='../data/*',\n s3_filepath='tmp/')\n```\n\n\nDeleting a single file in S3:\n\n```python\nresp = s3_delete(bucket='my_bucket', s3_filepath='tmp/my_file.csv')\n```\n\n\nDeleting with a profile name:\n\n```python\nresp = s3_delete(\n bucket='my_bucket',\n profile_name='my-profile-name',\n s3_filepath='tmp/my_file.csv')\n```\n\n\nDeleting a list of files in S3:\n\n```python\nresp = s3_delete(\n bucket='my_bucket',\n s3_filepath=['tmp/my_file1.csv', 'tmp/my_file2.csv', 'img.png'])\n```\n\n\nDeleting files matching a pattern in S3:\n\n```python\nresp = s3_delete(bucket='my_bucket', s3_filepath='tmp/*.csv')\n```\n\n\nDeleting all files in a directory in S3:\n\n```python\nresp = s3_delete(bucket='my_bucket', s3_filepath='tmp/*')\n```\n\n\nCreating a bucket object that can be manipulated directly by experienced users:\n\n```python\nbucket = s3_get_bucket(\n bucket='my_bucket',\n profile_name='default',\n region_name='us-west-2')\n```\n\n### Boto3:\n\nImporting boto3 functions:\n\n```python\nfrom nordata import boto_get_creds, boto_create_session\n```\n\n\nRetrieving boto3 credentials as a string for use in `COPY` and `UNLOAD` SQL statements:\n\n```python\ncreds = boto_get_creds(\n profile_name='default',\n region_name='us-west-2',\n session=None)\n```\n\n\nCreating a boto3 session object that can be manipulated directly by experienced users:\n\n```python\nsession = boto_create_session(profile_name='default', region_name='us-west-2')\n```\n\n### Transferring data between Redshift and S3:\n\n\nTransferring data from Redshift to S3 using an `UNLOAD` statement (see [Redshift UNLOAD documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) for more information):\n```python\n\nfrom nordata import boto_get_creds, redshift_execute_sql\n\n\ncreds = boto_get_creds(\n profile_name='default',\n region_name='us-west-2',\n session=None)\n\nsql = f'''\n\n unload (\n 'select\n col1\n ,col2\n from\n my_schema.my_table'\n )\n to\n 
's3://mybucket/unload/my_table/'\n credentials\n '{creds}'\n parallel off header gzip allowoverwrite;\n'''\n\nredshift_execute_sql(\n sql=sql,\n env_var='REDSHIFT_CREDS',\n return_data=False,\n return_dict=False)\n```\n\n\nTransferring data from S3 to Redshift using a `COPY` statement (see [Redshift COPY documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) for more information):\n```python\n\nfrom nordata import boto_get_creds, redshift_execute_sql\n\n\ncreds = boto_get_creds(\n profile_name='default',\n region_name='us-west-2',\n session=None)\n\nsql = f'''\n\n copy\n my_schema.my_table\n from\n 's3://mybucket/unload/my_table/'\n credentials\n '{creds}'\n ignoreheader 1 gzip;\n'''\n\nredshift_execute_sql(\n sql=sql,\n env_var='REDSHIFT_CREDS',\n return_data=False,\n return_dict=False)\n```\n\n\n## Testing:\nFor those interested in contributing to Nordata or forking and editing the project, pytest is the testing framework used. To run the tests, create a virtual environment, install the contents of `dev-requirements.txt`, and run the following command from the root directory of the project. 
The testing scripts can be found in the `test/` directory.\n\n```bash\n$ pytest\n```\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Nordstrom/nordata", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "nordata", "package_url": "https://pypi.org/project/nordata/", "platform": "", "project_url": "https://pypi.org/project/nordata/", "project_urls": { "Homepage": "https://github.com/Nordstrom/nordata" }, "release_url": "https://pypi.org/project/nordata/0.2.2/", "requires_dist": [ "boto3 >=1.9.38", "psycopg2-binary >=2.7.5" ], "requires_python": ">=3.6,<4", "summary": "Convenience wrappers for connecting to AWS S3 and Redshift", "version": "0.2.2" }, "last_serial": 4876738, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "30bc1576d4e7324cd1f6e7d4c2757f8e", "sha256": "55e009af7a9c9b7c256e29ea07092dbd4a85a08a0d5f2cd7b3f700a535df207f" }, "downloads": -1, "filename": "nordata-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "30bc1576d4e7324cd1f6e7d4c2757f8e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4", "size": 32186, "upload_time": "2018-12-11T23:05:13", "url": "https://files.pythonhosted.org/packages/71/5a/ee0660dd5a76abdb9513f5947c069947390029b17f4db16af6052c1cef62/nordata-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2cfd864f26aee4aa3b5f555c56a5d7ca", "sha256": "74341b23dbd5a33a8f219e50208b7c52555ca50992e33e459d2e6637d727d91f" }, "downloads": -1, "filename": "nordata-0.1.tar.gz", "has_sig": false, "md5_digest": "2cfd864f26aee4aa3b5f555c56a5d7ca", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4", "size": 11704, "upload_time": "2018-12-11T23:05:16", "url": "https://files.pythonhosted.org/packages/25/10/011e11fb205450f77115c84b2cf9efe160778e40a2763e2312653647d081/nordata-0.1.tar.gz" } 
], "0.1.1": [ { "comment_text": "", "digests": { "md5": "b0a2e7f903f382df0a07466f95a4ebdc", "sha256": "66c4f17db28c4f8afff74c6fa45c0560bb7259053155918461c50d835a2acfc3" }, "downloads": -1, "filename": "nordata-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "b0a2e7f903f382df0a07466f95a4ebdc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4", "size": 33760, "upload_time": "2018-12-14T16:59:06", "url": "https://files.pythonhosted.org/packages/06/c4/7fe38991a4b2fc066c4de6dfbc2fce82cff16a7e9e82acd9e9aebb9edfff/nordata-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "94aa28aa7aec7400144a510f685f4fea", "sha256": "16f21efa27a1aacf161e165dc04e84a3ff47350c020eb9271792a1598c891bee" }, "downloads": -1, "filename": "nordata-0.1.1.tar.gz", "has_sig": false, "md5_digest": "94aa28aa7aec7400144a510f685f4fea", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4", "size": 12152, "upload_time": "2018-12-14T16:59:10", "url": "https://files.pythonhosted.org/packages/c0/65/31140082c24a010a7696b19e189b721f343144e9b849b4fd0e6271815b82/nordata-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "5b56a036337e178f0ee5cab7f62e9067", "sha256": "6ac964003554aeef8c559559d775b7eff7ea87dab95954ea06979f27ca987df2" }, "downloads": -1, "filename": "nordata-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5b56a036337e178f0ee5cab7f62e9067", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4", "size": 35273, "upload_time": "2019-01-02T18:02:37", "url": "https://files.pythonhosted.org/packages/e9/79/8eee2ca7989d7f0f4b7b2a062ac7c68192fd6124a0ff32e292d3d1cd61e8/nordata-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "eeb527ac0e2d90b9045b82f21bb3ec2f", "sha256": "56083859ebc50c3b9f608773f49125e982e2ded9ae14b4ed4298ab750ee94a00" }, "downloads": -1, "filename": "nordata-0.2.0.tar.gz", "has_sig": false, "md5_digest": 
"eeb527ac0e2d90b9045b82f21bb3ec2f", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4", "size": 13437, "upload_time": "2019-01-02T18:02:39", "url": "https://files.pythonhosted.org/packages/30/b8/a8c7fbe6527cbddee9c1b36d7c2cb956e936f573956acb969543d03a7e8c/nordata-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "7ad1b9e982a9dfae49acb00df22d2727", "sha256": "21df2c0ebbd2d2035a966b86614a27aa6b41d88c185c84fd41d5590333ab1bda" }, "downloads": -1, "filename": "nordata-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "7ad1b9e982a9dfae49acb00df22d2727", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4", "size": 35278, "upload_time": "2019-01-02T19:27:59", "url": "https://files.pythonhosted.org/packages/52/18/fb8995a2fff6c24029f7192bcef95492a237a2139e47c17770e2cd93d3ec/nordata-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8322431c5a414811f8091ce7db73a5ad", "sha256": "59af7d3f84d4d16641979f3cd8e8dc115d76d1e348fd2d3c1e38f71520583a0e" }, "downloads": -1, "filename": "nordata-0.2.1.tar.gz", "has_sig": false, "md5_digest": "8322431c5a414811f8091ce7db73a5ad", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4", "size": 13510, "upload_time": "2019-01-02T19:28:01", "url": "https://files.pythonhosted.org/packages/1f/5c/23f5bbe8902e688dcaa34a3eb39fac477a6b358c03e3d80d48633875e4b9/nordata-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "e7488245b42e6ab4388763c740061a2f", "sha256": "d368c502d813993e433f75a25b5a9779a3fc523f21d82879168f46ac3a8635bb" }, "downloads": -1, "filename": "nordata-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e7488245b42e6ab4388763c740061a2f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4", "size": 35302, "upload_time": "2019-02-28T00:10:03", "url": 
"https://files.pythonhosted.org/packages/66/c6/209899cdc575e47ee8941d25ed44f05e7a5fae9bd05c1d9906a1d3527571/nordata-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ae0987890d0b0b9c870f9a4f3e2b7b75", "sha256": "dc0be194cad73f5c451ea5a026f27d01965a76883f04766a4449096c97a7da22" }, "downloads": -1, "filename": "nordata-0.2.2.tar.gz", "has_sig": false, "md5_digest": "ae0987890d0b0b9c870f9a4f3e2b7b75", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4", "size": 13543, "upload_time": "2019-02-28T00:10:06", "url": "https://files.pythonhosted.org/packages/82/c5/449c11e373eaeb6a4afc99c81a00066556d5f4e6a14cfeaf5dc7eed31213/nordata-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e7488245b42e6ab4388763c740061a2f", "sha256": "d368c502d813993e433f75a25b5a9779a3fc523f21d82879168f46ac3a8635bb" }, "downloads": -1, "filename": "nordata-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e7488245b42e6ab4388763c740061a2f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4", "size": 35302, "upload_time": "2019-02-28T00:10:03", "url": "https://files.pythonhosted.org/packages/66/c6/209899cdc575e47ee8941d25ed44f05e7a5fae9bd05c1d9906a1d3527571/nordata-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ae0987890d0b0b9c870f9a4f3e2b7b75", "sha256": "dc0be194cad73f5c451ea5a026f27d01965a76883f04766a4449096c97a7da22" }, "downloads": -1, "filename": "nordata-0.2.2.tar.gz", "has_sig": false, "md5_digest": "ae0987890d0b0b9c870f9a4f3e2b7b75", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4", "size": 13543, "upload_time": "2019-02-28T00:10:06", "url": "https://files.pythonhosted.org/packages/82/c5/449c11e373eaeb6a4afc99c81a00066556d5f4e6a14cfeaf5dc7eed31213/nordata-0.2.2.tar.gz" } ] }