{ "info": { "author": "Manish Kumar", "author_email": "manish.kumar535@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.0", "Programming Language :: Python :: 3.1", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "redshift_tool - Elegant data load from Pandas to Redshift\n=========================================================\n\n| 1.Overview_\n| 2.Installation_\n| 3.Usages-Guidelines_\n| 4.Examples_\n| 5.Support_\n| 6.References_ \n\n1.Overview\n==========\n**redshift_tool** is a python package which is prepared for loading pandas data frame into redshift table. This package is making it easier for bulk uploads, where the procedure for uploading data consists in generating various CSV files, uploading them to an S3 bucket and then calling a copy command on the server, this package helps with all those tasks in encapsulated functions. There are two methods of data copy.\n\n| a). Append:- It simply copies the data or adds the data at the end of existing data in a redshift table.\n\n| b). Upsert:- It is used for updating the old record as per provided upsert Id/Ids and also copy the new records into the table. \n\nredshift_tool is purely implemented in Python.\n\n2.Installation\n==============\n| **To install the library, use below command**\n| $ pip install redshift_tool\n\n.. note::\n\n During the installation of the package please verify that all the required dependencies installed successfully, if not try to install them one by one manually.\n\n3.Usages-Guidelines\n===================\nUses Commands \n >>> import redshift_tool\n >>> redshift_tool.query(data,method,redshift_auth=None,s3_auth=None,schema=None,table=None,\n primarykey=None,sortkey=None,distkey=None,upsertkey=None)\n\na). data:- It will take any pandas Data frame.\n\n >>> data= df\n\n| b). method: There are two methods of writing pandas data frame as defined above either by 'append' or 'upsert'.\n\n >>> method='append/upsert'\n\n| c). redshift_auth:- To write the data into redshift, it is required to establish the redshift connection. It is the connection's credential parameter. \n\n >>> redshift_auth= {'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'}\n\n| d). s3_auth:- AWS S3 is used to enhance the performance of the copy operation. It is used to pass the AWS S3 credentials as well S3 bucket name. S3 Bucket is a place where you can put your files temporary for coping into redshift tables.\n\n >>> s3_auth = {'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'}\n\n| e). schema:- If there is any schema name associated with your database in which table will be created.\n\n >>> schema='Schema_name'\n\n| f). table:- Target table name to write pandas.\n\n 1. If target table is already exist, function will be used to copy/usert data into exiting table. \n\n 2. A user can proceed with two steps in case of target table is not exist.\n\n 2.1. Use SQL Create staatement to create taget table manualy.\n\n 2.2. This libaray can also create target table on the basis of input pandas dataframe columns and datatypes so before using the command make sure all the column names and datatypes of pandas dataframe set properly.\n\n >>> table='table_name'\n\n| g). primarykey:- A primary key is a special relational database table column (or combination of columns) designated to uniquely identify all table records. \n\nWhile creating the table by default, if it is required to define any column as primary key, then pass the column name in a tuple in this parameter.\n\n >>> primarykey=('Primary_Key') or \n >>> primarykey=('Primary_Key1','Primary_Key2')\n\n| h). sortkey:- A sortkey is a field or column that is used to sort the data. It can be a single key as well as multiple keys.\n\nWhile creating the table by default, if we need to define any column as a sort key, then pass the column name in a tuple in this parameter.\n\n >>> sortkey=('sort_key') or \n >>> sortkey=('sort_key1','sort_key2') \n\n| i). distkey(Default - Even):- A distribution key is a column that is used to determine the parallel data processing task with all available redshift slices.\n\n >>> distkey=('distribution_key')\n\n| j). upsertkey:- During the upsert method of data loading, we need to pass upsert key by which key old record will get updated & new will be added. It will be also added into a tuple.\n\n >>> upsertkey=('upsertkey') or \n >>> upsertkey=('upsertkey1','upsertkey2')\n\n\n4.Examples\n==========\nAppend or Copy data without primarykey, sortkey, distributionkey\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nEg. \n >>> import redshift_tool\n >>> df= pandas.DataFrame()\n >>> redshift_tool.query(data=df,method='append',\n redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},\n s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},\n schema='shcema_name',table='redshift_table_name')\n\nAppend or Copy data with primarykey, sortkey, distributionkey\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nEg. \n >>> import redshift_tool\n >>> df= pandas.DataFrame()\n >>> redshift_tool.query(data=df,method='append',\n redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},\n s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},\n schema='shcema_name',table='redshift_table_name',primarykey=(''primarykey'),\n sortkey=('sortkey'),distkey=('distributionkey'))\n\n\nUpsert data without primarykey, sortkey, distributionkey\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nEg. \n >>> import redshift_tool\n >>> df= pandas.DataFrame()\n >>> redshift_tool.query(data=df,method='append',\n redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},\n s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},\n schema='shcema_name',table='redshift_table_name',upsertkey=('upsertkey'))\n\nUpsert data with primarykey, sortkey, distributionkey\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nEg. \n >>> import redshift_tool\n >>> df= pandas.DataFrame()\n >>> redshift_tool.query(data=df,method='append',\n redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},\n s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},\n schema='shcema_name',table='redshift_table_name',primarykey=('primarykey'),\n sortkey=('sortkey'),distkey=('distributionkey'),upsertkey=('upsertkey'))\n\n\n5.Support\n==========\n +--------------------+----------------------------------------+\n |**Operating System**|Linux/OSX/Windows |\n +--------------------+----------------------------------------+\n |**Python Version** |2/2.7/3/3.2/3.3/3.4/3.5/3.6/3.7 etc. |\n +--------------------+----------------------------------------+ \n\n\n6.References\n============\n| Many thanks to the developers of dependent packages. Please use the below links to get deeper knowledge about required packages:-\n\n| **PANDAS:** https://pypi.org/project/pandas/\n| **NUMPY:** https://pypi.org/project/numpy/\n| **PSYCOPG2:** https://pypi.org/project/psycopg2/\n| **BOTO3:** https://pypi.org/project/boto3/\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/mkgiitr/redshift_tool", "keywords": "Pandas,DataFrame,S3,Redshift,Append,Copy,Upsert,Incremetal Load", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "redshift-tool", "package_url": "https://pypi.org/project/redshift-tool/", "platform": "", "project_url": "https://pypi.org/project/redshift-tool/", "project_urls": { "Homepage": "http://github.com/mkgiitr/redshift_tool" }, "release_url": "https://pypi.org/project/redshift-tool/0.4/", "requires_dist": [ "psycopg2", "pandas", "numpy", "boto3" ], "requires_python": "", "summary": "Elegant data load from Pandas to Redshift", "version": "0.4" }, "last_serial": 4811471, "releases": { "0.3": [ { "comment_text": "", "digests": { "md5": "5ab6846705bed869e1984a8d774a8ce9", "sha256": "aa3ef87deb2d1e9d3466b233f05973df10f08a23085e840efe862d0b97492147" }, "downloads": -1, "filename": "redshift_tool-0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "5ab6846705bed869e1984a8d774a8ce9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6429, "upload_time": "2019-01-09T05:28:40", "url": "https://files.pythonhosted.org/packages/94/73/ed19ea1b46a548cb6a08054e5db6b2fffcfb598bd39abfeb2b521168a206/redshift_tool-0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d0781efcff1fd744c88d75b05e11c579", "sha256": "0a7b2309defcb431c82238f8d60b9014c2c0b05942b9cbb2f30dd52b1a251097" }, "downloads": -1, "filename": "redshift_tool-0.3.tar.gz", "has_sig": false, "md5_digest": "d0781efcff1fd744c88d75b05e11c579", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5715, "upload_time": "2019-01-09T05:28:42", "url": "https://files.pythonhosted.org/packages/1e/f1/2972f61c0f463daaf169fb884506a73272de596feebba26e90f21fdf9fdc/redshift_tool-0.3.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "cb6f4ee8e89783e24ad5615015ab201f", "sha256": "ef70b4d7220f94181ed97800d1ae1369037f9deae110b349c85828de15749e9c" }, "downloads": -1, "filename": "redshift_tool-0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "cb6f4ee8e89783e24ad5615015ab201f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6436, "upload_time": "2019-02-12T15:45:45", "url": "https://files.pythonhosted.org/packages/27/b7/a5001d029e3dfcb3e1270be1627bdf45f4845ed65e92b22718e6da0dc181/redshift_tool-0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ec83b36f1b34d29a3c19a63a2952e022", "sha256": "115d9b4f73a3f0bd979fb6ea8f06a796ed9cbd03c905295c7bc12becdac7fb83" }, "downloads": -1, "filename": "redshift_tool-0.4.tar.gz", "has_sig": false, "md5_digest": "ec83b36f1b34d29a3c19a63a2952e022", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5725, "upload_time": "2019-02-12T15:45:47", "url": "https://files.pythonhosted.org/packages/32/95/e2a68463b8a4e6e0cfef02a18ba832cbc3df55afafe022dea8730f0b7306/redshift_tool-0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cb6f4ee8e89783e24ad5615015ab201f", "sha256": "ef70b4d7220f94181ed97800d1ae1369037f9deae110b349c85828de15749e9c" }, "downloads": -1, "filename": "redshift_tool-0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "cb6f4ee8e89783e24ad5615015ab201f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6436, "upload_time": "2019-02-12T15:45:45", "url": "https://files.pythonhosted.org/packages/27/b7/a5001d029e3dfcb3e1270be1627bdf45f4845ed65e92b22718e6da0dc181/redshift_tool-0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ec83b36f1b34d29a3c19a63a2952e022", "sha256": "115d9b4f73a3f0bd979fb6ea8f06a796ed9cbd03c905295c7bc12becdac7fb83" }, "downloads": -1, "filename": "redshift_tool-0.4.tar.gz", "has_sig": false, "md5_digest": "ec83b36f1b34d29a3c19a63a2952e022", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5725, "upload_time": "2019-02-12T15:45:47", "url": "https://files.pythonhosted.org/packages/32/95/e2a68463b8a4e6e0cfef02a18ba832cbc3df55afafe022dea8730f0b7306/redshift_tool-0.4.tar.gz" } ] }