{
    "info": {
        "author": "",
        "author_email": "",
        "bugtrack_url": null,
        "classifiers": [],
        "description": "# DEPRECATION: This project was only created for a Proof of Concept purpose. The production ready version of this project received the name of **AWS Data Wrangler** (pip install awswrangler).\n### Please consider move forward to:\n* https://pypi.org/project/awswrangler/\n* https://github.com/awslabs/aws-data-wrangler\n\n\n------------------------------------------------------------------------------------------------------------\n\n\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)\n\n# PandasGlue\n\n***A Python library for creating lite ETLs with the widely used Pandas library and the power of AWS Glue Catalog.***\n\n\nWith **PandasGLue** you will be able to write/read to/from an AWS Data Lake with one single line of code. With its minimalist nature **PandasGLue** has an interface with only 2 functions:\n\n\n|   function   \t|       From       \t|        To        \t|\n|:------------:\t|:----------------:\t|:----------------:\t|\n| write_glue() \t| Pandas DataFrame \t|  AWS Glue Table  \t|\n|  read_glue() \t|   AWS GlueTable  \t| Pandas DataFrame \t|\n\n\n\nOnce your data is mapped to [AWS Glue Catalog](https://aws.amazon.com/glue/) it will be accessible to many other tools like [AWS Redshift Spectrum](https://aws.amazon.com/redshift/), [AWS Athena](https://aws.amazon.com/athena/), [AWS Glue Jobs](https://aws.amazon.com/glue/), [AWS EMR](https://aws.amazon.com/emr/) ([Spark](https://spark.apache.org/), [Hive](https://hive.apache.org/), [PrestoDB](https://prestodb.github.io)), etc.\n\n[Amazon Glue](https://aws.amazon.com/glue/) is an [AWS](https://aws.amazon.com/) simple, flexible, and cost-effective ETL service and [Pandas](https://pandas.pydata.org/) is a Python library which provides high-performance, easy-to-use data structures and data analysis tools.\n\nThe goal of this package is help data engineers in the usage of cost efficient serverless compute services 
([Lambda](https://aws.amazon.com/lambda/), [Glue](https://aws.amazon.com/glue/), [Athena](https://aws.amazon.com/athena/)) by providing an easy way to integrate Pandas with AWS Glue: loading (*appending, overwriting, or overwriting only the partitions present in the data*) the content of a DataFrame (**write function**) directly into a table (Parquet/CSV format) in the [Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/populate-data-catalog.html), and executing Athena queries (**read function**) that return the result directly as a Pandas DataFrame.\n\n## Use cases\n\nThis package is recommended for ETL workloads that load and transform small to medium-sized datasets without requiring Spark jobs, which helps reduce infrastructure costs.\n\nIt can be used within [Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/lambda-introduction-function.html), [Glue scripts](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python.html), [EC2](https://aws.amazon.com/ec2/) instances, or any other infrastructure resources.\n\n<p align=\"center\">\n  <img src=\"https://github.com/andresmao/test/blob/master/Pandas_glue_workflow2.png\" width=\"700\"  title=\"ETL Workflow\">\n</p>\n\n### Prerequisites\n\n* [Python 2.7, 3.6, 3.7](https://www.python.org/downloads/) \n\n```\npip install pandas\npip install boto3\npip install pyarrow \n```\n\n### Installing the package\n\n```\npip install pandasglue\n```\n\nAlternatively, you can download the artifacts for an [AWS Lambda Layer](https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html) / [AWS Glue Job](https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html) directly from our [release page](https://github.com/igorborgest/pandasglue/releases). 
Then you only need to upload them to your AWS account.\n\n## Usage \n\n**Read method:**\n\n***read_glue()***\n\nRetrieves the result of an Athena query as a Pandas DataFrame.\n\nQuick example:\n\n```python\nimport pandas as pd\nimport pandasglue as pg\n\n# Parameters\nsql_query = \"SELECT * FROM table_name LIMIT 20\"\ndb_name = \"DB_NAME\"\ns3_output_bucket = \"s3://bucket-url/\"\n\n# Run the query on Athena and load the result into a DataFrame\ndf = pg.read_glue(sql_query, db_name, s3_output_bucket)\n\nprint(df)\n\n```\n***Parameters list:***\n\n* ***query:*** the SQL statement to run on Athena\n* ***db:*** database name\n* ***s3_output:*** path of the S3 output folder (optional)\n* ***region:*** ID of the AWS region, e.g. us-west-1 (optional)\n* ***key:*** AWS access key (optional)\n* ***secret:*** AWS secret key (optional)\n* ***profile_name:*** AWS IAM profile (optional)\n\n<p>\n  <hr>\n</p>\n\n\n**Write method:**\n\n***write_glue()***\n\nConverts a given Pandas DataFrame to a Glue Parquet table.\n\nQuick example:\n\n```python\nimport pandas as pd\nimport pandasglue as pg\n\n# Parameters\ndatabase = \"DB_NAME\"\ntable_name = \"TB_NAME\"\ns3_path = \"s3://bucket-url/\"\n\n# Sample DataFrame\nsource_data = {\n    'name': ['Sarah', 'Renata', 'Erika', 'Fernanda', 'Diana'],\n    'city': ['Seattle', 'Sao Paulo', 'Seattle', 'Santiago', 'Lima'],\n    'test_score': [82, 52, 56, 234, 254],\n}\n\ndf = pd.DataFrame(source_data, columns=['name', 'city', 'test_score'])\n\n# Keyword arguments (named as in the parameters list below) avoid relying on positional order\npg.write_glue(df, database=database, table=table_name, path=s3_path, partition_cols=['city'])\n```\n\n\n***Parameters list:***\n\n* ***df:*** variable containing the Pandas DataFrame\n* ***database:*** database name\n* ***path:*** path of the target S3 bucket\n* ***table:*** table name (optional)\n* ***partition_cols:*** list of columns to partition by (optional)\n* ***preserve_index:*** boolean, whether to preserve the index in the table (optional)\n* ***file_format:*** parquet|csv (optional)\n* ***mode:*** append|overwrite|overwrite_partitions (optional)\n* ***region:*** ID of the AWS region 
(optional)\n* ***key:*** AWS access key (optional)\n* ***secret:*** AWS secret key (optional)\n* ***profile_name:*** AWS IAM profile (optional)\n\n\n## Built With\n\n* [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) - The AWS SDK for Python, which allows Python developers to write software that uses Amazon services like S3 and EC2.\n* [PyArrow](https://pypi.org/project/pyarrow/) - Python bindings for Apache Arrow, which can, among other things, convert text file formats to Parquet files.\n\n\n## Contributing\n\nPlease read [CONTRIBUTING.md](https://gist.github.com/) for details on our code of conduct and the process for submitting pull requests to us.\n\n## Authors\n\n* *Igor Tavares* - [Profile link](https://github.com/igorborgest)\n* *Ricardo Serafim* - [Profile link](https://github.com/rcserafim)\n* *Andres Palacios* - [Profile link](https://github.com/andresmao)\n\nSee also the list of [contributors](https://github.com/your/project/contributors) who participated in this project.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.\n\n\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "",
        "keywords": "",
        "license": "Apache License 2.0",
        "maintainer": "",
        "maintainer_email": "",
        "name": "pandasglue",
        "package_url": "https://pypi.org/project/pandasglue/",
        "platform": "",
        "project_url": "https://pypi.org/project/pandasglue/",
        "project_urls": null,
        "release_url": "https://pypi.org/project/pandasglue/0.0.3/",
        "requires_dist": [
            "pyarrow (==0.12.0)",
            "pandas (==0.24.1)",
            "boto3",
            "s3fs (==0.2.0)"
        ],
        "requires_python": ">=2.7",
        "summary": "Productivity for your AWS Data Lake",
        "version": "0.0.3"
    },
    "last_serial": 5687890,
    "releases": {
        "0.0.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "a65f9949bfb425452584740d1a7d553c",
                    "sha256": "3c767b5d45f553019e11de66e4c4acb28efcf90fc045785894ae6ce74e232eec"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.0-py27,py36,py37-none-any.whl",
                "has_sig": false,
                "md5_digest": "a65f9949bfb425452584740d1a7d553c",
                "packagetype": "bdist_wheel",
                "python_version": "py27,py36,py37",
                "requires_python": ">=2.7",
                "size": 11177,
                "upload_time": "2019-02-07T23:54:34",
                "url": "https://files.pythonhosted.org/packages/8e/4a/c3767ee825024904491f4d717eb827b64978046c2e0e4a8246b9bbc4ab76/pandasglue-0.0.0-py27,py36,py37-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "6ae473bd508bc7b9903e26c60806e47a",
                    "sha256": "05567d282ce08589173df131072071b4ada528e51f0b2f3786969ac01a5eb88f"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.0.tar.gz",
                "has_sig": false,
                "md5_digest": "6ae473bd508bc7b9903e26c60806e47a",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=2.7",
                "size": 10262,
                "upload_time": "2019-02-07T23:54:37",
                "url": "https://files.pythonhosted.org/packages/02/87/7bc82e71236ac63968e44ce55e21218e329c05fbb29e2e148014e4789ec4/pandasglue-0.0.0.tar.gz"
            }
        ],
        "0.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "ff6875dc50ee48c3f6c9658569a6fc21",
                    "sha256": "be534cfdc110fca8f642ecf19ab447137b9d7745787aa8f6a1e420e5b1ffd369"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.1-py27,py36,py37-none-any.whl",
                "has_sig": false,
                "md5_digest": "ff6875dc50ee48c3f6c9658569a6fc21",
                "packagetype": "bdist_wheel",
                "python_version": "py27,py36,py37",
                "requires_python": ">=2.7",
                "size": 11346,
                "upload_time": "2019-02-08T14:28:46",
                "url": "https://files.pythonhosted.org/packages/d2/3d/c527f2b854b6752ff8ebbf5e558a6e2214894b7ffef4fc8bf3f40caeb95c/pandasglue-0.0.1-py27,py36,py37-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "f01b0f5abf0f19d3768775a556515e48",
                    "sha256": "9312ef02e6b1432aa5a36aa219de2393378ca8049ba4d1662bc1c547e7b20689"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "f01b0f5abf0f19d3768775a556515e48",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=2.7",
                "size": 10432,
                "upload_time": "2019-02-08T14:28:48",
                "url": "https://files.pythonhosted.org/packages/cf/2f/87506a1e55f661123894d93b978e21de77f55065dae69abf98a647c6a59c/pandasglue-0.0.1.tar.gz"
            }
        ],
        "0.0.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "f1e42516575c03b1fdbcf74b9ce5dbce",
                    "sha256": "b4971b4d21726f97e9a8638cd8eb8e6111442bc81095ad63631d789cfdcf0eea"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.2-py27,py36,py37-none-any.whl",
                "has_sig": false,
                "md5_digest": "f1e42516575c03b1fdbcf74b9ce5dbce",
                "packagetype": "bdist_wheel",
                "python_version": "py27,py36,py37",
                "requires_python": ">=2.7",
                "size": 16796,
                "upload_time": "2019-02-12T16:42:34",
                "url": "https://files.pythonhosted.org/packages/bd/fd/767e5ee8bb82d56ac935efa09561329fce47b59e4eb4733da6caf783933b/pandasglue-0.0.2-py27,py36,py37-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "f7a7ecf482b39093e13d7c3a11d558e8",
                    "sha256": "364d1b702b500d0143dcf7c436cb8a580da4461047dfa0b4ddb8f0bf0079d2bb"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.2.tar.gz",
                "has_sig": false,
                "md5_digest": "f7a7ecf482b39093e13d7c3a11d558e8",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=2.7",
                "size": 15671,
                "upload_time": "2019-02-12T16:42:36",
                "url": "https://files.pythonhosted.org/packages/3e/37/616ac6288fd70272cede3dfb62f4e605dd93d6a1ffdf6894dc8c042ec12e/pandasglue-0.0.2.tar.gz"
            }
        ],
        "0.0.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b8b692882e6edbd94bc143d6872d70ff",
                    "sha256": "bfbbfeaa1b1d4e423f85ba44a6d79d40e0e160177d2cd12c08a7a64a8c2b37d0"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.3-py27,py36,py37-none-any.whl",
                "has_sig": false,
                "md5_digest": "b8b692882e6edbd94bc143d6872d70ff",
                "packagetype": "bdist_wheel",
                "python_version": "py27,py36,py37",
                "requires_python": ">=2.7",
                "size": 17275,
                "upload_time": "2019-08-16T14:22:51",
                "url": "https://files.pythonhosted.org/packages/ae/47/ace32b6ec156965d3b70ff0bc8b4137146b04460bc1ae6f769e2e414641f/pandasglue-0.0.3-py27,py36,py37-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "77183eb067a035b81c0062cb173921d8",
                    "sha256": "eeafd88b193e87db8f68a4c9a7a06afa7ef8aac0b20875ec2dc34064d760a668"
                },
                "downloads": -1,
                "filename": "pandasglue-0.0.3.tar.gz",
                "has_sig": false,
                "md5_digest": "77183eb067a035b81c0062cb173921d8",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": ">=2.7",
                "size": 15884,
                "upload_time": "2019-08-16T14:22:53",
                "url": "https://files.pythonhosted.org/packages/1e/8c/41880fa896557d59e09dd1756c09e6e2794e7050e2b00176ab660fd54784/pandasglue-0.0.3.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "b8b692882e6edbd94bc143d6872d70ff",
                "sha256": "bfbbfeaa1b1d4e423f85ba44a6d79d40e0e160177d2cd12c08a7a64a8c2b37d0"
            },
            "downloads": -1,
            "filename": "pandasglue-0.0.3-py27,py36,py37-none-any.whl",
            "has_sig": false,
            "md5_digest": "b8b692882e6edbd94bc143d6872d70ff",
            "packagetype": "bdist_wheel",
            "python_version": "py27,py36,py37",
            "requires_python": ">=2.7",
            "size": 17275,
            "upload_time": "2019-08-16T14:22:51",
            "url": "https://files.pythonhosted.org/packages/ae/47/ace32b6ec156965d3b70ff0bc8b4137146b04460bc1ae6f769e2e414641f/pandasglue-0.0.3-py27,py36,py37-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "77183eb067a035b81c0062cb173921d8",
                "sha256": "eeafd88b193e87db8f68a4c9a7a06afa7ef8aac0b20875ec2dc34064d760a668"
            },
            "downloads": -1,
            "filename": "pandasglue-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "77183eb067a035b81c0062cb173921d8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=2.7",
            "size": 15884,
            "upload_time": "2019-08-16T14:22:53",
            "url": "https://files.pythonhosted.org/packages/1e/8c/41880fa896557d59e09dd1756c09e6e2794e7050e2b00176ab660fd54784/pandasglue-0.0.3.tar.gz"
        }
    ]
}