{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [], "description": "# DEPRECATION: This project was created only as a Proof of Concept. The production-ready version of this project is named **AWS Data Wrangler** (pip install awswrangler).\n### Please consider moving to:\n* https://pypi.org/project/awswrangler/\n* https://github.com/awslabs/aws-data-wrangler\n\n\n------------------------------------------------------------------------------------------------------------\n\n\n# PandasGlue\n\n***A Python library for creating lite ETLs with the widely used Pandas library and the power of AWS Glue Catalog.***\n\n\nWith **PandasGlue** you can write to and read from an AWS Data Lake with a single line of code. In keeping with its minimalist nature, **PandasGlue** has an interface with only 2 functions:\n\n\n| function \t| From \t| To \t|\n|:------------:\t|:----------------:\t|:----------------:\t|\n| write_glue() \t| Pandas DataFrame \t| AWS Glue Table \t|\n| read_glue() \t| AWS Glue Table \t| Pandas DataFrame \t|\n\n\n\nOnce your data is mapped to the [AWS Glue Catalog](https://aws.amazon.com/glue/) it will be accessible to many other tools like [AWS Redshift Spectrum](https://aws.amazon.com/redshift/), [AWS Athena](https://aws.amazon.com/athena/), [AWS Glue Jobs](https://aws.amazon.com/glue/), [AWS EMR](https://aws.amazon.com/emr/) ([Spark](https://spark.apache.org/), [Hive](https://hive.apache.org/), [PrestoDB](https://prestodb.github.io)), etc.\n\n[AWS Glue](https://aws.amazon.com/glue/) is a simple, flexible, and cost-effective ETL service from [AWS](https://aws.amazon.com/), and [Pandas](https://pandas.pydata.org/) is a Python library that provides high-performance, easy-to-use data structures and data analysis tools.\n\nThe goal of this package is to help data engineers in the usage of cost-efficient serverless compute services ([Lambda](https://aws.amazon.com/lambda/), 
[Glue](https://aws.amazon.com/glue/), [Athena](https://aws.amazon.com/athena/)) by providing an easy way to integrate Pandas with AWS Glue. It can load the content of a DataFrame (**Write function**) directly into a table (parquet/csv format) in the [Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/populate-data-catalog.html), *appending, overwriting, or overwriting only the partitions with data*, and it can execute Athena queries (**Read function**), returning the result directly as a Pandas DataFrame.\n\n## Use cases\n\nThis package is recommended for ETL jobs that load and transform small to medium-sized datasets without requiring Spark jobs, which helps reduce infrastructure costs.\n\nIt can be used within [Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/lambda-introduction-function.html), [Glue scripts](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python.html), [EC2](https://aws.amazon.com/ec2/) instances, or any other infrastructure resources.\n\n
\n
\n
\n