{ "info": { "author": "Kartik Ramasubramanian", "author_email": "r.kartik@berkeley.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "Project Background\n==================\n\npython package that helps data engineers and data scientists accelerate data-pipeline development\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe goal of this python project is to build a bunch of wrappers that can\nbe reused for building data pipelines from - - Relational databases:\npostgres, mysql, greenplum, redshift, etc. - NOSQL databases: hive,\nmongo, etc. - messaging sources and caches: kafka, redis, rabbitmq, etc.\n- cloud service providers: salesforce, mixpanel, jira, google-drive,\ndelighted, wootric, etc.\n\nInstallation\n------------\n\nThere are 3 ways to install dattasa package -\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n1) Easiest way is to install from pypi using pip\n\n::\n\n pip install dattasa\n\n2) Download from github and build from scratch\n\n::\n\n git clone git@github.com:kartikra/dattasa.git\n cd dattasa\n python setup.py build\n python setup.py clean\n python setup.py install\n\n3) Download from github and install using pip\n\n::\n\n git clone git@github.com:kartikra/dattasa.git\n cd dattasa\n pip install -e .\n pip install -U -e . (if upgrading)\n\nConfig Files\n------------\n\nBy default dattasa expects the config files to be in the mode directory\nof user. These can be overridden. See links to sample code in README\nfile below to find out more. There are 2 yaml config files -\ndatabase.yaml - conists of database credentials and api keys needed for\nmaking connection. see `sample database\nconfig `__ - ftpsites.yaml - Needed for\nperforming sftp transfers. see `sample ftpsites\nconfig `__\n\nEnvironment Variables\n---------------------\n\ndattasa package relies on the following environment variables. Make sure\nto set these in your bash profile - GPLOAD_HOME: Path to gpload package\n(needed only if using gpload utilities for greenplum or redshift) -\nPROJECT_HOME: Path to python project directory -\nPROJECT_HOME/python_bash_scripts: python scripts to invoke gpload\n(needed only if using gpload utilities for greenplum or redshift) -\nSQL_DIR: Place to keep all sql scripts - TEMP_DIR: All temp files\ncreated in this folder - LOG_DIR: All log files are created in this\nfolder\n\nDescription of classes\n----------------------\n\nv1.0 of the package comprises of the following classes. Please see link\nto sample code for details on how to use each of them.\n\n+----------------+----------------------------------+------------------+\n| class | Description | Sample Code |\n+================+==================================+==================+\n| environment | Lets you source all the os | `see first row |\n| | environment variables | in mongo |\n| | | example `__ |\n+----------------+----------------------------------+------------------+\n| postgres_clien | Lets you use psql and gpload | `sample postgres |\n| t | utilities provided by `pivotal | code `__ |\n| | unix.html>`__. | |\n| | Make connections to postgres / | |\n| | greenplum database using | |\n| | pyscopg2 or sqlalchemy.Use the | |\n| | connections to interact with | |\n| | database in interactive program | |\n| | or run queries from a sql file | |\n| | using the connection | |\n+----------------+----------------------------------+------------------+\n| greenplum_clie | Lets you use psql and gpload | `sample |\n| nt | utilities provided by `pivotal | greenplum |\n| (inherits | greenplum `__. | ient.ipynb>`__ |\n| | Make connections to postgres / | |\n| | greenplum database using | |\n| | pyscopg2 or sqlalchemy.Use the | |\n| | connections to interact with | |\n| | database in interactive program | |\n| | or run queries from a sql file | |\n| | using the connection | |\n+----------------+----------------------------------+------------------+\n| mysql_client | Lets you use mysql and other | `sample mysql |\n| | methods provided by PyMySQL | code `__ |\n+----------------+----------------------------------+------------------+\n| file_processor | Create sftp connection using | `see file |\n| | `paramiko `__ | example `__ |\n| | encryption, archive (File Class) | |\n+----------------+----------------------------------+------------------+\n| notification | Send email notifications | |\n+----------------+----------------------------------+------------------+\n| mongo_client | Load data to mongodb using bulk | `see mongo |\n| | load. Run java script queries | example `__ |\n+----------------+----------------------------------+------------------+\n| redis_client | Read data from a redis cache or | `see redis |\n| | load a redis cache | example `__ |\n+----------------+----------------------------------+------------------+\n| kafka_system | Currently allows Publisher and | `see kafka |\n| | Consumer to use kafka in batch | example `__ |\n+----------------+----------------------------------+------------------+\n| rabbitmq_syste | Currently has Publisher to | |\n| m | publish messages in rabbitmq | |\n+----------------+----------------------------------+------------------+\n| mixpanel_clien | Connect to mixpanel api and | `see mixpnael |\n| t | fetch data using jql or export | section in api |\n| | raw events data. `mixpanel api | example `__ |\n| | ence>`__ | |\n+----------------+----------------------------------+------------------+\n| salesforce_cli | Create a connection to | `see salesforce |\n| ent | salesforce using | section in api |\n| | `simple_salesforce `__ | les.ipynb>`__ |\n| | package | |\n+----------------+----------------------------------+------------------+\n| delighted_clie | Get nps scores and survey | `see delighted |\n| nt | responses from delighted.\\ `api | section in api |\n| | documentation `__ | tation/api_examp |\n| | | les.ipynb>`__ |\n+----------------+----------------------------------+------------------+\n| wootric_client | Gets nps scores and survey | `see wootric |\n| | responses from wootric.\\ `api | section in api |\n| | documentation `__ | tation/api_examp |\n| | | les.ipynb>`__ |\n+----------------+----------------------------------+------------------+\n| dag_controller | Functions needed to integrate | |\n| | this package within an airflow | |\n| | dag. `airflow | |\n| | documentation `__ | |\n| | and `github | |\n| | project `__ | |\n+----------------+----------------------------------+------------------+\n\ndata_pipeline class\n~~~~~~~~~~~~~~~~~~~\n\nThis is the main class that\u2019s accessible to other projects. The data\npipeline consists of data from components and API. Each object of\ndata-processor can use individual data streams and process them\ndata_pipeline decides which modules to call based on type of database\n(as defined in config file). data_pipeline comprises of 3 classes -\nDataComponent : Each database connection is considered to be\ndata-component object.See examples for postgres, mysql, greenplum, etc\nabove - APICall : Each api call is an apicall object. See examples for\nmixpanel, delighted, salesforce and wootric above - DataProcessor :\ntransfers and loads data between data components. `see\nexamples `__\n\nAdding ipython notebook files to github\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUse git lfs See\n`documentation `__\n\n- if using mac install git-lfs using brew ``brew install git-lfs``\n- install lfs ``git lfs install``\n- track ipynb files in your project. go to the project folder and do\n ``git lfs track \"*.psd\"``\n- add ``.*ipynb_checkpoints/`` to .gitignore file\n- Finally add .gitattributes file ``git add .gitatttributes``\n\nDeploying code in pypi\n~~~~~~~~~~~~~~~~~~~~~~\n\n- build the code:\n ``python setup.py build && python setup.py clean && python setup.py install``\n- push to pypitest : ``python setup.py sdist upload -r pypitest``\n- push to pypi prod : ``python setup.py sdist upload -r pypi``\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/kartikra/dattasa", "keywords": "greenplum postgres mysql mongodb kafka redis rabbitmq salesforce mixpanel delighted wootric data pipeline", "license": "GNU", "maintainer": "", "maintainer_email": "", "name": "dattasa", "package_url": "https://pypi.org/project/dattasa/", "platform": "", "project_url": "https://pypi.org/project/dattasa/", "project_urls": { "Homepage": "https://github.com/kartikra/dattasa" }, "release_url": "https://pypi.org/project/dattasa/1.1/", "requires_dist": null, "requires_python": "", "summary": "Python wrapper for connecting to postgres/greenplum, mysql, mongodb, kafka, redis, mixpanel and salesforce. Also included are modules for performing secure file transfer and sourcing environment variables.", "version": "1.1" }, "last_serial": 3571594, "releases": { "1.1": [ { "comment_text": "", "digests": { "md5": "cea52ebbddd232bda6bb7024660764b0", "sha256": "2ca8d0b5c3156c3d46b249eaefb18024e8da958c452f51497bdbe719a2efb9c9" }, "downloads": -1, "filename": "dattasa-1.1.tar.gz", "has_sig": false, "md5_digest": "cea52ebbddd232bda6bb7024660764b0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29469, "upload_time": "2018-02-11T07:41:32", "url": "https://files.pythonhosted.org/packages/71/82/73224de19e0397d973eec1ab20c3346b95783ffe41e2d8a4c74a88e94970/dattasa-1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cea52ebbddd232bda6bb7024660764b0", "sha256": "2ca8d0b5c3156c3d46b249eaefb18024e8da958c452f51497bdbe719a2efb9c9" }, "downloads": -1, "filename": "dattasa-1.1.tar.gz", "has_sig": false, "md5_digest": "cea52ebbddd232bda6bb7024660764b0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29469, "upload_time": "2018-02-11T07:41:32", "url": "https://files.pythonhosted.org/packages/71/82/73224de19e0397d973eec1ab20c3346b95783ffe41e2d8a4c74a88e94970/dattasa-1.1.tar.gz" } ] }