{
"info": {
"author": "Poppin",
"author_email": "technology@poppin.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: Information Technology",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3 :: Only"
],
"description": "Mr. Plow\n========\n\nMr. Plow is Poppin's ETL system to persist data from third-party APIs into a\nSnowflake database for later business analysis.\n\nWe use Python to:\n\n#. Call said APIs and store the data in AWS S3 (\"Extract\")\n#. Issue Snowflake commands to import the data from AWS S3 (\"Stage\")\n#. Issue Snowflake commands to transform the new data from its original\n unstructured form to the tabular form used for analysis (\"Transform\")\n#. Issue Snowflake commands to load the new tabular data to our main store,\n eliminating any duplicates (\"Load\")\n\nMr. Plow can be run from the command line. In the future we will add support\nfor running its functions as AWS Lambda functions.\n\nWhy that ridiculous name?\n-------------------------\n\nTwo reasons.\n\nFirst, this is primarily a Snowflake client.\n\nSecond: https://www.youtube.com/watch?v=uYXEt7xOh1M\n\nInstallation\n============\n\nCurrently, we support only Python 3.6. This is the highest current version.\nThere is no reason to suspect 3.7 and up will not work to run Mr. Plow.\n\nLower Python 3.x versions may work, but they are not tested. You are free to\ntry them. If you succeed running Mr. Plow in a 3.3, 3.4, or 3.5 environment,\nplease tell us on the Github issue tracker and we will add testing support.\n\nFirst run ``python3 --version`` to ensure you have the required Python version.\nIt may also be installed as ``python3.6``.\n\nWe recommend you install using a virtual python environment to manage library\ndependencies. `Virtualenv `_ is a great\ntool for this.\n\nSuggested usage:\n\n#. `Install virtualenv `_\n if not already present\n#. Create a virtual environment inside the clone: ``virtualenv -p python3.6 venv``\n\n * This creates a directory ``venv/``, which contains an entire python\n environment.\n\n#. Activate your virtual environment: ``source venv/bin/activate``\n\n * This adds the virtual environment's ``python`` and ``pip`` executables to your\n ``PATH``.\n\n#. Install dependencies: ``pip install mr-plow``\n\nNow you should have the executable ``plow`` in your ``PATH``. Simply run\n``plow`` at the command line to see available commands and an example workflow.\n\nAll commands accept the ``--help`` option and print a helpful usage guide.\n\nConfiguration\n=============\n\nSince we access both Snowflake and AWS S3, credentials are required. These may\nbe provided via a config file. It can be provided to the ``plow`` CLI tool\neither via the ``-c`` option or by exporting the environment variable\n``MR_PLOW_CONFIG``.\n\nThe supported formats are JSON, YAML, and INI. INI files must have the\nappropriate settings in the ``[mr-plow]`` section. Run ``plow generate-config``\nto have a minimal example generated for you to fill in.\n\nMr. Plow fills in configuration it does not find by searching environment\nvariables.\n\nTo see more including a full list of config options, install Mr. Plow and run\n``python -m pydoc plow.config.Config``.\n\nSnowflake\n---------\n\nThose commands that require Snowflake access draw the credentials from the\nexecution environment. Mr. Plow uses the following environment variables to\ncreate a Snowflake connection. If both ``SNOWSQL_FOO`` and ``SNOWFLAKE_FOO``\nare present as environment variables, Mr. Plow uses ``SNOWFLAKE_FOO``.\n\n* ``SNOWFLAKE_ACCOUNT`` or ``SNOWSQL_ACCOUNT``\n\n * visible in the URL you use to log in to Snowflake. For the author and his\n peers, this is \"poppin\".\n\n* ``SNOWFLAKE_USER`` or ``SNOWSQL_USER``\n\n * Same as your login on the Snowflake website.\n\n* ``SNOWFLAKE_PASSWORD`` or ``SNOWSQL_PASSWORD``\n\n * This is the password for that login. With the command line, this is\n optional as it can be provided as a console prompt.\n\n* ``SNOWFLAKE_ROLE`` or ``SNOWSQL_ROLE``\n\n* ``SNOWFLAKE_DATABASE`` or ``SNOWSQL_DATABASE``\n\n* ``SNOWFLAKE_SCHEMA`` or ``SNOWSQL_SCHEMA``\n\n * E.g. \"public\". However please keep in mind that Mr. Plow creates its own\n schemas and never references tables without explicitly providing a schema.\n\n* ``SNOWFLAKE_WAREHOUSE`` or ``SNOWSQL_WAREHOUSE``\n\n.. warning::\n **Caution**: Choose carefully as many of these commands spin up a warehouse\n instance. You may incur charges.\n\nAWS\n----\n\nWe use `boto3 `_ to connect to AWS S3.\nIt has a rich system for specifying credentials, which can be used in its\nentirety by omitting AWS settings from your config file. This is appropriate in\nAWS Lambda, where credentials are taken care of in the background, or if you\nare an `AWS CLI `_ user and have\nexisting configs. See `boto3 documentation\n`_\nfor more detail.\n\nTo specify config directly, you may either give the ``aws_access_key_id`` and\n``aws_secret_access_key`` settings directly, or if you have an existing `AWS\nCLI configuration\n`_, you\nmay simply specify ``aws_profile``.\n\nWhen Snowflake reads your API data from S3, it requires you to provide AWS\ncredentials with the appropriate S3 read permissions. Mr. Plow picks these up\nas distinct config options: ``staging_aws_access_key_id`` and\n``staging_aws_secret_access_key``. As with the direct S3 credentials,\n``staging_aws_profile`` may also be provided.\n\nThird-party API's\n-----------------\n\nWe currently provide integrations for Livechat and Snapfulfil.\n\nThe Livechat integration requires an API login, which is specified as config\noptions ``livechat_login`` and ``livechat_api_key``.\n\nThe Snapfulfil integration requires an API login (config ``snapfulfil_login`` and\n``snapfulfil_password``) as well as an explicit designation of which Snap domain\nyou want to make requests to (config ``snapfulfil_api_domain``). Typically the\nlatter is either ``https://treltestapi.snapfulfil.net`` or\n``https://trelliveapi.snapfulfil.net``.\n\nExtending Mr. Plow\n==================\n\nMr. Plow is designed to be broadly useful and easily extensible. We currently\nprovide integrations for Livechat and Snapfulfil, and more will be added, but\nyou can easily add your own.\n\nYou must create your own implementation of ``plow.op.extract.Extractor`` to\ndefine how to fetch data. In a pinch you can use a RestExtractor; to allow\nfor automatic fetching of subsequent pages, you'll have to subclass it and\nimplement ``postprocess_response()``. See documentation of\n``plow.op.Extractor``; also see ``plow.vendors.*`` for examples.\n\nYou must furthermore create your own instances of ``plow.queries.Table`` to\nspecify how to translate data from the documents you fetch using the Extractor\ninto Snowflake DB tables. See documentation of ``plow.queries.Table``, and see\n``plow.queries.livechat`` and ``plow.queries.snapfulfil`` for tested examples.\n\nFinally, you must create a ``plow.cli.Source`` pointing to all of these.\n\nIf you write your own adapter, we'd love to include it. Please send a pull\nrequest!\n\nExample\n-------\n\nIf you have the following files in your Python project and ``mr-plow`` installed\nwith ``pip``::\n\n # mymodule/plow/extract.py\n from plow.op.extract import RestExtractor\n class Extractor(RestExtractor):\n ...\n\n::\n\n # mymodule/plow/tables.py\n from plow.queries import Table\n class Table1(Table):\n select = \"...\"\n # etc...\n\n class Table2(Table):\n ...\n\n class Table3(Table):\n ...\n\n::\n\n # mymodule/plow/cli.py\n from plow.cli import Source\n from mymodule.plow.extract import Extractor\n from mymodule.plow.tables import Table1, Table2, Table3\n\n extractor = Extractor()\n tables = {t.name: t for t in (Table1(), Table2(), Table3())}\n source = Source(extractor=extractor, tables=tables)\n\n\nThen you can invoke the ``plow`` CLI tool, using the ``--source`` option to point\nto your code::\n\n $ plow -c mr-plow.ini extract --source mymodule.plow.cli:source [options]...\n\n\nDevelopment\n===========\n\nDeveloper installation\n----------------------\n\nTo develop on Mr. Plow, clone this repository, set up and activate a virtualenv\n(see Installation) in the new working copy, and run ``pip install -e .[dev]``.\nThis installs the ``plow`` executable as well as development dependencies like\nFlake8 (the linter we use) and pytest.\n\nTesting\n-------\n\nMr. Plow is tested using `pytest `_. If you\nclone the source and install using ``pip install -e .[dev]``, it is installed\nautomatically along with several other test dependencies. Run ``pytest`` to run\nthe unit tests; add ``--cov-report=term-missing`` or ``--cov-report=html`` to\nsee detailed coverage information.\n\nTesting is separated into two section, unit tests and integration tests.\nIntegration tests are disabled by default: specify ``pytest --integration`` to\nrun integration tests as well, or ``pytest --no-unit`` to disable unit tests and\nrun only integration tests.\n\nThe unit tests use mocking for all external functionality, including Snowflake,\nS3, and third-party API's, and so may be run without an internet connection or\nany of the service-specific configuration specified above. However, at this\ntime, with very few exceptions these tests do not verify any specific SQL\nqueries, nor almost any vendor-specific logic.\n\nThe integration tests run all Snowflake setup operations and a full run of ETL\noperations through S3 and Snowflake, so you must do some setup in order to run\nthem. We mock access to the third party API, so that we can simulate the\nprocessing of a constant dataset and verify the result with precision. To run\nintegration tests, you must supply a configuration file by exporting its path\nto ``PLOW_TEST_CONFIG``.\n\n.. warning::\n **Caution**: Since these run real Snowflake operations, you may incur charges by running\n these tests.\n\nGit hooks\n---------\n\nThis project adheres to several standards including a style guide and unit\ntests. To aid developers in complying, we include hooks that can be run upon\na commit. Install them as follows:\n\n* Include whatever hooks you wish in your own .git/hooks/pre-commit::\n\n $ echo '#!/usr/bin/env bash' >> .git/hooks/pre-commit\n $ echo 'flake8-hook.py || exit $?' >> .git/hooks/pre-commit\n $ echo 'unittest-hook.sh || exit $?' >> .git/hooks/pre-commit\n $ chmod u+x .git/hooks/pre-commit\n\n* Add the appropriate options to your git config::\n\n $ git config flake8.strict true\n $ git config plow.unit.strict true\n\nNow, the linter and the unit tests will run every time you commit and you will\nbe prompted to fix any deficiencies before committing. These checks can be\ndisabled temporarily using environment variables. To avoid a linting check::\n\n $ git commit\n Style errors found!\n $ FLAKE8_STRICT=false git commit\n [...] Success!\n\nTo skip unit tests::\n\n $ PLOW_UNIT_STRICT=false git commit\n\n\n",
"description_content_type": null,
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/Poppin-Tech/mr-plow",
"keywords": "Snowflake,Database,Amazon Web Services,AWS,S3,API,ETL",
"license": "Apache",
"maintainer": "",
"maintainer_email": "",
"name": "mr-plow",
"package_url": "https://pypi.org/project/mr-plow/",
"platform": "",
"project_url": "https://pypi.org/project/mr-plow/",
"project_urls": {
"Homepage": "https://github.com/Poppin-Tech/mr-plow"
},
"release_url": "https://pypi.org/project/mr-plow/0.1.5/",
"requires_dist": [
"boto3 (>=1.4.6)",
"click (>=6.7)",
"ijson (>=2.3)",
"requests (>=2.18.4)",
"snowflake-connector-python (>=1.4.10)",
"pyyaml (>=3.12)",
"pytest; extra == 'dev'",
"pytest-cov; extra == 'dev'",
"pytest-mock; extra == 'dev'",
"flake8; extra == 'dev'",
"pytest-profiling; extra == 'dev'"
],
"requires_python": "~=3.6",
"summary": "ETL from third-party APIs into Snowflake",
"version": "0.1.5"
},
"last_serial": 3417715,
"releases": {
"0.1.5": [
{
"comment_text": "",
"digests": {
"md5": "bc3752956e66e81d921849ca5828c41e",
"sha256": "aff26362bc6ded91ec2488fa023b691a4f35410130924aad1a692ad4c94ca862"
},
"downloads": -1,
"filename": "mr_plow-0.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bc3752956e66e81d921849ca5828c41e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.6",
"size": 43476,
"upload_time": "2017-12-14T19:40:14",
"url": "https://files.pythonhosted.org/packages/c6/3e/3353ada63955c7210da796e5b035a86f02a0001ec0bc15f7814f6ac71513/mr_plow-0.1.5-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "3cc613eeb2b9e263887889b2eaa92be5",
"sha256": "fd44e6a7caad9549e6ad5a583f02c00a3b2f6ada461ad54155db70a65172b106"
},
"downloads": -1,
"filename": "mr-plow-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "3cc613eeb2b9e263887889b2eaa92be5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.6",
"size": 35823,
"upload_time": "2017-12-14T19:40:15",
"url": "https://files.pythonhosted.org/packages/39/b7/8663717a9f7b762cd0a4b01a220083fbb3d276909515cc8c36d030c2634f/mr-plow-0.1.5.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "bc3752956e66e81d921849ca5828c41e",
"sha256": "aff26362bc6ded91ec2488fa023b691a4f35410130924aad1a692ad4c94ca862"
},
"downloads": -1,
"filename": "mr_plow-0.1.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bc3752956e66e81d921849ca5828c41e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.6",
"size": 43476,
"upload_time": "2017-12-14T19:40:14",
"url": "https://files.pythonhosted.org/packages/c6/3e/3353ada63955c7210da796e5b035a86f02a0001ec0bc15f7814f6ac71513/mr_plow-0.1.5-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "3cc613eeb2b9e263887889b2eaa92be5",
"sha256": "fd44e6a7caad9549e6ad5a583f02c00a3b2f6ada461ad54155db70a65172b106"
},
"downloads": -1,
"filename": "mr-plow-0.1.5.tar.gz",
"has_sig": false,
"md5_digest": "3cc613eeb2b9e263887889b2eaa92be5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.6",
"size": 35823,
"upload_time": "2017-12-14T19:40:15",
"url": "https://files.pythonhosted.org/packages/39/b7/8663717a9f7b762cd0a4b01a220083fbb3d276909515cc8c36d030c2634f/mr-plow-0.1.5.tar.gz"
}
]
}