{ "info": { "author": "Josip Delic", "author_email": "delijati@gmx.net", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "# Spark-EMR\n\n[![Build Status](https://api.travis-ci.org/delijati/spark-emr.svg?branch=master)](https://travis-ci.org/delijati/spark-emr)\n\nRun an python package on AWS EMR\n\n## Install\n\nDevelop install:\n\n $ pip install -e .\n\nTesting:\n\n $ pip install tox\n $ tox\n\n## Setup\n\nThe easiest way to get EMR up and running is to go through the Web-Interface\nand create a ssh key, and start a cluster by hand. This will then create the\nneeded subnet_key and EMR roles.\n\n## Config yaml file\n\nCreate a ``config.yaml`` per project or as a default into\n`~/.config/spark-emr.yaml`\n\n bootstrap_uri: s3://foo/bar\n master: \n instance_type: m4.large\n size_in_gb: 100\n core: \n instance_type: m4.large\n instance_count: 2\n size_in_gb: 100\n ssh_key: XXXXX\n subnet_id: subnet-XXXXXX\n python_version: python36\n emr_version: emr-5.20.0\n consistent: false\n optimization: false\n region: eu-central-1\n job_flow_role: EMR_EC2_DefaultRole \n service_role: EMR_DefaultRole\n\n## CLI-Interface\n\n### Start\n\nTo run a python code on EMR you need to build a proper python package aka\n`setup.py` with `console_scripts` the script needs to end on `.py` or yarn\nwon't be able to execute it |-(\n\nBootstrap a cluster, install the pypackage, execute the task in cmdline, poll\ncluster until finished, stop cluster:\n\n $ spark-emr start \\\n [--config config.yaml] \\\n --name \"Spark-ETL\" \\\n --bid-master 0.04 \\\n --bid-core 0.04 \\\n --cmdline \"etl.py --input s3://in/in.csv --output s3://out/out.csv\" \\\n --tags foo 2 bar 4 \\\n --poll \\\n --yarn-log \\\n --package \"../\"\n\nRunning with a released pypackage version (pip):\n\n $ spark-emr start \\\n ... \\\n --package pip+etl_pypackage\n\n### Status\n\nReturns the status of a cluster (also terminated ones):\n\n $ spark-emr status --cluster-id j-XXXXX\n\n### List\n\nList all cluster and filter optionally by tag:\n\n $ spark-emr list [--config config.yaml] [--filter somekey somevalue]\n\n### Stop\n\nStop a running cluster:\n\n $ spark-emr stop --cluster-id j-XXXXX\n\n### Spot price check\n\nThis call returns for all regions and configured instances the spot price:\n\n $ spark-emr spot\n\n# Appendix\n\n### Running commands on EMR\n\nThe created command can also be run directly from the master:\n\n $ /usr/bin/spark-submit \\\n --deploy-mode cluster \\\n --master yarn \\\n --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=python35 \\\n --conf spark.executorEnv.PYSPARK_PYTHON=python35 \\\n /usr/local/bin/etl.py --input s3://in/in.csv --output s3://out/out.csv\n\n### Running commands on docker\n\nTo test if our spark is running as expected we can run it locally in docker.\n\n $ git clone https://github.com/delijati/spark-docker\n $ cd spark-docker\n $ docker build . --pull -t spark\n\nNow we can run our spark job locally.\n\n $ docker run --rm -ti -v `pwd`/test/dummy:/app/work spark \\\n bash -c \"cd /app/work && pip3 install -e . && spark_emr_dummy.py 10\"\n\n\n# CHANGES\n\n0.1.2 (2019-03-10)\n------------------\n\n- Add spot price check cli.\n- Add spot BidPrice.\n- Show estimated cost.\n- Filter by tag for list cli.\n\n\n0.1.1 (2019-02-21)\n------------------\n\n- Fixed url in setup.py.\n\n\n0.1.0 (2019-02-21)\n------------------\n\n- Initial release.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://www.github.com/delijati/spark-emr", "keywords": "AWS EMR SPARK PYSPARK", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "spark-emr", "package_url": "https://pypi.org/project/spark-emr/", "platform": "", "project_url": "https://pypi.org/project/spark-emr/", "project_urls": { "Homepage": "https://www.github.com/delijati/spark-emr" }, "release_url": "https://pypi.org/project/spark-emr/0.1.2/", "requires_dist": [ "boto3", "loguru", "pyyaml", "spark-optimizer", "pluggy (>=0.7)" ], "requires_python": "", "summary": "Run python packages on AWS EMR", "version": "0.1.2" }, "last_serial": 4920983, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "9bd16dbf4780d5401475b48c9140e4f7", "sha256": "bcdb17ee396f47706cb37e70b7782704ceb438d8f294bb54c4567e7cc1075d44" }, "downloads": -1, "filename": "spark_emr-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9bd16dbf4780d5401475b48c9140e4f7", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 16382, "upload_time": "2019-02-21T19:58:46", "url": "https://files.pythonhosted.org/packages/34/40/2c1b439eb9e0ac846733701b9d11a6aa0f6bcd795492f9bfbf81d9b2b579/spark_emr-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "464c063469892007c522d0b3ed7b6644", "sha256": "3e8280451974bccf1dee4d0c407d49ae7ed0cfe6948285f0ea81f6a1775197a8" }, "downloads": -1, "filename": "spark_emr-0.1.1.tar.gz", "has_sig": false, "md5_digest": "464c063469892007c522d0b3ed7b6644", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11501, "upload_time": "2019-02-21T19:58:48", "url": "https://files.pythonhosted.org/packages/cd/53/4e912945c464da3dc42babfa9a5e1b9f550a86555b579993a7a83fea32db/spark_emr-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "fcece6b7d068dd960f353e3a27bdb146", "sha256": "b3c3f127ed2df309b89115f8aaef9c2c66b89dc311f8cf00d3323d403d7b8f5d" }, "downloads": -1, "filename": "spark_emr-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fcece6b7d068dd960f353e3a27bdb146", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 18699, "upload_time": "2019-03-10T10:38:43", "url": "https://files.pythonhosted.org/packages/99/ff/672e1bf5d3da274ab38bf816f10dd47635bf6b56a6a037efb1424d1b5dae/spark_emr-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "43904396dfaf247d054903b53f134805", "sha256": "9dc61d3f3b25b311d3e86e294062fd5145a6e4274f8668ef8c3c891de798a208" }, "downloads": -1, "filename": "spark_emr-0.1.2.tar.gz", "has_sig": false, "md5_digest": "43904396dfaf247d054903b53f134805", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13360, "upload_time": "2019-03-10T10:38:44", "url": "https://files.pythonhosted.org/packages/08/a8/795758983d206c610cb69de875a8a56f42f1a069b4a4ec5043f5275c3634/spark_emr-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "fcece6b7d068dd960f353e3a27bdb146", "sha256": "b3c3f127ed2df309b89115f8aaef9c2c66b89dc311f8cf00d3323d403d7b8f5d" }, "downloads": -1, "filename": "spark_emr-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fcece6b7d068dd960f353e3a27bdb146", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 18699, "upload_time": "2019-03-10T10:38:43", "url": "https://files.pythonhosted.org/packages/99/ff/672e1bf5d3da274ab38bf816f10dd47635bf6b56a6a037efb1424d1b5dae/spark_emr-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "43904396dfaf247d054903b53f134805", "sha256": "9dc61d3f3b25b311d3e86e294062fd5145a6e4274f8668ef8c3c891de798a208" }, "downloads": -1, "filename": "spark_emr-0.1.2.tar.gz", "has_sig": false, "md5_digest": "43904396dfaf247d054903b53f134805", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13360, "upload_time": "2019-03-10T10:38:44", "url": "https://files.pythonhosted.org/packages/08/a8/795758983d206c610cb69de875a8a56f42f1a069b4a4ec5043f5275c3634/spark_emr-0.1.2.tar.gz" } ] }