{ "info": { "author": "Swarvanu Sengupta (s8sg)", "author_email": "swarvanusg@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Software Development :: Build Tools" ], "description": "Spark Py Submit\n===============\n\nA python library that can submit spark job to spark yarn cluster using\nrest API\n\n| **Note: It Currently supports the CDH(5.6.1) and\n HDP(2.3.2.0-2950,2.4.0.0-169)**\n| The Library is Inspired from:\n ``github.com/bernhard-42/spark-yarn-rest-api``\n\nGetting Started:\n~~~~~~~~~~~~~~~~\n\nUse the library\n^^^^^^^^^^^^^^^\n\n.. code:: python\n\n # Import the SparkJobHandler\n from spark_job_handler import SparkJobHandler\n\n ...\n\n logger = logging.getLogger('TestLocalJobSubmit')\n # Create a spark JOB\n # jobName: name of the Spark Job \n # jar: location of the Jar (local/hdfs) \n # run_class: entry class of the appliaction \n # hadoop_rm: hadoop resource manager host ip \n # hadoop_web_hdfs: hadoop web hdfs ip \n # hadoop_nn: hadoop name node ip (Normally same as of web_hdfs) \n # env_type: env type is CDH or HDP \n # local_jar: flag to define if a jar is local (Local jar gets uploaded to hdfs) \n # spark_properties: custom properties that need to be set \n sparkJob = SparkJobHandler(logger=logger, job_name=\"test_local_job_submit\", \n jar=\"./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar\",\n run_class=\"IrisApp\", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',\n env_type=\"CDH\", local_jar=True, spark_properties=None)\n trackingUrl = sparkJob.run()\n print \"Job Tracking URL: %s\" % trackingUrl\n\n| The above code starts an spark application using the local jar\n (simple-project/target/scala-2.10/simple-project\\_2.10-1.0.jar)\n| For more example see the\n `test\\_spark\\_job\\_handler.py `__\n\nBuild the simple-project\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: bash\n\n $ cd simple-project\n $ sbt package;cd ..\n\nThe above steps will create the target jar as:\n``./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar``\n\nUpdate the nodes Ip in test:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n| Add the node IP for hadoop resource manager and Name node in the\n test\\_cases:\n| \\* rm: Resource Manager \\* nn: Name Node\n\nload the data and make it available to HDFS:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: bash\n\n $ wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\n\nupload data to the HDFS:\n\n.. code:: bash\n\n $ python upload_to_hdfs.py iris.data /tmp/iris.data\n\nRun the test cases:\n^^^^^^^^^^^^^^^^^^^\n\nMake the simple-project jar available in HDFS to test remote jar:\n\n.. code:: bash\n\n $ python upload_to_hdfs.py simple-project/target/scala-2.10/simple-project_2.10-1.0.jar /tmp/test_data/simple-project_2.10-1.0.jar\n\nRun the test:\n\n.. code:: bash\n\n $ python test_spark_job_handler.py \n\nUtility:\n~~~~~~~~\n\n- upload\\_to\\_hdfs.py: upload local file to hdfs file system\n\nNotes:\n~~~~~~\n\n| The Library is still in early stage and need testing, bug-fixing and\n documentation\n| Before running, follow the below steps:\n| \\* Update the ResourceManager,NameNode and WebHDFS Port if required in\n settings.py\n| \\* Make the spark-jar available in hdfs as:\n ``hdfs:/user/spark/share/lib/spark-assembly.jar``\n| For Contribution Please Create Issue corresponding PR\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/s8sg/spark-py-submit", "keywords": "spark yarn submit bigdata hadoop", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "spark-yarn-submit", "package_url": "https://pypi.org/project/spark-yarn-submit/", "platform": "", "project_url": "https://pypi.org/project/spark-yarn-submit/", "project_urls": { "Homepage": "https://github.com/s8sg/spark-py-submit" }, "release_url": "https://pypi.org/project/spark-yarn-submit/1.0.0/", "requires_dist": null, "requires_python": "", "summary": "library to handle spark job submit in a yarn cluster in different environment", "version": "1.0.0" }, "last_serial": 2630297, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "fa51e6c96cfa71c5657a3a521438cd8c", "sha256": "b757b2d7b3a47997dc803609eb40df3ae93026618c83febdf3e66ada3bd15fcd" }, "downloads": -1, "filename": "spark_yarn_submit-1.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fa51e6c96cfa71c5657a3a521438cd8c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12252, "upload_time": "2017-02-09T09:41:34", "url": "https://files.pythonhosted.org/packages/d2/5d/f8b9747498ebbcf36c82e6d17f0c8e3964e2a5eb238588c077360e54c8f9/spark_yarn_submit-1.0.0-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "fa51e6c96cfa71c5657a3a521438cd8c", "sha256": "b757b2d7b3a47997dc803609eb40df3ae93026618c83febdf3e66ada3bd15fcd" }, "downloads": -1, "filename": "spark_yarn_submit-1.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "fa51e6c96cfa71c5657a3a521438cd8c", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12252, "upload_time": "2017-02-09T09:41:34", "url": "https://files.pythonhosted.org/packages/d2/5d/f8b9747498ebbcf36c82e6d17f0c8e3964e2a5eb238588c077360e54c8f9/spark_yarn_submit-1.0.0-py2.py3-none-any.whl" } ] }