{ "info": { "author": "Rakesh Varma", "author_email": "varma.rakesh@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Programming Language :: Python :: 2.7" ], "description": "Create Enterprise grade Hadoop cluster in AWS.\n===============================\n\nauthor: Rakesh Varma\n\nOverview\n--------\n\nCreate enterprise grade hadoop cluster in AWS in minutes.\n\nInstallation / Usage\n--------------------\n\nMake sure [terraform](https://www.terraform.io/intro/getting-started/install.html) is installed. It is required to run this solution.\n\nMake sure AWS credentials exists in your local `~/.aws/credentials` file. \nIf you are using an `AWS_PROFILE` called `test` then your `credentials` file should like looks this:\n\n```sh\n[test]\naws_access_key_id = SOMEAWSACCESSKEYID\naws_secret_access_key = SOMEAWSSECRETACCESSKEY\n```\n\nCreate a `config.ini` with the appropriate settings.\n\n```sh\n[default]\n\n# AWS settings\naws_region = us-east-1\naws_profile = test\nterraform_s3_bucket = hadoop-terraform-state\nssh_private_key = key.pem\nvpc_id = vpc-883883883\nvpc_subnets = [\n 'subnet-89dad652',\n 'subnet-7887z892',\n 'subnet-f300b8z8'\n ]\nhadoop_namenode_instance_type = t2.micro\nhadoop_secondarynamenode_instance_type = t2.micro\nhadoop_datanodes_instance_type = t2.micro\nhadoop_datanodes_count = 2\n\n# Hadoop settings\nhadoop_replication_factor = 2\n```\n\nOnce `config.ini` file is ready then install the libs and run. It is recommended to use a virtualenv.\n\n```\n pip install aws-hadoop\n```\nRun this in python to create a hadoop cluster.\n```\nfrom aws_hadoop.install import Install\nInstall().create()\n```\n\nFor running the source directly,\n\n```sh\npip install -r requirements.txt\n```\n```sh\nfrom aws_hadoop.install import Install\nInstall().create()\n```\n\n### Configuration Settings\n\nThis section describes each of the settings that go into the config file. Note some of the settings are optional.\n\n###### aws_region\n\nThe aws_region where your terraform state bucket and your hadoop resources get created (eg: us-east-1)\n\n##### aws_profile\n\nThe aws_profile that is used in your local `~/.aws/credentials` file.\n\n##### terraform_s3_bucket\n\nThe terraform state information will be maintained in the specified s3 bucket. Make sure the aws_profile has write access to the s3 bucket.\n\n##### ssh_key_pair\n\nFor hadoop provisioning, aws_hadoop needs to connect to hadoop nodes using SSH. The specified `ssh_key_pair` will allow the hadoop ec2's to be created with the public key.\nIf So make sure your machine has the private key in your `~/.ssh/` directory.\n\n##### vpc_id\n\nSpecifiy the vpc id your AWS region in which the terraform resources should be created.\n\n##### vpc_subnets\n\nvpc_subnets is a list item that contains one or more subnet_id's. You can specify as many subnet id's as you want. Hadoop EC2 will get created in multiple subnets.\n\n##### hadoop_namenode_instance_type (optional)\n\nSpecify the instance type of hadoop namenode. It not specified then the default instance type is `t2.micro`\n\n##### hadoop_secondarynamenode_instance_type (optional)\n\nSpecify the instance type of hadoop secondarynamenode. It not specified then the default instance type is t2.micro\n\n##### hadoop_datanodes_instance_type (optional)\n\nSpecify the instance type of hadoop datanodes. It not specified then the default instance type is t2.micro\n\n##### hadoop_datanodes_count (optional)\n\nSpecify the number of hadoop data nodes that should be created. It not specified then the default value is set to 2\n\n##### hadoop_replication_factor (optional)\n\nSpecify the replication factor of hadoop. It not specified then the default value is set to 2.\n\nThe following are ssh settings, used to ssh into the nodes.\n\n##### ssh_user (optional)\nThe ssh user, eg: ubuntu\n\n##### ssh_use_ssh_config (optional)\nSet it to True if you want to use your settings in your `~/.ssh/config`\n\n##### ssh_key_file (optional)\nThis is the key file location. SSH login is done thru a private/public key pair.\n\n##### ssh_proxy (optional)\nUse this setting if you are using a proxy ssh server (such as bastion).\n\nLogging\n------\n\nA log file `hadoop-cluster.log` is created in the local directory.\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/varmarakesh/aws-hadoop/tarball/1.0dev2", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/varmarakesh/aws-hadoop", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "aws-hadoop", "package_url": "https://pypi.org/project/aws-hadoop/", "platform": "", "project_url": "https://pypi.org/project/aws-hadoop/", "project_urls": { "Download": "https://github.com/varmarakesh/aws-hadoop/tarball/1.0dev2", "Homepage": "https://github.com/varmarakesh/aws-hadoop" }, "release_url": "https://pypi.org/project/aws-hadoop/1.0.dev2/", "requires_dist": [ "jinja2", "coverage", "pypi-publisher", "python-terraform", "fabric", "pep8" ], "requires_python": "", "summary": "Create enterprise grade Hadoop cluster in AWS in minutes.", "version": "1.0.dev2" }, "last_serial": 3451020, "releases": { "1.0.dev2": [ { "comment_text": "", "digests": { "md5": "6c91e175984ee2dcdbe5b4312ef92d86", "sha256": "23dbed504b1c4bb63e0b130162c81460d01511398fcb1467b4f8954891f98ac4" }, "downloads": -1, "filename": "aws_hadoop-1.0.dev2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6c91e175984ee2dcdbe5b4312ef92d86", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 18365, "upload_time": "2017-12-30T03:58:15", "url": "https://files.pythonhosted.org/packages/d3/8f/a2d86271843bc72450d3e4d6133e165b116a30d1c2ccdfde1c0622d90766/aws_hadoop-1.0.dev2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a942bd2a21a54541355f1f6819009898", "sha256": "7e5a36d7db768596e787b08a8dee4fb31cdd6e72457e90bc7fa2de330e88ede6" }, "downloads": -1, "filename": "aws-hadoop-1.0.dev2.tar.gz", "has_sig": false, "md5_digest": "a942bd2a21a54541355f1f6819009898", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12351, "upload_time": "2017-12-30T03:58:17", "url": "https://files.pythonhosted.org/packages/3f/d9/acefb459f0f65c6c83ab988beb2df831c0fa82ff98fe2eb85551d8d0e74a/aws-hadoop-1.0.dev2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6c91e175984ee2dcdbe5b4312ef92d86", "sha256": "23dbed504b1c4bb63e0b130162c81460d01511398fcb1467b4f8954891f98ac4" }, "downloads": -1, "filename": "aws_hadoop-1.0.dev2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6c91e175984ee2dcdbe5b4312ef92d86", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 18365, "upload_time": "2017-12-30T03:58:15", "url": "https://files.pythonhosted.org/packages/d3/8f/a2d86271843bc72450d3e4d6133e165b116a30d1c2ccdfde1c0622d90766/aws_hadoop-1.0.dev2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a942bd2a21a54541355f1f6819009898", "sha256": "7e5a36d7db768596e787b08a8dee4fb31cdd6e72457e90bc7fa2de330e88ede6" }, "downloads": -1, "filename": "aws-hadoop-1.0.dev2.tar.gz", "has_sig": false, "md5_digest": "a942bd2a21a54541355f1f6819009898", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12351, "upload_time": "2017-12-30T03:58:17", "url": "https://files.pythonhosted.org/packages/3f/d9/acefb459f0f65c6c83ab988beb2df831c0fa82ff98fe2eb85551d8d0e74a/aws-hadoop-1.0.dev2.tar.gz" } ] }