{ "info": { "author": "mullerhai", "author_email": "hai710459649@foxmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering" ], "description": "DSTL\n====\n\nhttps://github.com/mullerhai/etl_ml\n\nNote: this repo is not supported. License is MIT.\n\n\n..\n\n image:: etl_ml.jpg\n.. image:: https://github.com/mullerhai/etl_ml/blob/master/img/logo.jpeg\n\n.. contents::\n\nInstall [sorry Mircosoft Windows System cannot use it]\n------------\n\n: pip install -U etl_ml[Now Version is 0.0.1]\n\n - python Version >= 3.5\n - sasl>=0.2.1\n - thrift>=0.11.0\n - thrift-sasl>=0.3.0\n - paramiko>=2.4.1\n - selectors>=0.0.14\n\n\n\nUse in Unix System Terminal[centos macos ubuntu]\n------------\n\n: jumps\n - default param\n: parameter:\n - @click.option('-jh', '--jumphost', default=\"***\", help='Jump Gateway Server host \u8df3\u677f\u673assh \u4e3b\u673a\u540d, \u9ed8\u8ba4117.48.195.186')\n - @click.option('-jp', '--jumpport', default=2222, help='Jump Gateway Server port\u8df3\u677f\u673assh\u767b\u5f55\u7aef\u53e3\u53f7, \u9ed8\u8ba42222')\n - @click.option('-ju', '--jumpuser', default='dm', help='Jump Gateway Server login user \u8df3\u677f\u673a ssh\u767b\u5f55\u7528\u6237\u540d')\n - @click.option('-jpd', '--jumppwd', default=\"***\", help='Jump Gateway Server login user password \u8df3\u677f\u673a\u767b\u5f55\u7528\u6237\u5bc6\u7801')\n - @click.option('-th', '--tunnelhost', default='172.16.16.32', help='ssh-tunnel \u96a7\u9053 host ')\n - @click.option('-tp', '--tunnelappport', default=10000, help='ssh-tunnel Application port\u96a7\u9053 \u76ee\u6807\u7a0b\u5e8f\u7684\u7aef\u53e3\u53f7 \u9ed8\u8ba4\u4e3a hive 10000 ')\n - @click.option('-lh', '--localhost', default='127.0.0.1', help='localhost\u672c\u673a host ,\u9ed8\u8ba4127.0.0.1 ')\n - @click.option('-lp', '--localbindport', default=\"4230\", help='localbindport \u672c\u673a \u88ab\u7ed1\u5b9a\u7684\u7aef\u53e3\u53f7')\n - @click.option('-dt', '--daemonsecond', default=\"21600\", help='ssh_tunnel_daemon_session_hold_on_second six hours, ssh \u96a7\u9053 \u540e\u53f0\u7ebf\u7a0b \u4fdd\u6301\u65f6\u95f4 \u9ed8\u8ba4\u4e3a\u516d\u5c0f\u65f6')\n\n.. image:: https://github.com/mullerhai/sshjumphive/blob/master/img/runshell.jpeg\n\n\nUse in Unix System Terminal Run GUI[centos macos ubuntu]\n------------\n: jumpgui\n \u00a0 - you will see the \u00a0GUI like this\n.. image:: https://github.com/mullerhai/sshjumphive/blob/master/img/rungui.jpg\n\n\nIf you Buy the SSH_Tunnel for mac [maybe feel Expensive]\n------------\n\n.. 
\n\n.. image:: https://github.com/mullerhai/sshjumphive/blob/master/img/runshell.jpeg\n\n\nUse in a Unix system terminal to run the GUI [CentOS, macOS, Ubuntu]\n------------\n\n: jumpgui\n - you will see the GUI like this\n\n.. image:: https://github.com/mullerhai/sshjumphive/blob/master/img/rungui.jpg\n\n\nIf you buy SSH_Tunnel for Mac [it may feel expensive]\n------------\n\n.. image:: https://github.com/mullerhai/sshjumphive/blob/master/img/SSH_Tunnel_mac.jpg\n\nObject types\n------------\n\nNote that ssh_jump_hive is a tool that can hop through a jump machine, connect to Hive, and fetch Hive data into a pandas DataFrame:\n\n- 0: hive_client, for a simple connection to a Hive server with no jump server\n- 1: Jump_Tunnel, for connecting to a Hive server through a separate jump server\n- 2: SSH_Tunnel, for obtaining an SSH tunnel channel\n\n\nGeneral approach\n----------------\n\nTo use it, you need to know a few groups of parameters:\nfor the tunnel, [jumphost, jumpport, jumpuser, jumppwd, tunnelhost, tunnelAPPport, localhost, localbindport];\nfor the Hive server, [localhost, hiveusername, hivepassword, localbindport, database, auth];\nand for querying Hive data, [table, query_fileds_list, partions_param_dict, query_limit].\n\nIf your Hive server sits behind a separate jump server, do it like this:\n\n::\n\n    from ssh_jump_hive import Jump_Tunnel_HIVE\n    import pandas as pd\n\n    ## get a hive tunnel client session\n    def gethive():\n        jumphost = '117.*****.176'\n        jumpport = 2222\n        jumpuser = 'dm'\n        jumppwd = '&&&&&&'\n        tunnelhost = '172.**.16.32'\n        tunnelhiveport = 10000\n        localhost = '127.0.0.1'\n        localbindport = 4800\n        username = 'muller'\n        auth = 'LDAP'\n        password = \"abc123.\"\n        database = 'fkdb'\n        jump = Jump_Tunnel_HIVE(jumphost, jumpport, jumpuser, jumppwd, tunnelhost, tunnelhiveport,\n                                localhost, localbindport, username, password)\n        return jump\n\n    ## query some fields by table name and partition params\n    def demo1():\n        table = 'tab_client_label'\n        partions_param_dict = {'client_nmbr': 'AA75', 'batch': 'p1'}\n        query_fileds_list = ['gid', 'realname', 'card']\n        querylimit = 1000\n        jump = gethive()\n        df2 = jump.get_JumpTunnel_df(table, partions_param_dict, query_fileds_list, querylimit)\n        return df2\n\n    ## query all fields by table name and partition params\n    def demo2():\n        table = 'tab_client_label'\n        partions_param_dict = {'client_nmbr': 'AA75', 'batch': 'p1'}\n        jump = gethive()\n        df2 = jump.get_JumpTunnel_table_partitions_df(table, partions_param_dict, 1000)\n        return df2\n\n    ## use HiveQL to query data\n    def demo3():\n        jump = gethive()\n        hsql = \"select * from fkdb.tab_client_label where client_nmbr = 'AA75' and batch = 'p1' limit 500\"\n        df2 = jump.get_JumpTunnel_hsql_df(hsql)\n        return df2\n\n    ## initialize an instance and query\n    df3 = demo2()\n    print(df3.shape)\n    print(df3.columns)\n    print(df3.head(100))
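\n\nFor case 0 above (hive_client, with no jump server in between), the connection boils down to PyHive plus pandas, both of which are declared dependencies of this package. The following is a minimal sketch under that assumption; the host, credentials, and fkdb database are the same placeholder values as in the demo above, and hive_client itself may wrap this differently:\n\n::\n\n    from pyhive import hive\n    import pandas as pd\n\n    ## a minimal sketch of the no-jump case: connect straight to HiveServer2\n    ## with PyHive, then pull the result set into a pandas DataFrame.\n    ## host, credentials and database are placeholders from the demo above.\n    conn = hive.Connection(host='172.**.16.32', port=10000,\n                           username='muller', password='abc123.',\n                           database='fkdb', auth='LDAP')\n    hsql = \"select * from fkdb.tab_client_label where client_nmbr = 'AA75' and batch = 'p1' limit 500\"\n    df = pd.read_sql(hsql, conn)\n    print(df.shape)\n    conn.close()\n\nIf you go through the tunnel from the demo instead, point host and port at localhost and localbindport once the tunnel is up.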
\n\nA UNet network with batch normalization added, trained with the Adam optimizer\nand a loss that is a sum of 0.1 cross-entropy and 0.9 dice loss.\nInput for the UNet was a 116 by 116 pixel patch and output was 64 by 64 pixels,\nso there were 26 additional pixels on each side that just provided context for\nthe prediction.\nBatch size was 128 and the learning rate was set to 0.0001\n(but the loss was multiplied by the batch size).\nThe learning rate was divided by 5 on the 25th epoch\nand then again by 5 on the 50th epoch;\nmost models were trained for 70-100 epochs.\nPatches that formed a batch were selected completely randomly across all images.\nDuring one epoch, the network saw patches that covered about one half\nof the whole training set area. Best results for individual classes\nwere achieved when training on related classes, for example buildings\nand structures, roads and tracks, or the two kinds of vehicles.\n\nAugmentations included small rotations for some classes\n(\u00b110-25 degrees for houses, structures and both vehicle classes),\nand full rotations and vertical/horizontal flips\nfor other classes. A small amount of dropout (0.1) was used in some cases.\nAlignment between channels was fixed with the help of\n``cv2.findTransformECC``, and lower-resolution layers were upscaled to\nmatch the RGB size. In most cases, 12 channels were used (RGB, P, M),\nwhile in some cases just RGB and P, or all 20 channels, made results\nslightly better.\n\n\nValidation\n----------\n\nValidation was very hard, especially for both water and both vehicle\nclasses. In most cases, validation was performed on 5 images\n(6140_3_1, 6110_1_2, 6160_2_1, 6170_0_4, 6100_2_2), while the other 20 were used\nfor training. Re-training the model with the same parameters on all 25 images\nimproved the LB score.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mullerhai/etl_ml", "keywords": "etl,dirtyData,ssh tunnel,ml,machine learning", "license": "Apache 2.0", "maintainer": "muller helen", "maintainer_email": "hai710459649@foxmail.com", "name": "etl-ml", "package_url": "https://pypi.org/project/etl-ml/", "platform": "all", "project_url": "https://pypi.org/project/etl-ml/", "project_urls": { "Homepage": "https://github.com/mullerhai/etl_ml" }, "release_url": "https://pypi.org/project/etl-ml/1.0.7/", "requires_dist": [ "thrift (>=0.11.0)", "thrift-sasl (>=0.3.0)", "hdfs (>=2.1.0)", "sklearn-pandas (>=1.6.0)", "scikit-learn (>=0.19.1)", "click (>=6.7)", "scipy (>=1.1.0)", "numpy (>=1.14.3)", "pandas (>=0.20.3)", "PyHive (>=0.5.1)", "paramiko (>=2.4.1)", "selectors (>=0.0.14)", "sasl (>=0.2.1)" ], "requires_python": "", "summary": "etl_ml is a tool that can ETL dirty source data from Excel or CSV files, send data to FTP or a remote server, insert data into a Hive database, load data from Hive through a jump host for feature engineering and machine learning model training, and hop through a jump machine to connect to Hive and fetch Hive data into a pandas DataFrame", "version": "1.0.7" }, "last_serial": 4005652, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "ac533d0e6e2a0c2864e8899bb87fb5e5", "sha256": "ac7c02ebff711ff334b1a3e9cd20979ac45ce28b03381ecbbfcf221c5c21b40b" }, "downloads": -1, "filename": "etl_ml-0.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "ac533d0e6e2a0c2864e8899bb87fb5e5", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 37130, "upload_time": "2018-06-22T03:13:25", "url": "https://files.pythonhosted.org/packages/92/e2/2f68d87699c47ae5ba4980b4eab51fb319d0c458b1587a62bd7ee6e9d959/etl_ml-0.0.1-py2.py3-none-any.whl" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "0f816bc42eba3e05350c096707a32f05", "sha256": "93a5da20d4eecdc985655c1e0c0228c186e0dcb3f16b460666f262a3da96757f" }, "downloads": -1, "filename": "etl_ml-1.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0f816bc42eba3e05350c096707a32f05", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 37130, "upload_time": "2018-06-22T06:08:33", "url": "https://files.pythonhosted.org/packages/68/a4/e64fc161798b71b48d240847be5a653d76bd63a0713a5b2bcffcae87dd31/etl_ml-1.0.1-py2.py3-none-any.whl" } ], "1.0.5": [ { "comment_text": "",
"digests": { "md5": "73d81eaf41722028f8f9cf9342502ca2", "sha256": "c16c9b2cbf514e8b8cdb10ced672c04a433506de378c742ebc787a8732a79ca7" }, "downloads": -1, "filename": "etl_ml-1.0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "73d81eaf41722028f8f9cf9342502ca2", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 46561, "upload_time": "2018-06-26T05:39:50", "url": "https://files.pythonhosted.org/packages/df/64/2c59614b70226c99119ef55c94a9e3690d565a476f4493f4cbaaaa789d48/etl_ml-1.0.5-py2.py3-none-any.whl" } ], "1.0.6": [ { "comment_text": "", "digests": { "md5": "c7ab31a88dd536b42c85cfc17d3c452b", "sha256": "400571faabefd05e7c868534c82fd122c11fb7bec8b33c9fdb4dbfb6076ebc43" }, "downloads": -1, "filename": "etl_ml-1.0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c7ab31a88dd536b42c85cfc17d3c452b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 48263, "upload_time": "2018-06-26T09:13:17", "url": "https://files.pythonhosted.org/packages/0a/06/2ef1bc5e398a7821d2c883406918efd214a80092e47f23b8b07e721b1bab/etl_ml-1.0.6-py2.py3-none-any.whl" } ], "1.0.7": [ { "comment_text": "", "digests": { "md5": "6091088a8c5e8893b44a472366097243", "sha256": "e151b187428d4b835a13455cbfddae3a3ee167e73dba6d1c708ba63e4585da16" }, "downloads": -1, "filename": "etl_ml-1.0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6091088a8c5e8893b44a472366097243", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 48265, "upload_time": "2018-06-27T01:30:07", "url": "https://files.pythonhosted.org/packages/97/00/d32269429bf21903840c39c7a653b831eb930e2b1e355bd7ec31e4e25ac2/etl_ml-1.0.7-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6091088a8c5e8893b44a472366097243", "sha256": "e151b187428d4b835a13455cbfddae3a3ee167e73dba6d1c708ba63e4585da16" }, "downloads": -1, "filename": "etl_ml-1.0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6091088a8c5e8893b44a472366097243", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 48265, "upload_time": "2018-06-27T01:30:07", "url": "https://files.pythonhosted.org/packages/97/00/d32269429bf21903840c39c7a653b831eb930e2b1e355bd7ec31e4e25ac2/etl_ml-1.0.7-py2.py3-none-any.whl" } ] }