{
"info": {
"author": "Chu-Hsuan Lee",
"author_email": "joseph.chuhsuanlee@gmail.com",
"bugtrack_url": null,
"classifiers": [
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3"
],
"description": "# guacamoleETL - Document\n\nAn ETL pipeline tool that\n* pre-processes the data\n * extracts data from a .txt file ([Challenge_me.txt](Challenge_me.txt))\n * removes rows that contain invalid information\n* transforms the data according to the given specifications into a matrix (list of lists)\n* loads the data into a .csv file ([output.csv](output.csv))\n\n## Installation\nThis tool can be installed with `pip`.
\nCopy and paste this command into the terminal and run it:\n```\npip install guacamoleETL\n```\n\n## Usage\nThis ETL pipeline can be used as part of predictive model training, feeding the transformed data directly to the model:\n```py\nimport guacamoleETL\n\ndataFile = 'Challenge_me.txt'\n\n# Write the transformed data to a .csv file\nguacamoleETL.load(dataFile)\n# Get the transformed data as a matrix (list of lists)\nresult = guacamoleETL.transform(dataFile)\n# model_training stands in for your own training function\npredictive_model = model_training(result)\n```\n\n## Functions\n* __extract_data(txt_file):__\n __Extract data from a .txt file to a temporary .csv file__
\n Leading and trailing whitespace is removed during the extraction\n* __clean_up():__\n __Clean up the data with invalid information__
\n Rows with the placeholder '-' (NA) in any of the specified columns are excluded\n* __transform(path):__\n __Transform the pre-processed data according to the given specifications into a matrix__
\n `engine-location` is one-hot-encoded into two columns, `engine-location_front` and `engine-location_rear`
\n `num-of-cylinders` is transformed from a word into an integer through a pre-defined dictionary
\n `engine-size` is transformed into an integer
\n `weight` is transformed into an integer
\n `horsepower` is transformed from a string in German decimal notation into a float
\n `aspiration` is replaced by `aspiration_turbo`, in which turbo engines are marked as 1
\n `price` is converted from minor units to major units
\n `make` is not transformed but kept in the dataset\n* __load(path):__\n __Load the data from the previous transformation into a .csv file__\n\n\n## Architecture\nAll the functions are implemented in [\\_\\_init\\_\\_.py](guacamoleETL/__init__.py). This decision was made for the following reasons:\n* For the transform and load functions to be callable directly on the package after it is imported, they must be imported or defined in `__init__.py`.\n* The functions are all connected: the transform function takes the result of the pre-process step (extract and clean up), and the load function takes the result of the transform function. The flow is easier to follow when they are all in the same file.\n* This might not be the best architecture, but when starting small, simplicity is a good consideration.\n\n\n",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/chuhsuanlee/guacamoleETL",
"keywords": "",
"license": "",
"maintainer": "",
"maintainer_email": "",
"name": "guacamoleETL",
"package_url": "https://pypi.org/project/guacamoleETL/",
"platform": "",
"project_url": "https://pypi.org/project/guacamoleETL/",
"project_urls": {
"Homepage": "https://github.com/chuhsuanlee/guacamoleETL"
},
"release_url": "https://pypi.org/project/guacamoleETL/0.3.0/",
"requires_dist": null,
"requires_python": "",
"summary": "ETL package for AUTO1 challenge",
"version": "0.3.0"
},
"last_serial": 4503889,
"releases": {
"0.3.0": [
{
"comment_text": "",
"digests": {
"md5": "175c6fd4fa2ef3002f7a12d929b355ff",
"sha256": "c8cd67437a41fb259151da2682366e5c6806c5cb8316f02b3e97aaeb24a57fb3"
},
"downloads": -1,
"filename": "guacamoleETL-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "175c6fd4fa2ef3002f7a12d929b355ff",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6456,
"upload_time": "2018-11-19T17:55:16",
"url": "https://files.pythonhosted.org/packages/bc/f4/c0508079f244172f00f7353c59999e2ced8ab4e9c5b662facdcc0ae554b1/guacamoleETL-0.3.0-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "a331408b17a67742197856510acd52ee",
"sha256": "2c5ccdcfec63b51a6c6e39ff6177b3b17f297e53927e12c7fbc178319b50c47a"
},
"downloads": -1,
"filename": "guacamoleETL-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "a331408b17a67742197856510acd52ee",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3111,
"upload_time": "2018-11-19T17:55:18",
"url": "https://files.pythonhosted.org/packages/fe/f8/f32e0da8ab4a45f488d7dd1cb49b31664624e831e45c5ef552d9e9e1e0fe/guacamoleETL-0.3.0.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "175c6fd4fa2ef3002f7a12d929b355ff",
"sha256": "c8cd67437a41fb259151da2682366e5c6806c5cb8316f02b3e97aaeb24a57fb3"
},
"downloads": -1,
"filename": "guacamoleETL-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "175c6fd4fa2ef3002f7a12d929b355ff",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6456,
"upload_time": "2018-11-19T17:55:16",
"url": "https://files.pythonhosted.org/packages/bc/f4/c0508079f244172f00f7353c59999e2ced8ab4e9c5b662facdcc0ae554b1/guacamoleETL-0.3.0-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "a331408b17a67742197856510acd52ee",
"sha256": "2c5ccdcfec63b51a6c6e39ff6177b3b17f297e53927e12c7fbc178319b50c47a"
},
"downloads": -1,
"filename": "guacamoleETL-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "a331408b17a67742197856510acd52ee",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 3111,
"upload_time": "2018-11-19T17:55:18",
"url": "https://files.pythonhosted.org/packages/fe/f8/f32e0da8ab4a45f488d7dd1cb49b31664624e831e45c5ef552d9e9e1e0fe/guacamoleETL-0.3.0.tar.gz"
}
]
}