{ "info": { "author": "Wu Haifeng", "author_email": "wuhaifengdhu@163.com", "bugtrack_url": null, "classifiers": [], "description": "#MLData\nMLData, is a project to clean and normalize data for machine learning process.\n\n\n## How to install\n```pip install mldata```\n\n\n\n\n## Usage Example\nUsage Example: \n```\n from mldata import Processor\n new_file_path = \"outputs/new.csv\"\n processor = Processor(\"resource/raw_dataset.csv\", target_column=\"APPROVE/NOT\", exclude_column_list=[\"id\"],\n category_list=[\"Work Class\", \"FnlWgt\", \"Education\", \"Maried Status\", \"Occupation\",\n \"Relationship\", \"Race\", \"Gender\", \"Native Country\", \"Flag\"],\n invalid_values=[\"?\", \"\", \"null\", None],\n positive_tag=1)\n processor.normalize()\n processor.save_to_file(new_file_path)\n```\n\n\n## API Description \n1, Init function\n```\nProcessor(csv_file_path, target_column, exclude_column_list=None, category_list=None, positive_tag=1,\n csv_header=0, invalid_values=None)\n\n```\nParameters: \ncsv_file_path: The origin csv file path \ntarget_column: The column name of the target \nexclude_column_list: Columns no need to normalize \ncategory_list: A column name list which are category based columns \npositive_tag: The positive tag for the target column value, default value is 1 \ninvalid_values: values in csv not valid, such as \"?\", \"\", \"null\", None \n\n2, Norm the list\n```buildoutcfg\nProcessor.normalize() \n``` \nThis function is used to do norm to the csv file.\n\n\n3, Save result to csv file. \n```buildoutcfg\nProcessor.save_to_file(new_file_name) \n``` \nThis function is used to save normalized output to csv file. \nParameters: \nnew_file_name: The new file name to save the normalized data \n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/wuhaifengdhu/MLData.git", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "MLData", "package_url": "https://pypi.org/project/MLData/", "platform": "", "project_url": "https://pypi.org/project/MLData/", "project_urls": { "Homepage": "https://github.com/wuhaifengdhu/MLData.git" }, "release_url": "https://pypi.org/project/MLData/2.0.0/", "requires_dist": [ "six (>=1.11.0)" ], "requires_python": "", "summary": "MLData is used to clean data before machine learning process!", "version": "2.0.0" }, "last_serial": 3686969, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "ef70a358fa3a06fe4d8c488c1674233e", "sha256": "1c8354e71522bb2278ca53e8a64aa9ccf36ed844c3d58198e0b52d341f06c4cc" }, "downloads": -1, "filename": "MLData-1.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "ef70a358fa3a06fe4d8c488c1674233e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 8160, "upload_time": "2018-03-20T08:51:05", "url": "https://files.pythonhosted.org/packages/c2/3e/1ee6a9a99f13720e86e611ba7e34c1240110b4f774a1667821903c9a9e4d/MLData-1.0.0-py2.py3-none-any.whl" } ], "2.0.0": [ { "comment_text": "", "digests": { "md5": "598fcf3cad0405789602cd98332c7353", "sha256": "d905200efc46be2e8d44f99156cb3eb230a10f3e56025a5996d3fa7b75201fd6" }, "downloads": -1, "filename": "MLData-2.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "598fcf3cad0405789602cd98332c7353", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 9372, "upload_time": "2018-03-20T09:13:17", "url": "https://files.pythonhosted.org/packages/b0/1f/40cd24cedcd3deb8f225569a889a3c2a84cca13d29264b336734e868ae9c/MLData-2.0.0-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "598fcf3cad0405789602cd98332c7353", "sha256": "d905200efc46be2e8d44f99156cb3eb230a10f3e56025a5996d3fa7b75201fd6" }, "downloads": -1, "filename": "MLData-2.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "598fcf3cad0405789602cd98332c7353", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 9372, "upload_time": "2018-03-20T09:13:17", "url": "https://files.pythonhosted.org/packages/b0/1f/40cd24cedcd3deb8f225569a889a3c2a84cca13d29264b336734e868ae9c/MLData-2.0.0-py2.py3-none-any.whl" } ] }