{ "info": { "author": "datawig-dev", "author_email": "datawig-dev@amazon.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3 :: Only" ], "description": "DataWig - Imputation for Tables\n================================\n\n[![PyPI version](https://badge.fury.io/py/datawig.svg)](https://badge.fury.io/py/datawig.svg)\n[![GitHub license](https://img.shields.io/github/license/awslabs/datawig.svg)](https://github.com/awslabs/datawig/blob/master/LICENSE)\n[![GitHub issues](https://img.shields.io/github/issues/awslabs/datawig.svg)](https://github.com/awslabs/datawig/issues)\n[![Build Status](https://travis-ci.org/awslabs/datawig.svg?branch=master)](https://travis-ci.org/awslabs/datawig)\n\nDataWig learns Machine Learning models to impute missing values in tables.\n\nSee our user-guide and extended documentation [here](https://datawig.readthedocs.io/en/latest).\n\n## Installation\n\n### CPU\n```bash\npip3 install datawig\n```\n\n### GPU\nIf you want to run DataWig on a GPU you need to make sure your version of Apache MXNet Incubating contains the GPU bindings.\nDepending on your version of CUDA, you can do this by running the following:\n\n```bash\nwget https://raw.githubusercontent.com/awslabs/datawig/master/requirements/requirements.gpu-cu${CUDA_VERSION}.txt\npip install datawig --no-deps -r requirements.gpu-cu${CUDA_VERSION}.txt\nrm requirements.gpu-cu${CUDA_VERSION}.txt\n```\nwhere `${CUDA_VERSION}` can be `75` (7.5), `80` (8.0), `90` (9.0), or `91` (9.1).\n\n## Running DataWig\nThe DataWig API expects your data as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). Here is an example of how the dataframe might look:\n\n|Product Type | Description | Size | Color |\n|-------------|-----------------------|------|-------|\n| Shoe | Ideal for Running | 12UK | Black |\n| SDCards | Best SDCard ever ... | 8GB | Blue |\n| Dress | This **yellow** dress | M | **?** |\n\nFor most use cases, the `SimpleImputer` class is the best starting point. DataWig expects you to provide the column name of the column you would like to impute values for (called `output_column` below) and some column names that contain values that you deem useful for imputation (called `input_columns` below).\n\n### Imputation of categorical columns\n\n```python\nimport datawig\n\ndf = datawig.utils.generate_df_string(num_samples=200, data_column_name='sentences', label_column_name='label')\ndf_train, df_test = datawig.utils.random_split(df)\n\n#Initialize a SimpleImputer model\nimputer = datawig.SimpleImputer(\n input_columns=['sentences'], # column(s) containing information about the column we want to impute\n output_column='label', # the column we'd like to impute values for\n output_path = 'imputer_model' # stores model data and metrics\n )\n\n#Fit an imputer model on the train data\nimputer.fit(train_df=df_train)\n\n#Impute missing values and return original dataframe with predictions\nimputed = imputer.predict(df_test)\n```\n\n### Imputation of numerical columns\n\n```python\nimport datawig\n\ndf = datawig.utils.generate_df_numeric(num_samples=200, data_column_name='x', label_column_name='y') \ndf_train, df_test = datawig.utils.random_split(df)\n\n#Initialize a SimpleImputer model\nimputer = datawig.SimpleImputer(\n input_columns=['x'], # column(s) containing information about the column we want to impute\n output_column='y', # the column we'd like to impute values for\n output_path = 'imputer_model' # stores model data and metrics\n )\n\n#Fit an imputer model on the train data\nimputer.fit(train_df=df_train, num_epochs=50)\n\n#Impute missing values and return original dataframe with predictions\nimputed = imputer.predict(df_test)\n\n```\n\nIn order to have more control over the types of models and preprocessings, the `Imputer` class allows directly specifying all relevant model features and parameters. \n\nFor details on usage, refer to the provided [examples](./examples).\n\n### Acknowledgments\nThanks to [David Greenberg](https://github.com/dgreenberg) for the package name.\n\n### Building documentation\n\n```bash\ngit clone git@github.com:awslabs/datawig.git\ncd datawig/docs\nmake html\nopen _build/html/index.html\n```\n\n\n### Executing Tests\n\nClone the repository from git and set up virtualenv in the root dir of the package:\n\n```\npython3 -m venv venv\n```\n\nInstall the package from local sources:\n\n```\n./venv/bin/pip install -e .\n```\n\nRun tests:\n\n```\n./venv/bin/pip install -r requirements/requirements.dev.txt\n./venv/bin/python -m pytest\n```\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/awslabs/datawig", "keywords": "", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "datawig-dev@amazon.com", "name": "datawig", "package_url": "https://pypi.org/project/datawig/", "platform": "", "project_url": "https://pypi.org/project/datawig/", "project_urls": { "Homepage": "https://github.com/awslabs/datawig" }, "release_url": "https://pypi.org/project/datawig/0.1.10/", "requires_dist": [ "numpy (>=1.15.0)", "scikit-learn[alldeps] (>=0.20.0)", "typing (>=3.6.6)", "pandas (>=0.22.0)", "mxnet (>=1.3.0)" ], "requires_python": ">=3", "summary": "Imputation for tables with missing values", "version": "0.1.10" }, "last_serial": 5252278, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "63cca414c3d01c1f428af287489d8d8b", "sha256": "bfe3fb2e4570240c5f4015811430c2e3a9ca943ad8cf334ab95406cb56437212" }, "downloads": -1, "filename": "datawig-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "63cca414c3d01c1f428af287489d8d8b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 36645, "upload_time": "2018-08-27T12:02:22", "url": "https://files.pythonhosted.org/packages/f2/44/340852d109ee6fce0aabd2ce5fac93b1d42f7acf96cb12b0c04bb014d86d/datawig-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a5b3c4e2250d3dbbf6137f7314bb07ed", "sha256": "586d320feed01e1be8ad464c70d88df28bda7b67aa5e227e2c662e6591b8402d" }, "downloads": -1, "filename": "datawig-0.0.1.tar.gz", "has_sig": false, "md5_digest": "a5b3c4e2250d3dbbf6137f7314bb07ed", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 38844, "upload_time": "2018-08-27T12:02:24", "url": "https://files.pythonhosted.org/packages/b3/49/b07a89cf1f074317b72f7673b8a0e81d44c22e2f00f3125ff896af6ecbd4/datawig-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "9ce34bfc65ce6ffc0ae53c7ff6d8f199", "sha256": "8d33557a120badb3a93559ad7fedeb8688cbe0cca3fb55c995572b818a020218" }, "downloads": -1, "filename": "datawig-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "9ce34bfc65ce6ffc0ae53c7ff6d8f199", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 36054, "upload_time": "2018-08-30T13:24:15", "url": "https://files.pythonhosted.org/packages/a4/69/4ee75149dba146619349933301aa478e3c5a62e40ab5b3b47e5a16c9542a/datawig-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8c60d4ff71a39705d9b8f0090b7a0447", "sha256": "8ad35897f66cf1d80fd9a32cd03c26a85368f363ceffa8329224a6ddb9d9a3d9" }, "downloads": -1, "filename": "datawig-0.0.2.tar.gz", "has_sig": false, "md5_digest": "8c60d4ff71a39705d9b8f0090b7a0447", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 38263, "upload_time": "2018-08-30T13:24:17", "url": "https://files.pythonhosted.org/packages/42/49/e554a0ddd766bfd830bcd3c06803ddb2557f972bd0689a9d8ae14780d71e/datawig-0.0.2.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "5073fd5d66b6539b5cc7da579dee1b81", "sha256": "8aa8ff59b0a9bfcb53c2e18a6edbf6e80535cb84511af2940853f7f3f3861604" }, "downloads": -1, "filename": "datawig-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5073fd5d66b6539b5cc7da579dee1b81", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 39834, "upload_time": "2018-09-11T07:28:07", "url": "https://files.pythonhosted.org/packages/6c/f0/1f40e423ba9ba18458786173e96122aa33571c147c1932c8087200e45766/datawig-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "89b8d1a3375bb7bc0031e0fdade6d9a8", "sha256": "c411d4625f7e32abcee9bca39774fb5a4c24d24f00ea2bcee47eabdee1ad739b" }, "downloads": -1, "filename": "datawig-0.1.0.tar.gz", "has_sig": false, "md5_digest": "89b8d1a3375bb7bc0031e0fdade6d9a8", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 42949, "upload_time": "2018-09-11T07:28:08", "url": "https://files.pythonhosted.org/packages/5a/10/0609ef0ffa7e75dd80661a6ead89ded35854d7d80652f4b40295fe56f277/datawig-0.1.0.tar.gz" } ], "0.1.10": [ { "comment_text": "", "digests": { "md5": "4117b7391220c83466bff7296a4d80c8", "sha256": "0f033a5456e0dfae913e38ece000375fdaac141a9f51a79f7af41a07f04cd5dd" }, "downloads": -1, "filename": "datawig-0.1.10-py3-none-any.whl", "has_sig": false, "md5_digest": "4117b7391220c83466bff7296a4d80c8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 76534, "upload_time": "2019-05-10T13:44:24", "url": "https://files.pythonhosted.org/packages/0a/bc/5d334e3d7a59170625a302dadb0d31f275dfcc274add2762ee886913cb2d/datawig-0.1.10-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e8e5743653e505357388fcfd4f13a14b", "sha256": "62e4b7a25f22aa5440ddd6ec561597cf48711a706bcb76485e10c2c762285021" }, "downloads": -1, "filename": "datawig-0.1.10.tar.gz", "has_sig": false, "md5_digest": "e8e5743653e505357388fcfd4f13a14b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 61728, "upload_time": "2019-05-10T13:44:26", "url": "https://files.pythonhosted.org/packages/55/73/9f5c823cdf7cea49a8f4406f1c473e81ea2bacad3cf295141914ded802b3/datawig-0.1.10.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "05d43d1ad3c7526355d8b61af31225e1", "sha256": "79bbeb75034612903c7c341ff916efabf5758620660d7d7a169b20bfe7817e13" }, "downloads": -1, "filename": "datawig-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "05d43d1ad3c7526355d8b61af31225e1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 61821, "upload_time": "2018-10-02T08:21:02", "url": "https://files.pythonhosted.org/packages/18/d9/2c6623070d89f8706168e98b7f6551673f275d0359e8d761263f5f232757/datawig-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3f85a5716f9b2679d18b3c46dd210c0f", "sha256": "8b6c19de3bdee16a5968217695e590afc3693740f368bdd55ecc553a2bfbf2ec" }, "downloads": -1, "filename": "datawig-0.1.2.tar.gz", "has_sig": false, "md5_digest": "3f85a5716f9b2679d18b3c46dd210c0f", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 46484, "upload_time": "2018-10-02T08:21:04", "url": "https://files.pythonhosted.org/packages/c4/d7/f7f10842aead7553a90e8565788fe826d0b76d0d56006fbcec6eb01e2e01/datawig-0.1.2.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "2baef36e484b1da74d161539c8065478", "sha256": "89d1b5b2d1ca31548c2a5dfad29d06ee863550dc0bf46a55b5be50c1d2c2e446" }, "downloads": -1, "filename": "datawig-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "2baef36e484b1da74d161539c8065478", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 61821, "upload_time": "2018-10-02T08:23:56", "url": "https://files.pythonhosted.org/packages/76/c6/2c1694661aa6cccc496bda23f868fbc479dd1543f8cfdd7aadc5675f2d5b/datawig-0.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "949c4467570f9eeb52e8fd79fecedf69", "sha256": "4099c80597b50f189bf968ee370654ca506f9eef69c1675ceac78cadc007e8e7" }, "downloads": -1, "filename": "datawig-0.1.4.tar.gz", "has_sig": false, "md5_digest": "949c4467570f9eeb52e8fd79fecedf69", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 46493, "upload_time": "2018-10-02T08:23:57", "url": "https://files.pythonhosted.org/packages/e4/2f/a4bccca4c8933464aeb7c0e85d63bfd824ae7af12197426a7acae20da416/datawig-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "d11c8d38d6f01b0939d269a89e360a55", "sha256": "4d330c1ac9b81acec11d26fed6e846b427b820264673e5b977649101da68d537" }, "downloads": -1, "filename": "datawig-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "d11c8d38d6f01b0939d269a89e360a55", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 56849, "upload_time": "2018-10-11T09:15:23", "url": "https://files.pythonhosted.org/packages/64/4e/b3af8f7438f99ba64ba39065d7f51dfeaf23a10dc8439f0bca96c95adeba/datawig-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c36636237717dadf20bcec9f9fd494d0", "sha256": "a9adc93919e4c8c6cda2fc31665b64061b9ffb87f825b0ff4ce2c946ae6856c4" }, "downloads": -1, "filename": "datawig-0.1.5.tar.gz", "has_sig": false, "md5_digest": "c36636237717dadf20bcec9f9fd494d0", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 43329, "upload_time": "2018-10-11T09:15:24", "url": "https://files.pythonhosted.org/packages/28/d2/780137b398da3798d4a53083e4d6e2dc0ff9652bf0db964e5993010c3f64/datawig-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "1ec415250869bbf588f879f381451e23", "sha256": "fabe002189c4eff8dc3a4cbd37500b24351ce6248d8029ac9e28d79a0227b69a" }, "downloads": -1, "filename": "datawig-0.1.6-py3-none-any.whl", "has_sig": false, "md5_digest": "1ec415250869bbf588f879f381451e23", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 71752, "upload_time": "2018-11-20T14:52:23", "url": "https://files.pythonhosted.org/packages/cf/15/40e01aca1d8f1589d6428d3daaaa3148c09eaee493d9b537a525a9288b0e/datawig-0.1.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "06517ba1062c6bf4ea5569d714f443b7", "sha256": "ee1efbf09d22baccc2ac1b7d9abf53697823e777dcdf738e5c779b6acb78faaf" }, "downloads": -1, "filename": "datawig-0.1.6.tar.gz", "has_sig": false, "md5_digest": "06517ba1062c6bf4ea5569d714f443b7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 55913, "upload_time": "2018-11-20T14:52:25", "url": "https://files.pythonhosted.org/packages/1d/2b/805ac1adcb00c1beba494613a69edbf6e680ef0df9a9d3cb2ddcd4e07443/datawig-0.1.6.tar.gz" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "c68013d3f93019dadcc7d8a7a21195b6", "sha256": "72612b9df0426cc6f96cfa3736b0a66ae3540c5061c02cd82d6be4bd95674af3" }, "downloads": -1, "filename": "datawig-0.1.7-py3-none-any.whl", "has_sig": false, "md5_digest": "c68013d3f93019dadcc7d8a7a21195b6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 71903, "upload_time": "2018-11-20T19:55:38", "url": "https://files.pythonhosted.org/packages/31/cb/649c7f95819ec0e2d5681d0d023b5e89c97a84b734a1ec14314de92db178/datawig-0.1.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "df0e53650dd09be3a9a8bd3a3b9fe1e5", "sha256": "afa43aaef7678ddfdfcff19afa6b1c56f595cdc7aa368aeeeaae8cdbc4ba2dbc" }, "downloads": -1, "filename": "datawig-0.1.7.tar.gz", "has_sig": false, "md5_digest": "df0e53650dd09be3a9a8bd3a3b9fe1e5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 56989, "upload_time": "2018-11-20T19:55:40", "url": "https://files.pythonhosted.org/packages/b1/80/1cf2180c69d1662971e6bfbfd4835c55cabaa67213ad723a510b4cf0b115/datawig-0.1.7.tar.gz" } ], "0.1.8": [ { "comment_text": "", "digests": { "md5": "7fc8f5ffb935b21f3a9b26d33e7446c8", "sha256": "ad57cc3dba5aa85413d4b78eebfab8ea5f4319ec1a147df10b73d7bf7d10014c" }, "downloads": -1, "filename": "datawig-0.1.8-py3-none-any.whl", "has_sig": false, "md5_digest": "7fc8f5ffb935b21f3a9b26d33e7446c8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 73189, "upload_time": "2019-01-30T19:31:43", "url": "https://files.pythonhosted.org/packages/e1/0e/24ae9b2e655fcd88e28c3d41872040eedb41f3a16448358476dd9e4ce740/datawig-0.1.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7ec4b7d86251f778a1d60d9725d4d644", "sha256": "d31bdbffddc8da0e5f2548de445dc4b7202955a40583b76a7393a13fb143e506" }, "downloads": -1, "filename": "datawig-0.1.8.tar.gz", "has_sig": false, "md5_digest": "7ec4b7d86251f778a1d60d9725d4d644", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 57654, "upload_time": "2019-01-30T19:31:45", "url": "https://files.pythonhosted.org/packages/77/ac/2d6f41a79453e3733f60e7f5a7c702d722abdf4a6a0187e3ce6c41179171/datawig-0.1.8.tar.gz" } ], "0.1.9": [ { "comment_text": "", "digests": { "md5": "46ddff44d8f6a2940d12744afcd887c2", "sha256": "15e973e5680ea234b93e55c7e9477fc81fc83038863a54f766d485167e53c22b" }, "downloads": -1, "filename": "datawig-0.1.9-py3-none-any.whl", "has_sig": false, "md5_digest": "46ddff44d8f6a2940d12744afcd887c2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 73740, "upload_time": "2019-03-06T10:14:33", "url": "https://files.pythonhosted.org/packages/7a/00/88681c3b97c473255fba3c2639823f3e5ddf2522eb5af53f026b043fa343/datawig-0.1.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9637cc3dc5b06e52c34950ae23e4944b", "sha256": "203026138a883918c9521c9313fbc60ecfb1fe5ad6d136e431d1330173562c5f" }, "downloads": -1, "filename": "datawig-0.1.9.tar.gz", "has_sig": false, "md5_digest": "9637cc3dc5b06e52c34950ae23e4944b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 58239, "upload_time": "2019-03-06T10:14:34", "url": "https://files.pythonhosted.org/packages/bb/4c/639e11d222b74a20b156fc46789894cf7d9f99830da06e84d5fb431bab37/datawig-0.1.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4117b7391220c83466bff7296a4d80c8", "sha256": "0f033a5456e0dfae913e38ece000375fdaac141a9f51a79f7af41a07f04cd5dd" }, "downloads": -1, "filename": "datawig-0.1.10-py3-none-any.whl", "has_sig": false, "md5_digest": "4117b7391220c83466bff7296a4d80c8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 76534, "upload_time": "2019-05-10T13:44:24", "url": "https://files.pythonhosted.org/packages/0a/bc/5d334e3d7a59170625a302dadb0d31f275dfcc274add2762ee886913cb2d/datawig-0.1.10-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e8e5743653e505357388fcfd4f13a14b", "sha256": "62e4b7a25f22aa5440ddd6ec561597cf48711a706bcb76485e10c2c762285021" }, "downloads": -1, "filename": "datawig-0.1.10.tar.gz", "has_sig": false, "md5_digest": "e8e5743653e505357388fcfd4f13a14b", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 61728, "upload_time": "2019-05-10T13:44:26", "url": "https://files.pythonhosted.org/packages/55/73/9f5c823cdf7cea49a8f4406f1c473e81ea2bacad3cf295141914ded802b3/datawig-0.1.10.tar.gz" } ] }