{ "info": { "author": "Harsh Nisar and Deshana Desai", "author_email": "nisar.harsh@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Utilities" ], "description": "Badfish - A missing data wrangling library in Python\r\n====================================================\r\n\r\nBadfish introduces MissFrame, a wrapper over ``pandas`` ``DataFrame``,\r\nto wrangle through and investigate missing data. It opens an easy to use\r\nAPI to summarize and explore patterns in missingness.\r\n\r\nBadfish provides methods which make it easy to investigate any\r\nsystematic issues in data wrangling, surveys, ETL processes which can\r\nlead to missing data.\r\n\r\nThe API has been inspired by typical questions which arise when\r\nexploring missing data.\r\n\r\nBadfish uses the ``where`` and ``how`` api in most of its methods to\r\nprepare a subset of the data to work on. ``where`` : Work on a subset of\r\ndata ``where`` specified columns are missing. ``how`` : Either ``all``\r\n\\| ``any`` of the columns should be missing.\r\n\r\nEg. ``mf.counts(columns = ['Age', 'Gender'])`` would give counts of\r\nmissing values in the entire dataset.\r\n\r\nWhile, ``mf.counts(where=['Income'], columns = ['Age', 'Gender'])``\r\nwould give counts of missing values in subset of data where ``Income``\r\nis already missing.\r\n\r\nInstallation\r\n------------\r\n\r\n``pip install badfish``\r\n\r\nUsage\r\n-----\r\n\r\n::\r\n\r\n >>> import badfish as bf\r\n >>> mf = bf.MissFrame(df)\r\n\r\nExample\r\n~~~~~~~\r\n\r\nWill add an exmaple IPython notebook soon.\r\n\r\nCounts\r\n~~~~~~\r\n\r\nBasic counts of missing data per column.\r\n\r\n::\r\n\r\n >>> mf.counts(where=['gender', 'age'], how='all', columns=['Income', 'Marital Status'])\r\n\r\nPattern\r\n~~~~~~~\r\n\r\nGet counts on different combinations of columns with missing data.\r\n``True`` means missing and ``False`` means present.\r\n\r\n::\r\n\r\n >>> mf.pattern()\r\n\r\nThe same can be visualized in the form of a plot (inspired by VIM\r\npackage in R)\r\n\r\n::\r\n\r\n >>> mf.plot(kind='pattern')\r\n\r\n Example plot:\r\n\r\nNote: Both ``where`` and ``how`` can be used in this method.\r\n\r\nItemset Mining\r\n~~~~~~~~~~~~~~\r\n\r\nUse frequency item set mining to find subgroups where data goes missing\r\ntogether. Note: This uses the PyMining package.\r\n\r\n::\r\n\r\n >>> itemsets, rules = mf.frequency_item_set()\r\n\r\nCohort\r\n~~~~~~\r\n\r\nTries to find signigicant group differences between values of columns\r\nother than the ones specified in the group clause. Group made on the\r\nbasis of missing or non-missing of columns in the group clause.\r\nInternally uses ``scipy.stats.ttest_ind``.\r\n\r\nThis method works on the values in each column instead of column names.\r\n\r\nNote: Experimental method.\r\n\r\n::\r\n\r\n >>> mf.cohort(group=['gender'], columns=['Income'])\r\n\r\nLicense\r\n-------\r\n\r\nPlease see the `repository\r\nlicense `__.\r\n\r\nGenerally, we have licensed badfish to make it as widely usable as\r\npossible.\r\n\r\nCall for contribution\r\n---------------------\r\n\r\nIf you have any ideas, issues or feature requests, feel free to open an\r\nissue, send a PR or contact us.\r\n\r\nAuthors\r\n-------\r\n\r\n`Harsh Nisar `__ & `Deshana\r\nDesai `__\r\n\r\nInteresting links\r\n-----------------\r\n\r\n- https://github.com/tierneyn/ggmissing\r\n- https://github.com/tierneyn/visdat\r\n- http://www.njtierney.com/blag/rbloggers/2016/03/06/wombat-2016-wrap-up/", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/harshnisar/badfish", "keywords": "data cleaning,missing,machine learning,data analysis,imputation,data wrangling", "license": "License :: OSI Approved :: MIT License", "maintainer": null, "maintainer_email": null, "name": "badfish", "package_url": "https://pypi.org/project/badfish/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/badfish/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/harshnisar/badfish" }, "release_url": "https://pypi.org/project/badfish/0.1.2/", "requires_dist": null, "requires_python": null, "summary": "Badfish - A missing data analysis and wrangling library in Python", "version": "0.1.2" }, "last_serial": 2288243, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "cba7c45d7cc8822a2fd5dd172d40dbde", "sha256": "012ef3414721b4524e537804501845b713ab65af823129063e3d32635c974a1a" }, "downloads": -1, "filename": "badfish-0.1.0.zip", "has_sig": false, "md5_digest": "cba7c45d7cc8822a2fd5dd172d40dbde", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9100, "upload_time": "2016-08-18T09:34:37", "url": "https://files.pythonhosted.org/packages/2b/30/b70f415d617190090e4d83f7949c12e4cf301e3716a89f48e7da08a8d9e5/badfish-0.1.0.zip" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "5b29799b09c20fd98af397807c5e642a", "sha256": "f933547194f9f1ee35108d19a689b891dcde434977d87099b20852ecf6b70a83" }, "downloads": -1, "filename": "badfish-0.1.1-py2.7.egg", "has_sig": false, "md5_digest": "5b29799b09c20fd98af397807c5e642a", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 14014, "upload_time": "2016-08-18T09:43:47", "url": "https://files.pythonhosted.org/packages/8f/ed/5dee851a2b36da24273f9f2669186ea51ed9140d4666b4c7b306ed880446/badfish-0.1.1-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "e1dc9bbdd61f438cf2fafeee5fe0dc51", "sha256": "a47986e654978effc732a53eba8495527abf52aacdc42a117bac21e60e85cf67" }, "downloads": -1, "filename": "badfish-0.1.1.zip", "has_sig": false, "md5_digest": "e1dc9bbdd61f438cf2fafeee5fe0dc51", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11613, "upload_time": "2016-08-18T09:43:16", "url": "https://files.pythonhosted.org/packages/de/a3/c23ff2073d870fc95517afdcca4a6bcb0f03ddad876a489a4769112c9f72/badfish-0.1.1.zip" } ], "0.1.1a0": [ { "comment_text": "", "digests": { "md5": "86a611e0b7bb5443bb31d9991baffea2", "sha256": "294a6c41ad2f3fb66f34d3af69e8d986647cffca77cefa26969afaa18f58ebc6" }, "downloads": -1, "filename": "badfish-0.1.1a0-py3.5.egg", "has_sig": false, "md5_digest": "86a611e0b7bb5443bb31d9991baffea2", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 15473, "upload_time": "2016-08-18T09:54:44", "url": "https://files.pythonhosted.org/packages/2e/2c/394fa24675e0cfe8d1b8bdc4f8d10b76f3741abaf73206d4023b6ce870d4/badfish-0.1.1a0-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "40461edc96e08e448430532f1983886a", "sha256": "af640d9583f5b4d8e73b23ada4a483fd07efc47a26520856c530b82e20d5a908" }, "downloads": -1, "filename": "badfish-0.1.1a0.zip", "has_sig": false, "md5_digest": "40461edc96e08e448430532f1983886a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14059, "upload_time": "2016-08-18T09:54:36", "url": "https://files.pythonhosted.org/packages/48/c7/dbcaba0e5a23023ec5331f5a1e52ec092702d684a056a655baac5e906dd1/badfish-0.1.1a0.zip" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "03a2d21bb8b9b82396785ab78971715d", "sha256": "bc07d61be2c7f9ac846b78277447c774c77940bdea0ed6dc392ca3fe76a899d0" }, "downloads": -1, "filename": "badfish-0.1.2-py3.5.egg", "has_sig": false, "md5_digest": "03a2d21bb8b9b82396785ab78971715d", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 15493, "upload_time": "2016-08-18T11:44:16", "url": "https://files.pythonhosted.org/packages/d7/4a/67cf8f868cc9f393f4a7675f07fcddbc64ecd3928c8f8a9191b123a56cd2/badfish-0.1.2-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "7df8bad6d7076123a058f2f518589fa9", "sha256": "9b6997ff82738fd68c40dd2e0cbe7bd14d9cb53a1620c2e69839cbe1bb6a13a7" }, "downloads": -1, "filename": "badfish-0.1.2.zip", "has_sig": false, "md5_digest": "7df8bad6d7076123a058f2f518589fa9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14119, "upload_time": "2016-08-18T11:44:11", "url": "https://files.pythonhosted.org/packages/e5/8a/6286c7b85078ad9cbb8818ad4db0e9bf20ebdb7deb87cd4587b34d6d942d/badfish-0.1.2.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "03a2d21bb8b9b82396785ab78971715d", "sha256": "bc07d61be2c7f9ac846b78277447c774c77940bdea0ed6dc392ca3fe76a899d0" }, "downloads": -1, "filename": "badfish-0.1.2-py3.5.egg", "has_sig": false, "md5_digest": "03a2d21bb8b9b82396785ab78971715d", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 15493, "upload_time": "2016-08-18T11:44:16", "url": "https://files.pythonhosted.org/packages/d7/4a/67cf8f868cc9f393f4a7675f07fcddbc64ecd3928c8f8a9191b123a56cd2/badfish-0.1.2-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "7df8bad6d7076123a058f2f518589fa9", "sha256": "9b6997ff82738fd68c40dd2e0cbe7bd14d9cb53a1620c2e69839cbe1bb6a13a7" }, "downloads": -1, "filename": "badfish-0.1.2.zip", "has_sig": false, "md5_digest": "7df8bad6d7076123a058f2f518589fa9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14119, "upload_time": "2016-08-18T11:44:11", "url": "https://files.pythonhosted.org/packages/e5/8a/6286c7b85078ad9cbb8818ad4db0e9bf20ebdb7deb87cd4587b34d6d942d/badfish-0.1.2.zip" } ] }