{ "info": { "author": "Oliver Chen", "author_email": "oliverxchen@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5" ], "description": ".. image:: https://travis-ci.org/DataKind-SG/test-driven-data-cleaning.svg?branch=master\n :target: https://travis-ci.org/DataKind-SG/test-driven-data-cleaning\n :alt: Build Status\n\n=========================\nTest Driven Data Cleaning\n=========================\n\nThis package provides a framework for collaborative, test-driven data cleaning. The framework enables a reproducible method for data cleaning that can be easily validated.\n \nFor a given tabular data set, a Trello board is populated with cards for each column so that team members can tag themselves to a column and ensure that work does not overlap. The cards include summary statistics of the columns that can be useful for writing methods to clean the column. Method stubs and test stubs are also scaffolded out for team members to fill out.\n\n======\nUsage:\n======\n\nThis works on Linux with Python 2.7, 3.3, 3.4 and 3.5, and on OSX with Python 2.7 and 3.5 (and probably 3.3 and 3.4, but those haven't been tested). \nIt works on Windows (tested using Python 3.5.2 :: Anaconda 4.1.1 (64-bit)). 
\nIntegration with Trello on Windows using tddc has not been tested yet.\n\nInstall the package with:\n\n``$ pip install tddc``\n\nYou can download a tiny example CSV file at: https://github.com/DataKind-SG/test-driven-data-cleaning/raw/master/input/foobar_data.csv\n\nIn the same directory as the file, run:\n\n``$ tddc summarize foobar_data.csv``\n\nThis takes the CSV data set and summarizes it, writing the summary to a JSON file in a newly created output/ directory.\n\nNext, you can run:\n\n``$ tddc build_trello foobar_data.csv``\n\nThe first time you run this, it will fail and give you instructions on how to create a Trello configuration file in your root directory (in future, this should probably be created through the CLI).\nOnce you create it, you can try to run that step again. This will create a Trello board. The one my run created is here: https://trello.com/b/cqP9VZal/data-cleaning-board-for-foobar-data \n\nFinally, you can run:\n\n``$ tddc build foobar_data.csv``\n\nThis outputs a script into the output/ folder that contains method stubs and glue code to clean the data set. It also outputs stubs for tests in output/.\n\nContributing:\n=============\n\nBefore running the tests, you'll need to run:\n\n``$ pip install pytest pytest-cov mock``\n\nThen, in the root of the project directory you can run the tests with:\n\n``$ py.test``\n\nWe're trying out the new GitHub Projects feature. The project we're currently working on is https://github.com/DataKind-SG/test-driven-data-cleaning/projects/1\n\nEach card is an issue that you can click through to. If you'd like to take a card (thank you!), move the card to the \"In progress\" column and assign yourself to the issue. Once you're finished, open a pull request and move the card to \"For review\". 
\n\nIf you think of a new issue, create the card in the appropriate project and convert the card to an issue in the pull-down menu (it's currently not possible to link to an already created issue from a card).\n", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/DataKind-SG/test-driven-data-cleaning", "keywords": "data cleaning collaborative", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "tddc", "package_url": "https://pypi.org/project/tddc/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/tddc/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/DataKind-SG/test-driven-data-cleaning" }, "release_url": "https://pypi.org/project/tddc/0.1.1/", "requires_dist": null, "requires_python": null, "summary": "Scaffold out methods and tests for collaborative data cleaning.", "version": "0.1.1" }, "last_serial": 2347714, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "0f92e9875a4ecbdc7922da7413185d63", "sha256": "55e96dff5d80a3d8804fd657c065829138b33321c6324b3a1b20dd3d855320a3" }, "downloads": -1, "filename": "tddc-0.1.0.tar.gz", "has_sig": false, "md5_digest": "0f92e9875a4ecbdc7922da7413185d63", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8549, "upload_time": "2016-09-14T16:51:45", "url": "https://files.pythonhosted.org/packages/9b/5d/a1bd2ec11a0cb691bc84a08765747946a63b84428a3cf79785c4430fcc26/tddc-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "5781ffdf301c808505c04e50aae2c5c3", "sha256": "df77c805f05f14092492f3843382610244e59b0fe9d9829b5e8bf9b931ffd56b" }, "downloads": -1, "filename": "tddc-0.1.1.tar.gz", "has_sig": false, "md5_digest": "5781ffdf301c808505c04e50aae2c5c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8311, "upload_time": 
"2016-09-17T10:23:10", "url": "https://files.pythonhosted.org/packages/4a/b0/fb06183e56af7f5b39985b9c85870fc8a553fc27fc9b8f184b10fe360e32/tddc-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5781ffdf301c808505c04e50aae2c5c3", "sha256": "df77c805f05f14092492f3843382610244e59b0fe9d9829b5e8bf9b931ffd56b" }, "downloads": -1, "filename": "tddc-0.1.1.tar.gz", "has_sig": false, "md5_digest": "5781ffdf301c808505c04e50aae2c5c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8311, "upload_time": "2016-09-17T10:23:10", "url": "https://files.pythonhosted.org/packages/4a/b0/fb06183e56af7f5b39985b9c85870fc8a553fc27fc9b8f184b10fe360e32/tddc-0.1.1.tar.gz" } ] }