{ "info": { "author": "Karik Isichei", "author_email": "karik.isichei@digital.justice.gov.uk", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# data_linter\n\nA python package that validates datasets against a metadata schema, which is defined [here](https://github.com/moj-analytical-services/data_linter/blob/master/data_linter/data/metadata_jsonschema.json).\n\nIt performs the following checks:\n- Are the columns of the correct data types (or can they be converted without error using `pd.Series.astype` in the case of untyped data formats like `csv`)?\n- Column names:\n  - Are the columns named correctly?\n  - Are they in the same order as specified in the metadata?\n  - Are there any missing columns?\n- Where a regex `pattern` is provided in the metadata, does the actual data always fit the `pattern`?\n- Where an `enum` is provided in the metadata, does the actual data contain only values in the `enum`?\n- Where `nullable` is set to false in the metadata, are there really no nulls in the data?\n\nThe package also provides `impose_metadata_types_on_pd_df`, which allows the user to safely convert a pandas dataframe to the data types specified in the metadata. This is useful when you have an untyped data file such as a `csv` and want to ensure it conforms to the metadata.\n\n## Usage\n\nFor detailed information about how to use the package, please see the [demo repo](https://github.com/moj-analytical-services/data_linter_demo). 
This includes an interactive tutorial that you can run in your web browser.\n\nHere's a very basic example:\n\n```\nimport pandas as pd\nimport json\n\nfrom data_linter.lint import Linter\n\ndef read_json_from_path(path):\n    # Load the metadata JSON file into a Python object\n    with open(path) as f:\n        return_json = json.load(f)\n    return return_json\n\nmeta = read_json_from_path(\"tests/meta/test_meta_cols_valid.json\")\ndf = pd.read_parquet(\"tests/data/test_parquet_data_valid.parquet\")\n\n# Run every check against the metadata, then produce a markdown report\nlinter = Linter(df, meta)\nlinter.check_all()\nlinter.markdown_report()\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/moj-analytical-services/data_linter", "keywords": "", "license": "MIT", "maintainer": "Karik Isichei", "maintainer_email": "karik.isichei@digital.justice.gov.uk", "name": "data-linter", "package_url": "https://pypi.org/project/data-linter/", "platform": "", "project_url": "https://pypi.org/project/data-linter/", "project_urls": { "Homepage": "https://github.com/moj-analytical-services/data_linter", "Repository": "https://github.com/moj-analytical-services/data_linter" }, "release_url": "https://pypi.org/project/data-linter/0.1.0/", "requires_dist": [ "great-expectations (>=0.7.0,<0.8.0)", "pandas (>=0.25.0,<0.26.0)", "parameterized (>=0.7.0,<0.8.0)", "tabulate (>=0.8.0,<0.9.0)", "pyarrow (>=0.14.0,<0.15.0)", "jsonschema (>=3.0.0,<3.1.0)" ], "requires_python": ">=3.6,<4.0", "summary": "A python package that validates datasets against a metadata schema", "version": "0.1.0" }, "last_serial": 5737371, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "1f29ac40c43eac4b8dd74ffaec860e65", "sha256": "ea4920005c62f89a3daaa4c3357dfe64e0373ec321c8f14b39257fdf0a0d2139" }, "downloads": -1, "filename": "data_linter-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "1f29ac40c43eac4b8dd74ffaec860e65", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": 
">=3.6,<4.0", "size": 13136, "upload_time": "2019-08-27T15:04:01", "url": "https://files.pythonhosted.org/packages/49/a5/a0393051dfb52b00fd3df67717788ae0a4892a285270b4c56f39a986cbca/data_linter-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3c6f620617f15365d59c9a6a71eccac3", "sha256": "9a4ea28e01c8459189051a778c9779fed85256f44c318f429f634a93d01110fb" }, "downloads": -1, "filename": "data_linter-0.1.0.tar.gz", "has_sig": false, "md5_digest": "3c6f620617f15365d59c9a6a71eccac3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 10650, "upload_time": "2019-08-27T15:04:04", "url": "https://files.pythonhosted.org/packages/c3/1e/5a0a2c964d2fa07b7a7d855c406eef9b36a03fefc85d01ff3b880b03a725/data_linter-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1f29ac40c43eac4b8dd74ffaec860e65", "sha256": "ea4920005c62f89a3daaa4c3357dfe64e0373ec321c8f14b39257fdf0a0d2139" }, "downloads": -1, "filename": "data_linter-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "1f29ac40c43eac4b8dd74ffaec860e65", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6,<4.0", "size": 13136, "upload_time": "2019-08-27T15:04:01", "url": "https://files.pythonhosted.org/packages/49/a5/a0393051dfb52b00fd3df67717788ae0a4892a285270b4c56f39a986cbca/data_linter-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3c6f620617f15365d59c9a6a71eccac3", "sha256": "9a4ea28e01c8459189051a778c9779fed85256f44c318f429f634a93d01110fb" }, "downloads": -1, "filename": "data_linter-0.1.0.tar.gz", "has_sig": false, "md5_digest": "3c6f620617f15365d59c9a6a71eccac3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6,<4.0", "size": 10650, "upload_time": "2019-08-27T15:04:04", "url": "https://files.pythonhosted.org/packages/c3/1e/5a0a2c964d2fa07b7a7d855c406eef9b36a03fefc85d01ff3b880b03a725/data_linter-0.1.0.tar.gz" } ] }