{ "info": { "author": "Sterling Paramore", "author_email": "sterling.paramore@insidetrack.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "# Data Transform Spec\n\ndtspec is an API for specifying and testing data transformations.\n\n## Introduction\n\nTesting data transformations is hard. So hard that a lot of ETL/ELT\nprocesses have little or (more often) no automated tests.\ndtspec aims to make it easier to write and run tests for very complicated\ndata transformations typically encountered in ETL/ELT.\n\nWith dtspec, we imagine a data transformation process that takes a set of\ndata **sources** and transforms them into a set of data **targets**. dtspec\nis primarily concerned with structured data sources, like Pandas\ndataframes or database tables. A user of dtspec defines data **factories** that\ngenerate source data, and a set of **expectations** that describe how the data\nshould look after it's been transformed.\n\nWhile dtspec is written in Python, it is intended to be used as more of a\nlanguage-agnostic API. A dtspec user writes a test **spec**, which is then passed\nto dtspec. dtspec processes that spec and then returns to the user test data for\nall of the sources specified in the spec. The user then feeds that test data into\ntheir data transformation system, collects the output, and sends it back to dtspec.\ndtspec compares the actual results of the data transformations with the expected\nresults specified in the spec and reports on any discrepancies.\n\n\n## Tutorial\n\nLet's see all of this at work with some examples.\n\n\n### Hello World!\n\nLet's suppose we have a dataset containing student records. 
Our data\ntransformation simply reads in that data and returns a new dataframe\nwith a \"Hello \" salutation. We want to test that it says\n\"hello\" to everyone. For the purposes of our tutorial, the data\ntransformation will be written in Pandas as:\n\n````python\ndef hello_world_transformer(raw_students):\n    salutations_df = raw_students.copy()\n    salutations_df[\"salutation\"] = salutations_df['name'].apply(lambda v: 'Hello ' + v)\n\n    return {\"salutations\": salutations_df}\n\n````\n\ndtspec is an API that accepts a JSON blob for the transformation spec. However, I strongly\nprefer to write specs in YAML and then convert them into JSON before passing them\non to dtspec. To begin writing our transform spec, we define the dtspec `version`, a `description`\nof the transform spec, and then list out the `sources` and `targets`:\n\n````yaml\n---\nversion: '0.1'\ndescription: HelloWorld - Simplest example of running dtspec\n\n# The names of sources and targets are arbitrary, but it's up to the user to determine\n# how they get mapped to/from their data transformation system.\nsources:\n  - source: raw_students\n\ntargets:\n  - target: salutations\n````\n\nThese define our inputs and outputs. But we also need to define how to generate\ndata for the input(s). For that, we define a **factory**:\n\n````yaml\nfactories:\n  - factory: SomeStudents\n    description: Minimal example of what some student records may look like\n\n    data:\n      - source: raw_students\n        # Tables written as a markdown table\n        table: |\n          | id | name |\n          | - | - |\n          | 1 | Buffy |\n          | 2 | Willow |\n````\n\nLastly, we need to describe how we expect the data to look after it has been transformed.\nTo do this, we define **scenarios** and **cases**. Scenarios are collections of cases\nthat share some common data factory or describe similar situations. 
For now, our\ntransform spec will just contain a single scenario and a single case:\n\n````yaml\nscenarios:\n  - scenario: Hello World\n    description: The simplest scenario\n    # All cases in this scenario will use this factory (which may be modified on a case-by-case basis)\n    factory:\n      parents:\n        - SomeStudents\n\n    cases:\n      - case: HelloGang\n        description: Make sure we say hello to everyone\n        expected:\n          data:\n            - target: salutations\n              # The actual output may also contain the \"name\" field, but the expectation\n              # ignores any fields not listed in the expected table.\n              table: |\n                | id | salutation |\n                | - | - |\n                | 1 | Hello Buffy |\n                | 2 | Hello Willow |\n````\n\nThat's it. See also the [full YAML spec](tests/hello_world.yml).\n\nNow that we've described the full transform spec, we need to use it. The first step is to\nparse the YAML file, send it to the dtspec API, and have dtspec generate source data:\n\n````python\nimport dtspec\nimport yaml\n\nspec = yaml.safe_load(open(\"tests/hello_world.yml\"))\napi = dtspec.api.Api(spec)\napi.generate_sources()\n````\n\nThe specific steps taken at this point are going to be sensitive to the data transformation\nenvironment being used, but we'll stick with our Pandas transformations for the sake of this\ntutorial. 
Given this, we can define a simple function that converts the source data returned\nfrom dtspec into Pandas dataframes:\n\n````python\nimport pandas as pd\n\ndef parse_sources(sources):\n    \"Converts test data returned from dtspec api into Pandas dataframes\"\n\n    return {\n        source_name: pd.DataFrame.from_records(data.serialize())\n        for source_name, data in sources.items()\n    }\n````\n\nWe can then run those test Pandas dataframes through our data transformation function:\n\n````python\nsources_data = parse_sources(api.spec[\"sources\"])\nactual_data = hello_world_transformer(**sources_data)\n````\n\nNext, we need to convert the output dataframes of the transformations, `actual_data`,\nback into a format that can be loaded into dtspec for comparison. For Pandas,\nthis function is:\n\n````python\nimport json\n\ndef serialize_actuals(actuals):\n    \"Converts Pandas dataframe results into form needed to load dtspec api actuals\"\n\n    return {\n        target_name: json.loads(dataframe.astype(str).to_json(orient=\"records\"))\n        for target_name, dataframe in actuals.items()\n    }\n````\n\nThe serialized actuals are loaded into dtspec using:\n\n````python\nserialized_actuals = serialize_actuals(actual_data)\napi.load_actuals(serialized_actuals)\n````\n\nFinally, dtspec can be called to run all of the expectations:\n\n````python\napi.assert_expectations()\n````\n\nPutting all of this together:\n\n````python\nspec = yaml.safe_load(open(\"tests/hello_world.yml\"))\napi = dtspec.api.Api(spec)\napi.generate_sources()\n\nsources_data = parse_sources(api.spec[\"sources\"])\nactual_data = hello_world_transformer(**sources_data)\nserialized_actuals = serialize_actuals(actual_data)\napi.load_actuals(serialized_actuals)\n\napi.assert_expectations()\n````\n\nTry running the above code, then change either the YAML spec or the `hello_world_transformer`\nfunction and see how dtspec responds.\n\n### Hello World With Multiple Test Cases\n\nRunning tests with multiple cases that reference the same data sources\nintroduces a complicating factor. 
One of the things that makes\nit hard to build tests for ETL/ELT is that many data\ntransformation systems in use today have high latency for even very\nsmall transformations. For example, Redshift is a distributed RDBMS\nthat can process billions of rows in minutes, millions of rows in\nseconds, thousands of rows in seconds, or 10s of rows in, well,\nseconds. Given these latency issues, we don't want to rely on\nloading data into our system, running a test, clearing out the data,\nloading some more, running the next test, and so on, as is often\ndone when testing ORM-based applications like Rails or Django.\n\ndtspec seeks to minimize the number of requests to the data\ntransformation system in order to deal with these latency issues.\nIt does this by \"stacking\" the test data generated in each case\nand delivering all of this stacked data back to the user. The user\nthen loads this stacked data into their data transformation system\n**once**, runs the data transformations **once**, and then collects\nthe resulting output **once**.\n\nLet's see this in action.\n\nFirst, let's change our hello world data transformation a bit. Instead of\njust saying hello to our heroes, let's say goodbye to any villains (as\nidentified by a `clique` data field).\n\n````python\ndef hello_world_multiple_transformer(raw_students):\n    def salutation(row):\n        if row[\"clique\"] == \"Scooby Gang\":\n            return \"Hello {}\".format(row[\"name\"])\n        return \"Goodbye {}\".format(row[\"name\"])\n\n    salutations_df = raw_students.copy()\n    salutations_df[\"salutation\"] = salutations_df.apply(salutation, axis=1)\n\n    return {\"salutations\": salutations_df}\n````\n\nWhile it would be possible to test saying hello or goodbye in a single\ncase just by adding more records to the source data, we'll split it\ninto two cases to demonstrate how multiple cases work. 
Here's how the YAML would look:\n\n````yaml\nscenarios:\n  - scenario: Hello World With Multiple Cases\n    description: The simplest scenario\n    factory:\n      parents:\n        - SomeStudents\n\n    cases:\n      - case: HelloGang\n        description: Make sure we say hello to everyone\n        expected:\n          data:\n            - target: salutations\n              table: |\n                | id | name | clique | salutation |\n                | - | - | - | - |\n                | 1 | Buffy | Scooby Gang | Hello Buffy |\n                | 2 | Willow | Scooby Gang | Hello Willow |\n\n      - case: GoodbyeVillians\n        description: Say goodbye to villians\n        # For this case, we tweak the factory defined for the scenario.\n        factory:\n          # The ids here might be the same as above. However, these are just named\n          # references and get translated into unique ids when the source data\n          # is generated.\n          data:\n            - source: raw_students\n              table: |\n                | id | name |\n                | - | - |\n                | 1 | Drusilla |\n                | 2 | Harmony |\n              # Use values to populate a constant over all records\n              values:\n                - column: clique\n                  value: Vampires\n\n        expected:\n          data:\n            # Again, the ids here are not the actual ids sent to dtspec after performing\n            # the transformations. They are just named references and dtspec\n            # keeps track of the relationship between the actual ids and the named ones.\n            - target: salutations\n              table: |\n                | id | name | clique | salutation |\n                | - | - | - | - |\n                | 1 | Drusilla | Vampires | Goodbye Drusilla |\n                | 2 | Harmony | Vampires | Goodbye Harmony |\n\n````\n\nThis won't quite work as is, because we're missing something. We have\ntwo cases that describe variations on the source data `raw_students`\nand the output `salutations`. dtspec collects the source data\ndefinitions from each case and stacks them into a single data source.\nThe user then runs the transformations on that source and generates a\nsingle target to provide back to dtspec. But dtspec has to know which record\nbelongs to which case. 
To do this, we have to define an\n**identifier** that tells dtspec which columns should be used to identify\na record as belonging to a case. A good identifier is often a primary\nkey that uniquely defines a record, but it is not strictly required to\nbe unique across all records.\n\nFor this example, we'll define an identifier called \"students\" with a single\n**identifier attribute** called `id` that is a unique integer:\n\n````yaml\nidentifiers:\n  - identifier: students\n    attributes:\n      - field: id\n        generator: unique_integer\n````\n\nWe tell dtspec that this identifier is associated with the `id` columns of both\nthe source and the target via:\n\n````yaml\nsources:\n  - source: raw_students\n    identifier_map:\n      - column: id\n        identifier:\n          name: students\n          attribute: id\n\n\ntargets:\n  - target: salutations\n    identifier_map:\n      - column: id\n        identifier:\n          name: students\n          attribute: id\n````\n\nOnce the sources and targets are associated with identifiers, the values we see in\nthe source factories and target expectations are not the values that\nare actually used in the data. Instead, they are simply **named\nreferences**. For example, in the \"HelloGang\" case, `id=1` belongs to\nBuffy and `id=2` belongs to Willow. But when dtspec generates the source\ndata, the actual values may be 3 and 9, or 4 and 7, or something else.\nUnique values are not generated in any deterministic manner; each\nrun of dtspec can give a different set. dtspec only guarantees that\neach named reference will be a unique integer (via the `generator`\ndefined in the `identifier` section).\n\nFurthermore, in the second case called \"GoodbyeVillians\", we see that\n`id=1` belongs to Drusilla and `id=2` belongs to Harmony. dtspec will\ngenerate unique values for this case as well, and they **will not**\nconflict with the values generated for the first case. 
So dtspec will pass\nback to the user 4 total records (Buffy, Willow, Drusilla, Harmony) with 4\ndifferent ids.\n\nWith the [full YAML spec](tests/hello_world_multiple_cases.yml) defined, we can\nrun the assertions in the same fashion as the earlier example:\n\n````python\nspec = yaml.safe_load(open(\"tests/hello_world_multiple_cases.yml\"))\napi = dtspec.api.Api(spec)\napi.generate_sources()\n\nsources_data = parse_sources(api.spec[\"sources\"])\nactual_data = hello_world_multiple_transformer(**sources_data)\nserialized_actuals = serialize_actuals(actual_data)\napi.load_actuals(serialized_actuals)\n\napi.assert_expectations()\n````\n\n#### Embedded Identifiers\n\nIt is also possible to embed identifiers in the value of a particular column.\nFor example, suppose our `salutation` column said hello to the `id` instead\nof the name of the person. To make this work, we have to put a particular\nstring pattern in the column that indicates the name of the identifier, the\nattribute, and the named id: `{identifier.attribute[named_id]}`. The\nYAML spec would look like:\n\n````yaml\n  - case: HelloGang\n    description: Make sure we say hello to everyone\n    expected:\n      data:\n        - target: salutations\n          table: |\n            | id | name | clique | salutation |\n            | - | - | - | - |\n            | 1 | Buffy | Scooby Gang | Hello {students.id[1]} |\n            | 2 | Willow | Scooby Gang | Hello {students.id[2]} |\n````\n\nThe [realistic example](tests/realistic.yml) discussed below has another example\nof using embedded identifiers.\n\n**Note** that embedded identifiers cannot be used to associate records\nwith cases. A target must have at least one column listed in the\n`identifier_map` section.\n\n### A More Realistic Example\n\nFinally, let's walk through a more realistic example that one might\nencounter when building a data warehouse. In these situations, we'll\nhave multiple sources, targets, scenarios, and cases. 
Now suppose we\nhave a students table, where every student belongs to a school and\ntakes zero or more classes. Our goal is to create one denormalized table\nthat combines all of these data sources. Additionally,\nwe want to create a table that aggregates all of our students to give\na count of the students per school. In Pandas, the data transformation\nmight look like:\n\n````python\ndef realistic_transformer(raw_students, raw_schools, raw_classes, dim_date):\n\n    student_schools = raw_students.rename(\n        columns={\"id\": \"student_id\", \"external_id\": \"card_id\"}\n    ).merge(\n        raw_schools.rename(columns={\"id\": \"school_id\", \"name\": \"school_name\"}),\n        how=\"inner\",\n        on=\"school_id\",\n    )\n\n    student_classes = student_schools.merge(\n        raw_classes.rename(columns={\"name\": \"class_name\"}),\n        how=\"inner\",\n        on=\"student_id\",\n    ).merge(\n        dim_date.rename(columns={\"date\": \"start_date\"}), how=\"left\", on=\"start_date\"\n    )\n\n    student_classes[\"student_class_id\"] = student_classes.apply(\n        lambda row: \"-\".join([str(row[\"card_id\"]), str(row[\"class_name\"])]), axis=1\n    )\n\n    students_per_school = (\n        student_schools.groupby([\"school_name\"])\n        .size()\n        .to_frame(name=\"number_of_students\")\n        .reset_index()\n    )\n\n    return {\n        \"student_classes\": student_classes,\n        \"students_per_school\": students_per_school,\n    }\n````\n\nWith the [full YAML spec](tests/realistic.yml) defined, we can again run\nthe data assertions using a familiar pattern:\n\n````python\nspec = yaml.safe_load(open(\"tests/realistic.yml\"))\napi = dtspec.api.Api(spec)\napi.generate_sources()\n\nsources_data = parse_sources(api.spec[\"sources\"])\nactual_data = realistic_transformer(**sources_data)\nserialized_actuals = serialize_actuals(actual_data)\napi.load_actuals(serialized_actuals)\n\napi.assert_expectations()\n````\n\n## dbt support\n\ndtspec also contains a CLI tool that can facilitate using it with [dbt](https://getdbt.com).\nThe CLI tool helps 
you set up a test environment, run dbt in that environment, and\nexecute the dbt tests. The CLI tool currently only works for Postgres and Snowflake dbt\nprojects.\n\nSee the [dbt-container-skeleton](https://github.com/gnilrets/dbt-container-skeleton) for a\nworking example.\n\n### dtspec CLI Config\n\nAll of the dtspec files should be placed in a subdirectory of your dbt project: `dbt/dtspec`.\nThe first thing to set up for the dtspec CLI is the configuration file, which should\nbe placed in `dtspec/config.yml`. The configuration file tells dtspec how to recreate\nthe table schemas in a test environment, where to recreate the table schemas, and where\nto find the results of a dbt run. Here is an example:\n\n````yaml\n# A target environment is where the output of data transformations appears.\n# Typically, there will only be one target environment.\ntarget_environments:\n  # The target environment IS NOT your production environment. It needs to be a separate\n  # database where dbt will run against the test data that dtspec generates. The name\n  # of this environment needs to be the same as a target defined in dbt profiles.yml (in this case `dtspec`)\n  dtspec:\n    # Field names here follow the same conventions as dbt profiles.yml (https://docs.getdbt.com/dbt-cli/configure-your-profile)\n    type: postgres\n    host: \"{{ env_var('POSTGRES_HOST') }}\"\n    port: 5432\n    user: \"{{ env_var('POSTGRES_USER') }}\"\n    password: \"{{ env_var('POSTGRES_PASSWORD') }}\"\n    dbname: \"{{ env_var('POSTGRES_DBNAME') }}_dtspec\"\n\n# A source environment is where source data is located. 
It may be in the same database\n# as the target environment or it may be different if the data warehouse supports it (e.g., Snowflake).\n# It is also possible to define several source environments if your source data is spread\n# across multiple databases.\nsource_environments:\n  raw:\n    # Use `tables` to specify source tables that need to be present to run tests.\n    tables:\n      # `wh_raw` is the name of a namespace (aka schema) in the `raw` source environment\n      wh_raw:\n        # tables may be listed individually (or use `wh_raw: '*'` to indicate all tables within the `wh_raw` namespace)\n        - raw_customers\n        - raw_orders\n        - raw_payments\n\n    # In order to run tests, we need to replicate the table schemas in the test environment.\n    # The schema section here contains credentials for a database where those tables are defined.\n    # This is likely a production database (in your warehouse) or a production replica.\n    # dtspec only uses this database to reflect the table schemas (via `dtspec db --fetch-schemas`).\n    schema:\n      type: postgres\n      host: \"{{ env_var('POSTGRES_HOST') }}\"\n      port: 5432\n      user: \"{{ env_var('POSTGRES_USER') }}\"\n      password: \"{{ env_var('POSTGRES_PASSWORD') }}\"\n      dbname: \"{{ env_var('POSTGRES_DBNAME') }}\"\n    # The test section contains credentials for a database where test data will be created.\n    # Data in this database is destroyed and rebuilt for every run of dtspec and SHOULD NOT be\n    # the same as the schema credentials defined above.\n    test:\n      type: postgres\n      host: \"{{ env_var('POSTGRES_HOST') }}\"\n      port: 5432\n      user: \"{{ env_var('POSTGRES_USER') }}\"\n      password: \"{{ env_var('POSTGRES_PASSWORD') }}\"\n      dbname: \"{{ env_var('POSTGRES_DBNAME') }}_dtspec\"\n\n  # Pretending snapshots are in a different database because Postgres doesn't support cross-db queries.\n  # This is how you would do it if snapshots were in a different database than other raw source data.\n  snapshots:\n    tables:\n      snapshots: '*'\n    schema:\n      type: postgres\n      host: \"{{ 
env_var('POSTGRES_HOST') }}\"\n      port: 5432\n      user: \"{{ env_var('POSTGRES_USER') }}\"\n      password: \"{{ env_var('POSTGRES_PASSWORD') }}\"\n      dbname: \"{{ env_var('POSTGRES_DBNAME') }}\"\n    test:\n      type: postgres\n      host: \"{{ env_var('POSTGRES_HOST') }}\"\n      port: 5432\n      user: \"{{ env_var('POSTGRES_USER') }}\"\n      password: \"{{ env_var('POSTGRES_PASSWORD') }}\"\n      dbname: \"{{ env_var('POSTGRES_DBNAME') }}_dtspec\"\n````\n\n### Test environment setup\n\nOnce the configuration file has been defined, the next step is to fetch/reflect schemas for\nthe source tables. From the `dbt` directory, run the following CLI command:\n\n    dtspec db --fetch-schemas\n\nThis will query all of the databases defined in the `schema` section of the source\nenvironments defined in `dtspec/config.yml` and create table schema files in `dtspec/schemas`.\nThe files in this directory should be committed to source control and updated whenever\nyour source data changes (insofar as the changes affect the dtspec tests).\n\nNext, initialize the test databases defined in the `test` section of the source\nenvironments defined in `dtspec/config.yml` with the CLI command:\n\n    dtspec db --init-test-db\n\nThis will create empty source tables in your test databases, ready to be loaded with test data.\n\n\n### Executing tests\n\nIn order to use dtspec with dbt, spec files must make use of the `dbt_source` and `dbt_ref`\nJinja functions. These are analogous to the dbt `source` and `ref` functions. dtspec\nwill compile your dbt project and use the `dbt/target/manifest.json` file to resolve the names\nof sources and targets that you want to test. 
For example, the SomeStudents factory\nwould be written as follows if this were a dbt project:\n\n````yaml\nfactories:\n  - factory: SomeStudents\n    data:\n      - source: {{ dbt_source('raw', 'raw_students') }}\n        table: |\n          | id | name |\n          | - | - |\n          | 1 | Buffy |\n          | 2 | Willow |\n````\n\nand an expectation would be:\n\n````yaml\n    cases:\n      - case: HelloGang\n        expected:\n          data:\n            - target: {{ dbt_ref('salutations') }}\n              table: |\n                | id | salutation |\n                | - | - |\n                | 1 | Hello Buffy |\n                | 2 | Hello Willow |\n````\n\nWith these references set, dtspec tests can be executed via the CLI command:\n\n    dtspec test-dbt\n\nThis command will do the following:\n\n1. It will first compile your dbt project. If your dbt code does not change between\n   dtspec tests, you may skip this step by passing the `--partial-parse` argument.\n2. The dtspec spec files are compiled into a single document and dbt references are resolved.\n   The compiled dtspec document is output to `dtspec/compiled_specs.yml`, which does not\n   need to be saved to source control.\n3. Source data is generated and loaded into the test databases.\n4. dbt is executed against the test database.\n5. The models that dbt built in the target test environment are extracted. These are the \"actuals\".\n6. The actuals are compared with the expected data as specified in the dtspec specs.\n\nThe `test-dbt` command has several options that may be useful. See `dtspec test-dbt -h` for a full\nlist, but here are some noteworthy options:\n\n- `--models` specifies the models that dbt should run, using standard dbt model selection syntax.\n- `--scenarios` restricts which scenarios are tested. The argument is a\n  regular expression that is matched against the compiled scenario name. 
This can be used\n  in combination with the `--models` option to run only the tests and models that you're\n  concerned with.\n\n### Additional CLI notes\n\n#### Log level\n\nIf you want to see more detailed logging information, set the `DTSPEC_LOG_LEVEL` environment\nvariable (options are DEBUG, INFO, WARN, and ERROR). For example:\n\n    DTSPEC_LOG_LEVEL=INFO dtspec test-dbt\n\n#### Project location\n\nIf you don't want to put dtspec in the dbt project directory, you can override the\ndefault by setting the `DTSPEC_ROOT` and `DBT_ROOT` environment variables to\nthe root paths of these projects.\n\n#### Special Values\n\nWhen dtspec is run via the CLI, it recognizes nulls and booleans in the spec files. To\nindicate these kinds of values in a dtspec spec, use `{NULL}`, `{True}`, and `{False}`.\nFor example:\n\n````yaml\n    cases:\n      - case: HelloGang\n        expected:\n          data:\n            - target: {{ dbt_ref('salutations') }}\n              table: |\n                | id | salutation | is_witch |\n                | - | - | - |\n                | 1 | Hello Buffy | {False} |\n                | 2 | Hello Willow | {True} |\n                | 3 | Hello NA | {NULL} |\n````\n\n#### Jinja context\n\nWhen writing spec files that will be parsed with the dtspec CLI, the following functions\nare available in the Jinja context:\n\n* `datetime` -- This is the [Python datetime.datetime type](https://docs.python.org/3/library/datetime.html)\n* `date` -- This is the [Python datetime.date type](https://docs.python.org/3/library/datetime.html)\n* `relativedelta` -- This is the [dateutil relativedelta type](https://dateutil.readthedocs.io/en/stable/relativedelta.html)\n* `UTCNOW` -- The UTC datetime value at the time the specs are parsed\n* `TODAY` -- The current UTC date value at the time the specs are parsed\n* `YESTERDAY` -- Yesterday's date\n* `TOMORROW` -- Tomorrow's date\n* `dbt_source` -- Used to reference dbt sources\n* `dbt_ref` -- Used to reference dbt models\n\nSome examples of using these functions:\n\n    - source: raw_products\n      table: |\n        | export_time | file | 
product_id | product_name |\n        | - | - | - | - |\n        | {{ YESTERDAY }} | products-2021-01-06.csv | milk | Milk |\n        | {{ TODAY - relativedelta(days=5) }} | products-2021-01-02.csv | milk | Milk |\n\n\n## Additional notes about dtspec\n\n* At the moment, all source data values are generated as strings. It\n  is up to the user to enforce data types suitable to their data\n  transformation system. Note that the dtspec dbt CLI commands handle this\n  for Postgres and Snowflake warehouses.\n* Additionally, data expectations are stringified prior to running assertions.\n\n## Contributing\n\nWe welcome contributors! Please submit any suggestions or pull requests on GitHub.\n\n### Developer setup\n\nCreate an appropriate Python environment. I like [miniconda](https://conda.io/miniconda.html),\nbut use whatever you like:\n\n    conda create --name dtspec python=3.8\n    conda activate dtspec\n\nThen install the pip packages:\n\n    pip install pip-tools\n    pip install --ignore-installed -r requirements.txt\n\nRun tests via:\n\n    inv test\n\nand the linter via:\n\n    inv lint", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/inside-track/dtspec", "keywords": "etl elt data testing", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "dtspec", "package_url": "https://pypi.org/project/dtspec/", "platform": "", "project_url": "https://pypi.org/project/dtspec/", "project_urls": { "Homepage": "https://github.com/inside-track/dtspec" }, "release_url": "https://pypi.org/project/dtspec/0.7.5/", "requires_dist": null, "requires_python": ">=3", "summary": "dtspec - Data Test Spec", "version": "0.7.5", "yanked": false, "yanked_reason": null }, "last_serial": 12364348, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "349de548be91ed8ea42e496bfc88628e", "sha256": "4b769cfa263ccdef7100994497f55241128d6e96b20b2dfcf6b7b58a632be7e1" }, "downloads": 
-1, "filename": "dtspec-0.1.0.tar.gz", "has_sig": false, "md5_digest": "349de548be91ed8ea42e496bfc88628e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 15407, "upload_time": "2019-09-24T18:38:29", "upload_time_iso_8601": "2019-09-24T18:38:29.225137Z", "url": "https://files.pythonhosted.org/packages/de/0f/c4a6fab0dee8d1f5888301f6728914acd503a15885206f76bcbc77b8f823/dtspec-0.1.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "fefbadac1ad520f334d3fd26b9d71e12", "sha256": "e0a058cf7a49d9f26e542b3517f172415345f6e87e66c2b63394849a35cd2a71" }, "downloads": -1, "filename": "dtspec-0.1.1.tar.gz", "has_sig": false, "md5_digest": "fefbadac1ad520f334d3fd26b9d71e12", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 16117, "upload_time": "2019-09-26T00:30:35", "upload_time_iso_8601": "2019-09-26T00:30:35.842779Z", "url": "https://files.pythonhosted.org/packages/1c/c1/106fc42af0dd73996374024fb7fe7532942d14fb2aeb756c538789409470/dtspec-0.1.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "8d2408693688207f4ee8a7288decf896", "sha256": "1b9247a4d3576f091405b3df9e4613c6f3bca1821c6a770f7a23bad4e4443918" }, "downloads": -1, "filename": "dtspec-0.2.0.tar.gz", "has_sig": false, "md5_digest": "8d2408693688207f4ee8a7288decf896", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 16218, "upload_time": "2019-10-01T00:12:38", "upload_time_iso_8601": "2019-10-01T00:12:38.127133Z", "url": "https://files.pythonhosted.org/packages/f2/02/571590d31b66983c10c09ce57ba386327bf04deebd526257b3057d3f2192/dtspec-0.2.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "e353af2e52884e4cdd92c0ac8c741e20", "sha256": "0ef674377d036f4c8590f8ca20861bdb341dc7fa18798d7437d4c3deb5d6d486" }, "downloads": -1, "filename": 
"dtspec-0.3.0.tar.gz", "has_sig": false, "md5_digest": "e353af2e52884e4cdd92c0ac8c741e20", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 17179, "upload_time": "2019-10-09T18:55:20", "upload_time_iso_8601": "2019-10-09T18:55:20.684475Z", "url": "https://files.pythonhosted.org/packages/7e/18/fe67c614ec1f12ece4817338937a32d57010e27cd33fec53f1a9a971290f/dtspec-0.3.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "9394c9270c39e2174c117bcabee1c6c8", "sha256": "63cfda9022a3e42ad2312978fabfaa35c27d3d34f536a222cf3ddd9301f7615c" }, "downloads": -1, "filename": "dtspec-0.4.0.tar.gz", "has_sig": false, "md5_digest": "9394c9270c39e2174c117bcabee1c6c8", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 17544, "upload_time": "2019-10-30T23:55:06", "upload_time_iso_8601": "2019-10-30T23:55:06.905416Z", "url": "https://files.pythonhosted.org/packages/c7/27/0d14dbafda7f0cd25fd2245eaa1dadd6e2b0cb5d7cd6ca270cba469979e0/dtspec-0.4.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "c59ac73467f5d515f694e2f4cb4939ee", "sha256": "d42b93931680a55a9a32a6e4b4a25795a3537f74f7a3a2ee7be63093badb2ff2" }, "downloads": -1, "filename": "dtspec-0.5.0.tar.gz", "has_sig": false, "md5_digest": "c59ac73467f5d515f694e2f4cb4939ee", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 17913, "upload_time": "2019-11-04T23:44:56", "upload_time_iso_8601": "2019-11-04T23:44:56.779636Z", "url": "https://files.pythonhosted.org/packages/a0/a5/3b0039fd7237c0414794cd5ec35851c8399a0514b533ba332bf51535d106/dtspec-0.5.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.6.0": [ { "comment_text": "", "digests": { "md5": "0d272bd5f7dc5b32197e25cb40873c20", "sha256": "4f3d2758af9c2deca43d873c2725a0a2b0ccf6b03668d924a73fbe7dbb6c6516" }, "downloads": -1, "filename": "dtspec-0.6.0.tar.gz", 
"has_sig": false, "md5_digest": "0d272bd5f7dc5b32197e25cb40873c20", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 18849, "upload_time": "2019-11-06T23:11:05", "upload_time_iso_8601": "2019-11-06T23:11:05.457716Z", "url": "https://files.pythonhosted.org/packages/84/4e/764c8f9b6b3efd080d510aca66f082fd2e33c6e1a578502b147b19f617de/dtspec-0.6.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "555713cad27452a6922b9487dab4d21d", "sha256": "e8f4c32cb7c04cd42202c59b49a3749092060a2e5fb7bc0e21593f1004ae82bf" }, "downloads": -1, "filename": "dtspec-0.6.1.tar.gz", "has_sig": false, "md5_digest": "555713cad27452a6922b9487dab4d21d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 18893, "upload_time": "2019-11-07T22:35:55", "upload_time_iso_8601": "2019-11-07T22:35:55.787412Z", "url": "https://files.pythonhosted.org/packages/d8/65/5d0a0f8ea68e6574e3b3f07ca14f7b0f318b7d5810cefb113651c0cc2c05/dtspec-0.6.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.6.2": [ { "comment_text": "", "digests": { "md5": "fac0120611353d6424de2cfa7c5ae5a3", "sha256": "34c1ac6a2770dafdbe65fad4a743523477f150284dae45494f322d2690bde384" }, "downloads": -1, "filename": "dtspec-0.6.2.tar.gz", "has_sig": false, "md5_digest": "fac0120611353d6424de2cfa7c5ae5a3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 19071, "upload_time": "2019-11-08T17:15:18", "upload_time_iso_8601": "2019-11-08T17:15:18.016152Z", "url": "https://files.pythonhosted.org/packages/b8/44/a52277df6ecdd7acb7e3ad7858da5938441d06974c94573cbd9fdb2eb3eb/dtspec-0.6.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.6.3": [ { "comment_text": "", "digests": { "md5": "06b8e2b402f35aa3eb21ddadb1def9fe", "sha256": "84a786772c74d822bf6392c923b439a1787600340426ad2ec85ed2eb8cb51b6b" }, "downloads": -1, "filename": "dtspec-0.6.3.tar.gz", "has_sig": false, "md5_digest": 
"06b8e2b402f35aa3eb21ddadb1def9fe", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 19059, "upload_time": "2020-04-13T17:23:21", "upload_time_iso_8601": "2020-04-13T17:23:21.132531Z", "url": "https://files.pythonhosted.org/packages/b0/55/bb29c5e7b3cbcab3a1666f922e2dcef7e23f822a5f5169239921045e1594/dtspec-0.6.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.7.0": [ { "comment_text": "", "digests": { "md5": "14a7e49f537b4fa929eb05d7e21e2c00", "sha256": "4dbca91ab5f097f1e910f7827b466e6429d5e83f3694ff7873fb5c0e49262ea8" }, "downloads": -1, "filename": "dtspec-0.7.0.tar.gz", "has_sig": false, "md5_digest": "14a7e49f537b4fa929eb05d7e21e2c00", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 41331, "upload_time": "2021-04-16T21:43:34", "upload_time_iso_8601": "2021-04-16T21:43:34.366130Z", "url": "https://files.pythonhosted.org/packages/99/78/692c1a737dd5c3a427cbfd867d6ff42b9b7800f4d28ce900a25afc533441/dtspec-0.7.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.7.1": [ { "comment_text": "", "digests": { "md5": "f0065cc7e5012f5023b69dbba212a64d", "sha256": "e7388027337760e7e8ae72ad94581e72bfb57462c3bfdd0b00b1db9bd37a9a88" }, "downloads": -1, "filename": "dtspec-0.7.1.tar.gz", "has_sig": false, "md5_digest": "f0065cc7e5012f5023b69dbba212a64d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 41332, "upload_time": "2021-04-16T22:23:18", "upload_time_iso_8601": "2021-04-16T22:23:18.051695Z", "url": "https://files.pythonhosted.org/packages/38/ca/e9cac70a8c3ec8035609b06141ed1be3c24ce32d8640cedf6007be99678f/dtspec-0.7.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.7.2": [ { "comment_text": "", "digests": { "md5": "d45bad11c740b47b15b66dd17049378c", "sha256": "1cd37b348cdb850819cdbf057af04ba746e2c1cb88c1612ec86802c8caa00b7a" }, "downloads": -1, "filename": "dtspec-0.7.2.tar.gz", "has_sig": false, "md5_digest": 
"d45bad11c740b47b15b66dd17049378c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 44241, "upload_time": "2021-07-20T22:35:38", "upload_time_iso_8601": "2021-07-20T22:35:38.720839Z", "url": "https://files.pythonhosted.org/packages/9d/2d/bcea04a7526ddaea0edd76a3c2c246e76dc8e79bdc069246a8aa724170fd/dtspec-0.7.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.7.3": [ { "comment_text": "", "digests": { "md5": "5eebb8c575e8408703f00d1c1ea13fdd", "sha256": "d99f6f88dbe7d8ee0240b403febeb43f06845e98479db7a0659bea30de4f11c8" }, "downloads": -1, "filename": "dtspec-0.7.3.tar.gz", "has_sig": false, "md5_digest": "5eebb8c575e8408703f00d1c1ea13fdd", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 44258, "upload_time": "2021-10-12T23:54:38", "upload_time_iso_8601": "2021-10-12T23:54:38.157058Z", "url": "https://files.pythonhosted.org/packages/b6/f7/baab02a73dc7f9544e2325b572de8360b3cc5b7e41f0a63799db51f46950/dtspec-0.7.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.7.4": [ { "comment_text": "", "digests": { "md5": "fb8ff6866257323e9f545ffcfd750607", "sha256": "5c0f4c9c953e763b043ade66f5af30843bdfed8bfd2b79a197b4166681239d8f" }, "downloads": -1, "filename": "dtspec-0.7.4.tar.gz", "has_sig": false, "md5_digest": "fb8ff6866257323e9f545ffcfd750607", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 44275, "upload_time": "2021-10-19T23:16:41", "upload_time_iso_8601": "2021-10-19T23:16:41.364884Z", "url": "https://files.pythonhosted.org/packages/e5/9d/724f3b370fac1de710b8345b38e1bd6169a11ce081e21d8d07e24974df5e/dtspec-0.7.4.tar.gz", "yanked": false, "yanked_reason": null } ], "0.7.5": [ { "comment_text": "", "digests": { "md5": "2c5c3d465a8cd9c2e13cc3b921464b43", "sha256": "a5333779b69e357ad6d050dff7dd0eef0243d322ddd3deda0139c5417aa209a3" }, "downloads": -1, "filename": "dtspec-0.7.5.tar.gz", "has_sig": false, "md5_digest": 
"2c5c3d465a8cd9c2e13cc3b921464b43", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 44291, "upload_time": "2021-12-01T19:40:03", "upload_time_iso_8601": "2021-12-01T19:40:03.552622Z", "url": "https://files.pythonhosted.org/packages/38/94/ebc9c6bf397744ec6bfad655cd2c298cbb4dd573ce98bd5a1150699a7d09/dtspec-0.7.5.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2c5c3d465a8cd9c2e13cc3b921464b43", "sha256": "a5333779b69e357ad6d050dff7dd0eef0243d322ddd3deda0139c5417aa209a3" }, "downloads": -1, "filename": "dtspec-0.7.5.tar.gz", "has_sig": false, "md5_digest": "2c5c3d465a8cd9c2e13cc3b921464b43", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 44291, "upload_time": "2021-12-01T19:40:03", "upload_time_iso_8601": "2021-12-01T19:40:03.552622Z", "url": "https://files.pythonhosted.org/packages/38/94/ebc9c6bf397744ec6bfad655cd2c298cbb4dd573ce98bd5a1150699a7d09/dtspec-0.7.5.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }