{
"info": {
"author": "Pierre-Alain Jachiet - DREES",
"author_email": "ld-lab-github@sante.gouv.fr",
"bugtrack_url": null,
"classifiers": [],
"description": "==================\nTable Schema Faker\n==================\n\nGenerate tabular fake data conforming to a `Table Schema `_\n\nUsage\n=====\n\nInstallation\n------------\n\n.. code:: bash\n\n $ pip3 install tsfaker\n\n\nSimple usage\n------------\nGenerate 3 rows of fake data from a single table schema file.\n\n.. code:: bash\n\n $ tsfaker https://gitlab.com/healthdatahub/tsfaker/raw/master/tests/schemas/implemented_types.json --nrows 3 --pretty\n string number integer date datetime year yearmonth\n 0 QZluRNRoaJ 8524064526.189381 5603365028 1918-06-09 1963-02-25T15:27:14 1927 1968-03\n 1 OAXCFryYDVMWmRTnP 8084094810.096195 -9782888534 1995-06-06 1924-06-14T07:41:59 1928 1929-02\n 2 -6416720321.04726 -1060427558 2006-12-11 2002-12-25T07:41:47 1999 1914-11\n\n\nAdvanced usage\n--------------\n\nShow help message.\n\n.. code:: bash\n\n $ tsfaker --help\n Usage: tsfaker [OPTIONS] [SCHEMA_DESCRIPTORS]...\n ...\n\n\nDownload examples schemas from project **schema-snds**.\n\n.. code:: bash\n\n $ git clone https://gitlab.com/healthdatahub/schema-snds && cd schema-snds\n\n\nGenerate fake data for all schemas in a **schemas** folder, and write them to **fake_data** folder.\n\n.. 
code:: bash\n\n $ mkdir fake_data\n $ tsfaker schemas -o fake_data\n 2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnE.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnE.csv'\n 2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI MCO/T_MCOaa_nnFASTC.json' will be written on 'fake_data/PMSI/PMSI MCO/T_MCOaa_nnFASTC.csv'\n 2019-01-01 00:00:00 :: INFO :: Data generated from descriptor 'schemas/PMSI/PMSI SSR/T_SSRaa_nnE.json' will be written on 'fake_data/PMSI/PMSI SSR/T_SSRaa_nnE.csv'\n ...\n\n\nGoals\n=====\n\nWe aim to generate fake data conforming to a *schema*.\n\nWe do not aim to generate realistic data with statistical information (see related work).\n\nImplementation steps\n--------------------\n\n- Generate data conforming to types\n- Generate data conforming to formats and constraints, such as min/max, enum, missing values, unique, length, and regex\n- Generate multiple tables conforming to foreign key references, with some tables' data optionally provided through CSV\n\nAPI\n---\n\n- We want to provide both a Python API and a command line API\n\nDevelopment methodology\n-----------------------\n\nWe will follow the Test Driven Development methodology, hence writing tests before writing the implementation.\n\nWe want generated data to be valid when checked with `goodtables `_.\n\nWe could proceed by conforming to more and more `content checks `_, which are included in the table-schema specification.\n\nRelated Python libraries\n========================\n\nWe may use the following libraries directly or draw inspiration from them.\n\nSimple data Generators\n----------------------\n\n- `numpy `_ comes with many functions to generate random data.\n\n- `rstr `_ and `exrex `_ generate random strings matching regular expressions.\n\n- `Faker `_ and `Mimesis `_ generate fake data. 
They both focus on generating high level data, such as names, emails or addresses, which does not seem necessary for us.\n\n- `DataScienceFaker `_ generates synthetic data conforming to statistical distributions. It is based on numpy and rstr.\n\nTable generator\n---------------\n\n- `pydbgen `_ is a shallow wrapper around Faker to generate tables as pandas DataFrames, SQLite tables or Excel files.\n\n- `pySyntheticDatasetGenerator `_ is a wrapper around dsfaker that generates tables with their relations as described in YAML configuration files.\n\n- `datafiller `_ generates random data from a database schema. Its API could be interesting.\n\n- `plaitpy `_ is a fake table generator driven by a YAML configuration file.\n\n\nRealistic data\n--------------\n\nGenerating realistic data - i.e. data carrying statistical information - could mean different things in different contexts:\n\n- realistic statistical distribution on single columns,\n- realistic temporal dynamics,\n- realistic correlations between pairs of columns,\n- realistic correlations between pairs of columns from different (joinable) tables,\n- etc.\n\nHence there is no universal way to generate realistic data. Most approaches follow two steps:\n\n1. learn a statistical model from the real data,\n2. 
generate data using this model.\n\nThe statistical model depends on the context, and is usually not expressed in the form of a generic schema, such as table-schema.\nHowever, a schema of your data will often be necessary to *configure* this kind of library.\n\nThis topic is an active research area, with many articles but few production implementations:\n\n- `DataSynthesizer `_ (`article `__) learns a differentially private Bayesian network capturing the correlation structure between attributes\n- `dpgan `_ (`article `__) Differentially Private Releasing via Deep Generative Model.\n- `SDV `_ (`article `__) Generative modeling for relational databases.\n- `medGAN `_ (`article `__) Generative adversarial network for generating electronic health records.\n\nThe statistical model may convey sensitive information and personal data. \nIt is an important fact to bear in mind, as protecting sensitive information is a common reason to generate fake data in the first place.\n\nSome tools offer ways to mitigate the risk of personal data leakage, with no formal guarantees.\nOther tools offer formal privacy guarantees through `differential privacy `_.\n\nAn active line of work is to use Generative Adversarial Networks to generate realistic data, for example dpgan (see above) or `Privacy-Preserving Generative Deep Neural Networks Support Clinical Data Sharing `__.\n\nWhen using neural networks, one can use TensorFlow's `specific library `_.\nThe `PySyft project `_ aims to provide a generic implementation for PyTorch.\n\n\n",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://gitlab.com/healthdatahub/tsfaker/",
"keywords": "",
"license": "MPL-2.0",
"maintainer": "",
"maintainer_email": "",
"name": "tsfaker",
"package_url": "https://pypi.org/project/tsfaker/",
"platform": "",
"project_url": "https://pypi.org/project/tsfaker/",
"project_urls": {
"Homepage": "https://gitlab.com/healthdatahub/tsfaker/"
},
"release_url": "https://pypi.org/project/tsfaker/0.8/",
"requires_dist": [
"click",
"numpy",
"pandas",
"rstr",
"tableschema",
"dsfaker",
"sphinx ; extra == 'dev'",
"pytest ; extra == 'dev'",
"pytest-timeout ; extra == 'dev'",
"goodtables ; extra == 'dev'",
"tableschema (>=1.5.4) ; extra == 'dev'"
],
"requires_python": "~=3.5",
"summary": "Generate fake data conforming to a Table Schema",
"version": "0.8"
},
"last_serial": 5913066,
"releases": {
"0.1": [
{
"comment_text": "",
"digests": {
"md5": "bf40a147e89e1f4ef5a954c9fcc86050",
"sha256": "78e4cec51227450d6117c554e163877716f8a77e5611a3058bad37ad4d64ba3a"
},
"downloads": -1,
"filename": "tsfaker-0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bf40a147e89e1f4ef5a954c9fcc86050",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 10181,
"upload_time": "2019-06-06T17:51:49",
"url": "https://files.pythonhosted.org/packages/c4/6d/fae4a6905ef0d0b4310c4a0be6c76bbb7749446da8e7537282b76c0461fc/tsfaker-0.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "eb4d1cd789b22aee1d60e40b2db6c4cd",
"sha256": "554c2003cba6a86a75ee3b27c8502d3438e454bcc12394e5584527372a10f9f5"
},
"downloads": -1,
"filename": "tsfaker-0.1.tar.gz",
"has_sig": false,
"md5_digest": "eb4d1cd789b22aee1d60e40b2db6c4cd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 6880,
"upload_time": "2019-06-06T17:51:51",
"url": "https://files.pythonhosted.org/packages/2f/55/22753f8614e5fe335b6d17b43367ec34565dc18df4179ae94545867901f9/tsfaker-0.1.tar.gz"
}
],
"0.1.2": [
{
"comment_text": "",
"digests": {
"md5": "dc8627347d146a1aeb20be811d541291",
"sha256": "ab08d888fbfd516b8b85a34179ec47b8028d48f8e6e8ba8d1a0885dce7dd3c8c"
},
"downloads": -1,
"filename": "tsfaker-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dc8627347d146a1aeb20be811d541291",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 12163,
"upload_time": "2019-06-06T18:03:38",
"url": "https://files.pythonhosted.org/packages/65/4f/ea22cf585fe2e508556db01b4759b7dd7b625887b3800503370461b1d5c7/tsfaker-0.1.2-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "426627b2da7a56cbdf9a75a03bf8e40c",
"sha256": "268baf1543893064f0aa336ef8cbf03312a3a74ec8aa81dc3d3bcc4ecfb5d86c"
},
"downloads": -1,
"filename": "tsfaker-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "426627b2da7a56cbdf9a75a03bf8e40c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 6912,
"upload_time": "2019-06-06T18:03:40",
"url": "https://files.pythonhosted.org/packages/2b/60/a3c17f73233dc9850be71ba62e2323d2439994a16840b7abbea9ee96e489/tsfaker-0.1.2.tar.gz"
}
],
"0.2": [
{
"comment_text": "",
"digests": {
"md5": "438bdb230518d28ea1e6eaaccbc37668",
"sha256": "ecf58178a1f3f565cb6ecf55d7a77a824194a649256ab91bc626a6f9b60302d5"
},
"downloads": -1,
"filename": "tsfaker-0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "438bdb230518d28ea1e6eaaccbc37668",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 14411,
"upload_time": "2019-06-13T17:11:33",
"url": "https://files.pythonhosted.org/packages/ea/1c/06b27139ea392b5e097a1cb349e7b0bf0b81f71faf46e703468bb8af6921/tsfaker-0.2-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "13d82212edd23af1b3faf7de0e84b811",
"sha256": "52e0b1d01de38665bbdeb255a7f3523495e7d4024ded9f340eb4fcf53bade923"
},
"downloads": -1,
"filename": "tsfaker-0.2.tar.gz",
"has_sig": false,
"md5_digest": "13d82212edd23af1b3faf7de0e84b811",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 10463,
"upload_time": "2019-06-13T17:11:36",
"url": "https://files.pythonhosted.org/packages/68/60/d1c80b1689a2af6ef52f1e86a6207d9f647d9abff7f011ee7e6e5bb6401e/tsfaker-0.2.tar.gz"
}
],
"0.3": [
{
"comment_text": "",
"digests": {
"md5": "9dae107e75ab0f4d024dcc9829b7ceca",
"sha256": "57df0099189cac3c67b6833dc5b7e11764df1433b07b57bd59cb909ed477fbd6"
},
"downloads": -1,
"filename": "tsfaker-0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9dae107e75ab0f4d024dcc9829b7ceca",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 20559,
"upload_time": "2019-07-05T16:46:34",
"url": "https://files.pythonhosted.org/packages/6a/29/b890ab707c4e4471da18040f0a91946a1eed709aaf906cbbe14ef1b89a72/tsfaker-0.3-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "fbb01e86d134731e0ca6a8a849286798",
"sha256": "a8d190667ee013f2e1fe559d382a728d184a0ac197d9dc8b92c35d051337269e"
},
"downloads": -1,
"filename": "tsfaker-0.3.tar.gz",
"has_sig": false,
"md5_digest": "fbb01e86d134731e0ca6a8a849286798",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 13843,
"upload_time": "2019-07-05T16:46:36",
"url": "https://files.pythonhosted.org/packages/61/32/9cbcec585522a1e967e1cdbf374e93ac096524fd2a9c5d56fe06f4fc33b2/tsfaker-0.3.tar.gz"
}
],
"0.4": [
{
"comment_text": "",
"digests": {
"md5": "ee685cdddc8bc8aa74e9328d2ac48abb",
"sha256": "15eb0188127b0d6d0a84cb5f858148ab886cbd1e7923055d5329afcff8d184eb"
},
"downloads": -1,
"filename": "tsfaker-0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ee685cdddc8bc8aa74e9328d2ac48abb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 21396,
"upload_time": "2019-07-18T16:56:06",
"url": "https://files.pythonhosted.org/packages/59/75/b4391bb8b07774a7d1fa7fb5963db407e11a68710587df23a0e954f4afbc/tsfaker-0.4-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "b8fbc22def303871e77e09e29acc0eb8",
"sha256": "15021bebec2f326fb3ff5b96c5f8b29f6570091ee24473670c6b129367096196"
},
"downloads": -1,
"filename": "tsfaker-0.4.tar.gz",
"has_sig": false,
"md5_digest": "b8fbc22def303871e77e09e29acc0eb8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 14557,
"upload_time": "2019-07-18T16:56:09",
"url": "https://files.pythonhosted.org/packages/79/95/ed3a5610c52e0407bd6fd23aba692011ecdd5431aacf29778c72b47ccaa8/tsfaker-0.4.tar.gz"
}
],
"0.5": [
{
"comment_text": "",
"digests": {
"md5": "82e828d78efda79aaaf9c01abad820e3",
"sha256": "3eebc71f1df02c473c18e8f9c3e4e22de8fd0700f5bf0098047fe68ea1dc887a"
},
"downloads": -1,
"filename": "tsfaker-0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "82e828d78efda79aaaf9c01abad820e3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 22003,
"upload_time": "2019-09-16T09:40:00",
"url": "https://files.pythonhosted.org/packages/7f/73/2c131b1f0e221a7a62b5f074a6dd1c4b6fe54b158cf7c581f6236a7fc0ea/tsfaker-0.5-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "358cb67d63ce8a660351c7a1f8e546fe",
"sha256": "3568c51d5ea2ca2670ea374f3b1ce31595fd694ed87e3edc460f533e582c7d42"
},
"downloads": -1,
"filename": "tsfaker-0.5.tar.gz",
"has_sig": false,
"md5_digest": "358cb67d63ce8a660351c7a1f8e546fe",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 15601,
"upload_time": "2019-09-16T09:40:04",
"url": "https://files.pythonhosted.org/packages/f7/61/e988c694d2e976bc42c8e6623c2bf48ceee395abd29d38dc20158763e7cc/tsfaker-0.5.tar.gz"
}
],
"0.6": [
{
"comment_text": "",
"digests": {
"md5": "f53fb7412e51d513371d185224a80a49",
"sha256": "013bbfec87a2dc515185d80eb0ace2f0543daa0e07939b340dc2e117ed0c09b1"
},
"downloads": -1,
"filename": "tsfaker-0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f53fb7412e51d513371d185224a80a49",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 22417,
"upload_time": "2019-09-18T15:41:28",
"url": "https://files.pythonhosted.org/packages/19/b7/9d52433b80eae2a09de2ea7d6f323378918f79fde3155fb9ff6c282afd32/tsfaker-0.6-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "bd606df14d4f62e34e338823a9a22881",
"sha256": "ddbb6d240f9c419ecfb7ea5279d4ee34555f151f3a0cf780d164ebbec3a7c702"
},
"downloads": -1,
"filename": "tsfaker-0.6.tar.gz",
"has_sig": false,
"md5_digest": "bd606df14d4f62e34e338823a9a22881",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 15976,
"upload_time": "2019-09-18T15:41:37",
"url": "https://files.pythonhosted.org/packages/7e/f5/69b13507a7d39bdb5988fe4e2a6bda4ce4163db394e3b3600f540d63f6de/tsfaker-0.6.tar.gz"
}
],
"0.7": [
{
"comment_text": "",
"digests": {
"md5": "258b2757675715daa9513e0daa27db99",
"sha256": "9645d43f9ac4f72601b29cbf24ad7b002787de01a66e879d85c0c24913b87647"
},
"downloads": -1,
"filename": "tsfaker-0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "258b2757675715daa9513e0daa27db99",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 22573,
"upload_time": "2019-10-01T14:54:07",
"url": "https://files.pythonhosted.org/packages/24/69/808a94eccf5028b844d70dcfa952f0174dc93ff7eb456933681e02c75a1a/tsfaker-0.7-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "010e84c60c7f7b7ce2ddd5627738cae0",
"sha256": "3aecce00ce964d5624ef2b89981b2ff6b8fd3dea924dca370bbb2b9b0e34caeb"
},
"downloads": -1,
"filename": "tsfaker-0.7.tar.gz",
"has_sig": false,
"md5_digest": "010e84c60c7f7b7ce2ddd5627738cae0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 16108,
"upload_time": "2019-10-01T14:54:15",
"url": "https://files.pythonhosted.org/packages/c0/f8/9807e05832cc90041ef7c7264c2ad93af739461b28e896e67ca2bb40341c/tsfaker-0.7.tar.gz"
}
],
"0.8": [
{
"comment_text": "",
"digests": {
"md5": "2ed76ac027c6d23832150f9679aed595",
"sha256": "3d97096a111022da973431fcd5ded2e34a37c281dbac117f099660430ec869ac"
},
"downloads": -1,
"filename": "tsfaker-0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2ed76ac027c6d23832150f9679aed595",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 22625,
"upload_time": "2019-10-01T15:31:49",
"url": "https://files.pythonhosted.org/packages/ff/43/1121b451a4319e44afa0b9823164d8cbe67c5c4f48deed52e1707a97b2ca/tsfaker-0.8-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "c3ae587d2819dcf3f06feb6eb5e8b538",
"sha256": "aac94c5f55a6ea95ec5b9d3a5c411901cbb6c8b949ee147fa05a2ca11ee9041b"
},
"downloads": -1,
"filename": "tsfaker-0.8.tar.gz",
"has_sig": false,
"md5_digest": "c3ae587d2819dcf3f06feb6eb5e8b538",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 16151,
"upload_time": "2019-10-01T15:32:26",
"url": "https://files.pythonhosted.org/packages/95/c1/67b842de725ec695336b393d60ac03ffdb08bbbbe2b5b695431c4af26c92/tsfaker-0.8.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "2ed76ac027c6d23832150f9679aed595",
"sha256": "3d97096a111022da973431fcd5ded2e34a37c281dbac117f099660430ec869ac"
},
"downloads": -1,
"filename": "tsfaker-0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2ed76ac027c6d23832150f9679aed595",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.5",
"size": 22625,
"upload_time": "2019-10-01T15:31:49",
"url": "https://files.pythonhosted.org/packages/ff/43/1121b451a4319e44afa0b9823164d8cbe67c5c4f48deed52e1707a97b2ca/tsfaker-0.8-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "c3ae587d2819dcf3f06feb6eb5e8b538",
"sha256": "aac94c5f55a6ea95ec5b9d3a5c411901cbb6c8b949ee147fa05a2ca11ee9041b"
},
"downloads": -1,
"filename": "tsfaker-0.8.tar.gz",
"has_sig": false,
"md5_digest": "c3ae587d2819dcf3f06feb6eb5e8b538",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.5",
"size": 16151,
"upload_time": "2019-10-01T15:32:26",
"url": "https://files.pythonhosted.org/packages/95/c1/67b842de725ec695336b393d60ac03ffdb08bbbbe2b5b695431c4af26c92/tsfaker-0.8.tar.gz"
}
]
}