{ "info": { "author": "Oskar Hollmann", "author_email": "oskar@hollmann.me", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6", "Topic :: Database :: Front-Ends" ], "description": "pgantomizer\n===========\n\n.. image:: https://travis-ci.org/asgeirrr/pgantomizer.svg?branch=master\n :target: https://travis-ci.org/asgeirrr/pgantomizer\n\n.. image:: https://coveralls.io/repos/github/asgeirrr/pgantomizer/badge.svg?branch=master\n :target: https://coveralls.io/github/asgeirrr/pgantomizer\n\n.. image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg\n :target: https://github.com/asgeirrr/pgantomizer/blob/master/LICENSE\n\n.. image:: https://badge.fury.io/py/pgantomizer.svg\n :target: https://badge.fury.io/py/pgantomizer\n\nAnonymize data in your PostgreSQL dababase with ease. Anonymization is handy if you need to provide data to\npeople that should not have access to the personal information of the users.\nImporting the data to third-party tools where you cannot guarantee what will happen to the data is also a common use case.\nThis tool will come in handy when GDPR will take effect in EU-countries.\n\n\nAnonymization Process\n---------------------\n\nThe rules for anonynimization are written in a single YAML file.\nColumns that should be left in the raw form without anonymization must be explicitly marked in the schema.\nThis ensures that adding the new column in the DB without thinking about its sensitivity does not leak the data.\nThe default name of the primary key is `id` but a custom one can be specified form the table in the schema.\nPrimary key is NOT anonymized by default.\n\nA sample YAML schema can be examined below.\n\n.. code:: yaml\n\n customer:\n raw: [language, currency]\n pk: customer_id\n customer_address:\n raw: [country, customer_id]\n custom_rules:\n address_line: aggregate_length\n\nSometimes it is needed to use a different anonymization function for a particular column.\nIt can be specified in the `custom_rules` directive (see example above).\nThere is a limited set of functions you can choose from. So far\n\n* **aggregate_length** - replaces content of the column with its length (can be used on any type that supports length function)\n\n\nCalling pgantomizer from the Command Line\n-----------------------------------------\n\n**pgantomizer_dump** is a helper script that dumps tables specified in the YAML schema file to a compressed file using `pg_dump`.\nJust pass the path to the schema and the DB connection details.\nMinimal working example taking advantage of default values of some of the required parameters:\n\n.. code:: bash\n\n pgantomizer_dump --schema my_schema.yaml --dbname original_postgres --user alaric\n\nTo see a list of all parameters, run:\n\n.. code:: bash\n\n pgantomizer_dump -h\n\nThe script is able to take the DB connection details from environmental variables\nfollowing the conventions of running Django in Docker. The presumed variable names are:\n`DB_DEFAULT_NAME`, `DB_DEFAULT_USER`, `DB_DEFAULT_PASS`, `DB_DEFAULT_SERVICE`, `DB_DEFAULT_PORT`.\n\n**pgantomizer** is the main script that loads the Postgre dump into a specified instance. Then all columns\nexcept primary keys and the ones specified in the schema as `raw` are anonymized according to their data type.\nFinally, the dump file is deleted by default to reduce risk of leakage of unanonymized data.\nThe connection details of the Postgres instance where the anonymized data should be loaded can be passed as arguments\n\n.. code:: bash\n\n pgantomizer --schema my_schema.yaml --dump-file ./to_anonymize.sql --dbname anonymized_postgres --user alaric --password anonymized_pass --host localhost --port 5432\n\nor through environmental variables with following names:\n`ANONYMIZED_DB_NAME`, `ANONYMIZED_DB_USER`, `ANONYMIZED_DB_PASS`, `ANONYMIZED_DB_HOST`, `ANONYMIZED_DB_PORT`.\n\n\nCalling pgantomizer from Python\n-------------------------------\n\nUse **dump_db** and **load_anonymize_remove** functions to dump anonymize the data from Python.\nIn the following example, DB connections for the original and anonymized instance are specified via ENV variables described above.\n\n.. code:: python\n\n from pgantomizer import dump_db, load_anonymize_remove\n\n dump_db('to_anonymize.sql', 'anonymization_schema.yaml')\n load_anonymize_remove('to_anonymize.sql', 'anonymization_schema.yaml')\n\nBoth functions have an optional **db_args** argument to pass the connection arguments explicitly in a dict.\nSee the example below how the dict should look like.\n\nIf you are only after anonymizing an existing database, there is a function `anonymize_db`\nthat will help you do that with a little extra work of parsing the YAML schema.\n\n.. code:: python\n\n import yaml\n\n from pgantomizer import anonymize_db\n\n anonymize_db(yaml.load(open('anonymization_schema.yaml')), {\n 'dbname': 'anonymized_postgres',\n 'user': 'alaric',\n 'password': 'anonymized_pass',\n 'host': 'localhost',\n 'port': '5432',\n })\n\nIf you would like to use environmental variables instead, use function `anonymize.get_db_args_from_env`\nto construct the dict from ENV.\n\n\nTODO\n----\n* expand this README\n* submit package automatically to PyPI\n* add --dry-run argument that will check the schema and output the operations to be performed\n* remove password argument and use `getpass` instead for better security", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/asgeirrr/pgantomizer", "keywords": "postgres anonymization dump", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "pgantomizer", "package_url": "https://pypi.org/project/pgantomizer/", "platform": "", "project_url": "https://pypi.org/project/pgantomizer/", "project_urls": { "Homepage": "https://github.com/asgeirrr/pgantomizer" }, "release_url": "https://pypi.org/project/pgantomizer/0.1.1/", "requires_dist": null, "requires_python": "", "summary": "Anonymize data in your PostgreSQL dababase with ease.", "version": "0.1.1" }, "last_serial": 3097758, "releases": { "0.0.6": [ { "comment_text": "", "digests": { "md5": "ffe0a13ed970136357efdd15fe27478e", "sha256": "8ebfe612563a94a090476d262156ecd1d65845688001d59a375a69134ea0f4dd" }, "downloads": -1, "filename": "pgantomizer-0.0.6.tar.gz", "has_sig": false, "md5_digest": "ffe0a13ed970136357efdd15fe27478e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7496, "upload_time": "2017-07-30T12:12:26", "url": "https://files.pythonhosted.org/packages/12/4a/1eda3eb49701379b4b15b8a3fcc75801632c94e25d3c5abe361ef6f1210b/pgantomizer-0.0.6.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "f1c52fceea06ac62bf050e3053b84e91", "sha256": "1321be7bf04867b9cce75e6c0e364ddd652fa268f508abf7f390a72a00783a9b" }, "downloads": -1, "filename": "pgantomizer-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "f1c52fceea06ac62bf050e3053b84e91", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13346, "upload_time": "2017-08-15T10:21:33", "url": "https://files.pythonhosted.org/packages/e7/78/ae2930ac4fc6523092f22996ecd2a6f68d5ee617f72b749679a5ab8b5a22/pgantomizer-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fcdaec47e166dcff2bec65eb5e1e853c", "sha256": "32c031b77c8a9b0c17cc9e8e10584f355682a91182c08075f99a3d86c4f2ed2c" }, "downloads": -1, "filename": "pgantomizer-0.1.0.tar.gz", "has_sig": false, "md5_digest": "fcdaec47e166dcff2bec65eb5e1e853c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8316, "upload_time": "2017-08-15T10:21:35", "url": "https://files.pythonhosted.org/packages/8a/44/29495d61ab1df5a020d7a933476a1811c7eb94236b511254b1135eaa65f3/pgantomizer-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "84351d14442125c8d7c14aa3090a9842", "sha256": "f20e4ca61c43a56d7ee9f2588a74409e9dd32b6f49ccaabf8033e0d42071077e" }, "downloads": -1, "filename": "pgantomizer-0.1.1.tar.gz", "has_sig": false, "md5_digest": "84351d14442125c8d7c14aa3090a9842", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8311, "upload_time": "2017-08-15T10:36:14", "url": "https://files.pythonhosted.org/packages/ef/97/5296278309111819fe676e63b04f2bb2d9289ec5d2cb468f4746e3f4bec9/pgantomizer-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "84351d14442125c8d7c14aa3090a9842", "sha256": "f20e4ca61c43a56d7ee9f2588a74409e9dd32b6f49ccaabf8033e0d42071077e" }, "downloads": -1, "filename": "pgantomizer-0.1.1.tar.gz", "has_sig": false, "md5_digest": "84351d14442125c8d7c14aa3090a9842", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8311, "upload_time": "2017-08-15T10:36:14", "url": "https://files.pythonhosted.org/packages/ef/97/5296278309111819fe676e63b04f2bb2d9289ec5d2cb468f4746e3f4bec9/pgantomizer-0.1.1.tar.gz" } ] }