{ "info": { "author": "Onno Valkering", "author_email": "onnovalkering@users.noreply.github.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy" ], "description": "\n# Corruptor\n[![PyPI](https://img.shields.io/pypi/v/corruptor.svg)](https://pypi.org/project/corruptor)\n[![PyPI - License](https://img.shields.io/pypi/l/corruptor.svg)](https://pypi.org/project/corruptor)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/corruptor.svg)](https://pypi.org/project/corruptor)\n\nWant to realistically corrupt your (textual) data? Use Corruptor!\n\n```shell\npip install corruptor\n```\n\nThe supported type of corruptions:\n\n- OCR variation\n- Phonetic variation\n- Typing error\n- Edit (insert, delete, replace, swap)\n\n## Getting started\nThere are three different classes that can be used.\n\n### `BasicCorruptor`\nThe basic corruptor provides methods for each type of corruption, using default configuration.\n\n```python\n>>> from corruptor import BasicCorruptor\n>>> basic = BasicCorruptor()\n>>> basic.ocr('johnson')\n'johnst0n'\n>>> basic.phonetic('johnson')\n'johnzon'\n>>> basic.typo('johnson')\n'johhson'\n>>> basic.insert('johnson')\n'johnsson'\n>>> basic.delete('johnson')\n'jhnson'\n>>> basic.replace('johnson')\n'johnsin'\n>>> basic.swap('johnson')\n'johnsno'\n```\n\n### `ProbabilisticCorruptor`\nThis class selects the type of corruption at random, based on provided weights.\n\n```python\n>>> from corruptor import ProbabilisticCorruptor\n>>> prob = ProbabilisticCorruptor({'none': 0.33, 'phonetic': 0.33, 'typo': 0.33})\n>>> prob.corrupt('conner')\n'conner'\n>>> prob.corrupt('conner')\n'conneah'\n>>> prob.corrupt('conner')\n'conber'\n```\n\n### `DataFrameCorruptor`\nIn short, the DataFrame corruptor randomly corrupts `n` rows of a pandas DataFrame.\n\n```python\n>>> import pandas as pd\n>>> from corruptor import DataFrameCorruptor\n>>> df = pd.DataFrame({'firstname': ['frank', 'john'], 'lastname': ['johnson', 'conner']})\n>>> dfc = DataFrameCorruptor({'firstname': (0.5, prob), 'lastname': (0.5, prob)})\n>>> dfc.corrupt(df, n=2)\n firstname lastname\n0 frahk johnson\n1 john conber\n```\n\n\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/onnovalkering/corruptor", "keywords": "", "license": "MPL-2.0", "maintainer": "", "maintainer_email": "", "name": "corruptor", "package_url": "https://pypi.org/project/corruptor/", "platform": "", "project_url": "https://pypi.org/project/corruptor/", "project_urls": { "Homepage": "https://github.com/onnovalkering/corruptor" }, "release_url": "https://pypi.org/project/corruptor/0.3.1/", "requires_dist": [ "future" ], "requires_python": ">=3.6.0", "summary": "Realistic data corruption (based on GeCo).", "version": "0.3.1" }, "last_serial": 3899265, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": "e0bc3f3d8c5a98d07a86d7389061fc4e", "sha256": "0608a5bb5b3f3ff82bceb2da2b3f494f4634238c520caa9c28a41baab19b931d" }, "downloads": -1, "filename": "corruptor-0.2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e0bc3f3d8c5a98d07a86d7389061fc4e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 8767, "upload_time": "2018-05-19T10:28:38", "url": "https://files.pythonhosted.org/packages/01/54/0b775490dd43148489fc23bcb078fdf14885d6115e8e7850f07db789ce0f/corruptor-0.2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4edd5473b17fc23d3f09ccb89e65786d", "sha256": "bf846b77eea64c52039affefd9ddb8c8f2611c7198062e9519c52ddb64a4d86a" }, "downloads": -1, "filename": "corruptor-0.2.0.tar.gz", "has_sig": false, "md5_digest": "4edd5473b17fc23d3f09ccb89e65786d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 8919, "upload_time": "2018-05-19T10:28:40", "url": "https://files.pythonhosted.org/packages/ef/a4/273bce5f7ed791e3397e4f963900523faaa76a6145340b9b22e5d5dcd0fe/corruptor-0.2.0.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "7c40bfc4484d17b05661500a7b321671", "sha256": "8db8f56b1e791dd6b92b2595f79c1b2c0835136b7dec495cd27edc4b783fdbf0" }, "downloads": -1, "filename": "corruptor-0.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "7c40bfc4484d17b05661500a7b321671", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 10085, "upload_time": "2018-05-25T14:54:09", "url": "https://files.pythonhosted.org/packages/cb/0f/f1c70beee5e0ffab30c369049b36676e160042c8a31d2ea7fcd28355e322/corruptor-0.3.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c6847570d82856e3115e7bdeb10563e5", "sha256": "c340197278b03968b66f511aa0322650b558d3e3dcda7d4cb33e34cf2b14929e" }, "downloads": -1, "filename": "corruptor-0.3.0.tar.gz", "has_sig": false, "md5_digest": "c6847570d82856e3115e7bdeb10563e5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 9607, "upload_time": "2018-05-25T14:54:11", "url": "https://files.pythonhosted.org/packages/07/bf/d6722eb31222441836f8f02312dc9d39068af196a8d5695a7721041a83db/corruptor-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "d858455b35871e3182f525411c69bb08", "sha256": "bd4f9c787df0c93c318e2a125244ca915f47642b9abc4b1f862cefaadd05cf87" }, "downloads": -1, "filename": "corruptor-0.3.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d858455b35871e3182f525411c69bb08", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 28359, "upload_time": "2018-05-25T15:14:47", "url": "https://files.pythonhosted.org/packages/32/ba/0edf1ddc828d36c8fa7a2a218ca8e367a21f2a866954f6c03330e8bab26b/corruptor-0.3.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e9ac7b4316f42f76e313324f89817a4d", "sha256": "0895067b3adca967ea752b8f7b803facd104b75e062a0219b85e4304c21890f1" }, "downloads": -1, "filename": "corruptor-0.3.1.tar.gz", "has_sig": false, "md5_digest": "e9ac7b4316f42f76e313324f89817a4d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 27289, "upload_time": "2018-05-25T15:14:49", "url": "https://files.pythonhosted.org/packages/c4/9a/fe594c4b1f357d14b19fb0b0377d48f5d59c89104f000c9c632f61bc4f96/corruptor-0.3.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d858455b35871e3182f525411c69bb08", "sha256": "bd4f9c787df0c93c318e2a125244ca915f47642b9abc4b1f862cefaadd05cf87" }, "downloads": -1, "filename": "corruptor-0.3.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d858455b35871e3182f525411c69bb08", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 28359, "upload_time": "2018-05-25T15:14:47", "url": "https://files.pythonhosted.org/packages/32/ba/0edf1ddc828d36c8fa7a2a218ca8e367a21f2a866954f6c03330e8bab26b/corruptor-0.3.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e9ac7b4316f42f76e313324f89817a4d", "sha256": "0895067b3adca967ea752b8f7b803facd104b75e062a0219b85e4304c21890f1" }, "downloads": -1, "filename": "corruptor-0.3.1.tar.gz", "has_sig": false, "md5_digest": "e9ac7b4316f42f76e313324f89817a4d", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 27289, "upload_time": "2018-05-25T15:14:49", "url": "https://files.pythonhosted.org/packages/c4/9a/fe594c4b1f357d14b19fb0b0377d48f5d59c89104f000c9c632f61bc4f96/corruptor-0.3.1.tar.gz" } ] }