{ "info": { "author": "Sterling Paramore", "author_email": "sterling.paramore@insidetrack.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "# pandas-mapper\n\nThe pandas-mapper is a Python pacckage that provides a concise syntax\nfor applying mapping transformations to\n[Pandas](http://pandas.pydata.org/) Dataframes commonly required for\nETL workflows. Possibly the biggest benefit to using pandas-mapper\nover the native\n[pandas.Dataframe.apply](http://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.apply.html)\nmethod is a robust error handling mechanism. Instead of raising an\nerror, mapping errors can be redirected to an errors Dataframe, which\ncan then be handled by the user as needed.\n\n\n## Getting started\n\nTo get started, install pandas-mapper in your project using pip\n\n```\npip install pandas-mapper\n```\n\nand then use it your project by importing the package\n\n```\nimport pandas_mapper\n```\n\nWhen you import this package in your project, it adds the `mapping` method to Pandas dataframe\nobjects. Suppose you had a dataframe containing integers and the English word for the integer\nand you want to translate the names to Spanish.\n\n```python\nimport pandas as pd\nimport pandas_mapper\n\ndf = pd.DataFrame(\n {\n 'num': [1, 2, 3],\n 'name': ['one', 'two', 'three'],\n 'num_name': ['1-one', '2-two', '3-three']\n }\n)\n\n```\n\n| num | name | num_name |\n| - | - | - |\n| 1 | one | 1-one |\n| 2 | two | 2-two |\n| 3 | three | 3-three |\n\n\nA stupidly-simple translation method might be\n\n```python\ndef translate(val):\n if val == 1:\n return 'uno'\n elif val == 2:\n return 'dos'\n elif val == 3:\n return 'tres'\n else:\n raise ValueError('Unknown translation: {}'.format(val))\n```\n\nThe translation can be accomplished using pandas-mapper via\n\n```python\nmapper = df.mapping([('num', 'translated', translate)])\ntranslated_df = mapper.mapped\n```\n\nThe first argument of the mapping method is a list of tuples, where\nthe first element of the tuple is the source column(s), the second element\nis the target column(s), and the (optional) third element is the\ntransform. In this example, we only have one map in the list, so the result is\na dataframe with a single column:\n\n| translated |\n| - |\n| uno |\n| dos |\n| tres |\n\n\n\n## Handling errors\n\nOur stupidly-simple translation will raise an error if we supply it with the number `4`. Suppose\nwe added another record with the number `4` to our `df` defined above. If we apply the same\nmapping as above, pandas-mapper will raise a `ValueError`. However, if we supply the `mapping`\nmethod with the `on_error='redirect'` option via\n\n```\nmapper = df.mapping([('num', 'translated', translate)], on_error='redirect')\ntranslated_df = mapper.mapped\ntranslation_errors_df = mapper.errors\n```\n\nthen we get two dataframes, one with the translated records (`mapper.mapped`):\n\n| translated |\n| - |\n| uno |\n| dos |\n| tres |\n\nand another with the error records (`mapper.errors`):\n\n| num | name | num_name | __errror__ |\n| - | - | - | - |\n| 4 | four | 4-four | {'msg': 'ValueError(4): Unknown translation: 4... |\n\n\n## Mapping cardinalities\n\nEach map can be defined with 0 or more sources, 0 or more targets, and\n0 or 1 transform functions. The expected arguments and return values of the\ntransform function is depenedent on the number of source and target columns used.\n\n* If the mapping has a single source and single target columns, then the\n transform function should accept a single value and return a single value.\n* If the mapping involves multiple source columns, then the\n function should accept a single dict-like object where the\n keys are the names of the source columns. In this case, if the mapping\n has a single target, then the retun value of the transform function should contain\n a single value. However, if the mapping has multiple targets, then the return\n value should be the same dict-like object that was passed to the function, with\n the target-column keys of that object have been modified in place by the function.\n* If the mapping has no source columns, then the transform can either be a constant\n (e.g., the integer 5), or a function that accepts no arguments but returns a value\n (which may be useful if you want to use a random number generator).\n\n### Examples\n\n#### Zero-to-one\n\nZero-to-one mappings can be used to set a column to a constant:\n\n```python\ndf.mapping([None, 'five', transform=5])\n```\n\nOr defined by some function that generates output:\n\n```python\nimport random\ndf.mapping([(None, 'rando', random.random)]).mapped\n```\n\n#### One-to-one\nOur translation function defined above is an example of a one-to-one transform:\n\n```python\ndf.mapping([('num', 'translated', translate)])\n```\n\n#### Many-to-one\nConcatenation is an example of a many-to-one operation:\n\n```python\ndf.mapping([(['num', 'name'], 'num-name', lambda row: '-'.join(row.apply(str)))])\n```\n\n#### Many-to-Many\nDeconcatenation is an example of a one-to-many operation. One-to-many operations\nrequire the same method signature as many-to-many:\n\n```python\ndef deconcatenate(row):\n split_values = row['num_name'].split('-')\n row[num'] = split_values[0]\n row[name'] = split_values[1]\n return row\n\ndf.mapping([('num_name', ['num', 'name'], deconcatenate)])\n```\n\n\n## Other options\n\nThe mapping method also supports an `inplace` option, which is `False` by default. This\nwill modify the dataframe in place, bringing along all columns that it started with. For example:\n\n```python\ndf.mapping([('num', 'translated', translate)], inplace=True).mapped\n```\n\n| num | name | num_name | translated |\n| - | - | - | - |\n| 1 | one | 1-one | uno |\n| 2 | two | 2-two | dos |\n| 3 | three | 3-three | tres |\n\nModifying the dataframe inplace can be useful when you need to chain together transformations,\nlike when the output of one map in needed as the input for another map.\n\n## Contributor Setup\n\nDownload and install the [docker community edition](https://www.docker.com/)\nthat is appropriate for your host os.\n\nWe use [invoke](http://www.pyinvoke.org/) to setup and control the environment\nfor developing and testing this project. This will require that you install\ninvoke in your host OS. You may be able to get away with just running\n`pip install invoke`. However, the recommended method is to download and install\n[miniconda](https://conda.io/miniconda.html). Then, create a project-specific\nenvironment and install invoke in this environment:\n\n```\nconda create --name pandas-mapper python=3.6\nsource activate pandas-mapper\npip install invoke\n```\n\n**Note**: if you use miniconda, you will have to run `source activate pandas-mapper`\neach time you start a new terminal session.\n\nOnce invoke is installed, you can build the docker containers to use the dev/test environment\n\n```\ninv build\n```\n\nSpin up the dev/test environment via\n\n```\ninv up --jupyter-port=8892\n```\n\nRun the test suite via\n\n```\ninv test\n```", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/inside-track/pandas-mapper", "keywords": "etl data workflows pandas", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pandas-mapper", "package_url": "https://pypi.org/project/pandas-mapper/", "platform": "", "project_url": "https://pypi.org/project/pandas-mapper/", "project_urls": { "Homepage": "https://github.com/inside-track/pandas-mapper" }, "release_url": "https://pypi.org/project/pandas-mapper/0.1.2/", "requires_dist": null, "requires_python": ">=3", "summary": "Pandas Mapper", "version": "0.1.2" }, "last_serial": 4224633, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "6848a7c0cd09c82e7abc32d6a0a32fb0", "sha256": "0cd78f04f9c4535e91301db9dd8f4386eef93d35b43ea1c5bf711ce8a6ea5b94" }, "downloads": -1, "filename": "pandas-mapper-0.1.0.tar.gz", "has_sig": false, "md5_digest": "6848a7c0cd09c82e7abc32d6a0a32fb0", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 7870, "upload_time": "2018-06-18T23:50:19", "url": "https://files.pythonhosted.org/packages/17/e3/d520be291f985cb26d08a2c50bdc72d3e7b2ef3e26d1ba68a23215c1ecd5/pandas-mapper-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "e137480d592b537d2571eb7c41dd465e", "sha256": "6d2a76ac342330d340fb3fdfb7dd954bc40171d75a3a083a8301167529488bef" }, "downloads": -1, "filename": "pandas-mapper-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e137480d592b537d2571eb7c41dd465e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 7858, "upload_time": "2018-06-27T00:02:53", "url": "https://files.pythonhosted.org/packages/68/80/93fc6103a6fa88b119c87d5ee04c3f0866e6e39fa014a735a8327d2187ba/pandas-mapper-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "14559860bee7cb53065fd514a940b19c", "sha256": "4c97c6f9476b4f509dde6518a455a467f61f06a03452f9ad0790b79bc4be8589" }, "downloads": -1, "filename": "pandas-mapper-0.1.2.tar.gz", "has_sig": false, "md5_digest": "14559860bee7cb53065fd514a940b19c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 7919, "upload_time": "2018-08-31T00:11:48", "url": "https://files.pythonhosted.org/packages/fd/8e/fea2a3e15235ad208dc45f8cd89ea6e5bc8055eb93c4285fa4c2625270eb/pandas-mapper-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "14559860bee7cb53065fd514a940b19c", "sha256": "4c97c6f9476b4f509dde6518a455a467f61f06a03452f9ad0790b79bc4be8589" }, "downloads": -1, "filename": "pandas-mapper-0.1.2.tar.gz", "has_sig": false, "md5_digest": "14559860bee7cb53065fd514a940b19c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 7919, "upload_time": "2018-08-31T00:11:48", "url": "https://files.pythonhosted.org/packages/fd/8e/fea2a3e15235ad208dc45f8cd89ea6e5bc8055eb93c4285fa4c2625270eb/pandas-mapper-0.1.2.tar.gz" } ] }