{ "info": { "author": "Justin Bois", "author_email": "bois@caltech.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Programming Language :: Python :: 3" ], "description": "\n# Altair-catplot\n\nA utility to use Altair to generate box plots, jitter plots, and ECDFs, i.e., plots with a categorical variable where a data transformation not covered in Altair is required.\n\n## Motivation\n\n[Altair](https://altair-viz.github.io) is a Python interface for [Vega-Lite](http://vega.github.io/vega-lite). The resulting plots are easily displayed in JupyterLab and/or exported. The grammar of Vega-Lite, which is largely present in Altair, is well-defined, well-documented, and clear. This is one of many strong features of Altair and Vega-Lite.\n\nThere is always a trade-off when using high-level plotting libraries. You can rapidly make plots, but they are less configurable. The developers of Altair have (wisely, in my opinion) adhered to the grammar of Vega-Lite. If Vega-Lite does not have a feature, Altair does not try to add it.\n\nThe developers of Vega-Lite [have plans to add more functionality](https://github.com/vega/vega-lite/pull/4096/files). Indeed, in the soon-to-be-released (as of August 23, 2018) Vega-Lite 3.0, box plots are included. Adding a jitter transform is also planned. It would be useful to be able to conveniently make jitter and box plots with the current features of Vega-Lite and Altair. I wrote Altair-catplot to fill in this gap until the functionality is implemented in Vega-Lite and Altair.\n\nThe box plots and jitter plots I have in mind apply to the case where one axis is quantitative and the other axis is nominal or ordinal (that is, categorical). So, we are making plots with one categorical variable and one quantitative. Hence the name, Altair-catplot.\n\n## Installation\n\nYou can install altair-catplot using pip.
You will need to have a recent version of Altair and all of its dependencies installed.\n\n    pip install altair_catplot\n\n## Usage\n\nI will import Altair-catplot as `altcat`, and while I'm at it will import the other modules we need.\n\n\n```python\nimport numpy as np\nimport pandas as pd\n\nimport altair as alt\nimport altair_catplot as altcat\n```\n\nEvery plot is made using the `altcat.catplot()` function. It has the following call signature.\n\n```python\ncatplot(data=None,\n        height=Undefined,\n        width=Undefined,\n        mark=Undefined,\n        encoding=Undefined,\n        transform=None,\n        sort=Undefined,\n        jitter_width=0.2,\n        box_mark=Undefined,\n        whisker_mark=Undefined,\n        box_overlay=False,\n        **kwargs)\n```\n\nThe `data`, `mark`, `encoding`, and `transform` arguments must all be provided. The `data`, `mark`, and `encoding` arguments are the same as for `alt.Chart()`. Note that these are specified as constructor attributes, not as you would using Altair's more idiomatic methods like `mark_point()`, `encode()`, etc.\n\nIn this package, I consider a box plot, jitter plot, or ECDF to be a transform of the data, as each is constructed by performing some aggregation or transformation of the data. The exception is the box plot in Vega-Lite 3.0+, whose specification treats `boxplot` as a mark.\n\nThe utility is best shown by example, so below I present several.\n\n## Sample data\n\nTo demonstrate usage, I will first create a data frame with sample data for plotting.\n\n\n```python\nnp.random.seed(4288233)\n\ndata = {'data ' + str(i): np.random.normal(*musig, size=50)\n            for i, musig in enumerate(zip([0, 1, 2, 3], [1, 1, 2, 3]))}\n\ndf = pd.DataFrame(data=data).melt()\ndf['dummy metadata'] = np.random.choice(['poodle', 'beagle', 'collie', 'dalmation', 'terrier'],\n                                        size=len(df))\n\ndf.head()\n```\n\n\n\n\n
|   | variable | value | dummy metadata |
|---|---|---|---|
| 0 | data 0 | 1.980946 | collie |
| 1 | data 0 | -0.442286 | dalmation |
| 2 | data 0 | 1.093249 | terrier |
| 3 | data 0 | -0.233622 | collie |
| 4 | data 0 | -0.799315 | dalmation |
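
To make the idea of an ECDF as a transform of the data more concrete, here is a minimal sketch in plain Python that computes ECDF values per category from long-format rows like those above. It is only an illustration of the kind of transformation involved, not how altair-catplot does it internally; `ecdf_by_category` is a hypothetical helper, and the input is just the five rows printed above.

```python
from collections import defaultdict


def ecdf_by_category(rows):
    '''Map each category to a sorted list of (value, ecdf) pairs.

    Hypothetical helper for illustration only; not part of altair-catplot.
    '''
    # Collect the quantitative values belonging to each category.
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)

    result = {}
    for category, values in groups.items():
        ordered = sorted(values)
        n = len(ordered)
        # The i-th smallest value (counting from 1) gets ECDF value i / n.
        result[category] = [(v, (i + 1) / n) for i, v in enumerate(ordered)]
    return result


# The (variable, value) pairs from the first five rows of the data frame.
rows = [
    ('data 0', 1.980946),
    ('data 0', -0.442286),
    ('data 0', 1.093249),
    ('data 0', -0.233622),
    ('data 0', -0.799315),
]

ecdfs = ecdf_by_category(rows)
for value, ecdf in ecdfs['data 0']:
    print(value, ecdf)
```

Each category ends up with its own set of (value, ECDF) points, which is why the categorical variable matters for this kind of plot.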