{
"info": {
"author": "Ted Petrou",
"author_email": "ted@dunderdata.com",
"bugtrack_url": null,
"classifiers": [
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3"
],
"description": "\n# How to use pandas_cub\n\nThe README.ipynb notebook will serve as the documentation and usage guide to pandas_cub.\n\n## Installation\n\n`pip install pandas-cub`\n\n## What is pandas_cub?\npandas_cub is a simple data analysis library that emulates the functionality of the pandas library. The library is not meant for serious work. It was built as an assignment for one of Ted Petrou's Python classes. If you would like to complete the assignment on your own, visit [this repository][1]. There are about 40 steps and 100 tests that you must pass in order to rebuild the library. It is a good challenge and teaches you the fundamentals of how to build your own data analysis library.\n\n## pandas_cub functionality\n\npandas_cub has limited functionality but is still capable of a wide variety of data analysis tasks.\n\n* Subset selection with the brackets\n* Arithmetic and comparison operators (+, -, <, !=, etc...)\n* Aggregation of columns with most of the common functions (min, max, mean, median, etc...)\n* Grouping via pivot tables\n* String-only methods for columns containing strings\n* Reading in simple comma-separated value files\n* Several other methods\n\n\n## pandas_cub DataFrame\n\npandas_cub has a single main object, the DataFrame, to hold all of the data. The DataFrame is capable of holding 4 data types - booleans, integers, floats, and strings. All data is stored in NumPy arrays. panda_cub DataFrames have no index (as in pandas). The columns must be strings.\n\n### Missing value representation\nBoolean and integer columns will have no missing value representation. The NumPy NaN is used for float columns and the Python None is used for string columns.\n\n## Code Examples\n\npandas_cub syntax is very similar to pandas, but implements much fewer methods. 
The examples below cover nearly all of the API.\n\n[1]: https://github.com/tdpetrou/pandas_cub\n\n### Reading data with `read_csv`\n\npandas_cub provides a single function, `read_csv`, which takes a single parameter: the location of the file you would like to read in as a DataFrame. This function can only handle simple CSV files, and the delimiter must be a comma. A sample employee dataset is provided in the data directory. Notice that the visual output of the DataFrame is nearly identical to that of a pandas DataFrame. The `head` method returns the first 5 rows by default.\n\n\n```python\nimport pandas_cub as pdc\n```\n\n\n```python\ndf = pdc.read_csv('data/employee.csv')\ndf.head()\n```\n\n\n\n\n
|  | dept | race | gender | salary |
|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 |
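
To make the column types concrete, here is a minimal sketch of how a simple comma-separated file could be turned into one NumPy array per column. It is not pandas_cub's source: the helper name `read_simple_csv`, the int-then-float-then-string inference order, and the assumptions that there are no missing values and no boolean columns are all illustrative. Python's `csv` module does the splitting here, which also copes with quoted fields, something the real `read_csv` may not do.

```python
import csv

import numpy as np


def read_simple_csv(lines):
    # Hypothetical helper: `lines` is any iterable of text lines (an open file works).
    reader = csv.reader(lines)  # csv.reader splits on commas by default
    header = next(reader)
    raw = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            raw[name].append(value)

    data = {}
    for name, values in raw.items():
        # Try the narrowest numeric type first; fall back to strings.
        for convert in (int, float):
            try:
                data[name] = np.array([convert(v) for v in values])
                break
            except ValueError:
                continue
        else:
            # No numeric conversion worked: store strings in an object array.
            data[name] = np.array(values, dtype=object)
    return data
```

Passing an open file, for example `read_simple_csv(open('data/employee.csv'))`, would return a dict mapping each column name to its array.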
\n\n\n\n### DataFrame properties\n\nThe `shape` property returns a tuple of the number of rows and columns.\n\n\n```python\ndf.shape\n```\n\n\n\n\n (1535, 4)\n\n\n\nThe `len` function returns just the number of rows.\n\n\n```python\nlen(df)\n```\n\n\n\n\n 1535\n\n\n\nThe `dtypes` property returns a DataFrame of the column names and their respective data types.\n\n\n```python\ndf.dtypes\n```\n\n\n\n\n
|  | Column Name | Data Type |
|---|---|---|
| 0 | dept | string |
| 1 | race | string |
| 2 | gender | string |
| 3 | salary | int |
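
Since every column lives in a NumPy array, a `dtypes` table like the one above can be derived from each array's `dtype.kind` code. A minimal sketch, assuming a plain dict of 1-D arrays stands in for the DataFrame and that strings are stored in object arrays; the names `KIND_TO_NAME` and `dtypes_table` are hypothetical.

```python
import numpy as np

# NumPy dtype.kind codes for the four supported types ('O' = object, used for strings)
KIND_TO_NAME = {'b': 'bool', 'i': 'int', 'f': 'float', 'O': 'string'}


def dtypes_table(data):
    # Hypothetical helper: `data` maps column name -> 1-D NumPy array.
    names = list(data)
    kinds = [KIND_TO_NAME[data[name].dtype.kind] for name in names]
    return {
        'Column Name': np.array(names, dtype=object),
        'Data Type': np.array(kinds, dtype=object),
    }


table = dtypes_table({
    'dept': np.array(['HPD', 'HFD'], dtype=object),
    'salary': np.array([45279, 63166]),
})
print(table['Data Type'].tolist())  # ['string', 'int']
```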
\n\n\n\nThe `columns` property returns a list of the column names.\n\n\n```python\ndf.columns\n```\n\n\n\n\n ['dept', 'race', 'gender', 'salary']\n\n\n\nRename all of the columns by assigning a list of new names to the `columns` property.\n\n\n```python\ndf.columns = ['department', 'race', 'gender', 'salary']\ndf.head()\n```\n\n\n\n\n
|  | department | race | gender | salary |
|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 |
\n\n\n\nThe `values` property returns a single NumPy array of all the data.\n\n\n```python\ndf.values\n```\n\n\n\n\n array([['Houston Police Department-HPD', 'White', 'Male', 45279],\n ['Houston Fire Department (HFD)', 'White', 'Male', 63166],\n ['Houston Police Department-HPD', 'Black', 'Male', 66614],\n ...,\n ['Houston Police Department-HPD', 'White', 'Male', 43443],\n ['Houston Police Department-HPD', 'Asian', 'Male', 55461],\n ['Houston Fire Department (HFD)', 'Hispanic', 'Male', 51194]],\n dtype=object)\n\n\n\n### Subset selection\n\nSubset selection is handled with the brackets. To select a single column, place the column name in the brackets.\n\n\n```python\ndf['race'].head()\n```\n\n\n\n\n
|  | race |
|---|---|
| 0 | White |
| 1 | White |
| 2 | Black |
| 3 | Asian |
| 4 | White |
\n\n\n\nSelect multiple columns with a list of strings.\n\n\n```python\ndf[['race', 'salary']].head()\n```\n\n\n\n\n
|  | race | salary |
|---|---|---|
| 0 | White | 45279 |
| 1 | White | 63166 |
| 2 | Black | 66614 |
| 3 | Asian | 71680 |
| 4 | White | 42390 |
\n\n\n\nTo select rows and columns simultaneously, pass the brackets the row selection followed by the column selection, separated by a comma. Here we use integers for the rows and strings for the columns.\n\n\n```python\nrows = [10, 50, 100]\ncols = ['salary', 'race']\ndf[rows, cols]\n```\n\n\n\n\n
|  | salary | race |
|---|---|---|
| 0 | 77076 | Black |
| 1 | 81239 | White |
| 2 | 81239 | White |
\n\n\n\nYou can use integers for the columns as well.\n\n\n```python\nrows = [10, 50, 100]\ncols = [2, 0]\ndf[rows, cols]\n```\n\n\n\n\n
|  | gender | department |
|---|---|---|
| 0 | Male | Houston Police Department-HPD |
| 1 | Male | Houston Police Department-HPD |
| 2 | Male | Houston Police Department-HPD |
\n\n\n\nYou can also use a single integer instead of a list.\n\n\n```python\ndf[99, 3]\n```\n\n\n\n\n\n\n\n\nOr a single string for the columns.\n\n\n```python\ndf[99, 'salary']\n```\n\n\n\n\n\n\n\n\nYou can use a slice for the rows.\n\n\n```python\ndf[20:100:10, ['race', 'gender']]\n```\n\n\n\n\n
|  | race | gender |
|---|---|---|
| 0 | White | Male |
| 1 | White | Male |
| 2 | Hispanic | Male |
| 3 | White | Male |
| 4 | White | Male |
| 5 | Hispanic | Male |
| 6 | Hispanic | Male |
| 7 | Black | Female |
\n\n\n\nYou can also slice the columns with either integers or strings. A slice of column names includes its stop column.\n\n\n```python\ndf[20:100:10, :2]\n```\n\n\n\n\n
|  | department | race |
|---|---|---|
| 0 | Houston Police Department-HPD | White |
| 1 | Houston Fire Department (HFD) | White |
| 2 | Houston Police Department-HPD | Hispanic |
| 3 | Houston Police Department-HPD | White |
| 4 | Houston Fire Department (HFD) | White |
| 5 | Houston Police Department-HPD | Hispanic |
| 6 | Houston Fire Department (HFD) | Hispanic |
| 7 | Houston Police Department-HPD | Black |
\n\n\n\n\n```python\ndf[20:100:10, 'department':'gender']\n```\n\n\n\n\n
|  | department | race | gender |
|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male |
| 1 | Houston Fire Department (HFD) | White | Male |
| 2 | Houston Police Department-HPD | Hispanic | Male |
| 3 | Houston Police Department-HPD | White | Male |
| 4 | Houston Fire Department (HFD) | White | Male |
| 5 | Houston Police Department-HPD | Hispanic | Male |
| 6 | Houston Fire Department (HFD) | Hispanic | Male |
| 7 | Houston Police Department-HPD | Black | Female |
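
The row and column rules above can be sketched with plain NumPy indexing. This is a minimal illustration, not pandas_cub's implementation: the DataFrame is modeled as a dict of equal-length arrays, `resolve_columns` and `select` are hypothetical names, and a string slice includes its stop column to match the `'department':'gender'` output above.

```python
import numpy as np


def resolve_columns(names, sel):
    # Turn a column selection (int, str, list, or slice) into a list of names.
    if isinstance(sel, (int, str)):
        sel = [sel]
    if isinstance(sel, slice):
        start, stop = sel.start, sel.stop
        if isinstance(start, str):
            start = names.index(start)
        if isinstance(stop, str):
            stop = names.index(stop) + 1  # label slices include the stop column
        return names[start:stop:sel.step]
    return [names[c] if isinstance(c, int) else c for c in sel]


def select(data, rows, cols):
    # Hypothetical helper mimicking df[rows, cols] on a dict of arrays.
    if isinstance(rows, int):
        rows = [rows]  # keep a one-row result as an array rather than a scalar
    return {name: data[name][rows] for name in resolve_columns(list(data), cols)}
```

For example, `select(data, slice(20, 100, 10), slice('department', 'gender'))` corresponds to `df[20:100:10, 'department':'gender']`.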
\n\n\n\nYou can do boolean selection if you pass the brackets a one-column boolean DataFrame.\n\n\n```python\nfilt = df['salary'] > 100000\nfilt.head()\n```\n\n\n\n\n
|  | salary |
|---|---|
| 0 | False |
| 1 | False |
| 2 | False |
| 3 | False |
| 4 | False |
\n\n\n\n\n```python\ndf[filt].head()\n```\n\n\n\n\n
|  | department | race | gender | salary |
|---|---|---|---|---|
| 0 | Public Works & Engineering-PWE | White | Male | 107962 |
| 1 | Health & Human Services | Black | Male | 180416 |
| 2 | Houston Fire Department (HFD) | Hispanic | Male | 165216 |
| 3 | Health & Human Services | White | Female | 100791 |
| 4 | Houston Airport System (HAS) | White | Male | 120916 |
\n\n\n\n\n```python\ndf[filt, ['race', 'salary']].head()\n```\n\n\n\n\n
|  | race | salary |
|---|---|---|
| 0 | White | 107962 |
| 1 | Black | 180416 |
| 2 | Hispanic | 165216 |
| 3 | White | 100791 |
| 4 | White | 120916 |
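
Boolean selection amounts to indexing every column with the same boolean mask. A minimal sketch, assuming a dict of 1-D NumPy arrays stands in for the DataFrame and a bare boolean array stands in for the one-column boolean DataFrame; `boolean_select` is a hypothetical name.

```python
import numpy as np


def boolean_select(data, mask):
    # Hypothetical helper: keep the rows where `mask` is True, in every column.
    mask = np.asarray(mask)
    if mask.dtype.kind != 'b':
        raise TypeError('boolean selection requires a boolean mask')
    return {name: values[mask] for name, values in data.items()}


data = {
    'race': np.array(['White', 'Black', 'Hispanic'], dtype=object),
    'salary': np.array([45279, 180416, 165216]),
}
high = boolean_select(data, data['salary'] > 100000)
print(high['race'].tolist())  # ['Black', 'Hispanic']
```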
\n\n\n\n### Assigning Columns\n\nYou can only create an entire new column or overwrite an existing one; you cannot assign to a subset of the data. Assign a new column with a single value like this:\n\n\n```python\ndf['bonus'] = 1000\ndf.head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus |
|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 1000 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 | 1000 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 | 1000 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 | 1000 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 | 1000 |
\n\n\n\nYou can also assign a NumPy array with the same length as the DataFrame.\n\n\n```python\nimport numpy as np\ndf['bonus'] = np.random.randint(100, 5000, len(df))\ndf.head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus |
|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 3536 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 | 1296 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 | 511 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 | 4267 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 | 3766 |
\n\n\n\nYou can also assign a one-column DataFrame, such as the result of adding two columns together.\n\n\n```python\ndf['salary'] + df['bonus']\n```\n\n\n\n\n
|  | salary |
|---|---|
| 0 | 48815 |
| 1 | 64462 |
| 2 | 67125 |
| 3 | 75947 |
| 4 | 46156 |
| 5 | 110001 |
| 6 | 53738 |
| 7 | 185348 |
| 8 | 32575 |
| 9 | 57918 |
| ... | ... |
| 1525 | 32936 |
| 1526 | 49294 |
| 1527 | 34218 |
| 1528 | 82795 |
| 1529 | 104900 |
| 1530 | 46408 |
| 1531 | 67050 |
| 1532 | 47368 |
| 1533 | 60013 |
| 1534 | 52624 |
\n\n\n\n\n```python\ndf['total salary'] = df['salary'] + df['bonus']\ndf.head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 3536 | 48815 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 | 1296 | 64462 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 | 511 | 67125 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 | 4267 | 75947 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 | 3766 | 46156 |
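
The assignment rules above (a single value fills the whole column, while an array or one-column DataFrame must supply one value per row) can be sketched as follows. This assumes a dict of 1-D NumPy arrays in place of the DataFrame; `set_column` is a hypothetical name, and raising `ValueError` on a length mismatch is an assumption, not documented pandas_cub behavior.

```python
import numpy as np


def set_column(data, name, value):
    # Hypothetical helper mimicking df[name] = value on a dict of arrays.
    n_rows = len(next(iter(data.values())))
    if isinstance(value, str):
        value = np.full(n_rows, value, dtype=object)  # strings live in object arrays
    elif np.ndim(value) == 0:
        value = np.full(n_rows, value)  # broadcast a single number to every row
    else:
        value = np.asarray(value)
        if value.ndim != 1 or len(value) != n_rows:
            raise ValueError('new column must have one value per row')
    data[name] = value  # adds a new column or overwrites an existing one


data = {'salary': np.array([45279, 63166, 66614])}
set_column(data, 'bonus', 1000)
set_column(data, 'total salary', data['salary'] + data['bonus'])
print(data['total salary'].tolist())  # [46279, 64166, 67614]
```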
\n\n\n\n### Arithmetic and comparison operators\n\nOperators are applied element-wise to every value in the DataFrame.\n\n\n```python\ndf1 = df[['salary', 'bonus']] * 5\ndf1.head()\n```\n\n\n\n\n
|  | salary | bonus |
|---|---|---|
| 0 | 226395 | 17680 |
| 1 | 315830 | 6480 |
| 2 | 333070 | 2555 |
| 3 | 358400 | 21335 |
| 4 | 211950 | 18830 |
\n\n\n\n\n```python\ndf1 = df[['salary', 'bonus']] > 100000\ndf1.head()\n```\n\n\n\n\n
|  | salary | bonus |
|---|---|---|
| 0 | False | False |
| 1 | False | False |
| 2 | False | False |
| 3 | False | False |
| 4 | False | False |
\n\n\n\n\n```python\ndf1 = df['race'] == 'White'\ndf1.head()\n```\n\n\n\n\n
|  | race |
|---|---|
| 0 | True |
| 1 | True |
| 2 | False |
| 3 | False |
| 4 | True |
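
Because every column is a NumPy array, one way to get the element-wise results above is to apply the operator column by column. A minimal sketch using the standard `operator` module, assuming a dict of arrays in place of the DataFrame; `apply_operator` is a hypothetical name, and a DataFrame class could route methods such as `__mul__` or `__gt__` through a helper like it.

```python
import operator

import numpy as np


def apply_operator(data, op, other):
    # Hypothetical helper: apply a binary operator to every column.
    return {name: op(values, other) for name, values in data.items()}


data = {'salary': np.array([45279, 63166]), 'bonus': np.array([3536, 1296])}
times_five = apply_operator(data, operator.mul, 5)
print(times_five['bonus'].tolist())  # [17680, 6480]
is_high = apply_operator(data, operator.gt, 100000)
print(is_high['salary'].tolist())  # [False, False]
```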
\n\n\n\n### Aggregation\n\nMost of the common aggregation methods are available. They aggregate down each column, never across rows.\n\n\n```python\ndf.min()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Health & Human Services | Asian | Female | 24960 | 101 | 25913 |
\n\n\n\nColumns for which the aggregation does not work, such as string columns for the mean, are dropped.\n\n\n```python\ndf.mean()\n```\n\n\n\n\n
|  | salary | bonus | total salary |
|---|---|---|---|
| 0 | 56278.746 | 2594.283 | 58873.029 |
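
One way to drop the columns where an aggregation does not apply is to attempt the aggregation on each column and skip any column that raises an error. A minimal sketch, assuming a dict of arrays and NumPy reduction functions; `aggregate` is a hypothetical name, and the exact exceptions caught are an assumption.

```python
import numpy as np


def aggregate(data, func):
    # Hypothetical helper: reduce each column to a one-element array.
    result = {}
    for name, values in data.items():
        try:
            result[name] = np.array([func(values)])
        except (TypeError, ValueError):
            continue  # e.g. the mean of a string column is undefined
    return result


data = {
    'race': np.array(['White', 'Asian'], dtype=object),
    'salary': np.array([45279, 63166]),
}
print(list(aggregate(data, np.min)))   # ['race', 'salary']
print(list(aggregate(data, np.mean)))  # ['salary']
```

Strings can still be compared, so `np.min` keeps the string column, just as `df.min()` above does.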
\n\n\n\n\n```python\ndf.argmax()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 0 | 145 | 1516 | 145 |
\n\n\n\n\n```python\ndf['salary'].argmin()\n```\n\n\n\n\n\n\n\n\nCheck whether all salaries are greater than 20000.\n\n\n```python\ndf1 = df['salary'] > 20000\ndf1.all()\n```\n\n\n\n\n\n\n\n\nCount the number of non-missing values.\n\n\n```python\ndf.count()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | 1535 | 1535 | 1535 | 1535 | 1535 | 1535 |
\n\n\n\nGet the number of unique values.\n\n\n```python\ndf.nunique()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | 6 | 5 | 2 | 548 | 1318 | 1524 |
\n\n\n\n### Non-Aggregating Methods\n\nThese methods do not reduce each column to a single value.\n\nGet the unique values of each column. The `unique` method returns a list of DataFrames, one per column, each containing that column's unique values.\n\n\n```python\ndfs = df.unique()\n```\n\n\n```python\ndfs[0]\n```\n\n\n\n\n
|  | department |
|---|---|
| 0 | Health & Human Services |
| 1 | Houston Airport System (HAS) |
| 2 | Houston Fire Department (HFD) |
| 3 | Houston Police Department-HPD |
| 4 | Parks & Recreation |
| 5 | Public Works & Engineering-PWE |
\n\n\n\n\n```python\ndfs[1]\n```\n\n\n\n\n
|  | race |
|---|---|
| 0 | Asian |
| 1 | Black |
| 2 | Hispanic |
| 3 | Native American |
| 4 | White |
\n\n\n\n\n```python\ndfs[2]\n```\n\n\n\n\n\n\n\n\nRename columns with a dictionary that maps old names to new names.\n\n\n```python\ndf.rename({'department':'dept', 'bonus':'BONUS'}).head()\n```\n\n\n\n\n
|  | dept | race | gender | salary | BONUS | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 3536 | 48815 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 | 1296 | 64462 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 | 511 | 67125 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 | 4267 | 75947 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 | 3766 | 46156 |
\n\n\n\nDrop columns with a single column name or a list of column names.\n\n\n```python\ndf.drop('race').head()\n```\n\n\n\n\n
|  | department | gender | salary | bonus | total salary |
|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | Male | 45279 | 3536 | 48815 |
| 1 | Houston Fire Department (HFD) | Male | 63166 | 1296 | 64462 |
| 2 | Houston Police Department-HPD | Male | 66614 | 511 | 67125 |
| 3 | Public Works & Engineering-PWE | Male | 71680 | 4267 | 75947 |
| 4 | Houston Airport System (HAS) | Male | 42390 | 3766 | 46156 |
\n\n\n\n\n```python\ndf.drop(['race', 'gender']).head()\n```\n\n\n\n\n
|  | department | salary | bonus | total salary |
|---|---|---|---|---|
| 0 | Houston Police Department-HPD | 45279 | 3536 | 48815 |
| 1 | Houston Fire Department (HFD) | 63166 | 1296 | 64462 |
| 2 | Houston Police Department-HPD | 66614 | 511 | 67125 |
| 3 | Public Works & Engineering-PWE | 71680 | 4267 | 75947 |
| 4 | Houston Airport System (HAS) | 42390 | 3766 | 46156 |
\n\n\n\n### Non-aggregating methods that keep all columns\n\nThe next several methods return a DataFrame with exactly the same shape as the original. They operate only on boolean, integer, and float columns; string columns are returned unchanged.\n\nAbsolute value\n\n\n```python\ndf.abs().head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 3536 | 48815 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 | 1296 | 64462 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 | 511 | 67125 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 | 4267 | 75947 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 | 3766 | 46156 |
\n\n\n\nCumulative min, max, and sum\n\n\n```python\ndf.cummax().head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 3536 | 48815 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 | 3536 | 64462 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 | 3536 | 67125 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 | 4267 | 75947 |
| 4 | Houston Airport System (HAS) | White | Male | 71680 | 4267 | 75947 |
\n\n\n\nClip values to be within a range.\n\n\n```python\ndf.clip(40000, 60000).head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 40000 | 48815 |
| 1 | Houston Fire Department (HFD) | White | Male | 60000 | 40000 | 60000 |
| 2 | Houston Police Department-HPD | Black | Male | 60000 | 40000 | 60000 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 60000 | 40000 | 60000 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 | 40000 | 46156 |
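\n\n\n\nRound numeric columns. A negative number of digits rounds to the left of the decimal point; here, to the nearest thousand.\n\n\n```python\ndf.round(-3).head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|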
| 0 | Houston Police Department-HPD | White | Male | 45000 | 4000 | 49000 |
| 1 | Houston Fire Department (HFD) | White | Male | 63000 | 1000 | 64000 |
| 2 | Houston Police Department-HPD | Black | Male | 67000 | 1000 | 67000 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 72000 | 4000 | 76000 |
| 4 | Houston Airport System (HAS) | White | Male | 42000 | 4000 | 46000 |
\n\n\n\nCopy the DataFrame\n\n\n```python\ndf.copy().head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | 45279 | 3536 | 48815 |
| 1 | Houston Fire Department (HFD) | White | Male | 63166 | 1296 | 64462 |
| 2 | Houston Police Department-HPD | Black | Male | 66614 | 511 | 67125 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 71680 | 4267 | 75947 |
| 4 | Houston Airport System (HAS) | White | Male | 42390 | 3766 | 46156 |
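\n\n\n\nTake the difference between each value and the value a given number of rows earlier (here, 2). The first rows have no earlier value, so they become missing (`nan`).\n\n\n```python\ndf.diff(2).head(10)\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|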
| 0 | Houston Police Department-HPD | White | Male | nan | nan | nan |
| 1 | Houston Fire Department (HFD) | White | Male | nan | nan | nan |
| 2 | Houston Police Department-HPD | Black | Male | 21335.000 | -3025.000 | 18310.000 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 8514.000 | 2971.000 | 11485.000 |
| 4 | Houston Airport System (HAS) | White | Male | -24224.000 | 3255.000 | -20969.000 |
| 5 | Public Works & Engineering-PWE | White | Male | 36282.000 | -2228.000 | 34054.000 |
| 6 | Houston Fire Department (HFD) | Hispanic | Male | 10254.000 | -2672.000 | 7582.000 |
| 7 | Health & Human Services | Black | Male | 72454.000 | 2893.000 | 75347.000 |
| 8 | Public Works & Engineering-PWE | Black | Male | -22297.000 | 1134.000 | -21163.000 |
| 9 | Health & Human Services | Black | Male | -125147.000 | -2283.000 | -127430.000 |
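
The values above can be checked by hand: for `salary`, row 2 is 66614 - 45279 = 21335. The sketch below reproduces that with NumPy for a single numeric column. It assumes the shift `n` is a positive integer, and `diff_column` is a hypothetical name; the result is a float array so that the first `n` rows can hold NaN, which matches the float output above.

```python
import numpy as np


def diff_column(values, n=1):
    # Hypothetical helper: difference between each value and the one n rows earlier.
    if n < 1:
        raise ValueError('this sketch only supports n >= 1')
    out = np.full(len(values), np.nan)  # float array, so NaN is representable
    out[n:] = values[n:] - values[:-n]
    return out


salary = np.array([45279, 63166, 66614, 71680])
print(diff_column(salary, 2)[2:].tolist())  # [21335.0, 8514.0]
```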
\n\n\n\nFind the percentage change from the value a given number of rows earlier (here, 2).\n\n\n```python\ndf.pct_change(2).head(10)\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | White | Male | nan | nan | nan |
| 1 | Houston Fire Department (HFD) | White | Male | nan | nan | nan |
| 2 | Houston Police Department-HPD | Black | Male | 0.471 | -0.855 | 0.375 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 0.135 | 2.292 | 0.178 |
| 4 | Houston Airport System (HAS) | White | Male | -0.364 | 6.370 | -0.312 |
| 5 | Public Works & Engineering-PWE | White | Male | 0.506 | -0.522 | 0.448 |
| 6 | Houston Fire Department (HFD) | Hispanic | Male | 0.242 | -0.710 | 0.164 |
| 7 | Health & Human Services | Black | Male | 0.671 | 1.419 | 0.685 |
| 8 | Public Works & Engineering-PWE | Black | Male | -0.424 | 1.037 | -0.394 |
| 9 | Health & Human Services | Black | Male | -0.694 | -0.463 | -0.688 |
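
A percentage change divides each shifted difference by the earlier value: for `salary`, row 2 is (66614 - 45279) / 45279, about 0.471, as in the table above. A minimal single-column sketch, assuming a positive integer shift `n` and using the hypothetical name `pct_change_column`.

```python
import numpy as np


def pct_change_column(values, n=1):
    # Hypothetical helper: relative change from the value n rows earlier.
    if n < 1:
        raise ValueError('this sketch only supports n >= 1')
    out = np.full(len(values), np.nan)  # the first n rows stay NaN
    earlier = values[:-n]
    out[n:] = (values[n:] - earlier) / earlier
    return out


salary = np.array([45279, 63166, 66614, 71680])
print(np.round(pct_change_column(salary, 2)[2:], 3).tolist())  # [0.471, 0.135]
```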
\n\n\n\nSort the DataFrame by one or more columns.\n\n\n```python\ndf.sort_values('salary').head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | Black | Female | 24960 | 953 | 25913 |
| 1 | Public Works & Engineering-PWE | Hispanic | Male | 26104 | 4258 | 30362 |
| 2 | Public Works & Engineering-PWE | Black | Female | 26125 | 3247 | 29372 |
| 3 | Houston Airport System (HAS) | Hispanic | Female | 26125 | 832 | 26957 |
| 4 | Houston Airport System (HAS) | Black | Female | 26125 | 2461 | 28586 |
\n\n\n\nSort in descending order by setting `asc=False`.\n\n\n```python\ndf.sort_values('salary', asc=False).head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Fire Department (HFD) | White | Male | 210588 | 3724 | 214312 |
| 1 | Houston Police Department-HPD | White | Male | 199596 | 848 | 200444 |
| 2 | Houston Airport System (HAS) | Black | Male | 186192 | 1778 | 187970 |
| 3 | Health & Human Services | Black | Male | 180416 | 4932 | 185348 |
| 4 | Public Works & Engineering-PWE | White | Female | 178331 | 2124 | 180455 |
\n\n\n\nSort by multiple columns by passing a list of column names.\n\n\n```python\ndf.sort_values(['race', 'salary']).head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Airport System (HAS) | Asian | Female | 26125 | 4446 | 30571 |
| 1 | Houston Police Department-HPD | Asian | Male | 27914 | 2855 | 30769 |
| 2 | Houston Police Department-HPD | Asian | Male | 28169 | 2572 | 30741 |
| 3 | Public Works & Engineering-PWE | Asian | Male | 28995 | 2874 | 31869 |
| 4 | Public Works & Engineering-PWE | Asian | Male | 30347 | 4938 | 35285 |
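
Sorting by several columns means comparing rows by a tuple of their values, first column first. The following pure-Python sketch computes the row order, assuming a dict of 1-D arrays and no missing values (NaN and None do not compare cleanly); `sort_order` and `sort_values` here are hypothetical helpers, not pandas_cub's code. NumPy's `np.lexsort` is an alternative for numeric keys.

```python
import numpy as np


def sort_order(data, by, asc=True):
    # Hypothetical helper: row positions that sort `data` by the `by` columns.
    if isinstance(by, str):
        by = [by]
    n_rows = len(data[by[0]])
    return sorted(range(n_rows),
                  key=lambda i: tuple(data[name][i] for name in by),
                  reverse=not asc)


def sort_values(data, by, asc=True):
    # Reorder every column with the same row positions.
    order = sort_order(data, by, asc)
    return {name: values[order] for name, values in data.items()}


data = {
    'race': np.array(['White', 'Asian', 'Asian'], dtype=object),
    'salary': np.array([45279, 30347, 26125]),
}
print(sort_values(data, ['race', 'salary'])['salary'].tolist())  # [26125, 30347, 45279]
```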
\n\n\n\nRandomly sample a fixed number of rows with `n`.\n\n\n```python\ndf.sample(n=3)\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Fire Department (HFD) | White | Male | 62540 | 2995 | 65535 |
| 1 | Public Works & Engineering-PWE | White | Male | 63336 | 1547 | 64883 |
| 2 | Houston Police Department-HPD | White | Male | 52514 | 1150 | 53664 |
\n\n\n\nRandomly sample a fraction of the rows with `frac`.\n\n\n```python\ndf.sample(frac=.005)\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|
| 0 | Houston Police Department-HPD | Hispanic | Female | 60347 | 1200 | 61547 |
| 1 | Public Works & Engineering-PWE | Black | Male | 49109 | 3598 | 52707 |
| 2 | Health & Human Services | Black | Female | 48984 | 4602 | 53586 |
| 3 | Houston Police Department-HPD | White | Male | 55461 | 2813 | 58274 |
| 4 | Houston Airport System (HAS) | Black | Female | 29286 | 1877 | 31163 |
| 5 | Houston Police Department-HPD | Asian | Male | 66614 | 4480 | 71094 |
| 6 | Houston Fire Department (HFD) | White | Male | 28024 | 4475 | 32499 |
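\n\n\n\nSample with replacement by setting `replace=True`, which allows `n` to exceed the number of rows.\n\n\n```python\ndf.sample(n=10000, replace=True).head()\n```\n\n\n\n\n
|  | department | race | gender | salary | bonus | total salary |
|---|---|---|---|---|---|---|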
| 0 | Parks & Recreation | Black | Female | 31075 | 1665 | 32740 |
| 1 | Public Works & Engineering-PWE | Hispanic | Male | 67038 | 644 | 67682 |
| 2 | Houston Police Department-HPD | Black | Male | 37024 | 1532 | 38556 |
| 3 | Health & Human Services | Black | Female | 57433 | 3106 | 60539 |
| 4 | Public Works & Engineering-PWE | Black | Male | 53373 | 924 | 54297 |
\n\n\n\n### String-only methods\n\nUse the `str` accessor to call methods available only to string columns. For all of these methods, pass the name of the string column as the first argument.\n\n\n```python\ndf.str.count('department', 'P').head()\n```\n\n\n\n\n
|  | department |
|---|---|
| 0 | 2 |
| 1 | 0 |
| 2 | 2 |
| 3 | 2 |
| 4 | 0 |
\n\n\n\n\n```python\ndf.str.lower('department').head()\n```\n\n\n\n\n
|  | department |
|---|---|
| 0 | houston police department-hpd |
| 1 | houston fire department (hfd) |
| 2 | houston police department-hpd |
| 3 | public works & engineering-pwe |
| 4 | houston airport system (has) |
\n\n\n\n\n```python\ndf.str.find('department', 'Houston').head()\n```\n\n\n\n\n
|  | department |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | -1 |
| 4 | 0 |
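
The results above match Python's built-in string methods of the same names (`str.count`, `str.lower`, `str.find`). Here is a minimal sketch of applying one element-wise to an object array while leaving None, the missing-value marker for strings, untouched. The helper name `apply_str_method` is hypothetical, and this is not pandas_cub's source.

```python
import numpy as np


def apply_str_method(values, method, *args):
    # Hypothetical helper: call str.<method>(*args) on each non-missing value.
    out = [None if v is None else getattr(v, method)(*args) for v in values]
    return np.array(out, dtype=object)


dept = np.array(['Houston Police Department-HPD', 'Parks & Recreation', None], dtype=object)
print(apply_str_method(dept, 'count', 'P').tolist())       # [2, 1, None]
print(apply_str_method(dept, 'find', 'Houston').tolist())  # [0, -1, None]
```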
\n\n\n\n### Grouping\n\npandas_cub provides the `value_counts` method for simple frequency counting of unique values and `pivot_table` for grouping and aggregating.\n\nThe `value_counts` method returns a list of DataFrames, one for each column, with values sorted from most to least frequent.\n\n\n```python\ndfs = df[['department', 'race', 'gender']].value_counts()\n```\n\n\n```python\ndfs[0]\n```\n\n\n\n\n
|  | department | count |
|---|---|---|
| 0 | Houston Police Department-HPD | 570 |
| 1 | Houston Fire Department (HFD) | 365 |
| 2 | Public Works & Engineering-PWE | 341 |
| 3 | Health & Human Services | 103 |
| 4 | Houston Airport System (HAS) | 103 |
| 5 | Parks & Recreation | 53 |
\n\n\n\n\n```python\ndfs[1]\n```\n\n\n\n\n
|  | race | count |
|---|---|---|
| 0 | White | 542 |
| 1 | Black | 518 |
| 2 | Hispanic | 381 |
| 3 | Asian | 87 |
| 4 | Native American | 7 |
\n\n\n\n\n```python\ndfs[2]\n```\n\n\n\n\n
|  | gender | count |
|---|---|---|
| 0 | Male | 1135 |
| 1 | Female | 400 |
\n\n\n\nIf the DataFrame has a single column, a DataFrame is returned instead of a list. You can also return relative frequencies by setting the `normalize` parameter to `True`.\n\n\n```python\ndf['race'].value_counts(normalize=True)\n```\n\n\n\n\n
|  | race | count |
|---|---|---|
| 0 | White | 0.353 |
| 1 | Black | 0.337 |
| 2 | Hispanic | 0.248 |
| 3 | Asian | 0.057 |
| 4 | Native American | 0.005 |
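
The counts above are sorted from most to least frequent, and ties (such as the two departments with 103 employees) appear in alphabetical order. A minimal sketch reproduces this ordering with `np.unique(..., return_counts=True)` and a stable sort. The helper name is hypothetical, and the tie-breaking rule is inferred from the output rather than confirmed from pandas_cub's source.

```python
import numpy as np


def value_counts_column(values, normalize=False):
    # Hypothetical helper: unique values and their frequencies, most common first.
    uniques, counts = np.unique(values, return_counts=True)  # uniques come back sorted
    order = np.argsort(-counts, kind='stable')  # stable sort keeps ties alphabetical
    uniques, counts = uniques[order], counts[order]
    if normalize:
        counts = counts / counts.sum()  # relative frequencies
    return uniques, counts


gender = np.array(['Male', 'Female', 'Male', 'Male'], dtype=object)
labels, counts = value_counts_column(gender)
print(labels.tolist(), counts.tolist())  # ['Male', 'Female'] [3, 1]
```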
\n\n\n\nThe `pivot_table` method lets you group by one or two columns and aggregate the values of another column. Let's find the average salary for each combination of race and gender. All parameters must be strings.\n\n\n```python\ndf.pivot_table(rows='race', columns='gender', values='salary', aggfunc='mean')\n```\n\n\n\n\n
|  | race | Female | Male |
|---|---|---|---|
| 0 | Asian | 58304.222 | 60622.957 |
| 1 | Black | 48133.382 | 51853.000 |
| 2 | Hispanic | 44216.960 | 55493.064 |
| 3 | Native American | 58844.333 | 68850.500 |
| 4 | White | 66415.528 | 63439.196 |
\n\n\n\nIf you provide neither `values` nor `aggfunc`, it returns the frequency of each combination (a contingency table).\n\n\n```python\ndf.pivot_table(rows='race', columns='gender')\n```\n\n\n\n\n
|  | race | Female | Male |
|---|---|---|---|
| 0 | Asian | 18 | 69 |
| 1 | Black | 207 | 311 |
| 2 | Hispanic | 100 | 281 |
| 3 | Native American | 3 | 4 |
| 4 | White | 72 | 470 |
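
Both two-column pivot tables above can be sketched by collecting the values under each (row label, column label) pair and then aggregating each group; with no `values`, counting the group members gives the contingency table. This is a minimal sketch with hypothetical names (`pivot`, `AGGFUNCS`) and a small, assumed set of `aggfunc` strings. It returns the sorted row labels, the sorted column labels, and a float matrix in which missing combinations are NaN, so counts come out as floats, unlike the integer table above.

```python
from collections import defaultdict

import numpy as np

AGGFUNCS = {'mean': np.mean, 'sum': np.sum, 'min': np.min, 'max': np.max, 'size': len}


def pivot(rows, columns, values=None, aggfunc=None):
    # Hypothetical helper: rows, columns and values are equal-length sequences.
    if values is None:
        values, aggfunc = [None] * len(rows), 'size'  # count group members
    elif aggfunc is None:
        raise ValueError('aggfunc is required when values is given')
    groups = defaultdict(list)
    for r, c, v in zip(rows, columns, values):
        groups[(r, c)].append(v)

    row_keys, col_keys = sorted(set(rows)), sorted(set(columns))
    table = np.full((len(row_keys), len(col_keys)), np.nan)
    for i, r in enumerate(row_keys):
        for j, c in enumerate(col_keys):
            if (r, c) in groups:
                table[i, j] = AGGFUNCS[aggfunc](groups[(r, c)])
    return row_keys, col_keys, table


race = ['Asian', 'Black', 'Asian']
gender = ['Male', 'Female', 'Male']
salary = [60000, 50000, 62000]
print(pivot(race, gender, salary, 'mean')[2].tolist())  # [[nan, 61000.0], [50000.0, nan]]
print(pivot(race, gender)[2].tolist())                  # [[nan, 2.0], [1.0, nan]]
```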
\n\n\n\nYou can also group by a single column, using either `rows` or `columns`.\n\n\n```python\ndf.pivot_table(rows='department', values='salary', aggfunc='mean')\n```\n\n\n\n\n
|  | department | mean |
|---|---|---|
| 0 | Health & Human Services | 51324.981 |
| 1 | Houston Airport System (HAS) | 53990.369 |
| 2 | Houston Fire Department (HFD) | 59960.441 |
| 3 | Houston Police Department-HPD | 60428.746 |
| 4 | Parks & Recreation | 39426.151 |
| 5 | Public Works & Engineering-PWE | 50207.806 |
\n\n\n\n\n```python\ndf.pivot_table(columns='department', values='salary', aggfunc='mean')\n```\n\n\n\n\n
|  | Health & Human Services | Houston Airport System (HAS) | Houston Fire Department (HFD) | Houston Police Department-HPD | Parks & Recreation | Public Works & Engineering-PWE |
|---|---|---|---|---|---|---|
| 0 | 51324.981 | 53990.369 | 59960.441 | 60428.746 | 39426.151 | 50207.806 |
\n\n\n\n\n",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/dexplo/pandas_cub",
"keywords": "",
"license": "",
"maintainer": "",
"maintainer_email": "",
"name": "pandas-cub",
"package_url": "https://pypi.org/project/pandas-cub/",
"platform": "",
"project_url": "https://pypi.org/project/pandas-cub/",
"project_urls": {
"Homepage": "https://github.com/dexplo/pandas_cub"
},
"release_url": "https://pypi.org/project/pandas-cub/0.0.7/",
"requires_dist": null,
"requires_python": "",
"summary": "A simple data analysis library similar to pandas",
"version": "0.0.7"
},
"last_serial": 4800648,
"releases": {
"0.0.1": [
{
"comment_text": "",
"digests": {
"md5": "1558040f86b7f9618e6d959d8c515f7b",
"sha256": "a4802283a8caa3f6bb690c3a1d25a3fdbec32a3611d527c839ded6d6637571d8"
},
"downloads": -1,
"filename": "pandas_cub-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1558040f86b7f9618e6d959d8c515f7b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 13291,
"upload_time": "2019-02-08T05:38:56",
"url": "https://files.pythonhosted.org/packages/2f/8c/ffc3590586ed96fb67743a720662e4024477b67c7f39ff56be78376d1c96/pandas_cub-0.0.1-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "f7ac41b724c275f517e2820641fc1566",
"sha256": "3534b9b1ca565490ab7baa825679eb453eb144087d7605725a218db4d240d6b6"
},
"downloads": -1,
"filename": "pandas_cub-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "f7ac41b724c275f517e2820641fc1566",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 11852,
"upload_time": "2019-02-08T05:38:58",
"url": "https://files.pythonhosted.org/packages/9f/c1/4d13c9370495da74ef30f0534b490dd26ec4cbe285c214ac626192841fc9/pandas_cub-0.0.1.tar.gz"
}
],
"0.0.2": [
{
"comment_text": "",
"digests": {
"md5": "4d06284b0065b109d3114f6d7388f713",
"sha256": "42774592f4c0f9c6d1b70945286a944bfdbc34e93a190b53ef2b5299d7a195da"
},
"downloads": -1,
"filename": "pandas_cub-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4d06284b0065b109d3114f6d7388f713",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19407,
"upload_time": "2019-02-08T16:07:10",
"url": "https://files.pythonhosted.org/packages/ef/7e/9b9200916463951f5490669683678a12454c316495e756fe3cd74eca1c60/pandas_cub-0.0.2-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "f51e7e7d6567a71033ae4e6a8ab92de6",
"sha256": "15102dca17ee2eb81817e72da08aff214b929d2e45d6760585ba470c12e940e3"
},
"downloads": -1,
"filename": "pandas_cub-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "f51e7e7d6567a71033ae4e6a8ab92de6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 29423,
"upload_time": "2019-02-08T16:07:12",
"url": "https://files.pythonhosted.org/packages/5e/0d/e7fd7a3d09138589787737b64b915ec904cecf651b2a8ae0c97c0435c49d/pandas_cub-0.0.2.tar.gz"
}
],
"0.0.3": [
{
"comment_text": "",
"digests": {
"md5": "d5efc3b32f839df05bf271ee8059a311",
"sha256": "7587494e2ad5a595947a549220647ba3f3fcae1eea36507668a88d36ee48eab3"
},
"downloads": -1,
"filename": "pandas_cub-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d5efc3b32f839df05bf271ee8059a311",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19420,
"upload_time": "2019-02-08T17:06:37",
"url": "https://files.pythonhosted.org/packages/4d/14/615a3d26fe85ed7d66f2d09d1e183afcca8576a1b7bfb8ee1a8dcc482f05/pandas_cub-0.0.3-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "815037587cedeae1fe8762cf9eab820b",
"sha256": "cc2a10eb9af7b40527a275c7b21f110e493b890a291b27e97a631e036ba94e16"
},
"downloads": -1,
"filename": "pandas_cub-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "815037587cedeae1fe8762cf9eab820b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 29499,
"upload_time": "2019-02-08T17:06:39",
"url": "https://files.pythonhosted.org/packages/67/ca/9b5f532af8b91e20cd4df44c602828d34b3b803413e6ce241e426149bdd2/pandas_cub-0.0.3.tar.gz"
}
],
"0.0.4": [
{
"comment_text": "",
"digests": {
"md5": "56dc4e3890bb0c731656484b5c115ebd",
"sha256": "904fd9f954278429b0f1993b63d65689f845ad3f95b0c9b684c2c8bcc9ded814"
},
"downloads": -1,
"filename": "pandas_cub-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "56dc4e3890bb0c731656484b5c115ebd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19591,
"upload_time": "2019-02-08T17:17:03",
"url": "https://files.pythonhosted.org/packages/c5/fe/af5e0ce8a142c4711898eda492caa643b8da3183eb08c42e0892a7ff1c29/pandas_cub-0.0.4-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "7b355a9c256ae24675adeddd3bffac7d",
"sha256": "f0725d6f3bc24d8a64ba7622aeb81fe34ffdfbf5f512de98d91f2c07f1225d9a"
},
"downloads": -1,
"filename": "pandas_cub-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "7b355a9c256ae24675adeddd3bffac7d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 35960,
"upload_time": "2019-02-08T17:17:04",
"url": "https://files.pythonhosted.org/packages/24/29/26b88424db6c7b6a133782160d8a519da57b69ec1c48f0c5f63e9ebafb45/pandas_cub-0.0.4.tar.gz"
}
],
"0.0.5": [
{
"comment_text": "",
"digests": {
"md5": "8fc41b8e87929e4fa53696a594bfd5ea",
"sha256": "9b1555b2d26f92fd38a1d5a5cd1d9de39077162f172a3d342c10074e067f0fd3"
},
"downloads": -1,
"filename": "pandas_cub-0.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8fc41b8e87929e4fa53696a594bfd5ea",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19430,
"upload_time": "2019-02-08T17:25:29",
"url": "https://files.pythonhosted.org/packages/f8/5d/d6602ed9202d1b92a6dbb7b3f0ea1ddd06629f5373b74373197535eb24b0/pandas_cub-0.0.5-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "0326c693a119661c3d433768cd15e93c",
"sha256": "7bf003f0fb5e98f0fe77afd523b58c3ea97ef9d31278fc37e137e2979a2cdc63"
},
"downloads": -1,
"filename": "pandas_cub-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "0326c693a119661c3d433768cd15e93c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 29541,
"upload_time": "2019-02-08T17:25:31",
"url": "https://files.pythonhosted.org/packages/1a/70/e10b1e4ab0d9462ef1c0feed1224cc5d0f45d8d4b268949b4b0c2ed9b8c7/pandas_cub-0.0.5.tar.gz"
}
],
"0.0.6": [
{
"comment_text": "",
"digests": {
"md5": "984296ba1aecea5cb5b854ddfbd56b9c",
"sha256": "7950efdaf84c984c926536d6301725c62ac3c96a0d6c7bc23275e8b574e4546a"
},
"downloads": -1,
"filename": "pandas_cub-0.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "984296ba1aecea5cb5b854ddfbd56b9c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19449,
"upload_time": "2019-02-08T17:32:26",
"url": "https://files.pythonhosted.org/packages/ae/b5/875e6b2f55158ee0ade90254fcce8c60c3366da7e4c83f5ab713a88ec99d/pandas_cub-0.0.6-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "af86d54c49384a6d293f5bc76552be54",
"sha256": "d647002bbf5951d21b805ae639c18928d342ff06f904dfceeb5660e292defa02"
},
"downloads": -1,
"filename": "pandas_cub-0.0.6.tar.gz",
"has_sig": false,
"md5_digest": "af86d54c49384a6d293f5bc76552be54",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 29627,
"upload_time": "2019-02-08T17:32:28",
"url": "https://files.pythonhosted.org/packages/76/98/6948f6177c5eb409f16c818f16113db1e87f27fc2865eb89e0cedaea6278/pandas_cub-0.0.6.tar.gz"
}
],
"0.0.7": [
{
"comment_text": "",
"digests": {
"md5": "49dae94e7b4edfb8bbee50d3ff6a657e",
"sha256": "4387cff7c1789d43fc8104f488949718ca8524a696c930860a59627706dc1762"
},
"downloads": -1,
"filename": "pandas_cub-0.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "49dae94e7b4edfb8bbee50d3ff6a657e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19726,
"upload_time": "2019-02-09T23:12:45",
"url": "https://files.pythonhosted.org/packages/63/bd/f50abfc6829676589fdd90add2a7b2c71f918875331004b5c8b70694a674/pandas_cub-0.0.7-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "12bc30ffaf6c1edac0e249404c02df87",
"sha256": "06caacbe1d804e86d58e8d2f37afdf8cf402d6ce8f2c2495ef25c16b718f643c"
},
"downloads": -1,
"filename": "pandas_cub-0.0.7.tar.gz",
"has_sig": false,
"md5_digest": "12bc30ffaf6c1edac0e249404c02df87",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 30450,
"upload_time": "2019-02-09T23:12:47",
"url": "https://files.pythonhosted.org/packages/f8/1f/32165664749f3cfcaa14a03443e65ed264e624894675180262e710e09d9a/pandas_cub-0.0.7.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "49dae94e7b4edfb8bbee50d3ff6a657e",
"sha256": "4387cff7c1789d43fc8104f488949718ca8524a696c930860a59627706dc1762"
},
"downloads": -1,
"filename": "pandas_cub-0.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "49dae94e7b4edfb8bbee50d3ff6a657e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 19726,
"upload_time": "2019-02-09T23:12:45",
"url": "https://files.pythonhosted.org/packages/63/bd/f50abfc6829676589fdd90add2a7b2c71f918875331004b5c8b70694a674/pandas_cub-0.0.7-py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "12bc30ffaf6c1edac0e249404c02df87",
"sha256": "06caacbe1d804e86d58e8d2f37afdf8cf402d6ce8f2c2495ef25c16b718f643c"
},
"downloads": -1,
"filename": "pandas_cub-0.0.7.tar.gz",
"has_sig": false,
"md5_digest": "12bc30ffaf6c1edac0e249404c02df87",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 30450,
"upload_time": "2019-02-09T23:12:47",
"url": "https://files.pythonhosted.org/packages/f8/1f/32165664749f3cfcaa14a03443e65ed264e624894675180262e710e09d9a/pandas_cub-0.0.7.tar.gz"
}
]
}