{ "info": { "author": "Shashi Deep Dulam", "author_email": "shashidulam83@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.7" ], "description": "# Data Analysis Library Sharks\n\nThis repository contains a detailed note of a new data analysis library Sharks.\n\n\n\n## Objectives\n\nSharks :\n\n* Has A DataFrame class with data stored in numpy arrays\n* Can Select subsets of data with the brackets operator\n* Can Use special methods defined in the Python data model\n* Have a nicely formatted display of the DataFrame in the notebook\n* Can Implement aggregation methods - sum, min, max, mean, median, etc...\n* Can Implement non-aggregation methods such as isna, unique, rename, drop\n* Can Group by one or two columns\n* Have methods specific to string columns\n* Can Read in data from a comma-separated value file\n\n\n\n## Functionalities of Sharks\n\n\n### 1. DataFrame constructor input types\n\nOur DataFrame class is constructed with a single parameter.\n\nSpecifically, input types must qualify the following:\n\n* Sharks will raise a `TypeError` if `data` is not a dictionary\n* Sharks will raise a `TypeError` if the keys of `data` are not strings\n* Sharks will raise a `TypeError` if the values of `data` are not numpy arrays\n* Sharks will raise a `ValueError` if the values of `data` are not 1-dimensional\n\n\n\n\n### 2. Array lengths\n\nWe are now guaranteed that `data` is a dictionary of strings mapped to one-dimensional arrays. Each column of data in our DataFrame must have the same number of elements. \n\n### 3. Convert unicode arrays to object\n\nWhenever you create a numpy array of Python strings, it will default the data type of that array to unicode. Take a look at the following simple numpy array created from strings. Its data type, found in the `dtype` attribute is shown to be 'U' plus the length of the longest string.\n\n```python\n>>> a = np.array(['cat', 'dog', 'snake'])\n>>> a.dtype\ndtype('>> s = df['a'] > 10\n>>> df[s]\n```\n\n\n### 14. Check for simultaneous selection of rows and columns\n\n\n\n### 15. Select a single cell of data\n\nSharks can select a single cell of data with `df[rs, cs]`. We will assume `rs` is an integer and `cs` is either an integer or a string.\n\n\n### 16. Simultaneously select rows as booleans, lists, or slices\n\nSharks can select rows and columns simultaneously with `df[rs, cs]`. We will allow `rs` to be either a single-column boolean DataFrame, a list of integers, or a slice. \n\n### 17. Simultaneous selection with multiple columns as a list\n\n\n### 18. Simultaneous selection with column slices\n\nSharks will allow columns to be sliced with either strings or integers. The following selections will be acceptable.\n\n\n### 19. Tab Completion for column names\n\n\n### 20. Create a new column or overwrite an old column\n\n\n### 21. `head` and `tail` methods\n\nThe `head` and `tail` methods each accept a single integer parameter `n` which is defaulted to 5. \n\n### 22. Generic aggregation methods\n\nSharks can implement several methods that perform an aggregation. These methods all return a single value for each column. The following aggregation methods are defined.\n\n* min\n* max\n* mean\n* median\n* sum\n* var\n* std\n* all\n* any\n* argmax - index of the maximum\n* argmin - index of the minimum\n\n\n### 23. `isna` method\n\nThe `isna` method will return a DataFrame the same shape as the original but with boolean values for every single value. Each value will be tested whether it is missing or not. Use `np.isnan` except in the case for strings which you can use a vectorized equality expression to `None`.\n\n### 24. `count` method\n\nThe `count` method returns a single-row DataFrame with the number of non-missing values for each column. You will want to use the result of `isna`.\n\n\n\n### 25. `unique` method\n\nThis method will return the unique values for each column in the DataFrame. Specifically, it will return a list of one-column DataFrames of unique values in each column. If there is a single column, just return the DataFrame.\n\n\n\n### 26. `nunique` method\n\nReturn a single-row DataFrame with the number of unique values for each column.\n\n\n### 27. `value_counts` method\n\n\n### 28. Normalize options for `value_counts`\n\n\n### 29. `rename` method\n\nThe `rename` method renames one or more column names. Accept a dictionary of old column names mapped to new column names. Return a DataFrame. Raise a `TypeError` if `columns` is not a dictionary.\n\n\n\n### 30. `drop` method\n\nAccept a single string or a list of column names as strings. Return a DataFrame without those columns. Raise a `TypeError` if a string or list is not provided.\n\n\n\n### 31. Non-aggregation methods\n\nThere are several non-aggregation methods that function similarly. All of the following non-aggregation methods return a DataFrame that is the same shape as the origin.\n\n* `abs`\n* `cummin`\n* `cummax`\n* `cumsum`\n* `clip`\n* `round`\n* `copy`\n\n\n\n\n\n### 32. Arithmetic and Comparison Operators\n\nAll the common arithmetic and comparison operators will be made available to our DataFrame. For example, `df + 5` uses the plus operator to add 5 to each element of the DataFrame. Take a look at some of the following examples:\n\n```python\ndf + 5\ndf - 5\ndf > 5\ndf != 5\n5 + df\n5 < df\n```\n\n\n### 33. `sort_values` method\n\nThis method will sort the rows of the DataFrame by one or more columns. Allow the parameter `by` to be either a single column name as a string or a list of column names as strings. The DataFrame will be sorted by this column or columns.\n\n\n\n### 34. `pivot_table` method\n\n\n### 35. Reading simple CSVs\n\nThe `read_csv` function, will read in simple comma-separated value files (CSVs) and return a DataFrame.\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Shashideep83/shark_lib", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "sharks", "package_url": "https://pypi.org/project/sharks/", "platform": "", "project_url": "https://pypi.org/project/sharks/", "project_urls": { "Homepage": "https://github.com/Shashideep83/shark_lib" }, "release_url": "https://pypi.org/project/sharks/1.0.2/", "requires_dist": [ "numpy" ], "requires_python": "", "summary": "A Python package for data analysis.", "version": "1.0.2" }, "last_serial": 5831346, "releases": { "1.0.1": [ { "comment_text": "", "digests": { "md5": "25b1f5d9156efe8271200486475b76bc", "sha256": "43cf031e2da68528baa9b352ddbe723dac4fc4847d08b604d2c8173ca213fbcc" }, "downloads": -1, "filename": "sharks-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "25b1f5d9156efe8271200486475b76bc", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11578, "upload_time": "2019-09-15T09:53:56", "url": "https://files.pythonhosted.org/packages/40/71/1ea6c7fd369c8bcdbf640ea9aa46632f0e393dc5dfbb32d0c755b7a4f424/sharks-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2e194016b330f82504740fcb60ff1805", "sha256": "9b82c68fea1dfb52d0fa31969df09109a3f70e22d5c0ac4ee301e0b985a2e2b7" }, "downloads": -1, "filename": "sharks-1.0.1.tar.gz", "has_sig": false, "md5_digest": "2e194016b330f82504740fcb60ff1805", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13786, "upload_time": "2019-09-15T09:53:59", "url": "https://files.pythonhosted.org/packages/f3/da/93ac900fc7fc6d70dc1464b8e18bf9c7968acb37a2b1e3085d1aff99a70a/sharks-1.0.1.tar.gz" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "30001b8964afaaac5ed6eeb95cc77dc6", "sha256": "730b2d818d10a436d65ca0fc0552026277b438aed3722ed325f2e7b72cce1e41" }, "downloads": -1, "filename": "sharks-1.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "30001b8964afaaac5ed6eeb95cc77dc6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11575, "upload_time": "2019-09-15T10:08:11", "url": "https://files.pythonhosted.org/packages/97/c1/6840e8b7eac7b11b40b2fe568444df4cfbe31fe497135de3b72a77c1e915/sharks-1.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "53eac988def538c468d5d98f3e009d7d", "sha256": "073e5b0ddcca400b193f97b77bd5e737aedaa37cf60481f8f5deab73db80182b" }, "downloads": -1, "filename": "sharks-1.0.2.tar.gz", "has_sig": false, "md5_digest": "53eac988def538c468d5d98f3e009d7d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13770, "upload_time": "2019-09-15T10:08:12", "url": "https://files.pythonhosted.org/packages/c2/1e/47803e24de2b9f5ce3f22c23be6c70439765cb7fcf406ea668b0e372ce90/sharks-1.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "30001b8964afaaac5ed6eeb95cc77dc6", "sha256": "730b2d818d10a436d65ca0fc0552026277b438aed3722ed325f2e7b72cce1e41" }, "downloads": -1, "filename": "sharks-1.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "30001b8964afaaac5ed6eeb95cc77dc6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11575, "upload_time": "2019-09-15T10:08:11", "url": "https://files.pythonhosted.org/packages/97/c1/6840e8b7eac7b11b40b2fe568444df4cfbe31fe497135de3b72a77c1e915/sharks-1.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "53eac988def538c468d5d98f3e009d7d", "sha256": "073e5b0ddcca400b193f97b77bd5e737aedaa37cf60481f8f5deab73db80182b" }, "downloads": -1, "filename": "sharks-1.0.2.tar.gz", "has_sig": false, "md5_digest": "53eac988def538c468d5d98f3e009d7d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13770, "upload_time": "2019-09-15T10:08:12", "url": "https://files.pythonhosted.org/packages/c2/1e/47803e24de2b9f5ce3f22c23be6c70439765cb7fcf406ea668b0e372ce90/sharks-1.0.2.tar.gz" } ] }