{
    "info": {
        "author": "Ralph Puzon",
        "author_email": "puzonralph@gmail.com",
        "bugtrack_url": null,
        "classifiers": [],
        "description": "# rosebud\n\n```rosebud``` is an open-source tool for pulling CSVs into python, and immediately extracting basic statistics (mean, median, mode, quartile\ndata, etc.) into variables, and has the capability of plotting these preliminary statistics via seaborn pairplots. rosebud also uses the visual capabilities of [missingno](https://github.com/ResidentMario/missingno) to show a visual representation of the missing data within the CSVs.\n\nrosebud is currently a single-handed project, and welcomes the community for contributions.  \n\n## Installation\n\nCurrent implementation will require user to download the script into the working directory, and import into script by using\n\n```python\nfrom rosebud import *\n```\nrosebud will be installable via the [pip](https://pip.pypa.io/en/stable/) package manager in the future.\n\n##### Required packages:\n- os\n- numpy\n- pandas\n- matplotlib.pyplot\n- seaborn\n- [missingno](https://github.com/ResidentMario/missingno)\n-  warnings\n\n## Usage\n\n### tablesandstats()\n```python\ntablesandstats(filepath, show_plots = 'all')\n```\n\nrosebud's ```tablesandstats()``` function is the libraries main function that does the following:\n- Import the .csv file into your workspace, and turn the file into a DataFrame.\n- Create the statistic variables of the DataFrame, following the pattern name ```columnName_indexName```, derived from python's ```df.summary()``` function.\n- Generate a correlation matrix heatmap of the DataFrame.\n- Generate a \"missing value ratio\" grid for your columns, which shows what percentage of the data are null/NaN values.\n- Creates a visualization of the missing valuves via missingno's ```missingo.matrix()``` function\n\n###### *for the following example, we will be using a file called 'Future_500'\n\nExample:\n\ncalling:\n```python\ntablesandvars(\"C:/Users/YourName/.../Future_500.csv\")\n```\n\nreturns:\n```text\nRosebud is creating tables and statistics from Future 500...\n\nMeasures of Center and Basic Descriptive Statistics of Future 500: \n                ID    Inception    Employees        Profit\ncount  500.000000   499.000000   498.000000  4.980000e+02\nmean   250.500000  2010.174349   148.610442  6.539474e+06\nstd    144.481833     3.228211   397.353657  3.869934e+06\nmin      1.000000  1999.000000     1.000000  1.243400e+04\n25%    125.750000  2009.000000    27.250000  3.272074e+06\n50%    250.500000  2011.000000    56.000000  6.513366e+06\n75%    375.250000  2012.000000   126.000000  9.303951e+06\nmax    500.000000  2014.000000  7125.000000  1.962453e+07\n\nFeature Data Types of Future 500:\nID             int64\nName          object\nIndustry      object\nInception    float64\nEmployees    float64\nState         object\nCity          object\nRevenue       object\nExpenses      object\nProfit       float64\nGrowth        object\n\n\nFeature Corrrelations of Future 500:\n```\n![Future_500_correlation_hmap](https://github.com/RalphPuzon/rosebud/blob/master/rosebud_readme_images/Future_500_correlation_hmap.jpg)\n\n```text\nDataset completeness of Future 500:\n\n- Future 500 missing value ratio (percentage):\n\nID           0.0\nName         0.0\nIndustry     0.4\nInception    0.2\nEmployees    0.4\nState        0.8\nCity         0.0\nRevenue      0.4\nExpenses     0.6\nProfit       0.4\nGrowth       0.2\n\n- Visual representation of missing value ratio:\n```\n![Future_500_data_completeness](https://github.com/RalphPuzon/rosebud/blob/master/rosebud_readme_images/Future_500_data_completeness.jpg)\n```\nTables created:\n* Future_500\n* Future_500_Normalized\n```\nvariable example from above process:\n```python\nprint(NAICS_count)\n231985.0\n```\n#### tablesandstats() parameters:\n- filepath = the directory of the .csv file\n- show_plots = you can choose which specific plots are shown on screen. takes the following values: \n\n  - 'all': show all charts\n  - 'none': show no charts\n  - 'heatmap': show correlation grid only\n  - 'completeness': show missing data visualization only\n\n### processfolder()\n```python\nprocessfolder(folderpath, show_plots = 'all')\n```\n```processfolder()``` is a wrapper for ```tablesandstats()``` which allows you to perform the ```tablesandstats()``` function on all .csv files in a folder. This function takes in the same parameters as ```tablesandstats()```, but expects a folder path containing the .csv files, instead of the individual file path.\n\n### survey()\n```python\nsurvey(filepath, filter_by = 'all', regress = False)\n```\n```survey()``` takes in the file path, normalizes data, and performs pair plotting of the features as determined by correlation grid, stratified by levels of correlation.\nThe function also prints out the pairwise correlations stratification of the features. \n\n###### *for the following example, we will be using a file called 'data_numsOnly'\n\nExample:\n\ncalling:\n```python\nsurvey(\"C:/Users/YourName/.../data_numsOnly.csv\", filter_by = 'strong_pos')\n```\n\nreturns:\n```text\nRosebud is surveying out the data in data numsOnly (note: graphical scale is derived from a normalized data set)\n\n!! NOTE: data numsOnly contains NaN values, which may affect true correlation value !!\n\nStrong positive correlations:\n\n[['Establishments' 'Average Employment']\n ['Establishments' 'Total Wage']\n ['Average Employment' 'Total Wage']]\n\nWeak positive correlations:\n\n[['NAICS' 'Year']]\n\nFeatures with no correlations:\n\n[['NAICS' 'Establishments']\n ['NAICS' 'Average Employment']\n ['NAICS' 'Total Wage']\n ['NAICS' 'Annual Average Salary']\n ['Year' 'Establishments']\n ['Year' 'Average Employment']\n ['Year' 'Total Wage']\n ['Year' 'Annual Average Salary']\n ['Establishments' 'Annual Average Salary']\n ['Establishments' 'Years Active']\n ['Average Employment' 'Annual Average Salary']\n ['Average Employment' 'Years Active']\n ['Total Wage' 'Annual Average Salary']\n ['Total Wage' 'Years Active']\n ['Annual Average Salary' 'Years Active']]\n\nWeak Negative correlations:\n\n[['NAICS' 'Years Active']]\n\nStrong Negative correlations:\n\n[['Year' 'Years Active']]\n\nPairwise relationship graphs of strong positive correlation features:\n```\n![data_numsOnly_Strong_Positive_Feature_Correlations](https://github.com/RalphPuzon/rosebud/blob/master/rosebud_readme_images/data_numsOnly_Strong_Positive_Feature_Correlations.jpg)\n\n#### survey() parameters:\n- filepath = the directory of the .csv file\n- filter_by = select the stata of correlation you want plotted:\n  - 'all' = plot all correlations (note: large datasets will take a long time for visualization)\n  - 'strong_pos' = strong positive correlation\n  - 'weak_pos' = weak positive correlation\n  - 'no_corr' No correlation\n  - 'weak_neg' = weak negative correlation\n  - 'strong_neg' = strong negative correlation\n\n- regress = include a best-fit line to the pairplots\n\n\n## Contributing\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\n## License\n[MIT](https://choosealicense.com/licenses/mit/)\n```\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/ralphpuzon/rosebud",
        "keywords": "data,data visualization,data exploration,data analysis,missing data,data science,pandas,python,jupyter",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "rosebud",
        "package_url": "https://pypi.org/project/rosebud/",
        "platform": "",
        "project_url": "https://pypi.org/project/rosebud/",
        "project_urls": {
            "Homepage": "https://github.com/ralphpuzon/rosebud"
        },
        "release_url": "https://pypi.org/project/rosebud/0.1/",
        "requires_dist": [
            "numpy",
            "pandas",
            "matplotlib",
            "seaborn",
            "missingno"
        ],
        "requires_python": "",
        "summary": "A python data exploration and analytics package",
        "version": "0.1"
    },
    "last_serial": 6004580,
    "releases": {
        "0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "e83a4e427a63c13484eea98350af1d5b",
                    "sha256": "68b9512aad5bcb32de3dcbdc2a199a23bacd9ba2496e4f72b85beae4ad67a413"
                },
                "downloads": -1,
                "filename": "rosebud-0.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "e83a4e427a63c13484eea98350af1d5b",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 8340,
                "upload_time": "2019-10-20T20:25:10",
                "url": "https://files.pythonhosted.org/packages/0c/a5/ae60bc40f79cf22d4fd709d521b2c6e5d978755efa7cc08226f4552bc4fc/rosebud-0.1-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "25845401c2b605885f278d2f59fbb306",
                    "sha256": "3e7ea33b393779c309d313d628180240e2392866dbba8c526731f9b3aa392c13"
                },
                "downloads": -1,
                "filename": "rosebud-0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "25845401c2b605885f278d2f59fbb306",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 7509,
                "upload_time": "2019-10-20T20:25:13",
                "url": "https://files.pythonhosted.org/packages/c9/88/510d41244ea4a4f91026b9cb8299bd1ef0be9c73b833d14caf05216e6f93/rosebud-0.1.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "e83a4e427a63c13484eea98350af1d5b",
                "sha256": "68b9512aad5bcb32de3dcbdc2a199a23bacd9ba2496e4f72b85beae4ad67a413"
            },
            "downloads": -1,
            "filename": "rosebud-0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e83a4e427a63c13484eea98350af1d5b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 8340,
            "upload_time": "2019-10-20T20:25:10",
            "url": "https://files.pythonhosted.org/packages/0c/a5/ae60bc40f79cf22d4fd709d521b2c6e5d978755efa7cc08226f4552bc4fc/rosebud-0.1-py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "25845401c2b605885f278d2f59fbb306",
                "sha256": "3e7ea33b393779c309d313d628180240e2392866dbba8c526731f9b3aa392c13"
            },
            "downloads": -1,
            "filename": "rosebud-0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "25845401c2b605885f278d2f59fbb306",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7509,
            "upload_time": "2019-10-20T20:25:13",
            "url": "https://files.pythonhosted.org/packages/c9/88/510d41244ea4a4f91026b9cb8299bd1ef0be9c73b833d14caf05216e6f93/rosebud-0.1.tar.gz"
        }
    ]
}