{ "info": { "author": "Martin Thoma", "author_email": "info@martin-thoma.de", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development", "Topic :: Utilities" ], "description": "[![Python Support](https://img.shields.io/pypi/pyversions/edapy.svg)](https://pypi.org/project/edapy/)\n[![Build Status](https://travis-ci.org/MartinThoma/edapy.svg?branch=master)](https://travis-ci.org/MartinThoma/edapy)\n[![Coverage Status](https://coveralls.io/repos/github/MartinThoma/edapy/badge.svg?branch=master)](https://coveralls.io/github/MartinThoma/edapy?branch=master)\n\nedapy is a first resource to analyze a new dataset.\n\n## Installation\n\n```\n$ pip install git+https://github.com/MartinThoma/edapy.git\n```\n\nFor the pdf part, you also need `pdftotext`:\n\n```\n$ sudo apt-get install poppler-utils\n```\n\n\n## Usage\n\n```\n$ edapy --help\nUsage: edapy [OPTIONS] COMMAND [ARGS]...\n\n edapy is a tool for exploratory data analysis with Python.\n\n You can use it to get a first idea what a CSV is about or to get an\n overview over a directory of PDF files.\n\nOptions:\n --version Show the version and exit.\n --help Show this message and exit.\n\nCommands:\n csv Analyze CSV files.\n images Analyze image files.\n pdf Analyze PDF files.\n```\n\nThe workflow is as follows:\n\n* `edapy pdf find --path . --output results.csv` creates a `results.csv`\n for you. This `results.csv` contains metadata about all PDF files in the\n `path` directory.\n* `edapy csv predict --csv_path my-new.csv --types types.yaml` will start /\n resume a process in which the user is led through a series of questions.
In\n those questions, the user has to decide which delimiter and quotechar are used\n and which types the columns have.\n* `edapy` generates a `types.yaml` file which can be used to load the CSV in\n other applications with `df = edapy.load_csv(csv_path, yaml_path)`.\n\n\n## Example types.yaml\n\nFor the [Titanic Dataset](https://www.kaggle.com/c/titanic/data), the resulting\n`types.yaml` looks as follows:\n\n```\ncolumns:\n- dtype: other\n name: Name\n- dtype: int\n name: Parch\n- dtype: float\n name: Age\n- dtype: other\n name: Ticket\n- dtype: float\n name: Fare\n- dtype: int\n name: PassengerId\n- dtype: other\n name: Cabin\n- dtype: other\n name: Embarked\n- dtype: int\n name: Pclass\n- dtype: int\n name: Survived\n- dtype: other\n name: Sex\n- dtype: int\n name: SibSp\ncsv_meta:\n delimiter: ','\n```\n\nA sample run then would look like this:\n\n```\n$ edapy csv predict --types types_titanik.yaml --csv_path train.csv\nNumber of datapoints: 891\n2018-04-16 21:51:56,279 WARNING Column 'Survived' has only 2 different values ([0, 1]). You might want to make it a 'category'\n2018-04-16 21:51:56,280 WARNING Column 'Pclass' has only 3 different values ([3, 1, 2]). You might want to make it a 'category'\n2018-04-16 21:51:56,281 WARNING Column 'Sex' has only 2 different values (['male', 'female']). You might want to make it a 'category'\n2018-04-16 21:51:56,282 WARNING Column 'SibSp' has only 7 different values ([0, 1, 2, 4, 3, 8, 5]). You might want to make it a 'category'\n2018-04-16 21:51:56,283 WARNING Column 'Parch' has only 7 different values ([0, 1, 2, 5, 3, 4, 6]). You might want to make it a 'category'\n2018-04-16 21:51:56,285 WARNING Column 'Embarked' has only 3 different values (['S', 'C', 'Q']).
You might want to make it a 'category'\n\n## Integer Columns\nColumn name: Non-nan mean std min 25% 50% 75% max\nPassengerId: 891 446.00 257.35 1 224 446 668 891\nSurvived : 891 0.38 0.49 0 0 0 1 1\nPclass : 891 2.31 0.84 1 2 3 3 3\nSibSp : 891 0.52 1.10 0 0 0 1 8\nParch : 891 0.38 0.81 0 0 0 0 6\n\n## Float Columns\nColumn name: Non-nan mean std min 25% 50% 75% max\nAge : 714 29.70 14.53 0.42 20.12 28.00 38.00 80.00\nFare : 891 32.20 49.69 0.00 7.91 14.45 31.00 512.33\n\n## Other Columns\nColumn name: Non-nan unique top (count)\nName : 891 891 Goldschmidt, Mr. George B (1)\nSex : 891 2 male (577)\nTicket : 891 681 347082 (7)\nCabin : 204 148 C23 C25 C27 (4)\nEmbarked : 889 4 S (644)\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/MartinThoma/edapy", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/MartinThoma/edapy", "keywords": "EDA,Data Science", "license": "MIT", "maintainer": "Martin Thoma", "maintainer_email": "info@martin-thoma.de", "name": "edapy", "package_url": "https://pypi.org/project/edapy/", "platform": "Linux", "project_url": "https://pypi.org/project/edapy/", "project_urls": { "Download": "https://github.com/MartinThoma/edapy", "Homepage": "https://github.com/MartinThoma/edapy" }, "release_url": "https://pypi.org/project/edapy/0.2.3/", "requires_dist": [ "cfg-load (>=0.3.1)", "click (>=6.7)", "pandas (>=0.20.3)", "Pillow (>=4.2.1)", "PyPDF2 (>=1.26.0)", "PyYAML (>=3.12)" ], "requires_python": "", "summary": "A toolkit for exploratory data analysis.", "version": "0.2.3" }, "last_serial": 3945573, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9a55570fd64510c996d309dd8802f267", "sha256": "88dac90268b0cd2608e2e44b9bdefa3619f0cc184b50930af792b3b816c52096" }, "downloads": -1, "filename": "edapy-0.1.0-py2-none-any.whl", "has_sig": false, "md5_digest": "9a55570fd64510c996d309dd8802f267", "packagetype": "bdist_wheel",
"python_version": "py2", "requires_python": null, "size": 8529, "upload_time": "2017-12-25T07:29:36", "url": "https://files.pythonhosted.org/packages/29/50/f7096491ce1e9a05754792a34389f62c92d33261f96f60230b4a97e1464b/edapy-0.1.0-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2add073c3b07d49877a8b48eeb20f613", "sha256": "363f270f0c89ff52bc980d9bb4424bcc2a4afadcddb29752a0d90e5b5ffd16b7" }, "downloads": -1, "filename": "edapy-0.1.0.tar.gz", "has_sig": false, "md5_digest": "2add073c3b07d49877a8b48eeb20f613", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6053, "upload_time": "2017-12-25T07:29:37", "url": "https://files.pythonhosted.org/packages/a6/09/7bf09c929fb84557753653388007e6111e0fa8fbacfbc39ca617121a06e5/edapy-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "af75697e42bcff7fa079b0c08f270241", "sha256": "3ef8b7cf0e9a2f6380fde7b6835caef4b315de3eeffe4b32d64250e2e2a4455e" }, "downloads": -1, "filename": "edapy-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "af75697e42bcff7fa079b0c08f270241", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9845, "upload_time": "2018-01-05T10:43:06", "url": "https://files.pythonhosted.org/packages/de/61/6d4120b64de128d1ef4764a57bb40ba3c57a6a04be92c6e6d9b58866f60d/edapy-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d0f696ebb80f8f7a9fa0a7e5a20753f0", "sha256": "e12e3e98241607cf1dd4f784246305c764f7148d61dee204fa1e052ebb6afb5a" }, "downloads": -1, "filename": "edapy-0.2.0.tar.gz", "has_sig": false, "md5_digest": "d0f696ebb80f8f7a9fa0a7e5a20753f0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7677, "upload_time": "2018-01-05T10:43:09", "url": "https://files.pythonhosted.org/packages/7f/48/4edec8245448b91bc856b0890c318a0c136547f9882c4770c35bcbc7a785/edapy-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": 
"9d809d2e54ff8d42175513a3c1c7af1f", "sha256": "11e1ca443fe662fba66b2c4840ec11f5e0dfa5cd1af64895c84974829823c7a9" }, "downloads": -1, "filename": "edapy-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "9d809d2e54ff8d42175513a3c1c7af1f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12581, "upload_time": "2018-05-26T22:20:42", "url": "https://files.pythonhosted.org/packages/3b/17/f85c18f2174620a95e816b96410183d089dd556055c97b6959f21d734053/edapy-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9f151f9864e94208a87efd255becf770", "sha256": "bde0cc72af070e9c57d8a18c41118af8f5c0b2c1d1a23c0e3524b9f179b486b1" }, "downloads": -1, "filename": "edapy-0.2.1.tar.gz", "has_sig": false, "md5_digest": "9f151f9864e94208a87efd255becf770", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11395, "upload_time": "2018-05-26T22:20:43", "url": "https://files.pythonhosted.org/packages/ad/66/ff0c6f72b4aa6409f226ddbad30f7a6ff434b8ae4b67d0800e03fe2b3aa4/edapy-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "5d03e13d1daf0e8e4c3f211b03f37286", "sha256": "93da4b7b603dcdf7e199e6542b2e522e016a30dee944653eb46627d1a02dd27f" }, "downloads": -1, "filename": "edapy-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "5d03e13d1daf0e8e4c3f211b03f37286", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14129, "upload_time": "2018-06-09T16:13:15", "url": "https://files.pythonhosted.org/packages/a7/6f/bdcfe66ff753ed976435d1bbd72a7e9d440917255e24e851d0c65c33ad63/edapy-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "083a6b47210701c8043188a80737fbe0", "sha256": "da26081f3aa2737f3ce26a4c9219b6153940e9ef9121c56fe29c890b8148dcba" }, "downloads": -1, "filename": "edapy-0.2.2.tar.gz", "has_sig": false, "md5_digest": "083a6b47210701c8043188a80737fbe0", "packagetype": "sdist", "python_version": "source", "requires_python": 
null, "size": 13199, "upload_time": "2018-06-09T16:13:16", "url": "https://files.pythonhosted.org/packages/53/ea/62624febb5d0f41bbb3a2b0d2de27bbb28eafd4207c5afe95f39a3bc5090/edapy-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "b2b7ac43d0aecce5db4b085be55a72f6", "sha256": "9efe027ebf2b519800b94b621e82a433a580a1cad0f0c30597c21259ba0b586a" }, "downloads": -1, "filename": "edapy-0.2.3-py3-none-any.whl", "has_sig": false, "md5_digest": "b2b7ac43d0aecce5db4b085be55a72f6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14156, "upload_time": "2018-06-09T16:14:34", "url": "https://files.pythonhosted.org/packages/48/73/173ee023eb4b6088dedb935187b90f9710aad735a3b2ef8d607467639a79/edapy-0.2.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c4e7fd90137b599fd754c69bc1c02419", "sha256": "ee9b5473990a9234c39dc9984f1b85148c538089a83dcffb7bb3d53345e260f5" }, "downloads": -1, "filename": "edapy-0.2.3.tar.gz", "has_sig": false, "md5_digest": "c4e7fd90137b599fd754c69bc1c02419", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13253, "upload_time": "2018-06-09T16:14:36", "url": "https://files.pythonhosted.org/packages/0f/34/06b8491c4784c51d88d6aabb16c09eb2c1d665683003a1d8c8dc1e0bd12a/edapy-0.2.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b2b7ac43d0aecce5db4b085be55a72f6", "sha256": "9efe027ebf2b519800b94b621e82a433a580a1cad0f0c30597c21259ba0b586a" }, "downloads": -1, "filename": "edapy-0.2.3-py3-none-any.whl", "has_sig": false, "md5_digest": "b2b7ac43d0aecce5db4b085be55a72f6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14156, "upload_time": "2018-06-09T16:14:34", "url": "https://files.pythonhosted.org/packages/48/73/173ee023eb4b6088dedb935187b90f9710aad735a3b2ef8d607467639a79/edapy-0.2.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c4e7fd90137b599fd754c69bc1c02419", "sha256": 
"ee9b5473990a9234c39dc9984f1b85148c538089a83dcffb7bb3d53345e260f5" }, "downloads": -1, "filename": "edapy-0.2.3.tar.gz", "has_sig": false, "md5_digest": "c4e7fd90137b599fd754c69bc1c02419", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13253, "upload_time": "2018-06-09T16:14:36", "url": "https://files.pythonhosted.org/packages/0f/34/06b8491c4784c51d88d6aabb16c09eb2c1d665683003a1d8c8dc1e0bd12a/edapy-0.2.3.tar.gz" } ] }