{ "info": { "author": "Brendan Herger & Jordan Osborn", "author_email": "jordan@osborn.dev", "bugtrack_url": null, "classifiers": [], "description": "# tf2-keras-pandas\n[![Documentation Status](https://readthedocs.org/projects/keras-pandas/badge/?version=latest)](https://keras-pandas.readthedocs.io/en/latest/?badge=latest)\n\n**tl;dr:** keras-pandas allows users to rapidly build and iterate on deep learning models. Updated for TensorFlow 2.0.\n\nGetting data formatted and into Keras can be tedious and time-consuming, and can require domain expertise, whether you're a \nveteran or new to deep learning. `keras-pandas` overcomes these issues by (automatically) providing:\n\n - **Data transformations**: A cleaned, transformed and correctly formatted `X` and `y` (good for Keras, scikit-learn, or any other ML platform)\n - **Data piping**: Correctly formatted Keras input, hidden and output layers, so you can quickly start iterating\n\nThese approaches are built on best-in-world practices from practitioners, Kaggle grandmasters, papers, blog posts, \nand coffee chats, providing a simple entry point into the world of deep learning and a strong foundation for deep learning \nexperts.\n\nFor more info, check out:\n\n - [Code](https://github.com/jordanosborn/tf2-keras-pandas)\n - [Documentation](http://keras-pandas.readthedocs.io/en/latest/intro.html)\n - [PyPi](https://pypi.org/project/tf2-keras-pandas/)\n - [Issue tracker](https://github.com/bjherger/keras-pandas/issues)\n\n## Quick Start\n\nLet's build a model with the [Lending Club data set](https://www.lendingclub.com/info/download-data.action). This data set is \nparticularly fun because it contains a mix of text, categorical and numerical data types, and features a \nlot of null values. 
 \n\n```bash\npip install --upgrade tf2-keras-pandas\n```\n\n```python\nfrom tensorflow.keras import Model\nfrom keras_pandas import lib\nfrom keras_pandas.Automater import Automater\nfrom sklearn.model_selection import train_test_split\n\n# Load data\nobservations = lib.load_lending_club()\n\n# Train / test split\ntrain_observations, test_observations = train_test_split(observations)\ntrain_observations = train_observations.copy()\ntest_observations = test_observations.copy()\n\n# List out variable types\ndata_type_dict = {'numerical': ['loan_amnt', 'annual_inc', 'open_acc', 'dti', 'delinq_2yrs',\n                                'inq_last_6mths', 'mths_since_last_delinq', 'pub_rec', 'revol_bal',\n                                'revol_util', 'total_acc', 'pub_rec_bankruptcies'],\n                  'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',\n                                  'application_type', 'disbursement_method'],\n                  'text': ['desc', 'purpose', 'title']}\noutput_var = 'loan_status'\n\n# Create Automater and fit it on the training data only\nauto = Automater(data_type_dict=data_type_dict, output_var=output_var)\nauto.fit(train_observations)\n\n# Transform data with the fitted Automater\ntrain_X, train_y = auto.transform(train_observations)\ntest_X, test_y = auto.transform(test_observations)\n\n# Create and compile Keras (deep learning) model\nx = auto.input_nub\nx = auto.output_nub(x)\n\nmodel = Model(inputs=auto.input_layers, outputs=x)\nmodel.compile(optimizer='adam', loss=auto.suggest_loss())\n```\n\nAnd that's it! In just a few lines of code, we've created a model that accepts a few dozen variables of mixed types: a strong starting point for a world-class deep learning model.\n\n## Usage\n\n### Installation\n\nYou can install `tf2-keras-pandas` with `pip`:\n\n```bash\npip install -U tf2-keras-pandas\n```\n\n### Creating an Automater\n\nThe `Automater` object is the central object in `keras-pandas`. 
It accepts a dictionary that maps each datatype to a list of column names, in the format `{'datatype': ['var1', 'var2']}`.\n\nFor example, we could create an `Automater` using the built-in `numerical`, `categorical`, and `text` datatypes by \ncalling:\n\n```python\n# List out variable types\ndata_type_dict = {'numerical': ['loan_amnt', 'annual_inc', 'open_acc', 'dti', 'delinq_2yrs',\n                                'inq_last_6mths', 'mths_since_last_delinq', 'pub_rec', 'revol_bal',\n                                'revol_util', 'total_acc', 'pub_rec_bankruptcies'],\n                  'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',\n                                  'application_type', 'disbursement_method'],\n                  'text': ['desc', 'purpose', 'title']}\noutput_var = 'loan_status'\n\n# Create Automater\nauto = Automater(data_type_dict=data_type_dict, output_var=output_var)\n```\n\nAs a side note, the response variable must appear in one of the variable type lists (e.g. `loan_status` is in the `categorical` list).\n\n#### One variable type\n\nIf you only have one variable type, only use one variable type!\n\n```python\n# List out variable types\ndata_type_dict = {'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',\n                                  'application_type', 'disbursement_method']}\noutput_var = 'loan_status'\n\n# Create Automater\nauto = Automater(data_type_dict=data_type_dict, output_var=output_var)\n```\n\n#### Multiple variable types\n\nIf you have multiple variable types, feel free to use all of them! 
Built-in datatypes are listed in `Automater.datatype_handlers`.\n\n```python\n# List out variable types\ndata_type_dict = {'numerical': ['loan_amnt', 'annual_inc', 'open_acc', 'dti', 'delinq_2yrs',\n                                'inq_last_6mths', 'mths_since_last_delinq', 'pub_rec', 'revol_bal',\n                                'revol_util', 'total_acc', 'pub_rec_bankruptcies'],\n                  'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',\n                                  'application_type', 'disbursement_method'],\n                  'text': ['desc', 'purpose', 'title']}\noutput_var = 'loan_status'\n\n# Create Automater\nauto = Automater(data_type_dict=data_type_dict, output_var=output_var)\n```\n\n#### Custom datatypes\n\nIf there's a specific datatype you'd like to use that's not built in (such as images, videos, or geospatial data), you can \ninclude it via `Automater`'s `datatype_handlers` parameter.\n\nA template datatype can be found in `keras_pandas/data_types/Abstract.py`. Filling out this template will yield a new datatype handler. If you're happy with your work and want to share your new datatype handler, open a PR (and check out `contributing.md`).\n\n#### No `output_var`\n\nIf your model doesn't need a response variable, or your use case doesn't use `keras-pandas`'s output functionality, you \ncan skip the `output_var` by setting it to `None`:\n\n```python\n# List out variable types\ndata_type_dict = {'categorical': ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'addr_state',\n                                  'application_type', 'disbursement_method']}\noutput_var = None\n\n# Create Automater without a response variable\nauto = Automater(data_type_dict=data_type_dict, output_var=output_var)\n```\n\n### Fitting the Automater\n\nBefore use, the `Automater` must be fit. 
The `fit()` method accepts a pandas DataFrame, which must contain all of the \ncolumns listed during initialization.\n\n```python\nauto.fit(observations)\n```\n\n### Transforming data\n\nNow we can use our fitted `Automater` to transform the dataset from a pandas DataFrame into NumPy objects formatted\nfor Keras's input and output layers.\n\n```python\nX, y = auto.transform(observations)\n```\n\nThis will return two objects:\n\n - `X`: A collection containing one NumPy object for each Keras input. There is generally one Keras input for each user \n input variable.\n - `y`: A NumPy object containing the response variable (if one was provided)\n\n### Using input / output nubs\n\nSetting up correctly formatted, heuristically 'good' input and output layers is often:\n\n - Tedious\n - Time-consuming\n - Difficult for those new to Keras\n\nWith this in mind, `keras-pandas` provides correctly formatted input and output 'nubs'.\n\nThe input nub is correctly formatted to accept the output from `auto.transform()`. It contains one Keras `Input` layer \nfor each generated input, may contain additional layers, and has all input pipelines joined with a `Concatenate` layer.\n\nThe output nub is correctly formatted to accept the response variable NumPy object.\n\n## Changelog\n\n - PR title (#PR number, or #Issue if no PR)\n - There's nothing here! 
(yet)\n\n### Development\n\n - Updated README and setup.py links (No PR)\n\n### 3.1.0\n\n - Add boolean datatype (#104)\n - Added Contributing.md section for new datatypes (#101)\n - Added datatypes to docs in index.rst (#101)\n - Modified documentation to automatically generate API docs (#101)\n\n\n### 3.0.1\n\n - Changing CI to CircleCI (#100)\n - Adding datatypes to CONTRIBUTING.md, adding CONTRIBUTING.md to docs (#96)\n - Adding docs badge (#95)\n - Adding support for unusual variable names / formatting Keras names to be valid in name scope (#92)\n - Adding examples (#93)\n - Upgraded `requests` library to `requests==2.20.1`, based on a security concern (#94)\n\n\n### 3.0.0\n\nBrand-new release, with:\n\nAdded\n\n - New `Datatype` interface, with easier-to-understand pipelines for each datatype\n - All existing datatypes (`Numerical`, `Categorical`, `Text` & `TimeSeries`) re-implemented in this new format\n - Support for custom data types generated by users\n - Duck-typing helper method (`keras_pandas/lib.check_valid_datatype()`) to confirm that a datatype has a valid signature\n - New testing, streamlined and standardized\n - Support for transforming unseen categorical levels, via the `UNK` token (experimental)\n\nModified\n\n - Updated `Automater` interface, which accepts a dictionary of data types\n - Heavily updated README\n - More consistent logging and data formatting for sample data sets\n\nRemoved\n\n - Removed examples, will be re-implemented in a future release\n - All existing unittests\n - Bulk of new datatypes in `contributing.md`, will be re-added in a future release\n\n### 2.2.0\n\n - Add timeseries support (#78)\n - Add timeseries examples (#79)\n\n### 2.1.0\n\n - Boolean support deprecated. 
The Boolean (bool) data type can be treated as a special case of the categorical data type.\n\n### 2.0.2\n\n - Remove a lot of the unnecessary dependencies (#75)\n - Update dependencies to contemporary versions (#74)\n\n### 2.0.1\n\n - Fix issue w/ PyPi conflict\n\n### 2.0.0\n\n - Adding CI/CD and PyPi links, and updating contact section w/ about-the-author info (#70)\n - Major rewrite / update of examples (#72)\n - Fixes bug in embedding transformer. Embeddings will now be at least length 1.\n - Add functionality to check if `resp_var` is in the list of user-provided variables\n - Added better null filling w/ `CategoricalImputer`\n - Added filling unseen values w/ `CategoricalImputer`\n - Converted default transformer pipeline to use `copy.deepcopy` instead of `copy.copy`. This was a hotfix for a previously unknown issue.\n - Standardizing where the logging level is set: only in the test base class and examples (when `__main__`)\n\n\n### 1.3.5\n\n - Adding regression example w/ inverse_transformation (#64)\n - Fixing issue where web socket connections were being opened needlessly (#65)\n\n### 1.3.4\n\n - Adding `Manifest.in`, including files referenced in `setup.py` (#54)\n\n### 1.3.2\n\n - Fixed poorly written text embedding index unit test (#52)\n - Added license (#49)\n\n### Earlier\n\n - Lots of things happened. 
Break things and move fast \n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jordanosborn/tf2-keras-pandas", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "tf2-keras-pandas", "package_url": "https://pypi.org/project/tf2-keras-pandas/", "platform": "", "project_url": "https://pypi.org/project/tf2-keras-pandas/", "project_urls": { "Code": "https://github.com/jordanosborn/tf2-keras-pandas", "Documentation": "http://keras-pandas.readthedocs.io/en/latest/intro.html", "Homepage": "https://github.com/jordanosborn/tf2-keras-pandas", "Issue tracker": "https://github.com/jordanosborn/tf2-keras-pandas/issues", "PyPi": "https://pypi.org/project/tf2-keras-pandas/" }, "release_url": "https://pypi.org/project/tf2-keras-pandas/3.1.8/", "requires_dist": [ "gensim (==3.8)", "h5py (==2.8.0)", "m2r (==0.2.1)", "numpy (==1.17)", "pandas (==0.25.1)", "requests (==2.22)", "scikit-learn (==0.21.3)", "sklearn-pandas (==1.8.0)", "tensorflow (==2.0.0)", "xlrd (==1.2.0)" ], "requires_python": "", "summary": "Easy and rapid deep learning - updated for tensorflow 2.0", "version": "3.1.8" }, "last_serial": 5914906, "releases": { "3.1.0": [ { "comment_text": "", "digests": { "md5": "af32d811721d450fea9f348003221787", "sha256": "5cc87218f05efa14dd2754825b5b99c491b07ba66e533ad69f55a7c395f2d9fa" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.0.tar.gz", "has_sig": false, "md5_digest": "af32d811721d450fea9f348003221787", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 35230, "upload_time": "2019-09-04T21:15:49", "url": "https://files.pythonhosted.org/packages/a8/b7/4011675f065f8da2921ff1c3df788a83dcf6145e51b94121d1a7df04e19b/tf2-keras-pandas-3.1.0.tar.gz" } ], "3.1.2": [ { "comment_text": "", "digests": { "md5": "761f11e12ce19e91123b6e43f2f0fa10", "sha256": 
"4ca839dff8e96796f23d11d843b483969e5b5e5b3b00ab25a07e655a069ec554" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.2.tar.gz", "has_sig": false, "md5_digest": "761f11e12ce19e91123b6e43f2f0fa10", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 35265, "upload_time": "2019-09-04T21:23:27", "url": "https://files.pythonhosted.org/packages/b2/2c/348d0ef55df8f39dcfd6a964ee7a23cf881f27f02f15e3fd65c989e869c3/tf2-keras-pandas-3.1.2.tar.gz" } ], "3.1.3": [ { "comment_text": "", "digests": { "md5": "e0593a61fbe6fc30aa60a5c41f7aadbf", "sha256": "a7d5d4484b1a52a2447dbcddcbc0aadd179afe1fcf6c3c818991ff754d43a416" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.3.tar.gz", "has_sig": false, "md5_digest": "e0593a61fbe6fc30aa60a5c41f7aadbf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34195, "upload_time": "2019-09-04T21:35:16", "url": "https://files.pythonhosted.org/packages/29/05/5d6c914ebf3ee5c8c6f7308e4dd6cb7810f841415a78eb6180b92709f7e3/tf2-keras-pandas-3.1.3.tar.gz" } ], "3.1.4": [ { "comment_text": "", "digests": { "md5": "c0dac74934ea487a108500f5b95e324b", "sha256": "32a06be91511c01a9cd00ee89eff4df2a05aced7cdaaf22e4f61ae2f0682b362" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.4.tar.gz", "has_sig": false, "md5_digest": "c0dac74934ea487a108500f5b95e324b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34199, "upload_time": "2019-09-04T21:37:48", "url": "https://files.pythonhosted.org/packages/b8/d1/9202b566eb40cfc33685e8305d1bd7c7963eea405249af1653c2dbb6b3b1/tf2-keras-pandas-3.1.4.tar.gz" } ], "3.1.5": [ { "comment_text": "", "digests": { "md5": "1f6fa0d5b6c87f487afda3736045f09b", "sha256": "551fb95d73904119f86ac4c54994f6b6c7b33c9fc69b75f738e28c1ecd6cc7fd" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.5.tar.gz", "has_sig": false, "md5_digest": "1f6fa0d5b6c87f487afda3736045f09b", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 34203, "upload_time": "2019-09-04T21:41:44", "url": "https://files.pythonhosted.org/packages/c5/02/0002ef24d4dc6fc543ce71cb61ec2407da6059bbbb4860b8003b89033636/tf2-keras-pandas-3.1.5.tar.gz" } ], "3.1.6": [ { "comment_text": "", "digests": { "md5": "b79ed0018340ad9986aad36259cb512a", "sha256": "5c44a1e67c9d8f306a6edb8c2d7b36c23aa9346c586f3e84ad150fda5c25402e" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.6.tar.gz", "has_sig": false, "md5_digest": "b79ed0018340ad9986aad36259cb512a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34215, "upload_time": "2019-09-04T21:46:18", "url": "https://files.pythonhosted.org/packages/b6/4f/b573b54bbbec0c12659a07363defdc831c571845d39ddc262dffc2490831/tf2-keras-pandas-3.1.6.tar.gz" } ], "3.1.7": [ { "comment_text": "", "digests": { "md5": "70e04b67cef90262e47bb41db07774b8", "sha256": "55263eb2b6400776f2974029cf02f0ab06013e5c34726cdad55ea8cab615470e" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.7.tar.gz", "has_sig": false, "md5_digest": "70e04b67cef90262e47bb41db07774b8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34227, "upload_time": "2019-09-04T21:55:17", "url": "https://files.pythonhosted.org/packages/13/f1/b994ba29e56833398e2d50394764a234c9aba193b52c9dc38b107c62224b/tf2-keras-pandas-3.1.7.tar.gz" } ], "3.1.8": [ { "comment_text": "", "digests": { "md5": "848c0ee985745d212d12b568c33e3df5", "sha256": "89c6141f976d5e444518365aa96d76c6d1299f3432fb4f5052ec1c7604f49fd4" }, "downloads": -1, "filename": "tf2_keras_pandas-3.1.8-py3-none-any.whl", "has_sig": false, "md5_digest": "848c0ee985745d212d12b568c33e3df5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 40796, "upload_time": "2019-10-01T21:10:30", "url": "https://files.pythonhosted.org/packages/7a/c1/de4ca198020a237758462d27131617451cdafa61a80a67daf467b4218e53/tf2_keras_pandas-3.1.8-py3-none-any.whl" }, { 
"comment_text": "", "digests": { "md5": "4f4c8eb66333b44e7ed05895a962689b", "sha256": "ef3d3bd8f9e5216d99f29de75d02c5829ca1a144b21a0b094f58c3a8b3bb4775" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.8.tar.gz", "has_sig": false, "md5_digest": "4f4c8eb66333b44e7ed05895a962689b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34228, "upload_time": "2019-10-01T21:10:32", "url": "https://files.pythonhosted.org/packages/5a/96/9bd8f7c9442a0149cb729f0bdedb50a3e47d20408ff5a7d0847c9faf6bcb/tf2-keras-pandas-3.1.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "848c0ee985745d212d12b568c33e3df5", "sha256": "89c6141f976d5e444518365aa96d76c6d1299f3432fb4f5052ec1c7604f49fd4" }, "downloads": -1, "filename": "tf2_keras_pandas-3.1.8-py3-none-any.whl", "has_sig": false, "md5_digest": "848c0ee985745d212d12b568c33e3df5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 40796, "upload_time": "2019-10-01T21:10:30", "url": "https://files.pythonhosted.org/packages/7a/c1/de4ca198020a237758462d27131617451cdafa61a80a67daf467b4218e53/tf2_keras_pandas-3.1.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4f4c8eb66333b44e7ed05895a962689b", "sha256": "ef3d3bd8f9e5216d99f29de75d02c5829ca1a144b21a0b094f58c3a8b3bb4775" }, "downloads": -1, "filename": "tf2-keras-pandas-3.1.8.tar.gz", "has_sig": false, "md5_digest": "4f4c8eb66333b44e7ed05895a962689b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 34228, "upload_time": "2019-10-01T21:10:32", "url": "https://files.pythonhosted.org/packages/5a/96/9bd8f7c9442a0149cb729f0bdedb50a3e47d20408ff5a7d0847c9faf6bcb/tf2-keras-pandas-3.1.8.tar.gz" } ] }