{ "info": { "author": "Mihaela Mares", "author_email": "mihaela.andreea.mares@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "\n-----------------\n\n# feature_stuff: a Python machine learning library for advanced feature extraction, processing and interpretation.\n\n
| | |\n
|---|---|\n
| Latest Release | see on pypi.org |\n
| Package Status | see on pypi.org |\n
| License | see on github |\n
| Build Status | see on travis |\n
\n\n## What is it\n\n**feature_stuff** is a Python package providing fast and flexible algorithms and functions\nfor extracting, processing and interpreting features:\n\n**Numeric feature extraction**\n\n
| Function | Description |\n
|---|---|\n
| `feature_stuff.add_interactions` | Generic function for adding interaction features to a data frame, either by passing them as a list or by passing a boosted trees model to extract the interactions from. |\n
| `feature_stuff.target_encoding` | Target encoding of a feature column using exponential prior smoothing or mean prior smoothing. |\n
| `feature_stuff.cv_target_encoding` | Target encoding of a feature column taking cross-validation folds as input. |\n
| `feature_stuff.add_knn_values` | Creates a new feature with the K-nearest-neighbours of the values of a given feature. |\n
| `feature_stuff.model_features_insights_extractions.add_group_values` | Generic and memory-efficient enrichment of a features dataframe with group values. |\n
\n\n**Model feature insights extraction**\n\n
| Function | Description |\n
|---|---|\n
| `get_xgboost_interactions` | Takes a trained xgboost model and returns a list of interactions between features, up to the order of the maximum depth of all trees. |\n
\n\n\n## Installation\n\nBinary installers for the latest released version are available at the [Python\npackage index](https://pypi.org/project/feature-stuff).\n\n```sh\n# from PyPI\npip install feature_stuff\n```\n\nThe source code is currently hosted on GitHub at:\nhttps://github.com/hiflyin/Feature-Stuff\n\n\n## Installation from sources\n\nIn the `Feature-Stuff` directory (the same one where you found this file after\ncloning the git repo), execute:\n\n```sh\npython setup.py install\n```\n\nor, to install in [development mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs):\n\n```sh\npython setup.py develop\n```\n\nAlternatively, you can use `pip` if you want all the dependencies pulled\nin automatically (the `-e` option is for installing it in [development\nmode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs)):\n\n```sh\npip install -e .\n```\n\n## How to use it\n\nBelow are examples for some of the functions. See the API documentation of each function or algorithm for complete details.\n\n# feature_stuff.add_interactions\n\n Inputs:\n df: a pandas dataframe\n model: boosted trees model (currently only xgboost is supported). 
Can be None, in which case the interactions have\n to be provided\n interactions: list in which each element is a list of features/columns in df, default: None\n\n Output: df with the interaction features added to it\n\n\nExample of extracting interactions from tree-based models and adding\nthem as new features to your dataset.\n\n```python\nimport feature_stuff as fs\nimport pandas as pd\nimport xgboost as xgb\n\ndata = pd.DataFrame({\"x0\":[0,1,0,1], \"x1\":range(4), \"x2\":[1,0,1,0]})\nprint(data)\n#    x0  x1  x2\n# 0   0   0   1\n# 1   1   1   0\n# 2   0   2   1\n# 3   1   3   0\n\ntarget = data.x0 * data.x1 + data.x2*data.x1\nprint(target.tolist())\n# [0, 1, 2, 3]\n\nmodel = xgb.train({'max_depth': 4, \"seed\": 123}, xgb.DMatrix(data, label=target), num_boost_round=2)\nfs.add_interactions(data, model)\n\n# at least one of the interactions in target must have been discovered by xgboost\nprint(data)\n#    x0  x1  x2  inter_0\n# 0   0   0   1        0\n# 1   1   1   0        1\n# 2   0   2   1        0\n# 3   1   3   0        3\n\n# if we want to inspect the interactions extracted\nfrom feature_stuff import model_features_insights_extractions as insights\nprint(insights.get_xgboost_interactions(model))\n# [['x0', 'x1']]\n\n```\n\n# feature_stuff.target_encoding\n\n Inputs:\n df: a pandas dataframe containing the column for which to calculate target encoding (categ_col)\n ref_df: a pandas dataframe containing the column for which to calculate target encoding and the target (y_col);\n for example, we might want to use the train data as ref_df to encode the test data\n categ_col: the name of the categorical column for which to calculate target encoding\n y_col: the name of the target column, or target variable to predict\n smoothing_func: the name of the function to be used for calculating the weights of the corresponding target\n value inside ref_df. Default: exponentialPriorSmoothing.\n aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col\n smoothing_prior_weight: a prior weight to put on each category. 
Default 1.\n\n Output: df containing a new column with the encodings of categ_col (x0_bayes_mean in the example below)\n\nExample of extracting target encodings from categorical features and adding them as new features to your dataset.\n\n```python\nimport feature_stuff as fs\nimport pandas as pd\n\ntrain_data = pd.DataFrame({\"x0\":[0,1,0,1]})\ntest_data = pd.DataFrame({\"x0\":[1, 0, 0, 1]})\ntarget = range(4)\n\ntrain_data = fs.target_encoding(train_data, train_data, \"x0\", target, smoothing_func=fs.exponentialPriorSmoothing,\n                                aggr_func=\"mean\", smoothing_prior_weight=1)\ntest_data = fs.target_encoding(test_data, train_data, \"x0\", target, smoothing_func=fs.exponentialPriorSmoothing,\n                               aggr_func=\"mean\", smoothing_prior_weight=1)\n\n# train data with target encoding of \"x0\"\nprint(train_data)\n#    x0  y_xx  g_xx  x0_bayes_mean\n# 0   0     0     0       1.134471\n# 1   1     1     0       1.865529\n# 2   0     2     0       1.134471\n# 3   1     3     0       1.865529\n\n# test data with target encoding of \"x0\"\nprint(test_data)\n#    x0  x0_bayes_mean\n# 0   1       1.865529\n# 1   0       1.134471\n# 2   0       1.134471\n# 3   1       1.865529\n\n```\n\n# feature_stuff.cv_target_encoding\n\n Inputs:\n df: a pandas dataframe containing the column for which to calculate target encoding (categ_col) and the target\n categ_cols: a list or array with the names of the categorical columns for which to calculate target encoding\n y_col: a numpy array of the target variable to predict\n cv_folds: a list with fold pairs as tuples of numpy arrays for cross-val target encoding\n smoothing_func: the name of the function to be used for calculating the weights of the corresponding target\n value inside ref_df. Default: exponentialPriorSmoothing.\n aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col\n smoothing_prior_weight: a prior weight to put on each category. 
Default 1.\n verbosity: 0-none, 1-high_level, 2-detailed\n\n Output: df containing a new column with the encodings of categ_col\n\nSee the feature_stuff.target_encoding example above.\n\n\n## Contributing to feature-stuff\n\nAll contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/hiflyin/Advanced-Feature-Stuff-Lib", "keywords": "machine_learning data_science AI ML feature_extraction", "license": "", "maintainer": "", "maintainer_email": "", "name": "feature-stuff", "package_url": "https://pypi.org/project/feature-stuff/", "platform": "", "project_url": "https://pypi.org/project/feature-stuff/", "project_urls": { "Homepage": "https://github.com/hiflyin/Advanced-Feature-Stuff-Lib" }, "release_url": "https://pypi.org/project/feature-stuff/0.0.dev6/", "requires_dist": [ "pandas (>=0.19.2)", "numpy (>=1.12.1)", "scikit-learn (>=0.18)", "scipy (>=0.19.0)", "xgboost (>=0.6)", "check-manifest; extra == 'dev'", "coverage; extra == 'test'" ], "requires_python": "", "summary": "Feature extraction, processing and interpretation algorithms and functions for machine learning and data science.", "version": "0.0.dev6" }, "last_serial": 4015469, "releases": { "0.0.dev0": [ { "comment_text": "", "digests": { "md5": "c4d8dbd7bc58418a7f92ea8aa832b498", "sha256": "45ef8986f7122b4d9383d6f38e6c55f42abd6a3ba08b3f46464cb70083b77a7d" }, "downloads": -1, "filename": "feature_stuff-0.0.dev0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c4d8dbd7bc58418a7f92ea8aa832b498", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 8004, "upload_time": "2018-06-16T10:39:42", "url": 
"https://files.pythonhosted.org/packages/3f/76/a3e4c330d0e89278b59cab470f867c25dc10bfa61734341d2b15977d9ff9/feature_stuff-0.0.dev0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7c9ec2ef3a7a5bb59ec6c668bd65c6e3", "sha256": "04d2c701686a2cc2cd4072e508f14ed8ee193456c0dfb7adf66b622decbd5a50" }, "downloads": -1, "filename": "feature_stuff-0.0.dev0.tar.gz", "has_sig": false, "md5_digest": "7c9ec2ef3a7a5bb59ec6c668bd65c6e3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6181, "upload_time": "2018-06-16T10:39:44", "url": "https://files.pythonhosted.org/packages/5f/08/6f293e62ada273272f8aefd02c81963282a803b7b8d137c97da459b047fb/feature_stuff-0.0.dev0.tar.gz" } ], "0.0.dev2": [ { "comment_text": "", "digests": { "md5": "a8a037c6ac877e4da5e2576bb45e2a00", "sha256": "22479593071fc7c977b3c9896ae5ff57c40d3b05940a54eee2c6fe5c0c2917ec" }, "downloads": -1, "filename": "feature_stuff-0.0.dev2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a8a037c6ac877e4da5e2576bb45e2a00", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12001, "upload_time": "2018-06-16T12:35:25", "url": "https://files.pythonhosted.org/packages/29/45/3baa00148d4d74d1ac1ad4dda3c830548726c7892d6dff70dd1c20eacdd3/feature_stuff-0.0.dev2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "efb3b55dc793c62889cfc889a26820e4", "sha256": "754a4bea67cce08abd11361d6a1f3bf36628ab45b96e38e861a62060a5f0fc43" }, "downloads": -1, "filename": "feature_stuff-0.0.dev2.tar.gz", "has_sig": false, "md5_digest": "efb3b55dc793c62889cfc889a26820e4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7064, "upload_time": "2018-06-16T12:35:26", "url": "https://files.pythonhosted.org/packages/9e/48/1c9e3c2b2cb9a939ceb9d3a78e9bd28e2bd8ef19ce1257b7b5823a55974e/feature_stuff-0.0.dev2.tar.gz" } ], "0.0.dev3": [ { "comment_text": "", "digests": { "md5": "9dbe6e834f36d2e635759101ee84d1d8", 
"sha256": "0cf72fbe548bb189dccb60289fc135317bb8b0b52b9adb1e4d42fa25c4095f35" }, "downloads": -1, "filename": "feature_stuff-0.0.dev3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9dbe6e834f36d2e635759101ee84d1d8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 11995, "upload_time": "2018-06-16T12:41:08", "url": "https://files.pythonhosted.org/packages/6e/57/f4c2695b889c9804e411d78d87d72365be745583934e7125edb4c06b4f42/feature_stuff-0.0.dev3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ec2fd9c74198aca40a7fcedd603607e0", "sha256": "d5034b194aedc6ac8522c86f5ea7b8865db6e13a1a37d50c2acefbb2ce95818d" }, "downloads": -1, "filename": "feature_stuff-0.0.dev3.tar.gz", "has_sig": false, "md5_digest": "ec2fd9c74198aca40a7fcedd603607e0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7051, "upload_time": "2018-06-16T12:41:09", "url": "https://files.pythonhosted.org/packages/8a/7c/bdb8a01a9be656c7db27813c7e0bf54d7bba2f752947dfd079a3c3417518/feature_stuff-0.0.dev3.tar.gz" } ], "0.0.dev4": [ { "comment_text": "", "digests": { "md5": "f1285bd50ea7ef58afdec3db3e6ed185", "sha256": "7dc58237206c0b798df1a02e66a0128ad7e5741127757a56255f7246a0942c25" }, "downloads": -1, "filename": "feature_stuff-0.0.dev4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "f1285bd50ea7ef58afdec3db3e6ed185", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12003, "upload_time": "2018-06-16T13:06:04", "url": "https://files.pythonhosted.org/packages/37/22/3eb70c545f656f890f9cf31f8c760a2191a650ad18a3e16495fc1c3dc082/feature_stuff-0.0.dev4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fc5800c46ded33f3f5aa64cc8756db41", "sha256": "e64b280ca3f1318c086c8073f65ee0dd07b87cb93e117cb474f4a6fc61b97963" }, "downloads": -1, "filename": "feature_stuff-0.0.dev4.tar.gz", "has_sig": false, "md5_digest": "fc5800c46ded33f3f5aa64cc8756db41", 
"packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7057, "upload_time": "2018-06-16T13:06:05", "url": "https://files.pythonhosted.org/packages/c5/a4/22b5a38d22fb25678b56691ba520ed00931752ae26b088a0e96b93c1adbf/feature_stuff-0.0.dev4.tar.gz" } ], "0.0.dev5": [ { "comment_text": "", "digests": { "md5": "379474c0a948a0534d6bb055d41de22a", "sha256": "def5309ed0907cf8aff09a380362baedef313fafb9f77141c394daf16cf22c4f" }, "downloads": -1, "filename": "feature_stuff-0.0.dev5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "379474c0a948a0534d6bb055d41de22a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 16753, "upload_time": "2018-06-20T22:40:14", "url": "https://files.pythonhosted.org/packages/ef/27/f01d3c33e4046c2b513b38cf38746a78f7a5391df90df99182c0a7161053/feature_stuff-0.0.dev5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4dda635b20d46792994c974c94cb0c36", "sha256": "00b12bd8ac03b93734e1dfbf94d35ddc88b394ea476b7bdde9d6d78e9278be3e" }, "downloads": -1, "filename": "feature_stuff-0.0.dev5.tar.gz", "has_sig": false, "md5_digest": "4dda635b20d46792994c974c94cb0c36", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9032, "upload_time": "2018-06-20T22:40:15", "url": "https://files.pythonhosted.org/packages/50/b6/fb70b4ab2eb976f228ec455ccc04225909ac8e4d61dcf7357f4534abddc8/feature_stuff-0.0.dev5.tar.gz" } ], "0.0.dev6": [ { "comment_text": "", "digests": { "md5": "42142d7ecee5303a0a627d8a53f02e92", "sha256": "2d5e78c0b7bbc09f0e8083172812e41029b51e6afe13f0e67c118e68e69ec25e" }, "downloads": -1, "filename": "feature_stuff-0.0.dev6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "42142d7ecee5303a0a627d8a53f02e92", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 18256, "upload_time": "2018-06-29T14:09:14", "url": 
"https://files.pythonhosted.org/packages/fd/7d/69b3d1487c7fb8bdaa58a471d1fd4db3b9e95d01a2a39caaabf151d71d4e/feature_stuff-0.0.dev6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "07d62fbe8aeb7a8a53cd810bd014155b", "sha256": "55bc052c2080035e4cc2bda2b76a8601594a76337ffb1501e748f4bd7cfda0f0" }, "downloads": -1, "filename": "feature_stuff-0.0.dev6.tar.gz", "has_sig": false, "md5_digest": "07d62fbe8aeb7a8a53cd810bd014155b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10227, "upload_time": "2018-06-29T14:09:15", "url": "https://files.pythonhosted.org/packages/36/70/9bf7590e0bcabca8ca98d1077c58f4f1a97833850561e86ce6c093076e55/feature_stuff-0.0.dev6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "42142d7ecee5303a0a627d8a53f02e92", "sha256": "2d5e78c0b7bbc09f0e8083172812e41029b51e6afe13f0e67c118e68e69ec25e" }, "downloads": -1, "filename": "feature_stuff-0.0.dev6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "42142d7ecee5303a0a627d8a53f02e92", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 18256, "upload_time": "2018-06-29T14:09:14", "url": "https://files.pythonhosted.org/packages/fd/7d/69b3d1487c7fb8bdaa58a471d1fd4db3b9e95d01a2a39caaabf151d71d4e/feature_stuff-0.0.dev6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "07d62fbe8aeb7a8a53cd810bd014155b", "sha256": "55bc052c2080035e4cc2bda2b76a8601594a76337ffb1501e748f4bd7cfda0f0" }, "downloads": -1, "filename": "feature_stuff-0.0.dev6.tar.gz", "has_sig": false, "md5_digest": "07d62fbe8aeb7a8a53cd810bd014155b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10227, "upload_time": "2018-06-29T14:09:15", "url": "https://files.pythonhosted.org/packages/36/70/9bf7590e0bcabca8ca98d1077c58f4f1a97833850561e86ce6c093076e55/feature_stuff-0.0.dev6.tar.gz" } ] }