{
"info": {
"author": "Mihaela Mares",
"author_email": "mihaela.andreea.mares@gmail.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 2",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6",
"Topic :: Scientific/Engineering :: Artificial Intelligence"
],
"description": "\n-----------------\n\n# feature_stuff: a python machine learning library for advanced feature extraction, processing and interpretation.\n\n
\n## What is it\n\n**feature_stuff** is a Python package providing fast and flexible algorithms and functions\nfor extracting, processing and interpreting features:\n\n**Numeric feature extraction**\n\n| Function | Description |\n| --- | --- |\n| feature_stuff.add_interactions | generic function for adding interaction features to a data frame, either by passing them as a list or by passing a boosted trees model to extract the interactions from |\n| feature_stuff.target_encoding | target encoding of a feature column using exponential prior smoothing or mean prior smoothing |\n| feature_stuff.cv_target_encoding | target encoding of a feature column taking cross-validation folds as input |\n| feature_stuff.add_knn_values | creates a new feature with the K-nearest-neighbours of the values of a given feature |\n| feature_stuff.model_features_insights_extractions.add_group_values | generic and memory efficient enrichment of features dataframe with group values |\n\n**Model feature insights extraction**\n\n| Function | Description |\n| --- | --- |\n| get_xgboost_interactions | takes a trained xgboost model and returns a list of interactions between features, to the order of maximum depth of all trees |\n
\n\n\n## Installation\n\nBinary installers for the latest released version are available at the [Python\npackage index](https://pypi.org/project/feature-stuff).\n\n```sh\n# from PyPI\npip install feature_stuff\n```\n\nThe source code is currently hosted on GitHub at:\nhttps://github.com/hiflyin/Feature-Stuff\n\n\n## Installation from sources\n\nIn the `Feature-Stuff` directory (same one where you found this file after\ncloning the git repo), execute:\n\n```sh\npython setup.py install\n```\n\nor for installing in [development mode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs):\n\n```sh\npython setup.py develop\n```\n\nAlternatively, you can use `pip` if you want all the dependencies pulled\nin automatically (the `-e` option is for installing it in [development\nmode](https://pip.pypa.io/en/latest/reference/pip_install.html#editable-installs)):\n\n```sh\npip install -e .\n```\n\n## How to use it\n\nBelow are examples for some functions. See the API documentation of each function or algorithm for complete details.\n\n# feature_stuff.add_interactions\n\n Inputs:\n df: a pandas dataframe\n model: boosted trees model (currently only xgboost is supported). 
Can be None, in which case the interactions have\n to be provided\n interactions: list in which each element is a list of features/columns in df, default: None\n\n Output: df with the interaction features added to it\n\n\nExample of extracting interactions from tree-based models and adding\nthem as new features to your dataset.\n\n```python\nimport feature_stuff as fs\nimport pandas as pd\nimport xgboost as xgb\n\ndata = pd.DataFrame({\"x0\":[0,1,0,1], \"x1\":range(4), \"x2\":[1,0,1,0]})\nprint(data)\n x0 x1 x2\n0 0 0 1\n1 1 1 0\n2 0 2 1\n3 1 3 0\n\ntarget = data.x0 * data.x1 + data.x2*data.x1\nprint(target.tolist())\n[0, 1, 2, 3]\n\nmodel = xgb.train({'max_depth': 4, \"seed\": 123}, xgb.DMatrix(data, label=target), num_boost_round=2)\nfs.addInteractions(data, model)\n\n# at least one of the interactions in target must have been discovered by xgboost\nprint(data)\n x0 x1 x2 inter_0\n0 0 0 1 0\n1 1 1 0 1\n2 0 2 1 0\n3 1 3 0 3\n\n# if we want to inspect the interactions extracted\nfrom feature_stuff import model_features_insights_extractions as insights\nprint(insights.get_xgboost_interactions(model))\n[['x0', 'x1']]\n\n```\n\n# feature_stuff.target_encoding\n\n Inputs:\n df: a pandas dataframe containing the column for which to calculate target encoding (categ_col)\n ref_df: a pandas dataframe containing the column for which to calculate target encoding and the target (y_col);\n for example, we might want to use train data as ref_df to encode test data\n categ_col: the name of the categorical column for which to calculate target encoding\n y_col: the name of the target column, or target variable to predict\n smoothing_func: the name of the function to be used for calculating the weights of the corresponding target\n value inside ref_df. Default: exponentialPriorSmoothing.\n aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col\n smoothing_prior_weight: a prior weight to put on each category. 
Default 1.\n\n Output: df containing a new column with the encodings of categ_col\n\nExample of extracting target encodings from categorical features and adding them as new features to your dataset.\n\n```python\nimport feature_stuff as fs\nimport pandas as pd\n\ntrain_data = pd.DataFrame({\"x0\":[0,1,0,1]})\ntest_data = pd.DataFrame({\"x0\":[1, 0, 0, 1]})\ntarget = range(4)\n\ntrain_data = fs.target_encoding(train_data, train_data, \"x0\", target, smoothing_func=fs.exponentialPriorSmoothing,\n aggr_func=\"mean\", smoothing_prior_weight=1)\ntest_data = fs.target_encoding(test_data, train_data, \"x0\", target, smoothing_func=fs.exponentialPriorSmoothing,\n aggr_func=\"mean\", smoothing_prior_weight=1)\n\n#train data with target encoding of \"x0\"\nprint(train_data)\n x0 y_xx g_xx x0_bayes_mean\n0 0 0 0 1.134471\n1 1 1 0 1.865529\n2 0 2 0 1.134471\n3 1 3 0 1.865529\n\n#test data with target encoding of \"x0\"\nprint(test_data)\n x0 x0_bayes_mean\n0 1 1.865529\n1 0 1.134471\n2 0 1.134471\n3 1 1.865529\n\n\n```\n\n# feature_stuff.cv_target_encoding\n\n Inputs:\n df: a pandas dataframe containing the column for which to calculate target encoding (categ_col) and the target\n categ_cols: a list or array with the names of the categorical columns for which to calculate target encoding\n y_col: a numpy array of the target variable to predict\n cv_folds: a list with fold pairs as tuples of numpy arrays for cross-val target encoding\n smoothing_func: the name of the function to be used for calculating the weights of the corresponding target\n value inside ref_df. Default: exponentialPriorSmoothing.\n aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col\n smoothing_prior_weight: a prior weight to put on each category. 
Default 1.\n verbosity: 0-none, 1-high_level, 2-detailed\n\n Output: df containing a new column with the encodings of categ_col\n\nSee the feature_stuff.target_encoding example above.\n\n\n## Contributing to feature-stuff\n\nAll contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.\n\n\n\n",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/hiflyin/Advanced-Feature-Stuff-Lib",
"keywords": "machine_learning data_science AI ML feature_extraction",
"license": "",
"maintainer": "",
"maintainer_email": "",
"name": "feature-stuff",
"package_url": "https://pypi.org/project/feature-stuff/",
"platform": "",
"project_url": "https://pypi.org/project/feature-stuff/",
"project_urls": {
"Homepage": "https://github.com/hiflyin/Advanced-Feature-Stuff-Lib"
},
"release_url": "https://pypi.org/project/feature-stuff/0.0.dev6/",
"requires_dist": [
"pandas (>=0.19.2)",
"numpy (>=1.12.1)",
"scikit-learn (>=0.18)",
"scipy (>=0.19.0)",
"xgboost (>=0.6)",
"check-manifest; extra == 'dev'",
"coverage; extra == 'test'"
],
"requires_python": "",
"summary": "Feature extraction, processing and interpretation algorithms and functions for machine learning and data science.",
"version": "0.0.dev6"
},
"last_serial": 4015469,
"releases": {
"0.0.dev0": [
{
"comment_text": "",
"digests": {
"md5": "c4d8dbd7bc58418a7f92ea8aa832b498",
"sha256": "45ef8986f7122b4d9383d6f38e6c55f42abd6a3ba08b3f46464cb70083b77a7d"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "c4d8dbd7bc58418a7f92ea8aa832b498",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 8004,
"upload_time": "2018-06-16T10:39:42",
"url": "https://files.pythonhosted.org/packages/3f/76/a3e4c330d0e89278b59cab470f867c25dc10bfa61734341d2b15977d9ff9/feature_stuff-0.0.dev0-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "7c9ec2ef3a7a5bb59ec6c668bd65c6e3",
"sha256": "04d2c701686a2cc2cd4072e508f14ed8ee193456c0dfb7adf66b622decbd5a50"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev0.tar.gz",
"has_sig": false,
"md5_digest": "7c9ec2ef3a7a5bb59ec6c668bd65c6e3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6181,
"upload_time": "2018-06-16T10:39:44",
"url": "https://files.pythonhosted.org/packages/5f/08/6f293e62ada273272f8aefd02c81963282a803b7b8d137c97da459b047fb/feature_stuff-0.0.dev0.tar.gz"
}
],
"0.0.dev2": [
{
"comment_text": "",
"digests": {
"md5": "a8a037c6ac877e4da5e2576bb45e2a00",
"sha256": "22479593071fc7c977b3c9896ae5ff57c40d3b05940a54eee2c6fe5c0c2917ec"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev2-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "a8a037c6ac877e4da5e2576bb45e2a00",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 12001,
"upload_time": "2018-06-16T12:35:25",
"url": "https://files.pythonhosted.org/packages/29/45/3baa00148d4d74d1ac1ad4dda3c830548726c7892d6dff70dd1c20eacdd3/feature_stuff-0.0.dev2-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "efb3b55dc793c62889cfc889a26820e4",
"sha256": "754a4bea67cce08abd11361d6a1f3bf36628ab45b96e38e861a62060a5f0fc43"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev2.tar.gz",
"has_sig": false,
"md5_digest": "efb3b55dc793c62889cfc889a26820e4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7064,
"upload_time": "2018-06-16T12:35:26",
"url": "https://files.pythonhosted.org/packages/9e/48/1c9e3c2b2cb9a939ceb9d3a78e9bd28e2bd8ef19ce1257b7b5823a55974e/feature_stuff-0.0.dev2.tar.gz"
}
],
"0.0.dev3": [
{
"comment_text": "",
"digests": {
"md5": "9dbe6e834f36d2e635759101ee84d1d8",
"sha256": "0cf72fbe548bb189dccb60289fc135317bb8b0b52b9adb1e4d42fa25c4095f35"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev3-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "9dbe6e834f36d2e635759101ee84d1d8",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 11995,
"upload_time": "2018-06-16T12:41:08",
"url": "https://files.pythonhosted.org/packages/6e/57/f4c2695b889c9804e411d78d87d72365be745583934e7125edb4c06b4f42/feature_stuff-0.0.dev3-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "ec2fd9c74198aca40a7fcedd603607e0",
"sha256": "d5034b194aedc6ac8522c86f5ea7b8865db6e13a1a37d50c2acefbb2ce95818d"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev3.tar.gz",
"has_sig": false,
"md5_digest": "ec2fd9c74198aca40a7fcedd603607e0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7051,
"upload_time": "2018-06-16T12:41:09",
"url": "https://files.pythonhosted.org/packages/8a/7c/bdb8a01a9be656c7db27813c7e0bf54d7bba2f752947dfd079a3c3417518/feature_stuff-0.0.dev3.tar.gz"
}
],
"0.0.dev4": [
{
"comment_text": "",
"digests": {
"md5": "f1285bd50ea7ef58afdec3db3e6ed185",
"sha256": "7dc58237206c0b798df1a02e66a0128ad7e5741127757a56255f7246a0942c25"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev4-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "f1285bd50ea7ef58afdec3db3e6ed185",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 12003,
"upload_time": "2018-06-16T13:06:04",
"url": "https://files.pythonhosted.org/packages/37/22/3eb70c545f656f890f9cf31f8c760a2191a650ad18a3e16495fc1c3dc082/feature_stuff-0.0.dev4-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "fc5800c46ded33f3f5aa64cc8756db41",
"sha256": "e64b280ca3f1318c086c8073f65ee0dd07b87cb93e117cb474f4a6fc61b97963"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev4.tar.gz",
"has_sig": false,
"md5_digest": "fc5800c46ded33f3f5aa64cc8756db41",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7057,
"upload_time": "2018-06-16T13:06:05",
"url": "https://files.pythonhosted.org/packages/c5/a4/22b5a38d22fb25678b56691ba520ed00931752ae26b088a0e96b93c1adbf/feature_stuff-0.0.dev4.tar.gz"
}
],
"0.0.dev5": [
{
"comment_text": "",
"digests": {
"md5": "379474c0a948a0534d6bb055d41de22a",
"sha256": "def5309ed0907cf8aff09a380362baedef313fafb9f77141c394daf16cf22c4f"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev5-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "379474c0a948a0534d6bb055d41de22a",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 16753,
"upload_time": "2018-06-20T22:40:14",
"url": "https://files.pythonhosted.org/packages/ef/27/f01d3c33e4046c2b513b38cf38746a78f7a5391df90df99182c0a7161053/feature_stuff-0.0.dev5-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "4dda635b20d46792994c974c94cb0c36",
"sha256": "00b12bd8ac03b93734e1dfbf94d35ddc88b394ea476b7bdde9d6d78e9278be3e"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev5.tar.gz",
"has_sig": false,
"md5_digest": "4dda635b20d46792994c974c94cb0c36",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 9032,
"upload_time": "2018-06-20T22:40:15",
"url": "https://files.pythonhosted.org/packages/50/b6/fb70b4ab2eb976f228ec455ccc04225909ac8e4d61dcf7357f4534abddc8/feature_stuff-0.0.dev5.tar.gz"
}
],
"0.0.dev6": [
{
"comment_text": "",
"digests": {
"md5": "42142d7ecee5303a0a627d8a53f02e92",
"sha256": "2d5e78c0b7bbc09f0e8083172812e41029b51e6afe13f0e67c118e68e69ec25e"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev6-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "42142d7ecee5303a0a627d8a53f02e92",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 18256,
"upload_time": "2018-06-29T14:09:14",
"url": "https://files.pythonhosted.org/packages/fd/7d/69b3d1487c7fb8bdaa58a471d1fd4db3b9e95d01a2a39caaabf151d71d4e/feature_stuff-0.0.dev6-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "07d62fbe8aeb7a8a53cd810bd014155b",
"sha256": "55bc052c2080035e4cc2bda2b76a8601594a76337ffb1501e748f4bd7cfda0f0"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev6.tar.gz",
"has_sig": false,
"md5_digest": "07d62fbe8aeb7a8a53cd810bd014155b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10227,
"upload_time": "2018-06-29T14:09:15",
"url": "https://files.pythonhosted.org/packages/36/70/9bf7590e0bcabca8ca98d1077c58f4f1a97833850561e86ce6c093076e55/feature_stuff-0.0.dev6.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "42142d7ecee5303a0a627d8a53f02e92",
"sha256": "2d5e78c0b7bbc09f0e8083172812e41029b51e6afe13f0e67c118e68e69ec25e"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev6-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "42142d7ecee5303a0a627d8a53f02e92",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 18256,
"upload_time": "2018-06-29T14:09:14",
"url": "https://files.pythonhosted.org/packages/fd/7d/69b3d1487c7fb8bdaa58a471d1fd4db3b9e95d01a2a39caaabf151d71d4e/feature_stuff-0.0.dev6-py2.py3-none-any.whl"
},
{
"comment_text": "",
"digests": {
"md5": "07d62fbe8aeb7a8a53cd810bd014155b",
"sha256": "55bc052c2080035e4cc2bda2b76a8601594a76337ffb1501e748f4bd7cfda0f0"
},
"downloads": -1,
"filename": "feature_stuff-0.0.dev6.tar.gz",
"has_sig": false,
"md5_digest": "07d62fbe8aeb7a8a53cd810bd014155b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10227,
"upload_time": "2018-06-29T14:09:15",
"url": "https://files.pythonhosted.org/packages/36/70/9bf7590e0bcabca8ca98d1077c58f4f1a97833850561e86ce6c093076e55/feature_stuff-0.0.dev6.tar.gz"
}
]
}