{ "info": { "author": "Ahmet Erdem", "author_email": "ahmeterd4@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# LOFO Importance\nLOFO (Leave One Feature Out) Importance calculates the importance of each feature in a set for a model of choice, based on a metric of choice. It does this by iteratively removing each feature from the set and evaluating the model's performance with a validation scheme of choice.\n\nLOFO first evaluates the performance of the model with all the input features included. It then iteratively removes one feature at a time, retrains the model, and evaluates its performance on a validation set. The importance of a feature is the drop in performance caused by removing it; the mean and standard deviation of each feature's importance across the folds are then reported.\n\nIf no model is passed to LOFO Importance, it uses LightGBM as the default model.\n\n## Install\nLOFO Importance can be installed using\n```\npip install lofo-importance\n```\n\n## Advantages of LOFO Importance\nLOFO has several advantages compared to other importance types:\n* It does not favor granular features\n* It generalises well to unseen test sets\n* It is model-agnostic\n* It gives negative importance to features that hurt performance upon inclusion\n\n## Example on Kaggle's Microsoft Malware Prediction Competition\nIn this Kaggle competition, Microsoft provides a malware dataset to predict whether or not a machine will soon be hit with malware. One of the features, Census_OSVersion, is very predictive on the training set, since some OS versions are probably more prone to bugs and failures than others. However, when the data is split by time, the validation sets contain OS versions that do not occur in the training set. Therefore, the model will not have learned the relationship between the target and this time-dependent feature. 
Other importance types give Census_OSVersion high importance, because they evaluate it on the training set only. LOFO Importance, however, depends on a validation scheme, so it gives this feature not just low importance but negative importance.\n\n```\nimport pandas as pd\nfrom sklearn.model_selection import KFold\nfrom lofo import LOFOImportance, Dataset, plot_importance\n# notebook magic: omit outside Jupyter\n%matplotlib inline\n\n# import data; dtypes is assumed to be a dict mapping column names to dtypes, defined beforehand\ntrain_df = pd.read_csv(\"../input/train.csv\", dtype=dtypes)\n\n# extract a sample of the data and sort it by time (AvSigVersion)\nsample_df = train_df.sample(frac=0.01, random_state=0)\nsample_df.sort_values(\"AvSigVersion\", inplace=True)\n\n# define the validation scheme: no shuffling, so the folds follow the time order\n# (random_state is omitted: it has no effect without shuffling, and recent scikit-learn versions raise an error if it is set with shuffle=False)\ncv = KFold(n_splits=4, shuffle=False)\n\n# define the binary target and the features\ntarget = \"HasDetections\"\nfeatures = [col for col in train_df.columns if col != target]\ndataset = Dataset(df=sample_df, target=target, features=features)\n\n# define LOFO Importance with the validation scheme and scorer. The default model is LightGBM\nlofo_imp = LOFOImportance(dataset, cv=cv, scoring=\"roc_auc\")\n\n# get the mean and standard deviation of the importances in pandas format\nimportance_df = lofo_imp.get_importance()\n\n# plot the means and standard deviations of the importances\nplot_importance(importance_df, figsize=(12, 20))\n```\n![LOFO importance plot](docs/plot_importance.png?raw=true)\n\n\n## FLOFO Importance\n\nIf LOFO Importance is too time-consuming for your use case, you can use Fast LOFO (FLOFO) instead. FLOFO takes an already trained model and a validation set as inputs. It pseudo-randomly permutes the values of each feature, one feature at a time, and uses the trained model to make predictions on the permuted validation set. 
The FLOFO importance of a feature is the resulting drop in the model's validation performance, averaged over several randomised permutations.\nFLOFO importance differs from permutation importance in that a feature's values are permuted within groups, which are obtained by grouping the validation set by k=2 features. These k features are chosen at random n=10 times, and the mean and standard deviation of the FLOFO importance are calculated from these n runs.\nThis grouping improves the importance measure because permuting a feature's values is no longer completely random: the permutations are done within groups of similar samples, so they amount to adding noise to the samples. This ensures that:\n* Feature values are very unlikely to be replaced by unrealistic values.\n* A feature that can be predicted from the chosen n*k features is replaced by very similar values during permutation. Permuting it therefore only slightly affects the model's performance (and yields a small FLOFO importance). 
This solves the correlated feature overestimation problem.\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "lofo-importance", "package_url": "https://pypi.org/project/lofo-importance/", "platform": "", "project_url": "https://pypi.org/project/lofo-importance/", "project_urls": null, "release_url": "https://pypi.org/project/lofo-importance/0.2.4/", "requires_dist": [ "numpy", "pandas", "scipy", "scikit-learn", "tqdm", "jupyter", "ipywidgets", "lightgbm", "matplotlib", "pytest" ], "requires_python": "", "summary": "Leave One Feature Out Importance", "version": "0.2.4" }, "last_serial": 5995618, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "d608db886e3413f6030a5f6da3e8063d", "sha256": "fa0a3d72bc0fdbfb8c294a0e6b0f3f5e2fdb1dba14a7eba529588e8319435319" }, "downloads": -1, "filename": "lofo_importance-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "d608db886e3413f6030a5f6da3e8063d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5417, "upload_time": "2019-02-18T19:33:01", "url": "https://files.pythonhosted.org/packages/87/27/95495aa7a6ba75cb76793f435f24b5146259d23dcd858d6e5896a3817e4b/lofo_importance-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f65264c9986703626b039510f2dcf6e2", "sha256": "9315d0b729af0f875881e70e52434adf5b085df5e703d3a9eba9bb5e6380107d" }, "downloads": -1, "filename": "lofo-importance-0.1.0.tar.gz", "has_sig": false, "md5_digest": "f65264c9986703626b039510f2dcf6e2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4002, "upload_time": "2019-02-18T19:33:03", "url": "https://files.pythonhosted.org/packages/c5/75/033dec4d7ff04e4e2014e8bfed0720bbc97b5b68be87b91cc04f930d5143/lofo-importance-0.1.0.tar.gz" } ], "0.1.1": 
[ { "comment_text": "", "digests": { "md5": "c8f55557497a50f7835d0334ded8a93d", "sha256": "a493687f9ec97d9eb18a07e2ac7e50b6ccca31411a0f50711538fdfda60f03a0" }, "downloads": -1, "filename": "lofo_importance-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "c8f55557497a50f7835d0334ded8a93d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5403, "upload_time": "2019-02-18T19:53:41", "url": "https://files.pythonhosted.org/packages/cd/d8/5f158e1bd9136b53980d3b8eb3e85fc00c2f850e0f03661662352d018e29/lofo_importance-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "15c5f7c58701cc35f9bc6eab21053f7b", "sha256": "1903dc186907795d38ca07328c4e7190c7356a65c928fb9280b084548227db29" }, "downloads": -1, "filename": "lofo-importance-0.1.1.tar.gz", "has_sig": false, "md5_digest": "15c5f7c58701cc35f9bc6eab21053f7b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3983, "upload_time": "2019-02-18T19:53:43", "url": "https://files.pythonhosted.org/packages/ad/fa/4a1e007dec68e722ca135922dffb1d87e83e5ed87ffd47f5a9c6ab9396e0/lofo-importance-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "5a4b54a4bc56d93ceabacf9cfa924c5d", "sha256": "57c61bb4c06cba0e3163b00accb6e19a420a7faa60510b128311c0790a2be12c" }, "downloads": -1, "filename": "lofo_importance-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5a4b54a4bc56d93ceabacf9cfa924c5d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7040, "upload_time": "2019-03-17T15:05:42", "url": "https://files.pythonhosted.org/packages/43/8e/c43fb5e6a56e6e028b24c93bee6282726534abeb42c1cd12d41fecd5443b/lofo_importance-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a91a91dc2756f42ffbc491f27d84a17f", "sha256": "8d481f7f00201a79751e4c6ecaf6e3b06074ed97dcfa2bc3079aeae1b437517c" }, "downloads": -1, "filename": "lofo-importance-0.2.0.tar.gz", "has_sig": false, 
"md5_digest": "a91a91dc2756f42ffbc491f27d84a17f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4679, "upload_time": "2019-03-17T15:05:44", "url": "https://files.pythonhosted.org/packages/bb/8f/5ca828fe6020fb95d24a9fdedb0aae9455926becbd3f4387993179fe71e2/lofo-importance-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "fe579a0cf7961db2112a642837577b74", "sha256": "a240bede2db4dd76c0f795cd1ba3753d2b7414cb049e1ed2b3c8918ac6d999d4" }, "downloads": -1, "filename": "lofo_importance-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "fe579a0cf7961db2112a642837577b74", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8346, "upload_time": "2019-07-29T13:07:05", "url": "https://files.pythonhosted.org/packages/4d/4c/2a1580fdb8b21c3e3fe71f7c4da1db5707ac1b6737959b29807089ecb84b/lofo_importance-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "67cc92ceba9f6e95e3f990810521ba4c", "sha256": "06ab70af2473855b073e64718972c231660bfd99752d573463811aca5c1fdc77" }, "downloads": -1, "filename": "lofo-importance-0.2.1.tar.gz", "has_sig": false, "md5_digest": "67cc92ceba9f6e95e3f990810521ba4c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5823, "upload_time": "2019-07-29T13:07:07", "url": "https://files.pythonhosted.org/packages/40/e6/06f405f6d50764cd4fc6ae6feda766abcfcdbe527fd2435ba0804837ccf5/lofo-importance-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "5c538c3ec92ea57250e80f1fd6659799", "sha256": "ff43f239326f19b2cbe80cd08a039759a4cf13b8dc6a4208dd4ce14caa0e5272" }, "downloads": -1, "filename": "lofo_importance-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "5c538c3ec92ea57250e80f1fd6659799", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9311, "upload_time": "2019-08-11T14:04:28", "url": 
"https://files.pythonhosted.org/packages/b6/7e/54b5706c350eb2f9ce4c06710bd8fdf31d12828d1c416b3de57cb9351019/lofo_importance-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ff737d1cba2c9376692e2d65e58d9f32", "sha256": "89ada4f95e5ad049ab3f0df0d835b468c7b2e161f26afe6ce09d25579cacee22" }, "downloads": -1, "filename": "lofo-importance-0.2.2.tar.gz", "has_sig": false, "md5_digest": "ff737d1cba2c9376692e2d65e58d9f32", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6405, "upload_time": "2019-08-11T14:04:30", "url": "https://files.pythonhosted.org/packages/f3/70/f20707cf4c57ee791f5bb9f5a15f7686bf7e00dca128eef6629b8d030ef0/lofo-importance-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "54ba4d60344508c0e3bb7a77dd5b7b41", "sha256": "9f1d950e61ff5d3a54b1d5c191f99a4316d3c1c031ae45af295fe0c404c3560a" }, "downloads": -1, "filename": "lofo_importance-0.2.3-py3-none-any.whl", "has_sig": false, "md5_digest": "54ba4d60344508c0e3bb7a77dd5b7b41", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9324, "upload_time": "2019-09-15T11:42:30", "url": "https://files.pythonhosted.org/packages/f7/f5/4f02b631d839172e5ce200a75d10f2bea649fc7c7dfce9dbc41974734990/lofo_importance-0.2.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9a8ac4c651267f3227c25b0ecc24c0e1", "sha256": "aea6ada9c2971c4234fe96137a1ad590091de3a5a8a46e86e80f33709fccda67" }, "downloads": -1, "filename": "lofo-importance-0.2.3.tar.gz", "has_sig": false, "md5_digest": "9a8ac4c651267f3227c25b0ecc24c0e1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6432, "upload_time": "2019-09-15T11:42:31", "url": "https://files.pythonhosted.org/packages/ae/2e/f69b0394e19e0991a9232613cc326804ad82486c2dd276ea47f238120aa3/lofo-importance-0.2.3.tar.gz" } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": "aa4cbef75cc286525eb69d4bc870819d", "sha256": 
"9954f1f93da9b75d34cbb076509bd74113da2648a0070f742bed103e77a78148" }, "downloads": -1, "filename": "lofo_importance-0.2.4-py3-none-any.whl", "has_sig": false, "md5_digest": "aa4cbef75cc286525eb69d4bc870819d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9334, "upload_time": "2019-10-18T13:21:00", "url": "https://files.pythonhosted.org/packages/fd/9e/25b8591cc3148c84f3d3883c202ee4769faeab08c11c72bca92ec10cc9d7/lofo_importance-0.2.4-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "aa4cbef75cc286525eb69d4bc870819d", "sha256": "9954f1f93da9b75d34cbb076509bd74113da2648a0070f742bed103e77a78148" }, "downloads": -1, "filename": "lofo_importance-0.2.4-py3-none-any.whl", "has_sig": false, "md5_digest": "aa4cbef75cc286525eb69d4bc870819d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9334, "upload_time": "2019-10-18T13:21:00", "url": "https://files.pythonhosted.org/packages/fd/9e/25b8591cc3148c84f3d3883c202ee4769faeab08c11c72bca92ec10cc9d7/lofo_importance-0.2.4-py3-none-any.whl" } ] }