{ "info": { "author": "Simon Larsson", "author_email": "simonlarsson0@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3 :: Only", "Topic :: Scientific/Engineering" ], "description": "# extrakit-learn\n\n[![PyPI version](https://badge.fury.io/py/xklearn.svg)](https://pypi.python.org/pypi/xklearn/) \n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/simon-larsson/extrakit-learn/blob/master/LICENSE)\n\nMachine learning components built to extend scikit-learn. All components use scikit's [object API](https://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects) to work interchangeably with scikit components. It is mostly a collection of tools that have been useful for [Kaggle](https://www.kaggle.com) competitions. extrakit-learn is not affiliated with scikit-learn in any way; it is only inspired by it.\n\n## Installation\n\n pip install xklearn\n\n## Components\n- [CategoryEncoder](https://github.com/simon-larsson/extrakit-learn#categoryencoder) - Like scikit's LabelEncoder but supports NaNs and unseen values.\n- [CountEncoder](https://github.com/simon-larsson/extrakit-learn#countencoder) - Categorical feature engineering on a column based on value counts.\n- [TargetEncoder](https://github.com/simon-larsson/extrakit-learn#targetencoder) - Categorical feature engineering on a column based on target means.\n- [MultiColumnEncoder](https://github.com/simon-larsson/extrakit-learn#multicolumnencoder) - Apply a column encoder to multiple columns.\n- [FoldEstimator](https://github.com/simon-larsson/extrakit-learn#foldestimator) - K-fold on a scikit estimator wrapped into an estimator.\n- [FoldLightGBM](https://github.com/simon-larsson/extrakit-learn#foldlightgbm) - K-fold on LGBM wrapped into an estimator.\n- [FoldXGBoost](https://github.com/simon-larsson/extrakit-learn#foldxgboost) - K-fold on XGBoost wrapped into an estimator.\n- 
[StackClassifier](https://github.com/simon-larsson/extrakit-learn#stackclassifier) - Stack an ensemble of classifiers with a meta classifier.\n- [StackRegressor](https://github.com/simon-larsson/extrakit-learn#stackregressor) - Stack an ensemble of regressors with a meta regressor.\n- [compress_dataframe](https://github.com/simon-larsson/extrakit-learn#compress_dataframe) - Reduce memory usage of a Pandas dataframe.\n\n### Hierarchy\n xklearn\n \u2502\n \u251c\u2500\u2500 preprocessing\n \u2502 \u251c\u2500\u2500 CategoryEncoder\n \u2502 \u251c\u2500\u2500 CountEncoder\n \u2502 \u251c\u2500\u2500 TargetEncoder\n \u2502 \u2514\u2500\u2500 MultiColumnEncoder\n \u2502\n \u251c\u2500\u2500 models\n \u2502 \u251c\u2500\u2500 FoldEstimator\n \u2502 \u251c\u2500\u2500 FoldLightGBM\n \u2502 \u251c\u2500\u2500 FoldXGBoost\n \u2502 \u251c\u2500\u2500 StackClassifier\n \u2502 \u2514\u2500\u2500 StackRegressor\n \u2502\n \u2514\u2500\u2500 utils\n\n##### Example\n\n from xklearn.models import FoldEstimator\n\n### CategoryEncoder\nWraps scikit's LabelEncoder, allowing missing and unseen values to be handled.\n\n#### Arguments\n`unseen` - Strategy for handling unseen values. See replacement strategies below for options.\n\n`missing` - Strategy for handling missing values. 
See replacement strategies below for options.\n\n##### Replacement strategies\n\n`'encode'` - Replace value with -1.\n\n`'nan'` - Replace value with np.nan.\n\n`'error'` - Raise ValueError.\n\n#### Example\n```python\nfrom xklearn.preprocessing import CategoryEncoder\n...\n\nce = CategoryEncoder(unseen='nan', missing='nan')\nX[:, 0] = ce.fit_transform(X[:, 0])\n```\n\n### CountEncoder\nReplaces categorical values with their respective value count during training. Classes with a count of one and previously unseen classes during prediction are encoded as either one or NaN.\n\n#### Arguments\n`unseen` - Strategy for handling unseen values. See replacement strategies below for options.\n\n`missing` - Strategy for handling missing values. See replacement strategies below for options.\n\n##### Replacement strategies\n\n`'one'` - Replace value with 1.\n\n`'nan'` - Replace value with np.nan.\n\n`'error'` - Raise ValueError.\n\n#### Example\n```python\nfrom xklearn.preprocessing import CountEncoder\n...\n\nce = CountEncoder(unseen='one')\nX[:, 0] = ce.fit_transform(X[:, 0])\n```\n\n### TargetEncoder\nPerforms target mean encoding of categorical features with optional smoothing.\n\n#### Arguments\n`smoothing` - Smoothing weight.\n\n`unseen` - Strategy for handling unseen values. See replacement strategies below for options.\n\n`missing` - Strategy for handling missing values. 
See replacement strategies below for options.\n\n##### Replacement strategies\n\n`'global'` - Replace value with global target mean.\n\n`'nan'` - Replace value with np.nan.\n\n`'error'` - Raise ValueError.\n\n#### Example\n\n```python\nfrom xklearn.preprocessing import TargetEncoder\n...\n\nte = TargetEncoder(smoothing=10)\nX[:, 0] = te.fit_transform(X[:, 0], y)\n```\n\n### MultiColumnEncoder\nApplies a column encoder over multiple columns.\n\n#### Arguments\n`enc` - Base encoder that will be applied to the selected columns.\n\n`columns` - Column selection, either a boolean mask, indices, or None (default=None).\n\n#### Example\n```python\nfrom xklearn.preprocessing import CountEncoder\nfrom xklearn.preprocessing import MultiColumnEncoder\n...\n\ncolumns = [1, 3, 4]\nenc = CountEncoder()\n\nmce = MultiColumnEncoder(enc, columns)\nX = mce.fit_transform(X)\n```\n\n### FoldEstimator\nK-fold wrapped into an estimator that automatically performs cross validation over a selected folding method when fit. Can optionally be used as a stacked ensemble of k estimators after fit.\n\n#### Arguments\n`est` - Base estimator.\n\n`fold` - Folding cross validation object, e.g. KFold or StratifiedKFold.\n\n`metric` - Evaluation metric.\n\n`refit_full` - Flag indicating post-fit behaviour. 
True performs a full refit on the full data; False keeps a stacked ensemble of the estimators trained on the different folds.\n\n`verbose` - Flag for printing fold scores during fit.\n\n#### Example\n```python\nfrom sklearn.ensemble import RandomForestRegressor\nfrom sklearn.metrics import mean_squared_error\nfrom sklearn.model_selection import KFold\nfrom xklearn.models import FoldEstimator\n...\n\nbase = RandomForestRegressor(n_estimators=10)\nfold = KFold(n_splits=5)\n\nest = FoldEstimator(base, fold=fold, metric=mean_squared_error, verbose=1)\n\nest.fit(X_train, y_train)\nest.predict(X_test)\n```\nOutput:\n```\nFinished fold 1 with score: 200.8023\nFinished fold 2 with score: 261.2365\nFinished fold 3 with score: 169.2404\nFinished fold 4 with score: 186.7915\nFinished fold 5 with score: 205.0894\nFinished with a total score of: 204.6813\n```\n\n### FoldLightGBM\nK-fold wrapped into an estimator that automatically performs cross validation on an LGBM model over a selected folding method when fit. Can optionally be used as a stacked ensemble of k estimators after fit.\n\n#### Arguments\n`lgbm` - Base estimator.\n\n`fold` - Folding cross validation object, e.g. KFold or StratifiedKFold.\n\n`metric` - Evaluation metric.\n\n`fit_params` - Dictionary of parameters that should be fed to the fit method.\n\n`refit_full` - Flag indicating post-fit behaviour. 
True performs a full refit on the full data; False keeps a stacked ensemble of the estimators trained on the different folds.\n\n`refit_params` - Dictionary of parameters that should be fed to the refit if refit_full=True.\n\n`verbose` - Flag for printing fold scores during fit.\n\n#### Example\n```python\nfrom lightgbm import LGBMClassifier\nfrom sklearn.metrics import roc_auc_score\nfrom sklearn.model_selection import KFold\nfrom xklearn.models import FoldLightGBM\n...\n\nbase = LGBMClassifier(n_estimators=1000)\nfold = KFold(n_splits=5)\nfit_params = {\n    'eval_metric': 'auc',\n    'early_stopping_rounds': 50,\n    'verbose': 0,\n}\n\nfold_lgbm = FoldLightGBM(\n    base,\n    fold=fold,\n    metric=roc_auc_score,\n    fit_params=fit_params,\n    verbose=1,\n)\n\nfold_lgbm.fit(X_train, y_train)\nfold_lgbm.predict(X_test)\n```\nOutput:\n```\nFinished fold 1 with score: 0.9114\nFinished fold 2 with score: 0.9265\nFinished fold 3 with score: 0.9419\nFinished fold 4 with score: 0.9189\nFinished fold 5 with score: 0.9152\nFinished with a total score of: 0.9225\n```\n\n### FoldXGBoost\nK-fold wrapped into an estimator that automatically performs cross validation on an XGBoost model over a selected folding method when fit. Can optionally be used as a stacked ensemble of k estimators after fit.\n\n#### Arguments\n`xgb` - Base estimator.\n\n`fold` - Folding cross validation object, e.g. KFold or StratifiedKFold.\n\n`metric` - Evaluation metric.\n\n`fit_params` - Dictionary of parameters that should be fed to the fit method.\n\n`refit_full` - Flag indicating post-fit behaviour. 
True performs a full refit on the full data; False keeps a stacked ensemble of the estimators trained on the different folds.\n\n`refit_params` - Dictionary of parameters that should be fed to the refit if refit_full=True.\n\n`verbose` - Flag for printing fold scores during fit.\n\n#### Example\n```python\nfrom sklearn.metrics import mean_squared_error\nfrom sklearn.model_selection import KFold\nfrom xgboost import XGBRegressor\nfrom xklearn.models import FoldXGBoost\n...\n\nbase = XGBRegressor(objective=\"reg:linear\", random_state=42)\nfold = KFold(n_splits=5)\n# XGBoost has no 'mse' eval metric; 'rmse' is the closest built-in one\nfit_params = {\n    'eval_metric': 'rmse',\n    'early_stopping_rounds': 5,\n    'verbose': 0,\n}\n\nfold_xgb = FoldXGBoost(\n    base,\n    fold=fold,\n    metric=mean_squared_error,\n    fit_params=fit_params,\n    verbose=1,\n)\n\nfold_xgb.fit(X_train, y_train)\nfold_xgb.predict(X_test)\n```\nOutput:\n```\nFinished fold 1 with score: 3212.8362\nFinished fold 2 with score: 2179.7843\nFinished fold 3 with score: 2707.8460\nFinished fold 4 with score: 2988.6643\nFinished fold 5 with score: 3281.4299\nFinished with a total score of: 3274.9001\n```\n\n### StackClassifier\nEnsemble classifier that stacks an ensemble of classifiers by using their outputs as input features.\n\n#### Arguments\n`clfs` - List of classifiers in the ensemble.\n\n`meta_clf` - Meta classifier that stacks the predictions of the ensemble.\n\n`keep_features` - Flag to train the meta classifier on the original features too.\n\n`refit` - Flag to retrain the ensemble of classifiers during fit.\n\n#### Example\n```python\nfrom sklearn.ensemble import RandomForestClassifier\nfrom sklearn.linear_model import RidgeClassifier\nfrom sklearn.neighbors import KNeighborsClassifier\nfrom sklearn.svm import SVC\nfrom xklearn.models import StackClassifier\n...\n\nmeta_clf = RidgeClassifier()\nensemble = [RandomForestClassifier(), KNeighborsClassifier(), SVC()]\n\nstack_clf = StackClassifier(clfs=ensemble, meta_clf=meta_clf, refit=True)\n\nstack_clf.fit(X_train, y_train)\ny_ = stack_clf.predict(X_test)\n```\n\n### StackRegressor\nEnsemble regressor that stacks an ensemble of regressors by using their outputs as input features.\n\n#### Arguments\n`regs` - List of regressors in the ensemble.\n\n`meta_reg` - Meta regressor that stacks the predictions of the ensemble.\n\n`drop_first` - Drop first class 
probability to avoid multi-collinearity.\n\n`keep_features` - Flag to train the meta regressor on the original features too.\n\n`refit` - Flag to retrain the ensemble of regressors during fit.\n\n#### Example\n```python\nfrom sklearn.ensemble import RandomForestRegressor\nfrom sklearn.linear_model import Ridge\nfrom sklearn.neighbors import KNeighborsRegressor\nfrom sklearn.svm import SVR\nfrom xklearn.models import StackRegressor\n...\n\n# scikit-learn's ridge regressor is named Ridge (there is no RidgeRegressor)\nmeta_reg = Ridge()\nensemble = [RandomForestRegressor(), KNeighborsRegressor(), SVR()]\n\nstack_reg = StackRegressor(regs=ensemble, meta_reg=meta_reg, refit=True)\n\nstack_reg.fit(X_train, y_train)\ny_ = stack_reg.predict(X_test)\n```\n\n### compress_dataframe\nReduce memory usage of a Pandas dataframe by finding columns that use larger data types than necessary.\n\n#### Arguments\n`df` - Dataframe for memory reduction.\n\n`verbose` - Flag for printing result of memory reduction.\n\n#### Example\n```python\nfrom xklearn.utils import compress_dataframe\n...\n\ntrain = compress_dataframe(train, verbose=1)\n```\nOutput:\n```\nDataframe memory decreased to 169.60 MB (64.6% reduction)\n```", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/simon-larsson/extrakit-learn", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "xklearn", "package_url": "https://pypi.org/project/xklearn/", "platform": "", "project_url": "https://pypi.org/project/xklearn/", "project_urls": { "Homepage": "https://github.com/simon-larsson/extrakit-learn" }, "release_url": "https://pypi.org/project/xklearn/0.0.7/", "requires_dist": null, "requires_python": "", "summary": "Handy machine learning tools in the spirit of scikit-learn.", "version": "0.0.7" }, "last_serial": 5813768, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "25cb19fd9c8370765f8efd55985c5dc5", "sha256": "34d6180bccbaf7785482f2b57e3c36e18c43136e158156167eeff17b1c8e7445" }, "downloads": -1, "filename": "xklearn-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": 
"25cb19fd9c8370765f8efd55985c5dc5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 14419, "upload_time": "2019-05-08T17:02:12", "url": "https://files.pythonhosted.org/packages/57/6d/e8200d38c36f44c9dd8379558fa11313498f6d11608348b8d6a854187d34/xklearn-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "041576d5ada8e48902564e3d5b04f7f2", "sha256": "ec22a936b22e8e0e99e603ae694b891af5c4e405fd800421adf1e7dd75c7ef5e" }, "downloads": -1, "filename": "xklearn-0.0.1.tar.gz", "has_sig": false, "md5_digest": "041576d5ada8e48902564e3d5b04f7f2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9924, "upload_time": "2019-05-08T17:02:14", "url": "https://files.pythonhosted.org/packages/17/26/d9b3f2ba2bd2fa25d8a3321ec0bbcb7c065cea6b847c37708876a40e68ba/xklearn-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "944f1deaa90c7f3238178e9fecf1d6c7", "sha256": "93bae38941c8bee9b3dc37faf2690f8ce8c0b5da2c43dfdaaaac56eaa32e4e93" }, "downloads": -1, "filename": "xklearn-0.0.2.tar.gz", "has_sig": false, "md5_digest": "944f1deaa90c7f3238178e9fecf1d6c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12466, "upload_time": "2019-05-13T13:37:12", "url": "https://files.pythonhosted.org/packages/e0/5e/72306118d3bb0e631b4c59a7b13fcc6edf5e6733059983184fd9754d07c5/xklearn-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "92bfd63abcf8643e11179a04a00cd946", "sha256": "97c6090764dd540af096c148f6b445dc26e18d76c21648c2e17460a80bab10e7" }, "downloads": -1, "filename": "xklearn-0.0.3.tar.gz", "has_sig": false, "md5_digest": "92bfd63abcf8643e11179a04a00cd946", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12455, "upload_time": "2019-05-13T13:42:22", "url": "https://files.pythonhosted.org/packages/87/40/b4053fb53d425ce02832a17672002394bfc93bfe57e2425c006bf06efefe/xklearn-0.0.3.tar.gz" } ], 
"0.0.4": [ { "comment_text": "", "digests": { "md5": "457a61746fe097af3ec28acf9d958d59", "sha256": "1f09d058036a2433610f33ffb7ef639612d3ca5801f86c9f9fb3dde28ec4d1e3" }, "downloads": -1, "filename": "xklearn-0.0.4.tar.gz", "has_sig": false, "md5_digest": "457a61746fe097af3ec28acf9d958d59", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14131, "upload_time": "2019-05-14T19:14:47", "url": "https://files.pythonhosted.org/packages/48/62/22d26635216f7d8bc5914fb280c8bd064177c7f52c73aef2f28e0841752e/xklearn-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "46a522189b6fd5aa061800979198608a", "sha256": "65f2e1de05db6675f3f9bd94268b1a7b9aeb0cec38e2900ff72315d59acaec2a" }, "downloads": -1, "filename": "xklearn-0.0.5.tar.gz", "has_sig": false, "md5_digest": "46a522189b6fd5aa061800979198608a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14302, "upload_time": "2019-06-19T12:57:29", "url": "https://files.pythonhosted.org/packages/74/40/129fd015b844321d4ee6948919c96099f0fb466e59c0305fa30d9bd3cb6a/xklearn-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "e8008face3e177e7a99d27a5bc455be9", "sha256": "e29fd82200123c00738632cfb8e1cb562d1e592b63f021c77b03706c4e66d700" }, "downloads": -1, "filename": "xklearn-0.0.6.tar.gz", "has_sig": false, "md5_digest": "e8008face3e177e7a99d27a5bc455be9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15298, "upload_time": "2019-09-10T16:34:16", "url": "https://files.pythonhosted.org/packages/0f/8f/b6496ad157fd879f5b9d616b6c87d2d74b5f13da54d04c66a465b4eb9bbf/xklearn-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "cde33481a71c1cc95377800208e03725", "sha256": "916e0a7618b8bdf8988de7df5edc5e93ef3ebab609d0f46c1082a60988a6befb" }, "downloads": -1, "filename": "xklearn-0.0.7.tar.gz", "has_sig": false, "md5_digest": "cde33481a71c1cc95377800208e03725", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 14990, "upload_time": "2019-09-11T08:59:03", "url": "https://files.pythonhosted.org/packages/cd/d5/b4f65e1c390fe013266fb8132c22db607fe31ca35348886a9ccfc2f5e0cb/xklearn-0.0.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cde33481a71c1cc95377800208e03725", "sha256": "916e0a7618b8bdf8988de7df5edc5e93ef3ebab609d0f46c1082a60988a6befb" }, "downloads": -1, "filename": "xklearn-0.0.7.tar.gz", "has_sig": false, "md5_digest": "cde33481a71c1cc95377800208e03725", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14990, "upload_time": "2019-09-11T08:59:03", "url": "https://files.pythonhosted.org/packages/cd/d5/b4f65e1c390fe013266fb8132c22db607fe31ca35348886a9ccfc2f5e0cb/xklearn-0.0.7.tar.gz" } ] }