{ "info": { "author": "Sa\u0161o Karakati\u010d", "author_email": "karakatic@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development" ], "description": "# EvoPreprocess\n\nEvoPreprocess is a Python toolkit for data sampling, instance weighting, and feature selection. It is compatible with [scikit-learn](http://scikit-learn.org/stable/) and [imbalanced-learn](https://imbalanced-learn.readthedocs.io/en/stable/). It builds on the [NiaPy](https://github.com/NiaOrg/NiaPy) library for its implementations of nature-inspired algorithms and is distributed under the MIT license.\n\n## Getting Started\n\nThese instructions will get you a copy of the project up and running on your local machine for development and testing purposes. 
\n\n### Requirements\n- Python 3.6+\n- PIP\n\n### Dependencies\nEvoPreprocess requires:\n\n- numpy (>=1.8.2)\n- scikit-learn (>=0.19.0)\n- imbalanced-learn (>=0.3.1)\n- NiaPy (>=2.0.0rc2)\n\n### Installation\nInstall EvoPreprocess with pip:\n\n```sh\n$ pip install EvoPreprocess\n```\nOr install it directly from the source code:\n\n```sh\n$ git clone https://github.com/karakatic/EvoPreprocess.git\n$ cd EvoPreprocess\n$ python setup.py install\n```\n\n# Usage\n\nAfter installation, the package can be imported:\n\n```sh\n$ python\n>>> import EvoPreprocess\n>>> EvoPreprocess.__version__\n```\n\n## Data sampling\n\n### Simple data sampling example\n\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom EvoPreprocess.data_sampling import EvoSampling\n\n# Load classification data\ndataset = load_breast_cancer()\n\n# Print the size of the dataset\nprint(dataset.data.shape, len(dataset.target))\n\n# Sample instances of the dataset with EvoSampling using the default settings\nX_resampled, y_resampled = EvoSampling().fit_resample(dataset.data, dataset.target)\n\n# Print the size of the dataset after sampling\nprint(X_resampled.shape, len(y_resampled))\n```\n\n### Data sampling for regression with a custom nature-inspired algorithm and other custom settings\n\n```python\nimport NiaPy.algorithms.basic as nia\nfrom sklearn.datasets import load_boston\nfrom sklearn.tree import DecisionTreeRegressor\nfrom EvoPreprocess.data_sampling import EvoSampling\n\n# Load regression data\ndataset = load_boston()\n\n# Print the size of the dataset\nprint(dataset.data.shape, len(dataset.target))\n\n# Sample instances of the regression dataset with EvoSampling using custom settings\nX_resampled, y_resampled = EvoSampling(\n\tevaluator=DecisionTreeRegressor(),\n\tevo_algorithm=nia.EvolutionStrategyMpL,\n\tn_folds=5,\n\tn_runs=5,\n\tn_jobs=4\n).fit_resample(dataset.data, dataset.target)\n\n# Print the size of the dataset after 
sampling\nprint(X_resampled.shape, len(y_resampled))\n```\n\n### Data sampling with scikit-learn\n\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.metrics import accuracy_score\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.tree import DecisionTreeClassifier\nfrom EvoPreprocess.data_sampling import EvoSampling\n\n# Set the random seed for reproducibility\nrandom_seed = 1111\n\n# Load classification data\ndataset = load_breast_cancer()\n\n# Split the dataset into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target,\n test_size=0.33,\n random_state=random_seed)\n\n# Train the decision tree model on the original training data\ncls = DecisionTreeClassifier(random_state=random_seed)\ncls.fit(X_train, y_train)\n\n# Print the results: shape of the original dataset and the accuracy of the decision tree classifier trained on the original data\nprint(X_train.shape, accuracy_score(y_test, cls.predict(X_test)), sep=': ')\n\n# Sample the data with random_seed set\nevo = EvoSampling(n_folds=3, random_seed=random_seed)\nX_resampled, y_resampled = evo.fit_resample(X_train, y_train)\n\n# Refit the decision tree model on the resampled data\ncls.fit(X_resampled, y_resampled)\n\n# Print the results: shape of the resampled dataset and the accuracy of the decision tree classifier trained on the resampled data\nprint(X_resampled.shape, accuracy_score(y_test, cls.predict(X_test)), sep=': ')\n```\n\n## Instance weighting\n\n### Simple instance weighting example\n\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom EvoPreprocess.data_weighting import EvoWeighting\n\n# Load classification data\ndataset = load_breast_cancer()\n\n# Get weights for the instances\ninstance_weights = EvoWeighting().reweight(dataset.data, dataset.target)\n\n# Print the instance weights\nprint(instance_weights)\n```\n\n### Instance weighting with scikit-learn\n\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.metrics import accuracy_score\nfrom 
sklearn.model_selection import train_test_split\nfrom sklearn.tree import DecisionTreeClassifier\nfrom EvoPreprocess.data_weighting import EvoWeighting\n\n# Set the random seed for reproducibility\nrandom_seed = 1234\n\n# Load classification data\ndataset = load_breast_cancer()\n\n# Split the dataset into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target,\n test_size=0.33,\n random_state=random_seed)\n\n# Train the decision tree model without instance weights\ncls = DecisionTreeClassifier(random_state=random_seed)\ncls.fit(X_train, y_train)\n\n# Print the results: shape of the training dataset and the accuracy of the decision tree classifier trained without instance weights\nprint(X_train.shape, accuracy_score(y_test, cls.predict(X_test)), sep=': ')\n\n# Get weights for the instances\ninstance_weights = EvoWeighting(random_seed=random_seed).reweight(X_train, y_train)\n\n# Refit the decision tree model using the evolved instance weights\ncls.fit(X_train, y_train, sample_weight=instance_weights)\n\n# Print the results: shape of the training dataset and the accuracy of the decision tree classifier trained with instance weights\nprint(X_train.shape, accuracy_score(y_test, cls.predict(X_test)), sep=': ')\n```\n\n## Feature selection\n\n### Simple feature selection example\n\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom EvoPreprocess.feature_selection import EvoFeatureSelection\n\n# Load classification data\ndataset = load_breast_cancer()\n\n# Print the size of the dataset\nprint(dataset.data.shape)\n\n# Run feature selection with EvoFeatureSelection\nX_new = EvoFeatureSelection().fit_transform(\n dataset.data,\n dataset.target)\n\n# Print the size of the dataset after feature selection\nprint(X_new.shape)\n```\n\n### Feature selection for regression with a custom nature-inspired algorithm and other custom settings\n\n```python\nfrom sklearn.datasets import load_boston\nfrom sklearn.tree import DecisionTreeRegressor\nimport NiaPy.algorithms.basic as nia\nfrom 
EvoPreprocess.feature_selection import EvoFeatureSelection\n\n# Load regression data\ndataset = load_boston()\n\n# Print the size of the dataset\nprint(dataset.data.shape)\n\n# Run feature selection on the regression dataset with EvoFeatureSelection using custom settings\nX_new = EvoFeatureSelection(\n evaluator=DecisionTreeRegressor(max_depth=2),\n evo_algorithm=nia.DifferentialEvolution,\n random_seed=1,\n n_runs=5,\n n_folds=5,\n n_jobs=4\n).fit_transform(dataset.data, dataset.target)\n\n# Print the size of the dataset after feature selection\nprint(X_new.shape)\n```\n\n### Feature selection with scikit-learn\n\n```python\nfrom sklearn.datasets import load_boston\nfrom sklearn.metrics import mean_squared_error\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.tree import DecisionTreeRegressor\nfrom EvoPreprocess.feature_selection import EvoFeatureSelection\n\n# Set the random seed for reproducibility\nrandom_seed = 1000\n\n# Load regression data\ndataset = load_boston()\n\n# Split the dataset into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target,\n test_size=0.33,\n random_state=random_seed)\n\n# Train the decision tree model on all features\nmodel = DecisionTreeRegressor(random_state=random_seed)\nmodel.fit(X_train, y_train)\n\n# Print the results: shape of the original dataset and the MSE of the decision tree regressor trained on the original data\nprint(X_train.shape, mean_squared_error(y_test, model.predict(X_test)), sep=': ')\n\n# Select features with random_seed set\nevo = EvoFeatureSelection(evaluator=model, random_seed=random_seed)\nX_train_new = evo.fit_transform(X_train, y_train)\n\n# Refit the decision tree model on the selected features\nmodel.fit(X_train_new, y_train)\n\n# Keep only the selected features in the test set\nX_test_new = evo.transform(X_test)\n\n# Print the results: shape of the reduced dataset and the MSE of the decision tree regressor trained on the selected features\nprint(X_train_new.shape, mean_squared_error(y_test, model.predict(X_test_new)), sep=': 
')\n```\n\n## EvoPreprocess as part of a pipeline (from imbalanced-learn)\n\n```python\nfrom imblearn.pipeline import Pipeline\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.metrics import accuracy_score\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.tree import DecisionTreeClassifier\nfrom EvoPreprocess.data_sampling import EvoSampling\nfrom EvoPreprocess.feature_selection import EvoFeatureSelection\n\n# Set the random seed for reproducibility\nrandom_seed = 1111\n\n# Load classification data\ndataset = load_breast_cancer()\n\n# Split the dataset into training and testing sets\nX_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target,\n test_size=0.33,\n random_state=random_seed)\n\n# Train the decision tree model on the original training data\ncls = DecisionTreeClassifier(random_state=random_seed)\ncls.fit(X_train, y_train)\n\n# Print the results: shape of the original dataset and the accuracy of the decision tree classifier trained on the original data\nprint(X_train.shape, accuracy_score(y_test, cls.predict(X_test)), sep=': ')\n\n# Build an imbalanced-learn pipeline with feature selection, data sampling, and classification\npipeline = Pipeline(steps=[\n ('feature_selection', EvoFeatureSelection(n_folds=10, random_seed=random_seed)),\n ('data_sampling', EvoSampling(n_folds=10, random_seed=random_seed)),\n ('classifier', DecisionTreeClassifier(random_state=random_seed))])\n\n# Fit the pipeline\npipeline.fit(X_train, y_train)\n\n# Print the results: the accuracy of the pipeline\nprint(accuracy_score(y_test, pipeline.predict(X_test)))\n```\n\nFor more examples, see the **Examples** folder.\n\n# Authors\n\nEvoPreprocess was programmed and is maintained by Sa\u0161o Karakati\u010d from the University of Maribor.\n\n\n## License\n\nThis project is licensed under the MIT License - see .\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": 
"https://github.com/karakatic/EvoPreprocess", "keywords": "Evolutionary Algorithms,Nature Inspired Algorithms,Data Sampling,Instance Weighting,Feature Selection,Preprocessing,Machine Learning", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "EvoPreprocess", "package_url": "https://pypi.org/project/EvoPreprocess/", "platform": "", "project_url": "https://pypi.org/project/EvoPreprocess/", "project_urls": { "Homepage": "https://github.com/karakatic/EvoPreprocess" }, "release_url": "https://pypi.org/project/EvoPreprocess/0.1.4/", "requires_dist": [ "numpy", "scipy", "scikit-learn (>=0.19.0imbalanced-learn>=0.3.1NiaPy>=2.0.0rc2)" ], "requires_python": "", "summary": "Data Preprocessing with Evolutionary and Nature Inspired Algorithms.", "version": "0.1.4" }, "last_serial": 5551148, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "7802721b03cdf33f1945b864325cf14b", "sha256": "9adbd869a650a4cb76012ae101be46866848df40a38f5cd832ae4d1af69797aa" }, "downloads": -1, "filename": "EvoPreprocess-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "7802721b03cdf33f1945b864325cf14b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17013, "upload_time": "2019-07-18T13:17:50", "url": "https://files.pythonhosted.org/packages/a6/be/0bc71a0eb01189309b1f8a192ba680b1901fa08ada093d2fac3607a8d9e5/EvoPreprocess-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4702a64f92ddb5b16d8cef6be41eaddd", "sha256": "e5e187089c5f4193dfab3dbabcc6229ef80b9bde572e96e2fd91cfd4686b97a4" }, "downloads": -1, "filename": "EvoPreprocess-0.1.1.tar.gz", "has_sig": false, "md5_digest": "4702a64f92ddb5b16d8cef6be41eaddd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9155, "upload_time": "2019-07-18T13:17:58", "url": "https://files.pythonhosted.org/packages/ad/37/5e85c4a95c1d030b51e62ddcbc5c1e7d290075cc5fdebea7cad049e5f248/EvoPreprocess-0.1.1.tar.gz" } ], "0.1.2": [ { 
"comment_text": "", "digests": { "md5": "414357b6dbc5087ade85aa75e64542ca", "sha256": "45d0406fb6627cabb743c1306736be9117779733e5569d939fc2f0627e5156e0" }, "downloads": -1, "filename": "EvoPreprocess-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "414357b6dbc5087ade85aa75e64542ca", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17013, "upload_time": "2019-07-18T13:17:53", "url": "https://files.pythonhosted.org/packages/d7/25/33b0ceb995cd69b632da47a12714766e8e02d0fd74f4842596253cd6de9c/EvoPreprocess-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fb2d62965cc07f25d627c96066d1de20", "sha256": "89893be5c18bf2d1131b4705bb6703e99dec883fcaffd92f7b1b8e82fbf2ee79" }, "downloads": -1, "filename": "EvoPreprocess-0.1.2.tar.gz", "has_sig": false, "md5_digest": "fb2d62965cc07f25d627c96066d1de20", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9154, "upload_time": "2019-07-18T13:18:00", "url": "https://files.pythonhosted.org/packages/e7/d5/f3844ed31e09228b54c9ab8d66860ff17b20c9eda36f0b646f64f2134746/EvoPreprocess-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "a32765c3cc6de549651c7e4e5a286cd2", "sha256": "12891df5ccfdfabeb3aeed645f1547be7338d9ce9a90f5a5ea3644971c9b068b" }, "downloads": -1, "filename": "EvoPreprocess-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "a32765c3cc6de549651c7e4e5a286cd2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17015, "upload_time": "2019-07-18T13:17:55", "url": "https://files.pythonhosted.org/packages/38/b2/ec27bbc49fb6d1619cf31add6ba3d0799323c62ecaa23527b1de1f6fd146/EvoPreprocess-0.1.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2f8d85d445e9f56ff774412a2d73b222", "sha256": "853467d3ca4aa6738614a29b8848d73f53b78ce729ad24e754583683a1a75dc6" }, "downloads": -1, "filename": "EvoPreprocess-0.1.3.tar.gz", "has_sig": false, "md5_digest": 
"2f8d85d445e9f56ff774412a2d73b222", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9151, "upload_time": "2019-07-18T13:18:02", "url": "https://files.pythonhosted.org/packages/e5/1a/dd26cc896c32ad06e5f20577ce237aae7f405bdb9ba2ce4b95aaf9cee3bd/EvoPreprocess-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "8b04853284ceeffdd6342499c4c05b79", "sha256": "72a5124f76beb7cc1707b8fa861aa92cc74801818f94cf96239b2659d7262af6" }, "downloads": -1, "filename": "EvoPreprocess-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "8b04853284ceeffdd6342499c4c05b79", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17006, "upload_time": "2019-07-18T13:17:56", "url": "https://files.pythonhosted.org/packages/4c/db/0e12da16148a1efc43ee2d17b0e30430d81d25437e329d6bc34fd7949984/EvoPreprocess-0.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a4f14be5cca15504f493609f49bda954", "sha256": "eccdff989d78df7bd3603a7122a7f3dd100d89859b1768051b277c3fe73cf32f" }, "downloads": -1, "filename": "EvoPreprocess-0.1.4.tar.gz", "has_sig": false, "md5_digest": "a4f14be5cca15504f493609f49bda954", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9154, "upload_time": "2019-07-18T13:18:04", "url": "https://files.pythonhosted.org/packages/21/f2/b9abb9e3caadc15b461241ffe147eefcee0f79132905cd6d08574f4ec649/EvoPreprocess-0.1.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8b04853284ceeffdd6342499c4c05b79", "sha256": "72a5124f76beb7cc1707b8fa861aa92cc74801818f94cf96239b2659d7262af6" }, "downloads": -1, "filename": "EvoPreprocess-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "8b04853284ceeffdd6342499c4c05b79", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17006, "upload_time": "2019-07-18T13:17:56", "url": 
"https://files.pythonhosted.org/packages/4c/db/0e12da16148a1efc43ee2d17b0e30430d81d25437e329d6bc34fd7949984/EvoPreprocess-0.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a4f14be5cca15504f493609f49bda954", "sha256": "eccdff989d78df7bd3603a7122a7f3dd100d89859b1768051b277c3fe73cf32f" }, "downloads": -1, "filename": "EvoPreprocess-0.1.4.tar.gz", "has_sig": false, "md5_digest": "a4f14be5cca15504f493609f49bda954", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9154, "upload_time": "2019-07-18T13:18:04", "url": "https://files.pythonhosted.org/packages/21/f2/b9abb9e3caadc15b461241ffe147eefcee0f79132905cd6d08574f4ec649/EvoPreprocess-0.1.4.tar.gz" } ] }