{ "info": { "author": "Rog\u00e9rio C. P. Fragoso", "author_email": "rcpf@cin.ufpe.br", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "featselection\n=============\n\nThis project provides a set of filter methods for feature selection applied to text classification.\n\nCurrently, the following methods are available:\n\n- ALOFT - At Least One FeaTure `[1] `_\n- MFD - Maximum f Features per Document `[2] `_\n- MFDR - Maximum f Features per Document-Reduced `[2] `_\n- cMFDR - Class-dependent Maximum f Features per Document-Reduced `[3] `_\n- AFSA - Automatic Features Subsets Analyzer `[4] `_\n\n============\nInstallation\n============\nThe package can be installed using pip:\n\n``pip install featselection``\n\n=============\nDependencies\n=============\nThe code is tested to work with Python 3.6. The dependency requirements are:\n\n* numpy\n* scipy\n* pandas\n* scikit-learn\n\nThese dependencies are automatically installed using the pip command above.\n\n=========\nExamples\n=========\n\nIn this example, we show how to use MFD.\n\n.. 
code-block:: python3\n\n    import numpy as np\n\n    from sklearn.metrics import accuracy_score\n    from sklearn.feature_selection import chi2\n    from sklearn.naive_bayes import MultinomialNB\n    from sklearn.datasets import fetch_20newsgroups\n    from sklearn.model_selection import StratifiedKFold\n    from sklearn.feature_extraction.text import CountVectorizer\n\n    from filters import MFD\n\n\n    # Load data\n    cats = ['comp.windows.x', 'rec.sport.baseball', 'sci.med', 'soc.religion.christian', 'talk.politics.misc']\n    newsgroups = fetch_20newsgroups(categories=cats)\n\n    # Pre-processing: Transform texts to Bag-of-Words and remove stopwords\n    vectorizer = CountVectorizer(stop_words='english')\n    vectors = vectorizer.fit_transform(newsgroups.data)\n\n    # 10-fold stratified cross validation\n    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)\n    accuracy_results = []\n\n    for train_index, test_index in skf.split(vectors, newsgroups.target):\n        # Train\n        my_filter = MFD(10, chi2)\n        X_train = my_filter.fit_transform(vectors[train_index], newsgroups.target[train_index])\n        clf = MultinomialNB()\n        clf.fit(X_train, newsgroups.target[train_index])\n\n        # Test\n        X_test = my_filter.transform(vectors[test_index])\n        predicted = clf.predict(X_test)\n\n        # Evaluate\n        accuracy_results.append(accuracy_score(newsgroups.target[test_index], predicted))\n\n    # Output averaged accuracy\n    print('Mean accuracy = {0} ({1})'.format(np.mean(accuracy_results), np.std(accuracy_results)))\n\n==========\nReferences\n==========\n\n\n\n`[1] `_ Pinheiro, Roberto HW, et al. \"A global-ranking local feature selection method for text categorization.\" Expert Systems with Applications 39.17 (2012): 12851-12857.\n\n`[2] `_ Pinheiro, Roberto HW, et al. \"Data-driven global-ranking local feature selection methods for text categorization.\" Expert Systems with Applications 42.4 (2015): 1941-1949.\n\n`[3] `_ Fragoso, Rog\u00e9rio CP, et al. 
\"Class-dependent feature selection algorithm for text categorization.\" 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016.\n\n`[4] `_ Fragoso, Rog\u00e9rio CP, et al. \"A method for automatic determination of the feature vector size for text categorization.\" 2016 5th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2016.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/rcpf/featselection", "keywords": "", "license": "MIT", "maintainer": "Rog\u00e9rio C. P. Fragoso", "maintainer_email": "rcpf@cin.ufpe.br", "name": "featselection", "package_url": "https://pypi.org/project/featselection/", "platform": "", "project_url": "https://pypi.org/project/featselection/", "project_urls": { "Homepage": "https://github.com/rcpf/featselection" }, "release_url": "https://pypi.org/project/featselection/0.2.dev0/", "requires_dist": [ "scikit-learn (>=0.19.0)", "numpy (>=1.14.5)", "pandas (>=0.23.3)", "scipy (>=0.13.3)" ], "requires_python": ">=3", "summary": "Feature selection methods for Text Classification", "version": "0.2.dev0" }, "last_serial": 5229230, "releases": { "0.2.dev0": [ { "comment_text": "", "digests": { "md5": "0281d415f9f2ac20df5f62eaec13cc23", "sha256": "3879720f2df0c0d17a066c369f5b433bab3c0b172e02d8a8e8047c91ef12914c" }, "downloads": -1, "filename": "featselection-0.2.dev0-py3-none-any.whl", "has_sig": false, "md5_digest": "0281d415f9f2ac20df5f62eaec13cc23", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 5171, "upload_time": "2019-05-05T18:44:45", "url": "https://files.pythonhosted.org/packages/36/e0/c0f5e08d65721c6e024a268a96e2453805513ffe4283f14d09178af3cbf1/featselection-0.2.dev0-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0281d415f9f2ac20df5f62eaec13cc23", "sha256": 
"3879720f2df0c0d17a066c369f5b433bab3c0b172e02d8a8e8047c91ef12914c" }, "downloads": -1, "filename": "featselection-0.2.dev0-py3-none-any.whl", "has_sig": false, "md5_digest": "0281d415f9f2ac20df5f62eaec13cc23", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 5171, "upload_time": "2019-05-05T18:44:45", "url": "https://files.pythonhosted.org/packages/36/e0/c0f5e08d65721c6e024a268a96e2453805513ffe4283f14d09178af3cbf1/featselection-0.2.dev0-py3-none-any.whl" } ] }