{
    "info": {
        "author": "Rog\u00e9rio C. P. Fragoso",
        "author_email": "rcpf@cin.ufpe.br",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "Intended Audience :: Developers",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: BSD License",
            "Programming Language :: Python :: 3",
            "Programming Language :: Python :: 3.5",
            "Programming Language :: Python :: 3.6",
            "Topic :: Scientific/Engineering :: Artificial Intelligence"
        ],
        "description": "featselection\n=========\n\nThis project provides a set filter methods for feature selection applied to text classification.\n\nCurrently the following methods are available:\n\n- ALOFT - At Least One FeaTure `[1] <https://www.sciencedirect.com/science/article/pii/S0957417412007063>`_\n- MFD - Maximum f Features per Document `[2] <https://www.sciencedirect.com/science/article/pii/S0957417414006344>`_\n- MFDR - Maximum f Features per Document-Reduced `[2] <https://www.sciencedirect.com/science/article/pii/S0957417414006344>`_\n- cMFDR - Class-dependent Maximum f Features per Document-Reduced `[3] <https://ieeexplore.ieee.org/abstract/document/7727649>`_\n- AFSA - Automatic Features Subsets Analyzer `[4] <https://ieeexplore.ieee.org/abstract/document/7839596>`_\n\n============\nInstallation\n============\nThe package can be installed using pip:\n\n``pip install featselection``\n\n=============\nDependencies\n=============\nThe code is tested to work with Python 3.6. The dependency requirements are: \n\n* numpy\n* scipy\n* pandas\n* scikit-learn\n\nThese dependencies are automatically installed using the pip command above.\n\n=========\nExamples\n=========\n\nIn this example, we show the use MFD.\n\n.. code-block:: python3\n\n    import numpy as np\n\n    from sklearn.metrics import accuracy_score\n    from sklearn.feature_selection import chi2\n    from sklearn.naive_bayes import MultinomialNB\n    from sklearn.datasets import fetch_20newsgroups\n    from sklearn.model_selection import StratifiedKFold\n    from sklearn.feature_extraction.text import CountVectorizer\n\n    from filters import MFD\n\n\n    # Load data\n    cats = ['comp.windows.x', 'rec.sport.baseball', 'sci.med', 'soc.religion.christian', 'talk.politics.misc']\n    newsgroups = fetch_20newsgroups(categories=cats)\n\n    # Pre-processing: Transform texts to Bag-of-Words and remove stopwords\n    vectorizer = CountVectorizer(stop_words='english')\n    vectors = vectorizer.fit_transform(newsgroups.data)\n\n    # 10-fold stratified cross validation\n    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)\n    accuracy_results = []\n\n    for train_index, test_index in skf.split(vectors, newsgroups.target):\n        # Train\n        my_filter = MFD(10, chi2)\n        X_train = my_filter.fit_transform(vectors[train_index], newsgroups.target[train_index])\n        clf = MultinomialNB()\n        clf.fit(X_train, newsgroups.target[train_index])\n\n        # Test\n        X_test = my_filter.transform(vectors[test_index])\n        predicted = clf.predict(X_test)\n\n        # Evaluate\n        accuracy_results.append(accuracy_score(newsgroups.target[test_index], predicted))\n\n    # Output averaged accuracy\n    print('Mean accuracy = {0} ({1})'.format(np.mean(accuracy_results), np.std(accuracy_results)))\n\n==========\nReferences\n==========\n\n\n\n`[1] <https://www.sciencedirect.com/science/article/pii/S0957417412007063>`_ Pinheiro, Roberto HW, et al. \"A global-ranking local feature selection method for text categorization.\" Expert Systems with Applications 39.17 (2012): 12851-12857.\n\n`[2] <https://www.sciencedirect.com/science/article/pii/S0957417414006344>`_ Pinheiro, Roberto HW, et al. \"Data-driven global-ranking local feature selection methods for text categorization.\" Expert Systems with Applications 42.4 (2015): 1941-1949.\n\n`[3] <https://ieeexplore.ieee.org/abstract/document/7727649>`_ Fragoso, Rog\u00e9rio CP, et al. \"Class-dependent feature selection algorithm for text categorization.\" 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016.\n\n`[4] <https://ieeexplore.ieee.org/abstract/document/7839596>`_ Fragoso, Rog\u00e9rio CP, et al. \"A method for automatic determination of the feature vector size for text categorization.\" 2016 5th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2016.\n\n\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/rcpf/featselection",
        "keywords": "",
        "license": "MIT",
        "maintainer": "Rog\u00e9rio C. P. Fragoso",
        "maintainer_email": "rcpf@cin.ufpe.br",
        "name": "featselection",
        "package_url": "https://pypi.org/project/featselection/",
        "platform": "",
        "project_url": "https://pypi.org/project/featselection/",
        "project_urls": {
            "Homepage": "https://github.com/rcpf/featselection"
        },
        "release_url": "https://pypi.org/project/featselection/0.2.dev0/",
        "requires_dist": [
            "scikit-learn (>=0.19.0)",
            "numpy (>=1.14.5)",
            "pandas (>=0.23.3)",
            "scipy (>=0.13.3)"
        ],
        "requires_python": ">=3",
        "summary": "Feature selection methods for Text Classification",
        "version": "0.2.dev0"
    },
    "last_serial": 5229230,
    "releases": {
        "0.2.dev0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "0281d415f9f2ac20df5f62eaec13cc23",
                    "sha256": "3879720f2df0c0d17a066c369f5b433bab3c0b172e02d8a8e8047c91ef12914c"
                },
                "downloads": -1,
                "filename": "featselection-0.2.dev0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "0281d415f9f2ac20df5f62eaec13cc23",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": ">=3",
                "size": 5171,
                "upload_time": "2019-05-05T18:44:45",
                "url": "https://files.pythonhosted.org/packages/36/e0/c0f5e08d65721c6e024a268a96e2453805513ffe4283f14d09178af3cbf1/featselection-0.2.dev0-py3-none-any.whl"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "0281d415f9f2ac20df5f62eaec13cc23",
                "sha256": "3879720f2df0c0d17a066c369f5b433bab3c0b172e02d8a8e8047c91ef12914c"
            },
            "downloads": -1,
            "filename": "featselection-0.2.dev0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0281d415f9f2ac20df5f62eaec13cc23",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3",
            "size": 5171,
            "upload_time": "2019-05-05T18:44:45",
            "url": "https://files.pythonhosted.org/packages/36/e0/c0f5e08d65721c6e024a268a96e2453805513ffe4283f14d09178af3cbf1/featselection-0.2.dev0-py3-none-any.whl"
        }
    ]
}