{ "info": { "author": "Wesley Baugh", "author_email": "wesley@bwbaugh.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: Science/Research", "Natural Language :: English", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic" ], "description": "infer\r\n=====\r\n\r\nA machine learning toolkit for classification and assisted\r\nexperimentation.\r\n\r\n|Build Status|\r\n\r\nFeatures\r\n--------\r\n\r\nAssisted experimentation\r\n~~~~~~~~~~~~~~~~~~~~~~~~\r\n\r\nWe provide an experimental foundation for assisted experimentation by\r\nallowing the user to modularize the tasks that are common, but unique in\r\nimplementation, to every experiment, including knowing how to:\r\n\r\n- Perform any required initial setup to get the model ready.\r\n- Parse the training dataset.\r\n- Parse the testing dataset.\r\n- Train the model given a training instance.\r\n- Make a prediction for a test instance using the current model.\r\n\r\nOnce the task specific implementations are known, we then coordinate the\r\nuse of those functions to perform the experiment in an automated fashion\r\nby automatically performing tasks that can be common to any experiment:\r\n\r\n- Run an experiment given an implementation of all the modular tasks.\r\n- Incrementally train the model and get the current model's\r\n performance.\r\n- Get the test instances that were incorrectly labeled using the\r\n current model.\r\n\r\nFeature extraction\r\n~~~~~~~~~~~~~~~~~~\r\n\r\nA feature extractor takes a document and extracts features to be used\r\nwith a machine learning classifier (during training and prediction).\r\n\r\nWe wanted to provide a way to easily perform the common task of\r\nextracting textual features from documents, while at the same time\r\nmaking it easy for the researcher to experiment with the kinds of\r\nfeatures that are extracted. The researcher can currently specify:\r\n\r\n- The function used to tokenize the raw document string.\r\n- The range of n-grams to extract from the text\r\n\r\nClassification\r\n~~~~~~~~~~~~~~\r\n\r\nPerformance metrics\r\n^^^^^^^^^^^^^^^^^^^\r\n\r\nWe provide a means of measuring the performance of your classifier by\r\nproviding standard performance metrics, expanded to allow for\r\nmultinomial classifiers, including:\r\n\r\n- Confusion matrix\r\n- Accuracy\r\n- Recall (average, weighted average, and per class)\r\n- Precision (average, weighted average, and per class)\r\n- F-measure with a selectable *beta* parameter (average, weighted\r\n average, and per class)\r\n\r\n(Multinomial) Naive Bayes\r\n^^^^^^^^^^^^^^^^^^^^^^^^^\r\n\r\nSome of the existing implementations of Naive Bayes that are available\r\nin various libraries we have found to be very memory inefficient.\r\nBecause of this, we decided to write our own implementation that can\r\nhopefully be better optimized.\r\n\r\nIn addition, there are lots of ways you can experiment with using and\r\noptimizing the performance of Naive Bayes that we wanted to be able to\r\neasily experiment with.\r\n\r\nFeature selection\r\n~~~~~~~~~~~~~~~~~\r\n\r\nFeature selection is another tool that the researcher should be able to\r\nexperiment with.\r\n\r\nDictionary trimming\r\n^^^^^^^^^^^^^^^^^^^\r\n\r\nCurrently, we provide a form of feature selection that is similar to\r\ndictionary trimming, by having the classifier ignore all but the top-k\r\nmost frequent features. This often gives us 90% of the benefit of\r\nfeature selection without the work of computing more complex metrics.\r\n\r\nDictionary trimming normally involves making a pass over your dataset in\r\norder to extract the (feature) vocabulary. However, this is infeasible\r\nwhen you are attempting to learn in an online (streaming) setting, such\r\nas when your documents are continuously coming in, like tweets off of a\r\nstream from Twitter. To handle this case, we created a data structure\r\nthat *efficiently* keeps track of the top-k most common terms that is\r\nupdated while training. This allows O(1) checking if a feature is in the\r\ntop-k features *without* having to make a pass over the vocabulary to\r\nmake the check.\r\n\r\n.. |Build Status| image:: https://travis-ci.org/bwbaugh/infer.png?branch=master\r\n :target: https://travis-ci.org/bwbaugh/infer", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://www.github.com/bwbaugh/infer", "keywords": "", "license": "Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License", "maintainer": "", "maintainer_email": "", "name": "infer", "package_url": "https://pypi.org/project/infer/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/infer/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://www.github.com/bwbaugh/infer" }, "release_url": "https://pypi.org/project/infer/0.1/", "requires_dist": null, "requires_python": null, "summary": "A machine learning toolkit for classification and assisted experimentation.", "version": "0.1" }, "last_serial": 793269, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "dbb7eea841302ecf562a987608f53b68", "sha256": "af81d0f0e40b3db138d496ee4af46d117eef55bff594a890cc808379359ac5db" }, "downloads": -1, "filename": "infer-0.1.zip", "has_sig": false, "md5_digest": "dbb7eea841302ecf562a987608f53b68", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23364, "upload_time": "2013-03-22T06:32:27", "url": "https://files.pythonhosted.org/packages/c1/e8/88d216f7abec7d439165bdd49dd5008e51cf885ee47be568784c00b8e1ba/infer-0.1.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "dbb7eea841302ecf562a987608f53b68", "sha256": "af81d0f0e40b3db138d496ee4af46d117eef55bff594a890cc808379359ac5db" }, "downloads": -1, "filename": "infer-0.1.zip", "has_sig": false, "md5_digest": "dbb7eea841302ecf562a987608f53b68", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23364, "upload_time": "2013-03-22T06:32:27", "url": "https://files.pythonhosted.org/packages/c1/e8/88d216f7abec7d439165bdd49dd5008e51cf885ee47be568784c00b8e1ba/infer-0.1.zip" } ] }