{ "info": { "author": "Antti Haapala", "author_email": "antti@haapala.name", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Information Analysis" ], "description": "Rabbit of Caerbannog\r\n--------------------\r\n\r\n..\r\n\r\n Well, that's no ordinary rabbit - that's the most foul, cruel, and bad-tempered rodent you ever set eyes on!\r\n\r\n -- Tim the Enchanter\r\n\r\nThis module is a high-level interface for the Vowpal Wabbit machine learning system. Currently it relies on \r\nthe `wabbit_wappa` module for lower-level interaction, but strives to provide a more high-level object-oriented interface.\r\n\r\nThere are currently 3 kinds of `Rabbit`s you can `import from` `caerbannog`:\r\n\r\n`Rabbit`\r\n Your standard rabbit instance. By default runs Vowpal Wabbit using pipes for stdin/stdout\r\n\r\n`ActiveRabbit`\r\n Runs Vowpal Wabbit in active learning mode, using TCP socket\r\n\r\n`OfflineRabbit`\r\n The initializer expects the argument `fp` which is an open file with `'wt'` mode.\r\n the inputs fed to `teach` will be written to this file for offline processing.\r\n\r\n\r\nMovie Review Sentiments - Active learning demo with caerbannog\r\n--------------------------------------------------------------\r\n\r\nImport relevant modules here. We are using the ActiveRabbit for active\r\nonline learning\r\n\r\n.. code:: python\r\n\r\n from caerbannog import ActiveRabbit, Example\r\n from itertools import islice\r\n import random\r\n import nltk\r\n from nltk.corpus import movie_reviews\r\n\r\nCreate an active bunny, with active mellowness of 0.01\r\n\r\n.. code:: python\r\n\r\n rabbit = ActiveRabbit(loss_function='logistic', active_mellowness=0.01)\r\n rabbit.start()\r\n\r\nLoad the documents from NLTK movie review corpus (note that you need to\r\ndownload these first by nltk.download(). For each document, make a tuple\r\n(document\\_words, category) where category is either 'pos' or 'neg' and\r\ndocument\\_words is a list of words from tokenizer.\r\n\r\n.. code:: python\r\n\r\n documents = [(list(movie_reviews.words(fileid)), category)\r\n for category in movie_reviews.categories()\r\n for fileid in movie_reviews.fileids(category)]\r\n random.shuffle(documents)\r\n len(documents)\r\n\r\n\r\n\r\n.. parsed-literal::\r\n\r\n 2000\r\n\r\nThe feature extractor function. First filters out all tokens that are\r\nnon-alphanumeric. Then make the 'w' namespace consist of all the words\r\nin the review; 'n' consists of 2-4 ngrams of the document words.\r\n\r\n.. code:: python\r\n\r\n def document_features(document_words):\r\n document_words = list(filter(str.isalnum, document_words))\r\n example = Example()\r\n example['w'].add_features(set(document_words))\r\n ngrams = set()\r\n for j in range(2, 5):\r\n ngrams.update('_'.join(i) for i in nltk.ngrams(document_words, j))\r\n \r\n example['n'].add_features(ngrams)\r\n return example\r\n\r\nVowpal Wabbit expects labels to be -1 and 1 for logistic binary\r\nclassifier\r\n\r\n.. code:: python\r\n\r\n def convert_sent(sent):\r\n return {'pos': 1, 'neg': -1}[sent]\r\n\r\nConvert the sentiment value and extract features.\r\n\r\n.. code:: python\r\n\r\n examples = [ (convert_sent(sent), document_features(doc)) for (doc, sent) in documents ]\r\n\r\nTrain with 1500 first examples and keep the remaining ones for\r\nverification\r\n\r\n.. code:: python\r\n\r\n teach, test = examples[:1500], examples[1500:]\r\n\r\nTeach the filter. We ask for prediction for each example; if the\r\nimportance is over 1 we \"label\" the example and teach it to the\r\nclassifier. We repeat the classification 40 times to ensure that the\r\nclassifier has had enough to adjust the weights.\r\n\r\n.. code:: python\r\n\r\n taught = 0\r\n predicted = 0\r\n labelled = set()\r\n for i in range(10):\r\n for sent, ex in teach:\r\n predicted += 1\r\n if rabbit.predict(example=ex).importance >= 1:\r\n rabbit.teach(label=sent, example=ex)\r\n taught += 1\r\n labelled.add(ex)\r\n \r\n print(\"Predicted {}, taught {} (ratio {}). {} unique inputs labelled\"\r\n .format(predicted, taught, taught/predicted, len(labelled)))\r\n\r\n.. parsed-literal::\r\n\r\n Predicted 15000, taught 1057 (ratio 0.07046666666666666). 1042 unique inputs labelled\r\n\r\nTest with the testing set. For each correctly labelled example, increase\r\nthe counter\r\n\r\n.. code:: python\r\n\r\n correct = 0\r\n for sent, ex in test:\r\n prediction = rabbit.predict(example=ex)\r\n if prediction.label == sent:\r\n correct += 1\r\n \r\n print(\"{} inputs predicted. {} correct; ratio {}\".format(len(test), correct, correct / len(test)))\r\n\r\n.. parsed-literal::\r\n\r\n 500 inputs predicted. 418 correct; ratio 0.836\r\n\r\n\r\nLicense\r\n-------\r\n\r\nMIT license.", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "", "keywords": "", "license": "Copyright (c) 2014 Antti Haapala\r\n\r\nPermission is hereby granted, free of charge, to any person obtaining a copy\r\nof this software and associated documentation files (the \"Software\"), to deal\r\nin the Software without restriction, including without limitation the rights\r\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\ncopies of the Software, and to permit persons to whom the Software is\r\nfurnished to do so, subject to the following conditions:\r\n\r\nThe above copyright notice and this permission notice shall be included in\r\nall copies or substantial portions of the Software.\r\n\r\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\r\nTHE SOFTWARE.", "maintainer": "http://github.com/ztane/caerbannog/", "maintainer_email": "", "name": "caerbannog", "package_url": "https://pypi.org/project/caerbannog/", "platform": "", "project_url": "https://pypi.org/project/caerbannog/", "project_urls": null, "release_url": "https://pypi.org/project/caerbannog/0.1/", "requires_dist": null, "requires_python": null, "summary": "Well, that's no ordinary rabbit.", "version": "0.1" }, "last_serial": 1286974, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "1e1e548b342667297a76465f647a8173", "sha256": "170bf1c10279084d07a02a42bca2daa0d2b416006baf96e73a579545e83a32f3" }, "downloads": -1, "filename": "caerbannog-0.1.tar.gz", "has_sig": false, "md5_digest": "1e1e548b342667297a76465f647a8173", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5127, "upload_time": "2014-10-29T12:40:31", "url": "https://files.pythonhosted.org/packages/99/e7/cc78fb3f5fc92bb7e5a6e83585cda8e3064e757d288e9cc457fc50abf1e0/caerbannog-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1e1e548b342667297a76465f647a8173", "sha256": "170bf1c10279084d07a02a42bca2daa0d2b416006baf96e73a579545e83a32f3" }, "downloads": -1, "filename": "caerbannog-0.1.tar.gz", "has_sig": false, "md5_digest": "1e1e548b342667297a76465f647a8173", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5127, "upload_time": "2014-10-29T12:40:31", "url": "https://files.pythonhosted.org/packages/99/e7/cc78fb3f5fc92bb7e5a6e83585cda8e3064e757d288e9cc457fc50abf1e0/caerbannog-0.1.tar.gz" } ] }