{ "info": { "author": "Giacomo Berardi", "author_email": "barnets@gmail.com", "bugtrack_url": null, "classifiers": [ "Environment :: Console", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)", "Operating System :: OS Independent", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Text Processing :: Linguistic" ], "description": "ShallowLearn\n============\nA collection of supervised learning models based on shallow neural network approaches (e.g., word2vec and fastText)\nwith some additional exclusive features.\nWritten in Python and fully compatible with `scikit-learn `_.\n\n**Discussion group** for users and developers: https://groups.google.com/d/forum/shallowlearn\n\n.. image:: https://travis-ci.org/giacbrd/ShallowLearn.svg?branch=master\n :target: https://travis-ci.org/giacbrd/ShallowLearn\n.. image:: https://badge.fury.io/py/shallowlearn.svg\n :target: https://badge.fury.io/py/shallowlearn\n\nGetting Started\n---------------\nInstall the latest version:\n\n.. code:: shell\n\n pip install cython\n pip install shallowlearn\n\nImport models from ``shallowlearn.models``, they implement the standard methods for supervised learning in scikit-learn,\ne.g., ``fit(X, y)``, ``predict(X)``, ``predict_proba(X)``, etc.\n\nData is raw text, each sample in the iterable ``X`` is a list of tokens (words of a document), \nwhile each element in the iterable ``y`` (corresponding to an element in ``X``) can be a single label or a list in case\nof a multi-label training set. 
Obviously, ``y`` must be the same size as ``X``.\n\nModels\n------\n\nGensimFastText\n~~~~~~~~~~~~~~\n**Choose this model if your goal is classification with fastText!** (it is going to be the most stable and feature-rich)\n\nA supervised learning model based on the fastText algorithm [1]_.\nThe code is mostly taken and rewritten from `Gensim `_\nand takes advantage of its optimizations (e.g., Cython) and support.\n\nIt is possible to choose the Softmax loss function (default) or one of its two \"approximations\":\nHierarchical Softmax and Negative Sampling.\n\nThe parameter ``bucket`` configures the feature hashing space, i.e., the *hashing trick* described in [1]_.\nUsing the hashing trick together with ``partial_fit(X, y)`` yields a powerful *online* text classifier (see `Online learning`_).\n\nIt is possible to load pre-trained word vectors at initialization,\npassing a Gensim ``Word2Vec`` or a ShallowLearn ``LabeledWord2Vec`` instance (the latter is retrievable from a\n``GensimFastText`` model through the attribute ``classifier``).\nWith the method ``fit_embeddings(X)`` it is possible to pre-train word vectors using the current parameter values of the model.\n\nConstructor argument names are a mix of those of Gensim and those of fastText (see this `class docstring `_).\n\n.. 
code:: python\n\n >>> from shallowlearn.models import GensimFastText\n >>> clf = GensimFastText(size=100, min_count=0, loss='hs', iter=3, seed=66)\n >>> clf.fit([('i', 'am', 'tall'), ('you', 'are', 'fat')], ['yes', 'no'])\n >>> clf.predict([('tall', 'am', 'i')])\n ['yes']\n\nFastText\n~~~~~~~~\nThe supervised algorithm of fastText as implemented in `fastText.py `_,\nwhich exposes an interface to the original C++ code.\nThe current advantages of this class over ``GensimFastText`` are the *subwords* and the *n-gram features* implemented\nvia the *hashing trick*.\nThe constructor arguments are equivalent to the original `supervised model\n`_, except for ``input_file``, ``output`` and\n``label_prefix``.\n\n**WARNING**: The only way of loading datasets in fastText.py is through the filesystem (as of version 0.8.2),\nso data passed to ``fit(X, y)`` will be written to temporary files on disk.\n\n.. code:: python\n\n >>> from shallowlearn.models import FastText\n >>> clf = FastText(dim=100, min_count=0, loss='hs', epoch=3, bucket=5, word_ngrams=2)\n >>> clf.fit([('i', 'am', 'tall'), ('you', 'are', 'fat')], ['yes', 'no'])\n >>> clf.predict([('tall', 'am', 'i')])\n ['yes']\n\nDeepInverseRegression\n~~~~~~~~~~~~~~~~~~~~~\n*TODO*: Based on https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec.score\n\nDeepAveragingNetworks\n~~~~~~~~~~~~~~~~~~~~~\n*TODO*: Based on https://github.com/miyyer/dan\n\nExclusive Features\n------------------\nUpcoming features will be listed as Issues on GitHub; for now:\n\nPersistence\n~~~~~~~~~~~\nAny model can be serialized and de-serialized with the two methods ``save`` and ``load``.\nThey overload the `SaveLoad `_ interface of Gensim,\nso it is possible to control the disk usage of the models, instead of simply *pickling* the objects.\nThe original interface also allows compressing the serialization outputs.\n\n``save`` may create multiple files with names prefixed by the name given to the 
serialized model.\n\n.. code:: python\n\n >>> from shallowlearn.models import GensimFastText\n >>> clf = GensimFastText(size=100, min_count=0, loss='hs', iter=3, seed=66)\n >>> clf.save('./model') # it also creates ./model.CLF\n >>> loaded = GensimFastText.load('./model')\n\nBenchmarks\n----------\n\nText classification\n~~~~~~~~~~~~~~~~~~~\n\nThe script ``scripts/document_classification_20newsgroups.py`` refers to this\n`scikit-learn example `_\nin which text classifiers are compared on a reference dataset;\nwe added our models to the comparison.\n**The current results, even if still preliminary, are comparable with other\napproaches, achieving the best performance in speed**.\n\nResults as of release `0.0.5 `_,\nwith the *chi2_select* option set to 80%.\nThe times take into account *tf-idf* vectorization in the \u201cclassic\u201d classifiers, and the I/O operations for the\ntraining of fastText.py.\nThe evaluation measure is *macro F1*.\n\n.. image:: https://cdn.rawgit.com/giacbrd/ShallowLearn/master/images/benchmark.svg\n :alt: Text classifiers comparison\n :align: center\n :width: 888 px\n\nOnline learning\n~~~~~~~~~~~~~~~\n\nThe script ``scripts/plot_out_of_core_classification.py`` computes a benchmark on some scikit-learn classifiers which are able to\nlearn incrementally,\na batch of examples at a time.\nThese classifiers can learn online by using the scikit-learn method ``partial_fit(X, y)``.\nThe `original example `_\ndescribes the approach through feature hashing, which we set with the parameter ``bucket``.\n\n**The results are decent but there is room for improvement**.\nWe configure our classifier with ``iter=1, size=100, alpha=0.1, sample=0, min_count=0``, so as to keep the model fast and\nsmall, and not to cut off words from the few samples we have.\n\n.. image:: https://cdn.rawgit.com/giacbrd/ShallowLearn/master/images/onlinelearning.svg\n :alt: Online learning\n :align: center\n :width: 888 px\n\nReferences\n----------\n.. [1] A. Joulin, E. Grave, P. 
Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/giacbrd/ShallowLearn", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "ShallowLearn", "package_url": "https://pypi.org/project/ShallowLearn/", "platform": "", "project_url": "https://pypi.org/project/ShallowLearn/", "project_urls": { "Homepage": "https://github.com/giacbrd/ShallowLearn" }, "release_url": "https://pypi.org/project/ShallowLearn/0.0.5/", "requires_dist": null, "requires_python": "", "summary": "A collection of supervised learning models based on shallow neural network approaches (e.g., word2vec and fastText) with some additional exclusive features", "version": "0.0.5" }, "last_serial": 2546364, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "a8235ac70cf55e66e19d642719571dc2", "sha256": "52aac29c6d6d57858f66035f73ed065b0743b837b6f5d63d09508f06fd340fa0" }, "downloads": -1, "filename": "ShallowLearn-0.0.2-5.tar.gz", "has_sig": false, "md5_digest": "a8235ac70cf55e66e19d642719571dc2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 69986, "upload_time": "2016-10-14T12:45:28", "url": "https://files.pythonhosted.org/packages/20/41/da65f30862b13a78837dc234b27c0a745e097399f3225d0cdcd866d066fb/ShallowLearn-0.0.2-5.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "ad2094a81fef4cbce96b4fe26909784e", "sha256": "c60f778a47e6f66763bc00f57261d512dacf95a29d3035c51e1f1a0b8a97dbc6" }, "downloads": -1, "filename": "ShallowLearn-0.0.3.tar.gz", "has_sig": false, "md5_digest": "ad2094a81fef4cbce96b4fe26909784e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 83320, "upload_time": "2016-10-27T23:45:58", "url": 
"https://files.pythonhosted.org/packages/0b/82/691a73b09fa1e4e44769c20b2b44773fa02dc4acfc2e6dfb1e1ac7b89923/ShallowLearn-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "8ccb858e43b4bcdd5f3c8fcf364b63ae", "sha256": "ee1a583bda743bca2c0e82ff732626fe2068a457c634ae00e1c30e0145933e41" }, "downloads": -1, "filename": "ShallowLearn-0.0.4.tar.gz", "has_sig": false, "md5_digest": "8ccb858e43b4bcdd5f3c8fcf364b63ae", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 85529, "upload_time": "2016-11-06T17:01:57", "url": "https://files.pythonhosted.org/packages/cd/1d/39d0c58e782885412674e024645c1b9f15469095a2f650d284ffd6ddd6a8/ShallowLearn-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "4a9fbc8dbeb1f7183f9db949c1aad24e", "sha256": "b1ee718b47741be0d6e06d991ba4d33a93710f94af39483cddb4f20a3dac67ef" }, "downloads": -1, "filename": "ShallowLearn-0.0.5.tar.gz", "has_sig": false, "md5_digest": "4a9fbc8dbeb1f7183f9db949c1aad24e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 93363, "upload_time": "2016-12-30T17:01:48", "url": "https://files.pythonhosted.org/packages/4b/95/053eef29979e514db4652e00dfac384ff80f7515a67e21a91cfb33c9cc2a/ShallowLearn-0.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4a9fbc8dbeb1f7183f9db949c1aad24e", "sha256": "b1ee718b47741be0d6e06d991ba4d33a93710f94af39483cddb4f20a3dac67ef" }, "downloads": -1, "filename": "ShallowLearn-0.0.5.tar.gz", "has_sig": false, "md5_digest": "4a9fbc8dbeb1f7183f9db949c1aad24e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 93363, "upload_time": "2016-12-30T17:01:48", "url": "https://files.pythonhosted.org/packages/4b/95/053eef29979e514db4652e00dfac384ff80f7515a67e21a91cfb33c9cc2a/ShallowLearn-0.0.5.tar.gz" } ] }