{ "info": { "author": "Chris Emmery", "author_email": "cmry@protonmail.com", "bugtrack_url": null, "classifiers": [], "description": "Omesa\n=====\n\n.. image:: https://travis-ci.org/cmry/omesa.svg?branch=master\n :target: https://travis-ci.org/cmry/omesa\n :alt: Travis-CI\n\n.. image:: https://readthedocs.org/projects/omesa/badge/?version=latest\n :target: http://omesa.readthedocs.org/en/latest/?badge=latest\n :alt: Docs\n\n.. image:: https://landscape.io/github/cmry/omesa/master/landscape.svg?style=flat\n :target: https://landscape.io/github/cmry/omesa/master\n :alt: Landscape\n\n.. image:: https://img.shields.io/badge/license-GPLv3-blue.svg\n :alt: GPLv3\n\n.. image:: https://img.shields.io/badge/python-3.5-blue.svg\n :alt: Python 3.5\n\n.. _scikit-learn: http://scikit-learn.org/stable/\n.. _readthedocs: http://omesa.readthedocs.org/\n\nA small framework for reproducible Text Mining research that largely builds\non top of scikit-learn_. Its goal is to make common research procedures fully\nautomated, optimized, and well recorded. To this end it features:\n\n- Exhaustive search, from best features and pipeline options to classifier optimization.\n- Flexible wrappers to plug in your tools and features of choice.\n- Completely sparse pipeline through hashing, from data to feature space.\n- Record of all settings and fitted parts of the entire experiment, promoting reproducibility.\n- Dump of an easily deployable version of the final model for plug-and-play demos.\n\nRead the documentation at readthedocs_.\n\n.. image:: http://www.cmry.nl/dump/shed.png\n :alt: Pipeline\n\nImportant Note\n''''''''''''''\n\nThis repository is currently in alpha development, so don't expect any stable\nfunctionality until this note is removed. The dev_ branch will usually have the\nlatest (not always stable) version.\n\nFront-end Preview\n'''''''''''''''''''\n\n.. _dev: https://github.com/cmry/omesa/tree/dev\n.. 
_lime: https://github.com/marcotcr/lime\n\nIn 'front', a web front-end is being developed that uses a standalone\ndatabase for storing models. It provides visualization and comparison of\nmodel performance. Some extra dependencies are introduced, such as ``bottle``,\n``blitzdb``, ``plotly`` and lime_. Currently only the 'Results' section works;\na preview is shown below:\n\n.. image:: http://www.cmry.nl/dump/omesa.png\n :alt: Front\n\n.. image:: http://www.cmry.nl/dump/omesa_prop.png\n :alt: Front Prop\n\nIf you want to take a peek, install all of the above dependencies and run the\nfollowing:\n\n.. code-block:: shell\n\n    $ cd /dir/to/omesa/examples\n    $ python3 n_gram.py\n    $ cd ../front\n    $ python3 ./app.wsgi\n\nThen follow the ``localhost`` link that is shown to access the web app. Please\nnote that this part can be quite unstable. Bug reports are welcome.\n\n\nDependencies\n''''''''''''\n\n.. _Frog: https://languagemachines.github.io/frog/\n.. _LaMachine: https://proycon.github.io/LaMachine/\n.. _spaCy: https://spacy.io/\n\nOmesa currently relies heavily on ``numpy``, ``scipy`` and ``sklearn``. To use\nFrog_ as a Dutch back-end, we strongly recommend using LaMachine_. For\nEnglish, there is a spaCy_ wrapper available.\n\nOmesa Only - End-To-End In 2 Minutes\n------------------------------------\n\nExample\n'''''''\n\n.. _`n-gram classification`: https://github.com/cmry/omesa/blob/master/examples/n_gram.py\n.. 
_`n_gram.csv`: https://github.com/cmry/omesa/blob/master/examples/n_gram.csv\n\nWith the end-to-end ``Experiment`` pipeline and a configuration dictionary,\nseveral experiments or set-ups can be run and evaluated with very little\ncode. One of the test examples provided is that of `n-gram classification`_\nof Wikipedia documents. In this experiment, we are provided with a toy set\n`n_gram.csv`_ that features 10 articles about Machine Learning, and 10 random\nother articles. To run the experiment, the following configuration is used:\n\n.. code-block:: python\n\n    from omesa.experiment import Experiment\n    from omesa.featurizer import Ngrams\n    from omesa.containers import CSV\n    from sklearn.naive_bayes import MultinomialNB\n\n    Experiment(\n        project=\"unit_tests\",\n        name=\"gram_experiment\",\n        train_data=CSV(\"n_gram.csv\", data=\"gram\", label=\"label\"),\n        lime_data=CSV(\"n_gram.csv\", data=\"gram\", label=\"label\"),\n        features=[Ngrams(level='char', n_list=[3])],\n        classifiers=[\n            {'clf': MultinomialNB()}\n        ],\n        save=(\"log\")\n    )\n\nThis will cross-validate performance on the ``.csv``, selecting the text\nand label columns and indicating that a header is present in the ``.csv``\ndocument. We provide the ``Ngrams`` featurizer and its parameters to be used\nas features, and store the log.\n\nOutput\n''''''\n\nThe log is printed at run time and also stored in the script's directory.\nA sample of the output of the current experiment is as follows:\n\n.. code-block:: shell\n\n    ---- Omesa ----\n\n    Config:\n\n        feature: char_ngram\n        n_list: [3]\n\n    \tname: gram_experiment\n    \tseed: 42\n\n    Sparse train shape: (20, 1301)\n\n    Performance on test set:\n\n                 precision    recall  f1-score   support\n\n             DF       0.83      0.50      0.62        10\n             ML       0.64      0.90      0.75        10\n\n    avg / total       0.74      0.70      0.69        20\n\n\n    Experiment took 0.2 seconds\n\n    ----------\n\nAdding Your Own Features\n------------------------\n\nHere's an example of a minimal word frequency feature class:\n\n.. 
code-block:: python\n\n    from collections import Counter\n\n    class SomeFeaturizer(object):\n\n        def __init__(self, some_params):\n            \"\"\"Set parameters for SomeFeaturizer.\"\"\"\n            self.name = 'hookname'\n            self.some_params = some_params\n\n        def transform(self, raw, parse):\n            \"\"\"Return a dictionary of feature values.\"\"\"\n            # Counts each item in raw (assumed here to be a sequence of words).\n            return Counter([x for x in raw])\n\nThis returns a ``{word: frequency}`` dict per instance that can easily be\ntransformed into a sparse matrix.\n\nAcknowledgements\n----------------\n\n.. _AMiCA: http://www.amicaproject.be/\n\nPart of the work on Omesa was carried out in the context of the\nAMiCA_ (IWT SBO-project 120007) project, funded by the government agency for\nInnovation by Science and Technology (IWT).\n", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cmry/omesa", "keywords": null, "license": "GPLv3", "maintainer": null, "maintainer_email": null, "name": "omesa", "package_url": "https://pypi.org/project/omesa/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/omesa/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/cmry/omesa" }, "release_url": "https://pypi.org/project/omesa/0.4.2a0/", "requires_dist": null, "requires_python": null, "summary": "A framework for reproducible machine learning research", "version": "0.4.2a0" }, "last_serial": 2505025, "releases": { "0.2.1a0": [], "0.2.2a0": [], "0.2.6a0": [], "0.2.7a0": [ { "comment_text": "", "digests": { "md5": "98751f284350d2261bd0fbde6f1d018f", "sha256": "6bf37b6c2320a43a753cd112861d90290602060fbd9deca626fe165c3780a992" }, "downloads": -1, "filename": "omesa-0.2.7a0.tar.gz", "has_sig": false, "md5_digest": "98751f284350d2261bd0fbde6f1d018f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18820, "upload_time": "2016-02-29T16:01:09", "url": 
"https://files.pythonhosted.org/packages/6c/37/4511ecc0e2590a32804ac8248182f7a3b3e191d5f27de0e43ff4285e98e9/omesa-0.2.7a0.tar.gz" } ], "0.2.8a0": [ { "comment_text": "", "digests": { "md5": "8054ae2c81e1ce7e533a6ea6d7594a61", "sha256": "183dfd4c002903aac57ccba4cb5fa88fff802e689f5b1805d7eb89e48074fc46" }, "downloads": -1, "filename": "omesa-0.2.8a0.tar.gz", "has_sig": false, "md5_digest": "8054ae2c81e1ce7e533a6ea6d7594a61", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18821, "upload_time": "2016-02-29T16:02:44", "url": "https://files.pythonhosted.org/packages/bf/ab/2bbf374cb89ad136961a8e1e7adb1c913c002730a443e1cbbcf6450040f7/omesa-0.2.8a0.tar.gz" } ], "0.2.9a0": [ { "comment_text": "", "digests": { "md5": "981c864564fc1aa8e26e11d61ee33a32", "sha256": "682453b1312e7d3427b11f0116daa23e4e5a56bba9c1087dead1f2bb322a29d3" }, "downloads": -1, "filename": "omesa-0.2.9a0-py3-none-any.whl", "has_sig": false, "md5_digest": "981c864564fc1aa8e26e11d61ee33a32", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 53442, "upload_time": "2016-02-29T17:05:07", "url": "https://files.pythonhosted.org/packages/19/02/28caecbe1c7cb9db9aaf2d47c565884483eca55e38e49cd7d9ff5b36d2df/omesa-0.2.9a0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "89d5e4f891aa95db525eac9c4849146e", "sha256": "4bc910c07d62a34aa4554f386a9229b7124a1d036984901f343ab37dab93c3c5" }, "downloads": -1, "filename": "omesa-0.2.9a0.tar.gz", "has_sig": false, "md5_digest": "89d5e4f891aa95db525eac9c4849146e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18606, "upload_time": "2016-02-29T17:05:19", "url": "https://files.pythonhosted.org/packages/76/3c/68515dcacffc2a56d0c8790d35c1acd118645975e78df70e2cc3f916690b/omesa-0.2.9a0.tar.gz" } ], "0.3.0a0": [ { "comment_text": "", "digests": { "md5": "bafc471d91e94a0825fc6a31bca6782f", "sha256": 
"9d45831e0f68dec65a73899e7ca28bcd4cf6fa19abb447e45e62384bd3fec754" }, "downloads": -1, "filename": "omesa-0.3.0a0.tar.gz", "has_sig": false, "md5_digest": "bafc471d91e94a0825fc6a31bca6782f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28957, "upload_time": "2016-06-02T12:43:02", "url": "https://files.pythonhosted.org/packages/bf/d7/d411eb85e176a4f44f98a30b1f530ae1c906bf9ddf65e2e45a253a10a3a2/omesa-0.3.0a0.tar.gz" } ], "0.3.1a0": [ { "comment_text": "", "digests": { "md5": "40ffdc716a47a7ce0f54b0017ee4b0d4", "sha256": "296e65e696158b0ae6446fec3883c00dc86f05f710771767eff246aaca109c51" }, "downloads": -1, "filename": "omesa-0.3.1a0.tar.gz", "has_sig": false, "md5_digest": "40ffdc716a47a7ce0f54b0017ee4b0d4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28949, "upload_time": "2016-06-02T21:06:32", "url": "https://files.pythonhosted.org/packages/be/c2/45b5643f5292f8a3a2a01d772afde2624f45c4ca53f7b557427673d81934/omesa-0.3.1a0.tar.gz" } ], "0.3.2a0": [ { "comment_text": "", "digests": { "md5": "d2b02d5cfa020b1b3739da418fbccd8c", "sha256": "8725a6e7a4e3d3d5b646422c47232a2c8abc6d932e10d7094791f31be7218755" }, "downloads": -1, "filename": "omesa-0.3.2a0.tar.gz", "has_sig": false, "md5_digest": "d2b02d5cfa020b1b3739da418fbccd8c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29631, "upload_time": "2016-06-14T16:58:06", "url": "https://files.pythonhosted.org/packages/e2/b5/b4440dbcaa6e68fbd1144e867654588e60e865f01621292868f986d06897/omesa-0.3.2a0.tar.gz" } ], "0.4.0a0": [ { "comment_text": "", "digests": { "md5": "e738d0b722ab572a9a47e74931a8c865", "sha256": "b70982c74d3afb7ee429f2863b3f81f511265ff25d7b75c4857143ed898a70d0" }, "downloads": -1, "filename": "omesa-0.4.0a0.tar.gz", "has_sig": false, "md5_digest": "e738d0b722ab572a9a47e74931a8c865", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29751, "upload_time": 
"2016-12-06T17:29:02", "url": "https://files.pythonhosted.org/packages/61/24/eda5dd069750d8ded7c6187348a9a76c055c9b755d0e9c8396d7e5b26a53/omesa-0.4.0a0.tar.gz" } ], "0.4.1a0": [ { "comment_text": "", "digests": { "md5": "d5bede91711260a8275c26bec3540ce3", "sha256": "f07ded6c58b520cff70c2147784576e6e224eb225a80a53f52f30cebe1d9ed41" }, "downloads": -1, "filename": "omesa-0.4.1a0.tar.gz", "has_sig": false, "md5_digest": "d5bede91711260a8275c26bec3540ce3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29776, "upload_time": "2016-12-07T16:21:20", "url": "https://files.pythonhosted.org/packages/8f/d3/3f4f6a6d8ef53e8924e0754343c3dfee702f22ff895c0c75b695899d4b1e/omesa-0.4.1a0.tar.gz" } ], "0.4.2a0": [ { "comment_text": "", "digests": { "md5": "489befab5dd83307cc55a982dd0ad2c8", "sha256": "41e3a016473621a041c20697c789ee6de557f55a629de64961cf2b9cbc3b3bf1" }, "downloads": -1, "filename": "omesa-0.4.2a0.tar.gz", "has_sig": false, "md5_digest": "489befab5dd83307cc55a982dd0ad2c8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29776, "upload_time": "2016-12-07T16:24:31", "url": "https://files.pythonhosted.org/packages/a7/ec/0c64a752a14e5b2e4277d78c5b49a4da6022ae56b70a33e101add66b1bd6/omesa-0.4.2a0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "489befab5dd83307cc55a982dd0ad2c8", "sha256": "41e3a016473621a041c20697c789ee6de557f55a629de64961cf2b9cbc3b3bf1" }, "downloads": -1, "filename": "omesa-0.4.2a0.tar.gz", "has_sig": false, "md5_digest": "489befab5dd83307cc55a982dd0ad2c8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29776, "upload_time": "2016-12-07T16:24:31", "url": "https://files.pythonhosted.org/packages/a7/ec/0c64a752a14e5b2e4277d78c5b49a4da6022ae56b70a33e101add66b1bd6/omesa-0.4.2a0.tar.gz" } ] }