{ "info": { "author": "Omri Mendels", "author_email": "omri.mendels@microsoft.com", "bugtrack_url": null, "classifiers": [], "description": "# Moda\n## Models and evaluation framework for trending topics detection and anomaly detection.\n\nModa provides an interface for evaluating models on either univariate or multi-category time-series datasets. It also lets the user add models using a scikit-learn style API. All models provided in Moda were adapted to a multi-category scenario by wrapping a univariate model to run on multiple categories. Models can be evaluated using either a train/test split or time-series cross-validation.\n\n## Installation\n`pip install moda`\n\n## Usage\n\n### Turning a raw dataset into a moda dataset:\nmoda uses a MultiIndex to hold the datestamp and category. All models have been adapted to accept this structure. The input dataset is assumed to have an entry per row and a datestamp column called 'date'. An additional 'category' column is optional.\nAs a first step, the dataset is aggregated to a fixed-size time interval, and a new dataset with 'date', 'category' (optional) and 'value' columns is created. 
A MultiIndex of 'date' (pandas DatetimeIndex) and 'category' is the dataset's index.\n\n```python\nimport pandas as pd\nfrom moda.dataprep import raw_to_ts, ts_to_range\n\nDATAPATH = \"example/SF_data/SF311-2008.csv\"\n# The full dataset can be downloaded from here: https://data.sfgov.org/City-Infrastructure/311-Cases/vw6y-z8j6/data\nTIME_RANGE = \"24H\" # Aggregate all events in the raw data into 24 hour intervals\n\n# Read raw file\nraw = pd.read_csv(DATAPATH)\n\n# Turn the raw data into a time series (with date as a pandas DatetimeIndex)\nts = raw_to_ts(raw)\n\n# Aggregate items per time and category, given a time interval\nranged_ts = ts_to_range(ts, time_range=TIME_RANGE)\n```\n\n### Run a model:\n\nRun one model and extract metrics using a manually labeled set:\n```python\nfrom moda.evaluators import get_metrics_for_all_categories, get_final_metrics\nfrom moda.dataprep import read_data\nfrom moda.models import STLTrendinessDetector\n\n# Load the labeled dataset (the same file used in the evaluation example below)\ndataset = read_data(\"datasets/SF24H_labeled.csv\")\n\nmodel = STLTrendinessDetector(freq='24H',\n min_value=10,\n anomaly_type='residual',\n num_of_std=3, lo_delta=0)\n\n# Take the entire time series and evaluate anomalies on all of it or just the last window(s)\nprediction = model.predict(dataset)\nraw_metrics = get_metrics_for_all_categories(dataset[['value']], prediction[['prediction']], dataset[['label']],\n window_size_for_metrics=1)\nmetrics = get_final_metrics(raw_metrics)\n\n# Plot results for each category\nmodel.plot(labels=dataset['label'])\n```\n\n### Model evaluation\n\nExample of a train/test split and evaluation:\n```python\nfrom moda.evaluators import get_metrics_for_all_categories, get_final_metrics\nfrom moda.dataprep import read_data\nfrom moda.models import STLTrendinessDetector\n\ndataset = read_data(\"datasets/SF24H_labeled.csv\")\nprint(dataset.head())\n\nmodel = STLTrendinessDetector(freq='24H',\n min_value=10,\n anomaly_type='residual',\n num_of_std=3, lo_delta=0)\n\n# Take the entire time series and evaluate anomalies on all of it or just the last 
window(s)\nprediction = model.predict(dataset)\nraw_metrics = get_metrics_for_all_categories(dataset[['value']], prediction[['prediction']], dataset[['label']],\n window_size_for_metrics=1)\nmetrics = get_final_metrics(raw_metrics)\nprint('f1 = {}'.format(metrics['f1']))\nprint('precision = {}'.format(metrics['precision']))\nprint('recall = {}'.format(metrics['recall']))\n\n# Plot results for each category\n# model.plot(labels=dataset['label'])\n```\n\n## Examples\nA Jupyter notebook with this example can be found [here](example.ipynb).\n\nA more detailed example, which includes an exploratory data analysis, can be found [here](moda/example/EDA.ipynb).\n\n\n## Models currently included:\n1. Moving average based seasonality decomposition (MA adapted for trendiness detection)\n\nA wrapper on statsmodels' seasonal_decompose. It is a naive decomposition that uses a moving average to remove the trend, and a convolution filter to detect seasonality. The result is a time series of residuals. In order to detect anomalies and interesting trends in the time series, we look for outliers on the decomposed trend series and the residuals series. Points are considered outliers if their value exceeds a given number of standard deviations of the historical values in a previous window. We evaluated different policies for trendiness prediction: (1) residual anomaly only, (2) trend anomaly only, (3) residual OR trend anomaly, (4) residual AND trend anomaly.\nThis is the baseline model, which gives decent results when seasonality is more or less constant.\n\n2. Seasonality and trend decomposition using Loess (Adapted STL)\n\nSTL uses iterative Loess smoothing to obtain an estimate of the trend and then Loess smoothing again to extract a changing additive seasonal component. It can handle any type of seasonality, and the seasonality value can change over time. 
We used the same anomaly detection mechanism as the moving-average based seasonal decomposition.\nA wrapper on STLDecompose (https://github.com/jrmontag/STLDecompose).\nUse this model when trend and seasonality have a more complex pattern. It usually outperforms the moving average model.\n\nExample output plot for STL:\n![STL](https://github.com/omri374/moda/raw/master/figs/STL_example.png)\nThe left hand side shows the original (top) and decomposed time series (seasonal, trend, residual).\nThe right hand side shows anomalies found on the residuals time series (top), trend, prediction (combination of residuals and trend anomalies), and ground truth (bottom).\n\n3. Azure anomaly detector\n\nUses the Azure Anomaly Detector cognitive service as a black box for detecting anomalies. The service provides an upper bound that can be used to estimate the degree of anomaly. This model is useful when the anomalies have a relatively complex structure.\n\n4. Twitter\n\nA wrapper on Twitter's AnomalyDetection package (https://github.com/Marcnuth/AnomalyDetection).\nThis model is similar to (1) and (2), but has a more sophisticated way of detecting the anomalies once the time series is analyzed.\n\n5. LSTMs\n\nTrains a forecasting LSTM model and compares the predicted value at time t with the actual value at time t. The difference is then evaluated against the standard deviation of previous differences. This is useful only when there is enough data to represent the time series pattern.\n\nAn example of running LSTMs can be found [here](moda/example/lstm/LSTM_AD.ipynb).\n\n\n## Running tests and linting\nModa uses pytest for testing. To run the tests, call `pytest` from moda's main directory. 
For linting, this module uses PEP8 conventions.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://www.github.com/omri374/moda", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "moda", "package_url": "https://pypi.org/project/moda/", "platform": "", "project_url": "https://pypi.org/project/moda/", "project_urls": { "Homepage": "https://www.github.com/omri374/moda" }, "release_url": "https://pypi.org/project/moda/0.2.2/", "requires_dist": null, "requires_python": "", "summary": "Tools for analyzing trending topics", "version": "0.2.2" }, "last_serial": 4439849, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "bfe4b5929b01fcaca583a9bd9dc7ab5a", "sha256": "ba5b9a9404eb434e2881167db55d95286ef6fb864cde4f9bdb49c5313e61ee02" }, "downloads": -1, "filename": "moda-0.0.2-py3.6.egg", "has_sig": false, "md5_digest": "bfe4b5929b01fcaca583a9bd9dc7ab5a", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 56258, "upload_time": "2018-09-26T13:44:26", "url": "https://files.pythonhosted.org/packages/5a/45/a06c520dd483f43811f617e2b3b04758b939aee40b8318d63aba4a4090f4/moda-0.0.2-py3.6.egg" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "89e092fd772d38a82a54db6bf2d708cd", "sha256": "fc805f7e1bf837a232dd2d5909bcf2995099f275dabb69c790b59925365699f2" }, "downloads": -1, "filename": "moda-0.0.3-py3.6.egg", "has_sig": false, "md5_digest": "89e092fd772d38a82a54db6bf2d708cd", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 56256, "upload_time": "2018-09-26T13:44:27", "url": "https://files.pythonhosted.org/packages/2f/0f/027c70f2baa37537eeab221ac4b9f19973f2ff83cc6c5129b6fbd2d3f5b7/moda-0.0.3-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "d3148702e346f7454aabec2da1a615e1", "sha256": 
"65cef9f9275e71bdbf1640f5fee6aee59c8135b83950094cf31ae632a6ed3bad" }, "downloads": -1, "filename": "moda-0.0.3.tar.gz", "has_sig": false, "md5_digest": "d3148702e346f7454aabec2da1a615e1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17831, "upload_time": "2018-09-26T13:44:29", "url": "https://files.pythonhosted.org/packages/71/18/415f52b978d6f41de392213a37ddf12c665db5dae859b895b2af27bf0ceb/moda-0.0.3.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "9c7b5af6f2e4f4dfa310b0e3d9b2ec20", "sha256": "8028e067d5ca82bb9b1fc9c8e8a44d6b7ec31fc8b4520f5ae70e471663360da1" }, "downloads": -1, "filename": "moda-0.0.6-py3.6.egg", "has_sig": false, "md5_digest": "9c7b5af6f2e4f4dfa310b0e3d9b2ec20", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 56286, "upload_time": "2018-10-04T10:05:36", "url": "https://files.pythonhosted.org/packages/63/c7/19197a59443c13274448d70707d635f584582285964e0bbe7f01820e5613/moda-0.0.6-py3.6.egg" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "dc0520eb57ed6a73f1aa85ec72f839ea", "sha256": "022a181ab645d1dcb8dc6b6e7189a53f4d20724a45cfc4c9cf612134af05b417" }, "downloads": -1, "filename": "moda-0.0.7-py3.6.egg", "has_sig": false, "md5_digest": "dc0520eb57ed6a73f1aa85ec72f839ea", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 68371, "upload_time": "2018-10-04T12:04:41", "url": "https://files.pythonhosted.org/packages/67/ff/9c88b9a30396ce109d34bec2d63a12e5c2372ed89930f21c4aa08faf7d41/moda-0.0.7-py3.6.egg" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "402e6c4445195c2d7e6d797e71a34814", "sha256": "4e3dcabc341879954b5097b814b0f8d6fc9769e441e2603abe53ce453f002e65" }, "downloads": -1, "filename": "moda-0.1.0-py3.6.egg", "has_sig": false, "md5_digest": "402e6c4445195c2d7e6d797e71a34814", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 67101, "upload_time": 
"2018-10-07T12:34:07", "url": "https://files.pythonhosted.org/packages/0c/80/d1e52024e4302575ee3167efffcbc6e760bec7396d119f237f4ba97f5bac/moda-0.1.0-py3.6.egg" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "1f158d0c5794991be85bf5396c5e5bed", "sha256": "1424a973f0d75c49c6be2a6782ed25f8ca6fc02c4b334191b7b7de14f8749f0b" }, "downloads": -1, "filename": "moda-0.2.0-py3.6.egg", "has_sig": false, "md5_digest": "1f158d0c5794991be85bf5396c5e5bed", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 32836, "upload_time": "2018-10-31T12:33:51", "url": "https://files.pythonhosted.org/packages/df/a6/1b72ccfdbd59fee79ee7c585e4d89e64c6477ff44a22e7df3ea8ff6ed79e/moda-0.2.0-py3.6.egg" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "84f837b4dd2602b0cddd0f1e81d84dbb", "sha256": "400070d2c5f3c668663280056f220ea0a93784dee15351e5fac8770fbac02148" }, "downloads": -1, "filename": "moda-0.2.1-py3.6.egg", "has_sig": false, "md5_digest": "84f837b4dd2602b0cddd0f1e81d84dbb", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 34892, "upload_time": "2018-10-31T14:49:41", "url": "https://files.pythonhosted.org/packages/d0/71/83445936e7a775b035d101c6dce28e8c3364d5f23f464af5145ea8dcc2ee/moda-0.2.1-py3.6.egg" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "c25e869b4a57b1a768dc1bc0a42c5f00", "sha256": "d13931d7886a73a60969a2196501e5e86b36990830ebfacf960fb99137f8623d" }, "downloads": -1, "filename": "moda-0.2.2-py3.6.egg", "has_sig": false, "md5_digest": "c25e869b4a57b1a768dc1bc0a42c5f00", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 35575, "upload_time": "2018-11-01T08:29:11", "url": "https://files.pythonhosted.org/packages/df/9e/d8aeef32df12c6b81ea84a18ec6e821e82770d52a6f45bf7fc94a0b9a197/moda-0.2.2-py3.6.egg" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c25e869b4a57b1a768dc1bc0a42c5f00", "sha256": 
"d13931d7886a73a60969a2196501e5e86b36990830ebfacf960fb99137f8623d" }, "downloads": -1, "filename": "moda-0.2.2-py3.6.egg", "has_sig": false, "md5_digest": "c25e869b4a57b1a768dc1bc0a42c5f00", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 35575, "upload_time": "2018-11-01T08:29:11", "url": "https://files.pythonhosted.org/packages/df/9e/d8aeef32df12c6b81ea84a18ec6e821e82770d52a6f45bf7fc94a0b9a197/moda-0.2.2-py3.6.egg" } ] }