{ "info": { "author": "Nacho Navarro", "author_email": "nachonavarroasv@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3" ], "description": "# Anomaly Detection: Seasonal ESD\n\nNote: All credit goes to Jordan Hochenbaum, Owen S. Vallis and Arun Kejariwal at Twitter, Inc. Any errors in the code are, of course, my mistake. Feel free to fix them.\n\n## Intro\nSeasonal ESD is an anomaly detection algorithm implemented at Twitter (https://arxiv.org/pdf/1704.07706.pdf). What better definition than the one they use in their paper:\n\n> \"we developed two novel statistical techniques\n> for automatically detecting anomalies in cloud infrastructure\n> data. Specifically, the techniques employ statistical learning\n> to detect anomalies in both application, and system metrics.\n> Seasonal decomposition is employed to filter the trend and\n> seasonal components of the time series, followed by the use\n> of robust statistical metrics \u2013 median and median absolute\n> deviation (MAD) \u2013 to accurately detect anomalies, even in\n> the presence of seasonal spikes.\"\n\n## Installation\n\nTo install `sesd`, use pip:\n\n```shell\npip install sesd\n```\n\n\n### Explanation\nThe algorithm uses the Extreme Studentized Deviate (ESD) test to detect the anomalies. The novelty lies not\nin the use of ESD itself, but rather in _what_ it is applied to.\n\nThe problem with the ESD test on its own is that it assumes a normal data distribution, while real-world data can have a multimodal distribution. To circumvent this, STL decomposition is used. Any time series can be decomposed with STL decomposition into a seasonal, trend, and residual component. The key is that the residual has a unimodal distribution that ESD can test. 
\n\nHowever, there is still the problem that extreme, spurious anomalies can corrupt the residual component. To fix it, the paper proposes using the median to represent the \"stable\" trend, instead of the trend found by means of STL decomposition.\n\nFinally, for data sets that have a high percentage of anomalies, the paper proposes using the median and Median Absolute Deviation (MAD) instead of the mean and standard deviation to compute the z-score. The median and MAD are robust statistics, so they give more consistent estimates of the center and spread of a time series with a high percentage of anomalies.\n\n---\n\n## Usage\n\n```python\nimport numpy as np\nimport sesd\nts = np.random.random(100)\n# Introduce artificial anomalies\nts[14] = 9\nts[83] = 10\noutliers_indices = sesd.seasonal_esd(ts, hybrid=True, max_anomalies=2)\nfor idx in outliers_indices:\n    print(\"Anomaly index: {0}, anomaly value: {1}\".format(idx, ts[idx]))\n\n>>> Anomaly index: 83, anomaly value: 10.0\n>>> Anomaly index: 14, anomaly value: 9.0\n```\n\n---\n\n## Documentation\n\n\n* `seasonal_esd(ts, seasonality=None, hybrid=False, max_anomalies=10, alpha=0.05)`: Computes the Seasonal Extreme Studentized Deviate of a time series. The first step is to decompose the time series using STL decomposition (trend, seasonality, residual). 
Then, calculate the Median Absolute Deviation (MAD) if hybrid (otherwise the median) and perform a regular ESD test on the residual, which we calculate as: `R = ts - seasonality - (MAD or median)`.\n\n * Arguments\n\n * `ts`: The time series to compute the SESD.\n * `seasonality`: The `statsmodels` library requires a seasonality to compute the STL decomposition. If none is given, it defaults to 20% of the length of the time series.\n * `hybrid`: If set to true, the median and MAD are used instead of the mean and standard deviation in the ESD test. See Twitter\u2019s research paper for the difference.\n * `max_anomalies`: The number of times Grubbs\u2019 test will be applied to the time series.\n * `alpha`: The significance level.\n\n * Returns\n\n * The indices of the anomalies in the time series.\n\n* `esd(timeseries, max_anomalies=10, alpha=0.05, hybrid=False)`: Computes the Extreme Studentized Deviate of a time series. Grubbs\u2019 test is performed `max_anomalies` times, with the caveat that each time the top value is removed. For more details visit http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm\n\n * Arguments\n\n * `ts`: The time series to compute the ESD.\n * `max_anomalies`: The number of times Grubbs\u2019 test will be applied to the time series.\n * `alpha`: The significance level.\n * `hybrid`: If set to false, the mean and standard deviation will be used to calculate the z-scores in the Grubbs test. 
If set to true, then median and MAD will be used.\n\n * Returns\n\n * The indices of the anomalies in the time series.\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/nachonavarro/seasonal-esd-anomaly-detection", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "sesd", "package_url": "https://pypi.org/project/sesd/", "platform": "", "project_url": "https://pypi.org/project/sesd/", "project_urls": { "Homepage": "https://github.com/nachonavarro/seasonal-esd-anomaly-detection" }, "release_url": "https://pypi.org/project/sesd/0.1.6/", "requires_dist": [ "numpy", "scipy", "statsmodels" ], "requires_python": "", "summary": "Anomaly detection algorithm implemented at Twitter", "version": "0.1.6" }, "last_serial": 5228536, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "ee2b11b687e6688b76bbac5748a67b48", "sha256": "783d081bbb099b4d42fb6aa62aa3ed82bcdf87438dcdc1b40b2a388969aae9e3" }, "downloads": -1, "filename": "sesd-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "ee2b11b687e6688b76bbac5748a67b48", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16176, "upload_time": "2018-12-16T14:00:19", "url": "https://files.pythonhosted.org/packages/dc/d8/9403d9c8240cf86ff266249dd2b9789e5f524191e4491f802e70a6db841c/sesd-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8398b5dae520db6254dfeb4554dcc328", "sha256": "3971eb8cfabfb382138d9c6c1e3821d1ecb7da2ebe42df1c6369c57e464a3102" }, "downloads": -1, "filename": "sesd-0.1.0.tar.gz", "has_sig": false, "md5_digest": "8398b5dae520db6254dfeb4554dcc328", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3647, "upload_time": "2018-12-16T14:00:23", "url": 
"https://files.pythonhosted.org/packages/64/f8/98f1e101e4bc9456c433870b946cf3cfd5b7df612dc7cb370fbc38735cc1/sesd-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "ee0d78fe85b3ce11856eb24a4ab1a55e", "sha256": "19f7cfd3fdc2ed07f9a7fd37ab9c44f0d87cf7b88754606493f509c5976cfdf7" }, "downloads": -1, "filename": "sesd-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "ee0d78fe85b3ce11856eb24a4ab1a55e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16199, "upload_time": "2018-12-16T14:06:43", "url": "https://files.pythonhosted.org/packages/72/3e/423e0504ae4586163c322e824fc4df84ad56b665e96351b6868b9618732c/sesd-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "05745b8d21500aced45a8e43ad9652fb", "sha256": "13c118605f64a848023bbb7398985be175f2292edab79f7ed159d10e068fd20d" }, "downloads": -1, "filename": "sesd-0.1.1.tar.gz", "has_sig": false, "md5_digest": "05745b8d21500aced45a8e43ad9652fb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3699, "upload_time": "2018-12-16T14:06:45", "url": "https://files.pythonhosted.org/packages/0d/53/c51b79f1820333cc4cac93eddef1a0840457a709514608f0eddb9fa99ff1/sesd-0.1.1.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "16fb996258a1b2bb5be2e88f584b140e", "sha256": "a7072e70f021a730fdc05a54dc9339c38489ee07ea367c8f95ca99447f647480" }, "downloads": -1, "filename": "sesd-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "16fb996258a1b2bb5be2e88f584b140e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16201, "upload_time": "2018-12-16T14:32:53", "url": "https://files.pythonhosted.org/packages/d1/92/b88cedbab4f66442de7adfa03c2a83967721b7cdb48ace8ff94ffc5b9649/sesd-0.1.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "151351aef970abad764dbc2499f59037", "sha256": "23b950e06e38aa97b7eb822313375f20602b8f343e7ff4b4ba345b5c883af641" }, 
"downloads": -1, "filename": "sesd-0.1.4.tar.gz", "has_sig": false, "md5_digest": "151351aef970abad764dbc2499f59037", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3709, "upload_time": "2018-12-16T14:32:54", "url": "https://files.pythonhosted.org/packages/69/06/18a3abb21d1928f1f172c83f4fb00d744173e59ecd54fcb23ca0486f9934/sesd-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "27339809de3e1d8a50f88b37d9020884", "sha256": "99134533bf431943a77c3ea2a8422257a2015894a39350786a7ecd60c62d1d96" }, "downloads": -1, "filename": "sesd-0.1.5-py3-none-any.whl", "has_sig": false, "md5_digest": "27339809de3e1d8a50f88b37d9020884", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16710, "upload_time": "2018-12-16T15:08:25", "url": "https://files.pythonhosted.org/packages/53/9b/6bca572a89bd09d8806e2e39c051a9fff18ae7c1c1a8755c3c83f3ee2ca4/sesd-0.1.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "077d2e339d02edc5ff4b66bb57eb526e", "sha256": "3a9330859c4915247a048580eca1085edc04fecfdf035132151d12ba6acf6600" }, "downloads": -1, "filename": "sesd-0.1.5.tar.gz", "has_sig": false, "md5_digest": "077d2e339d02edc5ff4b66bb57eb526e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3957, "upload_time": "2018-12-16T15:08:26", "url": "https://files.pythonhosted.org/packages/ca/0b/bf13e4d492ec2c45e09eaffa6a73c1f3ddab6e848b73794c492598a2ecd7/sesd-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "5b6322e0b6b29ceacf7ae0f7563899fb", "sha256": "8ed483afc8ca1a998fd4dd1a2263d03171b270664da063414f8be2f8075fcde6" }, "downloads": -1, "filename": "sesd-0.1.6-py3-none-any.whl", "has_sig": false, "md5_digest": "5b6322e0b6b29ceacf7ae0f7563899fb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16825, "upload_time": "2019-05-05T13:59:30", "url": 
"https://files.pythonhosted.org/packages/43/57/4ab45bc3fc7f5c08ce756ad37af960ce7932170e3f741b6f065162a6d7e4/sesd-0.1.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0a51478110aa1d3fa98a8458c336749d", "sha256": "41374b618f072d6859f90fa0f6cf4960b5a059fa2fdbc2b35949aa64987dabb3" }, "downloads": -1, "filename": "sesd-0.1.6.tar.gz", "has_sig": false, "md5_digest": "0a51478110aa1d3fa98a8458c336749d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4148, "upload_time": "2019-05-05T13:59:32", "url": "https://files.pythonhosted.org/packages/2b/1b/a6adc3362e23d7ad87d0c4c2380f0921fa71e49f356c38fe966b95d7ffcc/sesd-0.1.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5b6322e0b6b29ceacf7ae0f7563899fb", "sha256": "8ed483afc8ca1a998fd4dd1a2263d03171b270664da063414f8be2f8075fcde6" }, "downloads": -1, "filename": "sesd-0.1.6-py3-none-any.whl", "has_sig": false, "md5_digest": "5b6322e0b6b29ceacf7ae0f7563899fb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16825, "upload_time": "2019-05-05T13:59:30", "url": "https://files.pythonhosted.org/packages/43/57/4ab45bc3fc7f5c08ce756ad37af960ce7932170e3f741b6f065162a6d7e4/sesd-0.1.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0a51478110aa1d3fa98a8458c336749d", "sha256": "41374b618f072d6859f90fa0f6cf4960b5a059fa2fdbc2b35949aa64987dabb3" }, "downloads": -1, "filename": "sesd-0.1.6.tar.gz", "has_sig": false, "md5_digest": "0a51478110aa1d3fa98a8458c336749d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4148, "upload_time": "2019-05-05T13:59:32", "url": "https://files.pythonhosted.org/packages/2b/1b/a6adc3362e23d7ad87d0c4c2380f0921fa71e49f356c38fe966b95d7ffcc/sesd-0.1.6.tar.gz" } ] }