{ "info": { "author": "Huy Do", "author_email": "huydhn@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Certstream + Analytics\n\n[![Build Status](https://travis-ci.org/huydhn/certstream-analytics.svg?branch=master)](https://travis-ci.org/huydhn/certstream-analytics)\n[![codecov.io](https://codecov.io/gh/huydhn/certstream-analytics/master.svg)](http://codecov.io/gh/huydhn/certstream-analytics?branch=master)\n\n\n# Installation\n\nThe package can be installed from\n[PyPI](https://pypi.org/project/certstream-analytics)\n\n```\npip install certstream-analytics\n```\n\n# Usage\n\n```python\nimport time\n\nfrom certstream_analytics.analysers import WordSegmentation\nfrom certstream_analytics.analysers import IDNADecoder\nfrom certstream_analytics.analysers import HomoglyphsDecoder\n\nfrom certstream_analytics.transformers import CertstreamTransformer\nfrom certstream_analytics.storages import ElasticsearchStorage\nfrom certstream_analytics.stream import CertstreamAnalytics\n\ndone = False\n\n# These analysers will be run in the same order\nanalyser = [\n IDNADecoder(),\n HomoglyphsDecoder(),\n WordSegmentation(),\n]\n\n# The following fields are filtered out and indexed:\n# - String: domain\n# - List: SAN\n# - List: Trust chain\n# - Timestamp: Not before\n# - Timestamp: Not after\n# - Timestamp: Seen\ntransformer = CertstreamTransformer()\n\n# Indexed the data in Elasticsearch\nstorage = ElasticsearchStorage(hosts=['localhost:9200'])\n\nconsumer = CertstreamAnalytics(transformer=transformer,\n storage=storage,\n analyser=analyser)\n# The consumer is run in another thread so this function is non-blocking\nconsumer.start()\n\nwhile not done:\n time.sleep(1)\n\nconsumer.stop()\n```\n\n## IDNA decoder\nThis analyser decode IDNA domain name into Unicode for further processing\ndownstream. Normally, it will be the very first analyser to be run. If\nthe analyser encounters a malform IDNA domain string, it will keep the\ndomain as it is.\n\n```python\nfrom certstream_analytics.analysers import IDNADecoder\n\ndecoder = IDNADecoder()\n\n# Just an example dummy record\nrecord = {\n 'all_domains': [\n 'xn--f1ahbgpekke1h.xn--p1ai',\n ]\n}\n\n# The domain name will now become '\u0443\u043a\u0440\u044d\u043c\u043f\u0443\u0436\u0441\u043a.\u0440\u0444'\nprint(decoder.run(record))\n```\n\n## Homoglyphs decoder\nThere are lots of phishing websites that utilize [homoglyphs](https://en.wikipedia.org/wiki/Homoglyph)\nto lure the victims. Some common examples include 'l' and 'i' or the\nUnicode character RHO '\ud835\udf80' and 'p'. The homoglyphs decoder uses the excellent\n[confusable_homoglyphs](https://github.com/vhf/confusable_homoglyphs) to\ngenerate all potential alternative domain names in ASCII.\n\n```python\nfrom certstream_analytics.analysers import HomoglyphsDecoder\n\n# If the greedy flag is set, all alternative domains will be returned\ndecoder = HomoglyphsDecoder(greed=False)\n\n# Just an example dummy record\nrecord = {\n 'all_domains': [\n # MATHEMATICAL MONOSPACE SMALL P\n '*.\ud835\uddc9aypal.com',\n\n # MATHEMATICAL SAN-SERIF BOLD SMALL RHO\n '*.\ud835\uddc9ay\ud835\udf80al.com',\n ]\n}\n\n# The domain name will now be converted to '*.paypal.com' with the ASCII\n# character p\nprint(decoder.run(record))\n```\n\n## Aho-Corasick\nA domain and its SAN from Certstream will be compared against a list of\nmost popular [domains](https://github.com/opendns/public-domain-lists)\n(from OpenDNS) using Aho-Corasick algorithm. This is a simple check to\nremove some of the most obvious phishing domains, for examples, *www.facebook.com.msg40.site*\nwill match with *facebook* cause *facebook* is in the above list of most\npopular domains (I wonder how long it is going to last).\n\n```python\nfrom certstream_analytics.analysers import AhoCorasickDomainMatching\nfrom certstream_analytics.reporter import FileReporter\n\n# Print the list of matching domains\nreporter = FileReporter('matching-results.txt')\n\nwith open('opendns-top-domains.txt')) as fhandle:\n domains = [line.rstrip() for line in fhandle]\n\n# The list of domains to match against\ndomain_matching_analyser = AhoCorasickDomainMatching(domains)\n\nconsumer = CertstreamAnalytics(transformer=transformer,\n analyser=domain_matching_analyser,\n reporter=reporter)\n\n# Need to think about what to do with the matching result\nconsumer.start()\n\nwhile not done:\n time.sleep(1)\n\nconsumer.stop()\n```\n\n## Word segmentation\nIn order to improve the accuracy of the matching algorithm, we segment\nthe domains into English words using\n[wordsegment](https://github.com/grantjenks/python-wordsegment).\n\n```python\nfrom certstream_analytics.analysers import WordSegmentation\n\nwordsegmentation = WordSegmentation()\n\n# Just an example dummy record\nrecord = {\n 'all_domains': [\n 'login-appleid.apple.com.managesupport.co',\n ]\n}\n\n# The returned output is as follows:\n#\n# {\n# 'analyser': 'WordSegmentation',\n# 'output': {\n# 'login-appleid.apple.com.managesuppport.co': [\n# 'login',\n# 'apple',\n# 'id',\n# 'apple',\n# 'com',\n# 'manage',\n# 'support',\n# 'co'\n# ],\n# },\n#\nprint(decoder.run(record))\n```\n\n## Features generator\nA list of features for each domain will also be generated so that they\ncan be used for classification jobs further downstream. The list\nincludes:\n\n- The number of dot-separated fields in the domain, for example, www.google.com has 3.\n- The overall length of the domain in characters.\n- The length of the longest dot-separate field .\n- The length of the TLD, e.g. .online (6) or .download (8) is longer than .com (3).\n- The randomness level of the domain. [Nostril](https://github.com/casics/nostril)\n package is used to check how many words as returned by the WordSegmentation\n analyser are non-sense.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/huydhn/certstream-analytics", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "certstream-analytics", "package_url": "https://pypi.org/project/certstream-analytics/", "platform": "", "project_url": "https://pypi.org/project/certstream-analytics/", "project_urls": { "Homepage": "https://github.com/huydhn/certstream-analytics" }, "release_url": "https://pypi.org/project/certstream-analytics/0.1.5/", "requires_dist": [ "elasticsearch-dsl", "certstream", "pyahocorasick", "tldextract", "wordsegment", "pyenchant", "idna" ], "requires_python": "", "summary": "certstream + analytics", "version": "0.1.5" }, "last_serial": 4656062, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "643e71a6ecfbc2fa9f97682b51f0b496", "sha256": "de21ed2059eb1a6d4d9aaa878c31dfbb1031d657b0a2389ef299512aa3e57928" }, "downloads": -1, "filename": "certstream_analytics-0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "643e71a6ecfbc2fa9f97682b51f0b496", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 7122, "upload_time": "2018-10-11T06:30:25", "url": "https://files.pythonhosted.org/packages/24/af/4002853585edba4541fd253deadc7cd6876ca2577a4a545f9ec332e80946/certstream_analytics-0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "48c6be3f6102b19fa1d218ddc15c8186", "sha256": "cbe698efe0a5dca671f2d57ff3adf0946695ba1bbddfd440d5a9638f20646379" }, "downloads": -1, "filename": "certstream-analytics-0.1.tar.gz", "has_sig": false, "md5_digest": "48c6be3f6102b19fa1d218ddc15c8186", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4700, "upload_time": "2018-10-11T06:30:26", "url": "https://files.pythonhosted.org/packages/e9/52/d9ff3fb1871c8742ce826966b19c766ba519fa9b5327d860e2bc8a084542/certstream-analytics-0.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "706699a75049084c5c2167b020313e1d", "sha256": "8f3f4025e8babbb0d07ca8f574b4a61a6f769f237546164f772baa8899189f25" }, "downloads": -1, "filename": "certstream_analytics-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "706699a75049084c5c2167b020313e1d", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 9668, "upload_time": "2018-10-12T10:08:47", "url": "https://files.pythonhosted.org/packages/4e/0c/64f91f159322ea1ce7303a9e5f22baf32497d156c28479e611c4efd7a88e/certstream_analytics-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9bb973e2b569990b0918b35d58d661ec", "sha256": "bdebfc3ef0d98769af0fea23befb5bf2b8973a7d1190bb99cb375faeb73ff10b" }, "downloads": -1, "filename": "certstream-analytics-0.1.2.tar.gz", "has_sig": false, "md5_digest": "9bb973e2b569990b0918b35d58d661ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6413, "upload_time": "2018-10-12T10:08:49", "url": "https://files.pythonhosted.org/packages/ad/5f/ecc957891ad36db0dd6debe26dfda9d82618060f4dd810f1eb954b6cb81d/certstream-analytics-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "f50199d1b6bd35971096bd64e7f7397e", "sha256": "19eb04bfd6a5bba11c31fe1ba2cf9a7c872dc680fbd00b808fc9928aa6aa40e7" }, "downloads": -1, "filename": "certstream_analytics-0.1.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "f50199d1b6bd35971096bd64e7f7397e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 10166, "upload_time": "2018-10-15T10:09:10", "url": "https://files.pythonhosted.org/packages/b6/b6/e04070e27dde234a96da797687c931292165d69f9c84a6df8c53de8043a9/certstream_analytics-0.1.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "07203bfa2b530708f0f6dc2ee871763f", "sha256": "d2df158443d0a0e0967a36eca54756b54175398cdf51132c78cead09af3d40a1" }, "downloads": -1, "filename": "certstream-analytics-0.1.3.tar.gz", "has_sig": false, "md5_digest": "07203bfa2b530708f0f6dc2ee871763f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6763, "upload_time": "2018-10-15T10:09:13", "url": "https://files.pythonhosted.org/packages/24/9c/f0357781af125e9a284f21cfb471c15cb003643464343aeb8abadf918879/certstream-analytics-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "8d9e836b49a6cc9c2fc00a863c909407", "sha256": "1dbfd09d7eb20cbb8cb30bef82724a81acf37dab0f9465c9350988e04fb085b3" }, "downloads": -1, "filename": "certstream_analytics-0.1.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "8d9e836b49a6cc9c2fc00a863c909407", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 17646, "upload_time": "2018-12-17T10:36:29", "url": "https://files.pythonhosted.org/packages/28/73/8eedd4af46c4ffbcb86724df7d69865a6731c7dbef9fbbf0189ce287f802/certstream_analytics-0.1.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5c4e8768f34b602c5383ff8a63e35c43", "sha256": "8d8fc40b151c29aefe900040d0c195731a3fc4d258f5b0f5dc6eb09cc34fced2" }, "downloads": -1, "filename": "certstream-analytics-0.1.4.tar.gz", "has_sig": false, "md5_digest": "5c4e8768f34b602c5383ff8a63e35c43", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12974, "upload_time": "2018-12-17T10:36:31", "url": "https://files.pythonhosted.org/packages/5c/73/40594ed65cff95b649df441d6653b2a24e55cf3c719e930b1271f6368d92/certstream-analytics-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "4eeded8a82655a6cfcdaeebfb6bb0e4e", "sha256": "63c54e792485e1364aa0b5eaacfec7ecfc203a9440b632c7a9516f6cf8efa551" }, "downloads": -1, "filename": "certstream_analytics-0.1.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "4eeded8a82655a6cfcdaeebfb6bb0e4e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 17760, "upload_time": "2019-01-03T11:21:46", "url": "https://files.pythonhosted.org/packages/25/c7/81b31f4ec7a9b17098787fe5dbecefe06c21e07f563eadc4e7d8524bd114/certstream_analytics-0.1.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3e5c70c13a987d643cfdc9f835cc5a4b", "sha256": "2dd9473b23838a5b783e44e513173b3dc156abf80ff7d56e05fe7f7a223aa6e4" }, "downloads": -1, "filename": "certstream-analytics-0.1.5.tar.gz", "has_sig": false, "md5_digest": "3e5c70c13a987d643cfdc9f835cc5a4b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13093, "upload_time": "2019-01-03T11:21:48", "url": "https://files.pythonhosted.org/packages/b1/fc/39754bbd0283f1d422cc79d9a45a9dd3293a2896a7b70d945fb86151c2ad/certstream-analytics-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4eeded8a82655a6cfcdaeebfb6bb0e4e", "sha256": "63c54e792485e1364aa0b5eaacfec7ecfc203a9440b632c7a9516f6cf8efa551" }, "downloads": -1, "filename": "certstream_analytics-0.1.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "4eeded8a82655a6cfcdaeebfb6bb0e4e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 17760, "upload_time": "2019-01-03T11:21:46", "url": "https://files.pythonhosted.org/packages/25/c7/81b31f4ec7a9b17098787fe5dbecefe06c21e07f563eadc4e7d8524bd114/certstream_analytics-0.1.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3e5c70c13a987d643cfdc9f835cc5a4b", "sha256": "2dd9473b23838a5b783e44e513173b3dc156abf80ff7d56e05fe7f7a223aa6e4" }, "downloads": -1, "filename": "certstream-analytics-0.1.5.tar.gz", "has_sig": false, "md5_digest": "3e5c70c13a987d643cfdc9f835cc5a4b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13093, "upload_time": "2019-01-03T11:21:48", "url": "https://files.pythonhosted.org/packages/b1/fc/39754bbd0283f1d422cc79d9a45a9dd3293a2896a7b70d945fb86151c2ad/certstream-analytics-0.1.5.tar.gz" } ] }