{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python" ], "description": "# threatrack_iocextract.py\n\nExtracts IOCs (and other patterns) from text.\n\n## How do I use it?\n\n### Install\n\n- from `https://pypi.org` via `pip`:\n\n```\npip install threatrack_iocextract\n```\n\n- via `setup.py`:\n\n```\nsudo python setup.py install\n```\n\n### Usage\n\n```python\nimport threatrack_iocextract\n\ntext = 'hxxp://bad[.]com/ https://evil.com/foobar?a=1 asfdas\\nsadfasf bob at example dot com'\n\n# extract IOCs\niocs1 = threatrack_iocextract.extract(text)\n# iocs1 is dict like `{ioctype1 : [ioc1, ioc2, ...], ioctype2 : [ioc3, ...], ...}`\nprint(iocs1['url'][0]) # prints https://evil.com/foobar?a=1\nprint(iocs1['hostname'][0]) # prints evil.com\n\n# extract defangled IOCs\niocs2 = threatrack_iocextract.extract_all(text)\nprint(iocs2['url'][0]) # prints http://bad.com/\nprint(iocs2['hostname'][0]) # prints bad.com\nprint(iocs2['email'][0]) # prints bob@example.com\n\n```\n\n## How does this work?\n\n### `threatrack_iocextract.py`\n\nThe main python program. The workflow is:\n\n1. `load_patterns()`: Read, expand and configure search patterns (automatically called on import).\n2. `refang(text)`: Turns defanged `text` into \"refanged\" text, e.g. 'hxxp://bad[.]com/' becomes 'http://bad.com/'.\n3. `extract(text)`: Extract IOCs from `text`. Returns a dict like `{ioctype1 : [ioc1, ioc2, ...], ioctype2 : [ioc3, ...], ...}` (see [How do I use it?])\n4. `extract_all(text)`: Like `extract()` but will also extract defanged IOCs.\n\n### `patterns/`\n\nContains the search pattern configuration.\n\n#### `patterns/patterns.csv`\n\nA tab separated list of search patterns.\nThis is loaded by `load_patterns()`.\nThe format is **tab separated**:\n\n```\nioctype\tregexpattern\tregexoptions\n```\n\nIt **must always have 3 columns**.\n\nFor example:\n\n```\nsha1\t\\b[a-f0-9]{40}\\b\ti\n```\n\nwould search for SHA1 hashes between word boundaries.\n\nPossible options are:\n\n- `i` (= re.I = case-insensitive)\n- `s` (= re.S = dot all)\n\nAll other `regexoptions` are ignored. `regexoptions` **must** be set (even if just an empty character)!\n\n**Expansions** allow you to reuse other sources in patterns. There are two types\nof expansions:\n\n- `%%file:name.csv%%`\n- `%%pattern:name%%`\n\n**file:** expansions allow to include file contents in patterns.\n`name.csv` is a file with single column list of patterns. Any pattern containing\nthis expansion will replace `%%file:name.csv%%` with a regex trying to match each\nline of `name.csv`.\n\nFor example, if `name.csv` contains:\n\n```\nfoo\nbar\nshoot\n```\n\nThe pattern `test%%file:name.csv%%?blah' will be expanded to `test(?:foo|bar|shoot)?blah'.\nSee `patterns/{schemes,tlds}.csv` for how this can be useful.\n\n**pattern:** expansions allow to reuse **previous** patterns.\n\nFor example, if `patterns.csv` contains:\n\n```\n.port\t6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]|[0-9]\t.\nipv4\t(?:(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\\.){3}(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})(?:\\:%%pattern:.port%%)?\t.\n```\n\n`ipv4` would be expanded to:\n\n```\n(?:(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\\.){3}(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})(?:\\:(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]|[0-9]))?\n```\n\n**Names starting with `.` (dot) are private.** They are not searched directly\nbut can be used in expansions.\n\n### Gotchas\n\n- `tlds.csv` and `schemes.csv` must be sorted longest patterns first to ensure longest match. Otherwise `.co` would match before `.com`, etc.\n\n\n\n## Are there any issues?\n\nYes. Unfortunately, I can only list the known issues.\n\n### Known issues\n\n#### Unconditional refangle\n\n`extract_all(text)` will refang the text. This could potentially lead to altered\nIOCs, e.g. `I am at home.To do so ...` would be altered to `I am@home.To do so ...`\nand thus lead to the email IOC `am@home.to`.\n\nOther possibilities are that IOCs get altered, e.g. `http://example[.]com/foo[dot]bar/`\nwould refangle to `http://example.com/foo.bar/`.\n\nUnfortunately, it is very hard to fix this. **Suggestions are welcome.**\n\n**Possible mitigation:** Add IOC specific refangs. So, e.g., only when ectracting email IOCs\nthe `s/at/@/` refang gets applied.\n\n## TODO\n\n- Fix IPv6 pattern, it overlaps with MAC addresses\n- Fix Hash and Bitcoin overlap\n- Increase whitelist\n- Fix MAC pattern to not match on fingerprints\n- Fix extract_all() refanging breaks YARA extraction. Need to find a smart solution to extracting defanged IOCs. :(\n- Add IOC sepcific refangs\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/threatrack/threatrack_iocextract", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "threatrack-iocextract", "package_url": "https://pypi.org/project/threatrack-iocextract/", "platform": "", "project_url": "https://pypi.org/project/threatrack-iocextract/", "project_urls": { "Homepage": "https://github.com/threatrack/threatrack_iocextract" }, "release_url": "https://pypi.org/project/threatrack-iocextract/0.0.9/", "requires_dist": null, "requires_python": "", "summary": "IOC extractor", "version": "0.0.9" }, "last_serial": 5938145, "releases": { "0.0.9": [ { "comment_text": "", "digests": { "md5": "2ca3541f4b9554ca49a8a6e5b0319fcc", "sha256": "a49cbcbf56685dcb12b00e1f7bed932922f9c891dbc4759aa29351b23be1d872" }, "downloads": -1, "filename": "threatrack_iocextract-0.0.9-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2ca3541f4b9554ca49a8a6e5b0319fcc", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14637, "upload_time": "2019-10-07T10:49:44", "url": "https://files.pythonhosted.org/packages/c6/67/5f5c753e7265dbd7a33fd86edec74ca33648a7d59671244499ff42a205c7/threatrack_iocextract-0.0.9-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4245710055cfc5fc271cecc15172cb59", "sha256": "8701560fac7d405f1f88826552aaa7be2346e773427d0788e304bb95006319bb" }, "downloads": -1, "filename": "threatrack_iocextract-0.0.9.tar.gz", "has_sig": false, "md5_digest": "4245710055cfc5fc271cecc15172cb59", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14847, "upload_time": "2019-10-07T10:49:46", "url": "https://files.pythonhosted.org/packages/ba/2f/5eb1ed5978c944061e7e0118864df92f7d61f5c0252a512c3deaa934d885/threatrack_iocextract-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2ca3541f4b9554ca49a8a6e5b0319fcc", "sha256": "a49cbcbf56685dcb12b00e1f7bed932922f9c891dbc4759aa29351b23be1d872" }, "downloads": -1, "filename": "threatrack_iocextract-0.0.9-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "2ca3541f4b9554ca49a8a6e5b0319fcc", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14637, "upload_time": "2019-10-07T10:49:44", "url": "https://files.pythonhosted.org/packages/c6/67/5f5c753e7265dbd7a33fd86edec74ca33648a7d59671244499ff42a205c7/threatrack_iocextract-0.0.9-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4245710055cfc5fc271cecc15172cb59", "sha256": "8701560fac7d405f1f88826552aaa7be2346e773427d0788e304bb95006319bb" }, "downloads": -1, "filename": "threatrack_iocextract-0.0.9.tar.gz", "has_sig": false, "md5_digest": "4245710055cfc5fc271cecc15172cb59", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14847, "upload_time": "2019-10-07T10:49:46", "url": "https://files.pythonhosted.org/packages/ba/2f/5eb1ed5978c944061e7e0118864df92f7d61f5c0252a512c3deaa934d885/threatrack_iocextract-0.0.9.tar.gz" } ] }