{ "info": { "author": "Grier Forensics", "author_email": "jdgrier@grierforensics.com", "bugtrack_url": null, "classifiers": [], "description": "# OfficeDissector\n\nOfficeDissector is a parser library for static security analysis of Office Open XML (OOXML) documents,\ncreated by Grier Forensics for the Cyber System Assessments Group at MIT's Lincoln Laboratory.\n\nOfficeDissector is the first parser designed specifically for security analysis of OOXML documents. It exposes all internals, including\ndocument properties, parts, content types, relationships, embedded macros and multimedia, comments, and more.\nIt provides full JSON export and a MASTIFF-based plugin architecture. It also includes a test corpus of nearly 600 MB, unit tests with nearly\n100% coverage, smoke tests that run against the entire corpus, and simple, well-factored, fully commented code.\n\n## Install\n\nOfficeDissector requires Python 2.7 and the lxml package.\n\nThe easiest way to install OfficeDissector is to let pip download and install it automatically:\n\n    $ sudo pip install lxml # If you haven't installed lxml already\n    $ sudo pip install officedissector\n\nAlternatively, you can download OfficeDissector from [GitHub](https://github.com/grierforensics/officedissector/) or as a [zip](https://github.com/grierforensics/officedissector/archive/master.zip), and install your local copy using either pip (recommended) or `setup.py`:\n\n    $ sudo pip install /path/to/thisfolder # Recommended, as pip supports uninstall\n    $ sudo python setup.py install # Alternative\n\nFinally, to use OfficeDissector without installing it, download it and set `PYTHONPATH` to the `officedissector` directory:\n\n    $ export PYTHONPATH=/path/to/thisfolder\n\n## Documentation\n\nTo view the OfficeDissector documentation, open this file in a browser:\n\n    doc/html/index.html\n\n## Testing\n\nTo test, first set `PYTHONPATH` or install `officedissector` as described above. 
Then:\n\n    # Unit tests\n    $ cd test/unit_test\n    $ python test_officedissector.py\n\n    # Smoke tests\n    $ cd test\n    $ python smoke_tests.py\n\nThe smoke tests create log files that contain more information about the run.\n\n## MASTIFF Plugins\n\nFor more information about the MASTIFF architecture and sample plugins, see\n`mastiff-plugins/README.txt`.\n\n## Usage\n\nBelow is an ipython session demonstrating usage of OfficeDissector:\n\n    $ ipython\n    In [1]: import officedissector\n    In [2]: doc = officedissector.doc.Document('test/fraunhoferlibrary/Artikel.docx')\n    In [4]: doc.is_macro_enabled\n    Out[4]: False\n\n    In [5]: doc.is_template\n    Out[5]: False\n\n    In [6]: mp = doc.main_part()\n    In [7]: mp.content_type()\n    Out[7]: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'\n\n    In [9]: mp.name\n    Out[9]: '/word/document.xml'\n\n    In [10]: mp.content_type()\n    Out[10]: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'\n\n    # We can read the part's stream of data:\n    In [17]: mp.stream().read(200)\n    Out[17]: '\\r\\n