{ "info": { "author": "Ted Tibbetts", "author_email": "intuited@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Topic :: Office/Business", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Software Development :: Version Control", "Topic :: System :: Archiving", "Topic :: Utilities" ], "description": "``odfit``\n===========\n\nProduces textual information based on the contents of\nOpenDocument Format files and other archives.\n\nThe information produced includes metadata like\nthe names, uncompressed size, and modification times of archive members.\n\nA checksum is calculated, and basic filetype detection performed on each.\n\nFor each member determined to contain UTF-8 text,\nthe lines in the file are output.\n\n``odfit`` is similar in purpose but different in operation,\nand in its level of detail,\nto tools like `odt2txt`_ which output a text-mode\nrendering of the archive as a document.\n\nThis means that it presents a much more complete set of data,\nincluding embedded macros, formatting, and other details.\n\nInformation on binary files is limited to metadata\nand hashes, including a SHA1 checksum.\n\nXML files will be pretty-printed\nso as to make their contents more readily ``diff``-able.\n\n.. 
_odt2txt: http://stosberg.net/odt2txt/\n\n\nSETUP\n-----\n\nThe ``odfit`` script can make do with\nthe standard library's complement of modules.\n\nIt will attempt to make use of the `lxml`_ package\nfor XML pretty-printing,\nbut will fall back to standard library modules if necessary.\n\nSo it should be possible to use ``odfit`` with any Python 2.6 or later,\nwithout requiring installation.\n\nThis means that the executable module file ``odfit.py``\ncan be distributed with source repos with some reliability.\n\nPythons older than 2.6 may also work,\nbut the script has not been tested with them.\n\nIf ``lxml`` is available, XML processing will be faster (> 3x)\nand more robust when dealing with incorrect or incomplete XML.\n\nIn addition to running the script ``odfit.py`` directly,\nthe module can be installed from source\nor via `PyPI`_.\nThis will cause an equivalent script to be installed system-wide,\nenabling the command ``odfit`` on the system path.\n\n.. _lxml: http://pypi.python.org/pypi/lxml\n.. _PyPI: http://pypi.python.org/pypi/odfit\n\n\nUSAGE\n-----\n\n``odfit`` is a command-line tool.\n\nTo get a listing of the members of an archive, run e.g. 
::\n\n $ odfit some_file.odt\n\nFor a typical OpenOffice document, this will produce many lines of data.\n\n``odfit`` is mostly intended for use with version control systems like `git`_,\nin order to identify changes between versions of OpenDocument Format files.\n\nFor example, with git 1.6.1 or later,\nsetting up a repo to use ``odfit`` to generate ``git diff`` listings\ncan be accomplished by\n\n- adding lines to ``.git/config`` or equivalent\n to set the ``textconv`` option for the ``odf`` driver name::\n\n [diff \"odf\"]\n textconv=odfit -D\n\n- associating the various ODF filetypes with the ``odf`` driver name\n by adding lines like the following\n to a ``.gitattributes`` file in the repo working tree\n or in ``.git/info/attributes``::\n\n *.odt diff=odf\n *.ods diff=odf\n *.odp diff=odf\n *.odb diff=odf\n\nThe ``-D`` option to ``odfit`` instructs it\nto omit the timestamp of each archive member.\nOpenOffice seems to reset this timestamp for all members\nwhenever it saves a new version of a document.\nAs a result, the timestamp is not meaningful\nand shouldn't normally be displayed as part of a diff.\nFor this reason, I expect that ``-D`` should normally be passed\nwhen generating a dump for the purposes of diffing.\n\nMore info (based on using `odt2txt`_ instead of ``odfit``)\nis available from the `git wiki`_.\n\n``odfit`` is also suitable for use with other pkzip-formatted archives.\nSee the note about `filetype detection`_ in `BUGS, ISSUES, and WARNINGS`_.\n\n.. _git: http://git-scm.com/\n.. 
_git wiki: https://git.wiki.kernel.org/index.php/GitTips#Instructions_for_Git_1.6.1_or_later\n\n\nOUTPUT FORMAT\n-------------\n\nMetadata Header\n^^^^^^^^^^^^^^^\n\nOutput for each contained file will consist of a series of header lines,\neach prefixed with the member filename and a colon followed by two spaces.\nFor example::\n\n path/to/member/file: date_time: 2010-10-12T21:32:24\n path/to/member/file: file_size: 42\n path/to/member/file: CRC: 4144865272\n\nEach line contains a name/value pair delimited by an additional colon and space.\n\nThe metadata included in the header is that stored in the archive file itself,\nwith the exception of the ``sha1`` element,\nwhich is calculated by ``odfit`` from the archive member's data.\n\nNot all member attributes are dumped.\nThe attributes which are dumped if nonempty for a given archive member are:\n\n- timestamp: ``date_time``\n- ``comment``\n- ``extra``\n- ``file_size``\n- CRC-32 checksum: ``CRC``\n\nThe timestamp will only be output if the ``-D`` option\nhas not been passed on the command line.\n\nAfter these attributes, two more header lines will be output,\ncontaining the member's SHA-1 hash and the detected filetype.\nThe detected filetype will be either\n``binary``, ``utf-8`` (which subsumes ASCII), or ``unknown``.\n\n\nContent Section\n^^^^^^^^^^^^^^^\n\nAfter the header, printable files (those whose filetype is ``utf-8``)\nwill have their content dumped.\n\nThe output for the content section\nis similar to that used for the header section.\n\nRather than name/value pairs, lines of the file are output.\n\nThe delimiter between filename and content is two colons and a space.\n\nFor example::\n\n path/to/member/file:: The first line of the file\n path/to/member/file:: The second line of the file\n path/to/member/file:: The third line of the file\n\nA file is assumed to be printable if it is not detected as binary.\nSee `filetype detection`_ for more information.\n\nXML Files\n^^^^^^^^^\n\nThe content of files 
whose names end in ``.xml``\nwill be passed through an XML tidying routine before being dumped.\n\nThis is intended to make the output more ``diff``-able\nby splitting long lines containing multiple elements.\n\nThe presence of files containing malformed XML\nmay result in errors because of this.\nThis should not be a problem for documents created with OpenOffice.\n\n\nBUGS, ISSUES, and WARNINGS\n--------------------------\n\n- Binary _`filetype detection` uses the same stupid algorithm as ``git diff``:\n scanning for nulls within the first 8000 bytes of the file.\n This works well enough with ASCII or UTF-8 text,\n but will falsely detect files in encodings such as UTF-16 as binary.\n\n- There are no plans to guarantee consistency between versions.\n So a dump created with an older version of ``odfit``\n shouldn't be compared with a dump created with a newer version.\n Even with the same version of ``odfit``, dumps of the same document\n may differ because of different dependencies:\n ``odfit`` will use different XML packages depending on\n what is locally installed, and the pretty-printed output of XML files\n will vary depending on the XML package used.\n The `lxml`_ package will be used if it is locally available.\n\n- The output format is intended only for reading and comparison purposes.\n It is not intended to be a reversible translation of the original,\n even for archives which contain only text files.\n In particular, filenames containing ':' characters\n are not properly escaped.\n There will be additional ambiguities in dumps from files\n containing members with identical filenames.\n\n- I haven't yet checked to see if the order of members\n of an odf file is stable.\n ``odfit`` currently does not sort the members before outputting them:\n member output is done in archive file order.\n This means that comparison of dumps of equivalent documents\n may end up showing significant differences.\n\n- There are many performance optimizations which could be 
put in place,\n particularly if the script were to be reworked as a diff routine.\n Having knowledge of both of the compared documents would allow\n ``odfit`` to, for example, only output XML for files which have changed.\n It's also very unlikely that a SHA-1 comparison\n will detect changes that a CRC-32 comparison does not.\n\n- ``odfit`` is not tested with Python versions other than 2.6.\n This means that there's little guarantee\n that it will work on any particular system,\n since 2.5 and even 2.4 are not uncommon in the field.\n\nLICENSE\n-------\n\n``odfit`` is copyright 2010 by Ted Tibbetts\nand is licensed under the FreeBSD license.\n\nSee the file COPYING for details.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/intuited/odfit", "keywords": "xml,opendocument,odt,ods,odb,odp,openoffice,archive,zip,dump,diff,compare", "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "odfit", "package_url": "https://pypi.org/project/odfit/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/odfit/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/intuited/odfit" }, "release_url": "https://pypi.org/project/odfit/0.6.2/", "requires_dist": null, "requires_python": null, "summary": "Produces textual information based on the contents of OpenDocument Format files and other archives.", "version": "0.6.2" }, "last_serial": 795670, "releases": { "0.6": [ { "comment_text": "", "digests": { "md5": "a1261a6ba1f98e530512359ac63e36b4", "sha256": "6b1e015af7db4afe060aeed82a8d79c3318a3b3b9de56787d7b373c2444cf2f6" }, "downloads": -1, "filename": "odfit-0.6.tar.gz", "has_sig": false, "md5_digest": "a1261a6ba1f98e530512359ac63e36b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10146, "upload_time": "2010-10-17T08:00:30", "url": 
"https://files.pythonhosted.org/packages/f1/d6/4a15c564e77092e7a82f75b45213d84100ec0f9fffa159ac292fde50572a/odfit-0.6.tar.gz" } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "0a3c37cf33d3d256ad7a8b00edee8278", "sha256": "1032dc7059ee04f891a5d6b9abe6dd7adeb460bd6fae7224c02199d6797ef47b" }, "downloads": -1, "filename": "odfit-0.6.1.tar.gz", "has_sig": false, "md5_digest": "0a3c37cf33d3d256ad7a8b00edee8278", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10439, "upload_time": "2010-10-21T06:37:24", "url": "https://files.pythonhosted.org/packages/11/26/f79f9eaa839ada3c96c9abc8914388bab6c08831a4dc28e3322225e40e9b/odfit-0.6.1.tar.gz" } ], "0.6.2": [ { "comment_text": "", "digests": { "md5": "da6b073948839810994f8f9a9a85fb88", "sha256": "7ce03dd2d5c4072c0b6e3fae04ec47f7e651ad70e3abbadaebf0e11f1da13da6" }, "downloads": -1, "filename": "odfit-0.6.2.tar.gz", "has_sig": false, "md5_digest": "da6b073948839810994f8f9a9a85fb88", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10549, "upload_time": "2010-10-27T10:54:43", "url": "https://files.pythonhosted.org/packages/9d/37/da65595b90fc291835d9a57439db88d881940d77e787bdf6cbc94592eeb1/odfit-0.6.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "da6b073948839810994f8f9a9a85fb88", "sha256": "7ce03dd2d5c4072c0b6e3fae04ec47f7e651ad70e3abbadaebf0e11f1da13da6" }, "downloads": -1, "filename": "odfit-0.6.2.tar.gz", "has_sig": false, "md5_digest": "da6b073948839810994f8f9a9a85fb88", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10549, "upload_time": "2010-10-27T10:54:43", "url": "https://files.pythonhosted.org/packages/9d/37/da65595b90fc291835d9a57439db88d881940d77e787bdf6cbc94592eeb1/odfit-0.6.2.tar.gz" } ] }