{ "info": { "author": "Joshua Tauberer", "author_email": "jt@occams.info", "bugtrack_url": null, "classifiers": [], "description": "xml_diff\n========\n\nCompares the text inside two XML documents and marks up the differences with ```` and ```` tags.\n\nThis is the result of about 7 years of trying to get this right and coded simply. I've used code like this in one form or another to compare bill text on GovTrack.us .\n\nThe comparison is completely blind to the structure of the two XML documents. It does a word-by-word comparison on the text content only, and then it goes back into the original documents and wraps changed text in new ```` and ```` wrapper elements.\n\nThe documents are then concatenated to form a new document and the new document is printed on standard output. Or use this as a library and call ``compare`` yourself with two ``lxml.etree.Element`` nodes (the roots of your documents).\n\nThe script is written in Python 3.\n\nExample\n-------\n\nComparing these two documents::\n\n\t\n\t\tHere is some bold text.\n\t\n\nand::\n\n\t\n\t\tHere is some italic content that shows how xml_diff works.\n\t\t\n\nYields::\n\n\t\n\t\t\n\t\t\tHere is some bold text.\n\t\t\n\t\t\n\t\t\tHere is some italic content that shows how xml_diff works.\n\t\t\n\t\n\nOn Ubuntu, get dependencies with::\n\n\tapt-get install python3-lxml libxml2-dev libxslt1-dev\n\nFor really fast comparisons, get Google's Diff Match Patch library , as re-written and sped-up by @leutloff and then turned into a Python extension module by me ::\n\n\tpip3 install diff_match_patch_python\n\nOr if you can't install that for any reason, use the pure-Python library::\n\n\tpip3 install diff-match-patch\n\nThis is also at . xml_diff will use whichever is installed.\n\nFinally, install this module::\n\n\tpip3 install xml_diff\n\nThen call the module from the command line::\n\n\tpython3 -m xml_diff --tags del,ins doc1.xml doc2.xml > changes.xml\n\nOr use the module from Python::\n\n\timport lxml.etree\n\tfrom xml_diff import compare\n\n\tdom1 = lxml.etree.parse(\"doc1.xml\").getroot()\n\tdom2 = lxml.etree.parse(\"doc2.xml\").getroot()\n\tcomparison = compare(dom1, dom2)\n\nThe two DOMs are modified in-place.\n\nOptional Arguments\n------------------\n\nThe ``compare`` function takes other optional keyword arguments:\n\n``merge`` is a boolean (default false) that indicates whether the comparison function should perform a merge. If true, ``dom1`` will contain not just ```` nodes but also ```` nodes and, similarly, ``dom2`` will contain not just ```` nodes but also ```` nodes. Although the two DOMs will now contain the same semantic information about changes, and the same text content, each preserves their original structure --- since the comparison is only over text and not structure. The new ``ins``/``del`` nodes contain content from the other document (including whole subtrees), and so there's no guarantee that the final documents will conform to any particular structural schema after this operation.\n\n``word_separator_regex`` (default ``r\"\\s+|[^\\s\\w]\"``) is a regular expression for how to separate words. The default splits on one or more spaces in a row and single instances of non-word characters.\n\n``differ`` is a function that takes two arguments ``(text1, text2)`` and returns an iterator over difference operations given as tuples of the form ``(operation, text_length)``, where ``operation`` is one of ``\"=\"`` (no change in text), ``\"+\"`` (text inserted into ``text2``), or ``\"-\"`` (text deleted from ``text1``). (See xml_diff/__init__.py's ``default_differ`` function for how the default differ works.)\n\n``tags`` is a two-tuple of tag names to use for deleted and inserted content. The default is ``('del', 'ins')``.\n\n``make_tag_func`` is a function that takes one argument, which is either ``\"ins\"`` or ``\"del\"``, and returns a new ``lxml.etree.Element`` to be inserted into the DOM to wrap changed content. If given, the ``tags`` argument is ignored.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/joshdata/xml_diff", "keywords": "compare diff XML", "license": "CC0 (copyright waived)", "maintainer": null, "maintainer_email": null, "name": "xml_diff", "package_url": "https://pypi.org/project/xml_diff/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/xml_diff/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/joshdata/xml_diff" }, "release_url": "https://pypi.org/project/xml_diff/0.7.0/", "requires_dist": null, "requires_python": null, "summary": "Compares two XML documents by diffing their text, ignoring structure, and wraps changed text in / tags.", "version": "0.7.0" }, "last_serial": 2051751, "releases": { "0.5.0": [ { "comment_text": "", "digests": { "md5": "02f2032b4e982f9c83e0e8ebdb301bc8", "sha256": "a24908770b168591de49a1d9293490749653ba1fd2dfc543a5a1462585957174" }, "downloads": -1, "filename": "xml_diff-0.5.0.tar.gz", "has_sig": false, "md5_digest": "02f2032b4e982f9c83e0e8ebdb301bc8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8492, "upload_time": "2014-06-26T12:22:59", "url": "https://files.pythonhosted.org/packages/6b/2a/b2e9c90bf0796b5353ef6131b74a22217d2dc62c43f8cc4d2a12e571c1c1/xml_diff-0.5.0.tar.gz" } ], "0.6.0": [ { "comment_text": "", "digests": { "md5": "c6074daadf8075c2e07960a07eafe1a0", "sha256": "77ec0433425998b47dc021a1f8f577fbc44170691534d3cb7e3747196ef191c3" }, "downloads": -1, "filename": "xml_diff-0.6.0.tar.gz", "has_sig": false, "md5_digest": "c6074daadf8075c2e07960a07eafe1a0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11026, "upload_time": "2014-08-13T21:04:57", "url": "https://files.pythonhosted.org/packages/b4/8f/2fabe092a447c1530bab0749a3cf49697d3a012cd87a46dd016437dabcdf/xml_diff-0.6.0.tar.gz" } ], "0.7.0": [ { "comment_text": "", "digests": { "md5": "248cc2b59b3c8636cfbd4bbadd0b2657", "sha256": "5555f796c9afe60d479a1d84c90a32a264cd6a5355eab4269b1e5b57b18b34a5" }, "downloads": -1, "filename": "xml_diff-0.7.0.tar.gz", "has_sig": false, "md5_digest": "248cc2b59b3c8636cfbd4bbadd0b2657", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13651, "upload_time": "2016-04-07T16:25:57", "url": "https://files.pythonhosted.org/packages/15/6a/0d6f5a41dcdd342d66a1007a33781f796c0ed512252c602949b71a7ba62b/xml_diff-0.7.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "248cc2b59b3c8636cfbd4bbadd0b2657", "sha256": "5555f796c9afe60d479a1d84c90a32a264cd6a5355eab4269b1e5b57b18b34a5" }, "downloads": -1, "filename": "xml_diff-0.7.0.tar.gz", "has_sig": false, "md5_digest": "248cc2b59b3c8636cfbd4bbadd0b2657", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13651, "upload_time": "2016-04-07T16:25:57", "url": "https://files.pythonhosted.org/packages/15/6a/0d6f5a41dcdd342d66a1007a33781f796c0ed512252c602949b71a7ba62b/xml_diff-0.7.0.tar.gz" } ] }