{ "info": { "author": "Ross Patterson", "author_email": "me@rpatterson.net", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": ".. -*-doctest-*-\n\n=====================\nrpatterson.stripdupes\n=====================\n\nInstallation\n============\n\nAll you need is easy_install_::\n\n $ easy_install rpatterson.stripdupes\n\nUsage\n=====\n\nSee the stripdupes console script's help message.\n\n >>> import subprocess\n >>> popen = subprocess.Popen(\n ... [stripdupes_script, '--help'],\n ... stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n >>> print popen.stdout.read()\n Usage: stripdupes [options]\n Strip duplicated sequences of lines.\n Options:\n -h, --help show this help message and exit\n -m NUM, --min=NUM Minimum length of duplicated sequence. If\n NUM is less than one, use a proportion of the\n total number of lines, otherwise NUM is a\n number of lines. [default: 0.01]\n -p REGEXP, --pattern=REGEXP\n Regular expression pattern used to\n normalize strings in sequences of strings.\n The default matches all whitespace. Use an\n empty string to disable. [default: '\\s+']\n -r STRING, --repl=STRING\n String to replace matches of pattern with\n for normalizing strings in sequences of\n strings. [default: ' ']\n\nWhen given input files whose combined contents include sequences of\nlines longer than the threshold that are duplicated elsewhere in the\ninput files, the output file will be written without those repeated\nsequences.\n\n >>> input = \"\"\"\\\n ... foo\n ... foo\n ... bar\n ... baz\n ... qux\n ... quux\n ... foo\n ... bar\n ... baz\n ... qux\n ... bah\n ... blah1\n ... quux\n ... blah\n ... quux\n ... fin\n ... \"\"\"\n\n >>> import cStringIO\n >>> from rpatterson import stripdupes\n >>> for line in stripdupes.strip(\n ... cStringIO.StringIO(input).readlines()): print line,\n foo\n bar\n baz\n qux\n quux\n bah\n blah1\n blah\n fin\n\n >>> input = \"\"\"\\\n ... blah\n ... quux\n ... bah\n ... foo\n ... foo\\t\n ... bar\n ... baz\n ... qux\n ... quux\n ... foo\n ... bar\n ... baz\n ... qux\n ... fin\n ... fin\n ... fin\n ... null\n ... fin\n ... \"\"\"\n\n >>> for line in stripdupes.strip(\n ... cStringIO.StringIO(input).readlines()): print line,\n blah\n quux\n bah\n foo\n bar\n baz\n qux\n fin\n null\n\nEnsure that odd sequences can be handled.\n\n >>> list(stripdupes.strip([]))\n []\n >>> list(stripdupes.strip(['foo']))\n ['foo']\n\nA duplicated sequence is not stripped if it is 1% or less of the\nlength of the sequence.\n\n >>> seq = range(149)+[0]\n >>> len(seq)\n 150\n >>> seq[0] == seq[149]\n True\n >>> len(list(stripdupes.strip(seq, pattern=None)))\n 150\n\n >>> seq = range(148)+[0]\n >>> len(seq)\n 149\n >>> seq[0] == seq[148]\n True\n >>> len(list(stripdupes.strip(seq, pattern=None)))\n 148\n\n .. _easy_install: http://peak.telecommunity.com/DevCenter/EasyInstall#installing-easy-install\n\n\nChangelog\n=========\n\n0.1 - 2009-05-27\n----------------\n\n* Initial release", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/rpatterson.stripdupes", "keywords": "", "license": "GPL", "maintainer": null, "maintainer_email": null, "name": "rpatterson.stripdupes", "package_url": "https://pypi.org/project/rpatterson.stripdupes/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/rpatterson.stripdupes/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/rpatterson.stripdupes" }, "release_url": "https://pypi.org/project/rpatterson.stripdupes/0.1/", "requires_dist": null, "requires_python": null, "summary": "Strip duplicated sequences of lines from within files", "version": "0.1" }, "last_serial": 799058, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "7aff2d3323800088d519c3bfed82ac76", "sha256": "561004ae1cb2bdd70b1d636890ba7defb7aea64e997eec5162c8b4bd3da1eb62" }, "downloads": -1, "filename": "rpatterson.stripdupes-0.1.tar.gz", "has_sig": false, "md5_digest": "7aff2d3323800088d519c3bfed82ac76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6331, "upload_time": "2009-05-28T02:08:14", "url": "https://files.pythonhosted.org/packages/48/bf/27cd00d8e34e7eeaacd019c2329bb141d3d608ae68a72ffb899f6bddd43d/rpatterson.stripdupes-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "7aff2d3323800088d519c3bfed82ac76", "sha256": "561004ae1cb2bdd70b1d636890ba7defb7aea64e997eec5162c8b4bd3da1eb62" }, "downloads": -1, "filename": "rpatterson.stripdupes-0.1.tar.gz", "has_sig": false, "md5_digest": "7aff2d3323800088d519c3bfed82ac76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6331, "upload_time": "2009-05-28T02:08:14", "url": "https://files.pythonhosted.org/packages/48/bf/27cd00d8e34e7eeaacd019c2329bb141d3d608ae68a72ffb899f6bddd43d/rpatterson.stripdupes-0.1.tar.gz" } ] }