{ "info": { "author": "Noufal Ibrahim", "author_email": "noufal@nibrahim.net.in", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License (GPL)", "Operating System :: MacOS :: MacOS X", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Programming Language :: Python", "Topic :: Utilities" ], "description": "Lines\r\n=====\r\n\r\nIntroduction\r\n------------\r\n\r\nLines is a simple program that allows you to manipulate lines of a\r\nfile as if they were members of a set. It also provides a few other\r\nuseful functions to analyse such files.\r\n\r\nSet operations\r\n--------------\r\nGiven two files \r\n\r\n`file1` containing\r\n\r\n a\r\n b\r\n c\r\n d\r\n\r\nand `file2` with\r\n \r\n c\r\n d\r\n e\r\n f\r\n\r\nIt's possible to do things like\r\n\r\n* *Unions*\r\n\r\n`lines -u file1 file2`\r\n\r\ngives\r\n\r\n a\r\n b\r\n c\r\n d\r\n e\r\n f\r\n\r\n* *Intersections*\r\n\r\n`lines -i file1 file2`\r\n\r\ngives\r\n \r\n c\r\n d\r\n \r\n* *Difference* (All elements in `file1` that are not in `file2`).\r\n\r\n`lines -d file1 file2`\r\n\r\ngives\r\n\r\n a\r\n b\r\n\r\n* *Symmetric difference* (All elements present in only one of the\r\n sets). \r\n\r\n`lines -s file1 file2`\r\n\r\ngives\r\n\r\n a\r\n b\r\n e\r\n f\r\n \r\n \r\nOther Operations\r\n----------------\r\n\r\nThese are a few other operations which I've found useful.\r\n\r\n* *Squeeze blanks*\r\n\r\nThis operation squeezes out the blank lines in a file. \r\n\r\nSo, if you run\r\n`lines --squeeze file1` where `file1` looks like this\r\n\r\n\r\n a\r\n b\r\n c\r\n \r\n d\r\n\r\n f\r\n\r\nYou'd get\r\n\r\n a\r\n b\r\n c\r\n d\r\n f\r\n \r\n* *Identify Patterns*\r\n\r\nThis partitions the elements of the set into subsets all of whose\r\nmembers have an upper bound on the Levenshtein distance from each\r\nother. 
This is useful to identify patterns in the input file.\r\n\r\nSo, if I have a file `examples/f6` that looks like this\r\n\r\n Archive.001-of-020.part\r\n Archive.002-of-020.part\r\n Archive.003-of-020.part\r\n Archive.004-of-020.part\r\n Archive.005-of-020.part\r\n Archive.006-of-020.part\r\n Archive.007-of-020.part\r\n .Archive.008-of-020.part.zbnrw\r\n Archive.009-of-020.part\r\n Archive.010-of-020.part\r\n Archive.011-of-020.part\r\n Archive.012-of-020.part\r\n Archive.013-of-020.part\r\n Archive.014-of-020.part\r\n Archive.015-of-020.part\r\n Archive.016-of-020.part\r\n Archive.017-of-020.part\r\n Archive.018-of-020.part\r\n Archive.019-of-020.part\r\n Archive.020-of-020.part\r\n\r\nI can run `python lines.py --patterns -l 5 examples/f6` and get\r\n\r\n\r\n 19 elements\r\n 1 elements - {'.Archive.008-of-020.part.zbnrw'}\r\n\r\nThe `-l 5` sets the upper bound on the Levenshtein distance\r\nto 5. The `-p` option allows us to specify an \"outlier percentage\". If\r\nthe number of elements in a subset is below this, it will print all\r\nthe elements of the subset. 
This is useful to see the items that don't\r\nmatch the general pattern in the file.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/nibrahim/lines", "keywords": "", "license": "GPL", "maintainer": "", "maintainer_email": "", "name": "lines", "package_url": "https://pypi.org/project/lines/", "platform": "linux,osx,win32", "project_url": "https://pypi.org/project/lines/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/nibrahim/lines" }, "release_url": "https://pypi.org/project/lines/0.1.0-a/", "requires_dist": null, "requires_python": null, "summary": "Treat lines of a file as elements of a set", "version": "0.1.0-a" }, "last_serial": 883995, "releases": { "0.1.0-a": [ { "comment_text": "", "digests": { "md5": "492ab9ed432c975b13d17eec9ec12879", "sha256": "ec0ed344fcac1dc955e45f3a3cf40b88aa11187a85012b71aa41de0dbfbfe2dd" }, "downloads": -1, "filename": "lines-0.1.0-a.tar.gz", "has_sig": false, "md5_digest": "492ab9ed432c975b13d17eec9ec12879", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2643, "upload_time": "2013-10-08T10:12:19", "url": "https://files.pythonhosted.org/packages/5f/e2/75c52414c13501b2b7886fcef7b2b3d5054ae141a1a00765b5d9a5f3d9fe/lines-0.1.0-a.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "492ab9ed432c975b13d17eec9ec12879", "sha256": "ec0ed344fcac1dc955e45f3a3cf40b88aa11187a85012b71aa41de0dbfbfe2dd" }, "downloads": -1, "filename": "lines-0.1.0-a.tar.gz", "has_sig": false, "md5_digest": "492ab9ed432c975b13d17eec9ec12879", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2643, "upload_time": "2013-10-08T10:12:19", "url": "https://files.pythonhosted.org/packages/5f/e2/75c52414c13501b2b7886fcef7b2b3d5054ae141a1a00765b5d9a5f3d9fe/lines-0.1.0-a.tar.gz" } ] }