{ "info": { "author": "Andriy Mylenkyy", "author_email": "mylanium@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "This package designed to find and manage duplications,\nand contains two utilities:\n\n * dupfind - to find duplications\n * dupmanage - to manage found duplications\n\n\nDUPFIND UTILITY:\n================\n\n*dupfind* utility allows you to find duplicated files and directories\nin your file system.\n\nShow how utility find duplicated files:\n=======================================\n\nBy default utility identifies *duplication files* by\nfile content.\n\nFirst of all - create several different files\nin the current directory.\n\n >>> createFile('tfile1.txt', \"A\"*10)\n >>> createFile('tfile2.txt', \"A\"*1025)\n >>> createFile('tfile3.txt', \"A\"*2048)\n\nThen create other files in another directory, one\nof them to be the same as already created ones.\n\n >>> mkd(\"dir1\")\n >>> createFile('tfile1.txt', \"A\"*20, \"dir1\")\n >>> createFile('tfile2.txt', \"A\"*1025, \"dir1\")\n >>> createFile('tfile13.txt', \"A\"*48, \"dir1\")\n\nLook into the directories contents:\n\n >>> ls()\n === list directory ===\n D :: dir1 :: ...\n F :: tfile1.txt :: 10\n F :: tfile2.txt :: 1025\n F :: tfile3.txt :: 2048\n\n >>> ls(\"dir1\")\n === list dir1 directory ===\n F :: tfile1.txt :: 20\n F :: tfile13.txt :: 48\n F :: tfile2.txt :: 1025\n\n\nWe see, that \"tfile2.txt\" is same in both directories,\nwhile \"tfile1.txt\" - has the same name, but differs in size.\nSo utility must identify only \"tfile2.txt\" as a duplication file.\n\nWe force output results with \"-o \"\nargument to *outputf* file, and pass *testdir* as directory\nthat is looking for duplications.\n\n >>> dupfind(\"-o %(o)s %(dir)s\" % {'o':outputf, 'dir': testdir})\n\n\nNow check the results file for duplications.\n \n >>> cat(outputf)\n hash,size,type,ext,name,directory,modification,operation,operation_data\n ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...\n ...,1025,F,txt,tfile2.txt,.../tmp...,...\n\n\n\nShow quick/slow utility mode:\n=============================\n\nAs mentioned above - utility identifies *duplication files*\nby file contents. This mode slows down the system and consumes\na lot of system resources.\n\nHowever, in most cases the file name and size is enough to identify the\nduplication. So in that case you can use *quick* mode\n--quick (-q) option.\n\nSo test the previous files in the quick mode:\n\n >>> dupfind(\"-q -o %(o)s %(dir)s\" % {'o':outputf, 'dir': testdir})\n\n\nNow check the result file for duplications.\n\n >>> cat(outputf)\n hash,size,type,ext,name,directory,modification,operation,operation_data\n ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...\n ...,1025,F,txt,tfile2.txt,.../tmp...,...\n\n\nAs we can see the quick mode identifies duplications correctly.\n\n\nLet's show that there are cases when this mode can lead to mistakes.\nTo do that let's add a file with the same name and size but different content\nand apply utility in both modes:\n\n\n >>> createFile('tfile000.txt', \"First \"*20,)\n >>> createFile('tfile000.txt', \"Second \"*20, \"dir1\")\n\n\nNow check the duplication results using default (not quick mode) ...\n\n\n >>> dupfind(\" -o %(o)s %(dir)s\" % {'o':outputf, 'dir': testdir})\n >>> cat(outputf)\n hash,size,type,ext,name,directory,modification,operation,operation_data\n ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...\n ...,1025,F,txt,tfile2.txt,.../tmp...,...\n\n\nAs we can see *not-quick* mode identifies duplications correctly.\n\nLet's check duplications using the quick mode...\n\n\n >>> dupfind(\" -q -o %(o)s %(dir)s\" % {'o':outputf, 'dir': testdir})\n >>> cat(outputf)\n hash,size,type,ext,name,directory,modification,operation,operation_data\n ...,140,F,txt,tfile000.txt,.../tmp.../dir1,...\n ...,140,F,txt,tfile000.txt,.../tmp...,...\n ...,1025,F,txt,tfile2.txt,.../tmp.../dir1,...\n ...,1025,F,txt,tfile2.txt,.../tmp...,...\n\n\nAs we can see wrong duplications are found using the quick-mode.\n\n\nCleanup the test\n\n >>> cleanTestDir()\n\n\nShow how utility finds duplicated directories:\n==============================================\n\nUtility identifies *duplicated directories* as\ndirectories, all files of which are *duplicated*\nand all inner directories are also *duplicated\ndirectories*.\n\n\nFirst compare 2 directories with the same files.\n------------------------------------------------\n\nCreate directories with the same content.\n\n >>> def mkDir(dpath):\n ... mkd(dpath)\n ... createFile('tfile1.txt', \"A\"*10, dpath)\n ... createFile('tfile2.txt', \"A\"*1025, dpath)\n ... createFile('tfile3.txt', \"A\"*2048, dpath)\n ... \n >>> mkDir(\"dir1\")\n >>> mkDir(\"dir2\")\n\nConfirm that the directories' contents are really identical\n\n >>> ls(\"dir1\")\n === list dir1 directory ===\n F :: tfile1.txt :: 10\n F :: tfile2.txt :: 1025\n F :: tfile3.txt :: 2048\n\n >>> ls(\"dir2\")\n === list dir2 directory ===\n F :: tfile1.txt :: 10\n F :: tfile2.txt :: 1025\n F :: tfile3.txt :: 2048\n\n\nNow run the utility and check the result file:\n\n >>> dupfind(\"-o %(o)s %(dir)s\" % {'o':outputf, 'dir': testdir})\n >>> cat(outputf)\n hash,size,type,ext,name,directory,modification,operation,operation_data\n ...,D,,dir1,...\n ...,D,,dir2,...\n\n \nCompare 2 directories with the same files and dirs.\n---------------------------------------------------\n\nCreate new directories with the same content, but different names in previously\ncreated directories. \n\nSo for directories to be interpreted as duplications - they don't need to have\nthe same name, but the identical content.\n\nAdd 2 identical directories to the previous ones.\n\n >>> def mkDir1(dpath):\n ... mkd(dpath)\n ... createFile('tfile11.txt', \"B\"*4000, dpath)\n ... createFile('tfile12.txt', \"B\"*222, dpath)\n ... \n >>> mkDir1(\"dir1/dir11\")\n >>> mkDir1(\"dir2/dir21\")\n\nNote that we added two directories with same contents,\nbut different names. This should not break duplications.\n\n >>> def mkDir2(dpath):\n ... mkd(dpath)\n ... createFile('tfile21.txt', \"C\"*4096, dpath)\n ... createFile('tfile22.txt', \"C\"*123, dpath)\n ... createFile('tfile23.txt', \"C\"*444, dpath)\n ... createFile('tfile24.txt', \"C\"*555, dpath)\n ... \n >>> mkDir2(\"dir1/dir22\")\n >>> mkDir2(\"dir2/dir22\")\n\n\nConfirm that directories' contents are really identical\n\n >>> ls(\"dir1\")\n === list dir1 directory ===\n D :: dir11 :: -1\n D :: dir22 :: -1\n F :: tfile1.txt :: 10\n F :: tfile2.txt :: 1025\n F :: tfile3.txt :: 2048\n >>> ls(\"dir2\")\n === list dir2 directory ===\n D :: dir21 :: -1\n D :: dir22 :: -1\n F :: tfile1.txt :: 10\n F :: tfile2.txt :: 1025\n F :: tfile3.txt :: 2048\n\nAnd contents for inner directories\n\nFirst subdirectory:\n\n >>> ls(\"dir1/dir11\")\n === list dir1/dir11 directory ===\n F :: tfile11.txt :: 4000\n F :: tfile12.txt :: 222\n >>> ls(\"dir2/dir21\")\n === list dir2/dir21 directory ===\n F :: tfile11.txt :: 4000\n F :: tfile12.txt :: 222\n\nSecond subdirectory:\n\n >>> ls(\"dir1/dir22\")\n === list dir1/dir22 directory ===\n F :: tfile21.txt :: 4096\n F :: tfile22.txt :: 123\n F :: tfile23.txt :: 444\n F :: tfile24.txt :: 555\n >>> ls(\"dir2/dir22\")\n === list dir2/dir22 directory ===\n F :: tfile21.txt :: 4096\n F :: tfile22.txt :: 123\n F :: tfile23.txt :: 444\n F :: tfile24.txt :: 555\n\n\nNow test the utility.\n\n >>> dupfind(\"-o %(o)s %(dir)s\" % {'o':outputf, 'dir': testdir})\n\n\nChecks the results file for duplications.\n\n >>> cat(outputf)\n hash,size,type,ext,name,directory,modification,operation,operation_data\n ...,D,,dir1,...\n ...,D,,dir2,...\n\n\nNOTE:\n-----\nInner duplication directories are excluded from the results:\n\n >>> outputres = file(outputf).read()\n >>> \"dir1/dir11\" in outputres\n False\n >>> \"dir1/dir22\" in outputres\n False\n >>> \"dir2/dir21\" in outputres\n False\n >>> \"dir2/dir22\" in outputres\n False\n\n\nUtility accepts more than one argument as directories list:\n===========================================================\n\nUse previous directory structure to prove this:\n\nNow pass to utility \"dir1/dir11\" and \"dir2\" directories:\n\n >>> dupfind(\"-o %(o)s %(dir1-11)s %(dir2)s\" % {\n ... 'o':outputf,\n ... 'dir1-11': os.path.join(testdir,\"dir1/dir11\"),\n ... 'dir2': os.path.join(testdir,\"dir2\"),})\n\n\nNow check the result file for duplications.\n\n >>> cat(outputf)\n hash,size,type,ext,name,directory,modification,operation,operation_data\n ...,D,,dir11,.../tmp.../dir1,...\n ...,D,,dir21,.../tmp.../dir2,...\n\n\n\nDUPMANAGE UTILITY:\n==================\n\n\n*dupmanage* utility allows you to manage duplication files and directories\nof your file system with csv data file.\n\nUtility use csv-formatted data-file to process duplication items.\nData file must contain the following columns:\n\n * type\n * name\n * directory\n * operation\n * operation_data\n\n\nUtility supports 2 types of operations with duplication items:\n\n * deleting (\"D\")\n * symlinking (\"L\") only for UNIX-like systems\n\n*operation_data* is only used for *symlinking* operation and \nmust contain the path to symlinking sorce item.\n\n\n\nShow how utility manages duplications:\n======================================\n\nTo show - use previous directory structure and \nalso add several duplications:\n\nCreate a file in the root directory and the same file\nin another catalog.\n\n >>> createFile('tfile03.txt', \"D\"*100)\n >>> mkd(\"dir3\")\n >>> createFile('tfile03.txt', \"D\"*100, \"dir3\")\n\nLook into directories contents:\n\n >>> ls()\n === list directory ===\n D :: dir1 :: ...\n D :: dir2 :: ...\n D :: dir3 :: ...\n F :: tfile03.txt :: 100\n\n >>> ls(\"dir3\")\n === list dir3 directory ===\n F :: tfile03.txt :: 100\n\n\nWe already know the previous duplications, so now we create\ncsv-formatted data file to manage duplications.\n\n >>> manage_data = \"\"\"type,name,directory,operation,operation_data\n ... F,tfile03.txt,%(testdir)s/dir3,L,%(testdir)s/tfile03.txt\n ... D,dir2,%(testdir)s,D,\n ... \"\"\" % {'testdir': testdir}\n >>> createFile('manage.csv', manage_data)\n\n\n\nNow call the utility and check result directory content:\n\n >>> manage_path = os.path.join(testdir, 'manage.csv')\n >>> dupmanage(\"%s -v\" % manage_path)\n [... \n [...]: Symlink .../tfile03.txt item to .../dir3/tfile03.txt\n [...]: Remove .../dir2 directory \n [...]: Processed 2 items \n\n\nReview directory content:\n\n >>> ls()\n === list directory ===\n D :: dir1 :: ...\n D :: dir3 :: ...\n F :: tfile03.txt :: 100\n\n >>> ls(\"dir3\")\n === list dir3 directory ===\n L :: tfile03.txt :: ...\n\n\n\n\nHISTORY:\n========\n\n1.4.3\n-----\n\n * Comment useless for now output_format option\n for dupfinder utility.\n\n\n1.4.2\n-----\n\n * Refactoring content comparison to use zlib.crc32\n function to calculate file content diges -\n speedup algorythm.\n * Fixed some bugs\n\n\n1.4\n---\n\n * Updated file duplication finding: added file\n comparison by content oportunity. Made this\n variant - default one.\n * Added -q (--quick) option to use quick file\n comparison (by name and size)\n * Added tests for quick/not-quick duplication\n finding\n\n\n1.2\n---\n\n * Added *dupmanage* utility for manage duplications\n * Added tests for *dupmanage* utility\n\n\n1.0\n---\n\n * Tests for *dupfinder* utility added\n\n\n0.8\n---\n\n * Refactoring classes: remove DupFilter,\n move filtering into DupOut class.\n * Force implicitly hiding inner content of\n a duplication directories.\n\n\n0.7\n---\n\n * Refactoring utility into classes\n * Fix bugs with bad files processing\n * Fix bug with size calculation\n\n\n0.5\n---\n\n * Refactoring inner finding algorithm\n * Implemented opportunity to remove from\n the result report inner content from \n duplication directories\n\n\n0.3\n---\n\n * Files finder implemented\n * Output in csv format\n * added filters by size\n\n\n0.1\n---\n\n * Initial release", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://python.org/pypi/dupfinder", "keywords": "file duplication finder manager", "license": "GPL", "maintainer": null, "maintainer_email": null, "name": "dupfinder", "package_url": "https://pypi.org/project/dupfinder/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/dupfinder/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://python.org/pypi/dupfinder" }, "release_url": "https://pypi.org/project/dupfinder/1.4.3/", "requires_dist": null, "requires_python": null, "summary": "Find and manage duplication files on the file system", "version": "1.4.3" }, "last_serial": 147508, "releases": { "1.4.3": [ { "comment_text": "", "digests": { "md5": "5b0697388a8a9913a0ffc56a5e38926a", "sha256": "507cdb501eb7161a91f4b6fdb4ffe72f0878016e71dfa10f975af6a9519e894f" }, "downloads": -1, "filename": "dupfinder-1.4.3-py2.4.egg", "has_sig": false, "md5_digest": "5b0697388a8a9913a0ffc56a5e38926a", "packagetype": "bdist_egg", "python_version": "2.4", "requires_python": null, "size": 23626, "upload_time": "2010-01-20T23:38:53", "url": "https://files.pythonhosted.org/packages/5b/3a/a1c1f17d42048bd5a44555f158bd59a03da596b59248fb9e71f3a36b970b/dupfinder-1.4.3-py2.4.egg" }, { "comment_text": "", "digests": { "md5": "25b2b06a31e74ce1e59a23de682af1ff", "sha256": "3cab159008fa5a8b88da2bc8b1e0ed2761cecdcb9f983c0f3f8b7c771ab3f9af" }, "downloads": -1, "filename": "dupfinder-1.4.3.tar.gz", "has_sig": false, "md5_digest": "25b2b06a31e74ce1e59a23de682af1ff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10538, "upload_time": "2010-01-20T23:38:51", "url": "https://files.pythonhosted.org/packages/54/c1/7c4fa491298a83f98318ef6e429c372797667a89f798fd7cf06748034c85/dupfinder-1.4.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5b0697388a8a9913a0ffc56a5e38926a", "sha256": "507cdb501eb7161a91f4b6fdb4ffe72f0878016e71dfa10f975af6a9519e894f" }, "downloads": -1, "filename": "dupfinder-1.4.3-py2.4.egg", "has_sig": false, "md5_digest": "5b0697388a8a9913a0ffc56a5e38926a", "packagetype": "bdist_egg", "python_version": "2.4", "requires_python": null, "size": 23626, "upload_time": "2010-01-20T23:38:53", "url": "https://files.pythonhosted.org/packages/5b/3a/a1c1f17d42048bd5a44555f158bd59a03da596b59248fb9e71f3a36b970b/dupfinder-1.4.3-py2.4.egg" }, { "comment_text": "", "digests": { "md5": "25b2b06a31e74ce1e59a23de682af1ff", "sha256": "3cab159008fa5a8b88da2bc8b1e0ed2761cecdcb9f983c0f3f8b7c771ab3f9af" }, "downloads": -1, "filename": "dupfinder-1.4.3.tar.gz", "has_sig": false, "md5_digest": "25b2b06a31e74ce1e59a23de682af1ff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10538, "upload_time": "2010-01-20T23:38:51", "url": "https://files.pythonhosted.org/packages/54/c1/7c4fa491298a83f98318ef6e429c372797667a89f798fd7cf06748034c85/dupfinder-1.4.3.tar.gz" } ] }