{ "info": { "author": "Roland Puntaier", "author_email": "roland.puntaier@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Intended Audience :: End Users/Desktop", "Intended Audience :: Information Technology", "Intended Audience :: System Administrators", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Operating System :: POSIX", "Programming Language :: Python :: 3.6", "Topic :: System :: Archiving", "Topic :: System :: Systems Administration", "Topic :: Utilities" ], "description": "================================\nremdups - remove duplicate files\n================================\n\n:Author: Roland Puntaier\n:Homepage: https://github.com/rpuntaie/remdups\n:License: See LICENSE file\n\nremdups\n=======\n\n``remdups`` generates a script to\n\n- remove duplicate files\n\n- copy files from another directory to this one, ignoring duplicates\n\n- move files from another directory to this one, ignoring duplicates\n\nThe resulting script should be further inspected before executing it in your shell.\n\nUsage\n=====\n\n0) Optional. You can choose one or more source+hashing methods, via e.g.::\n\n cat > .remdups_c.sha512\n cat > .remdups_e.md5\n\n All of .remdups_{c,b,d,e,n}.{sha512, sha384, sha256, sha224, sha1, md5} \n contribute to the final hash. If you don't make such a file, the default is::\n\n .remdups_c.sha256\n\n {'c': 'content', 'b': 'block', 'd': 'date', 'e': 'exif', 'n': 'name'}\n\n1. Create the hash file by either of (can take a long time)::\n\n remdups\n remdups update\n remdups update \n\n The hashes are added to all .remdups_x.y. To rehash all files::\n\n rm .remdups_*\n\n2. Make a script with rm, mv, cp commands.\n It can be repeated with different options until the script is good.\n\n $remdups rm -s script.sh\n $remdups cp -s script.bat #if you used \n $remdups mv -s script.py #if you used \n\n If the file ends in .sh, cp is used and the file names are in linux format.\n This is usable also on Windows with MSYS, MSYS2 and CYGWIN.\n\n If the file ends in .bat, Windows commands are used.\n\n If the file ends in .py, python functions are used.\n\n3. Inspect the script and go back to 2., if necessary.\n Changes to the script can also be done with the editor.\n\n4. execute script\n\n $./script.sh\n\n\nAlternatively you can use remdups from your own python script, or interactively from a python prompt.\n\nInstall\n=======\n\n\n- Directly from PyPi:\n\n.. code:: console\n\n $ pip install remdups\n\n\n- From `github`_: Clone, change to the directory and do\n\n.. code:: console\n\n $ python setup.py install\n\n- If you plan to extend this tool\n\n - fork on `github`_\n - clone from your fork to you PC\n - do an editable install\n\n .. code:: console\n\n $ pip install -e .\n\n - test after change and bring coverage to 100%\n\n .. code:: console\n\n $ py.test --cov remdups.py --cov-report term-missing\n\n - consider sharing changes useful for others (`github`_ pull request).\n\n.. hint:: \n\n For more advanced file selection ``find`` can be used.\n The following example ignores directory ``old`` and produces a hash for all JPEG files:\n\n .. code:: console\n\n $ find . -path \"old\" -prune -or -not -type d -and -iname \"\\*.jpg\" -exec sha256sum {} \\; > .remdups_c.sha256\n\nCommand Line\n============\n\nThe following is in addition to the help given with::\n\n remdups --help\n\nThe sources for the hashes can be::\n\n {'c': 'content', 'b': 'block', 'd': 'date', 'e': 'exif', 'n': 'name'}\n\nDon't include ``n``, because same files with different names cannot be found. ``c`` is the best.\n\nDo e.g::\n\n cat > .remdups_b.sha512\n cat > .remdups_c.sha256\n\nFill the hash files from the current directory::\n\n remdups update\n\nOr fill the hash files from another directory::\n\n remdups update \n\nIn the latter case the paths in the hash files will have a ``//`` or ``\\\\``\nto mark the start for the new relatives paths in a subsequent ``mv`` or ``cp`` command.\n\nOnce the hash files are filled create the script. It depend on the extension used::\n\n remdups -s script.sh \n remdups -s script.bat \n remdups -s script.py \n\n``command`` can be ``rm``, ``cp``, ``mv``.\nThere is also ``dupsof`` and ``dupsoftail``, but they don't take a ``--script``, but print the output.\n\n``--keep-in``, ``--keep-out`` and ``--comment-out`` will remove different files of a duplicate group.\n``--safe`` will do a byte-wise comparison, before creating the script. That takes longer.\n\n``cp`` and ``mv`` also take ``--sort``: In this case the tree is not recreated, but the files are sorted\nto the provided tree structure using the file modification date. See https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior.\n\nAPI\n===\n\nWith your own python script you can load the file for hashing and use the\nloaded content immediately to create the new file, if not duplicate.\n\n.. code:: python\n\n from remdups import *\n hasher = Hasher()\n allduplicates = []\n for filename,duplicates,content in hasher.foreachcontent('.'):\n if duplicates:\n allduplicates.append(f)\n else:\n assert content!=[] #some .remdups_ must be with (c)ontent\n nfilename = 'afilehere'\n with open(nfilename,'wb') as nf:\n for buf in content:\n nf.write(buf)\n shutil.copystat(filename, nfilename)\n\n``foreachcontent()`` uses ``scandir()``, but does not add duplicate files to the ``.remdup_`` files.\n\n.. code:: python\n\n for f in hasher.scandir(otherdir,filter=['*.jpg'],exclude=['**/old/*']):\n duplicates = hasher.duplicates(f)\n yield (f,duplicates,kw['content'])\n if duplicates:\n hasher.clear(f)\n else:\n hasher.update_hashfiles()\n\nIf you don't want to keep the content, don't provide a ``[]`` for ``content`` in ``scandir``.\n``scandir()`` will hash all files not yet in the ``.remdup_`` files and will return the file name.\n\nThis code resorts a tree by hashing and creating a copy, if not duplicate.\n\n.. code:: python\n\n import os\n import remdups\n os.chdir('dir/to/resort/to')\n with open('.remdups_c.sha256','w'): pass\n remdups.resort('../some/dir/here',\"%y%m/%d_%H%M%S\")\n\n\n.. _`github`: https://github.com/rpuntaie/remdups\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/rpuntaie/remdups", "keywords": "Duplicate, File", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "remdups", "package_url": "https://pypi.org/project/remdups/", "platform": "", "project_url": "https://pypi.org/project/remdups/", "project_urls": { "Homepage": "https://github.com/rpuntaie/remdups" }, "release_url": "https://pypi.org/project/remdups/1.3.1/", "requires_dist": [ "pillow", "pillow; extra == 'develop'", "piexif; extra == 'develop'", "pytest-toolbox; extra == 'develop'", "pytest-coverage; extra == 'develop'" ], "requires_python": "", "summary": "remdups - remove duplicate files", "version": "1.3.1" }, "last_serial": 3558588, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "8e02e5fd7a4d41c3130ec9ec5d340a18", "sha256": "acf2fd657b7fd4a254789e92d66783c029794262c238564e2c48821f447109f0" }, "downloads": -1, "filename": "remdups-1.0.tar.gz", "has_sig": false, "md5_digest": "8e02e5fd7a4d41c3130ec9ec5d340a18", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3895, "upload_time": "2013-10-08T10:24:20", "url": "https://files.pythonhosted.org/packages/08/be/8912e897c99280538d1ed73510016525eedef326f5e11b2062278ed9306c/remdups-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "31cf96c5c385e142c36c08ec0fae6797", "sha256": "29fcd20f2f903a734d51411fe1f775229eb4db62edc154dc336bc794c2dfa298" }, "downloads": -1, "filename": "remdups-1.1.tar.gz", "has_sig": false, "md5_digest": "31cf96c5c385e142c36c08ec0fae6797", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4070, "upload_time": "2013-10-31T21:53:51", "url": "https://files.pythonhosted.org/packages/29/89/d2836b4264c37e4d73471fdcafdbda6a2f8363293c1452515d9f34b7b363/remdups-1.1.tar.gz" } ], "1.2": [ { "comment_text": "", "digests": { "md5": "9c53d5d3778e3c78a4ac40e25e3490d7", "sha256": "1952452038bdc3e1e6fd1febd0cf303809be609292636063e67a4fecea622169" }, "downloads": -1, "filename": "remdups-1.2.tar.gz", "has_sig": false, "md5_digest": "9c53d5d3778e3c78a4ac40e25e3490d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8395, "upload_time": "2016-05-29T19:04:04", "url": "https://files.pythonhosted.org/packages/d4/f1/deadc508057f6e9fe53a73a74bfdd7d7fc312e74f3d5073b7651c68d37ec/remdups-1.2.tar.gz" } ], "1.3": [ { "comment_text": "", "digests": { "md5": "e7eab5ece295ddd056ef6c283a46388f", "sha256": "9082be17fade2de3dab3d07e4eb9ab3f8c001d87891a8173598a9b404f5fe564" }, "downloads": -1, "filename": "remdups-1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "e7eab5ece295ddd056ef6c283a46388f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16197, "upload_time": "2018-01-30T20:56:44", "url": "https://files.pythonhosted.org/packages/62/9e/b469c50482abd8cbc035019f1230d366cd56f84e34093f615870aca22282/remdups-1.3-py3-none-any.whl" } ], "1.3.1": [ { "comment_text": "", "digests": { "md5": "3bcd90bdb68469874d900424efad24f2", "sha256": "26004255b0e51033f47e96bf79e600d73db32fed729ace970ebffa079f7f2c86" }, "downloads": -1, "filename": "remdups-1.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "3bcd90bdb68469874d900424efad24f2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16613, "upload_time": "2018-02-06T22:10:49", "url": "https://files.pythonhosted.org/packages/28/60/ffc7d1388800ae5f50ca18e7699a70dde40eb97545543988e1099355520e/remdups-1.3.1-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3bcd90bdb68469874d900424efad24f2", "sha256": "26004255b0e51033f47e96bf79e600d73db32fed729ace970ebffa079f7f2c86" }, "downloads": -1, "filename": "remdups-1.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "3bcd90bdb68469874d900424efad24f2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16613, "upload_time": "2018-02-06T22:10:49", "url": "https://files.pythonhosted.org/packages/28/60/ffc7d1388800ae5f50ca18e7699a70dde40eb97545543988e1099355520e/remdups-1.3.1-py3-none-any.whl" } ] }