{ "info": { "author": "Ben Hoyt", "author_email": "benhoyt@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Multimedia :: Graphics" ], "description": "dhash\n=====\n\ndhash is a Python library that generates a \"difference hash\" for a given image\n-- a `perceptual hash`_ based on Neal Krawetz's dHash algorithm in `this\n\"Hacker Factor\" blog entry`_.\n\nThe library is `on the Python Package Index (PyPI)`_ and works on both Python\n3 and Python 2.7. To install it, fire up a command prompt, activate your\nvirtual environment if you're using one, and type:\n\n::\n\n pip install dhash\n\nThe algorithm to create a difference hash is very simple:\n\n* Convert the image to grayscale\n* Downsize it to a 9x9 thumbnail (size=8 means an 8+1 by 8+1 image)\n* Produce a 64-bit \"row hash\": a 1 bit means the pixel intensity is increasing\n in the x direction, 0 means it's decreasing\n* Do the same to produce a 64-bit \"column hash\" in the y direction\n* Combine the two values to produce the final 128-bit hash value\n\nThe library defaults to producing a size 8 dhash, but you can override this\neasily by passing ``size=N`` as a keyword argument to most functions. For\nexample, you can produce a more accurate (but slower to work with) dhash of\n512 bits by specifying ``size=16``.\n\nI've found that the dhash is great for detecting near duplicates (at\n`Jetsetter`_ we find dupes using a size 8 dhash with a maximum delta of 2\nbits). But because of the simplicity of the algorithm, it's not great at\nfinding similar images or duplicate-but-cropped images -- you'd need a more\nsophisticated image fingerprint if you want that. However, the dhash is good\nfor finding exact duplicates and near duplicates, for example, the same image\nwith slightly altered lighting, a few pixels of cropping, or very light\nphotoshopping.\n\nTo use the dhash library, you need either the `wand`_ ImageMagick binding or\nthe `Pillow (PIL)`_ library installed. Pick one and stick with it -- they will\nproduce slightly different dhash values due to differences in their grayscale\nconversion and resizing algorithms.\n\nIf you have both libraries installed, dhash will use wand by default. To\noverride this and force use of Pillow/PIL, call ``dhash.force_pil()`` before\nusing the library.\n\nTo produce a dhash value using wand:\n\n.. code:: python\n\n import dhash\n from wand.image import Image\n\n with Image(filename='dhash-test.jpg') as image:\n row, col = dhash.dhash_row_col(image)\n print(dhash.format_hex(row, col))\n\nTo produce a dhash value using Pillow:\n\n.. code:: python\n\n import dhash\n from PIL import Image\n\n image = Image.open('dhash-test.jpg')\n row, col = dhash.dhash_row_col(image)\n print(dhash.format_hex(row, col))\n\nIf you have your own library to convert an image to grayscale and downsize it\nto 9x9 (or 17x17 for size=16), you can pass ``dhash_row_col()`` a list of\ninteger pixel intensities (for example, from 0 to 255). For example:\n\n.. code:: python\n\n >>> import dhash\n >>> row, col = dhash.dhash_row_col([0,0,1,1,1, 0,1,1,3,4, 0,1,6,6,7, 7,7,7,7,9, 8,7,7,8,9], size=4)\n >>> format(row, '016b')\n '0100101111010001'\n >>> format(col, '016b')\n '0101001111111001'\n\nTo produce the hash value as a 128-bit integer directly, use\n``dhash_int(image, size=N)``. To format the hash value in various ways, use\nthe ``format_*`` functions:\n\n.. code:: python\n\n >>> row, col = (13962536140006260880, 9510476289765573406)\n >>> dhash.format_bytes(row, col)\n b'\\xc1\\xc4\\xe4\\xa4\\x84\\xa0\\x80\\x90\\x83\\xfb\\xff\\xcc\\x00@\\x83\\x1e'\n >>> dhash.format_hex(row, col)\n 'c1c4e4a484a0809083fbffcc0040831e'\n\nTo compute the number of bits different (hamming distance) between two\nhashes, you can use the ``get_num_bits_different(hash1, hash2)`` helper\nfunction:\n\n.. code:: python\n\n >>> import dhash\n >>> dhash.get_num_bits_different(0x4bd1, 0x5bd2)\n 3\n\nYou can also use dhash to generate the difference hash for a specific image\nfrom the command line:\n\n::\n\n $ python -m dhash dhash-test.jpg\n c1c4e4a484a0809083fbffcc0040831e\n\n $ python -m dhash --format=decimal dhash-test.jpg\n 13962536140006260880 9510476289765573406\n\n # show the 8x8 row and column grids\n $ python -m dhash --format=matrix dhash-test.jpg\n * * . . . . . * \n * * . . . * . . \n * * * . . * . . \n * . * . . * . . \n * . . . . * . . \n * . * . . . . . \n * . . . . . . . \n * . . * . . . . \n\n * . . . . . * * \n * * * * * . * * \n * * * * * * * * \n * * . . * * . . \n . . . . . . . . \n . * . . . . . . \n * . . . . . * * \n . . . * * * * . \n\n # compute the bit delta between two images\n $ python -m dhash dhash-test.jpg similar.jpg\n 1 bit differs out of 128 (0.8%)\n\nRead the code in `dhash.py`_ for more details \u2013 it's pretty small!\n\ndhash was written by `Ben Hoyt`_ for `Jetsetter`_ and is licensed with a\npermissive MIT license (see `LICENSE.txt`_).\n\n\n.. _perceptual hash: https://en.wikipedia.org/wiki/Perceptual_hashing\n.. _on the Python Package Index (PyPI): https://pypi.python.org/pypi/dhash\n.. _this \"Hacker Factor\" blog entry: http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html\n.. _wand: https://pypi.python.org/pypi/Wand\n.. _Pillow (PIL): https://pypi.python.org/pypi/Pillow\n.. _dhash.py: https://github.com/Jetsetter/dhash/blob/master/dhash.py\n.. _Ben Hoyt: http://benhoyt.com/\n.. _Jetsetter: http://www.jetsetter.com/\n.. _LICENSE.txt: https://github.com/Jetsetter/dhash/blob/master/LICENSE.txt\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Jetsetter/dhash", "keywords": "", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "dhash", "package_url": "https://pypi.org/project/dhash/", "platform": "", "project_url": "https://pypi.org/project/dhash/", "project_urls": { "Homepage": "https://github.com/Jetsetter/dhash" }, "release_url": "https://pypi.org/project/dhash/1.3/", "requires_dist": null, "requires_python": "", "summary": "Calculate difference hash (perceptual hash) for a given image, useful for detecting duplicates", "version": "1.3" }, "last_serial": 3117422, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "a969a1df0a8e0cdadc55bf6b099c0565", "sha256": "1c2904ba77a2fde41a3d9e2cd92b9ed3865d986b86bcf2d84ab16be2c4cedead" }, "downloads": -1, "filename": "dhash-1.0.tar.gz", "has_sig": false, "md5_digest": "a969a1df0a8e0cdadc55bf6b099c0565", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7709, "upload_time": "2017-01-31T18:23:45", "url": "https://files.pythonhosted.org/packages/7e/c0/0ba08a401bab76d26899e7351c7c47f90ac7f0ec42d4509ce13013d030d4/dhash-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "89847af9c50649bccf01e2d0d79aff63", "sha256": "6bb3639db07fdad05a7fc495552a161b9a4eee4f1ba89213fd116e2191d16302" }, "downloads": -1, "filename": "dhash-1.1.tar.gz", "has_sig": false, "md5_digest": "89847af9c50649bccf01e2d0d79aff63", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7740, "upload_time": "2017-01-31T19:19:11", "url": "https://files.pythonhosted.org/packages/59/b1/808d5b8dd92197b6d22617c356482f5c5bd1978f41934bcb708223d6c187/dhash-1.1.tar.gz" } ], "1.2": [ { "comment_text": "", "digests": { "md5": "b836c5b27593954100b4a2634fc3dd8b", "sha256": "5f70b09266474aebc07a946b917878c4e1cdfe8307a69c888365f3ee4f0e336f" }, "downloads": -1, "filename": "dhash-1.2.tar.gz", "has_sig": false, "md5_digest": "b836c5b27593954100b4a2634fc3dd8b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7834, "upload_time": "2017-08-22T13:36:47", "url": "https://files.pythonhosted.org/packages/2b/34/34968c71120acffd40500165ca87c45a468c14532abdbbb7412fe38d0483/dhash-1.2.tar.gz" } ], "1.3": [ { "comment_text": "", "digests": { "md5": "95e1fb39b6b11928d32093de6d270f05", "sha256": "e6c8cd09d330f1ac44d3c9735d6b2a637d713dcb6b6091e340f91dda2484acb8" }, "downloads": -1, "filename": "dhash-1.3.tar.gz", "has_sig": false, "md5_digest": "95e1fb39b6b11928d32093de6d270f05", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7860, "upload_time": "2017-08-23T12:48:15", "url": "https://files.pythonhosted.org/packages/3d/98/621b5dffc1d47135377ceeebd173f0daaa3e4c993a4dfd5cc0d63a3ffdbd/dhash-1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "95e1fb39b6b11928d32093de6d270f05", "sha256": "e6c8cd09d330f1ac44d3c9735d6b2a637d713dcb6b6091e340f91dda2484acb8" }, "downloads": -1, "filename": "dhash-1.3.tar.gz", "has_sig": false, "md5_digest": "95e1fb39b6b11928d32093de6d270f05", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7860, "upload_time": "2017-08-23T12:48:15", "url": "https://files.pythonhosted.org/packages/3d/98/621b5dffc1d47135377ceeebd173f0daaa3e4c993a4dfd5cc0d63a3ffdbd/dhash-1.3.tar.gz" } ] }