{
"info": {
"author": "Robin H.",
"author_email": "twizzle@gmx.net",
"bugtrack_url": null,
"classifiers": [],
"description": "# Twizzl - A Multi-Purpose Benchmarking Tool\n\nTwizzl was originally developed to offer an easy to use and flexible benchmarking framework for perceptual image hashing algorithms having different approaches of generating, saving and comparing hashes. But Twizzle is more than that. You can use it to evaluate every algorithm used for the task of content identification like facial recognition, video hashing and many more.\n\n## Basic Idea\n\nThe underlying idea of Twizzle is the usecase of content identification. You have original objects like images, videos, audio files etc. and you compare them to manipulated versions of them. What you wanna know is how good a specific algorithm with different settings or many different algorithms perform at this task.\n\n### Example: Perceptual Image Hashing for Print-Scan usage\n\nThink about the following task: You try to find a perceptual image hashing algorithm that works best for matching images to its printed and scanned selfs. In a later setup you would like to generate a hash for every image that should be identified and save the hash together with metadata like the name or contextual information in a database. Given a printed and scanned image you would generate a hash using the same algorithm and settings and search the database for hashes being close to this one. For binary hashes one would normally use the normalized hamming distance and call it a match if some threshold falls below a certain limit.\n\n
\n\nSince every algorithm has its own way of representing hashes, we realized that we had to abstract the task to the following question: are two objects (in this case images) the same or not?\n\n
\n\n#### Challenges and Tests\nTwizzle distinguishes between **challenges** and **tests**.\n\nA **challenge** is a set of original objects, a set of comparative objects (both images in our case) and a set of correct decisions that an algorithm under test should make if it works correctly. Additionally, it can be enriched with arbitrary metadata, such as, in our example, the printer used to create the images or the printer settings.\n\n
| Original | Comparative | Target Decision | Metadata |\n
|---|---|---|---|\n
| ![]() | ![]() | True | `{\"printer\": \"DC785\", \"toner_settings\": \"toner saving mode\", \"printer_dpi\": 300}` |\n
| ![]() | ![]() | False | |\n
| ![]() | ![]() | True | |\n
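\n\nA challenge such as the one in the table above can be thought of as plain data. The following Python sketch is only illustrative; the field names and file names are assumptions, not Twizzle's actual API.\n\n
```python
# Hypothetical sketch of a challenge as plain data, mirroring the table above.
# Field names and file names are invented for illustration.
challenge = {
    'originals': ['img_001.png', 'img_002.png', 'img_003.png'],
    'comparatives': ['scan_001.png', 'scan_002.png', 'scan_003.png'],
    'target_decisions': [True, False, True],
    'metadata': {
        'printer': 'DC785',
        'toner_settings': 'toner saving mode',
        'printer_dpi': 300,
    },
}

# Each comparative object is paired with an original and a target decision.
for orig, comp, decision in zip(challenge['originals'],
                                challenge['comparatives'],
                                challenge['target_decisions']):
    print(orig, comp, decision)
```
\n\n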