{ "info": { "author": "Felipe Aguirre Martinez", "author_email": "felipeam86@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "imgdl\n=====\n\nPython package for downloading a collection of images from a list of\nurls. It comes with the following features:\n\n- Downloads are multithreaded using ``concurrent.futures``.\n- Relies on a persistent cache. Already downloaded images are not\n downloaded again, unless you force ``imgdl`` to do so.\n- Can hide requests behind proxies.\n- Can be used as a command line utility or as a Python library.\n- Normalizes images to JPG format + RGB mode after download.\n- Generates thumbnails of varying sizes automatically.\n- Can space downloads with a random timeout drawn from a uniform\n distribution.\n\nInstallation\n------------\n\n.. code:: bash\n\n pip install imgdl\n\nOr, from the root project directory:\n\n.. code:: bash\n\n pip install .\n\nUsage\n-----\n\nHere is a simple example using the default configuration:\n\n.. code:: python\n\n from imgdl import download\n\n urls = [\n 'https://upload.wikimedia.org/wikipedia/commons/9/92/Moh_%283%29.jpg',\n 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Moh_%284%29.jpg',\n 'https://upload.wikimedia.org/wikipedia/commons/c/cd/Rostige_T%C3%BCr_P4RM1492.jpg'\n ]\n\n paths = download(urls, store_path='~/.datasets/images', n_workers=50)\n\n``100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:08<00:00, 2.68s/it]``\n\nImages will be downloaded to ``~/.datasets/images`` using 50 threads.\nThe function returns the list of paths to each image. Paths are\nconstructed as ``{store_path}/{SHA1-hash(url)}.jpg``. 
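The path scheme above can be sketched as follows. This is a hypothetical illustration, not ``imgdl``'s internal code; in particular, the ``image_path`` helper and the assumption that the SHA1 digest is taken over the UTF-8 encoded url are mine:

```python
import hashlib
from pathlib import Path

def image_path(url, store_path="~/.datasets/images"):
    # Hypothetical sketch: hash the url with SHA1 and use the
    # 40-character hex digest as the file name, with a .jpg suffix.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".jpg"
    return Path(store_path).expanduser() / name
```

Because the file name depends only on the url, downloading the same url twice maps to the same path, which is what makes the persistent cache check possible.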
If for any reason a\ndownload fails, ``imgdl`` returns ``None`` as the path.\n\nNotice that if you invoke ``download`` again with the same urls, it\nwill not download them again, because it first checks that they have\nalready been downloaded.\n\n.. code:: python\n\n paths = download(urls, store_path='~/.datasets/images', n_workers=50)\n\n``100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 24576.00it/s]``\n\nThe download was instantaneous, and ``imgdl`` still returns\nthe image paths.\n\nHere is the complete list of parameters taken by ``download``:\n\n- ``iterator``: The only mandatory parameter. Usually a list of urls,\n but can be any kind of iterator.\n- ``store_path``: Root path where images should be stored\n- ``n_workers``: Number of simultaneous threads to use\n- ``timeout``: Timeout for each url request\n- ``thumbs``: If True, create thumbnails of sizes according to\n ``thumbs_size``\n- ``thumbs_size``: Dictionary of the kind {name: (width, height)}\n indicating the thumbnail sizes to be created.\n- ``min_wait``: Minimum wait time between image downloads\n- ``max_wait``: Maximum wait time between image downloads\n- ``proxies``: Proxy or list of proxies to use for the requests\n- ``headers``: Headers to be passed to ``requests``\n- ``user_agent``: User agent to be used for the requests\n- ``notebook``: If True, use the notebook version of the tqdm progress bar\n- ``debug``: If True, ``imgdl`` logs urls that could not be downloaded\n- ``force``: ``download`` checks first if the image already exists in\n ``store_path`` in order to avoid double downloads. If you want to\n force downloads, set this to True.\n\nMost of these parameters can also be set in a ``config.yaml`` file found\nin the directory where the Python process was launched. 
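For instance, such a file might look like the sketch below. The key names here are an assumption on my part, taken to mirror the ``download`` parameters listed above; the linked example file is the authoritative reference:

```yaml
# Hypothetical config.yaml sketch -- key names assumed to mirror
# the download() parameters documented above.
store_path: ~/.datasets/images
n_workers: 50
timeout: 5.0
min_wait: 0.0
max_wait: 0.0
```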
See\n`config.yaml.example`_\n\nCommand Line Interface\n----------------------\n\nIt can also be used as a command line utility:\n\n.. code:: bash\n\n $ imgdl --help\n usage: imgdl [-h] [-o STORE_PATH] [--thumbs THUMBS] [--n_workers N_WORKERS]\n [--timeout TIMEOUT] [--min_wait MIN_WAIT] [--max_wait MAX_WAIT]\n [--proxy PROXY] [-u USER_AGENT] [-f] [--notebook] [-d]\n urls\n\n Bulk image downloader from a list of urls\n\n positional arguments:\n urls Text file with the list of urls to be downloaded\n\n optional arguments:\n -h, --help show this help message and exit\n -o STORE_PATH, --store_path STORE_PATH\n Root path where images should be stored (default:\n ~/.datasets/imgdl)\n --thumbs THUMBS Thumbnail size to be created. Can be specified as many\n times as thumbs sizes you want (default: None)\n --n_workers N_WORKERS\n Number of simultaneous threads to use (default: 50)\n --timeout TIMEOUT Timeout to be given to the url request (default: 5.0)\n --min_wait MIN_WAIT Minimum wait time between image downloads (default:\n 0.0)\n --max_wait MAX_WAIT Maximum wait time between image downloads (default:\n 0.0)\n --proxy PROXY Proxy or list of proxies to use for the requests\n (default: None)\n -u USER_AGENT, --user_agent USER_AGENT\n User agent to be used for the requests (default:\n Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0)\n Gecko/20100101 Firefox/55.0)\n -f, --force Force the download even if the files already exists\n (default: False)\n --notebook Use the notebook version of tqdm (default: False)\n -d, --debug Activate debug mode (default: False)\n\n\nDownload images from google\n===========================\n\nThis is an example of how we can use ``imgdl`` to download images from a google image search.\nI currently use this to quickly build up image datasets. I took inspiration from `this`_ blog\npost by `pyimagesearch`_.\n\nRequirements\n------------\n\nInstall imgdl with the ``[google]`` extra requirements:\n\n.. 
code:: bash\n\n pip install imgdl[google]\n\n\nDownload the webdriver for Chrome `here`_ and make sure it\u2019s in your PATH, e.g., place it in /usr/bin or /usr/local/bin.\n\n.. code:: bash\n\n sudo cp chromedriver /usr/local/bin/\n\nClone this repository, or simply download the ``google.py`` script.\n\nUsage\n-----\n\n\nYou are ready to download images from a google images search. Here is a usage example:\n\n.. code:: bash\n\n $ python google.py \"paris by night\" -n 600 --interactive\n Querying google images for 'paris by night'\n Scrolling down five times\n 600 images found.\n Downloading to /Users/aguirre/Projets/imagedownloader/examples/images\n 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 600/600 [01:15<00:00, 7.91it/s]\n 2018-03-04 23:21:52,616 - imgdl.downloader - INFO - 0 images failed to download\n\nThe first argument is the query sent to google. With ``-n 600`` you are asking for at least 600 images.\nBy default, a google images results page only shows 100 images and requires you to scroll down if you want more.\nThe script uses `selenium`_ to simulate a browsing session and scroll down the google search page.\nWith the ``--interactive`` flag, Chrome will open and you will be able to see how it scrolls down in order to\nget more images. Here is the full list of command line options:\n\n.. 
code:: bash\n\n $ python google.py --help\n usage: google.py [-h] [-n N_IMAGES] [--interactive] [-o STORE_PATH]\n [--thumbs THUMBS] [--n_workers N_WORKERS] [--timeout TIMEOUT]\n [--min_wait MIN_WAIT] [--max_wait MAX_WAIT] [--proxy PROXY]\n [-u USER_AGENT] [-f] [--notebook] [-d]\n query\n\n Download images from a google images query\n\n positional arguments:\n query Query string to be executed on google images\n\n optional arguments:\n -h, --help show this help message and exit\n -n N_IMAGES, --n_images N_IMAGES\n Number of expected images to download (default: 100)\n --interactive Open up chrome interactively to see the search results\n and scrolling action. (default: False)\n -o STORE_PATH, --store_path STORE_PATH\n Root path where images should be stored (default:\n images)\n --thumbs THUMBS Thumbnail size to be created. Can be specified as many\n times as thumbs sizes you want (default: None)\n --n_workers N_WORKERS\n Number of simultaneous threads to use (default: 40)\n --timeout TIMEOUT Timeout to be given to the url request (default: 5.0)\n --min_wait MIN_WAIT Minimum wait time between image downloads (default:\n 0.0)\n --max_wait MAX_WAIT Maximum wait time between image downloads (default:\n 0.0)\n --proxy PROXY Proxy or list of proxies to use for the requests\n (default: None)\n -u USER_AGENT, --user_agent USER_AGENT\n User agent to be used for the requests (default:\n Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0)\n Gecko/20100101 Firefox/55.0)\n -f, --force Force the download even if the files already exists\n (default: False)\n --notebook Use the notebook version of tqdm (default: False)\n -d, --debug Activate debug mode (default: False)\n\n\nAcknowledgements\n----------------\n\nImages used for tests are from the `wikimedia commons`_\n\n.. _config.yaml.example: config.yaml.example\n.. _wikimedia commons: https://commons.wikimedia.org\n.. _here: https://sites.google.com/a/chromium.org/chromedriver/downloads\n.. 
_this: https://www.pyimagesearch.com/2017/12/04/how-to-create-a-deep-learning-dataset-using-google-images/\n.. _pyimagesearch: https://www.pyimagesearch.com/\n.. _selenium: http://selenium-python.readthedocs.io/\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/felipeam86/imagedownloader", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "imgdl", "package_url": "https://pypi.org/project/imgdl/", "platform": "", "project_url": "https://pypi.org/project/imgdl/", "project_urls": { "Homepage": "https://github.com/felipeam86/imagedownloader" }, "release_url": "https://pypi.org/project/imgdl/1.1.0/", "requires_dist": [ "Pillow (>=4.2.1)", "requests (>=2.14.2)", "tqdm (>=4.15.0)", "PyYAML", "attrs", "python-json-logger", "jupyter; extra == 'docs'", "ipython; extra == 'docs'", "pandas; extra == 'docs'", "invoke; extra == 'docs'", "selenium; extra == 'google'", "beautifulsoup4; extra == 'google'", "lxml; extra == 'google'", "pytest; extra == 'tests'", "pytest-pep8; extra == 'tests'", "pep8; extra == 'tests'", "autopep8; extra == 'tests'", "pytest-xdist; extra == 'tests'", "pytest-cov; extra == 'tests'" ], "requires_python": "~=3.6", "summary": "Bulk image downloader from a list of urls", "version": "1.1.0" }, "last_serial": 3777533, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "5094404930ef60e7b34b89ac78c42842", "sha256": "d1a9334c4c841b5bf2d2fd378a56c0f4b96c093e38660971d157bdecb3653ff2" }, "downloads": -1, "filename": "imgdl-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5094404930ef60e7b34b89ac78c42842", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "~=3.6", "size": 15203, "upload_time": "2018-04-10T23:06:48", "url": "https://files.pythonhosted.org/packages/ee/27/6964bb0240a8f33cae32918338b718cf356e4d104d63ee54a699fa61a57a/imgdl-1.0.0-py3-none-any.whl" 
} ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "0504695be675073d9534b6562a902a68", "sha256": "b263304d89eac5900ff0913a004b8c0a344c275a758a5a6940a7ebc64ffe477e" }, "downloads": -1, "filename": "imgdl-1.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "0504695be675073d9534b6562a902a68", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "~=3.6", "size": 14962, "upload_time": "2018-04-18T15:11:21", "url": "https://files.pythonhosted.org/packages/96/db/f513563d9c9578bfbb325456c57d055f0c327a18ef7804d33b8da9f91deb/imgdl-1.1.0-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0504695be675073d9534b6562a902a68", "sha256": "b263304d89eac5900ff0913a004b8c0a344c275a758a5a6940a7ebc64ffe477e" }, "downloads": -1, "filename": "imgdl-1.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "0504695be675073d9534b6562a902a68", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "~=3.6", "size": 14962, "upload_time": "2018-04-18T15:11:21", "url": "https://files.pythonhosted.org/packages/96/db/f513563d9c9578bfbb325456c57d055f0c327a18ef7804d33b8da9f91deb/imgdl-1.1.0-py3-none-any.whl" } ] }