{ "info": { "author": "Caleb Hattingh", "author_email": "caleb.hattingh@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "Intended Audience :: Information Technology", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3", "Topic :: Internet :: WWW/HTTP :: Browsers", "Topic :: Internet :: WWW/HTTP :: Indexing/Search", "Topic :: Multimedia :: Graphics", "Topic :: System :: Archiving" ], "description": "========================\ngoogle-images-downloader\n========================\n\nAn experiment in browser automation\n\n\u26a0 Warning \u26a0\n===========\n\nCalling this **alpha quality** would be a kind and noble gesture.\nDon't use it for anything.\nDon't expect anything to work. If it destroys your computer you're\non your own. This is a hobby project, and in all likelihood will eventually\nbe abandoned, perhaps even by next Tuesday.\n\nIf you create issues, I'll expect you to help work on them.\nPull requests, as always, are very welcome.\n\nIntroduction\n============\n\nThis library pulls images out of `Google Images `_\nsearch results and saves them to disk. The neat trick is *not*\nthat it saves the images in the search results, instead it saves the\n**original source images** (e.g. high-res images) that the search results\nrefers to.\n\nThis is made possible by the\n`Chrome Remote Debugging API `_\nwhich also means you've discovered the first gotcha: this only works on\nthe Chrome browser.\n\nWhence the name?\n================\n\nDefinition of **GID** ler: **G** oogle **I** mage **D** ownloader.\n\nInstall\n=======\n\nThe usual will work, but with caveats:\n\n.. code-block:: shell\n\n $ python -m pip install gidler\n\n\nThe caveats are that you're *probably* going to need **Python >= 3.5** for this.\nI don't have a lot of free time for hobby projects, and they're how I\nexperiment with new Python features. It is an incredible amount of work to\nmake a Python package that works on everything (I've done it for other projects),\nand I just don't have the\ntime and/or energy. *If you want it to work on 2.7, and you provide a working\nPR, I will\nvery likely merge that in*. I just don't have time to do it myself.\n\n**However**: you don't actually need to do all that work. Just use\nAnaconda Python. Using conda, you can create a new environment with the\nright version of Python, and then pip install into that:\n\n.. code-block:: shell\n\n $ conda create -n mygidlerenv python=3.5\n $ source activate mygidlerenv\n (mygidlerenv) $ python -m pip install gidler\n\n\nUsing and Abusing\n=================\n\nStep 1\n------\n\nFirst start up Chrome with remote debugging activated on a specific port::\n\n $ --remote-debugging-port=9222\n\nNow we can play that instance like a marionette!\n\nExample using Chromium browser (on my Mac)::\n\n $ open /Users/calebhattingh/Applications/Chromium.app \\\n --args -remote-debugging-port=9222\n\nIf you get this working on Windows or Linux, let me know and I'll add\nmore examples here.\n\nStep 2\n------\n\nYou can execute the module directly from the command-line:\n\n.. code-block:: bash\n\n python -m gidler -p 9222 --max 5 -q \"mandala\"\n\nThis:\n\n#. Starts up **gidler**...\n#. ...on port **9222** (this must match what we gave chrome)...\n#. ...returning no more than **5 images**...\n#. with a query string of \"mandala\"\n\nThis query string is the same as what you would type into the Google Images\nsearch box, so e.g., this all works: \"site:deviantart.com sketch portrait\"\n\nYou can also ``python -m gidler -h`` to see the help.\n\nCurrent status\n==============\n\nIt works on my machine\u2122.\n\nThe script tells Chrome to do an image search, using the given query\nstring on the CLI. Then, the content of the page is parsed to extract\nthe original image URLs, which are then downloaded separately with\n`urllib` inside a thread pool with 8 workers (yet another hard-coded\nsettings that will eventually become a CLI option...)\n\nThis means that Google is getting hit only with the initial search query,\nnot the all the subsequent (large) image downloads.\n\nFuture steps\n============\n\nCurrently, several things are hard-coded:\n\n* The \"large\" filter is automatically set. This is quite restrictive, and\n is probably not what you want all the time. This should be a CLI option``*``.\n If you peek in the source code, you'll see some documentation about all the\n possible settings; you can even specify width and height requirements. None\n of that is configurable yet though\"\\*\".\n* If no `max` is given, all the images on the first page of results are\n fetched. The code even forces scroll actions to the bottom of the page\n in order to get Chrome to load all 400. This might not be what you want.\n* The images are saved into a new subfolder in the local folder. This should\n be a CLI option\\*\n* The subfolder name is a slugified version of the query string, plus a\n small uuid (so that you can run the same query multiple times with no\n collisions)\n* The image names are the *original* image names, prefixed also with a\n small uuid to avoid collisions in case multple images have the same filename.\n* timeouts, and other applied pauses are all hardcoded. The pauses are\n largely to give Chrome a chance to complete the previous instruction. I\n tweaked these for my situation, but you may find longer pauses are necessary.\n* The work was done on OS X. I have *no idea*\\* whether this will work on\n other platforms.\n\n\\*PRs welcome.\n\n\nBut Selenium/ABC/XYZ already exists!\n------------------------------------\n\nYes, yes, I know there are other tools. I wanted a more lightweight option.\nCurrently, this library really only *depends on* Chrome and Python, although\nthere are several of the usual suspects in the `requires` list. (At the time\nof writing, `requires` lists `chromote` and `python-slugify`, but those\neach bring in a few other things, like `requests`, `ws4py` and so on.)\n\nWhy are you `require`ing your own fork of the `chromote` library?\n-----------------------------------------------------------------\n\nThe `chromote` package provides a Python abstraction for Chrome Remote\nDebugging API. Currently, `chromote` uses the `websocket-client` package\nwhich has been terribly unstable for me. Sometimes `ws.recv()` returns, but\nwith nothing. In my fork I changed to use the high-quality `ws4py` package and\nsince then the connection to the debugging API has been rock solid.", "description_content_type": null, "docs_url": null, "download_url": null, "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/cjrh/google-images-downloader", "keywords": null, "license": null, "maintainer": null, "maintainer_email": null, "name": "gidler", "package_url": "https://pypi.org/project/gidler/", "platform": null, "project_url": "https://pypi.org/project/gidler/", "project_urls": { "Homepage": "https://github.com/cjrh/google-images-downloader" }, "release_url": "https://pypi.org/project/gidler/0.0.2/", "requires_dist": [ "python-slugify (>=1.2)", "git+https://github.com/cjrh/chromote.git@switch-to-ws4py" ], "requires_python": "3", "summary": "gidler", "version": "0.0.2" }, "last_serial": 2082526, "releases": { "0.0.0": [ { "comment_text": "", "digests": { "md5": "f6ad17f68d0c1772b039685d1631a56a", "sha256": "dbe13b85744aea82b7dab04d81633cd344fa3a60a0e130a2879d4a8afcaae6a4" }, "downloads": -1, "filename": "gidler-0.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "f6ad17f68d0c1772b039685d1631a56a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "3", "size": 6522, "upload_time": "2016-04-25T10:17:15", "url": "https://files.pythonhosted.org/packages/bf/e2/35783ab64788464c56c2ca4ea182c17cd6b171687d2cd97ec234c0f84dbe/gidler-0.0.0-py3-none-any.whl" } ], "0.0.1": [ { "comment_text": "", "digests": { "md5": "ec26e2d3e1a9915856086f87b427dd0c", "sha256": "ea5bec6f7d6e324122e733be81ca1d353b6cf0082e551ccf94a0caa605af8a87" }, "downloads": -1, "filename": "gidler-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "ec26e2d3e1a9915856086f87b427dd0c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "3", "size": 6555, "upload_time": "2016-04-25T10:30:26", "url": "https://files.pythonhosted.org/packages/0d/d8/003d0e81e10583097806395368f660ad944fd072ab6a3b773f3e3aead52e/gidler-0.0.1-py3-none-any.whl" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "e4cab856c1ee58a77737e81205fb2ab3", "sha256": "c9ec04eb27ac9c40f7e46bfb7c6a3f924a5b5773dd1a644ba9d470b125d90f27" }, "downloads": -1, "filename": "gidler-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e4cab856c1ee58a77737e81205fb2ab3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "3", "size": 6558, "upload_time": "2016-04-25T10:38:26", "url": "https://files.pythonhosted.org/packages/71/ac/80cfbcde4668a0c7c81ddf8f1d85957b1ba95f22db8e288991da3becb3f4/gidler-0.0.2-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e4cab856c1ee58a77737e81205fb2ab3", "sha256": "c9ec04eb27ac9c40f7e46bfb7c6a3f924a5b5773dd1a644ba9d470b125d90f27" }, "downloads": -1, "filename": "gidler-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e4cab856c1ee58a77737e81205fb2ab3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": "3", "size": 6558, "upload_time": "2016-04-25T10:38:26", "url": "https://files.pythonhosted.org/packages/71/ac/80cfbcde4668a0c7c81ddf8f1d85957b1ba95f22db8e288991da3becb3f4/gidler-0.0.2-py3-none-any.whl" } ] }