{ "info": { "author": "Amol Umrale", "author_email": "babaiscool@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Framework :: Scrapy", "Framework :: Twisted", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 2.7", "Topic :: Internet :: WWW/HTTP :: Indexing/Search" ], "description": "========\nimagebot\n========\n\nThis bot (image scraper) crawls a given url(s) and downloads all the images.\n\nFeatures\n========\n\n* Supported platforms: Linux / Windows / Python 2.7.\n* Maintains a database of all downloaded images to avoid duplicate downloads.\n* Optionally, it can scrape only under a particular url, e.g. scraping *\\http://website.com/albums/new* with this option will only download from new album.\n* Filters urls by regex.\n* Filters images by minimum size.\n* Scrapes through javascript popup links (limited support).\n* Live monitor window for displaying images as they are scraped.\n* Asynchronous i/o design using scrapy and twisted.\n\nUsage\n=====\n\n**crawl command:**\n\n* Scrape images::\n\n\timagebot crawl http://website.com\n\timagebot crawl http://website.com,http://otherwebsite.com\n\n* Options for crawl command:\n\n\t*-d, --domains*\n\n\tScrape images while allowing images to be downloaded from other domain(s) (add multiple domains with comma separated list). The domain in the start url(s) is(are) allowed by default.\n\n\t``imagebot crawl http://website.com -d otherwebsite.com,anotherwebsite.com``\n\t\t\t\t\t\n\t*-is, --images-store*\n\t\t\t\t\n\tSpecify image store location. Default: ~/Pictures/crawled/[jobname]\n\n\t``imagebot crawl http://website.com -is /home/images``\n\t\n\t*-s, --min-size*\n\n\tSpecify minimum size of image to be downloaded (width x height).\n\n\t``imagebot crawl http://website.com -s 300x300``\n\n\t*-u, --stay-under*\n\n\tStay under the start url. Only those urls that have the start url as prefix will be crawled. Useful, for example, to crawl an album or a subsection on a website.\n\n\t``imagebot crawl http://website.com/albums/new -u``\n\n\t*-m, --monitor*\n\n\tLaunch monitor window for displaying images as they are scraped.\n\n\t``imagebot crawl http://website.com -m``\n\n\t*-a, --user-agent*\n\n\tSet user-agent string. Default: imagebot. It is recommended to change it to identify your bot as a matter of responsible crawling.\n\n\t``imagebot crawl http://website.com -a \"my_imagebot(http://mysite.com)\"``\n\n\t*-r, --url-regex*\n\n\tSpecify regex for urls. Only those urls matching the regex will be crawled. It does not apply to start url(s).\n\n\t``imagebot crawl http://website.com -r .*gallery.*``\n\n\t*-dl, --depth-limit*\n\n\tSpecify depth limit for crawling. Use value of 0 to scrape only on start url(s). \n\n\t``imagebot crawl http://website.com -dl 2``\n\n\t*--no-cdns*\n\n\tA list of well known cdn's is included and enabled by default for image downloads. Use this option to disable it.\n\n\t*-at, --auto-throttle*\n\n\tEnable auto throttle feature of scrapy. (`details in scrapy docs `_).\n\n\t*-j, --jobname*\n\n\tSpecify a job name. This will be used to store image meta data in the database. By default, domain name of the start url is used as the job name.\n\n\t*-nc, --no-cache*\n\n\tDisable http caching.\n\n\t*-l, --log-level*\n\n\tSpecify log level.\n\tSupported levels: info, silent, critical, error, debug, warning. Default: error.\n\n\t``imagebot crawl http://website.com -l debug``\n\n\t*-h, --help*\n\n\tGet help on crawl command options.\n\n**clear command:**\n\n* This command is useful for various kinds of cleanup.\n\n* Options for clear command:\t\n\n\t*--cache*\n\n\tClear http cache.\n\t\n\t*--db*\n\n\tRemove image metadata for a job from the database.\n\n\t``imagebot clear --db website.com``\n\n\t*--duplicate-images*\n\n\tMultiple copies of same image may be downloaded due to different urls. Use this option to delete duplicate images for a job.\n\n\t``imagebot clear --duplicate-images website.com``\n\n\t*-h, --help*\n\n\tGet help on clear command options.\n\nDependencies\n============\n\n#. pywin32 (http://sourceforge.net/projects/pywin32/)\n\n\tNeeded on Windows.\n\n#. python-gi (Python GObject Introspection API)\n\n\tNeeded on Linux for gtk UI. (Optional). If not found, python built-in Tkinter will be used.\n\tOn Ubuntu: ``apt-get install python-gi``\n\n#. scrapy (web crawling framework)\n\n\tIt will be automatically installed by pip.\n\n#. Pillow (Python Imaging Library)\n\n\tIt will be automatically installed by pip.\n\nDownload\n========\n\n* PyPI: http://pypi.python.org/pypi/imagebot/\n* Source: https://github.com/amol9/imagebot/\n", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/imagebot/", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "imagebot", "package_url": "https://pypi.org/project/imagebot/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/imagebot/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/imagebot/" }, "release_url": "https://pypi.org/project/imagebot/1.2.1/", "requires_dist": null, "requires_python": null, "summary": "A web bot to crawl websites and scrape images.", "version": "1.2.1" }, "last_serial": 1631127, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "0327754c305ad0095b376933c232cc7c", "sha256": "5f0d8b8d3a497a6a07a89c0c30f52d802b85a34a96acc72e2316b87291625723" }, "downloads": -1, "filename": "imagebot-1.0.tar.gz", "has_sig": false, "md5_digest": "0327754c305ad0095b376933c232cc7c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11820, "upload_time": "2015-01-31T15:27:34", "url": "https://files.pythonhosted.org/packages/e1/32/1d0b3e50547778ddc0def04c7ebbea90e2a278df0dd3f0149e3358e98677/imagebot-1.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "b324c0a0330897c20a9765b47ed5ff99", "sha256": "aca83d88fdd40d27d48e904d6cf56ff0c73faf0613c6dd27b99899e6b9704dcb" }, "downloads": -1, "filename": "imagebot-1.0.1.tar.gz", "has_sig": false, "md5_digest": "b324c0a0330897c20a9765b47ed5ff99", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11860, "upload_time": "2015-01-31T15:46:28", "url": "https://files.pythonhosted.org/packages/b0/39/8f76dc32bb6c9d2984e3137948011bdb0b7cfe0670cee25154ff4b6c9882/imagebot-1.0.1.tar.gz" } ], "1.0.2": [ { "comment_text": "", "digests": { "md5": "c02cb8d652b511bbdce85796f2947b43", "sha256": "ea091b734269c896967a159a7de48c8aee8d351d2ac3ab335dc9cd42aa548551" }, "downloads": -1, "filename": "imagebot-1.0.2.tar.gz", "has_sig": false, "md5_digest": "c02cb8d652b511bbdce85796f2947b43", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11886, "upload_time": "2015-02-01T13:22:28", "url": "https://files.pythonhosted.org/packages/c8/03/55a8b1cd1832cc992755b412b0d4ec2d8432922422a1bb2694c660fcbd0c/imagebot-1.0.2.tar.gz" } ], "1.0.3": [ { "comment_text": "", "digests": { "md5": "0b9544c96700f2233c0eb90e0687816a", "sha256": "b9080ae9267aa22972ec812e549e1de1aa0dc46ef32ee421a50fc3e77ebef776" }, "downloads": -1, "filename": "imagebot-1.0.3.tar.gz", "has_sig": false, "md5_digest": "0b9544c96700f2233c0eb90e0687816a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13046, "upload_time": "2015-02-03T18:57:45", "url": "https://files.pythonhosted.org/packages/eb/8f/0d497b286f2fd73df383f893d8e01fe04eefd7c35b0b622e39f5eeff10c4/imagebot-1.0.3.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "66c0d57f5e875b2a22df79978cf39a80", "sha256": "c20d514d6f5490a141dc841af66edf024ae3a54ab05eaaac63d74ef88a388945" }, "downloads": -1, "filename": "imagebot-1.1.0.tar.gz", "has_sig": false, "md5_digest": "66c0d57f5e875b2a22df79978cf39a80", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20896, "upload_time": "2015-02-23T16:33:58", "url": "https://files.pythonhosted.org/packages/0b/fa/dcb07e436856e5221cb24b6179b6e864153929d81147ed247a29d84747ec/imagebot-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "7772f3827995a4f675d892764a3757ec", "sha256": "a5ad30d7fdc5a4f44923caa7ae36041de636be4e8ed737477184e7951fd48cba" }, "downloads": -1, "filename": "imagebot-1.1.1.tar.gz", "has_sig": false, "md5_digest": "7772f3827995a4f675d892764a3757ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21342, "upload_time": "2015-02-24T12:41:08", "url": "https://files.pythonhosted.org/packages/ff/b8/e5feaa729bc851b1cf1c28846ba406b7e7305fb542e324be63bf8b4e4e23/imagebot-1.1.1.tar.gz" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "30f40dd09eb67c87255e3b4fbf64f5f6", "sha256": "f6fd3dfd1c16bfc5b0b75c3fd0cae891b956092b8cff980366ea2967f674f727" }, "downloads": -1, "filename": "imagebot-1.2.0.tar.gz", "has_sig": false, "md5_digest": "30f40dd09eb67c87255e3b4fbf64f5f6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22334, "upload_time": "2015-03-06T13:27:03", "url": "https://files.pythonhosted.org/packages/9b/37/aa4bc67dd761f8ac2eda730c33a9b71feff525e133dfc7d5cb7db760b159/imagebot-1.2.0.tar.gz" } ], "1.2.1": [ { "comment_text": "", "digests": { "md5": "b51b896d7fb0b177d1ef6eacafdcb914", "sha256": "6a5f44d3a97310b7f5b2af3b088718bc5becf5560ba423ff72e8b88dd1341d37" }, "downloads": -1, "filename": "imagebot-1.2.1.tar.gz", "has_sig": false, "md5_digest": "b51b896d7fb0b177d1ef6eacafdcb914", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15928, "upload_time": "2015-07-13T13:02:55", "url": "https://files.pythonhosted.org/packages/8e/44/6384661b9942f16056240ba1fb9c0f4397686e4cfa2a45e1e4c059e65bae/imagebot-1.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b51b896d7fb0b177d1ef6eacafdcb914", "sha256": "6a5f44d3a97310b7f5b2af3b088718bc5becf5560ba423ff72e8b88dd1341d37" }, "downloads": -1, "filename": "imagebot-1.2.1.tar.gz", "has_sig": false, "md5_digest": "b51b896d7fb0b177d1ef6eacafdcb914", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15928, "upload_time": "2015-07-13T13:02:55", "url": "https://files.pythonhosted.org/packages/8e/44/6384661b9942f16056240ba1fb9c0f4397686e4cfa2a45e1e4c059e65bae/imagebot-1.2.1.tar.gz" } ] }