{ "info": { "author": "Paul M. Winkler", "author_email": "slinkp@gmail.com", "bugtrack_url": null, "classifiers": [ "Environment :: Console", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Topic :: Internet :: WWW/HTTP", "Topic :: Internet :: WWW/HTTP :: Site Management :: Link Checking" ], "description": "Spydey\n=======\n\nA simple web spider with several recursion strategies.\nHome page is at http://github.com/slinkp/spydey.\n\nIt doesn't do much except follow links and report status. I mostly\nuse it for quick and dirty smoke testing and link checking.\n\nThe only unusual feature is the ``--traversal=pattern`` option, which\ndoes recursive traversal in an unusual order: It tries to recognize\npatterns in URLs, and will follow URLs of novel patterns before those\nwith patterns it has seen before. When there are no novel patterns to\nfollow, it follows random links to URLs of known patterns. If you use\nthis for smoke-testing a typical modern web app that maps URL\npatterns to views/controllers, this will very quickly hit all your\nviews/controllers at least once... usually. But it's not very\ninteresting when pointed at a website that has arbitrarily deep trees\n(static files, VCS repositories, and the like).\n\nAlso, it's designed so that adding a new recursion strategy is\ntrivial. Spydey was originally written for the purpose of\nexperimenting with different recursive crawling strategies. Read the\nsource.\n\nOh, and if you install Fabulous, console output is in color.\n\nFor lazy, zero-configuration smoke testing, I typically run it like::\n\n spydey -r --stop-on-error --max-requests=200 --traversal=pattern --profile --log-referrer URL\n\nThere are a number of other command-line options, many stolen from\nwget. Use ``--help`` to see what they are.\n\nUsage\n=======\n\n::\n\n Usage: spydey [options] URL\n \n Options:\n -h, --help show this help message and exit\n -r, --recursive Recur into subdirectories\n -p, --page-requisites\n Get all images, etc. needed to display HTML page.\n --no-parent Don't ascend to the parent directory.\n -R REJECT, --reject=REJECT\n Regex for filenames to reject. May be given multiple\n times.\n -A ACCEPT, --accept=ACCEPT\n Regex for filenames to accept. May be given multiple\n times.\n -t TRAVERSAL, --traversal=TRAVERSAL, --traverse=TRAVERSAL\n Recursive traversal strategy. Choices are: breadth-\n first, depth-first, hybrid, pattern, random\n -H, --span-hosts Go to foreign hosts when recursive.\n -w WAIT, --wait=WAIT Wait SECONDS between retrievals.\n --random-wait=RANDOM_WAIT\n Wait from 0...2*WAIT secs between retrievals.\n --loglevel=LOGLEVEL Log level.\n --log-referrer, --log-referer\n Log referrer URL for each request.\n --transient-log Use Fabulous transient logging config.\n --max-redirect=MAX_REDIRECT\n Maximum number of redirections to follow for a\n resource.\n --max-requests=MAX_REQUESTS\n Maximum number of requests to make before exiting. (-1\n used with --traversal=pattern means exit when out of\n new patterns)\n --stop-on-error Stop after the first HTTP error (response code 400 or\n greater).\n -T TIMEOUT, --timeout=TIMEOUT\n Set the network timeout in seconds. 0 means no\n timeout.\n -P, --profile Print the time to download each resource, and a\n summary of the 20 slowest at the end.\n --stats Print a summary of traversal patterns, if\n --traversal=pattern\n -v, --version Print version information and exit.\n \nChangelog\n=========\n\n0.5\n---\n\n* Remove useless pattern stats unless --stats is given\n* Fix to prevent spanning hosts when following redirects, unless -H is on.\n\n0.4\n---\n\n* Add ``--stop-on-error`` option\n* Add ``--max-requests=-1`` to mean stop after all patterns are seen (when used with --traversal=pattern)\n* Add usage text automatically to pkg info\n\n\n0.3\n---\n\n* Better redirect handling: obeys -A, -R, --max-redirect, and --max-requests options\n* Minor bugfixes and refactoring", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/slinkp/spydey", "keywords": "", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "spydey", "package_url": "https://pypi.org/project/spydey/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/spydey/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/slinkp/spydey" }, "release_url": "https://pypi.org/project/spydey/0.5/", "requires_dist": null, "requires_python": null, "summary": "A simple web spider with pluggable recursion strategies", "version": "0.5" }, "last_serial": 799988, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "56d159ae8de9fcc889fa6c43bff49707", "sha256": "098026f9d5da35b15282aa6ef035861bd1a040545d59240a320a9cb87a7269f5" }, "downloads": -1, "filename": "spydey-0.1.tar.gz", "has_sig": false, "md5_digest": "56d159ae8de9fcc889fa6c43bff49707", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7946, "upload_time": "2010-12-07T20:15:17", "url": "https://files.pythonhosted.org/packages/eb/33/25763e044afb693e0053e6f4d8445ed17e28bb862d36c1c64833a0a9cb90/spydey-0.1.tar.gz" } ], "0.2": [ { "comment_text": "", "digests": { "md5": "315d32d75cfb2ce24d499f94e6d7d836", "sha256": "f931f8f70b14ded616b86c5405ac0ebcc60a7c3a16e2b0411a2b85363aa8c8a3" }, "downloads": -1, "filename": "spydey-0.2.tar.gz", "has_sig": false, "md5_digest": "315d32d75cfb2ce24d499f94e6d7d836", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8638, "upload_time": "2010-12-08T21:34:28", "url": "https://files.pythonhosted.org/packages/97/77/6a85e4f6f34267bab6b06b61f62d6ae1c9910c8633689128df2f4c8567e1/spydey-0.2.tar.gz" } ], "0.3": [ { "comment_text": "", "digests": { "md5": "d0a8f7da11283942b2fd30e4db1b4bd9", "sha256": "3fed456104cff28e10de27eed7bdd6012612d410efdd9850ab544a5df41aa24c" }, "downloads": -1, "filename": "spydey-0.3.tar.gz", "has_sig": false, "md5_digest": "d0a8f7da11283942b2fd30e4db1b4bd9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8991, "upload_time": "2010-12-15T22:55:28", "url": "https://files.pythonhosted.org/packages/7e/0b/cecc7d85d02f4e0e239dfc1284ebe88a26fc7968be593be674bd8b2b4db2/spydey-0.3.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "2e1d304dffcd7e147429566be97331e5", "sha256": "1a215c92c16e02423e21a593975c0173ac0d2dd75e33b1154767d9d6d405c767" }, "downloads": -1, "filename": "spydey-0.4.tar.gz", "has_sig": false, "md5_digest": "2e1d304dffcd7e147429566be97331e5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9496, "upload_time": "2011-06-17T16:56:06", "url": "https://files.pythonhosted.org/packages/e6/5c/857660363add2efee5f4797c951753331f37caf30d13f69f0e03c6e6bf8b/spydey-0.4.tar.gz" } ], "0.4r1": [ { "comment_text": "", "digests": { "md5": "f2520a3bdbe4b65cdabc1aaf897f2e1a", "sha256": "8a168755c633f0adc03e729fa68a03d0f5e7cf22dbf22ede09376e0cd78eecf6" }, "downloads": -1, "filename": "spydey-0.4r1.tar.gz", "has_sig": false, "md5_digest": "f2520a3bdbe4b65cdabc1aaf897f2e1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10310, "upload_time": "2011-06-17T19:33:45", "url": "https://files.pythonhosted.org/packages/17/30/76e91e6142d4c91a0ee788313820b600c62095a0f5a6afc6a681443b63d1/spydey-0.4r1.tar.gz" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "a7f174988d1102e8bc68fd6adcc7e057", "sha256": "05280a0a85fd6b00667b8c32be7e5c38cc8b1077473ce2efc218cdf0c92c2ae7" }, "downloads": -1, "filename": "spydey-0.5.tar.gz", "has_sig": false, "md5_digest": "a7f174988d1102e8bc68fd6adcc7e057", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10500, "upload_time": "2012-02-09T19:40:32", "url": "https://files.pythonhosted.org/packages/48/2b/f3202c99f27fbe4403bff755df58b5064fb92896743f4a9cd0c78a16dd85/spydey-0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a7f174988d1102e8bc68fd6adcc7e057", "sha256": "05280a0a85fd6b00667b8c32be7e5c38cc8b1077473ce2efc218cdf0c92c2ae7" }, "downloads": -1, "filename": "spydey-0.5.tar.gz", "has_sig": false, "md5_digest": "a7f174988d1102e8bc68fd6adcc7e057", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10500, "upload_time": "2012-02-09T19:40:32", "url": "https://files.pythonhosted.org/packages/48/2b/f3202c99f27fbe4403bff755df58b5064fb92896743f4a9cd0c78a16dd85/spydey-0.5.tar.gz" } ] }