{ "info": { "author": "Steven Almeroth", "author_email": "sroth77@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Environment :: Web Environment", "Framework :: Scrapy", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Topic :: Internet :: WWW/HTTP", "Topic :: Internet :: WWW/HTTP :: Browsers", "Topic :: Internet :: WWW/HTTP :: HTTP Servers", "Topic :: Software Development :: Libraries", "Topic :: Software Development :: Libraries :: Application Frameworks", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "README\n\nScrapybox - a Scrapy GUI\n------------------------\n\nA RESTful async Python web server that runs arbitrary code within Scrapy spiders\nvia an HTML webapge interface.\n\n1. Server receives POST request:\n\n[26/Mar/2016:21:50:38 +0000] \"POST /api/eval HTTP/1.1\" 200 3172 \"-\" \"Mozilla/5.0 (X11; Linux...\"\n\n2. Server uses Scrapy to crawl site:\n\n2016-03-26 14:50:34 [scrapy] INFO: Enabled extensions:\n['scrapy.extensions.logstats.LogStats', 'scrapy.extensions.corestats.CoreStats']\n2016-03-26 14:50:34 [scrapy] INFO: Enabled downloader middlewares:\n['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',\n 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',\n 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',\n 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',\n 'scrapy.downloadermiddlewares.retry.RetryMiddleware',\n 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',\n 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',\n 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',\n 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',\n 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',\n 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',\n 'scrapy.downloadermiddlewares.stats.DownloaderStats']\n2016-03-26 14:50:34 [scrapy] INFO: Enabled spider middlewares:\n['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',\n 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',\n 'scrapy.spidermiddlewares.referer.RefererMiddleware',\n 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',\n 'scrapy.spidermiddlewares.depth.DepthMiddleware']\n2016-03-26 14:50:34 [scrapy] INFO: Enabled item pipelines:\n[]\n2016-03-26 14:50:34 [scrapy] INFO: Spider opened\n2016-03-26 14:50:34 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)\n2016-03-26 14:50:37 [scrapy] DEBUG: Crawled (404) (referer: None)\n2016-03-26 14:50:38 [scrapy] DEBUG: Crawled (200) (referer: None)\n2016-03-26 14:50:38 [scrapy] DEBUG: Scraped from <200 http://scrapy.org>\n{'request': {'encoding': 'utf-8', 'cookies': {}, 'headers': {'User-Agent': ['Mozilla/5.0 (X11; Linux...\n2016-03-26 14:50:38 [scrapy] INFO: Closing spider (finished)\n2016-03-26 14:50:38 [scrapy] INFO: Dumping Scrapy stats:\n{'downloader/request_bytes': 494,\n 'downloader/request_count': 2,\n 'downloader/request_method_count/GET': 2,\n 'downloader/response_bytes': 10719,\n 'downloader/response_count': 2,\n 'downloader/response_status_count/200': 1,\n 'downloader/response_status_count/404': 1,\n 'finish_reason': 'finished',\n 'finish_time': datetime.datetime(2016, 3, 26, 21, 50, 38, 238364),\n 'item_scraped_count': 1,\n 'log_count/DEBUG': 3,\n 'log_count/INFO': 7,\n 'response_received_count': 2,\n 
\n 'scheduler/dequeued': 1,\n 'scheduler/dequeued/memory': 1,\n 'scheduler/enqueued': 1,\n 'scheduler/enqueued/memory': 1,\n 'start_time': datetime.datetime(2016, 3, 26, 21, 50, 34, 479901)}\n2016-03-26 14:50:38 [scrapy] INFO: Spider closed (finished)\n2016-03-26 14:50:38 [aiohttp] INFO: 127.0.0.1 - [26/Mar/2016:21:50:38] \"POST /api/eval HTTP/1.1\" 200...\"\n\n\n3. The JSON response is sent back to the user:\n\n{\n items: [\n {\n request: {\n encoding: \"utf-8\",\n cookies: { },\n headers: {\n User-Agent: [\n \"Mozilla/5.0 (X11; Linux x86_64) Scrapybox/0.1 Scrapy/1.2 Python/3.5\"\n ],\n Accept: [\n \"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\"\n ],\n Accept-Language: [\n \"en\"\n ],\n Accept-Encoding: [\n \"gzip,deflate\"\n ]\n },\n url: \"http://scrapy.org\",\n method: \"GET\",\n meta: {\n download_slot: \"scrapy.org\",\n download_latency: 0.6338846683502197,\n download_timeout: 180,\n depth: 0\n }\n },\n response: {\n url: \"http://scrapy.org\",\n flags: [ ],\n headers: {\n Content-Type: [\n \"text/html; charset=utf-8\"\n ],\n Cache-Control: [\n \"max-age=600\"\n ],\n Last-Modified: [\n \"Thu, 03 Mar 2016 10:37:55 GMT\"\n ],\n X-Github-Request-Id: [\n \"BDADB31E:7CE4:240796FC:56F7042B\"\n ],\n Server: [\n \"GitHub.com\"\n ],\n Expires: [\n \"Sat, 26 Mar 2016 22:00:37 GMT\"\n ],\n Date: [\n \"Sat, 26 Mar 2016 21:50:37 GMT\"\n ],\n Access-Control-Allow-Origin: [\n \"*\"\n ]\n },\n body: {\n title: \"Scrapy | A Fast and Powerful Scraping and Web Crawling Framework\",\n head: \" Scrapy | A Fast and Powerful Scraping...\"\n },\n meta: {\n download_slot: \"scrapy.org\",\n download_latency: 0.6338846683502197,\n download_timeout: 180,\n depth: 0\n },\n status: 200\n }\n }\n ],\n scrapybox: {\n request: {\n cookies: { },\n version: \"HttpVersion(major=1, minor=1)\",\n headers: {\n CONTENT-LENGTH: \"170\",\n HOST: \"localhost:8080\",\n ORIGIN: \"null\",\n CACHE-CONTROL: \"max-age=0\",\n UPGRADE-INSECURE-REQUESTS: \"1\",\n ACCEPT-LANGUAGE: \"en-US,en;q=0.8\",\n ACCEPT-ENCODING: \"gzip, deflate\",\n CONTENT-TYPE: \"application/x-www-form-urlencoded\",\n ACCEPT: \"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\",\n CONNECTION: \"keep-alive\",\n USER-AGENT: \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)...\"\n },\n path: \"/api/eval\",\n method: \"POST\",\n scheme: \"http\",\n POST: {\n settings.ROBOTSTXT_OBEY: \"on\",\n settings.USER_AGENT: \"Mozilla/5.0 (X11; Linux x86_64) Scrapybox/0.1 Scrapy/1.2 Python/3.5\",\n parse: \"\",\n yield_item: \"on\",\n start_url: \"scrapy.org\"\n },\n payload: [ ],\n has_body: true,\n query: \"\",\n content: [ ],\n host: \"localhost:8080\"\n }\n }\n}\n\n\nCommand line interface\n----------------------\n\n$ curl \"http://127.0.0.1:8080/api/eval/response.status\"\n200\n\n$ curl \"http://127.0.0.1:8080/api/eval/response.headers\"\n{b'Last-Modified': [b'Fri, 09 Aug 2013 23:54:35 GMT'], b'Etag': [b'\"359670651+gzip\"'], b'X-Cache': [b'HIT'],...}\n\n$ curl \"http://127.0.0.1:8080/api/eval/response\"\n<200 http://www.example.com/>\n\n$ curl \"http://127.0.0.1:8080/api/eval/self\"\n<Spider 'parse' at 0x7f3a3af2fb38>\n\n$ curl \"http://127.0.0.1:8080/api/eval/self.__dict__\"\n{'settings': <scrapy.settings.Settings object at 0x7f3a3af47e10>, 'result': {...}, 'crawler': <scrapy.crawler...>}\n\n$ curl localhost:8080/api/eval -d start_urls=[\\\"http://scrapinghub.com\\\",\\\"http://scrapy.org\\\"]\nCrawled responses: [<200 http://scrapinghub.com/>, <200 http://scrapy.org>]
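\n\n\nPython interface\n----------------\n\nFor comparison with the curl commands above, here is a minimal client sketch that drives the same /api/eval endpoint from Python using only the standard library. It assumes a Scrapybox server is listening on 127.0.0.1:8080 as in the examples; the endpoint paths and form fields are taken from those examples, and everything else is illustrative.\n\nimport urllib.parse\nimport urllib.request\n\nBASE = 'http://127.0.0.1:8080'  # assumed server address, as in the curl examples above\n\n# GET: evaluate an expression inside the running spider\nwith urllib.request.urlopen(BASE + '/api/eval/response.status') as resp:\n    print(resp.read().decode())\n\n# POST: the same form data as the curl -d example above\ndata = urllib.parse.urlencode(\n    {'start_urls': '[\"http://scrapinghub.com\",\"http://scrapy.org\"]'}).encode()\nwith urllib.request.urlopen(BASE + '/api/eval', data=data) as resp:\n    print(resp.read().decode())\n\nThe sketch sticks to the standard library so it can be pasted anywhere; any HTTP client would do.\n\n\nRequirements\n------------\n\nPython 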
3.5.0+\n\n.. seealso:: requirements.txt\n\nLicense\n-------\n\nBSD licensed", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/stav/scrapybox/archive/master.zip", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/stav/scrapybox", "keywords": "scrapy,async,server", "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "scrapybox", "package_url": "https://pypi.org/project/scrapybox/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/scrapybox/", "project_urls": { "Download": "https://github.com/stav/scrapybox/archive/master.zip", "Homepage": "https://github.com/stav/scrapybox" }, "release_url": "https://pypi.org/project/scrapybox/0.1/", "requires_dist": null, "requires_python": null, "summary": "A Scrapy GUI", "version": "0.1" }, "last_serial": 2032349, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "b52c56ae482532b93cf2b77680ba11dd", "sha256": "89f3125b5dcb57032aa9d79a757c5feffb0bca50ea4091b17b59dabfa4c33390" }, "downloads": -1, "filename": "scrapybox-0.1.tar.gz", "has_sig": false, "md5_digest": "b52c56ae482532b93cf2b77680ba11dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12025, "upload_time": "2016-03-28T18:08:36", "url": "https://files.pythonhosted.org/packages/42/46/06af4abd325a896ef0e2666286e6e03d72480ebfe5f9a9f2f9e2b5aca678/scrapybox-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b52c56ae482532b93cf2b77680ba11dd", "sha256": "89f3125b5dcb57032aa9d79a757c5feffb0bca50ea4091b17b59dabfa4c33390" }, "downloads": -1, "filename": "scrapybox-0.1.tar.gz", "has_sig": false, "md5_digest": "b52c56ae482532b93cf2b77680ba11dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12025, "upload_time": "2016-03-28T18:08:36", "url": "https://files.pythonhosted.org/packages/42/46/06af4abd325a896ef0e2666286e6e03d72480ebfe5f9a9f2f9e2b5aca678/scrapybox-0.1.tar.gz" } ] }
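A note on the release metadata above: the recorded digests allow verifying a downloaded sdist. Below is a minimal stdlib-only sketch; the download URL and sha256 value are copied verbatim from the metadata, and the rest is illustrative.

import hashlib
import urllib.request

# URL and expected digest taken verbatim from the release metadata above
URL = ('https://files.pythonhosted.org/packages/42/46/'
       '06af4abd325a896ef0e2666286e6e03d72480ebfe5f9a9f2f9e2b5aca678/scrapybox-0.1.tar.gz')
EXPECTED_SHA256 = '89f3125b5dcb57032aa9d79a757c5feffb0bca50ea4091b17b59dabfa4c33390'

# Download the sdist and compare its sha256 digest with the published value
data = urllib.request.urlopen(URL).read()
digest = hashlib.sha256(data).hexdigest()
print('OK' if digest == EXPECTED_SHA256 else 'MISMATCH: ' + digest)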