{ "info": { "author": "Tyrone Zhao", "author_email": "tyrone-zhao@qq.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Code Generators" ], "description": "# crawlerUtils\nSpecial gift for spiderman, make spinning a web easier.\n\n## Installation\n```shell\npip install --user --upgrade crawlerUtils\n```\n\n## Usages\n**crawlerUtils.utils.crawler contains the follow methods:**\n\nCrawler is the BaseClass, which is inherited by Get Class and Post Class in utils/crawler.py.\nthe other Classes in utils is inherited by Crawler.\nAlso some of the Classes maybe inherite BaseCrawler Class in utils/base.py\n\n- Crawler.headersAdd(value) -- add the requests headers\n- Crawler.headersSet(value) -- reset the requests headers\n- Crawler.beautifulJson(text) -- deal the text to json\n- Crawler.beautifulSoup(text, parser=\"html.parser\") -- return BeautifulSoup object\n- Crawler.cookiesStringToDict(cookie) -- get cookies to dict from type string cookies\n- Crawler.cookiesSetFromDict(cookies_dict) -- set session cookies from dict\n- Crawler.cookiesRead(filepath=\"\", cookies=\"\") -- set session cookies from txt\n- Crawler.htmlParser(doc) -- read string object and return requests-html HTML object\n- Crawler.asyncRun(func, number, *args, **kwargs) -- run async requests-html Aysnc func\n\n- Get(url).text == requests.get(url).text\n- Get(url).rtext ~= webdriver.Chrome().get(url).page_source\n- Get(url).rhtext ~= webdriver.Chrome().headless.get(url).page_source\n- Get(url).json ~= json.loads(requests.get(url).text)\n- Get(url).rjson ~= json.loads(webdriver.Chrome().get(url).page_source)\n- Get(url).rhjson ~= json.loads(webdriver.Chrome().headless.get(url).page_source)\n- Get(url).soup ~= BeautifulSoup(requests.get(url).text, \"html.parser\")\n- Get(url).rsoup ~= BeautifulSoup(webdriver.Chrome().get(url).page_source, \"html.parser\")\n- Get(url).rhsoup ~= BeautifulSoup(webdriver.Chrome().headless.get(url).page_source, \"html.parser\")\n- Get(url).html == request-html.get(url).html\n- Get(url).rhtml ~= request-html.get(url).html.render().html\n- Get(url).ahtml ~= await request-html.get(url).html\n- Get(url).atext ~= await request-html.get(url).text\n- Get(url).ajson ~= await json.loads(request-html.get(url).text)\n- Get(url).asoup ~= await BeautifulSoup(request-html.get(url).text, \"html.parser\")\n- Get(url).arhtml ~= await request-html.get(url).html.arender()\n- Get(url).artext ~= await request-html.get(url).text.arender()\n- Get(url).arjson ~= await json.loads(request-html.get(url).text.arender())\n- Get(url).arsoup ~= await BeautifulSoup(request-html.get(url).text.arender(), \"html.parser\")\n- Post(url).text == requests.post(url).text\n- Post(url).rtext ~= webdriver.Chrome().get(url).page_source\n- ...\n- Post.cookiesToFile(filepath='crawlerUtilsCookies.txt') == login in and save cookies locally\n\n## What else can this Crawler do?\n```python\nfrom crawlerUtils import Crawler\n\n\nprint(dir(Crawler))\n```\n\n## Coding Examples\n\n### Inserting data to Mongodb \nYou can set the amount of data to be inserted each time.\n```python\nfrom crawlerUtils import Get\n\nGet.mongoConnect(mongo_url=\"mongodb://localhost:27017\",\n mongo_db=\"crawler_db\", username=\"\", password=\"\")\nurl = \"http://books.toscrape.com/\"\n\n\ndef crawler(url):\n print(url)\n html = Get(url).html\n css_selector = 
\"article.product_pod\"\n books = html.find(css_selector)\n for book in books:\n name = book.xpath('//h3/a')[0].text\n price = book.find('p.price_color')[0].text\n Get.mongoInsertLength(\n {\n \"\u4e66\u540d\": name,\n \"\u4ef7\u683c\": price\n }, collection=\"crawler_collection\", length=100\n )\n next_url = html.find('li.next a')\n if next_url:\n next_url = Get.urljoin(url, next_url[0].attrs.get(\"href\"))\n crawler(next_url)\n\n\ncrawler(url)\nGet.mongoClose()\n```\nYou can also insert all the data at a time.\n```python\nfrom crawlerUtils import Get\n\n\nlist1 = []\nfor i in range(10000):\n list1.append({\n \"\u59d3\u540d\": \"\u5f20\u4e09{}\".format(i),\n \"\u6027\u522b\": \"\u7537\"\n })\n\nGet.mongoInsertAll(list1)\n```\nor you can insert one data at a time.\n```python\nGet.mongoConnect()\nGet.mongoInsert({\"\u59d3\u540d\": \"\u5f20\u4e09\", \"\u6027\u522b\": \"\u7537\"})\nGet.mongoClose()\n```\n\n### Recognizing Captcha\nonly for Constant width 4-letters\n```python\nfrom crawlerUtils import Post\n\n# \u9a8c\u8bc1\u7801\u7684\u5b57\u7b26\u96c6\u5408\nCAPTCHA_SET = [\n '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',\n 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',\n 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'\n]\n\n# \u6839\u636e\u9a8c\u8bc1\u7801\u7684\u5b57\u7b26\u96c6\u5408\u521b\u5efa\u9a8c\u8bc1\u7801\u8bad\u7ec3\u6587\u4ef6\u5939\nPost.captchaCreateTestSet(captcha_set=CAPTCHA_SET)\n\n\n# \u8bf7\u6c42\u5e76\u83b7\u53d6\u9a8c\u8bc1\u7801\u51fd\u6570\ndef getCaptcha():\n \"\"\" \u83b7\u53d6\u9a8c\u8bc1\u7801\u7684\u51fd\u6570\u5fc5\u987b\u81f3\u5c11\u8fd4\u56defilepath->\u9a8c\u8bc1\u7801\u8def\u5f84, \u548cextension->\u9a8c\u8bc1\u7801\u56fe\u7247\u6269\u5c55\u540d\u5982jpeg\u4e24\u4e2a\u53c2\u6570 \"\"\"\n captcha_params = {\n \"captcha_str\": \"your telephone number\"\n }\n\n captcha_url = \"https://h5.ele.me/restapi/eus/v3/captchas\"\n\n captcha_json = Post(captcha_url, jsons=captcha_params).json\n b64data = captcha_json['captcha_image']\n\n filepath, extension = Post.base64decode(b64data)\n\n return filepath, extension\n\n\n# \u8fdb\u884c\u9a8c\u8bc1\u7801\u8bad\u7ec3, \u6bd4\u5982\u8bad\u7ec32\u6b21\nPost.captchaTrain(getCaptcha, times=2)\n\n# \u8bf7\u6c42\u4e00\u6b21\u9a8c\u8bc1\u7801\ncaptcha_code = Post.captchaRecognize(getCaptcha)\nprint(f\"\\n\u9a8c\u8bc1\u7801\u8bc6\u522b\u7ed3\u679c\uff1a{captcha_code}, \", end=\"\")\n```\n\n### MultiProcessing and Asyncio\n```python\nimport asyncio\nfrom multiprocessing import Process, cpu_count\nimport requests\nimport numpy\n\nheaders = {\n \"user-agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36\"\n}\n\n\nasync def getResponse(url):\n r = requests.get(url, headers=headers)\n return r\n\n\ndef processStart(url_list):\n tasks = []\n loop = asyncio.get_event_loop()\n for url in url_list:\n if url:\n tasks.append(asyncio.ensure_future(yourFunc(url)))\n loop.run_until_complete(asyncio.wait(tasks))\n\n\ndef tasksStart(url_list):\n # \u8fdb\u7a0b\u6c60\u8fdb\u7a0b\u6570\u91cf\n cpu_num = cpu_count()\n if len(url_list) <= cpu_num:\n processes = []\n for i in range(len(url_list)):\n url = url_list[i]\n url_list = [url]\n p = Process(target=processStart, args=(url_list,))\n processes.append(p)\n for p in processes:\n p.start()\n else:\n coroutine_num = len(url_list) // cpu_num\n processes = []\n url_list += [\"\"] * (cpu_num * (coroutine_num + 1) - len(url_list))\n data = numpy.array(url_list).reshape(coroutine_num + 1, 
cpu_num)\n for i in range(cpu_num):\n url_list = data[:, i]\n p = Process(target=processStart, args=(url_list,))\n processes.append(p)\n for p in processes:\n p.start()\n\n\nasync def yourFunc(url):\n r = await getResponse(url)\n print('end:{}'.format(url))\n\n\ndef multiProcessAsync(url_list):\n tasksStart(url_list)\n\n\nif __name__ == \"__main__\":\n url_list = []\n for x in range(1, 10000):\n url_ = 'http://www.baidu.com/?page=%s' % x\n url_list.append(url_)\n\n multiProcessAsync(url_list)\n\n```\n\n### Base64 is Supported\n```python\nfrom crawlerUtils import Post\n\nurl = \"https://aip.baidubce.com/oauth/2.0/token\"\n\nparams = {\n 'grant_type': 'client_credentials',\n 'client_id': 'YXVFHX8RtewBOSb6kUq73Yhh',\n 'client_secret': 'ARhdQmGQy9QQa5x6nggz6louZq9jHXCk',\n}\n\naccess_token_json = Post(url, params=params).json\naccess_token = access_token_json[\"access_token\"]\n\ncontents = Post.base64encode(\"/Users/zhaojunyu/Library/Mobile Documents/com~apple~CloudDocs/study/python/CPU\u7684\u65f6\u949f\u901f\u5ea6\u968f\u65f6\u95f4\u7684\u53d8\u5316.jpeg\")\n\nimage_recognize_url = \"https://aip.baidubce.com/rest/2.0/ocr/v1/webimage\"\nimage_recognize_headers = {\n \"Content-Type\": \"application/x-www-form-urlencoded\",\n}\nimage_recognize_params = {\n \"access_token\": access_token,\n}\nimage_recognize_data = {\n \"image\": contents[0],\n # \"url\": \"https://img-blog.csdnimg.cn/2019030221472810.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTg0NTUzMw==,size_16,color_FFFFFF,t_70\",\n \"detect_direction\": False,\n \"detect_language\": False,\n}\n\nresult_json = Post(image_recognize_url, image_recognize_headers, image_recognize_params, image_recognize_data).json\nprint(result_json)\n```\n\n### Dealing with JavaScript in an Iframe\n```python\nstart_urls = []\nfor x in range(3):\n url = \"http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-year-2018-0-1-{}\".format(\n x+1)\n start_urls.append(url)\n\n\nasync def DangdangBook():\n ''' Fetch book info from the first 3 pages of Dangdang Books '''\n while start_urls:\n url = start_urls.pop(0)\n try:\n html = await Get(url, encoding=\"gb18030\").ahtml\n books = html.find(\"ul.bang_list\", first=True).find(\"li\")\n for book in books:\n iterm = {}\n iterm[\"name\"] = book.find(\"div.name\", first=True).text\n iterm[\"author\"] = book.find(\"div.publisher_info\", first=True).text\n iterm[\"price\"] = book.find(\"span.price_n\", first=True).text\n print(iterm)\n except BaseException:\n pass\n\n\ndef runDangdangBook(number_asynchronous=3):\n ''' Fetch book info from the first 3 pages of Dangdang Books '''\n Get.asyncRun(DangdangBook, number_asynchronous)\n```\n\n### Get(url).html\n```python\nfrom crawlerUtils import Get\n\nurl = \"https://book.douban.com/top250?start=0\"\n\nsoup = Get(url).html\ntrs = soup.find(\"tr.item\")\nfor tr in trs:\n book_name = tr.find(\"td\")[1].find(\"a\", first=True).text\n author = tr.find(\"p.pl\", first=True).text\n rating = tr.find(\"span.rating_nums\", first=True).text\n introduction = tr.find(\"span.inq\", first=True).text\n print(\"\u4e66\u540d\uff1a{0}\\n\u4f5c\u8005\uff1a{1}\\n\u8bc4\u5206\uff1a{2}\\n\u7b80\u4ecb\uff1a{3}\\n\".format(\n book_name, author, rating, introduction))\n```\n\n\n### crawlerUtils.utils.requests and crawlerUtils.utils.csv\n```python\nfrom crawlerUtils import Get\nimport time\n\n\n__all__ = [\"getShiGuang\"]\n\n\nurl_list = [\n 
'http://www.mtime.com/top/tv/top100/',\n]\nurl_list += [f\"http://www.mtime.com/top/tv/top100/index-{str(x)}.html\" for x in range(2, 11)]\n\n\nasync def crawler():\n content = [\"\u5267\u540d\", \"\u5bfc\u6f14\", \"\u4e3b\u6f14\", \"\u7b80\u4ecb\"]\n while url_list:\n url = url_list.pop(0)\n rhtml = await Get(url).arhtml\n contents = rhtml.find(\"#asyncRatingRegion\", first=True).find(\"li\")\n for li in contents:\n content_dict = {}\n title = li.find(\"h2\", first=True).text\n content_dict[content[0]] = title\n contents = li.find(\"p\")\n for i in range(0, min([3, len(contents)])):\n if contents[i].text.strip():\n if not contents[i].text.strip()[0].isdigit():\n if contents[i].text[:2] in content:\n content_dict[contents[i].text[:2]] = contents[i].text\n else:\n content_dict[content[3]] = contents[i].text\n Get.csvWrite(fieldnames=[\"\u5267\u540d\", \"\u5bfc\u6f14\", \"\u4e3b\u6f14\", \"\u7b80\u4ecb\"], filepath=\"shiguang.csv\", dict_params=content_dict)\n return url\n\n\ndef runShiGuang(coroutine_number=5):\n ''' Use coroutines to crawl the Mtime top 100 movie list '''\n start = time.time()\n Get.csvWrite(fieldnames=[\"\u5267\u540d\", \"\u5bfc\u6f14\", \"\u4e3b\u6f14\", \"\u7b80\u4ecb\"], filepath=\"shiguang.csv\")\n results = Get.asyncRun(crawler, coroutine_number)\n for result in results:\n print(result)\n end = time.time()\n print(end - start)\n```\n\n### crawlerUtils.utils.gevent and crawlerUtils.utils.csv\n```python\nfrom gevent import monkey\nmonkey.patch_all()\nfrom crawlerUtils import Get\n\n\nurl_list = [Get.queue.put_nowait(\n f\"http://www.boohee.com/food/group/{str(i)}?page={str(j)}\") for i in range(1, 11) for j in range(1, 11)]\nurl_list2 = [Get.queue.put_nowait(\n f\"http://www.boohee.com/food/view_menu?page={str(i)}\") for i in range(1, 11)]\nurl_list += url_list2\n\n\ndef crawler():\n while not Get.queue.empty():\n url = Get.queue.get_nowait()\n res_soup = Get(url).soup\n foods = res_soup.find_all('li', class_='item clearfix')\n for i in range(0, len(foods)):\n food_name = foods[i].find_all('a')[1]['title']\n print(food_name)\n food_url = 'http://www.boohee.com' + foods[i].find_all('a')[1]['href']\n food_calorie = foods[i].find('p').text\n Get.csvWrite(filepath=\"\u8584\u8377.csv\", row=[food_name, food_url, food_calorie])\n\n\ndef runBoheGevent():\n Get.csvWrite(filepath=\"\u8584\u8377.csv\")\n Get.csvWrite(filepath=\"\u8584\u8377.csv\", row=[\"\u98df\u7269\u540d\u79f0\", \"\u98df\u7269\u94fe\u63a5\", \"\u98df\u7269\u70ed\u91cf\"])\n Get.geventRun(crawler, 5)\n```\n\n### crawlerUtils.utils.log\nResults are written into all.log and error.log.\n```python\nfrom crawlerUtils import Crawler\n\nlogger = Crawler.logSet()\nlogger.debug(\"\u8fd9\u662f\u4e00\u6761debug\u4fe1\u606f\")\nlogger.info(\"\u8fd9\u662f\u4e00\u6761info\u4fe1\u606f\")\nlogger.warning(\"\u8fd9\u662f\u4e00\u6761warning\u4fe1\u606f\")\nlogger.error(\"\u8fd9\u662f\u4e00\u6761error\u4fe1\u606f\")\nlogger.critical(\"\u8fd9\u662f\u4e00\u6761critical\u4fe1\u606f\")\nlogger.exception(\"\u8fd9\u662f\u4e00\u6761exception\u4fe1\u606f\")\n```\n\n**all.log**\n```\n2019-03-05 21:51:12,118 - DEBUG - \u8fd9\u662f\u4e00\u6761debug\u4fe1\u606f\n2019-03-05 21:51:12,119 - INFO - \u8fd9\u662f\u4e00\u6761info\u4fe1\u606f\n2019-03-05 21:51:12,121 - WARNING - \u8fd9\u662f\u4e00\u6761warning\u4fe1\u606f\n2019-03-05 21:51:12,122 - ERROR - \u8fd9\u662f\u4e00\u6761error\u4fe1\u606f\n2019-03-05 21:51:12,123 - CRITICAL - \u8fd9\u662f\u4e00\u6761critical\u4fe1\u606f\n2019-03-05 21:51:12,124 - 
ERROR - \u8fd9\u662f\u4e00\u6761exception\u4fe1\u606f\nNoneType: None\n```\n\n**error.log**\n```\n2019-03-05 21:51:12,122 - ERROR - noUse.py[:7] - \u8fd9\u662f\u4e00\u6761error\u4fe1\u606f\n2019-03-05 21:51:12,123 - CRITICAL - noUse.py[:8] - \u8fd9\u662f\u4e00\u6761critical\u4fe1\u606f\n2019-03-05 21:51:12,124 - ERROR - noUse.py[:9] - \u8fd9\u662f\u4e00\u6761exception\u4fe1\u606f\nNoneType: None\n```\n\n\n### crawlerUtils.utils.selenium\n```python\nfrom crawlerUtils import Get\n\n\ndef runLoginAndPrintZens():\n ''' Perform the login action and print the Zen of Python in English and Chinese '''\n url = \"https://localprod.pandateacher.com/python-manuscript/hello-spiderman/\"\n method_params = [\n (\"id\", \"teacher\"),\n (\"id\", \"assistant\"),\n (\"cl\", \"sub\"),\n ]\n username = \"\u9171\u9171\"\n password = \"\u9171\u9171\"\n\n driver = Get.loginNoCaptcha(url, method_params, username, password)\n zens = Get.locateElement(driver, \"ids\")(\"p\")\n english_zen = Get.beautifulSoup(zens[0].text)\n chinese_zen = Get.beautifulSoup(zens[1].text)\n print(f\"\u82f1\u6587\u7248Python\u4e4b\u7985\uff1a\\n{english_zen.text}\\n\")\n print(f\"\\n\u4e2d\u6587\u7248Python\u4e4b\u7985\uff1a\\n{chinese_zen.text}\\n\")\n```\n\n### crawlerUtils.utils.crawler and crawlerUtils.utils.excel\n```python\nimport time\nfrom crawlerUtils import Get\n\ndef _getAuthorNames(name):\n \"\"\" Get the author names from the search results \"\"\"\n author_headers = {\n \"referer\": \"https://www.zhihu.com/search?type=content&q=python\"\n }\n\n author_params = {\n \"type\": \"content\",\n \"q\": name,\n }\n\n author_url = \"https://www.zhihu.com/search\"\n\n author_soup = Get(author_url, headers=author_headers, params=author_params).soup\n author_name_json = Get.beautifulJson(\n author_soup.find(\"script\", id=\"js-initialData\").text\n )\n author_names = list(author_name_json['initialState']['entities']['users'])\n return author_names\n\n\ndef _getOneAuthorsArticles(author, wb):\n \"\"\" Crawl all articles by one author \"\"\"\n ws = Get.excelWrite(workbook=wb, sheetname=f\"{author}Articles\")\n Get.excelWrite(0, 0, label=\"\u6587\u7ae0\u540d\", worksheet=ws)\n Get.excelWrite(0, 1, label=\"\u6587\u7ae0\u94fe\u63a5\", worksheet=ws)\n Get.excelWrite(0, 2, label=\"\u6587\u7ae0\u6458\u8981\", worksheet=ws)\n\n headers = {\n \"referer\": f\"https://www.zhihu.com/people/{author}/posts\"\n }\n\n # Article counter\n article_nums = 0\n offset = 0\n page_num = 1\n\n while True:\n articles_params = {\n \"include\": \"data[*].comment_count,suggest_edit,is_normal,thumbnail_extra_info,thumbnail,can_comment,comment_permission,admin_closed_comment,content,voteup_count,created,updated,upvoted_followees,voting,review_info,is_labeled,label_info;data[*].author.badge[?(type=best_answerer)].topics\",\n \"offset\": str(offset),\n \"limit\": \"20\",\n \"sort_by\": \"created\",\n }\n\n articles_url = f\"https://www.zhihu.com/api/v4/members/{author}/articles\"\n\n articles_res_json = Get(articles_url, headers=headers, params=articles_params).json\n\n articles = articles_res_json[\"data\"]\n for article in articles:\n article_nums += 1\n article_title = article[\"title\"]\n article_url = article[\"url\"]\n article_excerpt = article[\"excerpt\"]\n print(article_title)\n Get.excelWrite(article_nums, 0, label=article_title, worksheet=ws)\n Get.excelWrite(article_nums, 1, label=article_url, worksheet=ws)\n Get.excelWrite(article_nums, 2, label=article_excerpt, worksheet=ws)\n\n offset += 20\n 
headers[\"referer\"] = f\"https://www.zhihu.com/people/{author}/posts?page={page_num}\"\n page_num += 1\n\n articles_is_end = articles_res_json[\"paging\"][\"is_end\"]\n if articles_is_end:\n break\n\n # # \u722c\u4e24\u9875\u5c31\u7ed3\u675f\n # if page_num > 2:\n # break\n\n\ndef runZhiHuArticle():\n \"\"\" \u83b7\u53d6\u4e00\u4e2a\u77e5\u4e4e\u4f5c\u8005\u7684\u6240\u6709\u6587\u7ae0\u540d\u79f0\u3001\u94fe\u63a5\u3001\u53ca\u6458\u8981\uff0c\u5e76\u5b58\u5230Excel\u8868\u91cc \"\"\"\n # Excel\n wb = Get.excelWrite(encoding='ascii')\n\n # \u7528\u6237\u8f93\u5165\u77e5\u4e4e\u4f5c\u8005\u540d\n name = input(\"\u8bf7\u8f93\u5165\u4f5c\u8005\u7684\u540d\u5b57\uff1a\")\n # \u83b7\u53d6\u4f5c\u8005url_name\n authors = _getAuthorNames(name)\n if not authors:\n authors = _getAuthorNames(name)\n # \u83b7\u53d6\u4f5c\u8005\u7684\u6240\u6709\u6587\u7ae0\n for author in authors:\n time.sleep(1)\n _getOneAuthorsArticles(author, wb)\n\n wb.save(f\"zhihu{name}.xls\")\n\n```\n\n### crawlerUtils.utils.urllib and crawlerUtils.utils.mail and crawlerUtils.utils.schedule\n```python\nfrom crawlerUtils import Get\nimport re\n\n\ndef queryChineseWeather(city_name=\"\u5e7f\u5dde\"):\n ''' \u5728\u4e2d\u56fd\u5929\u6c14\u7f51\u67e5\u8be2\u5929\u6c14 '''\n while True:\n if not city_name:\n city_name = input(\"\u8bf7\u95ee\u8981\u67e5\u8be2\u54ea\u91cc\u7684\u5929\u6c14\uff1a\")\n city_url = f\"http://toy1.weather.com.cn/search?cityname={Get.urlencode(city_name)}\"\n city_json = Get.urllibOpenJson(city_url)\n\n if city_json:\n if city_json[0].get(\"ref\"):\n city_string = city_json[0][\"ref\"]\n city_code = re.findall(\"\\d+\", city_string)[0]\n else:\n print(\"\u57ce\u5e02\u5730\u5740\u8f93\u5165\u6709\u8bef\uff0c\u8bf7\u91cd\u65b0\u8f93\u5165\uff01\")\n city_name = \"\"\n continue\n\n weather_url = f\"http://www.weather.com.cn/weather1d/{city_code}.shtml\"\n weather_soup = Get.urllibOpenSoup(weather_url)\n weather = weather_soup.find(\n \"input\", id=\"hidden_title\").get(\"value\").split()\n\n return weather\n\n\ndef runSendCityWeatherEveryDay(city=\"\u5317\u4eac\"):\n ''' \u6bcf\u5929\u5b9a\u65f6\u53d1\u9001\u5929\u6c14\u4fe1\u606f\u5230\u6307\u5b9a\u90ae\u7bb1 '''\n recipients, account, password, subj, text = Get.mailSendInput()\n weather = queryChineseWeather(city)\n text = \" \".join(weather)\n daytime = input(\"\u8bf7\u95ee\u6bcf\u5929\u7684\u51e0\u70b9\u53d1\u9001\u90ae\u4ef6\uff1f\u683c\u5f0f'18:30'\uff0c\u4e0d\u5305\u542b\u5355\u5f15\u53f7 \uff1a\")\n\n Get.scheduleFuncEveryDayTime(Get.mailSend, daytime, recipients, account,\n password, subj, text)\n\n```\n\n### More...\n\n### Documentation\uff1a\nrequests: https://github.com/kennethreitz/requests\n\nbs4: https://www.crummy.com/software/BeautifulSoup/bs4/doc/\n\nrequests-html: https://github.com/kennethreitz/requests-html\n\nselenium: https://www.seleniumhq.org/docs/\n\ngevent: http://www.gevent.org/contents.html\n\nexcel: http://www.python-excel.org/\n\ncsv: https://docs.python.org/3/library/csv.html?highlight=csv#module-csv\n\nlog: https://docs.python.org/3/library/logging.html?highlight=log#module-logging\n\nurllib: https://docs.python.org/3/library/urllib.html\n\nemail: https://docs.python.org/3/library/email.html?highlight=mail#module-email\n\nschedule: https://schedule.readthedocs.io/en/stable/\n\nregex: https://regexr.com/\n\n\n## \u66f4\u65b0\u8bb0\u5f55\n- Future\n\u53ef\u9009\u5185\u5bb9: 
tornado-compatible async performance plus multiprocessing, a robots.txt option, automatic pagination, incremental crawling, feature customization, a redis module, proxy settings, monitoring, distributed crawling, data analysis and visualization, cython and PyPy optimization, a captcha recognition module, a workaround for IP bans (a proxy pool), configurable write intervals, and more; Pull Requests are welcome.\n\n- V1.8.1\nChanges: added recognition of fixed-width 4-character captchas, renamed the files under the utils folder, and added support for inserting data into MongoDB.\n\n- V1.8.0\nChanges: added a multiprocessing and coroutine script; because of a file-descriptor issue it cannot be integrated into the framework yet and will be addressed later. Added base64 encoding and decoding support.\n\n- V1.7.0\nChanges: integrated requests-html, with support for concurrency and JavaScript rendering (e.g. r = Get(url).html; r.render(); r.find(); r.search(); r.xpath()); rewrote shiguang.py in examples; added async methods to utils.request.\n\n- V1.6.0\nChanges: integrated gevent for coroutine support and added shiguang.py to examples; integrated csv and math; refactored utils.py and the corresponding example in an object-oriented style.\n\n- V1.5.2\nChanges: added the utils.log module and a multithreaded Windows 64-bit version of moviedownload.py.\n\n- V1.5.0\nChanges: integrated the schedule library and refactored the utils code.\n\n- V1.4.2\nChanges: added an example that sends the daily weather on a schedule, plus scheduled-mail helper functions.\n\n- V1.4.1\nChanges: wrapped some BeautifulSoup and Selenium functions and added an example that prints the Zen of Python.\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Tyrone-Zhao/crawlerUtils", "keywords": "crawler html selenium-python python3 requests beautifulsoup4 urllib mail schedule captcha excel scraping log requests-html csv gevent spiders geohash base64", "license": "", "maintainer": "", "maintainer_email": "", "name": "crawlerUtils", "package_url": "https://pypi.org/project/crawlerUtils/", "platform": "", "project_url": "https://pypi.org/project/crawlerUtils/", "project_urls": { "Blog": "https://blog.csdn.net/weixin_41845533", "Homepage": "https://github.com/Tyrone-Zhao/crawlerUtils" }, "release_url": "https://pypi.org/project/crawlerUtils/1.8.1.post4/", "requires_dist": [ "bs4 (>=0.0.1)", "selenium (>=3.141.0)", "schedule (>=0.6.0)", "xlrd (>=1.2.0)", "xlwt (>=1.3.0)", "gevent (>=1.4.0)", "requests-html (>=0.10.0)", "Pillow (>=5.3.0)", "pymongo (>=3.7.2)" ], "requires_python": ">=3.6.0", "summary": "Crawler Utils examples", "version": 
"1.8.1.post4" }, "last_serial": 4984087, "releases": { "1.7.9": [ { "comment_text": "", "digests": { "md5": "e657a3e4694c933addca15704f699089", "sha256": "f8f9a24a74758e592c142989e001b45729b582595026333944ff9074ed4b5d36" }, "downloads": -1, "filename": "crawlerUtils-1.7.9-py3-none-any.whl", "has_sig": false, "md5_digest": "e657a3e4694c933addca15704f699089", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 576962, "upload_time": "2019-03-09T03:46:46", "url": "https://files.pythonhosted.org/packages/90/b2/836e7850c6af2c54da0360eb90fa400acabb7173055219baf3df57f8dba7/crawlerUtils-1.7.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ec6469743bf5f5c545e0ee085ae26094", "sha256": "cad324b9580752896f664dd78e686074e57d0e2484e95c2b05080a14e86d2ee0" }, "downloads": -1, "filename": "crawlerUtils-1.7.9.tar.gz", "has_sig": false, "md5_digest": "ec6469743bf5f5c545e0ee085ae26094", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 257389, "upload_time": "2019-03-09T03:46:48", "url": "https://files.pythonhosted.org/packages/9b/54/64498a37f4c2910e8df347233ee0bb8e86158a13e1f93e77650400fbfa40/crawlerUtils-1.7.9.tar.gz" } ], "1.8.0.post2": [ { "comment_text": "", "digests": { "md5": "fc2c68d9a73f9c14f3eb326c8a15bdd2", "sha256": "bf2b2690703ccf911ba9565da5488e7ae337adb5909f4828859035a79922ead2" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post2-py3-none-any.whl", "has_sig": false, "md5_digest": "fc2c68d9a73f9c14f3eb326c8a15bdd2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 580406, "upload_time": "2019-03-10T12:58:26", "url": "https://files.pythonhosted.org/packages/04/16/92176469aeef96a876edf1819763d367873a2a9a58b493ca03490d81d873/crawlerUtils-1.8.0.post2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9e296f4ab905b9f3b8a85af9a3683ce4", "sha256": "eaf344b932ed02c5349ab2640c566953e028085bf5b7e73e0e4653d4a849b769" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post2.tar.gz", "has_sig": false, "md5_digest": "9e296f4ab905b9f3b8a85af9a3683ce4", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 258827, "upload_time": "2019-03-10T12:58:30", "url": "https://files.pythonhosted.org/packages/ad/a9/f2a59d6db40633520d47e9ea68db40dc4a11e0dd3272d8895355fc36cc51/crawlerUtils-1.8.0.post2.tar.gz" } ], "1.8.0.post3": [ { "comment_text": "", "digests": { "md5": "2767eebcec7c475265e1e88404fabd4f", "sha256": "98fa6bbd1a67d0e2068d2d03702ea2a51f603df9a593d0b4ff4ee838cf942b23" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post3-py3-none-any.whl", "has_sig": false, "md5_digest": "2767eebcec7c475265e1e88404fabd4f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 580382, "upload_time": "2019-03-11T11:24:50", "url": "https://files.pythonhosted.org/packages/53/9a/3504a3096737e14ef3394dff6d27f9d2650e6d2b91eaa9b8b968806be449/crawlerUtils-1.8.0.post3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1d23eb43db3ee1b9a69f2d2a9456aa9e", "sha256": "3122cf9aae4ed18f5bcdf89d7fcd1969704aff9d1e810cd19955e253904eb14f" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post3.tar.gz", "has_sig": false, "md5_digest": "1d23eb43db3ee1b9a69f2d2a9456aa9e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 259138, "upload_time": "2019-03-11T11:24:52", "url": 
"https://files.pythonhosted.org/packages/d2/1c/d52914ca288b5acdc4e2f92ef186f8208fb649d337013270a97603b2d915/crawlerUtils-1.8.0.post3.tar.gz" } ], "1.8.0.post4": [ { "comment_text": "", "digests": { "md5": "698edc5515cd2bd8cdd88202539f0413", "sha256": "c189459b408d2629355c7b8ff665541ea5d7a88e56a98b984a206740734d1d07" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post4-py3-none-any.whl", "has_sig": false, "md5_digest": "698edc5515cd2bd8cdd88202539f0413", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 581449, "upload_time": "2019-03-11T12:41:59", "url": "https://files.pythonhosted.org/packages/9d/19/4ba491da5a3dddb87e34e9866068a8174c336c0db806860b9b4aa1d06a08/crawlerUtils-1.8.0.post4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f8d54ceda7cc981a4c360c563fc09165", "sha256": "8ded67837afd4d9c44ded1f9bdd10ef0215d08e4a154226be767c1dd04144dd0" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post4.tar.gz", "has_sig": false, "md5_digest": "f8d54ceda7cc981a4c360c563fc09165", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 261372, "upload_time": "2019-03-11T12:42:01", "url": "https://files.pythonhosted.org/packages/e1/24/fc1440ca6ba18556ef4b4fb324af61c27c9178f59480eb6789b30024ba63/crawlerUtils-1.8.0.post4.tar.gz" } ], "1.8.0.post5": [ { "comment_text": "", "digests": { "md5": "b900575b2700cae28dd13dba0d58734e", "sha256": "0e8b8dd2ef4bc398333b24f4a5c376d5fe4d771eb5d934da9d73e6e1a5d8934a" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post5-py3-none-any.whl", "has_sig": false, "md5_digest": "b900575b2700cae28dd13dba0d58734e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 586451, "upload_time": "2019-03-11T14:32:45", "url": "https://files.pythonhosted.org/packages/7f/98/a8eaf03ffe5b30a34662adaea6bf3bd41c3264736b8a0aff89e1ea99fd47/crawlerUtils-1.8.0.post5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8dd6e049c1555e2907cefc9feb96a1f5", "sha256": "46c25f0b33232d0167ee95151ad5a6d1539e6425356678c165a23d8430fbc4e9" }, "downloads": -1, "filename": "crawlerUtils-1.8.0.post5.tar.gz", "has_sig": false, "md5_digest": "8dd6e049c1555e2907cefc9feb96a1f5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 265792, "upload_time": "2019-03-11T14:32:47", "url": "https://files.pythonhosted.org/packages/9f/51/9d167de3d14b4e19c7dc76fcd648ad69a2f2a524805e3ab440899a0ee37f/crawlerUtils-1.8.0.post5.tar.gz" } ], "1.8.1.post1": [ { "comment_text": "", "digests": { "md5": "0ab289ff2dc87784e4803e04d75f171e", "sha256": "e5606f2a9bd9d16d6fff421f52a44b12e24b26b483c154572046b8899cf64bd0" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post1-py3-none-any.whl", "has_sig": false, "md5_digest": "0ab289ff2dc87784e4803e04d75f171e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 590511, "upload_time": "2019-03-23T01:00:51", "url": "https://files.pythonhosted.org/packages/15/04/f35bac27246c415ba0e4b8f0f00f2bf62fa086925ac3aebc1f68526e0e5a/crawlerUtils-1.8.1.post1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf512ef41ba2e6dcc98e0f59a972b688", "sha256": "e3d3277c0d36f54bbab25f8a631f57949e84e4f2ec0f66e01f697ff75fa2127d" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post1.tar.gz", "has_sig": false, "md5_digest": "bf512ef41ba2e6dcc98e0f59a972b688", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 259298, 
"upload_time": "2019-03-23T01:00:53", "url": "https://files.pythonhosted.org/packages/ce/b6/bea35285a6b9412cf333641c610f335f39bedea2b6d3d7138765ee80f285/crawlerUtils-1.8.1.post1.tar.gz" } ], "1.8.1.post2": [ { "comment_text": "", "digests": { "md5": "ec86dbefed6399f387b283bb54520d47", "sha256": "8e5c114dda0cfddd50c6532cf2a334561f2b81b12deaea85825276d18fb41eea" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post2-py3-none-any.whl", "has_sig": false, "md5_digest": "ec86dbefed6399f387b283bb54520d47", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 590552, "upload_time": "2019-03-25T14:05:47", "url": "https://files.pythonhosted.org/packages/f1/f0/8fa42b178d1c5c5b8345f3ecbbdd52d5a19875e29d5a7566a5a63f9f378d/crawlerUtils-1.8.1.post2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2ec5b50d78f349c509ddf8cad612d260", "sha256": "bd9cceda4e15ef93a497084a693a83beea1f5175486c80031b84b788dcd981bc" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post2.tar.gz", "has_sig": false, "md5_digest": "2ec5b50d78f349c509ddf8cad612d260", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 259288, "upload_time": "2019-03-25T14:05:49", "url": "https://files.pythonhosted.org/packages/66/47/a8536f6a6815aae548c0b2e6bb3596ada3b089532f009e00acaa69619702/crawlerUtils-1.8.1.post2.tar.gz" } ], "1.8.1.post3": [ { "comment_text": "", "digests": { "md5": "cd9a20ab31f4d8ff2c49548692f54e96", "sha256": "243c15a7ab819d75078983f093401958b841f94b0b286cd716993283571780a6" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post3-py3-none-any.whl", "has_sig": false, "md5_digest": "cd9a20ab31f4d8ff2c49548692f54e96", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 585801, "upload_time": "2019-03-25T18:32:57", "url": "https://files.pythonhosted.org/packages/7e/ee/12cab092d726dc24d2cb74d2e3fe4b29bb7780c674902129495418667008/crawlerUtils-1.8.1.post3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8d3f50b838b03881d38b1e8aeaddec6a", "sha256": "0c71bab1ee812e116be4775cefb7574798064b242add2b145e7211c2212af603" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post3.tar.gz", "has_sig": false, "md5_digest": "8d3f50b838b03881d38b1e8aeaddec6a", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 248693, "upload_time": "2019-03-25T18:32:59", "url": "https://files.pythonhosted.org/packages/15/7b/20b389290fd7c791fb279c32e4edbe4d1c3402e056b5804205e0f7402d21/crawlerUtils-1.8.1.post3.tar.gz" } ], "1.8.1.post4": [ { "comment_text": "", "digests": { "md5": "a85e0f1b1fe4ff1883a512e50e6a61fe", "sha256": "3e04582debcf977ac329cf528a8ea04be53449d3964b7a6507d4ce6fe4062dd8" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post4-py3-none-any.whl", "has_sig": false, "md5_digest": "a85e0f1b1fe4ff1883a512e50e6a61fe", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 585794, "upload_time": "2019-03-25T18:44:56", "url": "https://files.pythonhosted.org/packages/07/6f/946bb23a7bbdb9f3b592be94ddc599ff6f6473990b95feb243dc0a1a846d/crawlerUtils-1.8.1.post4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d07287472f2e3628db9796950ebb6896", "sha256": "b1abedabfc1cdf21c1ad60f2b5b891b77ab5c1b0ad61a065b05f6b14b27ad646" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post4.tar.gz", "has_sig": false, "md5_digest": "d07287472f2e3628db9796950ebb6896", "packagetype": "sdist", "python_version": "source", 
"requires_python": ">=3.6.0", "size": 247480, "upload_time": "2019-03-25T18:44:59", "url": "https://files.pythonhosted.org/packages/78/c6/ce5bef02bd358d71ae9f39a1bb61187c9f3200d1d3f95878ab2b290a572b/crawlerUtils-1.8.1.post4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a85e0f1b1fe4ff1883a512e50e6a61fe", "sha256": "3e04582debcf977ac329cf528a8ea04be53449d3964b7a6507d4ce6fe4062dd8" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post4-py3-none-any.whl", "has_sig": false, "md5_digest": "a85e0f1b1fe4ff1883a512e50e6a61fe", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6.0", "size": 585794, "upload_time": "2019-03-25T18:44:56", "url": "https://files.pythonhosted.org/packages/07/6f/946bb23a7bbdb9f3b592be94ddc599ff6f6473990b95feb243dc0a1a846d/crawlerUtils-1.8.1.post4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d07287472f2e3628db9796950ebb6896", "sha256": "b1abedabfc1cdf21c1ad60f2b5b891b77ab5c1b0ad61a065b05f6b14b27ad646" }, "downloads": -1, "filename": "crawlerUtils-1.8.1.post4.tar.gz", "has_sig": false, "md5_digest": "d07287472f2e3628db9796950ebb6896", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 247480, "upload_time": "2019-03-25T18:44:59", "url": "https://files.pythonhosted.org/packages/78/c6/ce5bef02bd358d71ae9f39a1bb61187c9f3200d1d3f95878ab2b290a572b/crawlerUtils-1.8.1.post4.tar.gz" } ] }