{ "info": { "author": "Nuncjo", "author_email": "zoreander@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": "Delver\n======\n\nProgrammatic web browser/crawler in Python. **Alternative to Mechanize,\nRoboBrowser, MechanicalSoup** and others. Strict power of Request and\nLxml. Some features and methods usefull in scraping \"out of the box\".\n\nInstall\n-------\n\n.. code:: shell\n\n $ pip install delver\n\nDocumentation\n-------------\n\nhttp://delver.readthedocs.io/en/latest/\n\nQuick start - usage examples\n----------------------------\n\n- `Basic examples <#basic-examples>`__\n\n - `Form submit <#form-submit>`__\n - `Find links narrowed by\n filters <#find-links-narrowed-by-filters>`__\n - `Download file <#download-file>`__\n - `Download files list in\n parallel <#download-files-list-in-parallel>`__\n - `Xpath selectors <#xpath-selectors>`__\n - `Css selectors <#css-selectors>`__\n - `Xpath result with filters <#xpath-result-with-filters>`__\n\n- `Use examples <#use-examples>`__\n\n - `Scraping Steam Specials using\n XPath <#scraping-steam-specials-using-xpath>`__\n - `Simple tables scraping out of the\n box <#simple-tables-scraping-out-of-the-box>`__\n - `User login <#user-login>`__\n - `One Punch Man Downloader <#one-punch-man-downloader>`__\n\n--------------\n\nBasic examples\n--------------\n\nForm submit\n-----------\n\n.. code:: python\n\n\n >>> from delver import Crawler\n >>> c = Crawler()\n >>> response = c.open('https://httpbin.org/forms/post')\n >>> forms = c.forms()\n\n # Filling up fields values:\n >>> form = forms[0]\n >>> form.fields = {\n ... 'custname': 'Ruben Rybnik',\n ... 'custemail': 'ruben.rybnik@fakemail.com',\n ... 'size': 'medium',\n ... 'topping': ['bacon', 'cheese'],\n ... 'custtel': '+48606505888'\n ... }\n >>> submit_result = c.submit(form)\n >>> submit_result.status_code\n 200\n\n # Checking if form post ended with success:\n >>> c.submit_check(\n ... form,\n ... phrase=\"Ruben Rybnik\",\n ... url='https://httpbin.org/forms/post',\n ... status_codes=[200]\n ... )\n True\n\nFind links narrowed by filters\n------------------------------\n\n.. code:: python\n\n\n >>> c = Crawler()\n >>> c.open('https://httpbin.org/links/10/0')\n \n\n # Links can be filtered by some html tags and filters\n # like: id, text, title and class:\n >>> links = c.links(\n ... tags = ('style', 'link', 'script', 'a'),\n ... filters = {\n ... 'text': '7'\n ... },\n ... match='NOT_EQUAL'\n ... )\n >>> len(links)\n 8\n\nDownload file\n-------------\n\n.. code:: python\n\n\n >>> import os\n\n >>> c = Crawler()\n >>> local_file_path = c.download(\n ... local_path='test',\n ... url='https://httpbin.org/image/png',\n ... name='test.png'\n ... )\n >>> os.path.isfile(local_file_path)\n True\n\nDownload files list in parallel\n-------------------------------\n\n.. code:: python\n\n\n >>> c = Crawler()\n >>> c.open('https://xkcd.com/')\n \n >>> full_images_urls = [c.join_url(src) for src in c.images()]\n >>> downloaded_files = c.download_files('test', files=full_images_urls)\n >>> len(full_images_urls) == len(downloaded_files)\n True\n\nXpath selectors\n---------------\n\n.. code:: python\n\n\n c = Crawler()\n c.open('https://httpbin.org/html')\n p_text = c.xpath('//p/text()')\n\nCss selectors\n-------------\n\n.. code:: python\n\n\n c = Crawler()\n c.open('https://httpbin.org/html')\n p_text = c.css('div')\n\nXpath result with filters\n-------------------------\n\n.. code:: python\n\n\n c = Crawler()\n c.open('https://www.w3schools.com/')\n filtered_results = c.xpath('//p').filter(filters={'class': 'w3-xlarge'})\n\nUsing retries\n-------------\n\n.. code:: python\n\n\n c = Crawler()\n # sets max_retries to 2 means that after there will be max two attempts to open url\n # if first attempt will fail, wait 1 second and try again, second attempt wait 2 seconds\n # and then try again\n c.max_retries = 2\n c.open('http://www.delver.cg/404')\n\nUse examples\n------------\n\nScraping Steam Specials using XPath\n-----------------------------------\n\n.. code:: python\n\n\n from pprint import pprint\n from delver import Crawler\n\n c = Crawler(absolute_links=True)\n c.logging = True\n c.useragent = \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"\n c.random_timeout = (0, 5)\n c.open('http://store.steampowered.com/search/?specials=1')\n titles, discounts, final_prices = [], [], []\n\n\n while c.links(filters={\n 'class': 'pagebtn',\n 'text': '>'\n }):\n c.open(c.current_results[0])\n titles.extend(\n c.xpath(\"//div/span[@class='title']/text()\")\n )\n discounts.extend(\n c.xpath(\"//div[contains(@class, 'search_discount')]/span/text()\")\n )\n final_prices.extend(\n c.xpath(\"//div[contains(@class, 'discounted')]//text()[2]\").strip()\n )\n\n all_results = {\n row[0]: {\n 'discount': row[1],\n 'final_price': row[2]\n } for row in zip(titles, discounts, final_prices)}\n pprint(all_results)\n\nSimple tables scraping out of the box\n-------------------------------------\n\n.. code:: python\n\n\n from pprint import pprint\n from delver import Crawler\n\n c = Crawler(absolute_links=True)\n c.logging = True\n c.useragent = \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"\n c.open(\"http://www.boxofficemojo.com/daily/\")\n pprint(c.tables())\n\nUser login\n----------\n\n.. code:: python\n\n\n\n from delver import Crawler\n\n c = Crawler()\n c.useragent = (\n \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) \"\n \"Chrome/60.0.3112.90 Safari/537.36\"\n )\n c.random_timeout = (0, 5)\n c.open('http://testing-ground.scraping.pro/login')\n forms = c.forms()\n if forms:\n login_form = forms[0]\n login_form.fields = {\n 'usr': 'admin',\n 'pwd': '12345'\n }\n c.submit(login_form)\n success_check = c.submit_check(\n login_form,\n phrase='WELCOME :)',\n status_codes=[200]\n )\n print(success_check)\n\nOne Punch Man Downloader\n------------------------\n\n.. code:: python\n\n\n import os\n from delver import Crawler\n\n class OnePunchManDownloader:\n \"\"\"Downloads One Punch Man free manga chapers to local directories.\n Uses one main thread for scraper with random timeout.\n Uses 20 threads just for image downloads.\n \"\"\"\n def __init__(self):\n self._target_directory = 'one_punch_man'\n self._start_url = \"http://m.mangafox.me/manga/onepunch_man_one/\"\n self.crawler = Crawler()\n self.crawler.random_timeout = (0, 5)\n self.crawler.useragent = \"Googlebot-Image/1.0\"\n\n def run(self):\n self.crawler.open(self._start_url)\n for link in self.crawler.links(filters={'text': 'Ch '}, match='IN'):\n self.download_images(link)\n\n def download_images(self, link):\n target_path = '{}/{}'.format(self._target_directory, link.split('/')[-2])\n full_chapter_url = link.replace('/manga/', '/roll_manga/')\n self.crawler.open(full_chapter_url)\n images = self.crawler.xpath(\"//img[@class='reader-page']/@data-original\")\n os.makedirs(target_path, exist_ok=True)\n self.crawler.download_files(target_path, files=images, workers=20)\n\n\n downloader = OnePunchManDownloader()\n downloader.run()\n\n\n=======\nHistory\n=======\n\n0.1.3 (2017-10-03)\n------------------\n\n* First release on PyPI.\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/nuncjo/delver", "keywords": "delver", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "delver", "package_url": "https://pypi.org/project/delver/", "platform": "", "project_url": "https://pypi.org/project/delver/", "project_urls": { "Homepage": "https://github.com/nuncjo/delver" }, "release_url": "https://pypi.org/project/delver/0.1.6/", "requires_dist": null, "requires_python": "", "summary": "Modern user friendly web automation and scraping library.", "version": "0.1.6" }, "last_serial": 3228976, "releases": { "0.1.2": [ { "comment_text": "", "digests": { "md5": "51cec95b6c0f85e581de55b0b4041f34", "sha256": "1d678922a95dd27ec571dea23ca0ef9cabfaf093a3b823eb34e739cf2f809b99" }, "downloads": -1, "filename": "delver-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "51cec95b6c0f85e581de55b0b4041f34", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 25180, "upload_time": "2017-10-02T21:39:03", "url": "https://files.pythonhosted.org/packages/92/8d/d310ecf387436a121e4b462b7778f2e240fa6cd046bcaef49c06b6beaa70/delver-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e8354497d45a30c09b2efc322599a0c2", "sha256": "76488b47b14f0acdbe13da84358d300221a4840d5e87bb99d919789f230a700d" }, "downloads": -1, "filename": "delver-0.1.2.tar.gz", "has_sig": false, "md5_digest": "e8354497d45a30c09b2efc322599a0c2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32736, "upload_time": "2017-10-02T21:38:34", "url": "https://files.pythonhosted.org/packages/98/3f/56fd5a67373105699c90f157e1468a57d387f38c1c126a5758860e446c69/delver-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "9283e93b516ff7676e453259a09cf113", "sha256": "6fd107614c4fefbe1d88918a6c60bf79eabf328cd36676ee64300ea19d821975" }, "downloads": -1, "filename": "delver-0.1.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "9283e93b516ff7676e453259a09cf113", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 25240, "upload_time": "2017-10-03T21:02:55", "url": "https://files.pythonhosted.org/packages/2b/48/c06d62e21ff720d0d698a91a9041c51083c896c8dba99a471386fd2bbfc0/delver-0.1.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "27b7e50a4a5131f3706ef963c41a0987", "sha256": "aebaccff9a182a78c5ea3299072658adf725a3aa5b1d4057017f93bad9bc0d88" }, "downloads": -1, "filename": "delver-0.1.3.tar.gz", "has_sig": false, "md5_digest": "27b7e50a4a5131f3706ef963c41a0987", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 32804, "upload_time": "2017-10-03T21:02:02", "url": "https://files.pythonhosted.org/packages/84/bd/9e7786fdf98c9defce5ad629940f248c25162a3a108df881581f2e2d0ad9/delver-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "c6a56ae4757a312a822d4d0f83d26f2b", "sha256": "135ec104ac3b7b4032df087e8dced78284a74f3d05c584440431a54559f16121" }, "downloads": -1, "filename": "delver-0.1.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c6a56ae4757a312a822d4d0f83d26f2b", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 25414, "upload_time": "2017-10-05T21:06:25", "url": "https://files.pythonhosted.org/packages/78/d2/37fe7e984769d8749adc5d4570f91cfeb7236b547319a5519a85178be489/delver-0.1.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5f722f0875dda090d34bcb76eb471395", "sha256": "c4e4ffe75cf96b81ff6ecc9dd46b8bdb00ed41dcbed47cf17b02382fc86278c4" }, "downloads": -1, "filename": "delver-0.1.4.tar.gz", "has_sig": false, "md5_digest": "5f722f0875dda090d34bcb76eb471395", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33063, "upload_time": "2017-10-05T21:06:12", "url": "https://files.pythonhosted.org/packages/66/19/6e0b1e89816b70286ae8b1a811a121528e8929d528a0b5f69db5af0e2edf/delver-0.1.4.tar.gz" } ], "0.1.5.1": [ { "comment_text": "", "digests": { "md5": "dd437d72018352b83a2ef88258bbbd25", "sha256": "c14e46d261f7005b0c8473bb96457c671000e19119fc8168fa2d784c04b6efc7" }, "downloads": -1, "filename": "delver-0.1.5.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "dd437d72018352b83a2ef88258bbbd25", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 25634, "upload_time": "2017-10-05T21:18:24", "url": "https://files.pythonhosted.org/packages/c4/18/75a4936b173db57880db290590b3d18138a2d0711bb7216d60955f432889/delver-0.1.5.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f286f2181dc5b945ea74f87333747a66", "sha256": "62228d7ffd53c3543a9967277a356511e08f3870f71a154816571fcfed51c171" }, "downloads": -1, "filename": "delver-0.1.5.1.tar.gz", "has_sig": false, "md5_digest": "f286f2181dc5b945ea74f87333747a66", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33322, "upload_time": "2017-10-05T21:18:09", "url": "https://files.pythonhosted.org/packages/10/a5/68e12094e286e19272270087a62ed3a4860cb3c19d5c52a821b38c783769/delver-0.1.5.1.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "3668ad17fdd26b5c46d69629fa78bc1d", "sha256": "f52eed92840841bb301b0067b6b8fc5432c82c3b9d80ec59ac60010ddac7b95d" }, "downloads": -1, "filename": "delver-0.1.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3668ad17fdd26b5c46d69629fa78bc1d", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 25619, "upload_time": "2017-10-05T21:38:11", "url": "https://files.pythonhosted.org/packages/df/98/9f84f35a61ba2a214ec03201d0baffb03207b8a7a422dda6b5e1bcff1e32/delver-0.1.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "66b6d09d5eda36e735ab9e4639a774d7", "sha256": "49f45ff3a739e2cb5d979e5a6877da6343808f68124ddc5a8fa1fc9554d39fbe" }, "downloads": -1, "filename": "delver-0.1.6.tar.gz", "has_sig": false, "md5_digest": "66b6d09d5eda36e735ab9e4639a774d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33330, "upload_time": "2017-10-05T21:38:01", "url": "https://files.pythonhosted.org/packages/e8/8a/77ad365427982b4f60640cafe590410ef71a2cfccc9fd5947bd04ff53ff0/delver-0.1.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3668ad17fdd26b5c46d69629fa78bc1d", "sha256": "f52eed92840841bb301b0067b6b8fc5432c82c3b9d80ec59ac60010ddac7b95d" }, "downloads": -1, "filename": "delver-0.1.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3668ad17fdd26b5c46d69629fa78bc1d", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 25619, "upload_time": "2017-10-05T21:38:11", "url": "https://files.pythonhosted.org/packages/df/98/9f84f35a61ba2a214ec03201d0baffb03207b8a7a422dda6b5e1bcff1e32/delver-0.1.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "66b6d09d5eda36e735ab9e4639a774d7", "sha256": "49f45ff3a739e2cb5d979e5a6877da6343808f68124ddc5a8fa1fc9554d39fbe" }, "downloads": -1, "filename": "delver-0.1.6.tar.gz", "has_sig": false, "md5_digest": "66b6d09d5eda36e735ab9e4639a774d7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33330, "upload_time": "2017-10-05T21:38:01", "url": "https://files.pythonhosted.org/packages/e8/8a/77ad365427982b4f60640cafe590410ef71a2cfccc9fd5947bd04ff53ff0/delver-0.1.6.tar.gz" } ] }