{ "info": { "author": "Max Humber", "author_email": "max.humber@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6" ], "description": "


\n\n\n\n#### About\n\ngazpacho is a web scraping library. It replaces requests and BeautifulSoup for most projects. gazpacho is small, simple, fast, and consistent. You should use it!\n\n\n\n#### Usage\n\ngazpacho is easy to use. To retrieve the contents of a web page, use `get`. To parse the retrieved contents, use `Soup`.\n\n\n\n#### Get\n\nThe `get` function retrieves content from a web page, and `Soup` parses the result so that `find` can search it:\n\n```python\nfrom gazpacho import get, Soup\n\nurl = 'https://en.wikipedia.org/wiki/Gazpacho'\nhtml = get(url)\nprint(html[:50])  # first 50 characters of the raw HTML\n\n# Parse the HTML string, then search it for headline spans\nsoup = Soup(html)\nresults = soup.find('span', {'class': 'mw-headline'})\n```\n\nThe `find` method returns a list of `Soup` objects if multiple elements satisfy the tag and attribute constraints, or a single `Soup` object if there is just one:\n\n```python\nprint(results)\n\n# [History,\n# Ingredients and preparation,\n# Variations,\n# In Spain,\n# Arranque rote\u00f1o,\n# Extremaduran variations,\n# La Mancha variations,\n# Castilian variations,\n# See also,\n# References]\n```\n\nThe return behaviour of `find` can be adjusted and made more predictable with the `mode` argument `{'auto', 'first', 'all'}`:\n\n```python\nsoup.find('span', {'class': 'mw-headline'}, mode='first')\n# History\n```\n\n`Soup` objects returned by the `find` method have `html`, `tag`, `attrs`, and `text` attributes:\n\n```python\nresult = results[3]\nprint(result.html)\n# <span class=\"mw-headline\" id=\"In_Spain\">In Spain</span>\nprint(result.tag)\n# span\nprint(result.attrs)\n# {'class': 'mw-headline', 'id': 'In_Spain'}\nprint(result.text)\n# In Spain\n```\n\nAnd, importantly, returned `Soup` objects can reimplement the `find` method!\n\n\n\n#### Production\n\ngazpacho is production ready. The library currently powers [quote](https://github.com/maxhumber/quote), a Python wrapper for the Goodreads Quote API. 
And a fully worked example of gazpacho in action is available [here](https://maxhumber.com/scraping_fantasy_hockey).\n\n\n\n#### Comparison\n\ngazpacho is a drop-in replacement for most projects that use requests and BeautifulSoup:\n\n```python\nimport requests\nfrom bs4 import BeautifulSoup\nimport pandas as pd\n\nurl = 'https://www.capfriendly.com/browse/active/2020/salary?p=1'\nresponse = requests.get(url)\nsoup = BeautifulSoup(response.text, 'lxml')\ndf = pd.read_html(str(soup.find('table')))[0]\nprint(df[['PLAYER', 'TEAM', 'SALARY', 'AGE']].head(3))\n\n# PLAYER TEAM SALARY AGE\n# 0 1. Mitchell Marner TOR $16,000,000 22\n# 1 2. Auston Matthews TOR $15,900,000 21\n# 2 3. John Tavares TOR $15,900,000 28\n```\n\nPowered by gazpacho:\n\n```python\nfrom gazpacho import get, Soup\nimport pandas as pd\n\nurl = 'https://www.capfriendly.com/browse/active/2020/salary?p=1'\nresponse = get(url)\nsoup = Soup(response)\ndf = pd.read_html(str(soup.find('table')))[0]\nprint(df[['PLAYER', 'TEAM', 'SALARY', 'AGE']].head(3))\n\n# PLAYER TEAM SALARY AGE\n# 0 1. Mitchell Marner TOR $16,000,000 22\n# 1 2. Auston Matthews TOR $15,900,000 21\n# 2 3. John Tavares TOR $15,900,000 28\n```\n\n\n\n#### Speed\n\ngazpacho is fast:\n\n```python\nfrom gazpacho import Soup\n\n%%timeit\nsoup = Soup(html)\nsoup.find('span', {'class': 'mw-headline'})\n# 15 ms \u00b1 325 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 100 loops each)\n```\n\ngazpacho is often 20-40% faster than BeautifulSoup:\n\n```python\nfrom bs4 import BeautifulSoup\n\n%%timeit\nsoup = BeautifulSoup(html, 'lxml')\nsoup.find('span', {'class': 'mw-headline'})\n# 19.4 ms \u00b1 583 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10 loops each)\n```\n\nAnd 200-300% faster than requests-html:\n\n```python\nfrom requests_html import HTML\n\n%%timeit\nsoup = HTML(html=html)\nsoup.find('span.mw-headline')\n# 40.1 ms \u00b1 418 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 10 loops each)\n```\n\n\n\n#### Installation\n\n```\npip install -U gazpacho\n```\n\n\n\n#### Contribute\n\nFor feature requests or bug reports, please use [Github Issues](https://github.com/maxhumber/gazpacho/issues).\n\nFor PRs, please read the [CONTRIBUTING.md](https://github.com/maxhumber/gazpacho/blob/master/CONTRIBUTING.md) document.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/maxhumber/gazpacho", "keywords": "web scraping,web,scraping,BeautifulSoup,requests", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "gazpacho", "package_url": "https://pypi.org/project/gazpacho/", "platform": "", "project_url": "https://pypi.org/project/gazpacho/", "project_urls": { "Homepage": "https://github.com/maxhumber/gazpacho" }, "release_url": "https://pypi.org/project/gazpacho/0.8.1/", "requires_dist": null, "requires_python": ">=3.6", "summary": "gazpacho is a web scraping library", "version": "0.8.1" }, "last_serial": 5957738, "releases": { "0.5": [ { "comment_text": "", "digests": { "md5": "b6a5668fdfad1f1c22494c51d1d5b5f9", "sha256": "6974eb01cf284dcb067465481268a72dfa4222b1247e543d65453366b7806a1c" }, "downloads": -1, "filename": "gazpacho-0.5.tar.gz", "has_sig": false, "md5_digest": "b6a5668fdfad1f1c22494c51d1d5b5f9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 3190, "upload_time": "2019-09-25T17:44:32", "url": "https://files.pythonhosted.org/packages/83/e1/129abdfda0451a550cbca3b27886010f80bd358ccb96734552b1d75723e4/gazpacho-0.5.tar.gz" } ], "0.6.0": [ { "comment_text": "", "digests": { "md5": "441d09287fbc08111346158cc7db4217", "sha256": "317dc1e7280c7b653e12d4fc8e1af633136f166dcba7775e84c09318a50a12bb" }, "downloads": -1, "filename": "gazpacho-0.6.0.tar.gz", "has_sig": false, "md5_digest": "441d09287fbc08111346158cc7db4217", "packagetype": "sdist", 
"python_version": "source", "requires_python": ">=3.6", "size": 3129, "upload_time": "2019-09-25T20:14:42", "url": "https://files.pythonhosted.org/packages/90/5e/2fb320ef0fa5fb29d73a052b61ddf07df09cc99438713b6ffe414e8c3998/gazpacho-0.6.0.tar.gz" } ], "0.7": [ { "comment_text": "", "digests": { "md5": "fb1e104a49c8e17bcc2a31eb2eee9b3f", "sha256": "d2477ed73428ac8e995eeb6ce62582d2aa183619f3db2c45b8709f8705eb81e2" }, "downloads": -1, "filename": "gazpacho-0.7.tar.gz", "has_sig": false, "md5_digest": "fb1e104a49c8e17bcc2a31eb2eee9b3f", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 4881, "upload_time": "2019-09-27T16:24:53", "url": "https://files.pythonhosted.org/packages/ab/fb/875a0aaf8541b6fbf88aa302458144a61eb99f0bc663b265e077b705fbe1/gazpacho-0.7.tar.gz" } ], "0.7.1": [ { "comment_text": "", "digests": { "md5": "71aa4716bd7d0e6865af85d3b4d209de", "sha256": "a02d5f1748d0fb866e1c3a0dc0a2065a161b3b183411b43546781a9758cfc522" }, "downloads": -1, "filename": "gazpacho-0.7.1.tar.gz", "has_sig": false, "md5_digest": "71aa4716bd7d0e6865af85d3b4d209de", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5145, "upload_time": "2019-09-28T14:48:04", "url": "https://files.pythonhosted.org/packages/4b/cc/cf65c980a70fe742ff092b22325cc76eaf42cd37aa2b9ef2e9f2e98f2a10/gazpacho-0.7.1.tar.gz" } ], "0.7.2": [ { "comment_text": "", "digests": { "md5": "05c18e611b995d14d0f45a625c2ea502", "sha256": "c419bfe652347262ac93e51299bbc73a74b3b13809762f6c8b60507ae1f32e61" }, "downloads": -1, "filename": "gazpacho-0.7.2.tar.gz", "has_sig": false, "md5_digest": "05c18e611b995d14d0f45a625c2ea502", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5142, "upload_time": "2019-09-30T16:50:28", "url": "https://files.pythonhosted.org/packages/ad/a2/6d42e26fdf143c2e62c11c62a91a5a147cd80696a9e7bc3f90c29e5f9a2f/gazpacho-0.7.2.tar.gz" } ], "0.8": [ { "comment_text": "", "digests": { "md5": 
"09c24fd1fb68602733a26ad0adb5c029", "sha256": "53e572f75f3adcd5e45331ff4c6ed67c031b143f16fb9170dd8cd4eacf1b9b4b" }, "downloads": -1, "filename": "gazpacho-0.8.tar.gz", "has_sig": false, "md5_digest": "09c24fd1fb68602733a26ad0adb5c029", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5584, "upload_time": "2019-10-07T20:17:37", "url": "https://files.pythonhosted.org/packages/af/71/28746a8c97fa62041776cf8f70f8bd8db2f001de7cbb639023e2784e8d1a/gazpacho-0.8.tar.gz" } ], "0.8.1": [ { "comment_text": "", "digests": { "md5": "e5fe7a0e6d4a7f9d7fda22b1f7006288", "sha256": "ecb271849d79d548f1025f71db03e1848819176a8304ea7ac2a725529c6947ea" }, "downloads": -1, "filename": "gazpacho-0.8.1.tar.gz", "has_sig": false, "md5_digest": "e5fe7a0e6d4a7f9d7fda22b1f7006288", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5700, "upload_time": "2019-10-11T00:44:26", "url": "https://files.pythonhosted.org/packages/b1/75/9d0066ae540c4ebd744bf1d8db29a6ada533bf97353a11e4897feeca76e9/gazpacho-0.8.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e5fe7a0e6d4a7f9d7fda22b1f7006288", "sha256": "ecb271849d79d548f1025f71db03e1848819176a8304ea7ac2a725529c6947ea" }, "downloads": -1, "filename": "gazpacho-0.8.1.tar.gz", "has_sig": false, "md5_digest": "e5fe7a0e6d4a7f9d7fda22b1f7006288", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5700, "upload_time": "2019-10-11T00:44:26", "url": "https://files.pythonhosted.org/packages/b1/75/9d0066ae540c4ebd744bf1d8db29a6ada533bf97353a11e4897feeca76e9/gazpacho-0.8.1.tar.gz" } ] }