{ "info": { "author": "Micah Cochran", "author_email": "", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Internet :: WWW/HTTP", "Topic :: Text Processing :: Markup :: HTML" ], "description": "# scrape-schema-recipe\n\n[![Build Status](https://travis-ci.org/micahcochran/scrape-schema-recipe.svg?branch=master)](https://travis-ci.org/micahcochran/scrape-schema-recipe)\n\nScrapes recipes from HTML https://schema.org/Recipe (Microdata/JSON-LD) into Python dictionaries.\n\n\n## Install\n\n```\npip install scrape-schema-recipe\n```\n\n## Requirements\n\nPython version 3.5+\n\nThis library relies heavily upon [extruct](https://github.com/scrapinghub/extruct).\n\nOther requirements: \n* isodate (>=0.5.1)\n* requests\n* validators (>=12.4).\n\n## Online Example\n\n```python\n>>> import scrape_schema_recipe\n\n>>> url = 'https://www.foodnetwork.com/recipes/alton-brown/honey-mustard-dressing-recipe-1939031'\n>>> recipe_list = scrape_schema_recipe.scrape_url(url, python_objects=True)\n>>> len(recipe_list)\n1\n>>> recipe = recipe_list[0]\n\n# Name of the recipe\n>>> recipe['name']\n'Honey Mustard Dressing'\n\n# List of the Ingredients\n>>> recipe['recipeIngredient']\n['5 tablespoons medium body honey (sourwood is nice)',\n '3 tablespoons smooth Dijon mustard',\n '2 tablespoons rice wine vinegar']\n\n# List of the Instructions\n>>> recipe['recipeInstructions']\n['Combine all ingredients in a bowl and whisk until smooth. Serve as a dressing or a dip.']\n\n# Author\n>>> recipe['author']\n[{'@type': 'Person',\n 'name': 'Alton Brown',\n 'url': 'https://www.foodnetwork.com/profiles/talent/alton-brown'}]\n```\n'@type': 'Person' is a [https://schema.org/Person](https://schema.org/Person) object\n\n\n```python\n# Preparation Time\n>>> recipe['prepTime']\ndatetime.timedelta(0, 300)\n\n# The library pendulum can give you something a little easier to read.\n>>> import pendulum\n\n# for pendulum version 1.0\n>>> pendulum.Interval.instanceof(recipe['prepTime'])\n\n\n# for version 2.0 of pendulum\n>>> pendulum.Duration(seconds=recipe['prepTime'].total_seconds())\n\n```\n\nIf `python_objects` is set to `False`, this would return the string ISO8611 representation of the duration, `'PT5M'`\n\n[pendulum's library website](https://pendulum.eustace.io/).\n\n\n```python\n# Publication date\n>>> recipe['datePublished']\ndatetime.datetime(2016, 11, 13, 21, 5, 50, 518000, tzinfo=)\n\n>>> str(recipe['datePublished'])\n'2016-11-13 21:05:50.518000-05:00'\n\n# Identifying this is http://schema.org/Recipe data (in LD-JSON format)\n >>> recipe['@context'], recipe['@type']\n('http://schema.org', 'Recipe')\n\n# Content's URL\n>>> recipe['url']\n'https://www.foodnetwork.com/recipes/alton-brown/honey-mustard-dressing-recipe-1939031'\n\n# all the keys in this dictionary\n>>> recipe.keys()\ndict_keys(['recipeYield', 'totalTime', 'dateModified', 'url', '@context', 'name', 'publisher', 'prepTime', 'datePublished', 'recipeIngredient', '@type', 'recipeInstructions', 'author', 'mainEntityOfPage', 'aggregateRating', 'recipeCategory', 'image', 'headline', 'review'])\n```\n\n## Example from a File (alternative representations)\n\nAlso works with locally saved [HTML example file](/test_data/google-recipe-example.html).\n```python\n>>> filelocation = 'test_data/google-recipe-example.html'\n>>> recipe_list = scrape_schema_recipe.scrape(filelocation, python_objects=True)\n>>> recipe = recipe_list[0]\n\n>>> recipe['name']\n'Party Coffee Cake'\n\n>>> repcipe['datePublished']\ndatetime.date(2018, 3, 10)\n\n# Recipe Instructions using the HowToStep\n>>> recipe['recipeInstructions']\n[{'@type': 'HowToStep',\n 'text': 'Preheat the oven to 350 degrees F. Grease and flour a 9x9 inch pan.'},\n {'@type': 'HowToStep',\n 'text': 'In a large bowl, combine flour, sugar, baking powder, and salt.'},\n {'@type': 'HowToStep', 'text': 'Mix in the butter, eggs, and milk.'},\n {'@type': 'HowToStep', 'text': 'Spread into the prepared pan.'},\n {'@type': 'HowToStep', 'text': 'Bake for 30 to 35 minutes, or until firm.'},\n {'@type': 'HowToStep', 'text': 'Allow to cool.'}]\n\n```\n\n\n## What Happens When Things Go Wrong\n\nIf there aren't any http://schema.org/Recipe formatted recipes on the site.\n```python\n>>> url = 'https://www.google.com'\n>>> recipe_list = scrape_schema_recipe.scrape(url, python_objects=True)\n\n>>> len(recipe_list)\n0\n```\n\nSome websites will cause an `HTTPError`.\n\nYou may get around a 403 - Forbidden Errror by putting in an alternative user-agent\nvia the variable `user_agent_str`.\n\n\n## Functions\n\n* `load()` - load HTML schema.org/Recipe structured data from a file or file-like object\n* `loads()` - loads HTML schema.org/Recipe structured data from a string\n* `scrape_url()` - scrape a URL for HTML schema.org/Recipe structured data \n* `scrape()` - load HTML schema.org/Recipe structured data from a file, file-like object, string, or URL\n\n```\n Parameters\n ----------\n location : string or file-like object\n A url, filename, or text_string of HTML, or a file-like object.\n\n python_objects : bool, list, or tuple (optional)\n when True it translates certain data types into python objects\n dates into datetime.date, datetimes into datetime.datetimes,\n durations as dateime.timedelta.\n when set to a list or tuple only converts types specified to\n python objects:\n * when set to either [dateime.date] or [datetime.datetimes] either will\n convert dates.\n * when set to [datetime.timedelta] durations will be converted\n when False no conversion is performed\n (defaults to False)\n\n nonstandard_attrs : bool, optional\n when True it adds nonstandard (for schema.org/Recipe) attributes to the\n resulting dictionaries, that are outside the specification such as:\n '_format' is either 'json-ld' or 'microdata' (how schema.org/Recipe was encoded into HTML)\n '_source_url' is the source url, when 'url' has already been defined as another value\n (defaults to False)\n\n migrate_old_schema : bool, optional\n when True it migrates the schema from older version to current version\n (defaults to True)\n\n user_agent_str : string, optional ***only for scrape_url() and scrape()***\n overide the user_agent_string with this value.\n (defaults to None)\n\n Returns\n -------\n list\n a list of dictionaries in the style of schema.org/Recipe JSON-LD\n no results - an empty list will be returned\n```\n\nThese are also available with `help()` in the python console.\n\n## Example function\nThe `example_output()` function gives quick access to data for prototyping and debugging.\nIt accepts the same parameters as load(), but the first parameter, `name`, is different.\n\n```python\n>>> from scrape_schema_recipes import example_names, example_output\n\n>>> example_names\n('irish-coffee', 'google', 'tart', 'tea-cake', 'truffles')\n\n>>> recipes = example_output('truffles')\n>>> recipes[0]['name']\n'Rum & Tonka Bean Dark Chocolate Truffles'\n```\n\n\n## Files\n\nLicense: Apache 2.0 see [LICENSE](LICENSE)\n\nTest data attribution and licensing: [ATTRIBUTION.md](ATTRIBUTION.md)\n\n## Development\n\nUnit testing can be run by:\n```\nschema-recipe-scraper$ python3 test_scrape.py\n```\n\nmypy is used for static type checking\n\nfrom the project directory:\n```\n schema-recipe-scraper$ mypy schema_recipe_scraper/scrape.py\n```\n\nIf you run mypy from another directory the `--ignore-missing-imports` flag will need to be added,\nthus `$ mypy --ignore-missing-imports scrape.py`\n\n`--ignore-missing-imports` flag is used because most libraries don't have static typing information contained\nin their own code or typeshed.\n\n## Reference Documentation\nHere are some references for how schema.org/Recipe *should* be structured:\n\n* [https://schema.org/Recipe](https://schema.org/Recipe) - official specification\n* [Recipe Google Search Guide](https://developers.google.com/search/docs/data-types/recipe) - material teaching developers how to use the schema (with emphasis on how structured data impacts search results)\n\n\n## Other Similar Python Libraries\n\n* [recipe_scrapers](https://github.com/hhursev/recipe-scrapers) - library scrapes\nrecipes by using the HTML tags using BeautifulSoup. It has drivers for each and every\nsupported website. This is a great fallback for when schema-recipe-scraper will not\nscrape a site.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/micahcochran/scrape-schema-recipe", "keywords": "recipe,cooking,food,schema.org,schema.org/Recipe", "license": "Apache-2", "maintainer": "", "maintainer_email": "", "name": "scrape-schema-recipe", "package_url": "https://pypi.org/project/scrape-schema-recipe/", "platform": "", "project_url": "https://pypi.org/project/scrape-schema-recipe/", "project_urls": { "Homepage": "https://github.com/micahcochran/scrape-schema-recipe" }, "release_url": "https://pypi.org/project/scrape-schema-recipe/0.1.1/", "requires_dist": [ "setuptools (>=30.3.0)", "extruct", "isodate (>=0.5.1)", "requests", "validators (>=0.12.4)" ], "requires_python": "", "summary": "Extracts cooking recipe from HTML structured data in the https://schema.org/Recipe format.", "version": "0.1.1" }, "last_serial": 5498418, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "4c89e7f7790a4e88b578f1462a380b66", "sha256": "4ef2af56d709fff13380049d28e8d2ac468b01a8bd2eb21d057f6b61f1e4b14b" }, "downloads": -1, "filename": "scrape_schema_recipe-0.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "4c89e7f7790a4e88b578f1462a380b66", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12072, "upload_time": "2018-07-26T04:24:15", "url": "https://files.pythonhosted.org/packages/2a/08/d4ee405d1d39b3d6219f208308b497fa5e3ce55e6941bb18e47571d58a7c/scrape_schema_recipe-0.0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a08da8d7db627dce65c7bdfa2418aca1", "sha256": "ade7d771d10d191b9cc1f2a5f34b2ba12991865d4a87156507f91146056f3973" }, "downloads": -1, "filename": "scrape-schema-recipe-0.0.1.tar.gz", "has_sig": false, "md5_digest": "a08da8d7db627dce65c7bdfa2418aca1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7109, "upload_time": "2018-07-26T04:24:16", "url": "https://files.pythonhosted.org/packages/77/6c/0be110723937e2ced1a5df06174a47b627e59e9cc1f7f1544bfb033123ea/scrape-schema-recipe-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "e5cbf99e4c6388db057fd477ca364129", "sha256": "4b8869a9418a6ca1d58d22e8b2ee0dd079d5b1befa9096f0e4e90480d57f481d" }, "downloads": -1, "filename": "scrape_schema_recipe-0.0.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e5cbf99e4c6388db057fd477ca364129", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12206, "upload_time": "2018-08-09T01:56:56", "url": "https://files.pythonhosted.org/packages/29/75/2a1780f5c2e44f4b6c3721193dc6cca1dda09a88d50484b99a8762d6e211/scrape_schema_recipe-0.0.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f3d1d6acd74df575ce2404662f8d4df9", "sha256": "9eed3147815905f3d2cf440e0ad613ef234ac0a2ef354ce760686943568e2fad" }, "downloads": -1, "filename": "scrape-schema-recipe-0.0.2.tar.gz", "has_sig": false, "md5_digest": "f3d1d6acd74df575ce2404662f8d4df9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7271, "upload_time": "2018-08-09T01:56:57", "url": "https://files.pythonhosted.org/packages/1b/be/eb6eb830e61ea55dece5ec4a693dac00766182b129f83808825c7104ff05/scrape-schema-recipe-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "eb8b571d76ab35a8accc3f95297f56a1", "sha256": "c0b549945a974822350ba0b4628241a8952366a60f81a01da07991ceb3592253" }, "downloads": -1, "filename": "scrape_schema_recipe-0.0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "eb8b571d76ab35a8accc3f95297f56a1", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12374, "upload_time": "2018-08-26T02:39:30", "url": "https://files.pythonhosted.org/packages/29/4f/97e80ee7c0198cb7e31537eec789f7eb84520412a17ce271757cbb03e24b/scrape_schema_recipe-0.0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "187ca413a4a0337fec99b5d4ad484797", "sha256": "a6f185949a7e2bff62b2319b2eef15371cb1724b158084dab8a51ee4932c07cc" }, "downloads": -1, "filename": "scrape-schema-recipe-0.0.3.tar.gz", "has_sig": false, "md5_digest": "187ca413a4a0337fec99b5d4ad484797", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7435, "upload_time": "2018-08-26T02:39:31", "url": "https://files.pythonhosted.org/packages/7d/12/75cc7925981e0d8bd6b31f28084606599baef9b305f60a2a7c706e6575a3/scrape-schema-recipe-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "07400b40bffe8c5d54f52c1392ce21f6", "sha256": "5928021aace69b8209ec424b20ac647b4e1a35b0b86d99209bfe8f89f2e7923e" }, "downloads": -1, "filename": "scrape_schema_recipe-0.0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "07400b40bffe8c5d54f52c1392ce21f6", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 13833, "upload_time": "2019-03-11T01:44:44", "url": "https://files.pythonhosted.org/packages/af/cc/03b63e1e05326df861ff72ca9aa0560e00e7992bbd3f3e28e24a42c93b79/scrape_schema_recipe-0.0.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "16d01cf053eab96b7eda5ab2ca31a542", "sha256": "f93e02707204b6728bca2fdfa844d985fdd9e3506ab95154840dfe81442cfb51" }, "downloads": -1, "filename": "scrape-schema-recipe-0.0.4.tar.gz", "has_sig": false, "md5_digest": "16d01cf053eab96b7eda5ab2ca31a542", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15014, "upload_time": "2019-03-11T01:44:45", "url": "https://files.pythonhosted.org/packages/b2/44/3bfae57f161b842c53ed81cead93849068c73161ef38acbca56f722b1d1a/scrape-schema-recipe-0.0.4.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "8410a5c1d74c99b470ddeac26e5dc86e", "sha256": "79dbd23ca5b9e0765e3eb61f9d3e583e19beadf5f4f28fd3542ba32bafdd538b" }, "downloads": -1, "filename": "scrape_schema_recipe-0.1.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "8410a5c1d74c99b470ddeac26e5dc86e", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14466, "upload_time": "2019-06-29T00:59:20", "url": "https://files.pythonhosted.org/packages/fd/2b/b5555f25233b258e971b7223626e06f497d47e88b2de7ddf38a6ef21783e/scrape_schema_recipe-0.1.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2f376e3b430ed16dd70024414cb09a1a", "sha256": "17d73f5256544c5529b23c1cc37b8b51d5d1668f77b06e02603d4231f1966a95" }, "downloads": -1, "filename": "scrape-schema-recipe-0.1.0.tar.gz", "has_sig": false, "md5_digest": "2f376e3b430ed16dd70024414cb09a1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15513, "upload_time": "2019-06-29T00:59:22", "url": "https://files.pythonhosted.org/packages/10/00/b82eaaf332ba24a29d50de40793055f28d4a9bf341a8b2b5fd60d34eaf99/scrape-schema-recipe-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "a8c27f59c304faee5c4f90fc08304693", "sha256": "1394dc7a4f60e4cdd1bdd995c69e9f73b5b42fcf96f9e1ec9f1d012b94729553" }, "downloads": -1, "filename": "scrape_schema_recipe-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a8c27f59c304faee5c4f90fc08304693", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14679, "upload_time": "2019-07-07T23:43:24", "url": "https://files.pythonhosted.org/packages/fb/79/4a38221e3bc05b106831f868be468e826aaa50da22372831a7947f385005/scrape_schema_recipe-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d26341115d94a8563273d3afccadce76", "sha256": "9c7bfe7af491480f381a03ef4cd81e67ec770548161c136740f1ebe168fa3911" }, "downloads": -1, "filename": "scrape-schema-recipe-0.1.1.tar.gz", "has_sig": false, "md5_digest": "d26341115d94a8563273d3afccadce76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15750, "upload_time": "2019-07-07T23:43:25", "url": "https://files.pythonhosted.org/packages/33/7c/973457cd7f9f87f7746ca161eaf48bdef28d3d79685a76f55daa4c9500b5/scrape-schema-recipe-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a8c27f59c304faee5c4f90fc08304693", "sha256": "1394dc7a4f60e4cdd1bdd995c69e9f73b5b42fcf96f9e1ec9f1d012b94729553" }, "downloads": -1, "filename": "scrape_schema_recipe-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a8c27f59c304faee5c4f90fc08304693", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14679, "upload_time": "2019-07-07T23:43:24", "url": "https://files.pythonhosted.org/packages/fb/79/4a38221e3bc05b106831f868be468e826aaa50da22372831a7947f385005/scrape_schema_recipe-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d26341115d94a8563273d3afccadce76", "sha256": "9c7bfe7af491480f381a03ef4cd81e67ec770548161c136740f1ebe168fa3911" }, "downloads": -1, "filename": "scrape-schema-recipe-0.1.1.tar.gz", "has_sig": false, "md5_digest": "d26341115d94a8563273d3afccadce76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15750, "upload_time": "2019-07-07T23:43:25", "url": "https://files.pythonhosted.org/packages/33/7c/973457cd7f9f87f7746ca161eaf48bdef28d3d79685a76f55daa4c9500b5/scrape-schema-recipe-0.1.1.tar.gz" } ] }