{ "info": { "author": "Simon Willison", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: End Users/Desktop", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# delta-scraper\n\nIN EARLY DEVELOPMENT\n\n[![PyPI](https://img.shields.io/pypi/v/delta-scraper.svg)](https://pypi.org/project/delta-scraper/)\n[![Travis CI](https://travis-ci.com/simonw/delta-scraper.svg?branch=master)](https://travis-ci.com/simonw/delta-scraper)\n[![Documentation Status](https://readthedocs.org/projects/delta-scraper/badge/?version=latest)](http://delta-scraper.readthedocs.io/en/latest/?badge=latest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/simonw/delta-scraper/blob/master/LICENSE)\n\nPython library for scraping data sources and creating readable deltas.\n\nFor background, see [Scraping hurricane Irma](https://simonwillison.net/2017/Sep/10/scraping-irma/).\n\n## Concepts\n\nThis library allows you to define _scrapers_, which are objects that know how to retrieve information from a source (usually a web API, but scrapers can be written to operate against HTML or other formats) and persist that data somewhere as JSON.\n\nWhen a scraper fetches fresh information it has the ability to compare that data to the old data and use the difference to create a human-readable message.\n\nThese capabilities can be combined with a git repository to create a commit log, with human-readable commit messages that accompany a machine-readable diff againts the generated JSON.\n\nSee [disaster-scrapers](https://github.com/simonw/disaster-scrapers) and [disaster-data](https://github.com/simonw/disaster-scrapers) for some examples of this pattern in action.\n\n## Basic usage\n\nYou can define new scrapers by subclassing `DeltaScraper`. Here's an example which scrapes a list of FEMA shelters.\n\n class FemaShelters(DeltaScraper):\n url = \"https://gis.fema.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=FEMA:FEMANSSOpenShelters&maxFeatures=250&outputFormat=json\"\n owner = \"simonw\"\n repo = \"disaster-data\"\n filepath = \"fema/shelters.json\"\n\n record_key = \"SHELTER_ID\"\n noun = \"shelter\"\n\n def fetch_data(self):\n data = requests.get(self.url, timeout=10).json()\n return [feature[\"properties\"] for feature in data[\"features\"]]\n\n def display_record(self, record):\n display = []\n display.append(\n \" {SHELTER_NAME} in {CITY}, {STATE} ({SHELTER_STATUS})\".format(**record)\n )\n display.append(\n \" https://www.google.com/maps/search/{LATITUDE},{LONGITUDE}\".format(\n **record\n )\n )\n display.append(\" population = {TOTAL_POPULATION}\".format(**record))\n display.append(\"\")\n return \"\\n\".join(display)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/simonw/delta-scraper", "keywords": "", "license": "Apache License, Version 2.0", "maintainer": "", "maintainer_email": "", "name": "delta-scraper", "package_url": "https://pypi.org/project/delta-scraper/", "platform": "", "project_url": "https://pypi.org/project/delta-scraper/", "project_urls": { "Homepage": "https://github.com/simonw/delta-scraper" }, "release_url": "https://pypi.org/project/delta-scraper/0.1a1/", "requires_dist": [ "click", "requests", "github-contents", "pytest ; extra == 'test'", "black ; extra == 'test'" ], "requires_python": "", "summary": "Python library for scraping data sources and creating readable deltas", "version": "0.1a1" }, "last_serial": 5387125, "releases": { "0.1a1": [ { "comment_text": "", "digests": { "md5": "217fed4f228456b699a53717cefac981", "sha256": "88b863a1a436f184799aad0f2325bd3bab9c0e5d1c3a534779fd494734048a26" }, "downloads": -1, "filename": "delta_scraper-0.1a1-py3-none-any.whl", "has_sig": false, "md5_digest": "217fed4f228456b699a53717cefac981", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9188, "upload_time": "2019-06-11T15:28:44", "url": "https://files.pythonhosted.org/packages/90/04/082f6b9b55c85a82f3c354ea3b7ea29075cce0984b2782a6acee7b7ee16e/delta_scraper-0.1a1-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "217fed4f228456b699a53717cefac981", "sha256": "88b863a1a436f184799aad0f2325bd3bab9c0e5d1c3a534779fd494734048a26" }, "downloads": -1, "filename": "delta_scraper-0.1a1-py3-none-any.whl", "has_sig": false, "md5_digest": "217fed4f228456b699a53717cefac981", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9188, "upload_time": "2019-06-11T15:28:44", "url": "https://files.pythonhosted.org/packages/90/04/082f6b9b55c85a82f3c354ea3b7ea29075cce0984b2782a6acee7b7ee16e/delta_scraper-0.1a1-py3-none-any.whl" } ] }