{ "info": { "author": "Jens Finn\u00e4s and Leo Wallentin, J++; Robin Linderborg", "author_email": "stockholm@jplusplus.org", "bugtrack_url": null, "classifiers": [], "description": "Statscraper is a base library for building web scrapers for statistical data, with a helper ontology for (primarily Swedish) statistical data. A set of ready-to-use scrapers are included.\n\nFor users\n=========\n\nYou can use Statscraper as a foundation for your next scraper, or try out any of the included scrapers. With Statscraper comes a unified interface for scraping, and some useful helper methods for scraper authors.\n\nFull documentation: ReadTheDocs_\n\nFor updates and discussion: Facebook_\n\nBy `Journalism++ Stockholm `_, and Robin Linderborg.\n\nInstalling\n----------\n\n.. code:: bash\n\n pip install statscraper\n\nUsing a scraper\n---------------\nScrapers acts like \u201ccursors\u201d that move around a hierarchy of datasets and collections of datasets. Collections and datasets are refered to as \u201citems\u201d.\n\n::\n\n \u250f\u2501 Collection \u2501\u2501\u2501 Collection \u2501\u2533\u2501 Dataset\n ROOT \u2501\u254b\u2501 Collection \u2501\u2533\u2501 Dataset \u2523\u2501 Dataset\n \u2517\u2501 Collection \u2523\u2501 Dataset \u2517\u2501 Dataset\n \u2517\u2501 Dataset\n\n \u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n items\n\nHere's a simple example, with a scraper that returns only a single dataset: The number of cranes spotted at Hornborgarsj\u00f6n each day as scraped from `L\u00e4nsstyrelsen i V\u00e4stra G\u00f6talands l\u00e4n `_.\n\n.. code:: python\n\n >>> from statscraper.scrapers import Cranes\n\n >>> scraper = Cranes()\n >>> scraper.items # List available datasets\n []\n\n >>> dataset = scraper[\"Number of cranes\"]\n >>> dataset.dimensions\n [, , ]\n\n >>> row = dataset.data[0] # first row in this dataset\n >>> row\n \n >>> row.dict\n {'value': '7', u'date': u'7', u'month': u'march', u'year': u'2015'}\n\n >>> df = dataset.data.pandas # get this dataset as a Pandas dataframe\n\nBuilding a scraper\n------------------\nScrapers are built by extending a base scraper, or a derative of that. You need to provide a method for listing datasets or collections of datasets, and for fetching data.\n\nStatscraper is built for statistical data, meaning that it's most useful when the data you are scraping/fetching can be organized with a numerical value in each row:\n\n======== ====== =======\n city year value\n======== ====== =======\nVoi 2009 45483\nKabarnet 2006 10191\nTaveta 2009 67505\n======== ====== =======\n\nA scraper can override these methods:\n\n* `_fetch_itemslist(item)` to yield collections or datasets at the current cursor position\n* `_fetch_data(dataset)` to yield rows from the currently selected dataset\n* `_fetch_dimensions(dataset)` to yield dimensions available for the currently selected dataset\n* `_fetch_allowed_values(dimension)` to yield allowed values for a dimension\n\nA number of hooks are avaiable for more advanced scrapers. These are called by adding the on decorator on a method:\n\n.. code:: python\n\n @BaseScraper.on(\"up\")\n def my_method(self):\n # Do something when the user moves up one level\n\nFor developers\n==============\nThese instructions are for developers working on the BaseScraper. See above for instructions for developing a scraper using the BaseScraper.\n\nDownloading\n-----------\n\n.. code:: bash\n\n git clone https://github.com/jplusplus/skrejperpark\n python setup.py install\n\nTests\n-----\n\n.. code:: bash\n\n python setup.py test\n\nRun `python setup.py test` from the root directory. This will install everything needed for testing, before running tests with `nosetests`.\n\n\nChangelog\n---------\n\n- 1.0.7\n\n - Remove logic from SCBScraper that is already handled by BaseScraper\n\n- 1.0.6\n\n - Added dialect:skatteverket (two/four digit county/municipality codes)\n - Added data type for road category\n - Make SCB scraper treat a \u201cRegion\u201d as, well, a region\n\n- 1.0.5\n - Added station key to SMHI scraper\n\n- 1.0.4\n - Added SMHI scraper\n\n- 1.0.3\n - Re-add demo scrapers that accidently got left out in the first release\n\n- 1.0.0\n - First release\n\n- 1.0.0.dev2\n\n - Implement translation\n - Add Dataset.fetch_next() as generator for results\n\n- 1.0.0.dev1\n\n - Semantic versioning starts here\n - Implement datatypes and dialects\n\n- 0.0.2\n\n - Added some demo scrapers\n - The cursor is now moved when accessing datasets\n - Renamed methods for moving cursor: move_up(), move_to()\n - Added tests\n - Added datatypes subtree\n\n- 0.0.1\n - First version\n\n.. _Facebook: https://www.facebook.com/groups/skrejperpark\n.. _ReadTheDocs: http://statscraper.readthedocs.io\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/jplusplus/skrejperpark/archive/1.0.7.tar.gz", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jplusplus/statscraper", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "statscraper", "package_url": "https://pypi.org/project/statscraper/", "platform": "", "project_url": "https://pypi.org/project/statscraper/", "project_urls": { "Download": "https://github.com/jplusplus/skrejperpark/archive/1.0.7.tar.gz", "Homepage": "https://github.com/jplusplus/statscraper" }, "release_url": "https://pypi.org/project/statscraper/1.0.7/", "requires_dist": [ "csvkit", "pandas", "requests", "six" ], "requires_python": "", "summary": "A base class for building web scrapers for statistical data.", "version": "1.0.7" }, "last_serial": 4018330, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "7131f01abdeeb9aa6b2eae69a4d7c0a5", "sha256": "0090dcd559d7af3044d9126ace4743d4c24a85d907ededfb842ad82925aca59b" }, "downloads": -1, "filename": "statscraper-1.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "7131f01abdeeb9aa6b2eae69a4d7c0a5", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 31728, "upload_time": "2017-08-28T14:41:56", "url": "https://files.pythonhosted.org/packages/f2/bb/88828db2d01f67800036eca84b7cd33177af5ccc3005e1a3158b291c91c8/statscraper-1.0.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8a9bf140597a17be87b2f94509bd5fcb", "sha256": "fdb9c485bcffe3660aca2be884ee87aae1c45502de3386c5128602f596dd6a8b" }, "downloads": -1, "filename": "statscraper-1.0.0.tar.gz", "has_sig": false, "md5_digest": "8a9bf140597a17be87b2f94509bd5fcb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23986, "upload_time": "2017-08-28T14:41:57", "url": "https://files.pythonhosted.org/packages/3b/85/0edab6a0b9204e9d6824a72e189dda8990fd89b0596dc8817d1977e7b54c/statscraper-1.0.0.tar.gz" } ], "1.0.0.dev1": [ { "comment_text": "", "digests": { "md5": "e23cd7cf419840f85307a82607dcc969", "sha256": "e85bce4195a810f0ab7441f10c2e40d099005bc34f11de662096f0b352fff7f9" }, "downloads": -1, "filename": "statscraper-1.0.0.dev1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "e23cd7cf419840f85307a82607dcc969", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 26124, "upload_time": "2017-06-30T18:49:38", "url": "https://files.pythonhosted.org/packages/57/67/53270c07f4911db745d8eaa180c7854401cecf27c4be7e12d9be2f7de184/statscraper-1.0.0.dev1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f39c071b550fe110ddf37482d6e3dbcb", "sha256": "cea66b8e96791c772f18169096e29a4b26b5cec1e6c741ab5707480ff3752040" }, "downloads": -1, "filename": "statscraper-1.0.0.dev1.tar.gz", "has_sig": false, "md5_digest": "f39c071b550fe110ddf37482d6e3dbcb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20837, "upload_time": "2017-06-30T18:49:41", "url": "https://files.pythonhosted.org/packages/13/bf/f0b83e003a4b73f11360220c8e207bb428fc92a952f952c18ab9e7414ece/statscraper-1.0.0.dev1.tar.gz" } ], "1.0.0.dev2": [ { "comment_text": "", "digests": { "md5": "0be90754ff765e1375404002b20bb02b", "sha256": "6e91fe3aeb7394c03a8182c0d3b144af65a9a9fb206bd9dd72feca79cd3634e2" }, "downloads": -1, "filename": "statscraper-1.0.0.dev2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0be90754ff765e1375404002b20bb02b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 31682, "upload_time": "2017-08-09T15:10:03", "url": "https://files.pythonhosted.org/packages/95/c1/3d1e93564de1a24ff54a46f7e7b41b635d27991193468bcd50ebc91dc513/statscraper-1.0.0.dev2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "49e5c21033b876e4fe2681448d47eed6", "sha256": "44aace0747d3317f1061b9b6f240d8639d850010d082734c0d2b2c49b7f4bbeb" }, "downloads": -1, "filename": "statscraper-1.0.0.dev2.tar.gz", "has_sig": false, "md5_digest": "49e5c21033b876e4fe2681448d47eed6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23949, "upload_time": "2017-08-09T15:10:05", "url": "https://files.pythonhosted.org/packages/93/0c/e366b72fa0ba0c7ffec54036e4214000cb7f746dcbcfd0856d916c430edb/statscraper-1.0.0.dev2.tar.gz" } ], "1.0.3": [ { "comment_text": "", "digests": { "md5": "56c49a9650867c8f60167c40eb0885d8", "sha256": "7473b528075061c619261bb9fd3ee56401247dd7a998f3e06ac33e3ea6a0900f" }, "downloads": -1, "filename": "statscraper-1.0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "56c49a9650867c8f60167c40eb0885d8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 49241, "upload_time": "2017-09-21T14:47:21", "url": "https://files.pythonhosted.org/packages/33/ed/e7f9bc737e0558a04a22f90890b896928f10fc883283b25b9150b55cfe7c/statscraper-1.0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "84fa32b413c62690dc73dc3bd9790bf5", "sha256": "877ad69b57a382433cf02f1a6e0128e7b425c3bd4ad86bbd802c04c34d39abdd" }, "downloads": -1, "filename": "statscraper-1.0.3.tar.gz", "has_sig": false, "md5_digest": "84fa32b413c62690dc73dc3bd9790bf5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 37423, "upload_time": "2017-09-21T14:47:23", "url": "https://files.pythonhosted.org/packages/28/1a/473b63f05671fae57b41d1a920d9261b459149c59f545dd2b982581d07c1/statscraper-1.0.3.tar.gz" } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "505d5d5db013e70d5e5c3aff18452c62", "sha256": "651dc14af881ee5df0ea8e74c1f296054e9b335d570465f61e935d7de33e9d8e" }, "downloads": -1, "filename": "statscraper-1.0.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "505d5d5db013e70d5e5c3aff18452c62", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 54832, "upload_time": "2017-09-22T11:18:31", "url": "https://files.pythonhosted.org/packages/70/e6/9fddbc0513d776bd80afe30bc3aae943248fd40379c6c3e61d03d5f282f7/statscraper-1.0.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "023135c380ed14f68f2a40e445307811", "sha256": "0c8c2352d059c5f141748fb6f032463ca3f89ab8451bc01d7c662e4426a9318a" }, "downloads": -1, "filename": "statscraper-1.0.5.tar.gz", "has_sig": false, "md5_digest": "023135c380ed14f68f2a40e445307811", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44305, "upload_time": "2017-09-22T11:18:33", "url": "https://files.pythonhosted.org/packages/ba/ea/0e19322995a00335421db66ec005d5d7885211d6d8cee53187fdc01dcf0e/statscraper-1.0.5.tar.gz" } ], "1.0.6": [ { "comment_text": "", "digests": { "md5": "96e82fff07c9655b55dc2d0d9e443955", "sha256": "514732564b61b3d665219ee010973954509bf3890128ac04ed3a4d631a836da8" }, "downloads": -1, "filename": "statscraper-1.0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "96e82fff07c9655b55dc2d0d9e443955", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 57412, "upload_time": "2018-02-06T15:04:35", "url": "https://files.pythonhosted.org/packages/a0/df/4da3ca0fafec91b32e357f5ea6a9cc75b4551f4e50bb2a5d4350073b0f40/statscraper-1.0.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4dbf28012edd23bd7aa00063fcb7ebe9", "sha256": "2ba1f5ab2187da5e0e14fa790fdfb03d17f85f46cb17c712d6b8f078e4646e12" }, "downloads": -1, "filename": "statscraper-1.0.6.tar.gz", "has_sig": false, "md5_digest": "4dbf28012edd23bd7aa00063fcb7ebe9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44765, "upload_time": "2018-02-06T15:04:38", "url": "https://files.pythonhosted.org/packages/ad/8c/194edb7a612aea3a5aceb8e5d79ca3034d9adb9ef553f3f5bc8612a4ea7e/statscraper-1.0.6.tar.gz" } ], "1.0.7": [ { "comment_text": "", "digests": { "md5": "208fae29792e705bdeb0370729c0b676", "sha256": "a445e6ea09e0f3c1094aaab9b0357be6ff75531eae3e795c75c00c1fcd9e6b33" }, "downloads": -1, "filename": "statscraper-1.0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "208fae29792e705bdeb0370729c0b676", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 57425, "upload_time": "2018-06-30T14:47:41", "url": "https://files.pythonhosted.org/packages/20/6a/a723dcb694a75e46a5e3cdd94985dd56df2e001136875852e6ce39dec2f9/statscraper-1.0.7-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b6b42bb99945d2ee0a55eab736d8fa7f", "sha256": "e7c28b4ed6025cd08df73752dc5cbea47fc4689b259afe1a75182bdfbd68ebf1" }, "downloads": -1, "filename": "statscraper-1.0.7.tar.gz", "has_sig": false, "md5_digest": "b6b42bb99945d2ee0a55eab736d8fa7f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44731, "upload_time": "2018-06-30T14:47:42", "url": "https://files.pythonhosted.org/packages/13/47/259792b661972be434669636d3ceba118341176f686abf8feb36b63fa9c8/statscraper-1.0.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "208fae29792e705bdeb0370729c0b676", "sha256": "a445e6ea09e0f3c1094aaab9b0357be6ff75531eae3e795c75c00c1fcd9e6b33" }, "downloads": -1, "filename": "statscraper-1.0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "208fae29792e705bdeb0370729c0b676", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 57425, "upload_time": "2018-06-30T14:47:41", "url": "https://files.pythonhosted.org/packages/20/6a/a723dcb694a75e46a5e3cdd94985dd56df2e001136875852e6ce39dec2f9/statscraper-1.0.7-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b6b42bb99945d2ee0a55eab736d8fa7f", "sha256": "e7c28b4ed6025cd08df73752dc5cbea47fc4689b259afe1a75182bdfbd68ebf1" }, "downloads": -1, "filename": "statscraper-1.0.7.tar.gz", "has_sig": false, "md5_digest": "b6b42bb99945d2ee0a55eab736d8fa7f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44731, "upload_time": "2018-06-30T14:47:42", "url": "https://files.pythonhosted.org/packages/13/47/259792b661972be434669636d3ceba118341176f686abf8feb36b63fa9c8/statscraper-1.0.7.tar.gz" } ] }