{ "info": { "author": "Friedrich Lindenberg", "author_email": "friedrich@pudo.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7" ], "description": "# scrapekit\n\nDid you know the entire web was made of data? You probably did.\nScrapekit helps you get that data with simple Python scripts. Based on\n[requests](http://docs.python-requests.org/), the library will handles\ncaching, threading and logging.\n\nSee the [full documentation](http://scrapekit.readthedocs.org/).\n\n## Example\n\n```python\nfrom scrapekit import Scraper\n\nscraper = Scraper('example')\n\n@scraper.task\ndef get_index():\n url = 'http://databin.pudo.org/t/b2d9cf'\n doc = scraper.get(url).html()\n for row in doc.findall('.//tr'):\n yield row\n\n@scraper.task\ndef get_row(row):\n columns = row.findall('./td')\n print columns\n\npipeline = get_index | get_row\nif __name__ == '__main__':\n pipeline.run()\n \n```\n\n## Works well with\n\nScrapekit doesn't aim to provide all functionality necessary for\nscraping. Specifically, it doesn't address HTML parsing, data storage\nand data validation. For these needs, check the following libraries:\n\n* [lxml](http://lxml.de/) for HTML/XML parsing; much faster and more \n flexible than [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/).\n* [dataset](http://dataset.rtfd.org) is a sister library of scrapekit\n that simplifies storing semi-structured data in SQL databases.\n\n## Existing tools\n\n* [Scrapy](http://scrapy.org/) is a much more mature and comprehensive\n framework for developing scrapers. On the other hand, it requires you to\n develop scrapers within its class system. This can be too heavyweight\n for a simple script to grab data off a web site.\n* [scrapelib](http://scrapelib.readthedocs.org/) is a thin wrapper\n around requests that does throttling, retries and caching.\n* [MechanicalSoup](https://github.com/hickford/MechanicalSoup) binds \n BeautifulSoup and requests into an imperative, stateful API.\n\n## Credits and license\n\nScrapekit is licensed under the terms of the MIT license, which is also\nincluded in [LICENSE](LICENSE). It was developed through projects of\n[ICFJ](http://icfj.org), [ANCIR](http://investigativecenters.org) and\n[ICIJ](http://icij.org).\n", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/pudo/scrapekit", "keywords": "web scraping crawling http cache threading", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "scrapekit", "package_url": "https://pypi.org/project/scrapekit/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/scrapekit/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/pudo/scrapekit" }, "release_url": "https://pypi.org/project/scrapekit/0.2.1/", "requires_dist": null, "requires_python": null, "summary": "Light-weight tools for web scraping", "version": "0.2.1" }, "last_serial": 1214092, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "e7c5d57af5c3dea07ea1f5ef08721a93", "sha256": "1d9ab31bc7be7fd15e2d5dd7e6eca8fdc75b0dfcd1ae540aa0d765a9059a3a90" }, "downloads": -1, "filename": "scrapekit-0.0.1.tar.gz", "has_sig": false, "md5_digest": "e7c5d57af5c3dea07ea1f5ef08721a93", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5701, "upload_time": "2014-08-10T10:51:47", "url": "https://files.pythonhosted.org/packages/16/af/08c4bdf24946a1d76cbee58496ba6b9e456a0c498c4b6f6be4af88da4f07/scrapekit-0.0.1.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "4da376dbc7448e9d1ad84ad14557b497", "sha256": "368644c440833da46f5f09ffc55ba05e863b4c432383b8a48544ece417296f32" }, "downloads": -1, "filename": "scrapekit-0.1.0.tar.gz", "has_sig": false, "md5_digest": "4da376dbc7448e9d1ad84ad14557b497", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8942, "upload_time": "2014-08-14T07:59:21", "url": "https://files.pythonhosted.org/packages/91/a7/25690ec69c92359d8515cc97fefeb61692bd49bc95d70ce904c1aa6e89b7/scrapekit-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "6d31d9c0bdd3c57809aaa1d3578a4650", "sha256": "3b2e29786fe3e3b772259824f3a00ffb07a5aebcf35a3951702c44fa5a5a5050" }, "downloads": -1, "filename": "scrapekit-0.1.1.tar.gz", "has_sig": false, "md5_digest": "6d31d9c0bdd3c57809aaa1d3578a4650", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12183, "upload_time": "2014-08-14T08:04:07", "url": "https://files.pythonhosted.org/packages/b4/9f/3222d68ef11d738ab1044cd0eb6dcaad1135012c852030ab3da987614657/scrapekit-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "7ca548731d4bba79fc2242422bbd05a3", "sha256": "5b5b25a2569e828aed7aa75e58e59cae8407366a5c459d8a273e486417b113f3" }, "downloads": -1, "filename": "scrapekit-0.2.0.tar.gz", "has_sig": false, "md5_digest": "7ca548731d4bba79fc2242422bbd05a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14287, "upload_time": "2014-08-26T14:00:44", "url": "https://files.pythonhosted.org/packages/66/28/0e70e6efb4ba3c6218b2e16bbbb77d2468cdcdb0db5286dcfdb96ca4ad3e/scrapekit-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "6509d85ad0dd67d816009b5d618472b4", "sha256": "409cf2aeee9ced382018ab8f84914c41115df5b023265480311d50a34230c8e5" }, "downloads": -1, "filename": "scrapekit-0.2.1.tar.gz", "has_sig": false, "md5_digest": "6509d85ad0dd67d816009b5d618472b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14314, "upload_time": "2014-09-05T13:05:07", "url": "https://files.pythonhosted.org/packages/0e/5c/0639b8a935fbcc03b457a7a54beda999f04bfe74fbecd9a1fa84f82eea6f/scrapekit-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6509d85ad0dd67d816009b5d618472b4", "sha256": "409cf2aeee9ced382018ab8f84914c41115df5b023265480311d50a34230c8e5" }, "downloads": -1, "filename": "scrapekit-0.2.1.tar.gz", "has_sig": false, "md5_digest": "6509d85ad0dd67d816009b5d618472b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14314, "upload_time": "2014-09-05T13:05:07", "url": "https://files.pythonhosted.org/packages/0e/5c/0639b8a935fbcc03b457a7a54beda999f04bfe74fbecd9a1fa84f82eea6f/scrapekit-0.2.1.tar.gz" } ] }