{ "info": { "author": "ddio", "author_email": "ddio@ddio.io", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# TW Rental House Utility for Scrapy\n\nThis package is built for crawling Taiwanese rental house websites using [Scrapy](https://scrapy.org/).\nBecause crawler behaviour varies with goal, scale, and pipeline, this package provides only a minimum feature set, which allows developers to list and decode rental house web pages into structured data without knowing much about the detailed HTML and API structure of each website. This package is also designed for extensibility: developers can insert customized callbacks, manipulate data, and integrate with an existing crawler structure.\n\nAlthough this package provides the ability to crawl rental house websites, it is the developer's responsibility to ensure that the crawling mechanism and the use of the data are appropriate. Please be friendly to the target website, for example by using [DOWNLOAD_DELAY](https://doc.scrapy.org/en/latest/topics/settings.html#std:setting-DOWNLOAD_DELAY) or [AutoThrottle](https://doc.scrapy.org/en/latest/topics/autothrottle.html) to avoid bulk requests.\n\n## Requirement\n\n1. Python 3.5+\n\n## Installation\n\n```bash\npip install scrapy-tw-rental-house\n```\n\n## Basic Usage\n\nThis package currently supports [591](http://rent.591.com.tw/). Each rental house website is implemented as a Scrapy Spider class. 
You can either crawl the entire website using the default settings, which will take a couple of days, or customize the behaviour based on your needs.\n\nThe most basic usage is to create a new Spider class that inherits from `Rental591Spider`:\n\n```python\nfrom scrapy_twrh.spiders.rental591 import Rental591Spider\n\nclass MyAwesomeSpider(Rental591Spider):\n    name = 'awesome'\n```\n\nThen start crawling with:\n\n```bash\nscrapy crawl awesome\n```\n\nPlease see [example](https://github.com/g0v/tw-rental-house-data/tree/master/scrapy-package/examples) for detailed usage.\n\n## Items\n\nAll spiders populate two types of Scrapy items: `GenericHouseItem` and `RawHouseItem`.\n\n`GenericHouseItem` contains normalized data fields; spiders for different websites decode their data and fit it into this schema on a best-effort basis.\n\n`RawHouseItem` contains unnormalized data fields, which keep the original, structured data on a best-effort basis.\n\nNote that both items are supersets of the schema. It is the developer's responsibility to check which fields are provided when receiving an item.\nFor example, in `Rental591Spider`, for a single rental house, Scrapy will get:\n\n1. 1x `RawHouseItem` + 1x `GenericHouseItem` while listing all houses, which provides only the minimum data fields for `GenericHouseItem`\n2. 1x `RawHouseItem` + 2x `GenericHouseItem` while retrieving house details. The second `GenericHouseItem` contains only location info.\n\n## Handlers\n\nAll spiders in this package provide the following handlers:\n\n1. `start_list`, similar to `start_requests` in Scrapy, controls how the crawler issues search/list requests to find all rental houses.\n2. `parse_list`, similar to `parse` in Scrapy, controls how the crawler handles responses from `start_list` and generates requests for the detail page of each house.\n3. `parse_detail` controls how the crawler parses a detail page.\n\nAll spiders implement their own default handlers, namely `default_start_list`, `default_parse_list`, and `default_parse_detail`, which can be overridden during `__init__`. 
Please see [example](https://github.com/g0v/tw-rental-house-data/tree/master/scrapy-package/examples) for how to control spider behavior using handlers.\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/g0v/tw-rental-house-data/tree/master/scrapy-package", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "scrapy-tw-rental-house", "package_url": "https://pypi.org/project/scrapy-tw-rental-house/", "platform": "", "project_url": "https://pypi.org/project/scrapy-tw-rental-house/", "project_urls": { "Homepage": "https://github.com/g0v/tw-rental-house-data/tree/master/scrapy-package" }, "release_url": "https://pypi.org/project/scrapy-tw-rental-house/0.1.2/", "requires_dist": [ "Scrapy (>=1)" ], "requires_python": "", "summary": "Scrapy spider for TW Rental House", "version": "0.1.2" }, "last_serial": 5831094, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "636a63ddd2c1dc3288d94d3637536171", "sha256": "73d7ef30bc9dc813d7bbe35a817b2328f3b483994e306e7eb087971f0646be84" }, "downloads": -1, "filename": "scrapy_tw_rental_house-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "636a63ddd2c1dc3288d94d3637536171", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37807, "upload_time": "2019-06-11T03:54:25", "url": "https://files.pythonhosted.org/packages/e2/59/7171d7928ee5534cf204b4b26b2bd87adf9aa29d748129d7ebf6d6d70253/scrapy_tw_rental_house-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6a6e680d1eb821a8683bd829397ee1d2", "sha256": "79266d5643ab286727b3a7426545da21aaf43788ffc3bca019df7fd6cd69757b" }, "downloads": -1, "filename": "scrapy-tw-rental-house-0.1.0.tar.gz", "has_sig": false, "md5_digest": "6a6e680d1eb821a8683bd829397ee1d2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
18519, "upload_time": "2019-06-11T03:54:27", "url": "https://files.pythonhosted.org/packages/22/2b/8dfd4cbbf22532e894650d2b9536331b2917735da291482e131161670088/scrapy-tw-rental-house-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "36c53cf036221fd6b1a78ae4597a67d1", "sha256": "041079e6b1fdcf1e99dcbe5f8f43ccbe05d7d46b542ef7c50040542c55d76eca" }, "downloads": -1, "filename": "scrapy_tw_rental_house-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "36c53cf036221fd6b1a78ae4597a67d1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37807, "upload_time": "2019-06-11T03:59:11", "url": "https://files.pythonhosted.org/packages/cb/8d/c3478810aab78c9473880e9d942d07c50041c3f538ca3dce197595063412/scrapy_tw_rental_house-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e1dbe7922be60b497f72729060aff3bc", "sha256": "d480f651261d2d3f08d05d384d195a185e26cd0d4a1bccead1204a9716200731" }, "downloads": -1, "filename": "scrapy-tw-rental-house-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e1dbe7922be60b497f72729060aff3bc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18519, "upload_time": "2019-06-11T03:59:13", "url": "https://files.pythonhosted.org/packages/34/d6/aaeeb0591b61af51d2a3b506e555ddbfa0d98367043d1ba2f19b51ff6d3a/scrapy-tw-rental-house-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "52473a192c442c17a9d8f2a4ca7a7a90", "sha256": "891791b09746fa00286a6c4f84a155aad9a885f61755706c3728dd4b691790a6" }, "downloads": -1, "filename": "scrapy_tw_rental_house-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "52473a192c442c17a9d8f2a4ca7a7a90", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37924, "upload_time": "2019-09-15T08:17:45", "url": 
"https://files.pythonhosted.org/packages/53/67/0d216dcd5325bb53179d68baf7b0e9b5b2ea7b473670cce5df9068f96c56/scrapy_tw_rental_house-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8fa8f49df2d98191cab39d94ef055db7", "sha256": "98c854829fdb7a70f71031c8b66f409a061144ed7625575533612acf7458b1b0" }, "downloads": -1, "filename": "scrapy-tw-rental-house-0.1.2.tar.gz", "has_sig": false, "md5_digest": "8fa8f49df2d98191cab39d94ef055db7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18556, "upload_time": "2019-09-15T08:17:47", "url": "https://files.pythonhosted.org/packages/28/08/04cd399c4fdd4b58e0592297fa482eecefb2fc88f572314a8ebcb61e4ce5/scrapy-tw-rental-house-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "52473a192c442c17a9d8f2a4ca7a7a90", "sha256": "891791b09746fa00286a6c4f84a155aad9a885f61755706c3728dd4b691790a6" }, "downloads": -1, "filename": "scrapy_tw_rental_house-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "52473a192c442c17a9d8f2a4ca7a7a90", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 37924, "upload_time": "2019-09-15T08:17:45", "url": "https://files.pythonhosted.org/packages/53/67/0d216dcd5325bb53179d68baf7b0e9b5b2ea7b473670cce5df9068f96c56/scrapy_tw_rental_house-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8fa8f49df2d98191cab39d94ef055db7", "sha256": "98c854829fdb7a70f71031c8b66f409a061144ed7625575533612acf7458b1b0" }, "downloads": -1, "filename": "scrapy-tw-rental-house-0.1.2.tar.gz", "has_sig": false, "md5_digest": "8fa8f49df2d98191cab39d94ef055db7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18556, "upload_time": "2019-09-15T08:17:47", "url": "https://files.pythonhosted.org/packages/28/08/04cd399c4fdd4b58e0592297fa482eecefb2fc88f572314a8ebcb61e4ce5/scrapy-tw-rental-house-0.1.2.tar.gz" } ] }