{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3" ], "description": "Frontera scheduler for Scrapy\n=============================\n\nMore flexible and featured `Frontera `_ scheduler for scrapy, which doesn't force you to reimplement\ncapabilities already present in scrapy, so it provides:\n\n- Scrapy handled request dupefilter\n- Scrapy handled disk and memory request queues\n- Only sends to frontera the requests marked to be processed by it (by setting the request meta attribute ``cf_store`` to True), thus avoiding a lot of conflicts.\n- Allows setting frontera settings from the spider constructor, by loading the frontera manager after spider instantiation.\n- Allows frontera components to access the scrapy stat manager instance by adding the STATS_MANAGER frontera setting\n- Better request/response converters, fully compatible with ScrapyCloud and Scrapy\n- Emulates the dont_filter=True scrapy Request flag\n- Frontier fingerprint is the same as the scrapy request fingerprint (can be overridden by passing 'frontier_fingerprint' in the request meta)\n- Allows custom preprocessing or ignoring of requests from the frontier before they are actually enqueued\n- Thoroughly tested, used and featured\n\nThe result is that a crawler using this scheduler will not work differently from a crawler that doesn't use a frontier, and\nthe reengineering needed to adapt a spider to work with a frontier is minimal.\n\n\nVersions:\n---------\n\nUp to version 0.1.8, frontera==0.3.3 and python2 are required. 
Version 0.2.x requires frontera==0.7.1 and is compatible with python3.\n\nInstallation:\n-------------\n\npip install scrapy-frontera\n\n\nUsage and features:\n-------------------\n\nNote: In the context of this doc, a producer spider is the spider that writes requests to the frontier, and the consumer is the one that reads\nthem from the frontier. They can be either the same spider or separate ones.\n\nIn your project settings.py::\n\n    SCHEDULER = 'scrapy_frontera.scheduler.FronteraScheduler'\n\n    DOWNLOADER_MIDDLEWARES = {\n        'scrapy_frontera.middlewares.SchedulerDownloaderMiddleware': 0,\n    }\n\n    SPIDER_MIDDLEWARES = {\n        'scrapy_frontera.middlewares.SchedulerSpiderMiddleware': 0,\n    }\n\n    # Set to True if you want start requests to be redirected to the frontier.\n    # By default they go directly to the scrapy downloader.\n    # FRONTERA_SCHEDULER_START_REQUESTS_TO_FRONTIER = False\n\n    # Redirects to the frontier the requests with the given callback names.\n    # Important: this setting doesn't affect start requests.\n    # FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER = []\n\n    # Spider attributes that need to be passed to the requests redirected to the frontier.\n    # Some previous callbacks may have generated some state needed by the following ones.\n    # This setting allows transmitting that state between different jobs.\n    # FRONTERA_SCHEDULER_STATE_ATTRIBUTES = []\n\n    # Maps specific requests to a specific slot prefix by their callback name.\n    # FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP = {}\n\n\nPlus the usual Frontera setup. For instance, for `hcf-backend `_::\n\n    BACKEND = 'hcf_backend.HCFBackend'\n    HCF_PROJECT_ID = 11111\n\n    (etc...)\n\nYou can also set up spider specific frontera settings via the spider class attribute dict ``frontera_settings``. 
Example\nwith `hcf backend`::\n\n    class MySpider(Spider):\n\n        name = 'my-producer'\n\n        frontera_settings = {\n            'HCF_AUTH': 'xxxxxxxxxx',\n            'HCF_PROJECT_ID': 11111,\n            'HCF_PRODUCER_FRONTIER': 'myfrontier',\n            'HCF_PRODUCER_NUMBER_OF_SLOTS': 8,\n        }\n\nScrapy-frontera also accepts the spider attribute ``frontera_settings_json``. This is especially useful for consumers, which need a per-job\nsetup of the reading slot. For example, you can configure a consumer spider in this way, for usage with `hcf backend `_::\n\n    class MySpider(Spider):\n\n        name = 'my-consumer'\n\n        frontera_settings = {\n            'HCF_AUTH': 'xxxxxxxxxx',\n            'HCF_PROJECT_ID': 11111,\n            'HCF_CONSUMER_FRONTIER': 'myfrontier',\n        }\n\n\nand invoke it via::\n\n    scrapy crawl my-consumer -a frontera_settings_json='{\"HCF_CONSUMER_SLOT\": \"0\"}'\n\nSettings provided through ``frontera_settings_json`` override those provided using ``frontera_settings``, which in turn override those provided in the\nproject settings.py file.\n\nRequests will go through the Frontera pipeline only if the flag ``cf_store`` with value True is included in the request meta. If ``cf_store`` is not present\nor is False, requests will be processed as normal scrapy requests. An alternative to the ``cf_store`` flag are the scrapy settings ``FRONTERA_SCHEDULER_START_REQUESTS_TO_FRONTIER`` and ``FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER`` (see above about usage of these settings).\n\nRequests read from the frontier are directly enqueued by the scheduler. This means that they are not processed by spider middleware. Their\nprocessing entrypoint is the downloader middleware ``process_request()`` pipeline. But if you need to preprocess requests incoming from the frontier\nin the spider, you can define the spider method ``preprocess_request_from_frontier(request: scrapy.Request)``. If defined, the scheduler will invoke\nit on each request before actually enqueuing it. This method must return either None or a request (the same one it received, or another). 
This return value is what\nwill be actually enqueued, so if it is None, the request is skipped (not enqueued).\n\nIf requests read from the frontier don't already have an errback defined, the scheduler will automatically assign the consumer spider ``errback`` method,\nif it exists, to them. This is especially useful when the consumer spider is not the same as the producer one.\n\nAnother useful setting is ``FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP``. This is a dict which maps requests with a specific callback to a specific slot prefix, and optionally a number of slots, different from the default one assigned by the frontera backend (this feature has to be supported by the specific frontera backend you will use; recent versions of hcf-backend support it). For example::\n\n    class MySpider(Spider):\n\n        name = 'my-producer'\n\n        frontera_settings = {\n            'HCF_AUTH': 'xxxxxxxxxx',\n            'HCF_PROJECT_ID': 11111,\n            'HCF_PRODUCER_FRONTIER': 'myfrontier',\n            'HCF_PRODUCER_SLOT_PREFIX': 'my-consumer',\n            'HCF_PRODUCER_NUMBER_OF_SLOTS': 8,\n        }\n\n        custom_settings = {\n            'FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP': {'parse': 'my-producer/4'},\n            'FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER': ['parse', 'parse_consumer']\n        }\n\n        def parse_consumer(self, response):\n            assert False\n\n        def parse(self, response):\n            (...)\n\nUnder this configuration, requests with callback ``parse()`` will be saved on 4 slots with prefix ``my-producer``, while requests with callback ``parse_consumer()`` will use the configuration from the hcf settings, that is, 8 slots with prefix ``my-consumer``.\n\nAn integrated tutorial is available at `shub-workflow Tutorial `_\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/scrapinghub/scrapy-frontera", "keywords": "", "license": "BSD", "maintainer": "Scrapinghub", "maintainer_email": "", "name": "scrapy-frontera", "package_url": 
"https://pypi.org/project/scrapy-frontera/", "platform": "", "project_url": "https://pypi.org/project/scrapy-frontera/", "project_urls": { "Homepage": "https://github.com/scrapinghub/scrapy-frontera" }, "release_url": "https://pypi.org/project/scrapy-frontera/0.2.8/", "requires_dist": [ "frontera (==0.7.1)", "scrapy" ], "requires_python": "", "summary": "Featured Frontera scheduler for Scrapy", "version": "0.2.8" }, "last_serial": 5347198, "releases": { "0.2.2": [ { "comment_text": "", "digests": { "md5": "c92a34c6567a2cd7b5717b296b5305e5", "sha256": "de4cdeb3eea6e42d9f23bcc3cfa9dd3d2d7291d4503b8990e93fd5d4179d44fc" }, "downloads": -1, "filename": "scrapy_frontera-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "c92a34c6567a2cd7b5717b296b5305e5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9471, "upload_time": "2018-08-30T17:10:18", "url": "https://files.pythonhosted.org/packages/e0/1e/8ac59bf00f36d9acb2875495d8d27196832aeb86acd8507763807e3dc70a/scrapy_frontera-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8a608d877d1b5444a77dc07ca23dc936", "sha256": "5b83e60e66b4218a24805206f591174e08c1b22d5548cd77058c116b3bc9e682" }, "downloads": -1, "filename": "scrapy-frontera-0.2.2.tar.gz", "has_sig": false, "md5_digest": "8a608d877d1b5444a77dc07ca23dc936", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7206, "upload_time": "2018-08-30T17:04:02", "url": "https://files.pythonhosted.org/packages/27/a2/efc3b1f078ec4df85687d3f2d8dd8a6faa198c30619a0b43414ef3c03184/scrapy-frontera-0.2.2.tar.gz" } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": "1d6ca3bd01ba77091523fe312fb0eea4", "sha256": "c93cf565122c9a413715e46367874432e99ee45f2cc1733c20d05f1fbb567352" }, "downloads": -1, "filename": "scrapy_frontera-0.2.4-py2-none-any.whl", "has_sig": false, "md5_digest": "1d6ca3bd01ba77091523fe312fb0eea4", "packagetype": "bdist_wheel", "python_version": "py2", 
"requires_python": null, "size": 11822, "upload_time": "2018-09-13T12:22:43", "url": "https://files.pythonhosted.org/packages/2a/d9/94f5068c6df01d6e84f263796e6a3c39189ebcb9e1029f51b6e746074db4/scrapy_frontera-0.2.4-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "42015e6c7a09de31d6afefb32d4606ae", "sha256": "5406f47db6c55e14ffa00d67b5a44287277fd224b3c642cc3daf395be235d3c8" }, "downloads": -1, "filename": "scrapy-frontera-0.2.4.tar.gz", "has_sig": false, "md5_digest": "42015e6c7a09de31d6afefb32d4606ae", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7028, "upload_time": "2018-09-13T12:22:45", "url": "https://files.pythonhosted.org/packages/c8/f2/4624d7d6e969a8e90d08e0528a41913636fa13a5c6808b2dfc8dc7ec1241/scrapy-frontera-0.2.4.tar.gz" } ], "0.2.4.1": [ { "comment_text": "", "digests": { "md5": "1d3cf9029e65b7f71ae1a9f2bcd94ae6", "sha256": "6d618560e4bbcb6084dbc9f46da3fe4370c50cc7d33817d2c7d2e7d3fed42a9d" }, "downloads": -1, "filename": "scrapy_frontera-0.2.4.1-py3-none-any.whl", "has_sig": false, "md5_digest": "1d3cf9029e65b7f71ae1a9f2bcd94ae6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10110, "upload_time": "2018-11-13T20:13:40", "url": "https://files.pythonhosted.org/packages/63/1c/5b7ed62c9a37d93365f822278bba253b1dda9aa98e177c10674b793528bb/scrapy_frontera-0.2.4.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9e5ef165c35c899c715ebe1445638efd", "sha256": "56f15535abe3d9a9bac0eb76a1db98dcc8f7781fb2f07e4bdce390a8815200ce" }, "downloads": -1, "filename": "scrapy-frontera-0.2.4.1.tar.gz", "has_sig": false, "md5_digest": "9e5ef165c35c899c715ebe1445638efd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7058, "upload_time": "2018-11-13T20:13:42", "url": "https://files.pythonhosted.org/packages/c6/5f/516bd4affdb776f02e303e79a7bd260fc9dc05333f40b4c4e4bef7b5e10a/scrapy-frontera-0.2.4.1.tar.gz" } ], "0.2.5": [ { 
"comment_text": "", "digests": { "md5": "78e42015e760bfab5f935cbc4ad346a3", "sha256": "04265a68f4fa7a32eeffe46e674b7dec7411b21b974bfe0ac081b445d792492d" }, "downloads": -1, "filename": "scrapy_frontera-0.2.5-py3-none-any.whl", "has_sig": false, "md5_digest": "78e42015e760bfab5f935cbc4ad346a3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10344, "upload_time": "2019-01-22T19:52:13", "url": "https://files.pythonhosted.org/packages/35/0b/3358a8f9f6356c61f212bff325114208908e45ad7f15cf94dba26c1b960a/scrapy_frontera-0.2.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c385113eeca04111c4d822b39ceb2967", "sha256": "5779965ca31ffa9b688da26e9c8c830e47ccf898416a15aba7c2e686ac7027e4" }, "downloads": -1, "filename": "scrapy-frontera-0.2.5.tar.gz", "has_sig": false, "md5_digest": "c385113eeca04111c4d822b39ceb2967", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7381, "upload_time": "2019-01-22T19:52:15", "url": "https://files.pythonhosted.org/packages/92/c4/bf51facdb05261bea74898ccd6a8110ae5a4947ce921b8c9abf070263392/scrapy-frontera-0.2.5.tar.gz" } ], "0.2.8": [ { "comment_text": "", "digests": { "md5": "fb29a06c465db1d6c3843a507d1532f5", "sha256": "41529eaf70bcea0b0e4421c2cb09258335a870e476d9521a85ca1f9d4206f7f2" }, "downloads": -1, "filename": "scrapy_frontera-0.2.8-py3-none-any.whl", "has_sig": false, "md5_digest": "fb29a06c465db1d6c3843a507d1532f5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11228, "upload_time": "2019-06-01T19:43:22", "url": "https://files.pythonhosted.org/packages/ba/72/91f5262452bfba35220bd43737186f956e3d01bbf6f585d4a9ea4080de6f/scrapy_frontera-0.2.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ecfcb36d14b05f7a4dafff3c6408e18f", "sha256": "81c2613ee798997543ac6904b923d58d503f033f380325e7f62dc192b8e46b5c" }, "downloads": -1, "filename": "scrapy-frontera-0.2.8.tar.gz", "has_sig": false, 
"md5_digest": "ecfcb36d14b05f7a4dafff3c6408e18f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8093, "upload_time": "2019-06-01T19:43:23", "url": "https://files.pythonhosted.org/packages/35/77/701815180fb07ac2e3de6f57abdcac8690884b11536f1d6126bb8247b191/scrapy-frontera-0.2.8.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "fb29a06c465db1d6c3843a507d1532f5", "sha256": "41529eaf70bcea0b0e4421c2cb09258335a870e476d9521a85ca1f9d4206f7f2" }, "downloads": -1, "filename": "scrapy_frontera-0.2.8-py3-none-any.whl", "has_sig": false, "md5_digest": "fb29a06c465db1d6c3843a507d1532f5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11228, "upload_time": "2019-06-01T19:43:22", "url": "https://files.pythonhosted.org/packages/ba/72/91f5262452bfba35220bd43737186f956e3d01bbf6f585d4a9ea4080de6f/scrapy_frontera-0.2.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ecfcb36d14b05f7a4dafff3c6408e18f", "sha256": "81c2613ee798997543ac6904b923d58d503f033f380325e7f62dc192b8e46b5c" }, "downloads": -1, "filename": "scrapy-frontera-0.2.8.tar.gz", "has_sig": false, "md5_digest": "ecfcb36d14b05f7a4dafff3c6408e18f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8093, "upload_time": "2019-06-01T19:43:23", "url": "https://files.pythonhosted.org/packages/35/77/701815180fb07ac2e3de6f57abdcac8690884b11536f1d6126bb8247b191/scrapy-frontera-0.2.8.tar.gz" } ] }