{ "info": { "author": "Kevin Lloyd Bernal", "author_email": "kevinoxy@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development", "Topic :: Software Development :: Libraries :: Application Frameworks", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: System :: Monitoring" ], "description": "\nscrapy-loader-upkeep \n====================\n\n.. image:: https://img.shields.io/pypi/pyversions/scrapy-loader-upkeep.svg\n :target: https://pypi.python.org/pypi/scrapy-loader-upkeep\n\n.. image:: https://img.shields.io/pypi/v/scrapy-loader-upkeep.svg\n :target: https://pypi.python.org/pypi/scrapy-loader-upkeep\n\n.. image:: https://travis-ci.org/BurnzZ/scrapy-loader-upkeep.svg?branch=master\n :target: https://travis-ci.org/BurnzZ/scrapy-loader-upkeep\n\n.. 
image:: https://codecov.io/gh/BurnzZ/scrapy-loader-upkeep/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/BurnzZ/scrapy-loader-upkeep\n\nOverview\n~~~~~~~~\nThis improves over the built-in ``ItemLoader`` of **Scrapy** by adding features\nthat focus on the **maintainability** of the spider over time.\n\nThis allows developers to keep track of how often parsers are being used on a\ncrawl, so that obsolete css/xpath fallback rules can be safely removed.\n\nMotivation\n~~~~~~~~~~\nScrapy supports adding multiple css/xpath rules in its ``ItemLoader`` by default\nin order to provide a convenient way for developers to keep up with site changes.\n\nHowever, some sites change layouts more often than others, while some perform\nA/B tests for weeks/months where developers need to accommodate those changes.\n\nThese fallback css/xpath rules become obsolete quickly and fill up the project\nwith potentially dead code, posing a threat to the spiders' long-term maintenance.\n\nOriginal idea proposal: https://github.com/scrapy/scrapy/issues/3795\n\nUsage\n~~~~~\n.. code-block:: python\n\n from scrapy_loader_upkeep import ItemLoader\n\n class SiteItemLoader(ItemLoader):\n pass\n\nUsing it inside a spider callback would look like:\n\n.. code-block:: python\n\n def parse(self, response):\n loader = SiteItemLoader(response=response, stats=self.crawler.stats)\n\nNothing changes in the usage of this ``ItemLoader`` except that the stats\ndependency must be injected into it, which is necessary to keep track of the usage\nof the parser rules.\n\nThis only works for the following ``ItemLoader`` methods:\n\n- ``add_css()``\n- ``replace_css()``\n- ``add_xpath()``\n- ``replace_xpath()``\n\nBasic Spider Example\n~~~~~~~~~~~~~~~~~~~~\nThis is taken from the `examples/ \n`_\ndirectory.\n\n.. code-block:: bash\n\n $ scrapy crawl quotestoscrape_simple_has_missing\n\nThis should output the following in the stats:\n\n.. 
code-block:: python\n\n 2019-06-16 14:32:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:\n { ...\n 'parser/QuotesItemLoader/author/css/1': 10,\n 'parser/QuotesItemLoader/quote/css/1/missing': 10,\n 'parser/QuotesItemLoader/quote/css/2': 10\n ...\n }\n\nIn this example, we can see that the **1st css** rule for the ``quote`` field\nwas never matched during the scrape, while the **2nd css** rule matched 10 times.\n\nNew Feature\n~~~~~~~~~~~\n\nAs the example above shows, we're limited to the positional context of when\nthe ``add_css()``, ``add_xpath()``, etc. were called during the execution.\n\nThere will be cases where developers will be maintaining a large spider with a\nlot of different parsers to handle varying layouts in the site. It would make\nsense to have better context for what a parser does or is for.\n\nA new optional ``name`` parameter is supported to provide more context around a\ngiven parser. This supports the two (2) main ways of creating fallback parsers:\n\n1. **multiple calls**\n\n.. code-block:: python\n\n loader.add_css('NAME', 'h1::text', name='Name from h1')\n loader.add_css('NAME', 'meta[value=\"title\"]::attr(content)', name=\"Name from meta tag\")\n\nwould result in something like:\n\n.. code-block:: python\n\n { ...\n 'parser/QuotesItemLoader/NAME/css/1/Name from h1': 8,\n 'parser/QuotesItemLoader/NAME/css/1/Name from h1/missing': 2,\n 'parser/QuotesItemLoader/NAME/css/2/Name from meta tag': 7,\n 'parser/QuotesItemLoader/NAME/css/2/Name from meta tag/missing': 3,\n ...\n }\n\n2. **grouped parsers in a single call**\n\n.. code-block:: python\n\n loader.add_css(\n 'NAME',\n [\n 'h1::text',\n 'meta[value=\"title\"]::attr(content)',\n ],\n name='NAMEs at the main content')\n loader.add_css(\n 'NAME',\n [\n 'footer .name::text',\n 'div.page-end span.name::text',\n ],\n name='NAMEs at the bottom of the page')\n\nwould result in something like:\n\n.. 
code-block:: python\n\n { ...\n 'parser/QuotesItemLoader/NAME/css/1/NAMEs at the main content': 8,\n 'parser/QuotesItemLoader/NAME/css/1/NAMEs at the main content/missing': 2,\n 'parser/QuotesItemLoader/NAME/css/2/NAMEs at the main content': 7,\n 'parser/QuotesItemLoader/NAME/css/2/NAMEs at the main content/missing': 3,\n 'parser/QuotesItemLoader/NAME/css/3/NAMEs at the bottom of the page': 8,\n 'parser/QuotesItemLoader/NAME/css/3/NAMEs at the bottom of the page/missing': 2,\n 'parser/QuotesItemLoader/NAME/css/4/NAMEs at the bottom of the page': 7,\n 'parser/QuotesItemLoader/NAME/css/4/NAMEs at the bottom of the page/missing': 3,\n ...\n }\n\nThe latter is useful in grouping fallback parsers together if they are quite\nrelated in terms of layout/arrangement in the page.\n\n\nRequirements\n~~~~~~~~~~~~\nPython 3.6+\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/BurnzZ/scrapy-loader-upkeep", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "scrapy-loader-upkeep", "package_url": "https://pypi.org/project/scrapy-loader-upkeep/", "platform": "", "project_url": "https://pypi.org/project/scrapy-loader-upkeep/", "project_urls": { "Homepage": "https://github.com/BurnzZ/scrapy-loader-upkeep" }, "release_url": "https://pypi.org/project/scrapy-loader-upkeep/0.1.0/", "requires_dist": [ "scrapy" ], "requires_python": ">=3.6", "summary": "An alternative to the built-in ItemLoader of Scrapy which focuses on maintainability of fallback parsers.", "version": "0.1.0" }, "last_serial": 5709324, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "6b8217f64684474c57d1dd4b87f066c5", "sha256": "e4307d949841ab6c01d122c7bff1f6e564fed54728c783fec9673d528090f609" }, "downloads": -1, "filename": "scrapy_loader_upkeep-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "6b8217f64684474c57d1dd4b87f066c5", 
"packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 6411, "upload_time": "2019-08-21T12:16:13", "url": "https://files.pythonhosted.org/packages/9a/cb/c7d5146acabeb40de1cb334541c083c4c1eee80966d4e7a53f7bd5ff5ab0/scrapy_loader_upkeep-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2ff758f7f4740719515248394eaae2ec", "sha256": "6c1ce5e220b6c698e402ad4cb459a2fd8a9289a7ec49cd58f60b8b103a61ef68" }, "downloads": -1, "filename": "scrapy-loader-upkeep-0.1.0.tar.gz", "has_sig": false, "md5_digest": "2ff758f7f4740719515248394eaae2ec", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5213, "upload_time": "2019-08-21T12:16:15", "url": "https://files.pythonhosted.org/packages/9c/c5/2711fa0d11022b2e48ac2015d820cf3afb272343acd73845397b62f3aac1/scrapy-loader-upkeep-0.1.0.tar.gz" } ], "0.1a1": [ { "comment_text": "", "digests": { "md5": "7410c37da058b1125ccc838b594ab39f", "sha256": "2b09778a18d20b1d0e149092129d54054c9afc325e4329925f976a6e907e505f" }, "downloads": -1, "filename": "scrapy_loader_upkeep-0.1a1-py3-none-any.whl", "has_sig": false, "md5_digest": "7410c37da058b1125ccc838b594ab39f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 4573, "upload_time": "2019-07-11T14:34:36", "url": "https://files.pythonhosted.org/packages/e3/c6/e60292985fbdcac5a3ba93b0470188514b9cbc2ec89d7ca30fac89f65979/scrapy_loader_upkeep-0.1a1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ad5344245cd1e15b43a35a2e72aeda84", "sha256": "2c1df2a154e0972d965d8f1662f9f7fcdb8348f7bb394723eb065d9c76cb9be9" }, "downloads": -1, "filename": "scrapy-loader-upkeep-0.1a1.tar.gz", "has_sig": false, "md5_digest": "ad5344245cd1e15b43a35a2e72aeda84", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 4165, "upload_time": "2019-07-11T14:34:38", "url": 
"https://files.pythonhosted.org/packages/55/a2/1f3a914e9f65f92318db8d9c5954e9b7ad9862463de790282a678ef896f9/scrapy-loader-upkeep-0.1a1.tar.gz" } ], "0.1a2": [ { "comment_text": "", "digests": { "md5": "64482259e3b26ed3c4adb7ba517a7bfa", "sha256": "f2f6bb05f4f712fcb588111a5b28e0041a9a7e85bd35a798bc5d9f4531f5e541" }, "downloads": -1, "filename": "scrapy_loader_upkeep-0.1a2-py3-none-any.whl", "has_sig": false, "md5_digest": "64482259e3b26ed3c4adb7ba517a7bfa", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 5651, "upload_time": "2019-07-13T02:10:57", "url": "https://files.pythonhosted.org/packages/f5/4f/2b8040a2215436ccf18dc27dd95696e4273a4f5eb65dcc1cac068fe41817/scrapy_loader_upkeep-0.1a2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8aa8a5d8eda30fdcee7e36255c17f180", "sha256": "4381bfdfca67faab40a44867957fdd6ae2739750c724ff1faaff7387d4a5510f" }, "downloads": -1, "filename": "scrapy-loader-upkeep-0.1a2.tar.gz", "has_sig": false, "md5_digest": "8aa8a5d8eda30fdcee7e36255c17f180", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 4301, "upload_time": "2019-07-13T02:10:59", "url": "https://files.pythonhosted.org/packages/40/14/b7c35d088630c6eac5afb0aa4bb61ca9203766730cf1a8b0e6b8c3791f54/scrapy-loader-upkeep-0.1a2.tar.gz" } ], "0.1a3": [ { "comment_text": "", "digests": { "md5": "c235896b304206a4d087ff8b166ae218", "sha256": "5342c7fb57af1694a35bad12f6e4770a7d018ac763c3d3bf46902f193c12bf01" }, "downloads": -1, "filename": "scrapy_loader_upkeep-0.1a3-py3-none-any.whl", "has_sig": false, "md5_digest": "c235896b304206a4d087ff8b166ae218", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 6383, "upload_time": "2019-07-21T05:38:46", "url": "https://files.pythonhosted.org/packages/83/9c/520ea077f586f2b9bbb849f531f2441a52cc1a0eb790a1dbfb27d095f05e/scrapy_loader_upkeep-0.1a3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"e620b465c93dd082b7db8bd37cf0dedc", "sha256": "b9abacae4264a57be40e1dc3ff2418e678ab53861c18de1b5da52bb33fbfb166" }, "downloads": -1, "filename": "scrapy-loader-upkeep-0.1a3.tar.gz", "has_sig": false, "md5_digest": "e620b465c93dd082b7db8bd37cf0dedc", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5187, "upload_time": "2019-07-21T05:38:48", "url": "https://files.pythonhosted.org/packages/25/94/4f5d49e794da37f54bee5a72c3acf56a7fb0a3b121063fcb3a298fb641fa/scrapy-loader-upkeep-0.1a3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6b8217f64684474c57d1dd4b87f066c5", "sha256": "e4307d949841ab6c01d122c7bff1f6e564fed54728c783fec9673d528090f609" }, "downloads": -1, "filename": "scrapy_loader_upkeep-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "6b8217f64684474c57d1dd4b87f066c5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 6411, "upload_time": "2019-08-21T12:16:13", "url": "https://files.pythonhosted.org/packages/9a/cb/c7d5146acabeb40de1cb334541c083c4c1eee80966d4e7a53f7bd5ff5ab0/scrapy_loader_upkeep-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2ff758f7f4740719515248394eaae2ec", "sha256": "6c1ce5e220b6c698e402ad4cb459a2fd8a9289a7ec49cd58f60b8b103a61ef68" }, "downloads": -1, "filename": "scrapy-loader-upkeep-0.1.0.tar.gz", "has_sig": false, "md5_digest": "2ff758f7f4740719515248394eaae2ec", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 5213, "upload_time": "2019-08-21T12:16:15", "url": "https://files.pythonhosted.org/packages/9c/c5/2711fa0d11022b2e48ac2015d820cf3afb272343acd73845397b62f3aac1/scrapy-loader-upkeep-0.1.0.tar.gz" } ] }