Extend a Scrapy ItemLoader with custom methods

Submitted by 点点圈 on 2020-04-30 13:55:13

Question


The Scrapy documentation lists all the built-in methods of ItemLoader instances and explains how to declare your own Item Loaders. However, any ItemLoaders you declare will apply to all processed items. You can modify their behavior a little with Item Loader Contexts, but this is frequently not granular enough.

Suppose I have a Scrapy project where the spiders and items all inherit from the same base spider and item loaders, but each spider contains site-specific logic plus a handful of common functions. Nowhere in the Scrapy documentation do I find any mention of adding your own methods to ItemLoaders so that, instead of:

import mymodule

class MySpider(BaseSpiderName):
  def parse_item(self, response):
    product = ItemLoader(item=Product(), response=response)
    new_value = mymodule.myfunction(argument, ..., ...)
    product.add_value('my_field', new_value)

You could write:

# (no extra import)
class MySpider(BaseSpiderName):
  def parse_item(self, response):
    product = CustomItemLoader(item=Product(), response=response)
    product.custom_function(argument, ..., ...)

Even though this seems like an obvious way to extend ItemLoaders, just as you would any other class, it is not documented, and I have not found examples of how to do this in Scrapy anywhere I've checked (Google, StackOverflow). Is it possible/supported, and how would you declare them?


Answer 1:


"Is it possible/supported, and how would you declare them?"

It is possible. Which way to do it depends on the type of logic you are sharing.

You can declare your methods in a Scrapy-agnostic way, i.e. as you would with any other Python class: subclass ItemLoader (or your project's base item loader) and define the method in that subclass:

from scrapy.loader import ItemLoader

import mymodule


class CustomItemLoader(ItemLoader):

    def custom_function(self, *args, **kwargs):
        # Compute the value with the shared helper and store it on the item.
        self.add_value('my_field', mymodule.myfunction(*args, **kwargs))

Alternatively, depending on the actual logic that you have in that function shared by some spiders, a simple processor that you pass to your add_* methods might be the way to go.
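
As a rough illustration of the processor route: a processor is just a callable that takes a value and returns the processed value, so the shared logic can live in a plain function. The sketch below is a minimal example under assumptions. The name clean_price and the price-cleaning logic are hypothetical and stand in for whatever your spiders actually share.

```python
import re


# Hypothetical shared helper; it is not part of Scrapy and is named
# here only for illustration.
def clean_price(value):
    """Turn a raw price string such as ' $1,299.00 ' into a float."""
    # Drop everything except digits and the decimal point.
    digits = re.sub(r"[^\d.]", "", value)
    return float(digits)
```

In a spider, a function like this could then be passed after the value, for example product.add_value('price', raw_price, clean_price), assuming the item has a 'price' field and raw_price is a single string. The spider only needs to import the function, and no ItemLoader subclass is required.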



Source: https://stackoverflow.com/questions/54493641/extend-a-scrapy-itemloader-with-custom-methods
