Question
The Scrapy documentation lists all the built-in methods of ItemLoader instances and explains how to declare your own Item Loaders. However, any ItemLoaders you declare will apply to all processed items. You can modify their behavior a little with Item Loader Contexts, but this is frequently not granular enough.
Suppose I have a Scrapy project where the spiders and items all inherit from the same base spider and item loaders, but each spider contains site-specific logic built on a handful of common functions. Nowhere in the Scrapy documentation do I find mention of adding custom methods to ItemLoaders so that instead of:
    import mymodule

    class MySpider(BaseSpiderName):
        def parse_item(self, response):
            product = ItemLoader(item=Product(), response=response)
            new_value = mymodule.myfunction(argument, ..., ...)
            product.add_value('my_field', new_value)
You could write:
    # (no extra import)
    class MySpider(BaseSpiderName):
        def parse_item(self, response):
            product = CustomItemLoader(item=Product(), response=response)
            product.custom_function(argument, ..., ...)
Even though this seems like an obvious way to extend ItemLoaders like you would for any other class, it's not documented and I don't see examples of how to do this in Scrapy anywhere I've checked (Google, StackOverflow). Is it possible/supported, and how would you declare them?
Answer 1:
Is it possible/supported, and how would you declare them?
It is possible. Which way to do it depends on the type of logic you are sharing.
You can declare your methods in a Scrapy-agnostic way, i.e. as you would with any other Python class: subclass ItemLoader as CustomItemLoader and define the method in that subclass:
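    import mymodule  # the module from the question that provides myfunction
    from scrapy.loader import ItemLoader

    class CustomItemLoader(ItemLoader):
        def custom_function(self, *args, **kwargs):
            # Compute the value with the shared helper and add it to the field
            self.add_value('my_field', mymodule.myfunction(*args, **kwargs))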
Alternatively, depending on the logic that the shared function actually contains, a simple processor that you pass to your add_* methods (add_value, add_xpath, add_css) might be the way to go.
Source: https://stackoverflow.com/questions/54493641/extend-a-scrapy-itemloader-with-custom-methods