Correct way to nest Item data in Scrapy

故里飘歌 2020-12-05 08:36

What is the correct way to nest Item data?

For example, I want the output of a product:

{
'price': price,
'title': title,
'meta': {
    'url'


        
2 answers
  •  小蘑菇 (original poster)
    2020-12-05 09:01

    UPDATE from comments: It looks like nested loaders are the updated approach. Another comment suggests this approach will cause errors during serialization.

    The best way to approach this is to create a main item and a meta item, each with its own class and loader.

    from scrapy.item import Item, Field
    # The scrapy.contrib paths are gone from current Scrapy; use these instead
    from scrapy.loader import ItemLoader
    from itemloaders.processors import TakeFirst
    
    
    class MetaItem(Item):
        url = Field()
        added_on = Field()
    
    
    class MainItem(Item):
        price = Field()
        title = Field()
        meta = Field(serializer=MetaItem)
    
    
    class MainItemLoader(ItemLoader):
        default_item_class = MainItem
        default_output_processor = TakeFirst()
    
    
    class MetaItemLoader(ItemLoader):
        default_item_class = MetaItem
        default_output_processor = TakeFirst()
    

    Sample usage:

    from scrapy import Spider  # the old scrapy.spider module is gone from current Scrapy
    from qwerty.items import MainItemLoader, MetaItemLoader
    from scrapy.selector import Selector
    
    
    class DmozSpider(Spider):
        name = "dmoz"
        allowed_domains = ["example.com"]
        start_urls = ["http://example.com"]
    
        def parse(self, response):
            mainloader = MainItemLoader(selector=Selector(response))
            mainloader.add_value('title', 'test')
            mainloader.add_value('price', 'price')
            mainloader.add_value('meta', self.get_meta(response))
            return mainloader.load_item()
    
        def get_meta(self, response):
            metaloader = MetaItemLoader(selector=Selector(response))
            metaloader.add_value('url', response.url)
            metaloader.add_value('added_on', 'now')
            return metaloader.load_item()
    

    After that, you can easily expand your items in the future by creating more "sub-items."
