Scrapy: Define items dynamically

别跟我提以往 2021-02-05 10:23

As I started to learn Scrapy, I came across a requirement to dynamically build the Item attributes. I'm just scraping a webpage which has a table structure and I wanted t…

5 Answers
  •  萌比男神i
    2021-02-05 11:00

    The custom __setitem__ solution didn't work for me when using item loaders in Scrapy 1.0.3 because the item loader accesses the fields attribute directly:

    value = self.item.fields[field_name].get(key, default)
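    To see why this bypasses a custom __setitem__, here is a simplified pure-Python stand-in (not Scrapy itself). MiniItem and get_field_attr are hypothetical names: MiniItem registers unknown fields on assignment, and get_field_attr mirrors the loader line above. Assignment goes through __setitem__, but the loader-style lookup indexes the plain fields dict directly, so it raises KeyError for a field that has not been assigned yet (the real loader looks up the input processor when you add a value, before anything is stored).

```python
# Simplified stand-in for the "custom __setitem__" approach. These classes are
# hypothetical illustrations, not Scrapy's real Item or ItemLoader.


class MiniItem:
    # Class-level field registry, like scrapy.Item.fields (a plain dict here).
    fields = {}

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        # Register the field on first assignment, then store the value.
        if key not in self.fields:
            self.fields[key] = {}
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


def get_field_attr(item, field_name, key, default=None):
    # Mirrors the loader line: self.item.fields[field_name].get(key, default)
    return item.fields[field_name].get(key, default)


item = MiniItem()
item["price"] = "9.99"  # goes through __setitem__, so "price" is registered
print(get_field_attr(item, "price", "output_processor"))  # None

try:
    # The loader asks about a field before anything was assigned to it.
    get_field_attr(item, "title", "output_processor")
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'title'
```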
    

    The custom __setitem__ is only called for item-level accesses like item['new field']. Since fields is just a dict, I realized I could simply create an Item subclass whose fields is a defaultdict, so that looking up an unknown field name creates an empty Field on the fly instead of raising a KeyError.
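    This relies on standard defaultdict behaviour: indexing a missing key calls the factory and stores the result, while `in` and .get() never do. The sketch below uses plain dict as the factory (scrapy.Field is itself a dict subclass) so it runs without Scrapy; the field names are made up for illustration.

```python
from collections import defaultdict

# dict stands in for scrapy.Field here (Field subclasses dict).
fields = defaultdict(dict)

# Membership tests and .get() do NOT create entries.
print("title" in fields)    # False
print(fields.get("title"))  # None
print(len(fields))          # 0

# Indexing a missing key calls the factory and stores an empty "field".
meta = fields["title"]
print(meta)                 # {}
print("title" in fields)    # True

# Lookups like the loader's .get(key, default) then work on the new entry.
print(fields["price"].get("output_processor", "default"))  # default
print(sorted(fields))       # ['price', 'title']
```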

    In the end, just two extra lines of code:

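    from collections import defaultdict

    import scrapy


    class FlexItem(scrapy.Item):
        """An Item that creates fields dynamically"""
        # Looking up an unknown name in `fields` (as ItemLoader does) creates
        # an empty scrapy.Field on the fly instead of raising KeyError.
        fields = defaultdict(scrapy.Field)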
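    Here is a Scrapy-free model of how FlexItem behaves, assuming it follows the logic of Scrapy's Item (assignment checks `key in self.fields`) and ItemLoader (metadata lookups index fields[field_name]). FlexLike and loader_lookup are hypothetical names. Two consequences are worth checking against your Scrapy version: a field only exists once something indexes fields (the loader does this when you add a value), so a direct item['new'] = value on a fresh name may still raise KeyError; and because fields is a class attribute, every instance shares the dynamically created fields.

```python
from collections import defaultdict


class FlexLike:
    """Hypothetical stand-in for FlexItem; dict plays the role of scrapy.Field."""

    # Class attribute: shared by every instance, like scrapy.Item.fields.
    fields = defaultdict(dict)

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        # Assumed to mirror scrapy.Item: only known fields are accepted.
        # Note that `in` does not trigger the defaultdict factory.
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


def loader_lookup(item, field_name, key, default=None):
    # Same shape as the loader line: item.fields[field_name].get(key, default).
    # Indexing the defaultdict creates the field if it is missing.
    return item.fields[field_name].get(key, default)


item = FlexLike()

try:
    item["title"] = "Widget"  # direct assignment to a never-seen field
except KeyError as exc:
    print("direct assignment failed:", exc)

# A loader-style lookup first creates the field, after which assignment works.
print(loader_lookup(item, "title", "input_processor"))  # None
item["title"] = "Widget"
print(item["title"])  # Widget

# The dynamically created field is visible from every instance.
print("title" in FlexLike().fields)  # True
```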
    
