As I started to learn Scrapy, I came across a requirement to build the Item attributes dynamically. I'm just scraping a webpage which has a table structure and I wanted t
The custom __setitem__ solution didn't work for me when using item loaders in Scrapy 1.0.3, because the item loader accesses the fields attribute directly:
value = self.item.fields[field_name].get(key, default)
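To see why that line bypasses a custom __setitem__, here is a minimal stdlib-only model of the two code paths. Field, PlainItem and loader_lookup are hypothetical stand-ins, not Scrapy classes; the __setitem__ is assumed to register unknown fields on first assignment, and loader_lookup only copies the quoted loader line.

```python
class Field(dict):
    """Hypothetical stand-in for scrapy.Field: a dict of field metadata."""


class PlainItem:
    """Simplified item whose field registry is an ordinary dict."""

    fields = {"name": Field()}  # only 'name' is declared up front

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        # Assumed custom __setitem__: register unknown fields on the fly
        if key not in self.fields:
            self.fields[key] = Field()
        self._values[key] = value


def loader_lookup(item, field_name, key, default=None):
    # Same access pattern as the quoted loader line: it indexes
    # item.fields directly, so __setitem__ is never consulted
    return item.fields[field_name].get(key, default)


item = PlainItem()
item["price"] = "9.99"  # goes through __setitem__, so 'price' is registered
print(loader_lookup(item, "price", "input_processor"))

try:
    loader_lookup(item, "stock", "input_processor")  # never set via item[...]
except KeyError as exc:
    print("KeyError:", exc)
```

Only names that were already assigned through item[...] survive the loader's lookup; anything else fails with a plain dict.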
The custom __setitem__ is only called for item-level accesses like item['new field']. Since fields is just a dict mapping field names to scrapy.Field objects, I realized I could simply create an Item subclass whose fields is a defaultdict, so that looking up an unknown field name creates an empty Field instead of raising KeyError.
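The defaultdict behaviour this relies on can be checked in isolation. In this sketch Field is a hypothetical stand-in for scrapy.Field (which is itself a dict subclass); note that only indexing with [] triggers the default factory, while membership tests and .get() do not.

```python
from collections import defaultdict


class Field(dict):
    """Hypothetical stand-in for scrapy.Field (a plain dict subclass)."""


fields = defaultdict(Field)

# Indexing a missing key calls Field() and stores the new empty Field
meta = fields["price"]
print(type(meta).__name__, dict(meta))          # Field {}
print(fields["price"].get("input_processor"))   # None, no KeyError

# Membership tests and .get() do NOT call the default factory
print("stock" in fields)    # False
print(fields.get("stock"))  # None
print(sorted(fields))       # ['price']
```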
In the end, just two extra lines of code:
from collections import defaultdict
import scrapy

class FlexItem(scrapy.Item):
    """An Item that creates fields dynamically"""
    # Looking up an unknown name in fields creates an empty scrapy.Field
    fields = defaultdict(scrapy.Field)
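Here is a rough, stdlib-only model of how this plays out when loading table columns whose names aren't known in advance. SimpleFlexItem, Field and load_value are hypothetical simplifications, not Scrapy's Item or ItemLoader; the __setitem__ is assumed to accept only names already present in fields, and load_value assumes the loader looks up field metadata before assigning.

```python
from collections import defaultdict


class Field(dict):
    """Hypothetical stand-in for scrapy.Field."""


class SimpleFlexItem:
    """Simplified model of FlexItem; not Scrapy's real Item class."""

    # Class-level registry, as in FlexItem: unknown names get a new Field
    fields = defaultdict(Field)

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        # Assumed check: only names already in fields may be set
        if key not in self.fields:
            raise KeyError(f"unsupported field: {key}")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


def load_value(item, field_name, value):
    """Hypothetical loader step: look up field metadata, then assign."""
    # Same access pattern as the quoted loader line; with a defaultdict
    # this lookup registers field_name instead of raising KeyError
    processor = item.fields[field_name].get("input_processor", lambda v: v)
    item[field_name] = processor(value)


item = SimpleFlexItem()
for header, cell in [("Name", "Widget"), ("Price", "9.99")]:
    load_value(item, header, cell)

print(item["Name"], item["Price"])    # Widget 9.99
print(sorted(SimpleFlexItem.fields))  # ['Name', 'Price']
```

Two things to keep in mind with this model: because `in` never triggers the factory, a direct item['x'] = v for a name that was never looked up still raises KeyError, and since fields is a class attribute, every instance shares (and grows) the same registry.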