Python Scrapy: How to get CSVItemExporter to write columns in a specific order

半阙折子戏 2020-12-09 05:31

In Scrapy, I have my items specified in a certain order in items.py, & my spider has those items again in the same order. However, when I run the spider & save the r

2 Answers
  •  借酒劲吻你
    2020-12-09 06:04

    This is related to Modifiying CSV export in scrapy

    The problem is that the exporter is instantiated without any keyword parameters, so the keywords like EXPORT_FIELDS are ignored. The solution is the same: you need to subclass the CSV item exporter to pass the keyword parameters.

    Following the above recipe, I created a new file xyzzy/feedexport.py (change "xyzzy" to whatever your Scrapy project package is named):

    """
    The standard CsvItemExporter class does not pass the kwargs through to the
    CSV writer, resulting in EXPORT_FIELDS and EXPORT_ENCODING being ignored
    (EXPORT_EMPTY is not used by CSV).
    """
    
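    # scrapy.conf and scrapy.contrib were removed in later Scrapy releases;
    # these are the current import paths.
    from scrapy.exporters import CsvItemExporter
    from scrapy.utils.project import get_project_settings
    
    # Load the project's settings (settings.py) once at import time.
    settings = get_project_settings()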
    
    class CSVkwItemExporter(CsvItemExporter):
    
        def __init__(self, *args, **kwargs):
            kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
            kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')
    
            super(CSVkwItemExporter, self).__init__(*args, **kwargs)
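
What a fixed field list changes can be illustrated without Scrapy at all. The following is a minimal stdlib-only sketch, not Scrapy's implementation: `write_rows` and the sample row are hypothetical names made up for the illustration. It writes the same dict twice, once with the column order taken from the dict itself and once with an explicit list, which is roughly the role `fields_to_export` plays for the exporter.

```python
import csv
import io


def write_rows(rows, fields=None):
    """Return CSV text for a list of dicts (hypothetical helper).

    If `fields` is given, columns follow that order; otherwise they
    follow the key order of the first row.
    """
    fieldnames = list(fields) if fields else list(rows[0].keys())
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# The item was populated in a different order than we want in the file.
rows = [{"field3": "c", "field1": "a", "field2": "b"}]

print(write_rows(rows))                                          # order of the dict
print(write_rows(rows, fields=["field1", "field2", "field3"]))   # explicit order
```

With the explicit list, the header and every row come out as field1, field2, field3 regardless of how the item was filled in.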
    

    and then added it into xyzzy/settings.py:

    FEED_EXPORTERS = {
        'csv': 'xyzzy.feedexport.CSVkwItemExporter'
    }
    

    Now the CSV exporter will honor the EXPORT_FIELDS setting. Also add the following to xyzzy/settings.py:

    # By specifying the fields to export, the CSV export honors the order
    # rather than using a random order.
    EXPORT_FIELDS = [
        'field1',
        'field2',
        'field3',
    ]
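
On newer Scrapy releases (1.0 and later, as far as I know) the custom exporter may not be needed: there is a built-in `FEED_EXPORT_FIELDS` setting that controls which fields are exported and in what order. A sketch of the equivalent settings.py entry, assuming the same three placeholder field names used above:

```python
# settings.py (newer Scrapy): built-in setting, no custom exporter required.
# The field names are the same placeholders as in EXPORT_FIELDS above.
FEED_EXPORT_FIELDS = [
    "field1",
    "field2",
    "field3",
]
```

If this setting is available in your version, the `FEED_EXPORTERS` override and the `CSVkwItemExporter` subclass can likely be dropped.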
    
