How can I use the fields_to_export attribute in BaseItemExporter to order my Scrapy CSV data?

死守一世寂寞 asked 2020-12-05 00:42

I have made a simple Scrapy spider that I use from the command line to export my data into the CSV format, but the order of the data seems random. How can I order the CSV file?

2 Answers
  •  独厮守ぢ
    2020-12-05 01:23

    You can now specify settings in the spider itself. https://doc.scrapy.org/en/latest/topics/settings.html#settings-per-spider

    To set the field order for exported feeds, set FEED_EXPORT_FIELDS. https://doc.scrapy.org/en/latest/topics/feed-exports.html#feed-export-fields
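    If you prefer a project-wide setting instead of a per-spider one, the same key can live in your project's `settings.py` (a minimal sketch; the field names are the ones used by the spider below):

    ```python
    # settings.py -- applies to every spider in the project
    FEED_EXPORT_FIELDS = ["page", "page_ix", "text", "url"]
    ```

    A spider's `custom_settings` takes precedence over `settings.py`, so the per-spider version below would override this.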

    The spider below dumps all links on a website (written against Scrapy 1.4.0):

    import scrapy
    from scrapy.http import HtmlResponse
    
    class DumplinksSpider(scrapy.Spider):
      name = 'dumplinks'
      allowed_domains = ['www.example.com']
      start_urls = ['http://www.example.com/']
      custom_settings = {
        # specifies exported fields and order
        'FEED_EXPORT_FIELDS': ["page", "page_ix", "text", "url"],
      }
    
      def parse(self, response):
        if not isinstance(response, HtmlResponse):
          return
    
        a_selectors = response.xpath('//a')
        for i, a_selector in enumerate(a_selectors):
          text = a_selector.xpath('normalize-space(text())').extract_first()
          url = a_selector.xpath('@href').extract_first()
          if url is None:
            continue  # skip anchors without an href; response.follow(None) would raise
          yield {
            'page_ix': i + 1,
            'page': response.url,
            'text': text,
            'url': url,
          }
          yield response.follow(url, callback=self.parse)  # see allowed_domains
    

    Run with this command:

    scrapy crawl dumplinks --loglevel=INFO -o links.csv
    

    Fields in links.csv are ordered as specified by FEED_EXPORT_FIELDS.
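    The ordering mechanism itself is the same idea as passing an explicit `fieldnames` list to the standard library's `csv.DictWriter` -- which is roughly what Scrapy's `CsvItemExporter` does with `fields_to_export`. A standalone sketch outside Scrapy (the sample row is hypothetical, shaped like the items yielded above):

    ```python
    import csv
    import io

    # One scraped item, shaped like the dicts yielded by the spider above
    rows = [
        {'page_ix': 1, 'page': 'http://www.example.com/', 'text': 'Home', 'url': '/'},
    ]

    buf = io.StringIO()
    # fieldnames fixes the column order, just like FEED_EXPORT_FIELDS
    writer = csv.DictWriter(buf, fieldnames=['page', 'page_ix', 'text', 'url'])
    writer.writeheader()
    writer.writerows(rows)

    header = buf.getvalue().splitlines()[0]
    print(header)  # page,page_ix,text,url
    ```

    Whatever order the dicts' keys arrive in, the output columns follow `fieldnames`.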
