How can I use the fields_to_export attribute in BaseItemExporter to order my Scrapy CSV data?

前端未结

关注

 2  1688

死守一世寂寞 2020-12-05 00:42

I have made a simple Scrapy spider that I use from the command line to export my data into the CSV format, but the order of the data seem random. How can I order the CSV fie

2条回答

独厮守ぢ (楼主)

2020-12-05 01:23

You can now specify settings in the spider itself. https://doc.scrapy.org/en/latest/topics/settings.html#settings-per-spider

To set the field order for exported feeds, set FEED_EXPORT_FIELDS. https://doc.scrapy.org/en/latest/topics/feed-exports.html#feed-export-fields

The spider below dumps all links on a website (written against Scrapy 1.4.0):

import scrapy
from scrapy.http import HtmlResponse

class DumplinksSpider(scrapy.Spider):
  name = 'dumplinks'
  allowed_domains = ['www.example.com']
  start_urls = ['http://www.example.com/']
  custom_settings = {
    # specifies exported fields and order
    'FEED_EXPORT_FIELDS': ["page", "page_ix", "text", "url"],
  }

  def parse(self, response):
    if not isinstance(response, HtmlResponse):
      return

    a_selectors = response.xpath('//a')
    for i, a_selector in enumerate(a_selectors):
      text = a_selector.xpath('normalize-space(text())').extract_first()
      url = a_selector.xpath('@href').extract_first()
      yield {
        'page_ix': i + 1,
        'page': response.url,
        'text': text,
        'url': url,
      }
      yield response.follow(url, callback=self.parse)  # see allowed_domains

Run with this command:

scrapy crawl dumplinks --loglevel=INFO -o links.csv

Fields in links.csv are ordered as specified by FEED_EXPORT_FIELDS.

0 讨论(0)

查看其它2个回答