Export csv file from scrapy (not via command line)

春和景丽 2020-12-14 08:56

I can successfully export my items to a CSV file from the command line like this:

   scrapy crawl spiderName -o filename.csv

My question

3 Answers
  • 2020-12-14 09:02

    The up-to-date answer is:

    Use the built-in feed exporter. The output filename is the key of the FEEDS dictionary. The config may look like:

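    import scrapy

    filename = 'export'

    class mySpider(scrapy.Spider):
      name = 'my_spider'  # placeholder; every spider needs a name
      custom_settings = {
        # FEEDS requires Scrapy 2.1+; the 'overwrite' option requires 2.4+.
        'FEEDS': {
          f'{filename}.csv': {
            'format': 'csv',
            'overwrite': True
          }
        }
      }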
    

    Documentation: https://docs.scrapy.org/en/latest/topics/feed-exports.html#std-setting-FEEDS
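
    Since the question is about avoiding the command line, the same FEEDS setting can also be handed to `CrawlerProcess` when the crawl is started from a plain Python script. The following is only a sketch: `feed_settings` and `run` are hypothetical helper names, and the spider class is assumed to be defined elsewhere in your project.

```python
def feed_settings(filename):
    """Build a settings dict that exports items to <filename>.csv."""
    return {
        "FEEDS": {
            # 'overwrite' needs Scrapy 2.4+; drop it on older versions.
            f"{filename}.csv": {"format": "csv", "overwrite": True},
        },
    }


def run(spider_cls, filename="export"):
    """Run one spider from a script and write its items to a CSV feed."""
    # Imported here so the settings helper stays usable without starting a crawl.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=feed_settings(filename))
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl has finished
```

    Calling `run(mySpider)` from a script should then behave much like `scrapy crawl spiderName -o export.csv`, with the difference that `overwrite` replaces the file instead of appending to it.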

  • 2020-12-14 09:04

    Why not use an item pipeline?

    WriteToCsv.py

       import csv
       from YOUR_PROJECT_NAME_HERE import settings

       def write_to_csv(item):
           # Append one row per item; the with-block closes the file after each write.
           with open(settings.csv_file_path, 'a', newline='') as f:
               writer = csv.writer(f, lineterminator='\n')
               writer.writerow([item[key] for key in item.keys()])

       class WriteToCsv(object):
           def process_item(self, item, spider):
               write_to_csv(item)
               return item  # pass the item on to any later pipelines
    

    settings.py

       ITEM_PIPELINES = { 'project.pipelines_path.WriteToCsv.WriteToCsv' : A_NUMBER_HIGHER_THAN_ALL_OTHER_PIPELINES}
       csv_file_path = PATH_TO_CSV
    

    If you want items written to a separate CSV file for each spider, you could give each spider a CSV_PATH field. Then, in your pipeline, use the spider's field instead of the path from settings.

    This works; I tested it in my project.

    HTH

    http://doc.scrapy.org/en/latest/topics/item-pipeline.html
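
    The per-spider variant mentioned above could be sketched roughly as follows. This is not the code from the answer: the class name `PerSpiderCsvPipeline` and the fallback `DEFAULT_CSV_PATH` are made up for illustration, and the file is opened once per spider through the pipeline's `open_spider` / `close_spider` hooks instead of once per item.

```python
import csv

DEFAULT_CSV_PATH = "items.csv"  # hypothetical fallback when a spider has no CSV_PATH


class PerSpiderCsvPipeline:
    """Append each item as one CSV row to the file named by spider.CSV_PATH."""

    def open_spider(self, spider):
        path = getattr(spider, "CSV_PATH", DEFAULT_CSV_PATH)
        # newline="" lets the csv module control line endings itself.
        self.file = open(path, "a", newline="", encoding="utf-8")
        self.writer = csv.writer(self.file)

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # Values are written in the item's key order; no header row is added.
        self.writer.writerow([item[key] for key in item.keys()])
        return item
```

    It would be registered in ITEM_PIPELINES the same way as the WriteToCsv class, and each spider would set something like `CSV_PATH = 'quotes.csv'` as a class attribute.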

  • 2020-12-14 09:17

    That's what Feed Exports are for: http://doc.scrapy.org/en/latest/topics/feed-exports.html

    One of the most frequently required features when implementing scrapers is being able to store the scraped data properly and, quite often, that means generating an “export file” with the scraped data (commonly called “export feed”) to be consumed by other systems.

    Scrapy provides this functionality out of the box with the Feed Exports, which allows you to generate a feed with the scraped items, using multiple serialization formats and storage backends.
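
    As an illustration of the "multiple serialization formats" part, a project-wide `settings.py` could declare more than one feed at once. This is a sketch that assumes Scrapy 2.1 or later (where the FEEDS setting exists); the file names and the field names `title` and `url` are invented for the example.

```python
# settings.py fragment: every spider in the project writes both feeds.
FEEDS = {
    "items.csv": {
        "format": "csv",
        # Restrict and order the CSV columns (hypothetical field names).
        "fields": ["title", "url"],
    },
    "items.json": {
        "format": "json",
        "encoding": "utf8",
        "indent": 4,  # pretty-print the JSON output
    },
}
```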
