How do you split reading a large csv file into evenly-sized chunks in Python?

囚心锁ツ 2020-12-01 03:16

Basically, I have the following process.

import csv
reader = csv.reader(open('huge_file.csv', 'rb'))

for line in reader:
    process_line(line)
3 Answers
  •  独厮守ぢ
    2020-12-01 03:29

    There isn't a good way to do this for all .csv files. You should be able to divide the file into chunks using file.seek to skip to a section of the file, then scan one byte at a time until you find the end of a row. After that you can process the two chunks independently. Something like the following (untested) code should get you started.

    import csv

    file_one = open('foo.csv')
    file_two = open('foo.csv')
    file_two.seek(0, 2)       # seek to the end of the file
    sz = file_two.tell()      # fetch the total size in bytes
    file_two.seek(sz // 2)    # seek back to (roughly) the middle
    while file_two.read(1) not in ('\n', ''):
        pass                  # scan forward to the start of the next record
    # file_two is now positioned at the start of a record
    segment_one = csv.reader(file_one)
    segment_two = csv.reader(file_two)


    I'm not sure how you can tell that you have finished traversing segment_one. If you have a column in the CSV that is a row id, then you can stop processing segment_one when you encounter the row id from the first row in segment_two.
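    A minimal sketch of that row-id idea (untested, assuming column 0 holds a unique id, reusing the hypothetical process_line() from the question, and ignoring edge cases such as the midpoint landing inside the last line):

    import csv

    file_one = open('foo.csv')
    file_two = open('foo.csv')
    file_two.seek(0, 2)
    file_two.seek(file_two.tell() // 2)        # jump to roughly the middle
    while file_two.read(1) not in ('\n', ''):  # align on the next record boundary
        pass

    segment_two = csv.reader(file_two)
    boundary_row = next(segment_two)           # first complete row of the second half
    boundary_id = boundary_row[0]              # assumes column 0 is a unique row id

    # First half: stop as soon as we reach the row where the second half begins.
    for row in csv.reader(file_one):
        if row[0] == boundary_id:
            break
        process_line(row)

    # Second half: the boundary row itself, then everything that follows it.
    process_line(boundary_row)
    for row in segment_two:
        process_line(row)

    The same trick should extend to more than two chunks: pick evenly spaced byte offsets, align each on a newline, and let each chunk run until it sees the first row id of the chunk after it.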
