How do you split reading a large csv file into evenly-sized chunks in Python?


囚心锁ツ 2020-12-01 03:16

In its basic form, my current process looks like this:

import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('huge_file.csv', newline='') as f:
    reader = csv.reader(f)
    for line in reader:
        process_line(line)

3 Answers

独厮守ぢ (OP)

2020-12-01 03:29
There isn't a good way to do this for all .csv files, because a quoted field may contain a newline, so a line break is not always the end of a row. If your file has no such fields, you can divide it into chunks by using file.seek to skip a section of the file and then scanning one byte at a time to find the end of the current row. Then you can process the two chunks independently. Something like the following (untested) code should get you started.
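```
import csv
import io

file_one = open('foo.csv', newline='')
raw_two = open('foo.csv', 'rb')   # binary mode, so byte offsets are exact
raw_two.seek(0, 2)                # seek to the end of the file
sz = raw_two.tell()               # fetch the offset (file size in bytes)
raw_two.seek(sz // 2)             # seek back to the middle (integer offset)
ch = b''
while ch != b'\n':                # scan forward to the end of the current row
    ch = raw_two.read(1)
    if not ch:                    # hit end of file without finding a newline
        break
# raw_two is now positioned at the start of a record
file_two = io.TextIOWrapper(raw_two, newline='')
segment_one = csv.reader(file_one)
segment_two = csv.reader(file_two)
```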
I'm not sure how you can tell that you have finished traversing segment_one. If the CSV has a column that is a row id, you can stop processing segment_one when you encounter the row id of the first row in segment_two.
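Another option, if there is no row-id column, might be to stop on byte offsets instead. The sketch below is only an illustration of that idea, not something from the original post: `chunk_offsets` and `iter_chunk_rows` are hypothetical helper names, and it assumes a UTF-8 file in which no quoted field contains an embedded newline. It extends the two-way split to any number of chunks by recording where each chunk starts and ends, then reading each chunk only up to its end offset.

```python
import csv
import os


def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges, each starting at the start of a row.

    Assumption: no quoted field contains an embedded newline.
    """
    size = os.path.getsize(path)
    boundaries = [0]
    with open(path, 'rb') as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)  # jump near the i-th split point
            f.readline()                  # skip to the end of the current row
            pos = f.tell()
            # Keep the boundary only if it moves forward and is not the end.
            if boundaries[-1] < pos < size:
                boundaries.append(pos)
    boundaries.append(size)
    return list(zip(boundaries[:-1], boundaries[1:]))


def iter_chunk_rows(path, start, end, encoding='utf-8'):
    """Return a csv reader over the rows stored in bytes [start, end)."""
    def lines():
        with open(path, 'rb') as f:
            f.seek(start)
            # Stop as soon as the read position reaches the chunk's end.
            while f.tell() < end:
                yield f.readline().decode(encoding)
    return csv.reader(lines())
```

With these helpers, something like `for start, end in chunk_offsets('huge_file.csv', 4): ...` followed by `for line in iter_chunk_rows('huge_file.csv', start, end): process_line(line)` would visit every row once. The chunks are roughly even in bytes, not in row count, and each range could be handed to a separate worker since every reader opens its own file handle.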