Python generator to read large CSV file

Submitted by 时光怂恿深爱的人放手 on 2019-12-18 08:47:18

Question


I need to write a Python generator that yields tuples (X, Y) coming from two different CSV files.

It should receive a batch size on init, read the two CSVs line by line, and yield a tuple (X, Y) for each line, where X and Y are arrays (the columns of the CSV files).

I've looked at examples of lazy reading, but I'm finding it difficult to adapt them to CSVs:

  • Lazy Method for Reading Big File in Python?
  • Read large text files in Python, line by line without loading it in to memory

Also, unfortunately Pandas Dataframes are not an option in this case.

Any snippet I can start from?

Thanks


Answer 1:


You can write a generator that reads rows from two different csv readers in lockstep and yields each pair of rows as a tuple of arrays. The code for that is:

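import csv
import numpy as np

def getData(filename1, filename2):
    # In Python 3 the csv module expects text-mode files opened with newline=""
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip advances both readers one row at a time and stops at the shorter file
        for row1, row2 in zip(reader1, reader2):
            # This will give arrays of floats; for other types change dtype
            # (the np.float alias was removed from NumPy, so plain float is used)
            yield (np.array(row1, dtype=float),
                   np.array(row2, dtype=float))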

for tup in getData("file1", "file2"):
    print(tup)
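
The question also mentions a batch size, which the generator above does not use. A minimal sketch of one way to add it, assuming a batch means up to batch_size consecutive row pairs and that every field parses as a float. The name batched_pairs and its parameters are hypothetical, not part of the original answer, and plain lists are used so the sketch needs only the standard library (wrap the results in np.array if arrays are required):

```python
import csv
from itertools import islice

def batched_pairs(filename1, filename2, batch_size):
    """Yield (X, Y) batches, each a list of up to batch_size rows of floats."""
    with open(filename1, newline="") as f1, open(filename2, newline="") as f2:
        # zip pulls one row from each reader at a time, so neither file
        # is ever loaded into memory as a whole.
        pairs = zip(csv.reader(f1), csv.reader(f2))
        while True:
            # Take the next batch_size row pairs (fewer at the end of the files).
            chunk = list(islice(pairs, batch_size))
            if not chunk:
                break
            xs = [[float(v) for v in row1] for row1, _ in chunk]
            ys = [[float(v) for v in row2] for _, row2 in chunk]
            yield xs, ys
```

Iterating with `for X, Y in batched_pairs("file1", "file2", 32):` would then hand back 32 rows at a time, with a shorter final batch if the row count is not a multiple of the batch size.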


Source: https://stackoverflow.com/questions/38584494/python-generator-to-read-large-csv-file
