Python, memory error, csv file too large [duplicate]

Submitted by 允我心安 on 2021-01-28 04:05:27

Question


I have a problem with a Python module that cannot handle importing a big data file (targets.csv weighs nearly 1 GB).

The error happens when this line is executed:

targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

Traceback:

Traceback (most recent call last):
  File "C:\Users\gary\Documents\EPSON STUDIES\colors_text_D65.py", line 41, in <module>
    for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
MemoryError

I was wondering if there's a way to open targets.csv line by line, and also whether this would slow down the process.

This module is already pretty slow...

Thanks!

import geometry  # helper module (not in the standard library): provides tetgen_of_hull() and containing_tet()
import csv
import numpy as np
import random
import cv2

S = 0


img = cv2.imread("MAP.tif", -1)  # -1 = cv2.IMREAD_UNCHANGED: load the image as-is
height, width = img.shape        # single-channel image: shape is (rows, cols)

pixx = height * width
iterr = float(pixx / 1000)  # progress is reported roughly every 0.1% of the pixels
accomplished = 0
temp = 0

ppm = open("epson gamut.ppm", 'w')

ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")
# PPM file header

# available colors as (name, X, Y, Z) tuples
all_colors = [(name, float(X), float(Y), float(Z))
              for name, X, Y, Z in csv.reader(open('XYZcolorlist_D65.csv'))]

# background is marked SUPPORT
support_i = [i for i, color in enumerate(all_colors) if color[0] == '255 255 255']
if support_i:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None

# tetrahedralize the convex hull of the color points
tg, hull_i = geometry.tetgen_of_hull([(X, Y, Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]

print ("thrown out: "
       + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

# this is the line that raises the MemoryError quoted in the traceback
targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

for target in targets:

    name, X, Y, Z, BG = target

    # adjust the target relative to the support (background) point according to its background fraction BG
    target_point = support + (np.array([X, Y, Z]) - support)/(1 - BG)

    # find the tetrahedron containing the point and its barycentric coordinates
    tet_i, bcoords = geometry.containing_tet(tg, target_point)

    if tet_i is None:
        ppm.write("255 255 255" + "\n")
        print "out"

        temp += 1

        if temp >= iterr:

            accomplished += temp 
            print str(100 * accomplished / float(pixx)) + " %"
            temp = 0

        continue 
        # not in gamut

    else:

        # barycentric coordinates of the target point in the containing tetrahedron
        A, B, C, D = bcoords

        R = random.uniform(0, 1)

        # names of the four vertex colors of the containing tetrahedron
        names = [colors[i][0] for i in tg.tets[tet_i]]

        if R <= A:
            S = names[0] 

        elif R <= A+B:
            S = names[1]

        elif R <= A+B+C:
            S = names[2]

        else:
            S = names[3]

        ppm.write(str(S) + "\n")

        temp += 1

        if temp >= iterr:

            accomplished += temp 
            print str(100 * accomplished / float(pixx)) + " %"
            temp = 0


print "done"
ppm.close()

Answer 1:


csv.reader() already reads the file one line at a time; it's the list comprehension that collects every parsed row into memory at once. Process the rows one at a time instead. One approach is to switch to a generator expression, for example:

targets = ((name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv')))

(Switching from square brackets to parentheses changes targets from a list comprehension into a generator expression, so rows are produced lazily as the for loop consumes them rather than all at once. Note that a generator can be iterated only once, which is fine here since targets is used in a single for loop.)
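Equivalently, you can stream the rows without naming an intermediate generator at all. Here is a minimal sketch, assuming (as in your code) that every row of targets.csv has exactly the five fields name, X, Y, Z, BG:

import csv

with open('targets.csv') as f:
    for name, X, Y, Z, BG in csv.reader(f):
        target = (name, float(X), float(Y), float(Z), float(BG))
        # ... process one target here; only the current row is held in memory

As for your second question: no, this shouldn't slow the process down. The same rows are read from disk either way; what you avoid is building (and possibly swapping) a list of parsed tuples from a file of nearly 1 GB, so the streaming version is usually at least as fast.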



Source: https://stackoverflow.com/questions/21591353/python-memory-error-csv-file-too-large
