Memory leak issue in Python list

丶灬走出姿态 submitted on 2021-01-28 11:25:25

Question


The identities list contains a large array of approximately 57,000 images. I am building a list of negative pairs with itertools.product(). My code stores the whole list in memory, which is very costly, and my system hung after 4 minutes.

How can I optimize the code below and avoid holding everything in memory?

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        cross_product = itertools.product(samples_list[i], samples_list[j])
        cross_product = list(cross_product)

        for cross_sample in cross_product:
            negative = []
            negative.append(cross_sample[0])
            negative.append(cross_sample[1])
            negatives.append(negative)
            print(len(negatives))

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

negatives = negatives.sample(positives.shape[0])

Memory usage (9.30) keeps growing, and at some point the system hangs completely.

I also tried the answer below and modified the code accordingly:

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            negative = [cross_sample[0], cross_sample[1]]
            negatives.append(negative)
            print(len(negatives))

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

Third version of the code

The resulting CSV file is so big that opening it triggers a warning that the program cannot load the whole file. The process runs for about ten minutes, and then the system hangs completely again.

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            with open('/home/khawar/deepface/tests/results.csv', 'a+') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([cross_sample[0], cross_sample[1]])
            negative = [cross_sample[0], cross_sample[1]]
            negatives.append(negative)

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

negatives = negatives.sample(positives.shape[0])

Memory screenshot.


Answer 1:


itertools.product returns a lazy iterator (it behaves like a generator), so by itself it does not store the whole list in memory. On the next line, however, cross_product = list(cross_product) converts it to a list object, which holds all of the data in memory.

The idea of a generator is that you don't do all the calculation at once, as you do with your call to list(itertools.product(samples_list[i], samples_list[j])). What you want instead is to generate the results one by one.

Try something like this:

import itertools

for i in range(len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            # do something with cross_sample here, one pair at a time ...
            pass
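To make the difference between the lazy iterator and the list concrete, here is a small sketch. The two groups of file names are made up for illustration and simply stand in for two entries of samples_list; the sizes reported by sys.getsizeof cover only the container itself, not the tuples it points to, so the real gap is even larger.

```python
import itertools
import sys

# Hypothetical stand-ins for two entries of samples_list.
group_a = [f"a_{k}.jpg" for k in range(1000)]
group_b = [f"b_{k}.jpg" for k in range(1000)]

# Lazy: no pair exists until we ask for one.
lazy_pairs = itertools.product(group_a, group_b)
first_pair = next(lazy_pairs)  # produces exactly one pair

# Eager: one million tuples are created and kept alive at the same time.
all_pairs = list(itertools.product(group_a, group_b))

lazy_size = sys.getsizeof(lazy_pairs)  # size of the iterator object
list_size = sys.getsizeof(all_pairs)   # size of the list's pointer array only
print(first_pair, len(all_pairs), lazy_size, list_size)
```

The iterator stays small however many pairs remain, while the list grows with every pair it holds.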

I think I found your problem: you are still appending every sample to the negatives list, which is why your memory usage keeps growing. You need to write each row out as it is produced, one line at a time.
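It can help to know how many pairs the loops will produce before running them. The sketch below counts them from the group sizes alone, without generating a single pair. The function name count_negative_pairs and the toy data are illustrative, and the 57,000 figure is only an upper-bound assumption in which every image belongs to a different identity.

```python
def count_negative_pairs(samples_list):
    """Count cross-identity pairs without generating any of them."""
    sizes = [len(samples) for samples in samples_list]
    total = 0
    remaining = sum(sizes)
    for size in sizes:
        remaining -= size          # images that belong to later identities
        total += size * remaining  # pairs between this identity and all later ones
    return total

# Hypothetical data: 3 identities with 2, 3 and 1 images.
toy = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]]
pair_count = count_negative_pairs(toy)  # 2*3 + 2*1 + 3*1 = 11

# Assumed worst case: 57,000 identities with one image each.
worst_case = count_negative_pairs([["x"]] * 57000)
print(pair_count, worst_case)
```

Under that worst-case assumption the count is above 1.6 billion pairs, far more than a Python list of lists can reasonably hold.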

Your data is CSV, right? Then you can do it like this:

import csv
import itertools

# Open the file once, then stream each pair to disk as it is generated.
with open('results.csv', 'a+', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for i in range(0, len(idendities) - 1):
        for j in range(i + 1, len(idendities)):
            for cross_sample in itertools.product(samples_list[i], samples_list[j]):
                writer.writerow([cross_sample[0], cross_sample[1]])

The idea is to write your rows in real time.
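Since the question only keeps a random sample of positives.shape[0] negatives in the end, another option is to sample while streaming and never store the full set. The sketch below uses reservoir sampling, a technique not mentioned in the question or the original answer; the helper names iter_negative_pairs and reservoir_sample and the toy data are illustrative assumptions.

```python
import itertools
import random

def iter_negative_pairs(samples_list):
    """Yield every cross-identity pair lazily, one at a time."""
    # combinations(..., 2) visits the same (i, j) with i < j as the nested loops.
    for group_x, group_y in itertools.combinations(samples_list, 2):
        yield from itertools.product(group_x, group_y)

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            reservoir.append(item)        # fill the reservoir first
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                reservoir[slot] = item    # replace an earlier pick
    return reservoir

# Hypothetical data: 3 identities with 2, 3 and 1 images.
toy = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]]
sampled = reservoir_sample(iter_negative_pairs(toy), k=4, seed=0)
print(sampled)
```

Memory stays bounded by k, so the result could be passed to pd.DataFrame(sampled, columns=["file_x", "file_y"]) as in the question. The loop still visits every pair, so the running time remains proportional to the total number of pairs.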

See also the related question "how to write the real time data into csv file in python".

Credit to @9mat and @cybot, and to the questions "How to get Cartesian product in Python using a generator?" and "how to write the real time data into csv file in python".



Source: https://stackoverflow.com/questions/65855793/memory-leakage-issue-in-python-list
