Free up memory by deleting numpy arrays

Posted by 假装没事ソ on 2020-01-23 13:51:48

Question


I have written a fatigue analysis program with a GUI. The program takes strain information for unit loads for each element of a finite element model, reads in a load case using np.genfromtxt('loadcasefilename.txt'), then does some fatigue analysis and saves the result for each element in another array.

Each load case is about 32 MB as a text file, and there are 40 or so, which get read and analysed in a loop. The loads for each element are interpolated by taking slices of the load case array.

The GUI and fatigue analysis run in separate threads. When you click 'Start' on the fatigue analysis, it starts the loop over the load cases.

This brings me to my problem. If I have a lot of elements, the analysis will not finish. How early it quits depends on how many elements there are, which makes me think it might be a memory problem. I've tried fixing this by deleting the load case array at the end of each loop (after deleting all the arrays which are slices of it) and running gc.collect(), but this has not had any success.

In MATLAB, I'd use the 'pack' function to write the workspace to disk, clear it, and then reload it at the end of each loop. I know this isn't good practice, but it would get the job done! Can I do the equivalent in Python somehow?

Code below:

for LoadCaseNo in range(len(LoadCases[0]['LoadCaseLoops'])):#range(1):#xxx
    #Get load case data
    self.statustext.emit('Opening current load case file...')
    LoadCaseFilePath=LoadCases[0]['LoadCasePaths'][LoadCaseNo][0]
    #TK: load case paths may be different
    try:
        with open(LoadCaseFilePath):
            pass
    except Exception as e:
        self.statustext.emit(str(e))


    LoadCaseLoops=LoadCases[0]['LoadCaseLoops'][LoadCaseNo,0]
    LoadCase=np.genfromtxt(LoadCaseFilePath,delimiter=',')

    LoadCaseArray=np.array(LoadCases[0]['LoadCaseLoops'])
    LoadCaseArray=LoadCaseArray/np.sum(LoadCaseArray,axis=0)
    #Loop through sections
    for SectionNo in  range(len(Sections)):#range(100):#xxx 
        SectionCount=len(Sections)
        #Get section data
        Elements=Sections[SectionNo]['elements']
        UnitStrains=Sections[SectionNo]['strains'][:,1:]
        Nodes=Sections[SectionNo]['nodes']
        rootdist=Sections[SectionNo]['rootdist']
        #Interpolate load case data at this section
        NeighbourFind=rootdist-np.reshape(LoadCase[0,1:],(1,-1))
        NeighbourFind[NeighbourFind<0]=1e100
        nearest=np.unravel_index(NeighbourFind.argmin(), NeighbourFind.shape)
        nearestcol=int(nearest[1])
        Distance0=LoadCase[0,nearestcol+1]
        Distance1=LoadCase[0,nearestcol+7]
        MxLow=LoadCase[1:,nearestcol+1]
        MxHigh=LoadCase[1:,nearestcol+7]
        MyLow=LoadCase[1:,nearestcol+2]
        MyHigh=LoadCase[1:,nearestcol+8]
        MzLow=LoadCase[1:,nearestcol+3]
        MzHigh=LoadCase[1:,nearestcol+9]
        FxLow=LoadCase[1:,nearestcol+4]
        FxHigh=LoadCase[1:,nearestcol+10]
        FyLow=LoadCase[1:,nearestcol+5]
        FyHigh=LoadCase[1:,nearestcol+11]
        FzLow=LoadCase[1:,nearestcol+6]
        FzHigh=LoadCase[1:,nearestcol+12]
        InterpFactor=(rootdist-Distance0)/(Distance1-Distance0)
        Mx=MxLow+(MxHigh-MxLow)*InterpFactor[0,0]
        My=MyLow+(MyHigh-MyLow)*InterpFactor[0,0]
        Mz=MzLow+(MzHigh-MzLow)*InterpFactor[0,0]
        Fx=-FxLow+(FxHigh-FxLow)*InterpFactor[0,0]
        Fy=-FyLow+(FyHigh-FyLow)*InterpFactor[0,0]
        Fz=FzLow+(FzHigh-FzLow)*InterpFactor[0,0]
        #Loop through section coordinates
        for ElementNo in range(len(Elements)):
            MaterialID=int(Elements[ElementNo,1])
            if Materials[MaterialID]['curvefit'][0,0]!=3:
                StrainHist=UnitStrains[ElementNo,0]*Mx+UnitStrains[ElementNo,1]*My+UnitStrains[ElementNo,2]*Fz

            elif Materials[MaterialID]['curvefit'][0,0]==3:

                StrainHist=UnitStrains[ElementNo,3]*Fx+UnitStrains[ElementNo,4]*Fy+UnitStrains[ElementNo,5]*Mz

            EndIn=len(StrainHist)
            Extrema=np.bitwise_or(np.bitwise_and(StrainHist[1:EndIn-1]<=StrainHist[0:EndIn-2] , StrainHist[1:EndIn-1]<=StrainHist[2:EndIn]),np.bitwise_and(StrainHist[1:EndIn-1]>=StrainHist[0:EndIn-2] , StrainHist[1:EndIn-1]>=StrainHist[2:EndIn]))
            Extrema=np.concatenate((np.array([True]),Extrema,np.array([True])),axis=0)
            Extrema=StrainHist[np.where(Extrema==True)]
            del StrainHist
            #Do fatigue analysis
        self.statustext.emit('Analysing load case '+str(LoadCaseNo+1)+' of '+str(len(LoadCases[0]['LoadCaseLoops']))+' - '+str(((SectionNo+1)*100)/SectionCount)+'% complete')
        del MxLow,MxHigh,MyLow,MyHigh,MzLow,MzHigh,FxLow,FxHigh,FyLow,FyHigh,FzLow,FzHigh,Mx,My,Mz,Fx,Fy,Fz,Distance0,Distance1
    gc.collect()

Answer 1:


There's obviously a reference cycle or other leak somewhere, but without seeing your code, it's impossible to say more than that. But since you seem to be more interested in workarounds than solutions…

In MATLAB, I'd use the 'pack' function to write the workspace to disk, clear it, and then reload it at the end of each loop. I know this isn't good practice, but it would get the job done! Can I do the equivalent in Python somehow?

No, Python doesn't have any equivalent to pack. (Of course, if you know exactly which values you want to keep around, you can always np.savetxt or pickle.dump or otherwise stash them, then exec or spawn a new interpreter instance, then np.loadtxt or pickle.load or otherwise restore those values. But if you know exactly which values you want to keep around, you probably aren't going to have this problem in the first place, unless you've actually hit an unknown memory leak in NumPy, which is unlikely.)
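If you do want to try that kind of manual stash-and-restart anyway, a minimal sketch of the idea might look like the following. The worker script name analyze_one_case.py, the file names, and the driver/worker split are assumptions for illustration, not part of the original answer:

import pickle
import subprocess
import sys

def run_one_case(case_path, result_path):
    # 'analyze_one_case.py' is a hypothetical worker script that reads the load
    # case, does the fatigue analysis, and pickles its (small) result to
    # result_path. Running it in a fresh interpreter means all of its memory is
    # returned to the OS when the child process exits.
    subprocess.run([sys.executable, 'analyze_one_case.py', case_path, result_path],
                   check=True)

results = []
for i, case_path in enumerate(['case_00.txt', 'case_01.txt']):  # hypothetical paths
    result_path = 'result_{:02d}.pkl'.format(i)
    run_one_case(case_path, result_path)
    with open(result_path, 'rb') as f:
        results.append(pickle.load(f))  # only the small results stay in this process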


But it has something that may be better. Kick off a child process to analyze each element (or each batch of elements, if they're small enough that the process-spawning overhead matters), send the results back in a file or over a queue, then quit.

For example, if you're doing this:

def analyze(thingy):
    a = build_giant_array(thingy)
    result = process_giant_array(a)
    return result

total = 0
for thingy in thingies:
    total += analyze(thingy)

You can change it to this:

import multiprocessing

def wrap_analyze(thingy, q):
    # Runs in the child process; sends the (small, pickleable) result back.
    q.put(analyze(thingy))

total = 0
for thingy in thingies:
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=wrap_analyze, args=(thingy, q))
    p.start()
    p.join()  # when the child exits, all of its memory goes back to the OS
    total += q.get()

(This assumes that each thingy and result is both smallish and pickleable. If it's a huge NumPy array, look into NumPy's shared memory wrappers, which are designed to make things much easier when you need to share memory directly between processes instead of passing it.)
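For example, one way to share a large array with a worker without copying it is multiprocessing.shared_memory (Python 3.8+). The original answer predates this module, so treat the sketch below as one option rather than the specific wrapper it had in mind:

import numpy as np
from multiprocessing import Process, shared_memory

def worker(shm_name, shape, dtype):
    shm = shared_memory.SharedMemory(name=shm_name)   # attach to the block, no copy
    a = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print(a.sum())                                    # read the shared data directly
    shm.close()

if __name__ == '__main__':
    data = np.random.rand(1_000_000)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data                                  # copy once into the shared block
    p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
    p.start()
    p.join()
    shm.close()
    shm.unlink()                                      # release the shared block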

But you may want to look at what multiprocessing.Pool can do to automate this for you (and to make it easier to extend the code to, e.g., use all your cores in parallel). Notice that it has a maxtasksperchild parameter, which you can use to recycle the pool processes every, say, 10 thingies, so they don't run out of memory.
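A minimal sketch of that approach, reusing the analyze() from the example above (the pool size and the recycle interval are arbitrary assumptions):

import multiprocessing

if __name__ == '__main__':
    # maxtasksperchild=10 recycles each worker process after 10 tasks, so any
    # memory a task fails to release is reclaimed when that process exits.
    with multiprocessing.Pool(processes=4, maxtasksperchild=10) as pool:
        total = sum(pool.map(analyze, thingies))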


But, to briefly get back to actually trying to solve the problem:

I've tried fixing this by deleting the load case array at the end of each loop (after deleting all the arrays which are slices of it) and running gc.collect() but this has not had any success.

None of that should make any difference at all. If you're just reassigning all the local variables to new values each time through the loop, and aren't keeping references to them anywhere else, then they're just going to get freed up anyway, so you'll never have more than two of them alive at a time (and only briefly). And gc.collect() only helps if there are reference cycles. So, on the one hand, it's good news that these had no effect: it means there's nothing obviously stupid in your code. On the other hand, it's bad news: it means that whatever's wrong isn't obviously stupid.

Usually people see this because they keep growing some data structure without realizing it. For example, maybe you vstack all the new rows onto the old version of giant_array instead of onto an empty array, then delete the old version… but it doesn't matter, because each time through the loop, giant_array isn't a constant 5*N; it's 5*N, then 10*N, then 15*N, and so on. (That's just an example of something stupid I did not long ago… Again, it's hard to give more specific examples while knowing nothing about your code.)
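As an illustration of that kind of accidental growth (make_rows and the loop bounds are hypothetical placeholders, not code from the question):

import numpy as np

def make_rows(case):
    # Hypothetical stand-in for the per-load-case work; returns a fresh block of rows.
    return np.random.rand(1000, 5)

giant_array = np.empty((0, 5))
for case in range(40):
    # Buggy: stacks onto the previous iteration's result, so the array keeps growing.
    giant_array = np.vstack((giant_array, make_rows(case)))

    # Intended: rebuild from scratch each iteration, so memory use stays flat.
    # giant_array = make_rows(case)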



Source: https://stackoverflow.com/questions/27418943/free-up-memory-by-deleting-numpy-arrays
