Alternatives to pandas apply due to MemoryError

两盒软妹~` posted on 2020-01-06 07:33:10

Question


I have a function that I wish to apply to a dataframe:

def DetermineMid(data, ts):

    if data['U'] == 0 and data['D'] > 0:
        mid = data['C'] + ts / 2

    elif data['U'] > 0 and data['D'] == 0:
        mid = data['C'] - ts / 2

    else:
        diff = data['A'] - data['B']

        if diff == 0:
            mid = data['C'] + 1

        else:
            mid = data['C']

    return mid

My df columns are A, B, C, D, U.

My call is as follows:

df = df.apply(DetermineMid, args=(5, ), axis=1)

On smaller dataframes this works just fine, but for this dataframe:

DatetimeIndex: 2561527 entries, 2016-11-30 17:00:01 to 2017-11-29 16:00:00
Data columns (total 6 columns):
Z float64
A float64
B float64
C float64
U int64
D int64
dtypes: float64(5), int64(2)
memory usage: 156.3 MB
None

I receive a MemoryError. Am I using apply incorrectly? I would have thought apply is just iterating through the rows and creating a value mid based on row values, then dropping all the old values as I do not care about them anymore.

Is there a better way to do that?


Answer 1:


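Use np.select, which evaluates the conditions on whole columns at once instead of building a Series for every row:

import numpy as np

ts = 5  # the value passed as args=(5, ) in the question

# One boolean mask per branch of DetermineMid
m1 = (df['U'] == 0) & (df['D'] > 0)
m2 = (df['U'] > 0) & (df['D'] == 0)
m3 = (df['A'] - df['B']) == 0

# The first condition that is True for a row wins; df['C'] is the default
mid = np.select([m1, m2, m3], [df['C'] + ts / 2, df['C'] - ts / 2, df['C'] + 1], df['C'])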
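To check that np.select gives the same result as the row-wise function from the question, a minimal sketch on a small made-up frame might look like this. The names determine_mid, determine_mid_vectorized and sample are hypothetical, the sample values are invented for illustration, and ts=5 is taken from the call in the question.

```python
import numpy as np
import pandas as pd


def determine_mid(row, ts):
    """Row-wise reference with the same branches as DetermineMid."""
    if row['U'] == 0 and row['D'] > 0:
        return row['C'] + ts / 2
    if row['U'] > 0 and row['D'] == 0:
        return row['C'] - ts / 2
    if row['A'] - row['B'] == 0:
        return row['C'] + 1
    return row['C']


def determine_mid_vectorized(df, ts):
    """Column-wise version: np.select takes the first matching condition."""
    m1 = (df['U'] == 0) & (df['D'] > 0)
    m2 = (df['U'] > 0) & (df['D'] == 0)
    m3 = (df['A'] - df['B']) == 0
    values = np.select(
        [m1, m2, m3],
        [df['C'] + ts / 2, df['C'] - ts / 2, df['C'] + 1],
        default=df['C'],
    )
    # np.select returns a plain ndarray, so restore the original index
    return pd.Series(values, index=df.index, name='mid')


# Invented sample data: one row per branch of the original function
sample = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [1.0, 2.0, 3.0, 5.0],
    'C': [10.0, 20.0, 30.0, 40.0],
    'U': [0, 3, 1, 1],
    'D': [2, 0, 1, 1],
})

row_wise = sample.apply(determine_mid, args=(5,), axis=1)
vectorized = determine_mid_vectorized(sample, ts=5)

print(vectorized.tolist())
print(row_wise.tolist() == vectorized.tolist())
```

Because the masks are checked in order, m3 only applies to rows where neither m1 nor m2 matched, which mirrors the else branch of the original function.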


Source: https://stackoverflow.com/questions/48061793/alternatives-to-pandas-apply-due-to-memoryerror
