Pandas manipulating a DataFrame inplace vs not inplace (inplace=True vs False) [duplicate]

穿精又带淫゛_ submitted on 2020-06-14 07:45:12

Question


I'm wondering if there's a significant reduction in memory usage when we choose to manipulate a dataframe in-place (compared to not in-place).

I've done a bit of searching on Stack Overflow and came across this post where the answer states that if an operation is not done in-place, a copy of the dataframe is returned (I guess that's a bit obvious when there's an optional parameter called 'inplace' :P).

If I don't need to keep the original dataframe around, it would be beneficial (and logical) to just modify the dataframe in place, right?

Context:

I'm trying to get the top element when sorted by a particular 'column' in the dataframe. I was wondering which of these two is more efficient:

in-place:

df.sort('some_column', ascending=0, inplace=1)
top = df.iloc[0]

vs

copy:

top = df.sort('some_column', ascending=0).iloc[0]

For the 'copy' case, it still allocates memory for the copy when sorting, even though I'm not assigning the copy to a variable, right? If so, how long does it take to deallocate that copy from memory?

Thanks for any insights in advance!


Answer 1:


In general, there is no difference between inplace=True and returning an explicit copy: in both cases, a copy is created. It just so happens that, in the first case, the original df object is updated to hold the copied data, so reassignment is not necessary. For that reason, you should not expect a significant reduction in memory usage from inplace=True.
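To make the two behaviours concrete, here is a minimal sketch. The column name some_column comes from the question; the sample values and the label column are made up for illustration. It only shows what each call returns and whether df changes; it does not measure memory.

```python
import pandas as pd

# Hypothetical sample data; only 'some_column' is taken from the question.
df = pd.DataFrame({"some_column": [1, 3, 2], "label": ["a", "c", "b"]})

# Not in place: a new, sorted DataFrame is returned and df keeps its row order.
sorted_copy = df.sort_values("some_column", ascending=False)
order_before = df["some_column"].tolist()

# In place: the call returns None and df itself now holds the sorted data.
returned = df.sort_values("some_column", ascending=False, inplace=True)
order_after = df["some_column"].tolist()
```

Because the in-place call returns None, writing df = df.sort_values(..., inplace=True) would replace df with None, which is a common source of confusion.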

Furthermore, note that as of v0.21, df.sort is no longer available (it was deprecated in favor of sort_values); use sort_values instead.
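As a rough sketch of the two snippets from the question rewritten with sort_values: the sample data and the name column are assumptions for illustration, and df.copy() is used only so both variants start from the same input.

```python
import pandas as pd

# Hypothetical data; 'some_column' is the column name used in the question.
df = pd.DataFrame({"some_column": [10, 30, 20], "name": ["a", "b", "c"]})

# "copy" variant: sort into a temporary DataFrame and take its first row.
top_from_copy = df.sort_values("some_column", ascending=False).iloc[0]

# "in-place" variant: sort a separate frame in place, then take the first row.
df_inplace = df.copy()
df_inplace.sort_values("some_column", ascending=False, inplace=True)
top_inplace = df_inplace.iloc[0]
```

Both variants pick the same row, so the choice between them is mostly about whether you want df itself reordered afterwards.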



Source: https://stackoverflow.com/questions/47245583/pandas-manipulating-a-dataframe-inplace-vs-not-inplace-inplace-true-vs-false
