DataFrame.apply in Python pandas alters both original and duplicate DataFrames

Anonymous (unverified), submitted 2019-12-03 08:48:34

Question:

I'm having trouble altering a duplicated pandas DataFrame without the edits applying to both the duplicate and the original DataFrame.

Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:

In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]

In [68]: d = DataFrame(d)

In [69]: d
Out[69]:
   a  b
0  3  5
1  1  1

Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:

In [70]: e = d

In [71]: e['a'] = e['a'].apply(lambda x: x + 1)

The problem arises in that the apply function apparently applies to both the duplicate DataFrame 'e' and original DataFrame 'd', which I cannot for the life of me figure out:

In [72]: e # duplicate DataFrame
Out[72]:
   a  b
0  4  5
1  2  1

In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:
   a  b
0  4  5
1  2  1

I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.

I've also tried the math operations using an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']]), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.

Answer 1:

This is not a pandas-specific issue. In Python, assignment never copies anything:

>>> a = [1,2,3]
>>> b = a
>>> b[0] = 'WHOA!'
>>> a
['WHOA!', 2, 3]

If you want a new DataFrame, make a copy with e = d.copy().
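As a rough sketch of the difference, here is the DataFrame from the question rebuilt with both a plain assignment and a copy. The name alias is only illustrative and does not appear in the question; the rest follows the original session.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])

alias = d       # plain assignment: a second name for the same object
e = d.copy()    # an independent DataFrame (copy() is deep by default)

# Modify only the copy.
e['a'] = e['a'].apply(lambda x: x + 1)

print(alias is d)        # True: both names refer to one DataFrame
print(e is d)            # False: e is a separate object
print(d['a'].tolist())   # [3, 1], the original is untouched
print(e['a'].tolist())   # [4, 2]
```

With e = d in place of e = d.copy(), the last two lines would both show the incremented values, which is the behaviour described in the question.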

Edit: I should clarify that assignment to a bare name never copies anything. Assignment to an item or attribute (e.g., a[1] = x or a.foo = bar) is converted into method calls under the hood (__setitem__ and __setattr__, respectively) and may do copying depending on what kind of object a is.
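To make that distinction concrete, here is a small pure-Python sketch. Recorder is a hypothetical class written only for this illustration; it logs the method calls that item and attribute assignment turn into, while bare-name assignment triggers no call at all.

```python
class Recorder:
    """Hypothetical container that logs item and attribute assignment."""

    def __init__(self):
        # Use object.__setattr__ so that setup is not logged by our own hook.
        object.__setattr__(self, 'log', [])
        object.__setattr__(self, 'items', {})

    def __setitem__(self, key, value):
        self.log.append(('setitem', key))
        self.items[key] = value

    def __setattr__(self, name, value):
        self.log.append(('setattr', name))
        object.__setattr__(self, name, value)


a = Recorder()
b = a            # bare-name assignment: no method is called, nothing is copied
b[1] = 'x'       # becomes a call to __setitem__(1, 'x')
b.foo = 'bar'    # becomes a call to __setattr__('foo', 'bar')

print(b is a)    # True
print(a.log)     # [('setitem', 1), ('setattr', 'foo')]
```

Whether those methods copy anything is up to the class that defines them, which is why the behaviour depends on what kind of object a is.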


