I'm having trouble altering a duplicated pandas DataFrame without the edits also being applied to the original DataFrame.
Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:
    In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]

    In [68]: d = DataFrame(d)

    In [69]: d
    Out[69]:
       a  b
    0  3  5
    1  1  1

Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:

    In [70]: e = d

    In [71]: e['a'] = e['a'].apply(lambda x: x + 1)

The problem arises in that the apply function apparently applies to both the duplicate DataFrame 'e' and original DataFrame 'd', which I cannot for the life of me figure out:

    In [72]: e  # duplicate DataFrame
    Out[72]:
       a  b
    0  4  5
    1  2  1

    In [73]: d  # original DataFrame, notice the alterations to frame 'e' were also applied
    Out[73]:
       a  b
    0  4  5
    1  2  1

I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.
I've also tried the math operations using an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']]), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.
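For reference, here is a minimal standalone sketch of the same session, assuming pandas is imported as `pd`. The `is` check and the `DataFrame.copy()` comparison are not part of the session above, and the name `f` is hypothetical. They are included only to show whether `e` and `d` are one object or two.

```python
import pandas as pd

# Same data as in the session above.
d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])

# Plain assignment binds a second name to the same object; no data is copied.
e = d
print(e is d)  # True

# Modifying column 'a' through 'e' is therefore visible through 'd' as well.
e['a'] = e['a'].apply(lambda x: x + 1)
print(d['a'].tolist())  # [4, 2]

# For comparison: an explicit copy is a separate object (hypothetical name 'f').
f = d.copy()
f['a'] = f['a'].apply(lambda x: x + 1)
print(d['a'].tolist())  # [4, 2], unchanged by the edit to 'f'
print(f['a'].tolist())  # [5, 3]
```

If `e is d` prints `True`, the behavior would seem to come from ordinary Python name binding rather than from anything specific to `apply`, which would also be consistent with the list-comprehension version behaving the same way.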