Deep copy of Pandas dataframes and dictionaries

独自空忆成欢 submitted on 2020-05-08 16:03:06

Question


I'm creating a small Pandas dataframe:

import pandas as pd
df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

I take a deep copy of that df. I'm not using the Pandas method but plain Python, right?

import copy
df_copy = copy.deepcopy(df)

df_copy.head() shows a single row whose colA cell holds the list ['a', 'b', 'c'].

Then I put these values into a dictionary:

mydict = df_copy.to_dict()

That dictionary looks like this: {'colA': {0: ['a', 'b', 'c']}}

Finally, I remove one item of the list:

mydict['colA'][0].remove("b")

I'm surprised that the values in df_copy are updated, and even more confused that the values in the original dataframe are updated too! In both dataframes, colA now holds ['a', 'c'].
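A minimal reproduction of the behaviour described above, using only the calls from the question (it needs pandas installed). The `is` checks show that the original frame, the deep copy and the dictionary all refer to the very same list object, which is why one `remove` call shows up everywhere:

```python
import copy

import pandas as pd

# One-row frame whose single cell holds a Python list
df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})
df_copy = copy.deepcopy(df)
mydict = df_copy.to_dict()

# All three containers hold a reference to the same list object
print(mydict['colA'][0] is df_copy.loc[0, 'colA'])
print(df_copy.loc[0, 'colA'] is df.loc[0, 'colA'])

# Mutating the list through the dictionary is therefore visible in both frames
mydict['colA'][0].remove("b")
print(df.loc[0, 'colA'])
print(df_copy.loc[0, 'colA'])
```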

I understand Pandas doesn't really do deepcopy, but this wasn't a Pandas method. My questions are:

1) How can I build a dictionary from a dataframe that doesn't update the dataframe?

2) How can I take a copy of a dataframe that is completely independent?

Thanks for your help!

Cheers, Nicolas


Answer 1:


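When a DataFrame contains Python objects, a pandas deep copy copies the data but does not do so recursively: only references to those objects are copied, so updating a nested object (such as one of your lists) is reflected in the deep copy as well. The same applies when you create a dictionary from a DataFrame: to_dict() puts references to the same list objects into the dictionary.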
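The effect can be seen directly with pandas' own `DataFrame.copy(deep=True)`, whose documentation states that Python objects inside the frame are not copied recursively, only their references. In this sketch, the extra integer column `colB` and the variable name `deep` are illustrative additions: overwriting a scalar cell in the copy stays local, while mutating the list in place reaches the original:

```python
import pandas as pd

# colB (added for illustration) holds a plain integer, colA a mutable list
df = pd.DataFrame(data={'colA': [["a", "b", "c"]], 'colB': [1]})
deep = df.copy(deep=True)

# The frame's data was copied: overwriting a cell in the copy is local
deep.loc[0, 'colB'] = 99
print(df.loc[0, 'colB'])

# The list was not copied, only its reference, so in-place changes are shared
deep.loc[0, 'colA'].append("d")
print(df.loc[0, 'colA'])
```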

copy.deepcopy doesn't solve this problem either. When applied to an object, it looks up a __deepcopy__ method on that object and, if one exists, simply delegates to it. DataFrame defines __deepcopy__ as a call to self.copy(deep=True), so it does not copy nested objects recursively. To take a completely independent copy of the DataFrame, in your case you can use the following (note that this is not a recommended practice: putting mutable objects inside a DataFrame is an antipattern):
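To illustrate the `__deepcopy__` mechanism without pandas, here is a toy container; the class `Box` and the names `inner`, `box` and `box_copy` are hypothetical. Its `__deepcopy__` copies only the outer container, loosely mimicking what a pandas "deep" copy does to an object column. `copy.deepcopy` finds the method and uses it, so the nested list stays shared:

```python
import copy


class Box:
    """Hypothetical container holding a list of items (illustration only)."""

    def __init__(self, items):
        self.items = items

    def __deepcopy__(self, memo):
        # Copy the outer list only; the nested objects are shared,
        # similar to how a pandas deep copy treats an object column
        return Box(list(self.items))


inner = ["a", "b", "c"]
box = Box([inner])
box_copy = copy.deepcopy(box)  # dispatches to Box.__deepcopy__

print(box_copy is box)                    # a new container...
print(box_copy.items[0] is box.items[0])  # ...holding the same nested list
```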

df_copy = pd.DataFrame(columns=df.columns, index=df.index, data=copy.deepcopy(df.values))
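A quick check of this approach on the question's frame, as a sketch: `copy.deepcopy` on an object-dtype NumPy array deep-copies each element, so the rebuilt frame gets its own lists. Passing `index=df.index` as well keeps a non-default index intact:

```python
import copy

import pandas as pd

df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

# Deep-copying the object ndarray copies every element recursively
df_copy = pd.DataFrame(columns=df.columns, index=df.index,
                       data=copy.deepcopy(df.values))

df_copy.loc[0, 'colA'].remove("b")
print(df_copy.loc[0, 'colA'])  # the copy lost "b"
print(df.loc[0, 'colA'])       # the original keeps all three items
```

If the frame mixes column types, `df.values` is a single object array, so the rebuilt columns may come back with `object` dtype; calling `.infer_objects()` on the result is one way to recover more specific dtypes.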

For a dictionary, you can use the same trick:

mydict = pd.DataFrame(columns=df_copy.columns, index=df_copy.index, data=copy.deepcopy(df_copy.values)).to_dict()
mydict['colA'][0].remove("b")
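An alternative that is not taken from the answer above: since the only problem is the shared lists, running `copy.deepcopy` on the plain dictionary returned by `to_dict()` also works, because for built-in dicts and lists `copy.deepcopy` really does recurse. A minimal sketch:

```python
import copy

import pandas as pd

df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

# Deep-copy the plain Python dict; this recurses into the nested lists
mydict = copy.deepcopy(df.to_dict())
mydict['colA'][0].remove("b")

print(mydict)             # the dictionary's list lost "b"
print(df.loc[0, 'colA'])  # the frame is untouched
```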

There is also a standard, if hacky, way of deep-copying Python objects: a round trip through pickle.

import pickle
df_copy = pickle.loads(pickle.dumps(df))  
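A sketch checking the pickle round trip on the question's frame: loading the serialized bytes rebuilds every Python object, so the lists in the result are new objects. This only works when everything in the frame can be pickled:

```python
import pickle

import pandas as pd

df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

# Serialize and rebuild: nested objects are recreated, not shared
df_copy = pickle.loads(pickle.dumps(df))

print(df_copy.loc[0, 'colA'] is df.loc[0, 'colA'])  # separate lists
df_copy.loc[0, 'colA'].remove("b")
print(df.loc[0, 'colA'])  # the original keeps "b"
```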

Hope I've answered your question. Feel free to ask for any clarifications, if needed.



Source: https://stackoverflow.com/questions/59683237/deep-copy-of-pandas-dataframes-and-dictionaries
