Question
I'm creating a small Pandas dataframe:
import pandas as pd
df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})
I take a deepcopy of that df. I'm not using the Pandas method but general Python, right?
import copy
df_copy = copy.deepcopy(df)
A df_copy.head() gives the following:
Then I put these values into a dictionary:
mydict = df_copy.to_dict()
That dictionary looks like this:
Finally, I remove one item of the list:
mydict['colA'][0].remove("b")
I'm surprised that the values in df_copy are updated. I'm even more confused that the values in the original dataframe are updated too! Both dataframes look like this now:
I understand Pandas doesn't really do deepcopy, but this wasn't a Pandas method. My questions are:
1) How can I build a dictionary from a dataframe so that changing the dictionary doesn't update the dataframe?
2) How can I take a copy of a dataframe that is completely independent of the original?
Thanks for your help!
Cheers, Nicolas
Answer 1:
When a DataFrame contains Python objects, a deep copy copies the data but does not do so recursively: the copy holds references to the same Python objects as the original. Updating a nested object in place is therefore reflected in the deep copy as well. The same rule applies when you create a dictionary from a DataFrame.
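The sharing described above can be checked with an identity test. This is a minimal sketch that rebuilds the DataFrame from the question; the variable names (pandas_copy, python_copy, shares_with_dict and so on) are illustrative and not part of the original post.

```python
import copy

import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})
original_list = df["colA"].iloc[0]

# Both kinds of "deep" copy duplicate the column's array of references,
# not the list objects those references point to.
pandas_copy = df.copy(deep=True)
python_copy = copy.deepcopy(df)

# to_dict() also hands back the very same list object.
mydict = python_copy.to_dict()

shares_with_pandas_copy = pandas_copy["colA"].iloc[0] is original_list
shares_with_python_copy = python_copy["colA"].iloc[0] is original_list
shares_with_dict = mydict["colA"][0] is original_list

print(shares_with_pandas_copy, shares_with_python_copy, shares_with_dict)
```

Because all three names refer to one list, an in-place call such as remove("b") is visible through every one of them.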
copy.deepcopy doesn't solve this problem either. When applied to an object, it looks for a __deepcopy__ method on that object and calls it. For a DataFrame instance, __deepcopy__ simply delegates to DataFrame.copy(deep=True), so it does not work recursively. To take a copy of a DataFrame that is completely independent, in your case you may use the following (note that this is not recommended practice: putting mutable objects inside a DataFrame is an antipattern):
df_copy = pd.DataFrame(columns=df.columns, data=copy.deepcopy(df.values))
For a dictionary, you can use the same trick:
mydict = pd.DataFrame(columns=df.columns, data=copy.deepcopy(df_copy.values)).to_dict()
mydict['colA'][0].remove("b")
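To confirm that the trick above really decouples the data, here is a short sketch that mutates the resulting dictionary and then inspects the original DataFrame. The second variant, deep-copying the output of to_dict(), is not from the answer; it is an alternative suggested here on the assumption that only an independent dictionary is needed. Variable names are illustrative.

```python
import copy

import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})

# copy.deepcopy on the object ndarray recurses into each stored list,
# so the rebuilt DataFrame owns its own list objects.
independent = pd.DataFrame(columns=df.columns, data=copy.deepcopy(df.values))
mydict = independent.to_dict()
mydict["colA"][0].remove("b")

# Alternative (an assumption, not the answer's method): deep-copy the
# plain dictionary, which copy.deepcopy handles recursively.
mydict_alt = copy.deepcopy(df.to_dict())
mydict_alt["colA"][0].remove("c")

original_after = df["colA"].iloc[0]
print(original_after, mydict["colA"][0], mydict_alt["colA"][0])
```

Two caveats with rebuilding from df.values: a frame with mixed column dtypes comes back as a single object array, so the original dtypes may not survive, and a non-default index is lost unless index=df.index is passed as well.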
There is also a standard, if hacky, way of deep-copying Python objects:
import pickle
df_copy = pickle.loads(pickle.dumps(df))
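The pickle round trip can be checked in the same way. This sketch assumes every object stored in the DataFrame is picklable, which holds for the lists of strings used in the question.

```python
import pickle

import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})

# Serialising and deserialising rebuilds every nested object from scratch.
df_copy = pickle.loads(pickle.dumps(df))
df_copy["colA"].iloc[0].remove("b")

original_list = df["colA"].iloc[0]
copied_list = df_copy["colA"].iloc[0]
print(original_list, copied_list)
```

Unlike rebuilding from df.values, this approach keeps the index and the column dtypes, at the cost of serialising the whole frame.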
Hope I've answered your question. Feel free to ask for any clarifications, if needed.
Source: https://stackoverflow.com/questions/59683237/deep-copy-of-pandas-dataframes-and-dictionaries