When does pandas do pass-by-reference vs. pass-by-value when passing a dataframe to a function?

Posted by 烈酒焚心 on 2019-12-06 09:26:21

Question


import pandas as pd

def dropdf_copy(df):
    df = df.drop('y', axis=1)

def dropdf_inplace(df):
    df.drop('y', axis=1, inplace=True)

def changecell(df):
    df['y'][0] = 99

x = pd.DataFrame({'x': [1,2],'y': [20,31]})

x
Out[204]: 
   x   y
0  1  20
1  2  31

dropdf_copy(x)

x
Out[206]: 
   x   y
0  1  20
1  2  31

changecell(x)

x
Out[208]: 
   x   y
0  1  99
1  2  31

In the above example, dropdf_copy() doesn't modify the original dataframe x, while changecell() does modify x. I know that if I add the following minor change to changecell(), it won't change x.

def changecell(df):
    df = df.copy()
    df['y'][0] = 99

I don't think it's very elegant to include df = df.copy() in every function I write.

Questions

1) Under what circumstances does pandas change the original dataframe, and when does it not? Can someone give me a clear, generalizable rule? I know it may have something to do with mutability vs. immutability, but it's not clearly explained on Stack Overflow.

2) Does numpy behave similarly, or is it different? What about other Python objects?

PS: I have done research on Stack Overflow but couldn't find a clear, generalizable rule for this problem.


Answer 1:


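Python passes arguments by object reference: the function receives a reference to the same object the caller holds, so in-place changes to that object are visible outside the function. The original object stays unchanged only if the function ends up working on a different object, either because an assignment rebinds the local name to a new object or because an explicit copy is made with copy().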

Examples where a new object or an explicit copy is used:

# 1. Assignment
def dropdf_copy1(df):
    df = df.drop('y', axis=1)

# 2. copy()
def dropdf_copy2(df):
    df = df.copy()
    df.drop('y', axis=1, inplace=True)

If no new object or explicit copy is created, the original object that was passed in is modified.

def dropdf_inplace(df):
    df.drop('y',axis=1,inplace = True)
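
The difference between the two cases can be checked with an identity test. The following is only a minimal sketch: the helper names rebind and mutate are made up for illustration and are not part of the original answer. It assumes pandas is installed.

```python
import pandas as pd

def rebind(df):
    # drop() returns a new DataFrame; only the local name df points at it
    df = df.drop('y', axis=1)
    return df

def mutate(df):
    # inplace=True changes the very object the caller passed in
    df.drop('y', axis=1, inplace=True)
    return df

x = pd.DataFrame({'x': [1, 2], 'y': [20, 31]})

after_rebind = rebind(x)
print(after_rebind is x)   # False: a different object came back
print(list(x.columns))     # ['x', 'y']: the caller's x is untouched

after_mutate = mutate(x)
print(after_mutate is x)   # True: same object as the caller's x
print(list(x.columns))     # ['x']: the caller's x was changed
```

Returning the DataFrame from each function is only there so the identity check can be printed; it does not change which object gets modified.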



Answer 2:


This has nothing to do with pandas. It is a matter of local vs. global names bound to mutable values. In dropdf_copy, the assignment makes df a local variable bound to a new object.

The same with lists:

def global_(l):
    l[0] = 1

def local_(l):
    l = l + [0]

The second function behaves the same as if you had written:

def local_(l):
    l2 = l + [0]

so the caller's l is not affected.

Here is the Python Tutor example, which shows what happens.
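
Independently of that visualization, the same behaviour can be reproduced with plain lists. This is a minimal sketch, not the answerer's code: the function names mutate_item and rebind are hypothetical stand-ins for global_ and local_ above, and rebind returns its result only so it can be inspected.

```python
def mutate_item(l):
    # item assignment changes the list object the caller passed in
    l[0] = 1

def rebind(l):
    # l + [0] builds a new list; only the local name l is rebound to it
    l = l + [0]
    return l

a = [5, 6]
mutate_item(a)
print(a)         # [1, 6]: the caller's list was changed

b = rebind(a)
print(a)         # [1, 6]: the caller's list is untouched
print(b)         # [1, 6, 0]
print(b is a)    # False: b is a separate list object
```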



Source: https://stackoverflow.com/questions/47003629/when-does-pandas-do-pass-by-reference-vs-pass-by-value-when-passing-dataframe-to
