R: Pass data.frame by reference to a function

一个人想着一个人 提交于 2019-12-19 18:29:04

问题


I pass a data.frame as parameter to a function that want to alter the data inside:

x <- data.frame(value=c(1,2,3,4))
f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      d$value[i] <-0
    }
  }
  print(d)
}

When I execute f(x) I can see how the data.frame inside gets modified:

> f(x)
  value
1     1
2     0
3     3
4     0

However, the original data.frame I passed is unmodified:

> x
  value
1     1
2     2
3     3
4     4

Usually I have overcame this by returning the modified one:

f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      d$value[i] <-0
    }
  }
  d
}

And then call the method reassigning the content:

> x <- f(x)
> x
  value
1     1
2     0
3     3
4     0

However, I wonder what is the effect of this behaviour in a very large data.frame, is a new one grown for the method execution? Which is the R-ish way of doing this?

Is there a way to modify the original one without creating another one in memory?


回答1:


Actually in R (almost) each modification is performed on a copy of the previous data (copy-on-writing behavior).
So for example inside your function, when you do d$value[i] <-0 actually some copies are created. You usually won't notice that since it's well optimized, but you can trace it by using tracemem function.

That being said, if your data.frame is not really big you can stick with your function returning the modified object, since it's just one more copy afterall.

But, if your dataset is really big and doing a copy everytime can be really expensive, you can use data.table, that allows in-place modifications, e.g. :

library(data.table)
d <- data.table(value=c(1,2,3,4))
f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      set(d,i,1L,0) # special function of data.table (see also ?`:=` )
    }
  }
  print(d)
}

f(d)
print(d)

# results :
> f(d)
   value
1:     1
2:     0
3:     3
4:     0
> 
> print(d)
   value
1:     1
2:     0
3:     3
4:     0

N.B.

In this specific case, the loop can be replaced with a "vectorized" and more efficient version e.g. :

d[d$value %% 2 == 0,'value'] <- 0

but maybe your real loop code is much more convoluted and cannot be vectorized easily.



来源:https://stackoverflow.com/questions/33186633/r-pass-data-frame-by-reference-to-a-function

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!