Don't want original data.table to be modified when passed to a function

Submitted by 柔情痞子 on 2020-01-21 12:42:21

Question


I am a fan of data.table, as well as of writing reusable functions for all current and future needs.

Here's a challenge I ran into while working on the answer to this problem: Best way to plot automatically all data.table columns using ggplot2

We pass a data.table to a function for plotting, and the original data.table gets modified, even though we made what we thought was a copy of it (dt <- .dt) to prevent that.

Here's a simple example to illustrate:

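library(data.table)
library(ggplot2)

plotYofX <- function(.dt, x, y) {
  dt <- .dt  # NOT a copy: dt and .dt are two names for the same data.table
  cols <- names(dt)[c(x, y)]
  dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]  # := modifies by reference
  ggplot(dt) + geom_step(aes(x = .data[[cols[1]]], y = .data[[cols[2]]])) + labs(x = cols[1], y = cols[2])
}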


> dtDiamonds <- data.table(ggplot2::diamonds[2:5,1:3]); 
> dtDiamonds
   carat     cut color
   <num>   <ord> <ord>
1:  0.21 Premium     E
2:  0.23    Good     E
3:  0.29 Premium     I
4:  0.31    Good     J

> plotYofX(dtDiamonds,1,2); 
> dtDiamonds
    carat   cut color
    <num> <num> <ord>
1:  0.21     4     E
2:  0.23     2     E
3:  0.29     4     I
4:  0.31     2     J

I've seen many posts on various issues related to using := inside a function, but could not find any that helps me resolve this seemingly simple issue. (Of course, I don't want to convert it back to a data.frame to achieve the desired outcome.)
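The effect can be reproduced without ggplot2. In the minimal sketch below (modifyInPlace() and the toy table are hypothetical), data.table's address() reports the memory location of an object; it shows that neither passing a data.table as an argument nor dt <- .dt copies it, so := through dt changes the caller's table.

```r
library(data.table)

# Hypothetical helper that mirrors the question's `dt <- .dt` pattern
modifyInPlace <- function(.dt) {
  dt <- .dt                                           # second name, same object
  sameObject <- identical(address(dt), address(.dt))  # compare memory addresses
  dt[, a := a * 10]                                   # := updates the shared object in place
  sameObject
}

DT <- data.table(a = c(1, 2, 3))
modifyInPlace(DT)   # TRUE: dt and .dt share one memory address
DT$a                # 10 20 30: the caller's table was changed
```

R's usual copy-on-modify rule does not protect against this, because := deliberately bypasses it.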


Answer 1:


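Try replacing dt <- .dt with:

dt <- copy(.dt)

It should work well: copy() makes a deep copy, so the subsequent := modifies only the local dt and leaves the caller's data.table untouched.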
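A sketch of the conversion step with this fix applied, using a small hypothetical table shaped like the diamonds subset in the question. numericCopy() is a hypothetical helper name, not a data.table function; plotYofX() could call it and plot the returned table.

```r
library(data.table)

# Hypothetical helper mirroring the conversion step of plotYofX()
numericCopy <- function(.dt, cols) {
  dt <- copy(.dt)   # deep copy: dt no longer shares memory with .dt
  dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
  dt[]              # [] so the returned table prints normally after :=
}

cutLevels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
DT <- data.table(carat = c(0.21, 0.23),
                 cut = factor(c("Premium", "Good"), levels = cutLevels, ordered = TRUE))

res <- numericCopy(DT, c("carat", "cut"))
res$cut          # 4 2: factor codes, as in the question's output
class(DT$cut)    # "ordered" "factor": the original is unchanged
```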




Answer 2:


Thanks to the comments and answers above, this would be the simplest solution for this particular function (i.e., there is no need to introduce an additional .dt variable at all):

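plotYofX <- function(dt, x, y) {
  # Without :=, this returns a new data.table and leaves dt untouched;
  # the result must be assigned, otherwise the conversion is thrown away
  pd <- dt[, lapply(.SD, as.numeric), .SDcols = c(x, y)]
  ggplot(pd) + geom_step(aes(x = .data[[names(pd)[1]]], y = .data[[names(pd)[2]]])) + labs(x = names(pd)[1], y = names(pd)[2])
}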
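Without :=, a j expression such as lapply(.SD, as.numeric) builds and returns a new data.table and never modifies the original. The flip side is that the result must be assigned or returned: if the call stands alone as a statement inside a function, the converted table is computed and discarded. A minimal sketch with a hypothetical two-column table:

```r
library(data.table)

DT <- data.table(x = c("1", "2"), y = factor(c("b", "a")))

# Returns (and here, at top level, prints) a new table; DT itself is unchanged
DT[, lapply(.SD, as.numeric), .SDcols = c("x", "y")]
sapply(DT, class)    # x "character", y "factor"

# Keep the converted copy by assigning it
num <- DT[, lapply(.SD, as.numeric), .SDcols = c("x", "y")]
sapply(num, class)   # both "numeric"
num$y                # 2 1: factor codes (levels are "a", "b")
```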

However, it was also important to learn that when working with data.table, one should be particularly careful not to make "copies" of it with the regular <- operator. <- only binds a second name to the same table, so a later := through either name changes both; use copy(dt) instead so as not to corrupt the original data.table.
This is discussed in detail here: Understanding exactly when a data.table is a reference to (vs a copy of) another data.table




Answer 3:


Just leaving out := seemed to succeed. I also wrapped the ggplot object in print(), as is standard practice when working inside a function and wanting output:

plotYofX <- function(.dt, x, y) {
  # No :=, so .dt is never modified; keep the converted columns in a new table
  pd <- .dt[, lapply(.SD, as.numeric), .SDcols = c(x, y)]
  print(ggplot(pd) + geom_step(aes(x = .data[[names(pd)[1]]], y = .data[[names(pd)[2]]])) + labs(x = names(pd)[1], y = names(pd)[2]))
}

> png(); plotYofX(dtDiamonds,1,2); dev.off()
quartz 
     2 
>  dtDiamonds
   carat     cut color
1:  0.21 Premium     E
2:  0.23    Good     E
3:  0.29 Premium     I
4:  0.31    Good     J
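Whether the explicit print() is needed depends on where the plot object ends up. A visible value returned to the top-level console is autoprinted, and printing is what draws a ggplot; inside a for loop, or in a script run with source() without echo = TRUE, nothing is autoprinted, so print() is required. The sketch below uses a hypothetical S3 class, fakePlot, as a stand-in for a ggplot object, since a ggplot is likewise drawn by its print() method.

```r
# Hypothetical class standing in for a ggplot object: "drawing" happens in print()
fakePlot <- function(label) structure(list(label = label), class = "fakePlot")
print.fakePlot <- function(x, ...) {
  cat("drawing plot ", x$label, "\n", sep = "")
  invisible(x)
}

for (i in 1:2) fakePlot(i)          # prints nothing: no autoprint inside a loop
for (i in 1:2) print(fakePlot(i))   # "drawing plot 1", "drawing plot 2"
fakePlot(3)                         # top level: autoprinted, "drawing plot 3"
```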



Source: https://stackoverflow.com/questions/44661961/dont-want-original-data-table-to-be-modified-when-passed-to-a-function
