Question
I am a fan of data.table, and also of writing reusable functions for all current and future needs.
Here's a challenge I ran into while working on the answer to this problem: Best way to plot automatically all data.table columns using ggplot2
We pass a data.table to a function for plotting, and the original data.table gets modified, even though we made what looks like a copy of it to prevent that.
Here's a simple example to illustrate:
plotYofX <- function(.dt,x,y) {
dt <- .dt
dt[, (c(x,y)) := lapply(.SD, function(x) {as.numeric(x)}), .SDcols = c(x,y)]
ggplot(dt) + geom_step(aes(x=get(names(dt)[x]), y=get(names(dt)[y]))) + labs(x=names(dt)[x], y=names(dt)[y])
}
> dtDiamonds <- data.table(ggplot2::diamonds[2:5,1:3]);
> dtDiamonds
carat cut color
<num> <ord> <ord>
1: 0.21 Premium E
2: 0.23 Good E
3: 0.29 Premium I
4: 0.31 Good J
> plotYofX(dtDiamonds,1,2);
> dtDiamonds
carat cut color
<num> <num> <ord>
1: 0.21 4 E
2: 0.23 2 E
3: 0.29 4 I
4: 0.31 2 J
I've seen many posts on various issues related to using := inside a function, but could not find any that helped me resolve this seemingly very easy issue. (Of course, I don't want to convert it back to a data.frame to achieve the desired outcome.)
Answer 1:
Try:
dt <- copy(.dt)
It should work well.
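The reason this works: `dt <- .dt` only binds a second name to the same table, so `:=` still updates the caller's object, whereas `copy()` creates an independent table. Below is a minimal sketch of the question's function with that one change. The `.data[[...]]` pronoun inside `aes()` and the use of `as.data.table()` are my substitutions for the original `get(names(dt)[x])` and `data.table(...)`, and they assume a reasonably recent ggplot2.

```r
library(data.table)
library(ggplot2)

plotYofX <- function(.dt, x, y) {
  dt <- copy(.dt)                 # independent copy; := below cannot reach the caller's table
  cols <- names(dt)[c(x, y)]      # column names at positions x and y
  dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
  ggplot(dt) +
    geom_step(aes(x = .data[[cols[1]]], y = .data[[cols[2]]])) +
    labs(x = cols[1], y = cols[2])
}

dtDiamonds <- as.data.table(ggplot2::diamonds[2:5, 1:3])
p <- plotYofX(dtDiamonds, 1, 2)
class(dtDiamonds$cut)             # still an ordered factor, not numeric
```

The copy costs extra memory, which is usually acceptable for a table that is about to be plotted anyway.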
Answer 2:
Thanks to the comments and answers above: this would be the easiest solution for this particular function (i.e. no need to introduce an additional .dt variable at all):
plotYofX <- function(dt,x,y) {
dt[, lapply(.SD, function(x) {as.numeric(x)}), .SDcols = c(x,y)]
ggplot(dt) + geom_step(aes(x=get(names(dt)[x]), y=get(names(dt)[y]))) + labs(x=names(dt)[x], y=names(dt)[y])
}
However, it was also important to learn that when working with data.table, one should be particularly careful not to make "copies" of it with the regular <- operator, but to use copy(dt) instead, so as not to corrupt the original data.table!
This is further discussed in detail here: Understanding exactly when a data.table is a reference to (vs a copy of) another data.table
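The difference is easy to check directly. Here is a small sketch, using a made-up table `dt1` (not from the question) and `data.table::address()` to compare memory addresses:

```r
library(data.table)

dt1 <- data.table(a = 1:3)    # illustrative table, not from the question

dt2 <- dt1                    # plain assignment: another name for the same object
dt3 <- copy(dt1)              # explicit copy: a separate object

address(dt2) == address(dt1)  # TRUE: same object
address(dt3) == address(dt1)  # FALSE: different object

dt2[, a := a * 10L]           # update by reference through dt2 ...
dt1$a                         # ... and dt1 now holds 10 20 30 as well
dt3$a                         # the copy still holds 1 2 3
```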
Answer 3:
Just leaving out the := function seemed to succeed. Of course, I wrapped the ggplot value in print(.), as would be standard practice when working inside a function and wanting output:
plotYofX <- function(.dt,x,y) {
dt <- .dt
dt[, lapply(.SD, function(x) {as.numeric(x)}), .SDcols = c(x,y)]
print( ggplot(dt) + geom_step(aes(x=get(names(dt)[x]), y=get(names(dt)[y]))) + labs(x=names(dt)[x], y=names(dt)[y]) )
}
> png(); plotYofX(dtDiamonds,1,2); dev.off()
quartz
2
> dtDiamonds
carat cut color
1: 0.21 Premium E
2: 0.23 Good E
3: 0.29 Premium I
4: 0.31 Good J
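This works because `j` without `:=` returns a new data.table and leaves `dt` alone. Note that in the function above that result is not assigned to anything, so the plot is drawn from the unconverted columns. If the numeric conversion is actually wanted, one option is to keep the result in a local variable. A sketch of just that step follows, where `numericCols` and `dtDemo` are hypothetical names introduced for illustration and the plotting call is left out:

```r
library(data.table)

# Hypothetical helper: returns a NEW table holding the columns at positions
# x and y converted to numeric; the input table is not modified.
numericCols <- function(dt, x, y) {
  cols <- names(dt)[c(x, y)]
  dt[, lapply(.SD, as.numeric), .SDcols = cols]
}

dtDemo <- data.table(size = c(0.21, 0.23), grade = factor(c("b", "a")))
converted <- numericCols(dtDemo, 1, 2)

sapply(converted, class)   # both columns are numeric
sapply(dtDemo, class)      # grade is still a factor in the original
```

The returned table could then be passed to `ggplot()` in place of the original.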
Source: https://stackoverflow.com/questions/44661961/dont-want-original-data-table-to-be-modified-when-passed-to-a-function