问题
I'm trying to put together several files and need to do a bunch of merges on column names that are created inside a loop. I can do this fine using data.frame()
but am having issues using similar code with a data.table()
:
library(data.table)
df1 <- data.frame(id = 1:20, col1 = runif(20))
df2 <- data.frame(id = 1:20, col1 = runif(20))
newColNum <- 5
newColName <- paste('col',newColNum ,sep='')
df1[,newColName] <- runif(20)
df2 <- merge(df2, df1[,c('id',newColName)], by = 'id', all.x = T) # Works fine
######################
dt1 <- data.table(id = 1:20, col1 = runif(20))
dt2 <- data.table(id = 1:20, col1 = runif(20))
newColNum <- 5
newColName <- paste('col',newColNum ,sep='')
dt1[,newColName] <- runif(20)
dt2 <- merge(dt2, dt1[,c('id',newColName)], by = 'id', all.x = T) # Doesn't work
Any suggestions?
回答1:
This really has nothing to do with merge()
, and everything to do with how the j
(i.e. column) index is, by default, interpreted by [.data.table()
.
You can make the whole statement work by setting with=FALSE
, which causes the j
index to be interpreted as it would be in a data.frame
:
dt2 <- merge(dt2, dt1[,c('id',newColName), with=FALSE], by = 'id', all.x = T)
head(dt2, 3)
# id col1 col5
# 1: 1 0.4954940 0.07779748
# 2: 2 0.1498613 0.12707070
# 3: 3 0.8969374 0.66894157
More precisely, from ?data.table
:
with: By default 'with=TRUE' and 'j' is evaluated within the frame of 'x'. The column names can be used as variables. When 'with=FALSE', 'j' is a vector of names or positions to select.
Note that this could be avoided by storing the columns in a variable like so:
cols = c('id', newColName)
dt1[ , ..cols]
..
signals to "look up one level"
回答2:
Try dt1[,list(id,get(newColName))]
in your merge.
来源:https://stackoverflow.com/questions/19794456/r-data-table-merge-on-named-subset-of-another-data-table