R: Updating NAs in a data table with values of another data table

Question

There are two data tables of the following structure:

 library(data.table)
 DT1 <- data.table(ID=c("A","B","C"), P0=c(1,10,100), key="ID")
 # t runs 1, 2, 3 within each ID
 DT2 <- data.table(ID=c("B","B","B","A","A","A","C","C","C"), t=rep(1:3,3), P=c(NA,30,50,NA,4,6,NA,200,700))

In data table DT2, every NA in column P should be replaced by the matching value of P0 (same ID) from data table DT1.

If DT2 is ordered by ID like DT1 (and each ID has exactly one NA, at t == 1), the problem can be solved by position:

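 setorder(DT2, ID)                                 # sort DT2 so its ID groups follow DT1's order (A, B, C)
 idxr <- which(DT2[["t"]] == 1)                    # the NA rows: t == 1 within each ID
 set(DT2, i = idxr, j = "P", value = DT1[["P0"]])  # fill by position, one P0 per ID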
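For comparison, the same fill can be done without sorting by looking P0 up by ID with base R's match() inside a data.table update. This is a swapped-in base-R technique, not taken from the question or the answers below; a minimal sketch on the example data:

```r
library(data.table)

DT1 <- data.table(ID = c("A", "B", "C"), P0 = c(1, 10, 100), key = "ID")
DT2 <- data.table(ID = c("B", "B", "B", "A", "A", "A", "C", "C", "C"),
                  t = rep(1:3, 3),
                  P = c(NA, 30, 50, NA, 4, 6, NA, 200, 700))

# Restrict to the NA rows; for each of them, match() returns the position
# of its ID in DT1$ID, which is then used to pick the corresponding P0.
# Neither table needs to be sorted, and DT2 is updated by reference.
DT2[is.na(P), P := DT1$P0[match(ID, DT1$ID)]]

print(DT2)
```

Because match() pairs each NA row with its ID rather than with a row position, the result no longer depends on the two tables sharing a row order.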

But how can the data tables be "merged" without sorting DT2 first?

Answer 1:

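We can join the two tables on 'ID', fill the NA values in 'P' with 'P0', and then drop the helper column 'P0' by assigning it NULL. Note that this chain returns a new table (with rows in DT1's ID order) rather than modifying DT2, so assign the result if you want to keep it.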

library(data.table) # v1.9.6+ (needed for on=)
DT2[DT1, on='ID'][is.na(P), P:= P0][, P0:= NULL][]
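A self-contained sketch of the chain above, assuming data.table 1.9.6 or later and the example data from the question. The name res is only an illustrative variable; the final check shows that DT2 itself still contains its NAs, because the chain works on the new table produced by the join.

```r
library(data.table)

DT1 <- data.table(ID = c("A", "B", "C"), P0 = c(1, 10, 100), key = "ID")
DT2 <- data.table(ID = c("B", "B", "B", "A", "A", "A", "C", "C", "C"),
                  t = rep(1:3, 3),
                  P = c(NA, 30, 50, NA, 4, 6, NA, 200, 700))

# DT2[DT1, on = "ID"] builds a NEW table: for each row of DT1 (A, B, C) it
# returns the matching DT2 rows, with DT1's P0 column appended.
# The following := calls modify that new table, not DT2.
res <- DT2[DT1, on = "ID"][is.na(P), P := P0][, P0 := NULL][]

print(res)
print(anyNA(DT2$P))  # DT2 was not modified by the chain
```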

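Alternatively, as @DavidArenburg mentioned, we can do an update join on 'ID' and use ifelse() to replace only the NA elements of 'P' with DT1's P0 (referenced as i.P0). This modifies DT2 by reference and keeps its original row order.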

DT2[DT1, P := ifelse(is.na(P), i.P0, P), on = 'ID']
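On data.table 1.12.4 or later (an assumption about your installed version), fcoalesce() can stand in for ifelse() in the same update join: it keeps P where it is present and takes i.P0 otherwise. A minimal sketch:

```r
library(data.table)  # fcoalesce() assumes data.table >= 1.12.4

DT1 <- data.table(ID = c("A", "B", "C"), P0 = c(1, 10, 100), key = "ID")
DT2 <- data.table(ID = c("B", "B", "B", "A", "A", "A", "C", "C", "C"),
                  t = rep(1:3, 3),
                  P = c(NA, 30, 50, NA, 4, 6, NA, 200, 700))

# Update join: for every DT2 row whose ID is found in DT1, set P to the
# first non-NA of (P, i.P0). i.P0 is P0 taken from the i table (DT1).
# DT2 is modified in place and keeps its row order.
DT2[DT1, on = "ID", P := fcoalesce(P, i.P0)]

print(DT2)
```

fcoalesce() expects its arguments to share one type; that holds here because P and P0 are both double.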

Answer 2:

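Here's another option: subset DT2 to the rows where P is NA, then look up P0 for just those rows by joining DT1 with .SD (this relies on DT1 being keyed by ID, so the join uses the first column of .SD, ID):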

DT2[is.na(P), P := DT1[.SD, P0]]
DT2
#    ID t   P
# 1:  B 1  10
# 2:  B 2  30
# 3:  B 3  50
# 4:  A 1   1
# 5:  A 2   4
# 6:  A 3   6
# 7:  C 1 100
# 8:  C 2 200
# 9:  C 3 700
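A sketch of the same idea with an explicit on = "ID" in the inner join, so it also works when DT1 has no key. The extra row with ID "D" is hypothetical, added only to show what happens when an ID has no match in DT1: the inner join returns NA for it, so that P simply stays NA.

```r
library(data.table)

DT1 <- data.table(ID = c("A", "B", "C"), P0 = c(1, 10, 100))  # deliberately no key
DT2 <- data.table(ID = c("B", "B", "B", "A", "A", "A", "C", "C", "C", "D"),
                  t = c(rep(1:3, 3), 1L),
                  P = c(NA, 30, 50, NA, 4, 6, NA, 200, 700, NA))  # "D" is hypothetical

# Only the NA rows reach j, where .SD holds them. The inner join looks up
# each .SD row's ID in DT1 and returns one P0 per row (NA when unmatched),
# which is then assigned back to P by reference.
DT2[is.na(P), P := DT1[.SD, on = "ID", P0]]

print(DT2)
```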

Source: https://stackoverflow.com/questions/33981797/r-updating-nas-in-a-data-table-with-values-of-another-data-table

Tags: merge, data.table
