Merge data frames to eliminate missing observations

I have two data frames. One (df1) contains all columns and rows of interest, but includes missing observations. The other (df2) includes values t

3 Answers

后悔当初

2020-12-19 08:27

This will do:

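# Full outer join on county; columns present in both frames get .x / .y suffixes
m <- merge(df1, df2, by="county", all=TRUE)
# Separate the df1 (.x) and df2 (.y) copies of the shared columns
dotx <- m[, grepl("\\.x$", names(m)), drop=FALSE]
doty <- m[, grepl("\\.y$", names(m)), drop=FALSE]
# Fill NA cells in the df1 copies from the same cells of the df2 copies
# (relies on the .x and .y columns appearing in the same order)
dotx[is.na(dotx)] <- doty[is.na(dotx)]
# Strip the ".x" suffix to restore the original column names
names(dotx) <- sub("\\.x$", "", names(dotx))
# Recombine the unsuffixed columns with the filled ones
result <- cbind(m[, !grepl("\\.[xy]$", names(m)), drop=FALSE], dotx)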

Checking:

> result
  county year1 year2 year3
1     aa    10    20    30
2     bb     1     2     3
3     cc     5    10    15
4     dd   100   150   200
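
The question's data frames are not reproduced here, so the sketch below uses a hypothetical `df1` and `df2` chosen to be consistent with the output shown above. It is not the answer's own code but an alternative without `merge`: for each column the two frames share, it looks up the matching county with `match()` and fills only the NA cells. `fill_na_from` is an illustrative name, not an existing function, and it assumes county values are unique within `df2`.

```r
# Hypothetical example data (not taken from the original question)
df1 <- data.frame(county = c("aa", "bb", "cc", "dd"),
                  year1  = c(10, 1, 5, 100),
                  year2  = c(20, NA, 10, NA),
                  year3  = c(30, 3, NA, 200),
                  stringsAsFactors = FALSE)
df2 <- data.frame(county = c("dd", "bb", "cc"),   # deliberately in a different order
                  year2  = c(150, 2, NA),
                  year3  = c(NA, NA, 15),
                  stringsAsFactors = FALSE)

# Fill NA cells of `target` from `source`, matching rows on the `key` column.
# Illustrative helper; assumes `key` values are unique within `source`.
fill_na_from <- function(target, source, key = "county") {
  # Row of `source` for each row of `target` (NA when the key is absent)
  pos <- match(target[[key]], source[[key]])
  shared <- setdiff(intersect(names(target), names(source)), key)
  for (col in shared) {
    # Cells that are missing in `target` and have a matching `source` row
    fill <- is.na(target[[col]]) & !is.na(pos)
    target[[col]][fill] <- source[[col]][pos[fill]]
  }
  target
}

result <- fill_na_from(df1, df2)
```

On this hypothetical data, `result` has no NA left and holds the same values as the table above. Because rows are matched by key rather than by position, the row order of `df2` does not matter.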

清歌不尽

2020-12-19 08:31

aggregate can do this:

aggregate(. ~ county,
          data=merge(df1, df2, all=TRUE), # Merged data, including NAs
          na.action=na.pass,              # Aggregate rows with missing values...
          FUN=sum, na.rm=TRUE)            # ...but instruct "sum" to ignore them.
##   county year2 year3 year1
## 1     aa    20    30    10
## 2     bb     2     3     1
## 3     cc    10    15     5
## 4     dd   150   200   100
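
One property of this approach may be worth checking on your own data: `sum(..., na.rm=TRUE)` over a group whose values are all NA returns 0 rather than NA. The sketch below uses hypothetical data in which county "cc" has no `year3` value in either frame, and compares `sum` with an illustrative `first_non_na` helper (a name introduced here, not a base R function) that keeps the first non-missing value per group.

```r
# Hypothetical data: county "cc" has no year3 value in either frame
df1 <- data.frame(county = c("aa", "bb", "cc"),
                  year1  = c(10, 1, 5),
                  year2  = c(20, NA, 10),
                  year3  = c(30, 3, NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(county = c("bb", "cc"),
                  year2  = c(2, NA),
                  year3  = c(NA_real_, NA_real_),
                  stringsAsFactors = FALSE)
merged <- merge(df1, df2, all = TRUE)

# As in the answer: an all-NA group sums to 0
by_sum <- aggregate(. ~ county, data = merged, na.action = na.pass,
                    FUN = sum, na.rm = TRUE)

# Illustrative alternative: first non-missing value, NA if there is none
first_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[1] else NA_real_
}
by_first <- aggregate(. ~ county, data = merged, na.action = na.pass,
                      FUN = first_non_na)
```

Here `by_sum` reports 0 for `year3` in county "cc", while `by_first` leaves it as NA. Taking the first non-missing value also avoids adding values together when both frames supply one for the same cell.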

清酒与你

2020-12-19 08:41

Another option, using reshape2 and working in the long format:

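library(reshape2)
## reshape to long format, keeping county as the identifier
df1.m <- melt(df1, id.vars="county")
df2.m <- melt(df2, id.vars="county")
## for each df1 row, find the df2 row with the same county and variable
pos <- match(paste(df1.m$county, df1.m$variable),
             paste(df2.m$county, df2.m$variable))
## replace NA values that have a matching row in df2
fill <- is.na(df1.m$value) & !is.na(pos)
df1.m$value[fill] <- df2.m$value[pos[fill]]
## get the wide format
dcast(data=df1.m, county~variable, value.var="value")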

  county year1 year2 year3
1     aa    10    20    30
2     bb     1     2     3
3     cc     5    10    15
4     dd   100   150   200
