R: Updating a data frame with another data frame

别那么骄傲 2020-12-18 12:51

Let's say our initial data frame looks like this:

df1 = data.frame(Index=c(1:6),A=c(1:6),B=c(1,2,3,NA,NA,NA),C=c(1,2,3,NA,NA,NA))

> df1
  Index A  B  C
1     1 1  1  1
2     2 2  2  2
3     3 3  3  3
4     4 4 NA NA
5     5 5 NA NA
6     6 6 NA NA
4 answers
  • 2020-12-18 12:58

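    I'm not sure what the general case or conditions would be, but this works for this instance without dplyr. It relies on the NA cells of df1, taken column by column, lining up with the B and C values of df2.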

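    df3 <- as.matrix(df1)
    # Fill the NA cells (in column-major order) with df2's B and C values;
    # df2's Index column is left out so only the value columns are used
    df3[is.na(df3)] <- as.matrix(df2[, c("B", "C")])
    df3 <- as.data.frame(df3)
    df3
    
      Index A B C
    1     1 1 1 1
    2     2 2 2 2
    3     3 3 3 3
    4     4 4 4 5
    5     5 5 4 5
    6     6 6 4 5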
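The matrix trick above depends on the NA cells happening to line up with df2's values. A minimal base R sketch that instead looks rows up by Index with match() and fills only the cells that are NA. Assumptions: df2 is not shown in the question, so its values here are inferred from the output printed in the answers, and fill_na_by_key() is a hypothetical helper name.

```r
# Data from the question; df2 is assumed (inferred from the answers' output)
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA), C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

# Hypothetical helper: for each row of `source`, find the row of `target`
# with the same key and copy values only into cells of `target` that are NA
fill_na_by_key <- function(target, source, key = "Index") {
  rows <- match(source[[key]], target[[key]])  # target row for each source row
  keep <- !is.na(rows)                         # skip keys not present in target
  rows <- rows[keep]
  shared <- intersect(setdiff(names(source), key), names(target))
  for (col in shared) {
    current <- target[[col]][rows]
    # keep existing values; use the source value only where target is NA
    target[[col]][rows] <- ifelse(is.na(current), source[[col]][keep], current)
  }
  target
}

df3 <- fill_na_by_key(df1, df2)
print(df3)
```

Because rows are matched by key, neither the row order of df2 nor the position of the NAs matters, and non-NA values already in df1 are left untouched.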
    
  • 2020-12-18 13:02

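    We can use a join from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), join it with 'df2' on "Index", and assign (:=) the values of 'i.B' and 'i.C' (the columns coming from 'df2') to 'B' and 'C'. This overwrites B and C on every matched row, whether or not they were NA.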

    library(data.table)
    setDT(df1)[df2, c('B', 'C') := .(i.B, i.C), on = "Index"]
    df1
    #   Index A B C
    #1:     1 1 1 1
    #2:     2 2 2 2
    #3:     3 3 3 3
    #4:     4 4 4 5
    #5:     5 5 4 5
    #6:     6 6 4 5
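For comparison, a minimal sketch of the same update join in base R without data.table: match() finds the df1 row for each Index in df2, and B and C are overwritten on those rows. df2 is an assumption here (the question does not show it; its values are inferred from the printed results).

```r
# Data from the question; df2 is assumed (inferred from the printed results)
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA), C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

rows <- match(df2$Index, df1$Index)  # df1 row for each df2 row (NA if absent)
matched <- !is.na(rows)

# Overwrite B and C on the matched rows, as := does in the data.table join
df1[rows[matched], c("B", "C")] <- df2[matched, c("B", "C")]
print(df1)
```

Like the data.table version, this replaces values on matched rows regardless of whether they were NA; unlike it, it copies rather than modifying by reference.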
    
  • 2020-12-18 13:03

    Merge, then aggregate by Index, keeping the non-missing value in each column:

    aggregate(. ~ Index, data=merge(df1, df2, all=TRUE), na.omit, na.action=na.pass )
    
    #  Index B C A
    #1     1 1 1 1
    #2     2 2 2 2
    #3     3 3 3 3
    #4     4 4 5 4
    #5     5 4 5 5
    #6     6 4 5 6
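To see why the aggregate step is needed, a short sketch that inspects the intermediate merge. df2 is an assumption (not shown in the question; inferred from the printed results). merge() with all = TRUE joins on every shared column (Index, B and C), so Index 4 to 6 each appear twice, once with df1's NAs and once with df2's values; aggregate() then keeps the non-missing value per column within each Index. This only collapses cleanly when each Index has exactly one non-NA value per column.

```r
# Data from the question; df2 is assumed (inferred from the printed results)
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA), C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

# Full outer join on the shared columns Index, B and C: no rows match,
# so the result holds all 6 rows of df1 plus the 3 rows of df2
merged <- merge(df1, df2, all = TRUE)
print(merged)

# Collapse each Index group, dropping the NAs in every column
res <- aggregate(. ~ Index, data = merged, FUN = na.omit, na.action = na.pass)
print(res)
```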
    

    Or in dplyr speak (note that summarise_each() and funs() are now deprecated in favour of summarise() with across()):

    library(dplyr)
    df1 %>% 
        full_join(df2) %>%
        group_by(Index) %>%
        summarise_each(funs(na.omit))
    
    #Joining by: c("Index", "B", "C")
    #Source: local data frame [6 x 4]
    #
    #  Index     A     B     C
    #  (dbl) (int) (dbl) (dbl)
    #1     1     1     1     1
    #2     2     2     2     2
    #3     3     3     3     3
    #4     4     4     4     5
    #5     5     5     4     5
    #6     6     6     4     5
    
  • 2020-12-18 13:16

    For those interested, I've extended this problem to:

    - handle updating a data frame with another data frame that has new columns

    - replace any existing entries, regardless of whether they're NA or not.

    Here's the solution I found, using the aggregate function from @thelatemail's answer :)

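    library(dplyr)  # for full_join() and last()
    
    df1 = data.frame(Index=c(1:6),A=c(1:6),B=c(1,2,3,3,3,3),C=c(1,2,3,3,3,3))
    
    df2 = data.frame(Index=c(4,5,6),B=c(4,4,4),C=c(5,5,5),D=c(6,6,6),E=c(7,7,7))
    
    # full_join() joins on all shared columns (Index, B, C), so each of Index 4-6
    # appears twice, with df2's row after df1's
    df3 = full_join(df1,df2)
    
    # Drop the NAs, then keep the last remaining value (df2's when both are present)
    na.omit.last = function(x){
      last(na.omit(x))
    }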
    
    # For the columns not in df1 
    dfA = aggregate(. ~ Index, df3, na.omit,na.action = na.pass)
    dfA = dfA[,-(1:ncol(df1))] 
    dfA = data.frame(lapply(dfA,as.numeric))
    
    dfB = aggregate(. ~ Index, df3[,1:ncol(df1)], na.omit.last, na.action = na.pass)
    
    # If there are more columns in df2 append dfA
    if (ncol(df2) > ncol(df1)) {
      df3 = cbind(dfB,dfA)
    } else {
      df3 = dfB
    }
    
    print(df3)
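A base R sketch of the extended problem that avoids the join/aggregate round trip: rows are matched on Index with match(), columns that exist only in df2 are first created in df1 (filled with NA), and every non-key value from df2 then overwrites the matching row. update_df() is a hypothetical helper name, and in this sketch any df2 keys with no matching row in df1 are dropped rather than appended.

```r
df1 <- data.frame(Index = c(1:6), A = c(1:6), B = c(1, 2, 3, 3, 3, 3), C = c(1, 2, 3, 3, 3, 3))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5), D = c(6, 6, 6), E = c(7, 7, 7))

# Hypothetical helper: overwrite `target` rows with `source` values matched on `key`,
# adding any columns that only exist in `source`
update_df <- function(target, source, key = "Index") {
  for (col in setdiff(names(source), names(target))) {
    target[[col]] <- rep(NA, nrow(target))  # new column, NA until filled
  }
  rows <- match(source[[key]], target[[key]])  # target row for each source row
  keep <- !is.na(rows)                         # source keys absent from target are ignored
  value_cols <- setdiff(names(source), key)
  target[rows[keep], value_cols] <- source[keep, value_cols]
  target
}

df3 <- update_df(df1, df2)
print(df3)
```

Because values are assigned by key, this does not depend on the row order of the join output, which the last()-based version relies on.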
    