Creating a function to replace NAs from one data.frame with values from another

鱼传尺愫 2020-12-30 02:56

I regularly have situations where I need to replace missing values from a data.frame with values from some other data.frame that is at a different level of aggregation. So,

3 Answers
  •  清歌不尽
    2020-12-30 03:59

    My preference would be to pull out the code from merge that does the matching and do it myself, so that the ordering of the original data frame stays intact, both row-wise and column-wise. I also use matrix indexing to avoid any loops. To do so, I create a new data frame holding the revised fillCols and replace the columns of the original with it. I thought I could fill the original in directly, but apparently you can't use matrix indexing to replace parts of a data.frame, so I wouldn't be surprised if a loop over the names were faster in some situations.
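
    The matching step can be tried on its own. Below is a minimal sketch, assuming two made-up data frames: a county-level naDf and a coarser state-by-year fillDf, keyed on state and year. The data and the names naKey and fillKey are illustrative only and do not come from the question.

    ```r
    # Toy data, invented for illustration.
    naDf <- data.frame(state  = c("CA", "CA", "NY", "TX"),
                       year   = c(2019, 2020, 2020, 2020),
                       county = c("a", "b", "c", "d"),
                       pop    = c(10, NA, NA, NA),
                       stringsAsFactors = FALSE)
    fillDf <- data.frame(state = c("NY", "CA", "CA"),
                         year  = c(2020, 2019, 2020),
                         pop   = c(300, 100, 200),
                         stringsAsFactors = FALSE)
    mergeCols <- c("state", "year")

    # Collapse the key columns into one string per row. "\r" is used as a
    # separator because it is unlikely to appear inside real key values.
    naKey   <- do.call(paste, c(naDf[, mergeCols, drop = FALSE], sep = "\r"))
    fillKey <- do.call(paste, c(fillDf[, mergeCols, drop = FALSE], sep = "\r"))

    # For each row of naDf, the row of fillDf with the same key (NA if none).
    m <- match(naKey, fillKey)
    print(m)  # 2 3 1 NA
    ```

    Because match() returns one position per row of naDf, the original row order is never disturbed, which is the point of not calling merge directly.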

    With matrix indexing:

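    fillNaDf <- function(naDf, fillDf, mergeCols, fillCols) {
      # One key string per row, built from the merge columns
      fillB <- do.call(paste, c(fillDf[, mergeCols, drop = FALSE], sep="\r"))
      naB <- do.call(paste, c(naDf[, mergeCols, drop = FALSE], sep="\r"))
      # drop = FALSE keeps a data frame even when fillCols names a single column
      naX <- naDf[, fillCols, drop = FALSE]
      fillX <- fillDf[, fillCols, drop = FALSE]
      na.ind <- is.na(naX)
      # (row of fillDf, column) for every missing cell, in column-major order
      fill.ind <- cbind(match(naB, fillB)[row(na.ind)[na.ind]], col(na.ind)[na.ind])
      # fillX[fill.ind] goes through as.matrix(), so fillCols should share one type
      naX[na.ind] <- fillX[fill.ind]
      naDf[,colnames(naX)] <- naX
      naDf
    }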
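
    The index arithmetic in the function above is easier to follow on a plain matrix. This is a small sketch with invented values; x, src, and m are hypothetical stand-ins for the fill columns of naDf, the fill columns of fillDf, and the result of match().

    ```r
    # Matrix with two missing cells (filled column-major):
    # column 1 = 1, NA, 3 and column 2 = 4, 5, NA.
    x <- matrix(c(1, NA, 3,
                  4, 5, NA), nrow = 3)

    # Source of replacement values: column 1 = 10, 20 and column 2 = 30, 40.
    src <- matrix(c(10, 20,
                    30, 40), nrow = 2)

    # Hypothetical lookup: row i of x takes its values from row m[i] of src.
    m <- c(1, 2, 2)

    na.ind <- is.na(x)  # logical matrix marking the missing cells

    # Row and column of every missing cell, in column-major order.
    rows <- row(na.ind)[na.ind]
    cols <- col(na.ind)[na.ind]

    # Two-column index matrix: (row of src, column) for each missing cell.
    fill.ind <- cbind(m[rows], cols)

    # A logical matrix on the left and an index matrix on the right visit the
    # cells in the same column-major order, so the values line up.
    x[na.ind] <- src[fill.ind]
    print(x)
    ```

    Here the missing cells are (2, 1) and (3, 2), both mapped to row 2 of src, so they receive 20 and 40.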
    

    With a loop:

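    fillNaDf2 <- function(naDf, fillDf, mergeCols, fillCols) {
      # One key string per row, built from the merge columns
      fillB <- do.call(paste, c(fillDf[, mergeCols, drop = FALSE], sep="\r"))
      naB <- do.call(paste, c(naDf[, mergeCols, drop = FALSE], sep="\r"))
      # m[i] is the row of fillDf matching row i of naDf (NA if there is none)
      m <- match(naB, fillB)
      for(col in fillCols) {
        # Rows where this column is missing, filled from the matched rows
        fix <- which(is.na(naDf[,col]))
        naDf[fix, col] <- fillDf[m[fix],col]
      }
      naDf
    }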
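
    For a quick check of the behavior, here is a usage sketch on invented data: a county-level data frame with gaps and a state-level data frame to fill them from. The function body is repeated only so the snippet runs on its own; the data frames county and state are hypothetical.

    ```r
    # Loop version from above, repeated so this snippet is self-contained.
    fillNaDf2 <- function(naDf, fillDf, mergeCols, fillCols) {
      fillB <- do.call(paste, c(fillDf[, mergeCols, drop = FALSE], sep = "\r"))
      naB <- do.call(paste, c(naDf[, mergeCols, drop = FALSE], sep = "\r"))
      m <- match(naB, fillB)
      for (col in fillCols) {
        fix <- which(is.na(naDf[, col]))
        naDf[fix, col] <- fillDf[m[fix], col]
      }
      naDf
    }

    # Made-up data at two levels of aggregation.
    county <- data.frame(state  = c("CA", "CA", "NY", "TX"),
                         county = c("a", "b", "c", "d"),
                         income = c(50, NA, NA, NA),
                         rent   = c(NA, 2, 3, NA),
                         stringsAsFactors = FALSE)
    state <- data.frame(state  = c("NY", "CA"),
                        income = c(70, 60),
                        rent   = c(5, 4),
                        stringsAsFactors = FALSE)

    filled <- fillNaDf2(county, state,
                        mergeCols = "state",
                        fillCols  = c("income", "rent"))
    print(filled)
    # income becomes 50 60 70 NA and rent becomes 4 2 3 NA:
    # TX has no row in state, so its missing values stay NA.
    ```

    Values that were already present (income 50, rent 2 and 3) are left alone, and the row and column order of county is unchanged.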
    
