R: Comparing fields in matrix

Posted by 时光总嘲笑我的痴心妄想 on 2019-12-13 00:52:38

Question


I've got two data frames that I want to compare: if the values at a specific location in both data frames meet a requirement, assign "X" to that location in a separate data frame.

How can I get the expected output efficiently? The real data frame contains 1000 columns and thousands to millions of rows. I think data.table would be the quickest option, but I don't have a grasp of how data.table works yet.

Expected output:

> print(result)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] "A"  "A"  "O"  "X"  "X"  "X"  "X"  "O"  "O" 
# [2,] "A"  "A"  "O"  "X"  "X"  "X"  "X"  "O"  "O" 
# [3,] "A"  "A"  "O"  "X"  "X"  "X"  "X"  "O"  "X" 

My code:

df1 <- structure(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 
            2, 2, 2, 2, 3, 3, 3, 2, 0, 1), .Dim = c(3L, 9L), .Dimnames = list(
              c("A", "B", "C"), NULL))
df2 <- structure(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 
            2, 2, 2, 2, 1, 3, 3, 4, 4, 2), .Dim = c(3L, 9L), .Dimnames = list(
              c("A", "B", "C"), NULL))

result <- matrix("O", nrow(df1), ncol(df1))


for (i in 1:nrow(df1)) 
{
  for (j in 3:ncol(df1)) 
  {
    # The first two columns are always labelled "A"
    result[i,1] <- "A"
    result[i,2] <- "A"
    if (is.na(df1[i,j]) || is.na(df2[i,j])) {
      # A missing value in either input is marked "N"
      result[i,j] <- "N"
    } else if (df1[i,j] %in% c("0","1","2") && df2[i,j] %in% c("0","1","2")) {
      # Both values are 0, 1 or 2
      result[i,j] <- "X"
    }
  }
}


print(result)

Edit

I like both @David's and @Heroka's solutions. On a small dataset, Heroka's solution is about 125 times as fast as the original, and David's is about 29 times as fast. Here's the benchmark:

> mbm
Unit: milliseconds
             expr        min          lq       mean      median          uq        max neval
         original 1058.81826 1110.481659 1131.81711 1112.848211 1124.775989 1428.18079   100
           Heroka    8.46317    8.711986    9.03517    8.914616    9.067793   18.06716   100
 DavidAarenburg()   35.58350   36.660565   39.85823   37.061160   38.175700   53.83976   100

Thanks a lot, guys!


Answer 1:


You have matrices, not data frames.

One approach might be to use ifelse. Using %in% with a numeric vector instead of character strings avoids the type conversion, which saves about 50% of the time:

  result <- ifelse(is.na(df1)|is.na(df2),"N",
                   ifelse(df1 %in% 0:2 & df2 %in% 0:2,"X","O"))
  result[,1:2] <- "A"
  result
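A detail worth knowing about the ifelse version: %in% returns a plain logical vector and drops the matrix dimensions, so the result is a matrix only because ifelse takes its shape from the outer test, is.na(df1) | is.na(df2). A minimal sketch on two hypothetical toy matrices, m1 and m2 (not from the question), illustrates this:

```r
# Hypothetical toy matrices, used only for illustration
m1 <- matrix(c(0, 1, NA, 3), nrow = 2)
m2 <- matrix(c(2, 5, 1, 0), nrow = 2)

# %in% drops the dim attribute and returns a plain logical vector
in_range <- m1 %in% 0:2 & m2 %in% 0:2
print(is.null(dim(in_range)))

# is.na() keeps the dimensions, so the outer ifelse() returns a matrix
na_mask <- is.na(m1) | is.na(m2)
out <- ifelse(na_mask, "N", ifelse(in_range, "X", "O"))
print(dim(out))
print(out)
```

Note that NA %in% 0:2 is FALSE rather than NA, so missing values never count as being in range.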

With thanks to @DavidArenburg, a further improvement in speed:

result <- matrix("O",nrow=nrow(df1),ncol=ncol(df1))
result[is.na(df1) | is.na(df2)] <- "N"
result[df1 < 3 & df2 < 3] <- "X"
result[, 1:2] <- "A"
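The indexing approach can be wrapped in a function and checked against the expected output from the question. The sketch below is only an illustration: the function name compare_matrices and its arguments are made up here, and it uses %in% 0:2 rather than < 3, because the two conditions agree only when every value is a non-negative whole number (for example, -1 < 3 is TRUE, but -1 is not in 0:2). The row names from the question's data are omitted for brevity.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer)
compare_matrices <- function(a, b, valid = 0:2, label_cols = 1:2) {
  stopifnot(identical(dim(a), dim(b)))
  out <- matrix("O", nrow = nrow(a), ncol = ncol(a))
  out[is.na(a) | is.na(b)] <- "N"
  # %in% returns a plain vector in column-major order, which lines up with
  # the elements of `out` when used as a logical index
  out[a %in% valid & b %in% valid] <- "X"
  out[, label_cols] <- "A"
  out
}

# Same values as in the question, without the row names
df1 <- matrix(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2,
                2, 2, 2, 2, 3, 3, 3, 2, 0, 1), nrow = 3)
df2 <- matrix(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2,
                2, 2, 2, 2, 1, 3, 3, 4, 4, 2), nrow = 3)

result <- compare_matrices(df1, df2)
print(result)
```

Because the matrix is filled by whole-matrix logical indexing, the work happens in vectorized code instead of in an R-level double loop.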


Source: https://stackoverflow.com/questions/34001333/r-comparing-fields-in-matrix
