Using identical() in R with multiple vectors

没有蜡笔的小新 2020-12-15 15:48

Suppose that I have five vectors:

A <- 1:10
B <- 1:10
C <- 1:10
D <- 1:10
E <- 1:12

I could test two at a time using identical(), but how can I test whether all five are identical at once?
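Here is a minimal sketch of one way to do this, using base R only. Because identical() tests exact equality, it is enough to compare each vector with the first one. The helper name all_identical() is hypothetical.

```r
A <- 1:10
B <- 1:10
C <- 1:10
D <- 1:10
E <- 1:12

# Hypothetical helper: TRUE when every argument is identical() to the first one.
all_identical <- function(...) {
  objs <- list(...)
  if (length(objs) < 2) return(TRUE)  # zero or one object: trivially all the same
  # identical() is an exact equality test, so if each object matches the
  # first one they all match each other; n - 1 comparisons are enough.
  # vapply() passes objs[[1]] as the second argument to identical().
  all(vapply(objs[-1], identical, logical(1), objs[[1]]))
}

all_identical(A, B, C, D)     # TRUE
all_identical(A, B, C, D, E)  # FALSE, because E has length 12
```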

  •  天命终不由人
    2020-12-15 16:12

    I had the same problem and implemented two solutions: one based on Reduce() and one based on a double for loop.

    Functions:

    all_elements_the_same = function(lst) {
    
      # compare two elements; abort the whole Reduce() as soon as a pair differs
      comparison_func = function(x, y) {
        if (!identical(x, y)) stop("non-identical pair")
        y  # pass the second element on to the next comparison
      }
    
      # run the comparisons; try() turns the stop() into a "try-error" object
      trial = try({
        Reduce(f = comparison_func, x = lst, init = lst[[1]])
      }, silent = TRUE)
    
      # FALSE if any comparison failed, TRUE otherwise
      !inherits(trial, "try-error")
    }
    
    all_elements_the_same2 = function(lst, ignore_names = FALSE) {
      # optionally drop names so that only the values are compared
      if (ignore_names) lst = lapply(lst, unname)
    
      # double loop solution: compare every pair once
      for (i in seq_along(lst)) {
        for (j in seq_along(lst)) {
          # skip self-comparisons and pairs that were already checked
          if (i >= j) next
    
          # stop at the first non-identical pair
          if (!identical(lst[[i]], lst[[j]])) return(FALSE)
        }
      }
      TRUE
    }
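    To see why the two functions scale differently, one can count the identical() calls each pattern makes when every element matches. This is illustrative only: count_comparisons() is a hypothetical helper that re-creates the two comparison patterns with a counting wrapper rather than calling the functions above.

```r
# Hypothetical helper (not part of the answer): counts identical() calls made
# by the Reduce pattern and by the double-loop pattern on n identical elements.
count_comparisons = function(n) {
  lst = rep(list(1:3), n)
  calls = 0
  # wrapper that increments the counter in count_comparisons()'s frame
  cmp = function(x, y) {
    calls <<- calls + 1
    identical(x, y)
  }

  # Reduce() with init = lst[[1]] calls its function once per element
  calls = 0
  Reduce(function(x, y) { cmp(x, y); y }, lst, lst[[1]])
  reduce_calls = calls

  # double loop: one comparison per pair with i < j
  calls = 0
  for (i in seq_along(lst)) {
    for (j in seq_along(lst)) {
      if (i < j) cmp(lst[[i]], lst[[j]])
    }
  }
  loop_calls = calls

  c(reduce = reduce_calls, double_loop = loop_calls)
}

count_comparisons(6)   # reduce = 6, double_loop = 15
count_comparisons(15)  # reduce = 15, double_loop = 105
```

    In general that is n versus n(n - 1)/2 comparisons; the Reduce pattern's first call compares the first element with itself because of init = lst[[1]].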
    

    Test objects:

    l_testlist_ok = list(1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
    l_testlist_bad = list(1:3, 1:3, 1:4, 1:3, 1:3, 1:3)
    l_testlist_bad2 = list(1:3, 1:3, 1:4, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
    

    Test functionality:

    > all_elements_the_same(l_testlist_ok)
    [1] TRUE
    > all_elements_the_same(l_testlist_bad)
    [1] FALSE
    > all_elements_the_same(l_testlist_bad2)
    [1] FALSE
    > all_elements_the_same2(l_testlist_ok)
    [1] TRUE
    > all_elements_the_same2(l_testlist_bad)
    [1] FALSE
    > all_elements_the_same2(l_testlist_bad2)
    [1] FALSE
    

    Test time use:

    > library(microbenchmark)
    > microbenchmark(all_elements_the_same(l_testlist_ok),
    + all_elements_the_same(l_testlist_bad),
    + all_elements_the_same(l_testlist_bad2),
    + all_elements_the_same2(l_testlist_ok),
    + all_elements_the_same2(l_testlist_bad),
    + all_elements_the_same2(l_testlist_bad2), times = 1e4)
    Unit: microseconds
                                        expr    min      lq       mean  median      uq      max neval
        all_elements_the_same(l_testlist_ok) 19.310  25.454  28.309016  26.917  28.380 1003.228 10000
       all_elements_the_same(l_testlist_bad) 93.624 100.938 108.890823 103.863 106.497 3130.807 10000
      all_elements_the_same(l_testlist_bad2) 93.331 100.938 107.963741 103.863 106.497 1181.404 10000
       all_elements_the_same2(l_testlist_ok) 48.275  53.541  57.334095  55.881  57.930  926.866 10000
      all_elements_the_same2(l_testlist_bad)  6.144   7.315   8.437603   7.900   8.778  998.839 10000
     all_elements_the_same2(l_testlist_bad2)  6.144   7.315   8.564780   8.192   8.778 1323.594 10000
    

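    So apparently, the try()/stop() error handling is what slows the Reduce variant down when a mismatch is found (median about 104 µs versus about 8 µs for the double loop). When all elements are identical, the Reduce variant is faster (about 27 µs versus 56 µs), because it makes one comparison per element while the double loop compares every pair. It may therefore still save time to use the Reduce variant with long lists or very large objects, but for small inputs the double for loop seems the way to go.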
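    If the slowdown comes from try() and stop(), one way to keep the Reduce() idea without them is to carry a logical flag instead of raising an error. This is a sketch of a different technique, not the answerer's code; all_elements_the_same3() is a hypothetical name, and it has not been benchmarked here.

```r
# Hypothetical variant (not from the answer above): Reduce() without try()/stop().
all_elements_the_same3 = function(lst) {
  if (length(lst) == 0) return(TRUE)  # treat an empty list as "all the same"
  first = lst[[1]]
  # carry a single logical flag through Reduce(), comparing each remaining
  # element with the first one; once the flag is FALSE, && short-circuits,
  # so identical() is not called for the remaining elements
  Reduce(function(ok, y) ok && identical(first, y), lst[-1], TRUE)
}

l_testlist_ok = list(1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
l_testlist_bad = list(1:3, 1:3, 1:4, 1:3, 1:3, 1:3)
all_elements_the_same3(l_testlist_ok)   # TRUE
all_elements_the_same3(l_testlist_bad)  # FALSE
```

    Unlike the try() version, Reduce() still walks the whole list after a mismatch, but each remaining step is only a cheap `FALSE && ...`.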
