Fastest way to sort each row of a large matrix in R

北恋 2020-12-10 15:50

I have a large matrix:

set.seed(1)
a <- matrix(runif(9e+07),ncol=300)

I want to sort each row in the matrix:

> system         


        
3 Answers
  •  暖寄归人
    2020-12-10 16:32

    The package grr contains an alternative sort method, grr::sort2, that can be used to speed up this particular operation. (I have reduced the matrix size from 9e+07 to 9e+06 elements so that the benchmark doesn't take forever.)

    > set.seed(1)
    > a <- matrix(runif(9e+06),ncol=300)
    > microbenchmark::microbenchmark(sorted <- t(apply(a,1,sort))
    +                                ,sorted2 <- t(apply(a,1,sort.int, method='quick'))
    +                                ,sorted3 <- t(apply(a,1,grr::sort2)),times=3,unit='s')
    Unit: seconds
                                                      expr       min       lq     mean   median       uq      max neval
                            sorted <- t(apply(a, 1, sort)) 1.7699799 1.865829 1.961853 1.961678 2.057790 2.153902     3
     sorted2 <- t(apply(a, 1, sort.int, method = "quick")) 1.6162934 1.619922 1.694914 1.623551 1.734224 1.844898     3
                     sorted3 <- t(apply(a, 1, grr::sort2)) 0.9316073 1.003978 1.050569 1.076348 1.110049 1.143750     3
    

    The difference becomes dramatic when the matrix contains characters:

    > set.seed(1)
    > a <- matrix(sample(letters,size = 9e6,replace = TRUE),ncol=300)
    > microbenchmark::microbenchmark(sorted <- t(apply(a,1,sort))
    +                                ,sorted2 <- t(apply(a,1,sort.int, method='quick'))
    +                                ,sorted3 <- t(apply(a,1,grr::sort2)),times=3)
    Unit: seconds
                                                      expr       min        lq      mean    median        uq      max neval
                            sorted <- t(apply(a, 1, sort)) 15.436045 15.479742 15.552009 15.523440 15.609991 15.69654     3
     sorted2 <- t(apply(a, 1, sort.int, method = "quick")) 15.099618 15.340577 15.447823 15.581536 15.621925 15.66231     3
                     sorted3 <- t(apply(a, 1, grr::sort2))  1.728663  1.733756  1.780737  1.738848  1.806774  1.87470     3
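
The near-identical timings of sort and sort.int(method='quick') on characters are consistent with the base R documentation, which says method "quick" is only available for numeric input and that "shell" is used silently otherwise. As a hedged aside that is not part of the original answer, base R also offers method = "radix", which accepts character vectors and always sorts in C-locale order. The sketch below only shows that it runs and produces row-sorted output on a small matrix (the size is an assumption); it has not been benchmarked here, and its ordering can differ from sort() for mixed-case or non-ASCII strings in other locales.

```r
# Small character matrix; the size is chosen for a quick check, not for benchmarking
set.seed(1)
a <- matrix(sample(letters, size = 3000, replace = TRUE), ncol = 300)

# Sort each row with base R's radix method (C-locale ordering for characters)
sorted_radix <- t(apply(a, 1, sort.int, method = "radix"))

# Each row should now be in non-decreasing alphabetical order
rows_ordered <- apply(sorted_radix, 1, function(r) all(diff(match(r, letters)) >= 0))
all(rows_ordered)
```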
    

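    The results should be identical for all three. One caveat: identical() compares only its first two arguments (a third positional argument is taken as num.eq), so the call below actually checks only sorted against sorted2; checking sorted3 needs a second call such as identical(sorted, sorted3).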

    > identical(sorted,sorted2,sorted3)
    [1] TRUE
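
Because identical() takes exactly two objects, a check across several results has to be done pairwise. The following is a minimal sketch under stated assumptions: the matrix is much smaller than in the benchmark, and the grr result is included only if that package happens to be installed, so the snippet runs with base R alone.

```r
# Small numeric matrix so the check runs instantly (size is an assumption)
set.seed(1)
a <- matrix(runif(3000), ncol = 300)

sorted  <- t(apply(a, 1, sort))
sorted2 <- t(apply(a, 1, sort.int, method = "quick"))
results <- list(sorted = sorted, sorted2 = sorted2)

# grr is optional: add its result only when the package is available
if (requireNamespace("grr", quietly = TRUE)) {
  results$sorted3 <- t(apply(a, 1, grr::sort2))
}

# Compare every other result against the first one, one pair at a time
all_same <- all(vapply(results[-1], identical, logical(1), y = results[[1]]))
all_same
```

If all_same is FALSE, all.equal() on the offending pair can help show whether the difference lies in values or only in attributes.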
    
