Fastest way to sort each row of a large matrix in R

北恋 2020-12-10 15:50

I have a large matrix:

set.seed(1)
a <- matrix(runif(9e+07),ncol=300)

I want to sort each row in the matrix:

> system.time(sorted <- t(apply(a,1,sort)))

3 Answers
  •  [愿得一人]
    2020-12-10 16:18

    Well, I'm not aware of many ways to sort faster in R. The problem is that each sort covers only 300 values, but you run it 300,000 times, so the per-call overhead adds up. Still, you can eke some extra performance out of sort by calling sort.int directly (which skips the S3 dispatch) and using method='quick':

    set.seed(1)
    a <- matrix(runif(9e+07),ncol=300)
    
    # Your original code
    system.time(sorted <- t(apply(a,1,sort))) # 31 secs
    
    # sort.int with method='quick'
    system.time(sorted2 <- t(apply(a,1,sort.int, method='quick'))) # 27 secs
    
    # using a for-loop is slightly faster than apply (and avoids transpose):
    system.time({sorted3 <- a; for(i in seq_len(nrow(a))) sorted3[i,] <- sort.int(a[i,], method='quick') }) # 26 secs
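
    Before comparing timings, it is worth confirming that the three variants return the same result. A minimal sketch on a small stand-in matrix (the name `small` and its size are my own choice, not from the question):

    ```r
    # Small stand-in matrix: 100 rows x 300 columns (assumed size, for a quick check)
    set.seed(1)
    small <- matrix(runif(3e+04), ncol = 300)

    # Variant 1: apply + sort, transposed back to row layout
    s1 <- t(apply(small, 1, sort))

    # Variant 2: apply + sort.int with method = "quick"
    s2 <- t(apply(small, 1, sort.int, method = "quick"))

    # Variant 3: for-loop that writes each sorted row in place
    s3 <- small
    for (i in seq_len(nrow(small))) s3[i, ] <- sort.int(small[i, ], method = "quick")

    # All three should agree exactly; stopifnot() raises an error otherwise
    stopifnot(identical(s1, s2), identical(s1, s3))
    ```

    Note that method='quick' is not a stable sort, but for plain numeric values without names that makes no difference to the output.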
    

    In principle, a better approach would be to use the parallel package to sort parts of the matrix in parallel. In practice, the overhead of transferring the data to the workers seems to be too large, and my machine starts swapping since I "only" have 8 GB of memory:

    library(parallel)
    cl <- makeCluster(4)
    system.time(sorted4 <- t(parApply(cl,a,1,sort.int, method='quick'))) # Forever...
    stopCluster(cl)
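
    One way to reduce the transfer cost might be forked workers, which on Unix-alikes share the parent's memory instead of receiving a serialized copy of the matrix. The following is only a sketch under assumptions: it uses a much smaller matrix so it runs quickly, the helper `sort_rows` and the worker count are hypothetical choices of mine, and I have not timed it on the full 9e+07 matrix. The sorted blocks still have to be sent back to the parent, so memory pressure does not go away.

    ```r
    library(parallel)

    # Small stand-in matrix (assumed size; the question uses 9e+07 values)
    set.seed(1)
    a <- matrix(runif(3e+05), ncol = 300)

    # Forking is unavailable on Windows, so fall back to a single core there
    n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

    # One contiguous block of row indices per worker
    chunks <- splitIndices(nrow(a), n_cores)

    # Hypothetical helper: sort every row of the block selected by idx
    sort_rows <- function(idx) {
      block <- a[idx, , drop = FALSE]
      for (i in seq_len(nrow(block))) {
        block[i, ] <- sort.int(block[i, ], method = "quick")
      }
      block
    }

    # mclapply returns the blocks in chunk order, so rbind restores the row order
    sorted5 <- do.call(rbind, mclapply(chunks, sort_rows, mc.cores = n_cores))
    ```

    Working on row blocks also avoids the transpose that apply requires.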
    
