Fast alternative to split in R

野的像风 2020-12-20 12:40

I'm partitioning a data frame with split() so that I can use parLapply() to call a function on each partition in parallel. The data frame has 1.3 m
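A minimal sketch of one common pattern for this setup, assuming a hypothetical data frame `df` with a grouping column `g` and a hypothetical per-partition function `per_group`. Only the row indices are split (a plain integer vector), and each call subsets the data itself. In base R, `split.data.frame` already works this way internally (it splits `seq_len(nrow(x))` and then subsets the rows of each group), so this alone is not faster. What changes is that the per-group subsetting can happen inside the parallel calls instead of all up front.

```r
# Hypothetical example data: 1,000 rows, grouping column `g` with up to 100 groups.
set.seed(42)
df <- data.frame(g = sample(100, 1000, TRUE), v = runif(1000))

# Partition the row indices rather than the data frame itself.
# seq_len() gives a plain integer vector with no class attribute.
idx <- split(seq_len(nrow(df)), df$g)

# Hypothetical per-partition function: subset its rows, then summarise them.
per_group <- function(i, data) {
  part <- data[i, , drop = FALSE]
  mean(part$v)
}

# Sequential run. With a cluster `cl` from parallel::makeCluster(), the call
# parallel::parLapply(cl, idx, per_group, data = df) has the same shape.
res <- lapply(idx, per_group, data = df)
```

In the parLapply() form, `data = df` is sent to the workers along with the tasks, so each worker ends up with its own copy of `df`.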

2 Answers
  •  猫巷女王i
    2020-12-20 13:40

    split(x, f) is slow if x is a factor AND f contains a lot of distinct values (i.e. many groups).
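    Why the factor case is slow: in base R, split.default() only takes its fast internal path when x has no class attribute. For a classed x such as a factor, it splits seq_along(x) and then calls x[i] once per group through lapply(), so the cost grows with the number of groups. A minimal sketch of two workarounds, assuming you can work with the integer codes or the character labels instead of factor objects (toy sizes here, not the 1.3 million elements timed below):

    ```r
    # Toy data: a factor x and a grouping vector f with up to 2,000 groups.
    set.seed(1)
    x <- factor(sample(letters, 10000, TRUE))
    f <- sample(2000, 10000, TRUE)

    # FALSE: a factor carries class "factor", so split(x, f) takes the slow path.
    is.null(attr(x, "class"))

    # Workaround 1: split the integer codes, which carry no class attribute.
    by_code <- split(as.integer(x), f)

    # Workaround 2: split a character copy if the labels themselves are needed.
    by_label <- split(as.character(x), f)

    # Both give the same groups as split(x, f), just without the factor wrapper.
    slow <- split(x, f)
    stopifnot(identical(lapply(slow, as.integer), by_code))
    stopifnot(identical(lapply(slow, as.character), by_label))
    ```

    The catch is that you get plain integer or character vectors back; rebuilding a factor for every group brings back one R-level call per group.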

    So this code is fast:

    system.time(split(seq_len(1300000), sample(250000, 1300000, TRUE)))
    

    But this is very slow:

    system.time(split(factor(seq_len(1300000)), sample(250000, 1300000, TRUE)))
    

    And this is fast again, because there are only 25 groups:

    system.time(split(factor(seq_len(1300000)), sample(25, 1300000, TRUE)))
    
