Fast alternative to split in R

Asked by 野的像风, 2020-12-20 12:40

I'm partitioning a data frame with `split()` in order to use `parLapply()` to call a function on each partition in parallel. The data frame has 1.3 m…

2 Answers

猫巷女王i (OP)

2020-12-20 13:40
`split(x, f)` is slow if `x` is a factor AND `f` contains many distinct values (i.e. there are many groups).

So this code is fast:
```
system.time(split(seq_len(1300000), sample(250000, 1300000, TRUE)))
```
But this is very slow:
```
system.time(split(factor(seq_len(1300000)), sample(250000, 1300000, TRUE)))
```
And this is fast again, because there are only 25 groups:
```
system.time(split(factor(seq_len(1300000)), sample(25, 1300000, TRUE)))
```
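A likely reason, based on my reading of base R's `split.default()` source (worth checking against your R version): when `x` has no class attribute, the split is done directly by internal C code; when `x` is classed, as a factor is, `split()` instead splits `seq_along(x)` and then calls `x[i]` once per group, so 250,000 groups means 250,000 dispatches to the `[.factor` method. A minimal sketch of a workaround under that assumption is to split a plain character vector (or the integer codes) instead of the factor. The variable names below are illustrative only.

```r
# Sketch, assuming split.default() only takes its fast internal path when x
# has no class attribute (a factor has class "factor", so it does not).
set.seed(1)
n <- 1300000
x <- factor(seq_len(n))
g <- sample(250000, n, TRUE)

# Workaround 1: split the labels as a plain (unclassed) character vector.
system.time(parts_chr <- split(as.character(x), g))

# Workaround 2: split the integer codes; the labels of any group can be
# recovered later with levels(x)[codes].
lev <- levels(x)
system.time(parts_int <- split(as.integer(x), g))
```

The result is a list of character (or integer) vectors rather than factors. If the downstream function really needs factors, converting only the partitions it actually uses is probably cheaper than making `split()` rebuild 250,000 factors up front.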