Fast alternative to split in R
I'm partitioning a data frame with split() so that I can use parLapply() to call a function on each partition in parallel. The data frame has 1.3 million rows and 20 columns. I'm splitting by two columns, both of character type. There appear to be ~47K unique IDs and ~12K unique codes, but not every combination of ID and code occurs in the data. The resulting number of partitions is ~250K.

Here is the split() line:

    system.time(pop_part <- split(pop, list(pop$ID, pop$code)))

The partitions will then be fed into parLapply() as follows:

    cl <- makeCluster(detectCores())
    system.time(par_pop <-