Using plyr, doMC, and summarise() with very big dataset?
Question

I have a fairly large dataset (~1.4 million rows) that I'm splitting and summarizing. The whole thing takes a while to run, and my final application depends on running it frequently, so my thought was to use doMC and the .parallel=TRUE flag with plyr like so (simplified a bit):

    library(plyr)
    library(doMC)
    registerDoMC()
    df <- ddply(df, c("cat1", "cat2"), summarize, count = length(cat2), .parallel = TRUE)

If I set the number of cores explicitly to two (using registerDoMC(cores=2)) my 8 GB of