I want to split a data frame into several smaller ones. This looks like a very trivial question, however I cannot find a solution from web search.
Splitting the data frame seems counter-productive. Instead, use the split-apply-combine paradigm, e.g., generate some data
df = data.frame(grp=sample(letters, 100, TRUE), x=rnorm(100))
then split only the relevant columns and apply the scale()
function to x in each group, and combine the results (using split<-
or ave
)
df$z = 0
split(df$z, df$grp) = lapply(split(df$x, df$grp), scale)
## alternative: df$z = ave(df$x, df$grp, FUN=scale)
This will be very fast compared to splitting data.frames, and the result remains usable in downstream analysis without iteration. I think the dplyr syntax is
library(dplyr)
df %>% group_by(grp) %>% mutate(z=scale(x))
In general this dplyr solution is faster than splitting data frames but not as fast as split-apply-combine.