I want to split a large dataframe into a list of dataframes according to the values in two columns. I then want to apply a common data transformation on all dataframes (lag
how about this one:
library(plyr)
ddply(df, .(category1, category2), summarize, value1 = lag(value1), value2=lag(value2))
seems like an excelent job for plyr package and ddply() function. If there are still open questions please provide some sample data. Splitting should work on several columns as well:
df<- data.frame(value=rnorm(100), class1=factor(rep(c('a','b'), each=50)), class2=factor(rep(c('1','2'), 50)))
g <- c(factor(df$class1), factor(df$class2))
split(df$value, g)