SparkR: dplyr-style split-apply-combine on DataFrame
Question

Under the previous RDD paradigm, I could specify a key and then map an operation over the RDD elements corresponding to each key. As of SparkR 1.5.1, I don't see a clear way to do this with a DataFrame. What I would like to do is something like a dplyr operation:

    new.df <- old.df %>% group_by("column1") %>% do(myfunc(.))

I currently have a large SparkR DataFrame of the form:

    timestamp               value  id
    2015-09-01 05:00:00.0   1.132  24
    2015-09-01 05:10:00.0   null   24
    2015-09-01 05:20:00.0   1.129  24
    2015-09-01 05
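For concreteness, here is a minimal, self-contained plain-R (non-Spark) version of the pattern I'm after, built on toy data shaped like the table above. The body of myfunc is just a hypothetical stand-in (it fills each group's missing values with the group mean); the point is that dplyr's do() hands each group to an arbitrary R function as a data frame and row-binds the results:

    library(dplyr)

    # Toy stand-in for the real data (same shape as the table above).
    old.df <- data.frame(
      timestamp = as.POSIXct("2015-09-01 05:00:00") + c(0, 600, 1200),
      value     = c(1.132, NA, 1.129),
      id        = 24
    )

    # Hypothetical per-group function: fill NAs with the group mean.
    myfunc <- function(df) {
      df$value[is.na(df$value)] <- mean(df$value, na.rm = TRUE)
      df
    }

    # Split by id, apply myfunc to each group, combine the results.
    new.df <- old.df %>% group_by(id) %>% do(myfunc(.))

(Note that dplyr's group_by takes an unquoted column name, unlike SparkR's groupBy, which accepts column-name strings.)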