I have a large dataset that chokes split() in R. I am able to use dplyr group_by (which is a preferred way anyway) but I am unable to persist the r
Since dplyr 0.5.0.9000, the shortest solution that uses group_by() is probably to follow do() with a pull():
df %>% group_by(V1) %>% do(data=(.)) %>% pull(data)
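Note that, unlike split(), this doesn't name the resulting list elements. If names are desired, then you would probably want something like the following (set_names() is not exported by dplyr; it comes from purrr or magrittr, so one of those needs to be attached):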
df %>% group_by(V1) %>% do(data = (.)) %>% with( set_names(data, V1) )
To editorialize a little, I agree with the folks saying that split() is the better option. Personally, I always found it annoying that I have to type the name of the data frame twice (e.g., split( potentiallylongname, potentiallylongname$V1 )), but the issue is easily sidestepped with the pipe:
df %>% split( .$V1 )
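To compare the three approaches side by side, here is a minimal sketch on a toy data frame. The data and the variable names (pieces, named, by_split) are made up for illustration, and base setNames() is swapped in for set_names() so that only dplyr needs to be loaded.

```r
library(dplyr)

# Hypothetical toy data; V1 is the grouping column, as in the snippets above.
df <- data.frame(V1 = c("a", "b", "a", "c"),
                 V2 = 1:4,
                 stringsAsFactors = FALSE)

# group_by() + do() + pull(): a list with one data frame per group.
pieces <- df %>% group_by(V1) %>% do(data = (.)) %>% pull(data)

# Same pipeline, but naming the elements by group value.
# Base setNames() is used here in place of set_names().
named <- df %>% group_by(V1) %>% do(data = (.)) %>% with(setNames(data, V1))

# split() through the pipe, for comparison.
by_split <- df %>% split(.$V1)

# Inspect the results.
length(pieces)
names(named)
names(by_split)
```

If the names of named and by_split agree on your real data, either list can be indexed by group value, e.g. named[["a"]].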