I have a data frame with a dot-separated character column:
```
> set.seed(310366)
> tst = data.frame(x=1:10,y=paste(sample(c("FOO","BAR","BAZ"),10,TRUE),".",sample(c("foo","bar","baz"),10,TRUE),sep=""))
> tst
    x       y
1   1 BAR.baz
2   2 FOO.foo
3   3 BAZ.baz
4   4 BAZ.foo
5   5 BAZ.bar
6   6 FOO.baz
7   7 BAR.bar
8   8 BAZ.baz
```
and I want to split that column into two new columns containing the parts on either side of the dot. `str_split_fixed` from package `stringr` can do the job quite nicely. All my values are definitely two parts separated by a dot, so I can do:
```
> require(stringr)
> str_split_fixed(tst$y,"\\.",2)
      [,1]  [,2]
 [1,] "BAR" "baz"
 [2,] "FOO" "foo"
 [3,] "BAZ" "baz"
 [4,] "BAZ" "foo"
 [5,] "BAZ" "bar"
 [6,] "FOO" "baz"
 [7,] "BAR" "bar"
```
Now I could just `cbind` that to my data frame, but I thought I'd figure out how to do that in a `dplyr` pipeline. First I thought `mutate` could do it in one:
```
> tst %.% mutate(parts=str_split_fixed(y,"\\.",2))
Error: wrong result size (20), expected 10 or 1
```
I can get `mutate` to do it in two:
```
> tst %.% mutate(part1=str_split_fixed(y,"\\.",2)[,1], part2=str_split_fixed(y,"\\.",2)[,2])
    x       y part1 part2
1   1 BAR.baz   BAR   baz
2   2 FOO.foo   FOO   foo
3   3 BAZ.baz   BAZ   baz
4   4 BAZ.foo   BAZ   foo
5   5 BAZ.bar   BAZ   bar
6   6 FOO.baz   FOO   baz
```
but that's running the string split twice.
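For reference, the plain `cbind` version I mentioned above splits only once. A minimal sketch on a three-row stand-in for `tst` (the names `parts`, `tst2`, `part1` and `part2` are just my placeholders):

```r
library(stringr)

# Small stand-in for the tst data frame above (values copied from its first rows)
tst <- data.frame(x = 1:3, y = c("BAR.baz", "FOO.foo", "BAZ.baz"))

# Split once; the result is a character matrix with one row per input string
parts <- str_split_fixed(tst$y, "\\.", 2)

# Bind each matrix column on as its own, explicitly named, column
tst2 <- cbind(tst, part1 = parts[, 1], part2 = parts[, 2])
```

That gives the columns I want, but it steps outside the pipeline, which is what I was hoping to avoid.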
"Best" I can do so far in a `dplyr` way is this (which I only discovered while writing this question...):
```
> tst %.% do(cbind(.,data.frame(parts=str_split_fixed(.$y,"\\.",2))))
    x       y parts.1 parts.2
1   1 BAR.baz     BAR     baz
2   2 FOO.foo     FOO     foo
3   3 BAZ.baz     BAZ     baz
4   4 BAZ.foo     BAZ     foo
5   5 BAZ.bar     BAZ     bar
```
which isn't bad, but loses a lot of the readability of piped things in R. Is there a simple approach using `mutate` that I've missed?