Adding multiple columns in a dplyr mutate call

Anonymous (unverified), submitted 2019-12-03 01:17:01

Question:

I have a data frame with a dot-separated character column:

> set.seed(310366)
> tst = data.frame(x=1:10,y=paste(sample(c("FOO","BAR","BAZ"),10,TRUE),".",sample(c("foo","bar","baz"),10,TRUE),sep=""))
> tst
    x       y
1   1 BAR.baz
2   2 FOO.foo
3   3 BAZ.baz
4   4 BAZ.foo
5   5 BAZ.bar
6   6 FOO.baz
7   7 BAR.bar
8   8 BAZ.baz

and I want to split that column into two new columns containing the parts on either side of the dot. str_split_fixed from package stringr can do the job quite nicely. All my values are definitely two parts separated by a dot so I can do:

> require(stringr)
> str_split_fixed(tst$y,"\\.",2)
      [,1]  [,2]
 [1,] "BAR" "baz"
 [2,] "FOO" "foo"
 [3,] "BAZ" "baz"
 [4,] "BAZ" "foo"
 [5,] "BAZ" "bar"
 [6,] "FOO" "baz"
 [7,] "BAR" "bar"

Now I could just cbind that to my data frame but I thought I'd figure out how to do that in a dplyr pipeline. First I thought mutate could do it in one:

> tst %.% mutate(parts=str_split_fixed(y,"\\.",2))
Error: wrong result size (20), expected 10 or 1

I can get mutate to do it in two:

> tst %.% mutate(part1=str_split_fixed(y,"\\.",2)[,1], part2=str_split_fixed(y,"\\.",2)[,2])
    x       y part1 part2
1   1 BAR.baz   BAR   baz
2   2 FOO.foo   FOO   foo
3   3 BAZ.baz   BAZ   baz
4   4 BAZ.foo   BAZ   foo
5   5 BAZ.bar   BAZ   bar
6   6 FOO.baz   FOO   baz

but that's running the string split twice.

"Best" I can do so far in a dplyr way is this (which I only discovered while writing this question...):

> tst %.% do(cbind(.,data.frame(parts=str_split_fixed(.$y,"\\.",2))))
    x       y parts.1 parts.2
1   1 BAR.baz     BAR     baz
2   2 FOO.foo     FOO     foo
3   3 BAZ.baz     BAZ     baz
4   4 BAZ.foo     BAZ     foo
5   5 BAZ.bar     BAZ     bar

which isn't bad, but loses a lot of the readability of piped things in R. Is there a simple approach using mutate that I've missed?

Answer 1:

You can use separate() from tidyr in combination with dplyr:

library(dplyr)
library(tidyr)
tst %>% separate(y, c("y1", "y2"), sep = "\\.", remove=FALSE)
    x       y  y1  y2
1   1 BAR.baz BAR baz
2   2 FOO.foo FOO foo
3   3 BAZ.baz BAZ baz
4   4 BAZ.foo BAZ foo
5   5 BAZ.bar BAZ bar
6   6 FOO.baz FOO baz
7   7 BAR.bar BAR bar
8   8 BAZ.baz BAZ baz
9   9 FOO.bar FOO bar
10 10
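If you would rather stay inside mutate, as the question asks, here is a minimal sketch. It assumes dplyr 1.0.0 or later, where an unnamed expression in mutate() that returns a data frame has its columns spliced into the result. The helper split_parts and the column names part1 and part2 are made up for illustration, and the three-row tst is a hand-written stand-in for the question's data, since sample() output for a given seed differs across R versions. It uses %>% because the %.% operator from the question was deprecated in later dplyr releases.

```r
library(dplyr)
library(stringr)

# Hand-written stand-in for the question's data frame (assumption: the
# real data has the same shape, one "LEFT.right" string per row).
tst <- data.frame(
  x = 1:3,
  y = c("BAR.baz", "FOO.foo", "BAZ.bar"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string once on the literal dot and
# return the two pieces as a two-column data frame.
split_parts <- function(v) {
  parts <- str_split_fixed(v, "\\.", 2)
  data.frame(part1 = parts[, 1], part2 = parts[, 2],
             stringsAsFactors = FALSE)
}

# The call is left unnamed inside mutate(), so dplyr (>= 1.0.0) adds
# part1 and part2 as separate columns instead of one nested column.
res <- tst %>% mutate(split_parts(y))
print(res)
```

The split runs only once per call, which avoids the duplicated str_split_fixed() of the two-step mutate in the question. Naming the argument (for example parts = split_parts(y)) would instead store the whole data frame as a single nested column, so leave it unnamed if you want flat columns.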