Question
I would like to split the text in a data frame column and store the pieces in a new data frame, together with the row number or an id column.
I used to do this with plyr, but the same approach no longer works in dplyr.
If I understand correctly, my plyr code only works because of a bug in plyr.
So I am looking for the correct way to do this.
Here is a minimal example in plyr:
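library(plyr)
# Note: sample() changed its default algorithm in R 3.6.0, so the letters
# drawn here depend on the R version; the output below is from an older R.
set.seed(1)
df <- data.frame(a = seq(2),
                 b = c(paste(sample(letters, 3), collapse = ';'),
                       paste(sample(letters, 3), collapse = ';')),
                 stringsAsFactors = FALSE)
ddply(df, .(a), summarise, unlist(strsplit(b, ';')))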
It turns the original data frame:
  a     b
1 1 g;j;n
2 2 x;f;v
Into this:
  a ..1
1 1   g
2 1   j
3 1   n
4 2   x
5 2   f
6 2   v
What would be the correct dplyr solution?
Answer 1:
I'm biased in favor of cSplit from the "splitstackshape" package, but you might be interested in unnest from "tidyr" in conjunction with "dplyr":
library(dplyr)
library(tidyr)
df %>%
  mutate(b = strsplit(b, ";")) %>%
  unnest(b)
#   a b
# 1 1 g
# 2 1 j
# 3 1 n
# 4 2 x
# 5 2 f
# 6 2 v
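As a related sketch, tidyr also has separate_rows, which does the split and the unnesting in one call. The data frame below is written out literally (an assumption made so the example does not depend on the random draw in the question); this is an illustration, not necessarily the approach the answerer had in mind.

```r
library(tidyr)

# Same shape as the question's data, typed in literally so the result
# does not depend on the random number generator.
df <- data.frame(a = 1:2,
                 b = c("g;j;n", "x;f;v"),
                 stringsAsFactors = FALSE)

# separate_rows() splits column b on ";" and repeats the value of a
# once for every piece, giving one row per piece.
long <- separate_rows(df, b, sep = ";")
long
```

The id column a is carried along automatically, so no grouping step is needed.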
Answer 2:
You could do this using cSplit from splitstackshape:
library(splitstackshape)
cSplit(df, 'b', ';', 'long')
#   a b
#1: 1 g
#2: 1 j
#3: 1 n
#4: 2 x
#5: 2 f
#6: 2 v
Or using dplyr/tidyr:
library(dplyr)
library(tidyr)
# This variant assumes every value of b splits into exactly three pieces.
separate(df, b, c('b1', 'b2', 'b3'), sep = ";") %>%
  gather(Var, b, -a) %>%
  select(-Var) %>%
  arrange(a)
Or another option would be to use do:
df %>%
  group_by(a) %>%
  do(data.frame(b = unlist(strsplit(.$b, ';'))))
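For comparison, the same reshaping can be sketched in base R without any packages. The data frame is again typed in literally (an assumption, so the result is reproducible), and the names parts and long are only illustrative.

```r
# Same shape as the question's data, typed in literally.
df <- data.frame(a = 1:2,
                 b = c("g;j;n", "x;f;v"),
                 stringsAsFactors = FALSE)

# Split each string into its pieces; the result is a list with one
# character vector per row of df.
parts <- strsplit(df$b, ";", fixed = TRUE)

# Repeat each id as many times as its row has pieces, then flatten
# the list of pieces into a single column.
long <- data.frame(a = rep(df$a, lengths(parts)),
                   b = unlist(parts),
                   stringsAsFactors = FALSE)
long
```

Because rep() uses the actual number of pieces per row, this also works when rows split into different numbers of pieces.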
Source: https://stackoverflow.com/questions/28937369/splitting-text-in-column-and-add-row-number