Regular expression for the separate function of tidyr

Question

I need to separate one column into two columns with tidyr.

The column has text like "I am Sam". The text always contains exactly two spaces, and it can contain any other symbols: letters [a-z], digits [0-9], and characters such as !, \, º, ª, etc.

I need to split it into two columns: column one "I am" and column two "Sam".

I can't find a regular expression that splits at the second space.

Could you help me please?

Answer 1:

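We can use extract from tidyr. We match any sequence of characters and place it in a capture group ((.*)), followed by one or more whitespace characters (\\s+) and another capture group that contains only non-whitespace characters (\\S+), which splits the original column into two columns. Because (.*) is greedy, it backtracks only as far as it must, so the split happens at the last run of whitespace.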

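library(tidyr)
# Split Col1 at its last whitespace run: everything before it goes to Col1, the final word to Col2
extract(df1, Col1, into = c("Col1", "Col2"), regex = "(.*)\\s+(\\S+)")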
#   Col1 Col2
#1  I am  Sam
#2 He is  Sam

data

df1 <- data.frame(Col1 = c("I am Sam", "He is Sam"), stringsAsFactors=FALSE)
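To see why the greedy group matters, here is a small base-R sketch (no tidyr needed) that applies the same pattern with regexec() and regmatches(), which, like extract, return the capture groups of the first match. The second input string is an illustrative assumption, not from the question. Swapping in a lazy (.*?) makes the first match stop at the first space instead, which would split at the wrong place here.

```r
# Illustrative inputs (assumed); the second one contains a symbol
x <- c("I am Sam", "He is Sam!")

# Greedy (.*) takes as much as possible, then backtracks to the LAST
# whitespace run, so group 2 ends up being the final word.
greedy <- regmatches(x, regexec("(.*)\\s+(\\S+)", x, perl = TRUE))

# Each list element is c(full_match, group1, group2)
greedy_df <- data.frame(
  Col1 = vapply(greedy, `[`, character(1), 2),
  Col2 = vapply(greedy, `[`, character(1), 3),
  stringsAsFactors = FALSE
)
print(greedy_df)

# Lazy (.*?) stops at the FIRST whitespace run instead
lazy <- regmatches(x, regexec("(.*?)\\s+(\\S+)", x, perl = TRUE))
lazy_first <- vapply(lazy, `[`, character(1), 2)
print(lazy_first)
```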

Answer 2:

As an alternative, given:

library(tidyr)
df <- data.frame(txt = "I am Sam")

you can use

separate(df, txt, c("a", "b"), sep = "(?<=\\s\\S{1,100})\\s")
#      a   b
# 1 I am Sam

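with versions of separate that split via stringi::stri_split_regex (ICU engine; ICU only allows lookbehinds of bounded length, hence \\S{1,100} rather than \\S+), or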

separate(df, txt, c("a", "b"), sep="^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)

with older (?) versions of separate that used base::strsplit (PCRE engine via perl = TRUE). See also

strsplit("I am Sam", "^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)
# [[1]]
# [1] "I am" "Sam"
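To make the (*SKIP)(*FAIL) trick less opaque, this base-R sketch lists where the pattern actually matches, using gregexpr() with perl = TRUE, and compares it with a plain \\s. The variable names are illustrative only.

```r
s <- "I am Sam"

# ^.*?\\s lazily consumes the first word plus the first space; (*FAIL)
# forces that branch to fail, and (*SKIP) makes PCRE resume searching
# after the consumed text, so the first space is never reported.
# The |\\s alternative then matches only the remaining spaces.
pat <- "^.*?\\s(*SKIP)(*FAIL)|\\s"

# Start positions (1-based) of every match; as.integer() drops attributes
skip_pos  <- as.integer(gregexpr(pat, s, perl = TRUE)[[1]])
plain_pos <- as.integer(gregexpr("\\s", s, perl = TRUE)[[1]])

print(skip_pos)   # only the second space
print(plain_pos)  # both spaces
```

Because only the second space is a match, splitting on this pattern yields exactly two pieces for text with two spaces.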

But it might be a bit esoteric...
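For the lookbehind variant, a sketch calling stringi::stri_split_regex directly shows the ICU behaviour without going through separate. It assumes the stringi package is installed; which splitting function a given tidyr version uses internally is not checked here.

```r
# Assumes stringi is installed
library(stringi)

s <- "I am Sam"

# Bounded lookbehind: split on a space that is preceded by
# (a space followed by 1 to 100 non-space characters).
parts <- stri_split_regex(s, "(?<=\\s\\S{1,100})\\s")[[1]]
print(parts)

# An unbounded lookbehind such as (?<=\\s\\S+) is expected to be rejected
# by ICU, so catch the error and print its message instead of stopping.
res <- tryCatch(
  stri_split_regex(s, "(?<=\\s\\S+)\\s"),
  error = function(e) conditionMessage(e)
)
print(res)
```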

Source: https://stackoverflow.com/questions/37240306/regular-expression-on-separate-function-of-tidyr

Tags

tidyr