Regular expression on separate function of Tidyr

Posted by China☆狼群 on 2019-12-12 02:42:28

Question


I need to separate one column into two with tidyr.

The column contains text like: I am Sam. The text always has exactly two white spaces, and it can contain any other symbols: [a-z][0-9][!\ºª, etc...].

I need to split it into two columns: column one I am, and column two Sam.

I can't find a regular expression to separate at the second blank space.

Could you help me, please?


Answer 1:


We can use extract from tidyr. We match zero or more characters in a greedy capture group ((.*)), followed by one or more whitespace characters (\\s+) and a second capture group that contains only non-whitespace characters (\\S+). Because the first group is greedy, the split falls at the last run of whitespace, which separates the original column into two columns.

library(tidyr)
extract(df1, Col1, into = c("Col1", "Col2"), "(.*)\\s+(\\S+)")
#   Col1 Col2
#1  I am  Sam
#2 He is  Sam

data

df1 <- data.frame(Col1 = c("I am Sam", "He is Sam"), stringsAsFactors=FALSE)
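
To see what each capture group picks up without going through tidyr, here is a minimal base R sketch. It assumes the same two sample strings and applies the same pattern with regexec() and the Perl engine; the variable names (pattern, x, m, groups) are only illustrative.

```r
# Same pattern as in the extract() call above
pattern <- "(.*)\\s+(\\S+)"
x <- c("I am Sam", "He is Sam")

# regexec() returns the full match followed by one entry per capture group
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Drop the full match (first element) and keep the two captured groups
groups <- t(sapply(m, function(parts) parts[-1]))
colnames(groups) <- c("Col1", "Col2")
groups
```

The greedy (.*) first consumes the whole string and then backtracks until \\s+ and \\S+ can match, which is why the split lands on the last whitespace.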



Answer 2:


As an alternative, given:

library(tidyr)
df <- data.frame(txt = "I am Sam")

you can use

separate(df, txt, c("a", "b"), sep="(?<=\\s\\S{1,100})\\s")
#      a   b
# 1 I am Sam

with separate using stringi::stri_split_regex (ICU engine), or

separate(df, txt, c("a", "b"), sep="^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE) 

with the older (?) separate using base::strsplit (Perl engine). See also

strsplit("I am Sam", "^.*?\\s(*SKIP)(*FAIL)|\\s", perl=TRUE)
# [[1]]
# [1] "I am" "Sam" 
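
As a rough sketch of how the same Perl pattern could be applied to a whole column in base R, assuming every value contains exactly two whitespace characters as stated in the question (the sample vector and the names x, parts and res are illustrative):

```r
# (*SKIP)(*FAIL) discards everything up to and including the first whitespace,
# so only the second whitespace is left to act as the split point
pattern <- "^.*?\\s(*SKIP)(*FAIL)|\\s"
x <- c("I am Sam", "He is Sam")

# Split each string, then bind the pieces into a two-column matrix
parts <- strsplit(x, pattern, perl = TRUE)
res <- do.call(rbind, parts)
colnames(res) <- c("a", "b")
res
```

Strings with more than two whitespace characters would yield a different number of pieces, so this relies on the two-space assumption holding.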

But it might be a bit "esoterique"...



Source: https://stackoverflow.com/questions/37240306/regular-expression-on-separate-function-of-tidyr
