How should I split and retain elements using strsplit?

风流意气都作罢 submitted on 2019-11-29 00:59:18

Question


In R, strsplit matches a given regular expression, deletes the matches, and returns the remaining pieces of the string as a character vector:

> strsplit("abc123def", "[0-9]+")
[[1]]
[1] "abc" "def"
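For comparison, a pattern without the + quantifier matches each digit on its own, so strsplit leaves an empty string between every pair of adjacent digits. A quick check:

```r
strsplit("abc123def", "[0-9]")   # each digit is a separator: "abc" "" "" "def"
strsplit("abc123def", "[0-9]+")  # the whole run "123" is one separator: "abc" "def"
```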

How can I split the string on the same regular expression but also retain the matches? I need something like the following:

> FUNCTION("abc123def", "[0-9]+")
[[1]]
[1] "abc" "123" "def"

Using strapply("abc123def", "[0-9]+|[a-z]+") from the gsubfn package works here, but what if the parts of the string between the matches cannot be matched by a regular expression of their own?
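One way to handle the general case, sketched here in base R and not taken from the answers below, is to let gregexpr report where each match starts and how long it is, then cut the string at those boundaries. The pieces between matches never need a pattern of their own. The helper name split_keep is hypothetical, and dropping empty pieces (from a match at either end of the string, or two adjacent matches) is an assumption about the desired output.

```r
# Hypothetical helper: split each string in x on `pattern`, keeping the matches.
# Matches come from gregexpr(); the gaps between them are cut out by position,
# so the gaps themselves never have to match any regular expression.
split_keep <- function(x, pattern, perl = FALSE) {
  matches <- gregexpr(pattern, x, perl = perl)
  lapply(seq_along(x), function(i) {
    s <- x[[i]]
    m <- matches[[i]]
    if (m[1] == -1L) return(s)                  # no match: keep the string whole
    starts <- as.integer(m)
    ends <- starts + attr(m, "match.length") - 1L
    # n matches leave n + 1 gaps: before the first, between each pair, after the last
    gaps <- substring(s, c(1L, ends + 1L), c(starts - 1L, nchar(s)))
    hits <- substring(s, starts, ends)
    # interleave gap, match, gap, match, ..., gap (the trailing NA pads the matrix)
    pieces <- c(rbind(gaps, c(hits, NA)))
    pieces[!is.na(pieces) & nzchar(pieces)]     # drop padding and empty pieces
  })
}

split_keep("abc123def", "[0-9]+")
split_keep("x-y 42 z!", "[0-9]+")
```

Recent R versions also document regmatches(x, gregexpr(pattern, x), invert = NA), which returns the non-matched and matched pieces interleaved (keeping empty strings at the ends); check ?regmatches for your R version.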


Answer 1:


Fundamentally, it seems to me that what you want is not to split on [0-9]+ but to split on the transitions between [0-9]+ and everything else. Your string has no delimiter marking those transitions, so you have to insert one. You could pre-process with gsub and a back-reference, using a character such as ~ that does not otherwise occur in the string:

test <- "abc123def"
strsplit( gsub("([0-9]+)","~\\1~",test), "~" )

[[1]]
[1] "abc" "123" "def"
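The same idea can be wrapped in a small function. This is a sketch: the name split_on_marker and its sep argument are hypothetical, it assumes sep never occurs in the input, and it drops the empty string that strsplit returns when a string starts with a match (for example "~123~abc").

```r
# Hypothetical wrapper around the gsub-then-strsplit approach.
# Assumes `sep` never appears in x and contains no backslashes.
split_on_marker <- function(x, pattern, sep = "~") {
  # wrap every match in sep, e.g. "abc123def" -> "abc~123~def"
  marked <- gsub(paste0("(", pattern, ")"), paste0(sep, "\\1", sep), x)
  # split literally on sep, then drop empty pieces left at the edges
  lapply(strsplit(marked, sep, fixed = TRUE), function(p) p[nzchar(p)])
}

split_on_marker(c("abc123def", "123abc"), "[0-9]+")
```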



Answer 2:


You could use lookaround assertions. They match positions rather than characters, so the split removes nothing from the string:

> test <- "abc123def"
> strsplit(test, "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)", perl = TRUE)
[[1]]
[1] "abc" "123" "def"
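Because this pattern only describes the boundary between a digit and a non-digit, it should handle any number of runs, and the non-digit parts can contain arbitrary characters. A quick check (the second and third input strings are made up for illustration):

```r
boundary <- "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"
# Zero-width matches: nothing is removed, each string is only cut at the boundaries.
strsplit(c("abc123def", "a1b22c333", "x-y 42 z!"), boundary, perl = TRUE)
```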



Answer 3:


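You can use strapply from the gsubfn package. The pattern below describes the whole string as non-digits, one run of digits, and the remainder, so it splits only at the first run of digits: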

library(gsubfn)  # provides strapply
test <- "abc123def"
strapply(X=test,
         pattern="([^[:digit:]]*)(\\d+)(.+)",
         FUN=c,
         simplify=FALSE)

[[1]]
[1] "abc" "123" "def"
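To extract every run rather than just the first, a base-R alternative (a substitute for strapply, not part of this answer) is regmatches() with gregexpr(). Pairing a character class with its complement, here [0-9]+ and [^0-9]+, ensures every character lands in some token, which also covers text between matches that has no pattern of its own, as long as the separator is a single character class. The sample strings are made up for illustration.

```r
x <- c("abc123def", "a1b22c333", "x-y 42 z!")
# Each token is either a run of digits or a run of anything that is not a digit,
# so the tokens concatenate back to the original string.
tokens <- regmatches(x, gregexpr("[0-9]+|[^0-9]+", x))
tokens
```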


Source: https://stackoverflow.com/questions/11013628/how-should-i-split-and-retain-elements-using-strsplit
