Regex issue in gsub

I have defined

vec <- "5f 110y, Fast"

and

gsub("[\\s0-9a-z]+,", "", vec)

gives "5f  Fast"

I would have expected it to give "Fast" since everything before the comma should get matched by the regex.

Can anyone explain to me why this is not the case?

Keep in mind that, in TRE regex patterns (the default engine for gsub), you cannot use regex escapes like \s, \d, \w inside bracket expressions.

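So, the regex in your case, "[\\s0-9a-z]+,", matches 1 or more \, s, digits and lowercase ASCII letters, and then a single ,. The space after 5f is not in that set, so the match can only start at 110y, and 5f is left in place.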
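To see this for yourself, here is a small sketch that inspects what the default engine matches. It uses only base R (regmatches, regexpr, grepl) and the vec from the question; the single-character test strings are just illustrative inputs.

```r
vec <- "5f 110y, Fast"

# The substring the default (TRE) engine actually matches
regmatches(vec, regexpr("[\\s0-9a-z]+,", vec))
# [1] "110y,"

# Under TRE, "[\\s]" does not match a space, but it does match a literal "s"
grepl("[\\s]", " ")
# [1] FALSE
grepl("[\\s]", "s")
# [1] TRUE

# With perl = TRUE the same class is read as "any whitespace"
grepl("[\\s]", " ", perl = TRUE)
# [1] TRUE
```

The first call shows that only 110y, is consumed, which is why 5f survives in the result.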

You may use POSIX character classes inside the bracket expression instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace only):

> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"

Or, use a PCRE regex with \s and the perl=TRUE argument:

> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"

To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).
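As a rough illustration of where (*UCP) could matter, the sketch below swaps the ordinary space for a non-breaking space (U+00A0). The variable vec_nbsp is a made-up example, not part of the question, and the exact results can depend on your R build and locale, so run both calls and compare rather than relying on a stated output.

```r
# Hypothetical input: same text, but with a non-breaking space after "5f"
vec_nbsp <- "5f\u00a0110y, Fast"

# Plain PCRE: \s is normally limited to ASCII whitespace
gsub("[\\s0-9a-z]+,", "", vec_nbsp, perl = TRUE)

# With (*UCP), \s is intended to cover Unicode whitespace as well
gsub("(*UCP)[\\s0-9a-z]+,", "", vec_nbsp, perl = TRUE)

# For the original ASCII-only vec, the verb changes nothing
vec <- "5f 110y, Fast"
gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl = TRUE)
# [1] " Fast"
```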

Could you please try the following and let me know if it helps.

vec <- c("5f 110y, Fast")
gsub(".*,","",vec)

gsub("[[:alnum:]]+ [[:alnum:]]+,","",vec)

A tidyverse solution would be to use str_replace with your original regex:

library(stringr)

str_replace(vec, "[\\s0-9a-z]+,", "")

Try a different regex:

gsub("[[:blank:][:digit:][:lower:]]+,", "", vec)
#[1] " Fast"

Or, to remove the space after the comma,

gsub("[[:blank:][:digit:][:lower:]]+, ", "", vec)
#[1] "Fast"

Source: https://stackoverflow.com/questions/51421537/regex-issue-in-gsub
