Regex issue in gsub

一曲冷凌霜 submitted on 2019-12-03 20:39:28

Question


I have defined

vec <- "5f 110y, Fast"

and

gsub("[\\s0-9a-z]+,", "", vec)

gives "5f Fast"

I would have expected it to give "Fast" since everything before the comma should get matched by the regex.

Can anyone explain to me why this is not the case?


Answer 1:


Keep in mind that, in TRE regex patterns (the default engine used by gsub), you cannot use regex escapes like \s, \d, \w inside bracket expressions.

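So the regex in your case, "[\\s0-9a-z]+,", matches one or more characters from the set \, s, digits and lowercase ASCII letters, followed by a single ,. The space in "5f 110y" is not in that set, so the match can only start at 110y, and 5f is left behind.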
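One way to check this is to extract what the pattern actually matches instead of what it removes. The following is a minimal sketch using base R's regexpr() and regmatches() on the vec from the question; the names m, s_hit and space_hit are only illustrative.

```r
vec <- "5f 110y, Fast"

# Pull out the substring matched by the original pattern under the
# default (TRE) engine. The space after "5f" is not in the class,
# so the match cannot begin at "5f".
m <- regmatches(vec, regexpr("[\\s0-9a-z]+,", vec))
print(m)

# Inside brackets, "\\s" does not stand for whitespace here:
# a plain "s" is matched by the class, a space is not.
s_hit     <- grepl("[\\s]", "s")
space_hit <- grepl("[\\s]", " ")
print(c(s_hit, space_hit))
```

If m prints as "110y,", the leftover "5f" in the question's output is explained.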

You may use POSIX character classes instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace):

> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"

Or use a PCRE regex, where \s is allowed inside brackets, by passing the perl=TRUE argument:

> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"

To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).




Answer 2:


Could you please try the following and let me know if it helps?

vec <- c("5f 110y, Fast")
gsub(".*,","",vec)

OR

gsub("[[:alnum:]]+ [[:alnum:]]+,","",vec)
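Both patterns remove the text up to and including the comma, but they leave the space that followed it. As a rough sketch (the variable names out_greedy and out_alnum are made up for illustration), trimws() from base R can be applied afterwards to get just "Fast":

```r
vec <- "5f 110y, Fast"

# Greedy: removes everything up to and including the last comma.
out_greedy <- gsub(".*,", "", vec)

# Explicit: two alphanumeric runs separated by one space, then a comma.
out_alnum <- gsub("[[:alnum:]]+ [[:alnum:]]+,", "", vec)

# Both results still start with the space that followed the comma;
# trimws() strips leading and trailing whitespace.
print(trimws(out_greedy))
print(trimws(out_alnum))
```

Note that ".*," is greedy, so on input with several commas it would remove everything up to the last one.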



Answer 3:


A tidyverse solution would be to use str_replace with your original regex. stringr uses the ICU regex engine, which does understand \s inside a bracket expression:

library(stringr)

str_replace(vec, "[\\s0-9a-z]+,", "")



Answer 4:


Try a different regex:

gsub("[[:blank:][:digit:][:lower:]]+,", "", vec)
#[1] " Fast"

Or, to remove the space after the comma,

gsub("[[:blank:][:digit:][:lower:]]+, ", "", vec)
#[1] "Fast"


Source: https://stackoverflow.com/questions/51421537/regex-issue-in-gsub
