Question
I have defined
vec <- "5f 110y, Fast"
and
gsub("[\\s0-9a-z]+,", "", vec)
gives "5f Fast".
I would have expected it to give "Fast", since everything before the comma should get matched by the regex.
Can anyone explain to me why this is not the case?
Answer 1:
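You should keep in mind that, in TRE regex patterns (the default engine in base R), you cannot use regex escapes like \s, \d, \w inside bracket expressions.
So, the regex in your case, "[\\s0-9a-z]+,", matches 1 or more characters that are each a \, an s, a digit or a lowercase ASCII letter, and then a single comma. The space after 5f is not in that set, so the match can only start at 110y.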
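To see which part of the string each engine actually matches, one option is to extract the match with regmatches() and regexpr() instead of deleting it. This is only a small sketch using the vec from the question; the variable names tre_match and pcre_match are made up for illustration.

```r
vec <- "5f 110y, Fast"
pattern <- "[\\s0-9a-z]+,"

# Default (TRE) engine: inside the brackets, \s is not a whitespace class,
# so the space after "5f" cannot be matched and the match starts at "110y".
tre_match <- regmatches(vec, regexpr(pattern, vec))

# PCRE engine: \s inside the brackets matches whitespace, so the match
# runs from the start of the string up to and including the comma.
pcre_match <- regmatches(vec, regexpr(pattern, vec, perl = TRUE))

print(tre_match)   # "110y,"
print(pcre_match)  # "5f 110y,"
```

The difference between the two extracted matches is exactly the "5f " that the question's gsub() call leaves behind.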
You may use POSIX character classes instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace):
> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"
Or, use a PCRE regex with \s and the perl=TRUE argument:
> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"
To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).
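As a rough illustration of the (*UCP) variant, the sketch below swaps the ordinary space in the question's string for a no-break space (U+00A0). The input x is a made-up variation, not data from the question, and the behaviour assumes R hands the UTF-8 string to PCRE in UTF mode, which is what it normally does for strings written with \u escapes.

```r
# "\u00a0" is a no-break space: whitespace in Unicode, but not ASCII whitespace.
# x is a hypothetical variant of the question's vec.
x <- "5f\u00a0110y, Fast"

# With (*UCP), \s is resolved through Unicode properties, so the no-break
# space is treated as whitespace and the match reaches back to "5f".
ucp_result <- gsub("(*UCP)[\\s0-9a-z]+,", "", x, perl = TRUE)
print(ucp_result)
```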
Answer 2:
Could you please try the following and let me know if this helps you.
vec <- c("5f 110y, Fast")
gsub(".*,","",vec)
OR
gsub("[[:alnum:]]+ [[:alnum:]]+,","",vec)
Answer 3:
A tidyverse solution would be to use str_replace with your original regex (stringr uses the ICU regex engine, where \s also works inside brackets):
library(stringr)
str_replace(vec, "[\\s0-9a-z]+,", "")
Answer 4:
Try a different regex:
gsub("[[:blank:][:digit:][:lower:]]+,", "", vec)
#[1] " Fast"
Or, to remove the space after the comma,
gsub("[[:blank:][:digit:][:lower:]]+, ", "", vec)
#[1] "Fast"
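If the space after the comma is not always present, a possible variation is to make it optional with [[:blank:]]*. The sketch below is an assumption about how one might generalise the pattern. The extra input strings and the name cleaned are invented for illustration and do not come from the question.

```r
# Hypothetical inputs: with a space, with a space, and without a space
# after the comma.
vec <- c("5f 110y, Fast", "6f, Good", "1m 2f,Slow")

# Same POSIX classes as above, followed by zero or more blanks after the comma.
cleaned <- gsub("[[:blank:][:digit:][:lower:]]+,[[:blank:]]*", "", vec)
print(cleaned)  # "Fast" "Good" "Slow"
```

Because gsub() is vectorised over its input, the same call works for a single string or a whole column of them.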
Source: https://stackoverflow.com/questions/51421537/regex-issue-in-gsub