Regex issue in gsub

Submitted by 折月煮酒 on 2019-11-30 17:27:46

You should keep in mind that, in TRE regex patterns (the default engine in base R), you cannot use regex escapes like \s, \d, \w inside bracket expressions.

So, the regex in your case, "[\\s0-9a-z]+,", matches 1 or more \, s, digits and lowercase ASCII letters, and then a single ,.
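To see the difference between the two engines, here is a minimal sketch. It assumes vec holds the string "5f 110y, Fast" used elsewhere in this thread; the result variable names are only illustrative.

```r
# Assumed input, taken from the other answers in this thread
vec <- c("5f 110y, Fast")

# Default (TRE) engine: \s inside [...] is not a whitespace class,
# so the space between "5f" and "110y" interrupts the match
tre_result <- gsub("[\\s0-9a-z]+,", "", vec)

# PCRE engine: \s inside [...] matches whitespace
pcre_result <- gsub("[\\s0-9a-z]+,", "", vec, perl = TRUE)

# Direct check: does the bracket expression match a single space?
space_tre  <- grepl("[\\s]", " ")               # FALSE with TRE
space_pcre <- grepl("[\\s]", " ", perl = TRUE)  # TRUE with PCRE

print(tre_result)
print(pcre_result)
```

With the default engine the leading "5f " should be left in place, because the space is not part of the class.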

You may use POSIX character classes instead, like [:space:] (any whitespace) or [:blank:] (horizontal whitespace, i.e. space and tab):

> gsub("[[:space:]0-9a-z]+,", "", vec)
[1] " Fast"
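The difference between [:blank:] and [:space:] only shows up when the input contains line breaks or other vertical whitespace. A small sketch on a made-up string (x is an illustrative input, not from the question):

```r
# Illustrative input: a space, a tab, and a newline between the letters
x <- "a \tb\nc"

# [:blank:] covers space and tab only, so the newline survives
no_blank <- gsub("[[:blank:]]", "", x)

# [:space:] also covers newline, carriage return, form feed, vertical tab
no_space <- gsub("[[:space:]]", "", x)

print(no_blank)  # "ab\nc"
print(no_space)  # "abc"
```

For single-line input such as vec, either class gives the same result.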

Or, use a PCRE regex with \s and the perl=TRUE argument:

> gsub("[\\s0-9a-z]+,", "", vec, perl=TRUE)
[1] " Fast"

To make \s match all Unicode whitespace, add the (*UCP) PCRE verb at the start of the pattern: gsub("(*UCP)[\\s0-9a-z]+,", "", vec, perl=TRUE).
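A quick way to check the effect of (*UCP) is to test \s against a non-ASCII space. The sketch below uses an EM SPACE (U+2003) as an illustrative input and assumes that R's PCRE library was built with Unicode property support, which is the usual case but is not guaranteed.

```r
# Illustrative input: EM SPACE (U+2003) between two letters
x <- "a\u2003b"

# Without (*UCP), \s is limited to ASCII whitespace
ascii_only <- grepl("\\s", x, perl = TRUE)

# With (*UCP), \s follows Unicode properties (assumes PCRE Unicode support)
unicode_aware <- grepl("(*UCP)\\s", x, perl = TRUE)

# Removing the Unicode space
cleaned <- gsub("(*UCP)\\s", "", x, perl = TRUE)

print(c(ascii_only, unicode_aware))
print(cleaned)
```

If the second check does not return TRUE on your system, pcre_config() reports whether Unicode properties are available.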

Could you please try the following and let me know if it helps.

vec <- c("5f 110y, Fast")
gsub(".*,","",vec)

OR

gsub("[[:alnum:]]+ [[:alnum:]]+,","",vec)
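Note that ".*," is greedy, so on input with more than one comma it removes everything up to the last comma. A short sketch on a made-up string (x is illustrative, not from the question), with a negated class as one possible way to stop at the first comma:

```r
# Illustrative input with two commas
x <- "a, b, c"

# Greedy .* runs to the last comma
greedy <- gsub(".*,", "", x)

# [^,]* cannot cross a comma, and sub() replaces only the first match
first_only <- sub("[^,]*,", "", x)

print(greedy)      # " c"
print(first_only)  # " b, c"
```

For vec, which contains a single comma, both patterns give the same result.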

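A tidyverse solution would be to use str_replace with your original regex; stringr uses the ICU regex engine, where \s is recognized inside bracket expressions. Note that str_replace replaces only the first match, while str_replace_all is the counterpart of gsub: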

library(stringr)

str_replace(vec, "[\\s0-9a-z]+,", "")

Try a different regex:

gsub("[[:blank:][:digit:][:lower:]]+,", "", vec)
#[1] " Fast"

Or, to remove the space after the comma,

gsub("[[:blank:][:digit:][:lower:]]+, ", "", vec)
#[1] "Fast"