Is dash a special character in R regex?

Submitted by 妖精的绣舞 on 2019-12-23 07:07:56

Question


Despite reading the help page on R regular expressions, which says:

Finally, to include a literal -, place it first or last (or, for perl = TRUE only, precede it by a backslash).

I can't understand the difference between

grepl(pattern=paste("^thing1\\-",sep=""),x="thing1-thing2")

and

grepl(pattern=paste("^thing1-",sep=""),x="thing1-thing2")

Both return TRUE. Should I escape or not here? What is the best practice?


Answer 1:


The hyphen is mostly a normal character in regular expressions.

You do not need to escape the hyphen outside of a character class; it has no special meaning.

Within a character class [ ], a hyphen placed as the first or last character of the class is taken literally. Anywhere else it forms a range, so to add a literal hyphen in another position you need to escape it, which (per the help page quoted in the question) works only with perl = TRUE.

Examples:

grepl('^thing1-', x='thing1-thing2')
[1] TRUE
grepl('[-a-z]+', 'foo-bar')
[1] TRUE
grepl('[a-z-]+', 'foo-bar')
[1] TRUE
grepl('[a-z\\-\\d]+', 'foo-bar', perl = TRUE)
[1] TRUE

Note: It is more common to find a hyphen placed first or last within a character class.
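Because "foo" on its own already matches [a-z]+, every grepl() call above returns TRUE whether or not the hyphen actually made it into the class. Extracting the matched text with regmatches() shows what each class accepts. A minimal sketch; the helper name first_match is hypothetical and the test string is only an example:

```r
# Hypothetical helper: return the first match of `pattern` in `x`
first_match <- function(pattern, x, perl = FALSE) {
  regmatches(x, regexpr(pattern, x, perl = perl))
}

x <- "foo-bar"

first_match("[-a-z]+", x)                 # hyphen first: literal
# [1] "foo-bar"
first_match("[a-z-]+", x)                 # hyphen last: literal
# [1] "foo-bar"
first_match("[a-z\\-]+", x, perl = TRUE)  # escaped hyphen, PCRE engine
# [1] "foo-bar"
first_match("[a-z]+", x)                  # no hyphen in the class
# [1] "foo"
```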




Answer 2:


To see what it means for - to have a special meaning inside of a character class (and how putting it last gives it its literal meaning), try the following:

grepl("[w-y]", "x")
# [1] TRUE
grepl("[w-y]", "-")
# [1] FALSE
grepl("[wy-]", "-")
# [1] TRUE
grepl("[wy-]", "x")
# [1] FALSE
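The same contrast shows up on a single string when every match is extracted with gregexpr(): in [w-y] the hyphen builds a range, so w, x and y are found, while in [wy-] it is a literal, so w, y and the hyphens are found but x is not. The input string is an arbitrary example:

```r
s <- "w-x-y"

# Range: matches w, x and y, but not the literal hyphens
regmatches(s, gregexpr("[w-y]", s))[[1]]
# [1] "w" "x" "y"

# Hyphen placed last is literal: matches w, -, y, but not x
regmatches(s, gregexpr("[wy-]", s))[[1]]
# [1] "w" "-" "-" "y"
```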



Answer 3:


Both patterns match exactly the same text in this case, i.e.:

x <- "thing1-thing2"
regmatches(x,regexpr("^thing1\\-",x))
#[1] "thing1-"
regmatches(x,regexpr("^thing1-",x))
#[1] "thing1-"
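To check that the escape makes no difference outside a class with either engine, one can run both patterns over a few inputs with the default engine and with perl = TRUE. A sketch; the inputs are arbitrary examples:

```r
inputs <- c("thing1-thing2", "thing1thing2", "thing1_thing2", "xthing1-")

for (use_perl in c(FALSE, TRUE)) {
  escaped   <- grepl("^thing1\\-", inputs, perl = use_perl)
  unescaped <- grepl("^thing1-",  inputs, perl = use_perl)
  # Prints TRUE when both patterns give the same TRUE/FALSE vector
  print(identical(escaped, unescaped))
}
```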

The - is a special character in certain situations, though: inside [] it specifies a range of values, such as the characters between a and z, e.g.:

regmatches(x,regexpr("[a-z]+",x))
#[1] "thing"
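Extending that example: the range [a-z] stops at the digit, and adding a digit range and then a literal hyphen (placed last) lets the match run further. A sketch on the same string:

```r
x <- "thing1-thing2"

regmatches(x, regexpr("[a-z]+", x))      # letters only
# [1] "thing"
regmatches(x, regexpr("[a-z0-9]+", x))   # letters and digits
# [1] "thing1"
regmatches(x, regexpr("[a-z0-9-]+", x))  # plus a literal hyphen, placed last
# [1] "thing1-thing2"
```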


Source: https://stackoverflow.com/questions/24154248/is-dash-a-special-character-in-r-regex
