Regex difference between word boundary end and edge

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-12-01 08:50:45

Question


The R help file for regex says

The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word

What is the difference between an end and an edge (of a word)?


Answer 1:


One practical difference between \b and \< / \> is engine support: \b can also be used in PCRE regex patterns (when you specify perl=TRUE) and in ICU regex patterns (the stringr package), whereas \< and \> cannot.

> s <- "no where nowhere"
> sub("\\<no\\>", "", s)                  ## default engine: \< and \> work
[1] " where nowhere"
> sub("\\<no\\>", "", s, perl = TRUE)     ## \> and \< do not work with PCRE
[1] "no where nowhere"
> sub("\\bno\\b", "", s, perl = TRUE)     ## \b works with PCRE
[1] " where nowhere"

> library(stringr)
> str_replace(s, "\\bno\\b", "")
[1] " where nowhere"
> str_replace(s, "\\<no\\>", "")
[1] "no where nowhere"

The advantage of \< and \> is that they are unambiguous: \< always matches at the beginning of a word, and \> always matches at the end of a word. \b, by contrast, matches at either position.
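
To make the "either position" point concrete, here is a minimal sketch, assuming R's default regex engine (no perl = TRUE). The variable name before_where is only illustrative. The position just before "where" is the start of a word, so \< and \b can both match there, while \> cannot:

```r
s <- "no where nowhere"

# Test each anchor at the position immediately before "where",
# which is the start of a word (preceded by a space).
before_where <- c(
  start = grepl("\\<where", s),  # \< matches only at a word start
  edge  = grepl("\\bwhere", s),  # \b matches at either edge
  end   = grepl("\\>where", s)   # \> matches only at a word end
)
print(before_where)
# start  edge   end
#  TRUE  TRUE FALSE
```

So if a pattern must only ever anchor at the end of a word, \> states that directly, whereas \b relies on the surrounding characters to rule out the other edge.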

One more thing to consider (reference):

POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
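
As a small illustration of that advice, the sketch below marks every word edge in an ASCII string using perl = TRUE. The sample string and the "|" marker are only illustrative choices. The result of the same call without perl = TRUE is deliberately not shown, since the note above says it is unreliable:

```r
s <- "no where nowhere"

# Insert a marker at every word edge. With PCRE, each zero-width \b
# match is replaced exactly once.
marked <- gsub("\\b", "|", s, perl = TRUE)
print(marked)
# [1] "|no| |where| |nowhere|"
```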



Source: https://stackoverflow.com/questions/36183288/regex-difference-between-word-boundary-end-and-edge
