Word boundary regex issue

Submitted by 此生再无相见时 on 2019-12-12 16:09:58

Question


I'm having issues using word boundaries \b in my regular expression. I'm using R but the issue exists as well when I try http://regexr.com. The pattern I'm using is \bs\.l\.\b, and while I expected lines 1 and 3 below to match this pattern, only line 2 matches:

aaa s.l. bbb
aaa s.l.bbb
aaa s.l., bbb

See http://regexr.com/3f154 as well.


Answer 1:


The word boundaries match in the following positions:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.
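
To make these rules concrete, here is a minimal sketch that lists every position where \b matches in the third sample line. It uses Python's re module purely for illustration (the question is about R), and the helper name boundary_positions is hypothetical.

```python
import re

def boundary_positions(text):
    """Return every index at which the zero-width word-boundary assertion matches."""
    return [m.start() for m in re.finditer(r"\b", text)]

# Third sample line from the question.
sample = "aaa s.l., bbb"
positions = boundary_positions(sample)
print(positions)  # [0, 3, 4, 5, 6, 7, 10, 13]

# Index 8 sits between the final "." and the ",": both are non-word
# characters, so there is no boundary there and the trailing \b fails.
print(8 in positions)  # False
```

There is a boundary on each side of every letter run, but none between the last dot and the comma, which is why the original pattern does not match that line.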

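Now, you want to match s.l. when it is preceded by a word boundary and not followed by a word character. The trailing \b cannot express that: since the final . is a non-word character, a \b after it matches only when a word character follows, which is the opposite of what you need. Replace the trailing \b with a (?!\w) negative lookahead: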

\bs\.l\.(?!\w)
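
As a quick check, the sketch below runs both the original and the corrected pattern against the three sample lines from the question. It is written in Python's re module only as an illustration, on the assumption that \b and (?!\w) behave the same way for these ASCII inputs as in a PCRE-style engine. The helper matching_lines is a hypothetical name.

```python
import re

# The three sample lines from the question, in order.
lines = ["aaa s.l. bbb", "aaa s.l.bbb", "aaa s.l., bbb"]

original = re.compile(r"\bs\.l\.\b")   # trailing \b needs a word char after "."
fixed = re.compile(r"\bs\.l\.(?!\w)")  # trailing lookahead forbids a word char

def matching_lines(pattern, candidates):
    """Return the 1-based numbers of the lines in which the pattern is found."""
    return [i for i, line in enumerate(candidates, start=1) if pattern.search(line)]

original_hits = matching_lines(original, lines)
fixed_hits = matching_lines(fixed, lines)
print(original_hits)  # [2]
print(fixed_hits)     # [1, 3]
```

The original pattern finds only line 2, as reported in the question, while the lookahead version finds lines 1 and 3.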


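If you are using base R functions, pass perl=TRUE so that the lookahead is supported. The pattern works as is in stringr functions, which are powered by the ICU regex library. In either case, remember to double the backslashes inside an R string literal, e.g. "\\bs\\.l\\.(?!\\w)".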




Answer 2:


The . is not a word character, so there is no word boundary between the trailing . and the space or comma that follows it: both sides are non-word characters.



Source: https://stackoverflow.com/questions/41537513/word-boundary-regex-issue
