Regex word boundary issue when angle brackets are adjacent to the boundary

Submitted by 给你一囗甜甜゛ on 2019-12-28 06:53:08

Question


Regex:

\b< low="" number="" low="">\b

Example string:

 <b22>Aquí se muestran algunos síntomas < low="" number="" low=""> tienen el siguiente aspecto.</b22> 

I'm not sure why the word boundary between síntomas and < is not being found. The same problem exists on the other side, between > and tienen.

Suggestions on how I might more properly match this boundary?

When I give it the following input, the Regex matches as expected:

Aquí se muestran algunos síntomas< low="" number="" low="">tienen el siguiente aspecto.

Removing the edge conditions (the \b on each side, as in \bPHRASE\b) is not an option, because the regex must not match parts of words.

Update

This did the trick: (Thanks to Igor, Mosty, DK and NickC)

new Regex(String.Format(@"(?<=[\s\.\?\!]){0}(?=[\s\.\?\!])", innerStringToMatch));

I needed to improve my boundary matching to [\s\.\?\!] and make these edge matches positive lookahead and lookbehind.
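
To make the lookaround approach concrete, here is a minimal Python sketch of the same idea. It is not the original .NET code: boundary_pattern is a hypothetical helper name, and re.escape is an added assumption so that the inner string is treated as literal text (the pattern above inserts it unescaped).

```python
import re

def boundary_pattern(inner):
    # Hypothetical helper mirroring the .NET pattern above.
    # The edge class accepts whitespace, '.', '?' or '!' on each side.
    edge = r'[\s.?!]'
    # re.escape is an assumption: it makes `inner` match literally.
    return re.compile('(?<=' + edge + ')' + re.escape(inner) + '(?=' + edge + ')')

# Placeholder token exactly as it appears in the question.
token = '< low="" number="" low="">'
spaced = 'algunos síntomas ' + token + ' tienen el siguiente aspecto.'
glued = 'algunos síntomas' + token + 'tienen el siguiente aspecto.'

pattern = boundary_pattern(token)

# Lookarounds are zero-width, so the match covers only the token itself.
match_spaced = pattern.search(spaced)
# A letter directly before '<' is not in the edge class, so nothing matches.
match_glued = pattern.search(glued)
# The lookbehind needs a preceding character, so a token at the very
# start of the string is not matched either.
match_at_start = pattern.search(token + ' tienen')
```

Because the lookbehind and lookahead each require an actual character, a phrase at the very start or end of the input is not matched. If that case matters, the edge classes could be extended with ^ and $ alternatives.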


Answer 1:


\b is a zero-length match that can occur between two characters in the string, where one is a word character and the other is not. A word character is defined as [A-Za-z0-9_] (see the note marked *) below). Neither < nor a space is a word character, so there is no boundary between the space and <, or between > and the following space. That's why \b doesn't match there.

You can try the following regex instead ((?: ) is a non-capturing parentheses group):

(?:\b|\s+)< low="" number="" low="">(?:\b|\s+)

*) Actually, this definition is not correct for all regex engines. To be precise, \b matches between \w and \W, where \w matches any word character. As Tim Pietzcker pointed out in a comment on this answer, the meaning of "word character" differs between implementations, but I don't know of any where \w matches < or >.
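
The behaviour described in this answer can be checked with a short Python sketch. It uses the placeholder token exactly as it appears in the question and escapes it with re.escape, which is an assumption added here. Python's \w is Unicode-aware for str patterns, so í counts as a word character, but < and > do not.

```python
import re

# Placeholder token exactly as it appears in the question.
token = '< low="" number="" low="">'
spaced = 'algunos síntomas ' + token + ' tienen el siguiente aspecto.'
glued = 'algunos síntomas' + token + 'tienen el siguiente aspecto.'

# Original pattern: \b on both sides of the literal token.
strict = re.compile(r'\b' + re.escape(token) + r'\b')
# Suggested pattern: accept either a word boundary or whitespace.
relaxed = re.compile(r'(?:\b|\s+)' + re.escape(token) + r'(?:\b|\s+)')

# Space then '<': both are non-word characters, so \b cannot match.
strict_on_spaced = strict.search(spaced)
# 's' then '<': a word character next to a non-word character, so \b matches.
strict_on_glued = strict.search(glued)
# The alternation falls back to \s+ where \b fails.
relaxed_on_spaced = relaxed.search(spaced)
```

Note that when the \s+ branch is taken, the surrounding whitespace becomes part of the match, which matters if the match is later replaced.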




Answer 2:


I think you're trying to do the following:

\s< low="" number="" low="">\s
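
One consequence of this pattern is that the whitespace on each side is consumed by the match. The Python sketch below illustrates that under the same assumptions as the question's example (the token is taken literally via re.escape). The capture-group variant is an illustration added here and is not part of the answer.

```python
import re

# Placeholder token exactly as it appears in the question.
token = '< low="" number="" low="">'
text = 'síntomas ' + token + ' tienen'

# \s on each side consumes one whitespace character per side.
consuming = re.compile(r'\s' + re.escape(token) + r'\s')
replaced = consuming.sub('X', text)

# Illustrative variant: capture the whitespace and put it back.
preserving = re.compile(r'(\s)' + re.escape(token) + r'(\s)')
kept = preserving.sub(r'\1X\2', text)

# \s requires an actual character, so a token at the end of the
# string (nothing after '>') is not matched.
at_end = consuming.search('síntomas ' + token)
```

Here replaced comes out as síntomasXtienen, while kept comes out as síntomas X tienen.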


Source: https://stackoverflow.com/questions/9087521/regex-word-boundary-issue-when-angle-brackets-are-adjacent-to-the-boundary
