Question
What is a non-word boundary in regex (\B), compared to a word boundary?
Answer 1:
A word boundary (\b) is a zero-width match that can match:
- Between a word character (\w) and a non-word character (\W), or
- Between a word character and the start or end of the string.
In JavaScript, \w is defined as [A-Za-z0-9_] and \W is anything else.
The negated version of \b, written \B, is a zero-width match at every position where the above does not hold. Therefore it can match:
- Between two word characters.
- Between two non-word characters.
- Between a non-word character and the start or end of the string.
- The empty string.
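The four cases above can be checked directly. This is a minimal sketch in JavaScript; `nonBoundaryPositions` is a hypothetical helper name introduced here for illustration, not part of the original answer. It lists every index at which \B produces a zero-width match.

```javascript
// Hypothetical helper: indices in `s` where \B matches (zero-width).
function nonBoundaryPositions(s) {
  return [...s.matchAll(/\B/g)].map((m) => m.index);
}

// Between two word characters: only the gap inside "ab" qualifies.
console.log(nonBoundaryPositions("ab")); // [ 1 ]

// Between two non-word characters, and between a non-word character
// and the start or end of the string.
console.log(nonBoundaryPositions("--")); // [ 0, 1, 2 ]

// The empty string: there is no word character on either side.
console.log(nonBoundaryPositions("")); // [ 0 ]
```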
For example if the string is "Hello, world!" then \b matches in the following places:
     H e l l o ,   w o r l d !
    ^         ^   ^         ^
And \B matches those places where \b doesn't match:
     H e l l o ,   w o r l d !
      ^ ^ ^ ^   ^   ^ ^ ^ ^   ^
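The positions in the "Hello, world!" example can be reproduced with a short script. This is a sketch, assuming a JavaScript engine that supports `String.prototype.matchAll`; the helper name `zeroWidthPositions` is made up for this example. Index 0 is the position before "H" and index 13 is the position after "!".

```javascript
// Hypothetical helper: indices at which a zero-width pattern matches in `s`.
// The regex passed in must carry the global flag, as matchAll requires.
function zeroWidthPositions(re, s) {
  return [...s.matchAll(re)].map((m) => m.index);
}

const text = "Hello, world!";

// Word boundaries: the start and end of "Hello" and of "world".
const boundaries = zeroWidthPositions(/\b/g, text);
console.log(boundaries); // [ 0, 5, 7, 12 ]

// Non-word boundaries: every remaining position.
const nonBoundaries = zeroWidthPositions(/\B/g, text);
console.log(nonBoundaries); // [ 1, 2, 3, 4, 6, 8, 9, 10, 11, 13 ]
```

Together the two lists cover every index from 0 to 13 exactly once, since each position is either a word boundary or not.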
Answer 2:
The basic purpose of a non-word boundary is to create a regex that says:
- If we are at the beginning/end of a word char (\w = [a-zA-Z0-9_]), make sure the previous/next character is also a word char, e.g. "a\B." ~ "a\w": "ab", "a4", "a_", ... but not "a ", "a.".
- If we are at the beginning/end of a non-word char (\W = [^a-zA-Z0-9_]), make sure the previous/next character is also a non-word char, e.g. "-\B." ~ "-\W": "-.", "- ", "--", ... but not "-a", "-1".
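The two equivalences above can be tried on the sample strings from the list. This is a sketch in JavaScript; the `^` and `$` anchors and the variable names are additions for the test, so that each pattern must cover the whole two-character sample.

```javascript
const wordSamples = ["ab", "a4", "a_", "a ", "a."];
const dashSamples = ["-.", "- ", "--", "-a", "-1"];

// "a\B." behaves like "a\w": the character after "a" must be a word char.
const aNonBoundary = wordSamples.filter((s) => /^a\B.$/.test(s));
const aWordChar = wordSamples.filter((s) => /^a\w$/.test(s));
console.log(aNonBoundary); // [ 'ab', 'a4', 'a_' ]
console.log(aWordChar);    // [ 'ab', 'a4', 'a_' ]

// "-\B." behaves like "-\W": the character after "-" must be a non-word char.
const dashNonBoundary = dashSamples.filter((s) => /^-\B.$/.test(s));
const dashNonWord = dashSamples.filter((s) => /^-\W$/.test(s));
console.log(dashNonBoundary); // [ '-.', '- ', '--' ]
console.log(dashNonWord);     // [ '-.', '- ', '--' ]
```

The "~" is only approximate in one corner case: without the `s` flag, `.` does not match a line terminator, while \W does.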
For a word boundary it is similar, but instead of making sure that the adjacent characters are of the same class (word char/non-word char), they need to differ, hence the name word boundary.
Source: https://stackoverflow.com/questions/4541573/what-are-non-word-boundary-in-regex-b-compared-to-word-boundary