Question
Based on the question "Regex Until But Not Including", I'm trying to match all characters up until a word boundary.
For example, matching apple in the following string:
apple<
I'm doing that using:
- a negated set [^]
- with a word boundary \b
- and a plus + repeater
Like this:
/a[^\b]+/
This should look for an "a" and then grab one or more characters that are not a word boundary. So I would expect it to stop before the <, which is at the end of the word.
Demo in StackSnippets
var input = [ "apple<", "apple/" ];
var myRegex = /a[^\b]+/;
for (var i = 0; i < input.length; i++) {
    console.log(myRegex.exec(input[i]));
}
A couple of other regexes I tried:
I can use a negated word boundary, or a negated set with a regular word boundary:
/a[\B]+/
/a[^\b]+/
I can specify several possible word ending characters and use them in a negated set:
/a[^|"<>\-\\\/;:,.]+/
I can also use a positive set and restrict it to word characters or regular letters:
/a[\w]+/
/a[a-zA-Z]+/
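A quick sketch comparing the three workarounds above on the question's inputs, plus one extra input, "apple!", which is an illustrative assumption and not from the original post. The hand-written negated set only stops at the characters you remember to list, so it runs past "!", while \w+ and [a-zA-Z]+ stop at any non-word character.

```javascript
// Workarounds from the question, run against its inputs plus "apple!"
// ("apple!" is an illustrative extra input, not from the original post).
const inputs = ["apple<", "apple/", "apple!"];

const workarounds = {
  // Negated set listing specific "word ending" characters
  negatedList: /a[^|"<>\-\\\/;:,.]+/,
  // Positive set of word characters (letters, digits, underscore)
  wordChars: /a\w+/,
  // Positive set of ASCII letters only
  lettersOnly: /a[a-zA-Z]+/,
};

const results = {};
for (const [name, re] of Object.entries(workarounds)) {
  // exec() returns null when there is no match; keep only the matched text
  results[name] = inputs.map((s) => (re.exec(s) || [null])[0]);
  console.log(name, results[name]);
}
```

Only negatedList returns "apple!" for the last input; the other two return "apple" everywhere.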
But I'd like to know how to do it for a word boundary if that's possible.
MDN's regular expression reference lists what a word boundary is and which characters count as word characters.
Answer 1:
Word boundaries (\b) are not characters, but the empty string between a word character and a non-word character (or the start or end of the string). Moreover, since Unicode support is still lacking in JavaScript, word characters are only the ASCII letters, digits, and the underscore, i.e. [A-Za-z0-9_].
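Because of that, you:
- generally shouldn't use \b unless your data is some kind of computer language that can't possibly include Unicode;
- can't apply quantifiers to \b (an empty string repeated 10 times is still one empty string);
- can't negate \b the way you negate a set (it's not a character set, so it has no complement; \B is a separate zero-width assertion meaning "not at a word boundary");
- can't include \b in a character set (in square brackets), since, again, it's not a character or character set. Inside square brackets JavaScript reads \b as the backspace character (U+0008), so [^\b] means "any character except backspace", which is why /a[^\b]+/ runs straight past the <.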
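A small check of why the question's /a[^\b]+/ matches the whole string: inside square brackets, JavaScript treats \b as the backspace character (U+0008), not as a word boundary. The variable names below are just for illustration.

```javascript
// Inside a character class, \b is the backspace character (U+0008),
// not a word-boundary assertion.
const backspaceClass = /[\b]/;
const isBackspace = backspaceClass.test("\u0008"); // the class matches a literal backspace
const isLetter = backspaceClass.test("a");         // ...and not an ordinary letter

// So [^\b]+ means "one or more characters that are not backspace",
// which happily consumes "<" and "/".
const questionRegex = /a[^\b]+/;
const matches = ["apple<", "apple/"].map((s) => questionRegex.exec(s)[0]);

console.log(isBackspace, isLetter, matches);
```

Both inputs come back in full ("apple<" and "apple/"), matching what the question observed.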
Since \b doesn't actually add any characters to the match, you can safely append it to your regex:
/.+?\b/
will match all characters up until the first word boundary. It's in fact a superset of:
/\w+/
which is probably what you want, since you're interested only in the words, not the stuff in between.
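A sketch of the suggested pattern on the question's inputs. The /a.+?\b/ variant, which adds the question's leading "a", is an adaptation and not part of the answer, and "<apple" is an illustrative extra input. That input shows the "superset" point: .+?\b can stop after a run of non-word characters, while \w+ only ever matches word characters.

```javascript
const samples = ["apple<", "apple/", "<apple"];

// Lazy .+? grows one character at a time until the zero-width \b succeeds.
const lazyToBoundary = /.+?\b/;
// Same idea, anchored on a leading "a" as in the question (adaptation).
const fromA = /a.+?\b/;
// Only word characters.
const wordOnly = /\w+/;

// Return the matched text, or null when there is no match.
const firstMatch = (re, s) => {
  const m = re.exec(s);
  return m === null ? null : m[0];
};

const table = samples.map((s) => ({
  input: s,
  lazyToBoundary: firstMatch(lazyToBoundary, s),
  fromA: firstMatch(fromA, s),
  wordOnly: firstMatch(wordOnly, s),
}));
console.table(table);
```

For "apple<" and "apple/" all three give "apple"; for "<apple" the bare /.+?\b/ stops after "<".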
Answer 2:
You have to include the word boundary as part of your regex like this:
/[A-Za-z]+\b/
You could also use:
\w+\b
although this will also include the underscore (and digits) as part of your word.
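To illustrate the underscore caveat, here is a sketch on a hypothetical input containing an underscore (not from the original post). Because _ is a word character, there is no boundary between "snake" and "_", so the letters-only pattern cannot stop after "snake" and its first match is "case", while \w+\b takes the whole identifier.

```javascript
// Illustrative input containing an underscore (not from the original post).
const sample = "snake_case<";

// Letters only, then a word boundary.
const lettersThenBoundary = /[A-Za-z]+\b/;
// Word characters (letters, digits, underscore), then a word boundary.
const wordThenBoundary = /\w+\b/;

const letters = lettersThenBoundary.exec(sample)[0];
const words = wordThenBoundary.exec(sample)[0];
console.log(letters, words);
```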
Answer 3:
If this rewording of the question is accurate, "match all words beginning with 'a'", then you might have started the search from existing SO answers on that topic. Distilled down, you can use the word character class \w and make it a bit more bulletproof by adding a preceding word boundary \b, which prevents matching partial words that contain an 'a', such as 'baggage': /\ba\w+/gi
var input = [ "apple<", "apple/", "baggage;" ];
var myRegexWord = /\ba\w+/i;
var myRegexPartial = /a\w+/;
for (var i = 0; i < input.length; i++) {
    console.log(myRegexWord.exec(input[i]));
    console.log(myRegexPartial.exec(input[i]));
}
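The regex quoted in the answer carries the g flag (/\ba\w+/gi), while the snippet drops it because it calls exec() once per input. With g, String.prototype.match returns every match in a string. A sketch on an illustrative sentence (not from the original post):

```javascript
const sentence = "An apple and a baggage cart";

// \b: the match must start at a word boundary, so the "a" inside "baggage" is skipped.
// a\w+: an "a" followed by at least one more word character, so the lone "a" is skipped too.
// g returns all matches; i makes it case-insensitive, so "An" counts.
const wordsStartingWithA = sentence.match(/\ba\w+/gi);
console.log(wordsStartingWithA);
```

This yields "An", "apple", and "and".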
Source: https://stackoverflow.com/questions/29778716/match-all-characters-up-until-a-word-boundary