Question
I have a regexp that checks whether some text contains a word (taking word boundaries into account):
String regexp = ".*\\bSOME_WORD_HERE\\b.*";
But this regexp returns false when SOME_WORD_HERE starts with # (a hashtag).
Example without #:
String text = "some text and test word";
String matchingWord = "test";
boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// now contains == true;
But with a hashtag, `contains` is false. Example:
text = "some text and #test word";
matchingWord = "#test";
contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// contains == false, but I expect true
Answer 1:
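The \b# pattern matches a # only when it is preceded by a word character: a letter, digit or underscore. In your text the # follows a space, and there is no word boundary between two non-word characters, so the match fails. If you need to match a # that is not preceded by a word character, use a negative lookbehind, (?<!\w). Similarly, so that the trailing boundary still works when the search word ends with a non-word character, use a (?!\w) negative lookahead instead of the trailing \b: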
text.matches("(?s).*(?<!\\w)" + matchingWord + "(?!\\w).*");
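To make the difference between \b and the lookarounds concrete, here is a minimal sketch. The class name BoundaryDemo and the helper find are hypothetical and exist only for this illustration; it uses Matcher#find() so that each pattern can be tested on its own, without the surrounding .* wrappers.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: true if the regex matches anywhere in the text.
    static boolean find(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Space and # are both non-word characters, so there is no \b between them.
        System.out.println(find("\\b#test\\b", "some text and #test word"));            // false
        // Here # follows the word character 'c', so \b does match before it.
        System.out.println(find("\\b#test\\b", "abc#test word"));                       // true
        // The lookarounds only require that no word character is adjacent.
        System.out.println(find("(?<!\\w)#test(?!\\w)", "some text and #test word"));   // true
    }
}
```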
Using Pattern.quote(matchingWord) is a good idea if your matchingWord can contain special regex metacharacters.
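As a rough sketch of how Pattern.quote could be combined with the (?<!\w) and (?!\w) lookarounds: the class QuotedWordMatch and the method containsWord are hypothetical names chosen for this example, not part of the original answer.

```java
import java.util.regex.Pattern;

class QuotedWordMatch {
    // Hypothetical helper: true if `word` occurs in `text` with no word
    // character directly before or after it. Pattern.quote makes the engine
    // treat any metacharacters in `word` (such as $ or .) literally.
    static boolean containsWord(String text, String word) {
        String regex = "(?s).*(?<!\\w)" + Pattern.quote(word) + "(?!\\w).*";
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(containsWord("some text and #test word", "#test"));    // true
        System.out.println(containsWord("some text and #testing word", "#test")); // false
        System.out.println(containsWord("price is $5.00 today", "$5.00"));        // true
    }
}
```

Without Pattern.quote, the $ in the last search word would be interpreted as an end-of-input anchor and the . as "any character".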
Alternatively, if you plan to match your search words only between whitespace or at the start/end of the string, you can use (?<!\S) as the initial boundary and (?!\S) as the trailing one:
text.matches("(?s).*(?<!\\S)" + matchingWord + "(?!\\S).*");
One more thing: wrapping the pattern in .* for .matches() is not the best regex solution. A regex like "(?<!\\S)" + matchingWord + "(?!\\S)" used with Matcher#find() is typically processed more efficiently, since the engine does not have to backtrack through a leading .*, but you will need to create a Matcher object for that.
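A minimal sketch of the Matcher#find() variant with the whitespace boundaries might look like this. The names FindWordMatch and containsToken are hypothetical, and Pattern.quote is added on the assumption that the search word may contain metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindWordMatch {
    // Hypothetical helper: true if `word` appears in `text` delimited by
    // whitespace or by the start/end of the string.
    static boolean containsToken(String text, String word) {
        Pattern pattern = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        Matcher matcher = pattern.matcher(text);
        return matcher.find(); // searches anywhere in the text, no .* needed
    }

    public static void main(String[] args) {
        System.out.println(containsToken("some text and #test word", "#test"));  // true
        System.out.println(containsToken("#test", "#test"));                     // true
        System.out.println(containsToken("some text and #test, word", "#test")); // false: ',' follows
    }
}
```

If the same word is checked against many texts, the compiled Pattern can be created once and reused.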
Answer 2:
This is not exactly a solution, since it no longer uses a regexp, but you can do it easily with contains:
text = "some text and #test word";
matchingWord = "#test";
contains = text.contains(matchingWord);
// contains == true
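One caveat worth keeping in mind: String#contains is a plain substring check and enforces no boundary at all. The short sketch below illustrates this; the class name ContainsCheck and the method containsSubstring are hypothetical.

```java
class ContainsCheck {
    // Hypothetical helper: thin wrapper around String#contains.
    static boolean containsSubstring(String text, String word) {
        return text.contains(word);
    }

    public static void main(String[] args) {
        System.out.println(containsSubstring("some text and #test word", "#test"));    // true
        // Also true, although "#test" is only a prefix of "#testing" here.
        System.out.println(containsSubstring("some text and #testing word", "#test")); // true
    }
}
```

Whether that is acceptable depends on whether partial matches such as #testing should count.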
Source: https://stackoverflow.com/questions/41396917/matching-a-word-with-pound-symbol-in-a-regex