问题
I have regexp for check if some text containing word (with ignoring boundary)String regexp = ".*\\bSOME_WORD_HERE\\b.*";
but this regexp return false when "SOME_WORD" starts with # (hashtag).
Example, without #
String text = "some text and test word";
String matchingWord = "test";
boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// now contains == true;
But with hashtag `contains` was false. Example:
text = "some text and #test word";
matchingWord = "#test";
contains = text.matches(".*\\b" + matchingWord + "\\b.*");
//contains == fasle; but I expect true
回答1:
The \b# pattern matches a # that is preceded with a word character: a letter, digit or underscore.
If you need to match # that is not preceded with a word char, use a negative lookbehind (?<!\w). Similarly, to make sure the trailing \b matches if a non-word char is there, use (?!\w) negative lookahead:
text.matches("(?s).*(?<!\\w)" + matchingWord + "(?!\\w).*");
Using Pattern.quote(matchingWord) is a good idea if your matchingWord can contain special regex metacharacters.
Alternatively, if you plan to match your search words in between whitespace or start/end of string, you can use (?<!\S) as the initial boundary and (?!\S) as the trailing one
text.matches("(?s).*(?<!\\S)" + matchingWord + "(?!\\S).*");
And one more thing: the .* in the .matches is not the best regex solution. A regex like "(?<!\\S)" + matchingWord + "(?!\\S)" with Matcher#find() will be processed in a much more optimized way, but you will need to initialize the Matcher object for that.
回答2:
Not exactly a solution as it's not using regexp anymore, but you can do it easily using contains :
text = "some text and #test word";
matchingWord = "#test";
contains = text.contains(matchingWord);
// contains == true
来源:https://stackoverflow.com/questions/41396917/matching-a-word-with-pound-symbol-in-a-regex