JavaScript Regex Word Boundary with optional non-word character

Submitted by 做~自己de王妃 on 2019-11-30 09:49:39

Question


I want to find a keyword in a string. I am trying to use a word boundary (\b), but that may not be the right tool here. The keyword can be any word and may be preceded by a non-word character. The string can be anything and may contain all three of the words in this array, but I should match only the exact keyword:

['hello', '#hello', '@hello'];

Here is my code, which includes an attempt I found in another post:

let userStr = 'why hello there, or should I say #hello there?';

let keyword = '#hello';

let re = new RegExp(`/(#\b${userStr})\b/`);

re.exec(keyword);
  • This would be great if the string always started with #, but it does not.
  • I then tried this /(#?\b${userStr})\b/, but if the string does start with #, it tries to match ##hello.
  • The keyword (matchThis) could be any of the 3 examples in the array, and userStr may contain several variations of it, but only one will be an exact match.

Answer 1:


You need to account for 3 things here:

  • The main point: \b is context-dependent. It matches only between a word character and a non-word character (or a string edge), so when the keyword may start or end with a non-word character such as #, you need unambiguous boundaries instead
  • Inside a string passed to the RegExp constructor, backslashes must be doubled (\\W, \\w), because a single backslash is consumed by the string literal
  • Since a variable is interpolated into the pattern, all regex special characters in it must be escaped.
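
The first two points can be checked directly in a console. The snippet below is a small sketch (the variable names are illustrative and not from the original post) showing that \b finds no boundary between a space and #, and that a single-backslash \b inside a string literal is a backspace character rather than a word-boundary token.

```javascript
// \b matches only between a word char (\w) and a non-word char (\W) or a string edge.
// Between a space and '#' both sides are non-word chars, so there is no boundary.
const sample = 'say #hello there';

const boundaryBeforeHash = /\b#hello\b/.test(sample); // false: no \b between ' ' and '#'
const boundaryAfterHash = /#\bhello\b/.test(sample);  // true: \b sits between '#' and 'h'

// In a string or template literal, '\b' is the backspace character (U+0008).
const singleEscape = new RegExp('\bhello\b').test(sample);   // false: searches for backspaces
const doubleEscape = new RegExp('\\bhello\\b').test(sample); // true: real word boundaries

console.log({ boundaryBeforeHash, boundaryAfterHash, singleEscape, doubleEscape });
```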

Use

let userStr = 'why hello there, or should I say #hello there?';
let keyword = '#hello';
let re_pattern = `(?:^|\\W)(${keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')})(?!\\w)`;
let res = [], m;

// To find a single (first) match
console.log((m=new RegExp(re_pattern).exec(userStr)) ? m[1] : "");

// To find multiple matches:
let rx = new RegExp(re_pattern, "g");
while (m=rx.exec(userStr)) {
    res.push(m[1]);
}
console.log(res);

Pattern description

  • (?:^|\\W) - a non-capturing group matching the start of string or any non-word char
  • (${keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')}) - Group 1: a keyword value with escaped special chars
  • (?!\\w) - a negative lookahead that fails the match if there is a word char immediately to the right of the current location.
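
As a rough sketch, the pattern described above can be wrapped in a reusable function. The names escapeRegExp and findKeyword, and the extended sample text, are hypothetical and only serve the illustration.

```javascript
// Hypothetical helper: escape regex special characters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: collect every Group 1 match of the keyword in haystack.
function findKeyword(haystack, keyword) {
  const pattern = `(?:^|\\W)(${escapeRegExp(keyword)})(?!\\w)`;
  const rx = new RegExp(pattern, 'g');
  const hits = [];
  let m;
  while ((m = rx.exec(haystack)) !== null) {
    hits.push(m[1]);
  }
  return hits;
}

const text = 'why hello there, or should I say #hello there? @hello!';
console.log(findKeyword(text, '#hello')); // [ '#hello' ]
console.log(findKeyword(text, '@hello')); // [ '@hello' ]
console.log(findKeyword(text, 'hello').length); // 3
```

Note the last call: because # and @ are themselves non-word characters, a plain keyword such as hello is also found right after them. If that is not wanted, the left-hand (?:^|\\W) would need to be made stricter, for example by requiring whitespace instead.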



Answer 2:


Check whether the keyword already begins with a special character (@ or #). If it does, leave the optional [@#] prefix out of the regular expression.

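var re;
if ("#@".indexOf(keyword[0]) == -1) {
    // Keyword starts with a word character: allow an optional @ or # prefix.
    // Backslashes are doubled because "\b" in a template literal is a backspace.
    re = new RegExp(`[@#]?\\b${keyword}\\b`);
} else {
    // Keyword already starts with @ or #: no leading \b here, since \b cannot
    // match between whitespace and a non-word character such as #.
    re = new RegExp(`${keyword}\\b`);
}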
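
To try this branching idea against the sample string from the question, it can be sketched as a function. This is an illustrative variant rather than the answer's exact code: the name buildKeywordRegex is hypothetical, the backslashes are doubled (a single \b in a template literal is a backspace), the leading \b is omitted when the keyword starts with @ or #, and the keyword is escaped before interpolation.

```javascript
// Hypothetical helper: build a regex for a keyword that may start with @ or #.
function buildKeywordRegex(keyword) {
  const escaped = keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
  if ('#@'.indexOf(keyword[0]) === -1) {
    // Plain keyword: optional @ or # prefix, then real word boundaries.
    return new RegExp(`[@#]?\\b${escaped}\\b`);
  }
  // Keyword already carries its prefix: only a trailing boundary is needed.
  return new RegExp(`${escaped}\\b`);
}

const userStr = 'why hello there, or should I say #hello there?';
const hashMatch = buildKeywordRegex('#hello').exec(userStr);
const plainMatch = buildKeywordRegex('hello').exec(userStr);

console.log(hashMatch && hashMatch[0]);   // '#hello'
console.log(plainMatch && plainMatch[0]); // 'hello'
```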


Source: https://stackoverflow.com/questions/46942341/javascript-regex-word-boundary-with-optional-non-word-character
