How to ban words with diacritics using a blacklist array and regex?

醉话见心 2021-01-11 19:16

I have an input of type text where I return true or false depending on a list of banned words. Everything works fine. My problem is that I don't know how to check against w

5 answers

清歌不尽 (original poster)

2021-01-11 19:42
You need a Unicode-aware word boundary. The easiest way is to use the XRegExp package.

Although its \b is still ASCII-based, it provides a \p{L} construct (or the shorter \pL) that matches any Unicode letter from the BMP (Basic Multilingual Plane). Building a custom word boundary from this construct is easy:
```
\b                     word            \b
  ---------------------------------------
 |                       |               |
([^\pL0-9_]|^)         word       (?=[^\pL0-9_]|$)
```
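The leading word boundary can be represented with a capturing or non-capturing group, ([^\pL0-9_]|^), that matches (and consumes) either the start of the string or one character before the word that is not a BMP Unicode letter, a digit, or _. Because that character is consumed, the match may begin one character before the word itself, which makes no difference for a true/false test.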

The trailing word boundary can be represented with a positive lookahead, (?=[^\pL0-9_]|$), that requires the word to be followed by either the end of the string or a character that is not a BMP Unicode letter, a digit, or _.
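To make the difference concrete, here is a minimal sketch that contrasts the ASCII-based \b with the custom boundary described above. It assumes a JavaScript engine with ES2018 Unicode property escapes and uses the native \p{L} with the u flag in place of XRegExp's \pL; the variable names are only illustrative.
```javascript
// ASCII-based \b: "ă" and "ţ" are not \w characters, so the boundaries
// end up in the wrong places for words with diacritics.
const asciiBoundary = /\bbăţ\b/i;
console.log(asciiBoundary.test("băţ"));  // false: no \b between "ţ" and the end of the string
console.log(asciiBoundary.test("băţy")); // true: a \b is found between "ţ" and "y"

// Custom boundary: start of string or a non-letter/digit/underscore before the word,
// and a lookahead for end of string or a non-letter/digit/underscore after it.
// Assumption: native \p{L} (needs the "u" flag) stands in for XRegExp's \pL.
const unicodeBoundary = /(?:^|[^\p{L}0-9_])băţ(?=$|[^\p{L}0-9_])/iu;
console.log(unicodeBoundary.test("băţ"));         // true: the whole input is the word
console.log(unicodeBoundary.test("băţy"));        // false: "y" is a letter, so the word continues
console.log(unicodeBoundary.test("un băţ lung")); // true: the word is delimited by spaces
```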

The snippet below detects băţ as a banned word and băţy as an allowed word.
```
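// Requires jQuery and XRegExp with its Unicode addons (e.g. the xregexp-all build).
var bannedWords = ["bad", "mad", "testing", "băţ"];
// Escape each word so that any regex metacharacters in the list are matched literally.
var alternatives = bannedWords.map(function (word) { return XRegExp.escape(word); }).join("|");
// Unicode-aware "word boundaries": start/end of string, or a character that is
// not a letter, a digit, or an underscore.
var regex = new XRegExp('(?:^|[^\\pL0-9_])(?:' + alternatives + ')(?=$|[^\\pL0-9_])', 'i');

$(function () {
  // Re-check the input whenever its value changes.
  $("input").on("change", function () {
    var valid = !regex.test(this.value);
    console.log("The word is", valid ? "allowed" : "banned");
  });
});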
```
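If pulling in XRegExp is not an option, roughly the same check can be sketched with the built-in RegExp, assuming the target browsers support ES2018 Unicode property escapes. The helper names escapeRegExp, buildBannedWordRegex and isAllowed are hypothetical, and the extra "c++" entry is only there to show why the words should be escaped.
```javascript
// Escape regex syntax characters so each banned word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive regex with Unicode-aware boundaries around the word list.
// Assumption: native \p{L} with the "u" flag replaces XRegExp's \pL.
function buildBannedWordRegex(words) {
  const alternatives = words.map(escapeRegExp).join("|");
  return new RegExp(
    "(?:^|[^\\p{L}0-9_])(?:" + alternatives + ")(?=$|[^\\p{L}0-9_])",
    "iu"
  );
}

// Returns true when the text contains none of the banned words.
function isAllowed(text, bannedRegex) {
  return !bannedRegex.test(text);
}

const bannedRegex = buildBannedWordRegex(["bad", "mad", "testing", "băţ", "c++"]);
console.log(isAllowed("băţ", bannedRegex));              // false
console.log(isAllowed("băţy", bannedRegex));             // true
console.log(isAllowed("this is bad!", bannedRegex));     // false
console.log(isAllowed("badge", bannedRegex));            // true
console.log(isAllowed("I like c++ a lot", bannedRegex)); // false
```
The regex is built without the g flag on purpose, so repeated test() calls do not carry lastIndex state from one input value to the next.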