How to ban words with diacritics using a blacklist array and regex?

醉话见心 2021-01-11 19:16

I have an input of type text where I return true or false depending on a list of banned words. Everything works fine. My problem is that I don't know how to check against w

5 answers
  •  清歌不尽
    2021-01-11 19:42

    You need a Unicode-aware word boundary. The easiest way to get one is to use the XRegExp package.

    Although its \b is still ASCII-based, it provides a \p{L} construct (or the shorter \pL form) that matches any Unicode letter from the BMP (Basic Multilingual Plane). Building a custom word boundary with this construct is easy:

    \b                     word            \b
      ---------------------------------------
     |                       |               |
    ([^\pL0-9_]|^)         word       (?=[^\pL0-9_]|$)
    

    The leading word boundary can be represented with a capturing or non-capturing group, ([^\pL0-9_]|^), that matches (and consumes) either the start of the string or a single character immediately before the word that is not a Unicode letter from the BMP, a digit, or _.

    The trailing word boundary can be represented with a positive lookahead, (?=[^\pL0-9_]|$), that requires the word to be followed by either the end of the string or a character that is not a Unicode letter from the BMP, a digit, or _.
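
As a rough illustration of why the ASCII \b falls short and how the custom boundaries behave, here is a minimal sketch that swaps XRegExp for JavaScript's native \p{L} escape. It assumes an engine that supports Unicode property escapes with the u flag (ES2018). The helper name unicodeWordRegex is hypothetical, and unlike the BMP-only \pL described above, native \p{L} also covers letters outside the BMP.

```javascript
// Assumption: native RegExp with the `u` flag stands in for XRegExp here.
const bat = "b\u0103\u0163"; // "băţ", written with escapes to pin the precomposed form

// ASCII \b: "ţ" is not a \w character, so there is no boundary between it
// and the end of the string, and the banned word is missed.
const asciiMisses = new RegExp("\\b" + bat + "\\b", "i").test(bat);

// ASCII \b also sees a boundary between "a" and "ţ", so "ba" matches inside "baţ".
const asciiFalseHit = /\bba\b/i.test("ba\u0163");

// Hypothetical helper: wrap a word in the custom Unicode-aware boundaries.
function unicodeWordRegex(word) {
  return new RegExp("(?:^|[^\\p{L}0-9_])" + word + "(?=$|[^\\p{L}0-9_])", "iu");
}

const re = unicodeWordRegex(bat);
console.log(asciiMisses, asciiFalseHit);       // false true
console.log(re.test(bat), re.test(bat + "y")); // true false
```

The leading group consumes the character before the word, which is harmless for a yes/no test like this one but matters if the match is later used for replacement.
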

    See the snippet below, which detects băţ as a banned word and băţy as an allowed word.

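    // Words to reject; the 'i' flag makes matching case-insensitive.
    var bannedWords = ["bad", "mad", "testing", "băţ"];
    // \pL is XRegExp's Unicode-letter token; the groups around the list act as custom word boundaries.
    var regex = new XRegExp('(?:^|[^\\pL0-9_])(?:' + bannedWords.join("|") + ')(?=$|[^\\pL0-9_])', 'i');
    
    $(function () {
      // Re-validate whenever the input's value changes.
      $("input").on("change", function () {
        var valid = !regex.test(this.value);
        console.log("The word is", valid ? "allowed" : "banned");
      });
    });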
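
One caveat with joining the list directly: a banned entry that contains regex metacharacters (for example c++) would be read as pattern syntax rather than as literal text. A minimal sketch of escaping each word first follows, again assuming native \p{L} with the u flag in place of XRegExp. The names escapeRegExp, buildBannedRegex, and isAllowed are hypothetical and not part of XRegExp or jQuery.

```javascript
// Hypothetical helper: escape regex syntax characters so a word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one case-insensitive regex for the whole list,
// using the same custom Unicode-aware boundaries as the answer.
function buildBannedRegex(words) {
  const alternation = words.map(escapeRegExp).join("|");
  return new RegExp(
    "(?:^|[^\\p{L}0-9_])(?:" + alternation + ")(?=$|[^\\p{L}0-9_])",
    "iu"
  );
}

// Assumption: an illustrative list; "b\u0103\u0163" is "băţ".
const bannedRegex = buildBannedRegex(["bad", "c++", "b\u0103\u0163"]);

// Returns true when the value contains no banned word.
function isAllowed(value) {
  return !bannedRegex.test(value);
}

console.log(isAllowed("I like c++")); // false
console.log(isAllowed("badly"));      // true
```

When XRegExp is already loaded, its own XRegExp.escape function can play the role of escapeRegExp.
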
    
    
    
