How to ban words with diacritics using a blacklist array and regex?

醉话见心 2021-01-11 19:16

I have an input of type text where I return true or false depending on a list of banned words. Everything works fine. My problem is that I don't know how to check against w

5 answers
  •  半阙折子戏
    2021-01-11 19:41

    Chiu's comment is right: 'aaáaa'.match(/\b.+?\b/g) yields the rather counter-intuitive [ "aa", "á", "aa" ]. The reason is that a "word character" (\w) in JavaScript regular expressions is just shorthand for [A-Za-z0-9_] (case-insensitive ASCII alphanumerics plus underscore), so a word boundary (\b) matches any position between a run of those characters and any other character, including an accented letter. This makes extracting "Unicode words" quite hard.
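To make the boundary problem concrete, here is a small sketch that can be run in Node or a browser console. The variable names are illustrative only, and the accented letters are written as \u escapes so the result does not depend on how the file is encoded.

```javascript
// \w is ASCII-only: it matches [A-Za-z0-9_] and nothing else.
const asciiWord = /^\w$/;
console.log(asciiWord.test("a"));      // true
console.log(asciiWord.test("\u00E1")); // false ("á" is not a \w character)

// So \b reports a boundary on both sides of "á".
const chunks = "aa\u00E1aa".match(/\b.+?\b/g);
console.log(chunks); // ["aa", "á", "aa"]

// Consequence 1: a banned word that ends in an accented letter never
// matches as a whole word, because there is no \b after its last character.
const word = "b\u0103\u0163"; // "băţ"
const wholeWord = new RegExp("\\b" + word + "\\b", "i");
const wholeWordHit = wholeWord.test(word);
console.log(wholeWordHit); // false

// Consequence 2: a shorter ASCII word matches inside a longer accented word.
const partialHit = /\bba\b/i.test("ba\u0163"); // "baţ"
console.log(partialHit); // true
```

Both failure modes matter for a blacklist: the first lets a banned word through, the second flags a word that is not on the list.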

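    For non-unicase writing systems, a "word character" can be identified by its dual nature: ch.toUpperCase() != ch.toLowerCase(). Note that digits, underscores and letters from caseless scripts fail this test and are therefore treated as separators. Your altered snippet could look like this: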

    var bannedWords = ["bad", "mad", "testing", "băţ", "bať"];
    // Dashes stand in for word boundaries: each banned word must be wrapped in them.
    var bannedWordsRegex = new RegExp('-' + bannedWords.join("-|-") + '-', 'i');
    
    $(function() {
      // Re-check the whole field on every change.
      $("input").on("input", function() {
        var invalid = bannedWordsRegex.test(dashPaddedWords(this.value));
        $('#log').html(invalid ? 'bad' : 'good');
      });
      $("input").trigger("input").focus();
    
      // Replace every non-word character with a dash and pad both ends.
      function dashPaddedWords(str) {
        return '-' + str.replace(/./g, wordCharOrDash) + '-';
      }
    
      function wordCharOrDash(ch) {
        return isWordChar(ch) ? ch : '-';
      }
    
      // A character with distinct upper- and lower-case forms counts as a letter.
      function isWordChar(ch) {
        return ch.toUpperCase() != ch.toLowerCase();
      }
    });
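For trying the idea without jQuery or a page, the same dash-padding check can be pulled into a plain function. This is only a sketch under a few assumptions: `containsBannedWord` and `escapeRegExp` are hypothetical names, and the regex escaping and the `normalize("NFC")` calls are additions that are not in the snippet above (they guard against metacharacters in the word list and against input typed with combining accents).

```javascript
const bannedWords = ["bad", "mad", "testing", "băţ", "bať"];

// Escape regex metacharacters so each banned word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "-(?:w1|w2|...)-": a banned word must sit between two dashes.
const bannedWordsRegex = new RegExp(
  "-(?:" + bannedWords.map(w => escapeRegExp(w.normalize("NFC"))).join("|") + ")-",
  "i"
);

// A character with distinct upper- and lower-case forms counts as a letter.
function isWordChar(ch) {
  return ch.toUpperCase() !== ch.toLowerCase();
}

// Compose accents (NFC), turn every non-letter into "-", and pad both ends.
function dashPaddedWords(str) {
  const chars = Array.from(str.normalize("NFC"), ch => (isWordChar(ch) ? ch : "-"));
  return "-" + chars.join("") + "-";
}

function containsBannedWord(text) {
  return bannedWordsRegex.test(dashPaddedWords(text));
}

console.log(containsBannedWord("this is BAD"));            // true
console.log(containsBannedWord("badminton"));              // false
console.log(containsBannedWord("un b\u0103\u0163, doi"));  // true  ("băţ" as a whole word)
console.log(containsBannedWord("b\u0103\u0163ul"));        // false ("băţul" only contains it)
```

Because digits are not letters under this test, an input such as "bad1" is still reported as banned; whether that is desirable depends on the use case.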
    
    
    
