I have an input of type text where I return true or false depending on a list of banned words. Everything works fine. My problem is that I don't know how to check against w
Chiu's comment is right: 'aaáaa'.match(/\b.+?\b/g) yields the rather counter-intuitive [ "aa", "á", "aa" ], because a "word character" (\w) in JavaScript regular expressions is just shorthand for [A-Za-z0-9_] ('case-insensitive-alpha-numeric-and-underscore'), so a word boundary (\b) matches any position between a run of those ASCII characters and any other character, accented letters included. This makes extracting "Unicode words" quite hard.
For non-unicase writing systems, a "word character" can be identified by its dual nature: ch.toUpperCase() != ch.toLowerCase(). Your altered snippet could then look like this:
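To see the effect described above in isolation, here is a small sketch you can paste into a console. The variable names (`wordChar`, `chunks`) are only illustrative, and the accented letter is written as the escape `\u00e1` so that it is certainly the precomposed "á".

```javascript
// \w only covers [A-Za-z0-9_], so an accented letter is not a "word character".
const wordChar = /^\w$/;
console.log(wordChar.test('a'));      // true
console.log(wordChar.test('\u00e1')); // false ('á')

// Because 'á' is a non-word character, \b matches on both sides of it,
// and the "word" is split into three chunks.
const chunks = 'aa\u00e1aa'.match(/\b.+?\b/g);
console.log(chunks); // [ 'aa', 'á', 'aa' ]
```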
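var bannedWords = ["bad", "mad", "testing", "băţ", "bať"];
// Each banned word must be surrounded by dashes, which stand in for word boundaries.
var bannedWordsRegex = new RegExp('-' + bannedWords.join("-|-") + '-', 'i');

$(function() {
  $("input").on("input", function() {
    var invalid = bannedWordsRegex.test(dashPaddedWords(this.value));
    $('#log').html(invalid ? 'bad' : 'good');
  });
  $("input").trigger("input").focus();

  // Replace every non-word character with a dash and pad both ends,
  // so that "so bad!" becomes "-so-bad--".
  function dashPaddedWords(str) {
    return '-' + str.replace(/./g, wordCharOrDash) + '-';
  }

  function wordCharOrDash(ch) {
    return isWordChar(ch) ? ch : '-';
  }

  // A character counts as a "word character" if it has distinct upper and lower case forms.
  function isWordChar(ch) {
    return ch.toUpperCase() != ch.toLowerCase();
  }
});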
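If you want to try the same idea without jQuery or a page, the following is a minimal sketch of the check as a plain function. `makeBannedWordsTest`, `escapeRegExp` and `hasBannedWord` are hypothetical names introduced here, and the escaping step is an addition that only guards against regex metacharacters in the list.

```javascript
// A character is treated as a "word character" if it has distinct cases.
function isWordChar(ch) {
  return ch.toUpperCase() !== ch.toLowerCase();
}

// Turn every non-word character into a dash and pad both ends.
function dashPaddedWords(str) {
  return '-' + str.replace(/./g, ch => (isWordChar(ch) ? ch : '-')) + '-';
}

// Hypothetical helper: escape regex metacharacters in a banned word.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a predicate from a list of banned words.
function makeBannedWordsTest(bannedWords) {
  const pattern = bannedWords.map(w => '-' + escapeRegExp(w) + '-').join('|');
  const regex = new RegExp(pattern, 'i'); // no "g" flag, so test() keeps no state
  return text => regex.test(dashPaddedWords(text));
}

const hasBannedWord = makeBannedWordsTest(["bad", "mad", "testing", "băţ", "bať"]);

console.log(hasBannedWord("this is băţ")); // true
console.log(hasBannedWord("so BAD!"));     // true
console.log(hasBannedWord("badminton"));   // false
```

Note that with this definition digits, underscores and characters from unicase scripts (for example CJK) have identical upper and lower case forms, so they are treated as separators rather than as parts of a word.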