Filtering a list of strings based on user locale

Submitted by 情到浓时终转凉″ on 2019-12-22 03:44:12

Question


When working on a JavaScript project with AngularJS 1.6, I have a list of strings which I'd like to filter. For instance, assume my list contains árbol, cigüeña, nido and tubo.

When filtering strings in Spanish, if I filtered for "u", I'd expect both cigüeña and tubo to appear, which would be the most natural result for a Spaniard. However, this is not the case in German - u and ü are different letters and thus a German will not want to see cigüeña on the list. So I am looking for a way to make my list filtering aware of the user's locale.

I happen to have an object containing lots of diacritics, such that:

diacritics["á"] = "a";
diacritics["ü"] = "u";
// and so on...

This is what my filtering code looks like:

function matches(word, search) {
    var cleanWord = removeDiacritics(word.toLowerCase());
    var cleanSearch = removeDiacritics(search.toLowerCase());
    return cleanWord.indexOf(cleanSearch) > -1;
}

function removeDiacritics(word) {
    function match(a) {
        return diacritics[a] || a;
    }
    return word.replace(/[^\u0000-\u007E]/g, match);
}

The above code just removes all diacritics, so I thought to make it aware of the user's locale. Thus, I changed the match() function to this:

function match(a) {
    if (diacritics[a] && a.localeCompare(diacritics[a]) === 0) {
        return diacritics[a];
    }
    return a;
}

Unfortunately, this doesn't work. The localeCompare function returns the same values when comparing "u" and "ü" with the German and Spanish locales, so that was not the answer here. I've gone over the reference for the localeCompare method and tried the usage and sensitivity options, but they don't seem to help much here.

How could I tweak my code for this to work? Is there any library which can handle this properly for me?


Answer 1:


I'd start by getting the user's locale directly from the browser via navigator, an object representing the user agent:

var language = navigator.language;

This assigns the locale code of the user's browser to language, in my case en-US. I found this site helpful for finding locale codes to test other regions of the world.
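If the browser reports a locale the runtime has no collation data for, the comparison falls back to a default. A minimal sketch of guarding against that, assuming a hypothetical helper named pickCollationLocale (not part of the original answer) and an arbitrary fallback of en-US:

```javascript
// Hypothetical helper: returns the first of the user's preferred locales
// that Intl.Collator supports, or the given fallback.
function pickCollationLocale(fallback) {
    var preferred = [];
    if (typeof navigator !== 'undefined') {
        // navigator.languages lists preferences in order; navigator.language is the top one.
        preferred = (navigator.languages && navigator.languages.length
            ? Array.prototype.slice.call(navigator.languages)
            : [navigator.language]).filter(Boolean);
    }
    // supportedLocalesOf keeps only the requested tags the runtime can collate.
    var supported = Intl.Collator.supportedLocalesOf(preferred);
    return supported.length > 0 ? supported[0] : fallback;
}

var language = pickCollationLocale('en-US'); // fallback value is an assumption
```

Outside a browser, where navigator may not exist, this simply returns the fallback.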

My strFromLocale function is comparable to your removeDiacritics function:

function strFromLocale(str) {
    function match(letter) {
        // True when the locale's collator treats both letters as the same base letter.
        function letterMatch(letter, normalizedLetter) {
            var result = new Intl.Collator(language, { usage: 'search', sensitivity: 'base' }).compare(letter, normalizedLetter);
            return result === 0;
        }
        // Decompose the letter and strip combining marks, e.g. "ü" becomes "u".
        var normalizedLetter = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, "");
        if (letterMatch(letter, normalizedLetter)) {
            return normalizedLetter;
        } else {
            return letter;
        }
    }
    return str.replace(/[^\u0000-\u007E]/g, match);
}

Note the line with Intl.Collator. It compares the accented letter with its normalized (unaccented) form under the given language's collation rules and reports whether that language orders them differently. Therefore:

/* English */
new Intl.Collator('en-US', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> 0

/* Swedish */
new Intl.Collator('sv', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> -1

/* German */
new Intl.Collator('de', {usage: 'search', sensitivity: 'base' }).compare('u', 'ü');
>>> -1

As you can see, the letterMatch function returns true if and only if the Intl.Collator comparison returns 0. A result of 0 means the language treats the accented letter and its base letter as equivalent for search purposes, so it is safe to replace one with the other.

With that, here are some tests of the strFromLocale function:

var language = navigator.language; // en-US
strFromLocale("cigüeña");
>>> ciguena

var language = 'sv' // Swedish
strFromLocale("cigüeña");
>>> cigüena

var language = 'de' // German
strFromLocale("cigüeña");
>>> cigüena

var language = 'es-mx' // Spanish - Mexico
strFromLocale("cigüeña");
>>> cigueña
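To tie this back to the list filter from the question, here is a minimal sketch that applies the same comparison inside a filter. The names makeFolder and filterByLocale are hypothetical, the collator is built once per call rather than once per letter, and the input is normalized to NFC first on the assumption that accented letters should arrive as single precomposed characters:

```javascript
// Hypothetical helper: returns a function that lowercases a string and strips
// only those diacritics the given locale treats as equivalent to the base letter.
function makeFolder(locale) {
    var collator = new Intl.Collator(locale, { usage: 'search', sensitivity: 'base' });
    return function fold(str) {
        // NFC keeps each accented letter as one character so the regex sees it whole.
        return str.normalize('NFC').toLowerCase().replace(/[^\u0000-\u007E]/g, function (letter) {
            var base = letter.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
            return collator.compare(letter, base) === 0 ? base : letter;
        });
    };
}

// Hypothetical helper: keeps the words that contain the search text after folding.
function filterByLocale(words, search, locale) {
    var fold = makeFolder(locale);
    var needle = fold(search);
    return words.filter(function (word) {
        return fold(word).indexOf(needle) > -1;
    });
}

var words = ['árbol', 'cigüeña', 'nido', 'tubo'];
var spanishHits = filterByLocale(words, 'u', 'es');
var germanHits = filterByLocale(words, 'u', 'de');
```

Given the comparison results shown above, spanishHits should contain cigüeña and tubo, while germanHits should contain only tubo. The exact outcome depends on the collation data shipped with the browser or runtime.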



Answer 2:


You are probably looking for the ECMAScript Internationalization API (the Intl object, specified in ECMA-402). It lets you adjust sort order based on locale, e.g.:

// in German, ä sorts with a
console.log(new Intl.Collator('de').compare('ä', 'z'));
// → a negative value

// in Swedish, ä sorts after z
console.log(new Intl.Collator('sv').compare('ä', 'z'));
// → a positive value

With the sensitivity: 'base' option, strings that differ only in accents or case compare as equal, and which letters count as mere accent variants is decided by the locale:

// in German, ä has a as the base letter
console.log(new Intl.Collator('de', { sensitivity: 'base' }).compare('ä', 'a'));
// → 0

// in Swedish, ä and a are separate base letters
console.log(new Intl.Collator('sv', { sensitivity: 'base' }).compare('ä', 'a'));
// → a positive value

You can then sort your list into the correct order prior to populating your UI Widget.
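A minimal sketch of that sorting step, using the word list from the question and assuming a Spanish-speaking user (the locale 'es' is hard-coded here purely for illustration):

```javascript
var words = ['tubo', 'nido', 'cigüeña', 'árbol'];

// Plain sort() compares UTF-16 code units, so "á" lands after "t".
var codeUnitOrder = words.slice().sort();

// collator.compare is already bound to the collator, so it can be passed directly.
var collator = new Intl.Collator('es');
var localeOrder = words.slice().sort(collator.compare);

console.log(codeUnitOrder); // [ 'cigüeña', 'nido', 'tubo', 'árbol' ]
console.log(localeOrder);   // [ 'árbol', 'cigüeña', 'nido', 'tubo' ]
```

Both calls sort a copy made with slice(), so the original array keeps its order.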



Source: https://stackoverflow.com/questions/47329596/filtering-a-list-of-strings-based-on-user-locale
