Efficiently replace all accented characters in a string?

别跟我提以往 2020-11-22 04:35

For a poor man's implementation of near-collation-correct sorting on the client side, I need a JavaScript function that does efficient single character rep

21 answers
  •  小蘑菇 (original poster)
    2020-11-22 05:11

    https://stackoverflow.com/a/37511463

    With ES2015/ES6 String.prototype.normalize():

    const str = "Crème Brulée"
    str.normalize('NFD').replace(/[\u0300-\u036f]/g, "")
    > 'Creme Brulee'
    

    Two things are happening here:

    1. Calling normalize('NFD') converts the string to the NFD Unicode normal form, which decomposes precomposed characters into a base letter followed by combining marks. The è of Crème ends up expressed as e + U+0300 (combining grave accent).
    2. A regex character class matching the range U+0300 to U+036F then removes the diacritics globally. The Unicode standard conveniently groups these code points as the Combining Diacritical Marks block.
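
To make the two steps concrete, here is a minimal sketch that wraps them in a function and inspects the decomposed form. The name stripDiacritics is a hypothetical helper chosen for illustration and does not come from the answer itself.

```javascript
// Hypothetical helper (the name is an assumption, not from the answer).
function stripDiacritics(str) {
  // Step 1: NFD splits precomposed letters into base letter + combining mark(s).
  const decomposed = str.normalize('NFD');
  // Step 2: drop every code point in the Combining Diacritical Marks block.
  return decomposed.replace(/[\u0300-\u036f]/g, '');
}

// U+00E8 (è) becomes two code points after NFD: 'e' and U+0300.
const decomposed = '\u00e8'.normalize('NFD');
console.log(decomposed.length); // 2
console.log([...decomposed].map(ch => ch.codePointAt(0).toString(16))); // [ '65', '300' ]

console.log(stripDiacritics('Crème Brulée')); // 'Creme Brulee'
```

Letters that have no canonical decomposition, such as ø or ł, pass through unchanged, so this approach only covers characters that Unicode defines as base letter plus combining mark.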

    See comment for performance testing.

    Alternatively, if you just want sorting:

    Intl.Collator has sufficient browser support (~85% at the time of writing). A polyfill is also available, but I haven't tested it.

    const c = new Intl.Collator();
    ['creme brulee', 'crème brulée', 'crame brulai', 'crome brouillé',
    'creme brulay', 'creme brulfé', 'creme bruléa'].sort(c.compare)
    [ 'crame brulai','creme brulay','creme bruléa','creme brulee',
    'crème brulée','creme brulfé','crome brouillé' ]
    
    
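    // For comparison, a plain code-unit comparison (the comparator must return a number, not a boolean):
    ['creme brulee', 'crème brulée', 'crame brulai', 'crome brouillé'].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0))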
    ["crame brulai", "creme brulee", "crome brouillé", "crème brulée"]
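
The two approaches can be put side by side in a short sketch. The helper names strip and byStrippedKey are hypothetical, and pinning the collator to the 'en' locale and using the sensitivity: 'base' option are assumptions made for a reproducible example; the answer itself uses the default locale and default options.

```javascript
const words = ['creme brulee', 'crème brulée', 'crame brulai', 'crome brouillé'];

// Hypothetical helper: remove combining diacritics after NFD decomposition.
const strip = s => s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

// Hypothetical comparator: order strings by their diacritic-free form.
function byStrippedKey(a, b) {
  const ka = strip(a);
  const kb = strip(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

// Copy before sorting so the input array is left untouched.
const sortedByKey = [...words].sort(byStrippedKey);
console.log(sortedByKey);

// Locale-aware ordering; 'en' is an assumption for this example.
const sortedByCollator = [...words].sort(new Intl.Collator('en').compare);
console.log(sortedByCollator);

// With sensitivity 'base', accented and unaccented letters compare as equal.
const baseCollator = new Intl.Collator('en', { sensitivity: 'base' });
console.log(baseCollator.compare('creme brulee', 'crème brulée')); // 0
```

The stripped-key comparator treats 'creme brulee' and 'crème brulée' as ties and relies on sort stability to keep their input order, whereas the collator applies the locale's own tie-breaking rules for accents.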
    
