Remove accents/diacritics in a string in JavaScript

轻奢々 2020-11-21 13:29

How do I remove accented characters from a string? Especially in IE6, I had something like this:

accentsTidy = function(s){
    var r=s.toLowerCase();
           


        
29 answers
  •  南旧
    南旧 (original poster)
    2020-11-21 14:03

    Thanks to all.
    I use this version, and I explain why (I missed such explanations at the beginning, so I am trying to help the next reader if he is as slow as I was).

    Remark : I wanted an efficient solution, so :

    • only one regexp compilation (if needed)
    • only one string scan for each string
    • an efficient way to find the translated characters etc ...

    My version is :
    (there is no new technical trick inside it, only some selected ones + explanations why)

    var makeSortString = (function() {
        // Character class matching every key of the translate table below.
        var translate_re = /[¹²³áàâãäåÀÁÂÃÄÅÆç©ÇÐèéêëÈÉÊË€ìíîïÌÍÎÏñÑòóôõöøÒÓÔÕÖØŒ®šßŠùúûüÙÚÛÜýÿÝŸžŽ]/g;
        // Each matched character maps to its unaccented lowercase equivalent.
        var translate = {
    "¹":"1","²":"2","³":"3",
    "á":"a","à":"a","â":"a","ã":"a","ä":"a","å":"a","À":"a","Á":"a","Â":"a","Ã":"a","Ä":"a","Å":"a","Æ":"a",
    "ç":"c","©":"c","Ç":"c","Ð":"d",
    "è":"e","é":"e","ê":"e","ë":"e","È":"e","É":"e","Ê":"e","Ë":"e","€":"e",
    "ì":"i","í":"i","î":"i","ï":"i","Ì":"i","Í":"i","Î":"i","Ï":"i",
    "ñ":"n","Ñ":"n",
    "ò":"o","ó":"o","ô":"o","õ":"o","ö":"o","ø":"o","Ò":"o","Ó":"o","Ô":"o","Õ":"o","Ö":"o","Ø":"o","Œ":"o",
    "®":"r","š":"s","ß":"s","Š":"s",
    "ù":"u","ú":"u","û":"u","ü":"u","Ù":"u","Ú":"u","Û":"u","Ü":"u",
    "ý":"y","ÿ":"y","Ý":"y","Ÿ":"y","ž":"z","Ž":"z"
        };
        return function(s) {
            // One regexp pass translates the accented characters; toLowerCase() then
            // lowercases the remaining plain letters so the whole result is lowercase.
            return s.replace(translate_re, function(match){ return translate[match]; }).toLowerCase();
        };
    })();
    

    and I use it this way :

    var without_accents = makeSortString("wïthêüÄTrèsBïgüeAk100t");
    // I let you guess the result,
    // no I was kidding you : I give you the result : witheuatresbigueak100t
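
The name makeSortString suggests the result is meant to be used as a sort key. The source does not show that step, so the following is only a minimal sketch under that assumption. toSortKey and sortIgnoringAccents are hypothetical names, and the table is deliberately tiny (lowercase French letters only), which is why this stand-in lowercases before translating.

```javascript
// Hypothetical stand-in for makeSortString, with a deliberately small table.
var toSortKey = (function () {
    var re = /[àâäéèêëîïôöùûüç]/g;
    var map = {
        "à": "a", "â": "a", "ä": "a",
        "é": "e", "è": "e", "ê": "e", "ë": "e",
        "î": "i", "ï": "i", "ô": "o", "ö": "o",
        "ù": "u", "û": "u", "ü": "u", "ç": "c"
    };
    return function (s) {
        // Lowercase first so that the lowercase-only table also covers "É", "À", ...
        return s.toLowerCase().replace(re, function (m) { return map[m]; });
    };
})();

// Sort a copy of the array by the accent-free key; the original spelling is kept.
function sortIgnoringAccents(words) {
    return words.slice().sort(function (a, b) {
        var ka = toSortKey(a), kb = toSortKey(b);
        return ka < kb ? -1 : ka > kb ? 1 : 0;
    });
}

var sorted = sortIgnoringAccents(["zèbre", "Étape", "eau", "abricot"]);
// sorted is ["abricot", "eau", "Étape", "zèbre"]
```

The key is recomputed on every comparison here. For long lists it may be worth computing each key once and sorting pairs of key and word instead.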
    

    Comments :

    • The outer function(){...}() is executed only once, when makeSortString is defined (afterwards, makeSortString != undefined).
    • The inner function(s){...} is stored once in makeSortString, so the "big" translate_re and translate objects are created once and kept alive by the closure.
    • When you call makeSortString('something'), it directly calls the inner function, whose work is essentially a single s.replace(...) call, so it is efficient.
    • s.replace uses the regexp. The literal syntax var translate_re = /[...]/g is in fact equivalent to var translate_re = new RegExp("[¹...]","g"), but the regexp is compiled once and for all, and the string s is scanned once per call of the function (not once for every character of the table, as it would be in a loop).
    • For each character matched, s.replace calls function(match), where the parameter match contains the matched character, and the callback returns the corresponding translated character (translate[match]).
    • translate[match] is probably efficient too: a JavaScript object is typically implemented as a hash table or something equivalent, so the program finds the translated character almost directly rather than, for instance, looping over an array of all characters to find the right one (which would be awfully inefficient).
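
The points above can be checked with a small experiment. This is only a sketch: stripAccents, setupRuns and callbackRuns are hypothetical names introduced for illustration, and the table is cut down to six characters.

```javascript
var setupRuns = 0;     // how many times the outer function body ran
var callbackRuns = 0;  // how many times the replace callback ran

var stripAccents = (function () {
    setupRuns++;                      // runs once, when this expression is evaluated
    var re = /[éèêàùç]/g;             // regexp literal, created once
    var map = { "é": "e", "è": "e", "ê": "e", "à": "a", "ù": "u", "ç": "c" };
    return function (s) {
        return s.replace(re, function (match) {
            callbackRuns++;           // called only for characters the regexp matched
            return map[match];        // direct property lookup, no loop over the table
        });
    };
})();

var first = stripAccents("déjà vu");   // two accented characters
var second = stripAccents("garçon");   // one accented character

// The literal /[éèêàùç]/g and the constructor form describe the same pattern.
var viaConstructor = new RegExp("[éèêàùç]", "g");
var third = "déjà vu".replace(viaConstructor, function (m) { return "_"; });
```

After these calls setupRuns is 1 and callbackRuns is 3: the setup ran once although the function was called twice, and the callback ran only for the three accented characters, not for every character of the input.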
