accent insensitive regex

刺人心 2020-12-19 03:24

My code:

jQuery.fn.extend({
 highlight: function(search){
  var regex = new RegExp('(<[^>]*>)|('+ search.replace(/[.+]i/,"$0") +')','ig');

         


        
2 Answers
  • 2020-12-19 03:45

    You need to come up with a table of alternative characters and dynamically generate a regex based on that. For example:

    var alt = {
      'c': '[cCç]',
      'a': '[aAãÃá]',
      /* etc. */
    };
    
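    // Goes inside jQuery.fn.extend({ ... }), as in the question.
    highlight: function (search) {
      var pattern = '';
      for (var i = 0; i < search.length; i++) {
        var ch = search[i];
        // Use the character class when one is defined, else the literal character.
        if (alt.hasOwnProperty(ch))
          pattern += alt[ch];
        else
          pattern += ch;
      }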
    
      ...
    }
    

    Then for search = 'cao' this will generate a pattern [cCç][aAãÃá]o.
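
As a rough, self-contained sketch of the same idea: the `alt` table below is copied from the answer, while `escapeRegex` and `buildPattern` are hypothetical helper names that are not part of the original code. The escaping step is an added assumption, so that characters such as `.` or `+` in the search term are matched literally.

```javascript
// Table of alternatives, as in the answer; extend it for the characters you need.
var alt = {
  'c': '[cCç]',
  'a': '[aAãÃá]'
};

// Hypothetical helper: escape regex metacharacters so they match literally.
function escapeRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the pattern string one character at a time.
function buildPattern(search) {
  var pattern = '';
  for (var i = 0; i < search.length; i++) {
    var ch = search[i];
    if (Object.prototype.hasOwnProperty.call(alt, ch)) {
      pattern += alt[ch];          // known character: use its class
    } else {
      pattern += escapeRegex(ch);  // anything else: match it literally
    }
  }
  return pattern;
}

// Compile with the same flags the question uses.
var regex = new RegExp(buildPattern('cao'), 'ig');
```

The resulting `regex` could then be combined with the tag-skipping alternative from the question before being passed to `replace`.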

  • 2020-12-19 03:56

    The sole correct way to do this is to first run it through Unicode Normalization Form D, canonical decomposition.

    You then strip out any Marks that result (\pM characters, or perhaps \p{Diacritic}, depending), and run your match against the de/un-marked version.
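
In JavaScript, the two steps might look like the sketch below. It assumes an engine that supports `String.prototype.normalize` and Unicode property escapes (written `\p{M}` with the `u` flag, rather than the Perl-style `\pM`); `stripMarks` and `containsIgnoringAccents` are hypothetical names chosen for illustration.

```javascript
// Decompose to NFD, then remove every combining mark (general category M).
// Assumes support for normalize() and \p{...} with the "u" flag.
function stripMarks(text) {
  return text.normalize('NFD').replace(/\p{M}/gu, '');
}

// Hypothetical helper: compare the mark-free, lower-cased versions of both strings.
function containsIgnoringAccents(haystack, needle) {
  return stripMarks(haystack).toLowerCase()
    .includes(stripMarks(needle).toLowerCase());
}
```

For highlighting, keep in mind that positions found in the stripped string are not guaranteed to line up with positions in the original text, so a real implementation would probably need to map matches back.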

    Do not under any circumstances hardcode a bunch of literals. Eek!

    Boa sorte!
