Concrete Javascript Regex for Accented Characters (Diacritics)

庸人自扰 2020-11-22 17:22

I've looked on Stack Overflow (replacing characters.. eh, how JavaScript doesn't follow the Unicode standard concerning RegExp, etc.) and haven't really found a concrete

9 Answers
  •  广开言路
    2020-11-22 17:49

    /^[\pL\pM\p{Zs}.-]+$/u
    

    Explanation:

    • \pL - matches any kind of letter from any language
    • \pM - matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
    • \p{Zs} - matches a whitespace character that is invisible, but does take up space
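    • u - Pattern and subject strings are treated as UTF-8 (this is the PCRE/PHP meaning; in JavaScript the u flag enables Unicode mode, which property escapes require)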
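For comparison, here is a minimal TypeScript sketch of the same pattern in native syntax. It assumes an engine that implements ES2018 Unicode property escapes (and, for TypeScript, an ES2018 or later compile target). JavaScript only accepts the braced forms \p{L}, \p{M}, \p{Zs}, and only with the u flag. The sample strings are illustrative. They show why \p{M} belongs in the class: in decomposed (NFD) form, an accent is a separate combining mark rather than part of the letter.

```typescript
// Native ES2018+ equivalent of the PCRE-style pattern above.
// JavaScript needs the braced form \p{L}, \p{M}, \p{Zs} plus the u flag.
const nameLike = /^[\p{L}\p{M}\p{Zs}.-]+$/u;

// "José" in decomposed (NFD) form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
const decomposed = "Jose\u0301";

// A letters-only class, to show what happens without \p{M}.
const lettersOnly = /^\p{L}+$/u;

console.log(nameLike.test(decomposed));            // true
console.log(lettersOnly.test(decomposed));         // false: U+0301 is a mark, not a letter
console.log(nameLike.test("Mary-Jane St. Clair")); // true: letters, hyphen, dot, space
console.log(nameLike.test("R2-D2"));               // false: digits are not \p{L}
```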

    Unlike other proposed regexes (such as [A-Za-zÀ-ÖØ-öø-ÿ]), this one works with language-specific characters from any script: e.g. Šš is matched by this rule but not by the other patterns on this page.
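The comparison can be checked with a short TypeScript sketch under the same ES2018 assumption. The word list and the compare helper are hypothetical and exist only for illustration. The hand-written range stops at U+00FF, so anything past Latin-1 (Š is U+0160, Ł is U+0141) falls outside it.

```typescript
// Hand-written Latin-1 range: U+00C0–U+00FF minus × (U+00D7) and ÷ (U+00F7).
const latin1Range = /^[A-Za-zÀ-ÖØ-öø-ÿ]+$/;
// Unicode property classes: any letter or combining mark, in any script.
const unicodeClass = /^[\p{L}\p{M}]+$/u;

// Hypothetical helper: [matched by the range?, matched by the property class?]
function compare(word: string): [boolean, boolean] {
  return [latin1Range.test(word), unicodeClass.test(word)];
}

for (const word of ["Šš", "Łódź", "Ærø", "Δελτα"]) {
  const [byRange, byProperty] = compare(word);
  console.log(`${word}: range=${byRange}, property=${byProperty}`);
}
// Šš: range=false, property=true
// Łódź: range=false, property=true
// Ærø: range=true, property=true
// Δελτα: range=false, property=true
```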

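    Unfortunately, JavaScript engines older than ES2018 do not support these classes natively. ES2018 added Unicode property escapes, but only in the braced form (\p{L}, \p{M}, \p{Zs}) and only with the u flag; the shorthand \pL used above is a syntax error under the u flag. For older environments you can use XRegExp, e.g.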

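    // XRegExp adds \pL / \pM support on top of the native RegExp engine.
    const XRegExp = require('xregexp');
    
    // True for exactly two words (letters, combining marks, hyphens) separated
    // by a single space, e.g. "Jean-Luc Picard". Backslashes are doubled
    // because the pattern is written as a string literal.
    const isInputRealHumanName = (input: string): boolean => {
      return XRegExp('^[\\pL\\pM-]+ [\\pL\\pM-]+$', 'u').test(input);
    };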
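Where ES2018 property escapes are available, the XRegExp dependency can likely be dropped. The following is a sketch rather than a guaranteed drop-in replacement: the braced escapes stand in for \pL and \pM, the backslashes no longer need doubling in a regex literal, and the sample names are made up for illustration.

```typescript
// Hypothetical native rewrite of isInputRealHumanName (no XRegExp dependency).
// Assumes an ES2018+ runtime (e.g. Node 10+ or a current browser).
const humanNamePattern = /^[\p{L}\p{M}-]+ [\p{L}\p{M}-]+$/u;

const isInputRealHumanName = (input: string): boolean =>
  humanNamePattern.test(input);

console.log(isInputRealHumanName("Šárka Nováková"));  // true
console.log(isInputRealHumanName("Jean-Luc Picard")); // true
console.log(isInputRealHumanName("Madonna"));         // false: only one word
console.log(isInputRealHumanName("John  Smith"));     // false: two spaces
```

Like the XRegExp version, this requires exactly one space between two words, so single-word names and names with three or more parts are rejected. Whether that fits depends on the form it validates.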
    
    
