JavaScript Unicode normalization

执念已碎 2020-12-16 14:33

I'm under the impression that the JavaScript interpreter assumes that the source code it is interpreting has already been normalized. What, exactly, does the normalizing? It can

4 Answers
  • 2020-12-16 14:46

    ECMAScript 6 introduces String.prototype.normalize() which takes care of Unicode normalization for you.

    unorm is a JavaScript polyfill for this method. When this answer was written, no engine supported String.prototype.normalize() natively, so the polyfill was the only way to use it. Current browsers and Node.js ship the method natively.

    For more information on how and when to use Unicode normalization in JavaScript, see JavaScript has a Unicode problem – Accounting for lookalikes.
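
    As a minimal sketch of what the method does, assuming an engine with native String.prototype.normalize(), the example below runs the same text through the canonical forms (NFC, NFD) and one compatibility form (NFKC). The variable names are illustrative only.

```javascript
// Two encodings of the same visible text.
var composed = 'caf\u00E9';     // "é" as one code point, U+00E9
var decomposed = 'cafe\u0301';  // "e" followed by U+0301 COMBINING ACUTE ACCENT

var nfc = decomposed.normalize('NFC');  // canonical composition
var nfd = composed.normalize('NFD');    // canonical decomposition

// Compatibility forms also fold characters such as the "fi" ligature, U+FB01.
var ligature = '\uFB01';
var nfkc = ligature.normalize('NFKC');

console.log(nfc === composed);                        // true
console.log(nfd === decomposed);                      // true
console.log(nfc.length, nfd.length);                  // 4 5
console.log(nfkc);                                    // fi
console.log(ligature.normalize('NFC') === ligature);  // true
```

    Calling normalize() with no argument uses NFC, which is usually the form to pick when the goal is to compare or store user-visible text.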

  • 2020-12-16 14:52

    If you're using Node.js, there is the unorm library for this.

    https://github.com/walling/unorm
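
    One way to combine a library like this with native support is to feature-detect first. The sketch below is an assumption, not part of unorm: toNFC is a hypothetical helper, and its fallback parameter stands for whatever NFC function a polyfill library provides.

```javascript
// Hypothetical helper: prefer the native method, otherwise use a
// caller-supplied fallback (for example, an NFC function from a polyfill).
function toNFC(str, fallback) {
  if (typeof String.prototype.normalize === 'function') {
    return str.normalize('NFC');
  }
  if (typeof fallback === 'function') {
    return fallback(str);
  }
  throw new Error('No Unicode normalization implementation available');
}

// On an engine with native support this logs true.
console.log(toNFC('cafe\u0301') === 'caf\u00E9');
```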

  • 2020-12-16 14:53

    No, there is no Unicode Normalization feature used automatically on—or even available to—JavaScript as per ECMAScript 5. All characters remain unchanged as their original code points, potentially in a non-Normal Form.

    For example, try:

    <script type="text/javascript">
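        // Escapes keep the two forms distinct even if this file gets normalized.
        var a = 'caf\u00E9';          // precomposed: é is the single code point U+00E9
        var b = 'cafe\u0301';         // decomposed: e followed by U+0301 (combining acute accent)
        alert(a + ' ' + a.length);    // café 4
        alert(b + ' ' + b.length);    // café 5
        alert(a == b);                // false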
    </script>
    

    Update: ECMAScript 6 introduced Unicode normalization for JavaScript strings via String.prototype.normalize().
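
    To see why the two strings differ even though they render identically, it can help to list their code points. This is a small sketch assuming an ES2017-capable engine (for padStart); codePoints is a hypothetical helper name.

```javascript
// List the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str).map(function (ch) {
    var hex = ch.codePointAt(0).toString(16).toUpperCase();
    return 'U+' + hex.padStart(4, '0');
  });
}

var precomposed = 'caf\u00E9';
var combining = 'cafe\u0301';

console.log(codePoints(precomposed).join(' ')); // U+0063 U+0061 U+0066 U+00E9
console.log(codePoints(combining).join(' '));   // U+0063 U+0061 U+0066 U+0065 U+0301
```

    The second string carries an extra code point for the combining accent, which is what makes its length 5 and the plain comparison fail.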

  • 2020-12-16 14:54

    I've updated @bobince's answer:

    var cafe4= 'caf\u00E9';
    var cafe5= 'cafe\u0301';
    
    
    console.log(
      cafe4+' '+cafe4.length,                  // café 4
      cafe5+' '+cafe5.length,                  // café 5
      cafe4 === cafe5,                         // false
      cafe4.normalize() === cafe5.normalize()  // true
    );
    