JavaScript Unicode normalization


I'm under the impression that the JavaScript interpreter assumes that the source code it is interpreting has already been normalized. What, exactly, does the normalizing? It can

4 answers

离开以前

2020-12-16 14:46

ECMAScript 6 introduces String.prototype.normalize() which takes care of Unicode normalization for you.
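As a rough sketch of what the method does, the snippet below converts the same word between the composed (NFC) and decomposed (NFD) forms. The variable names are only illustrative; calling `normalize()` with no argument defaults to NFC.

```javascript
// Two canonically equivalent spellings of "café".
const composed = 'caf\u00E9';     // é as a single code point (U+00E9)
const decomposed = 'cafe\u0301';  // e followed by a combining acute accent (U+0301)

// NFC composes where possible; NFD decomposes.
const nfc = decomposed.normalize('NFC');
const nfd = composed.normalize('NFD');

console.log(nfc.length, nfc === composed);         // 4 true
console.log(nfd.length, nfd === decomposed);       // 5 true
console.log(decomposed.normalize() === composed);  // true (default form is NFC)
```

Which form you pick matters less than picking one and applying it consistently before comparing or storing strings.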

unorm is a JavaScript polyfill for this method, so you can use String.prototype.normalize() even in older engines that do not support it natively. Current browsers and Node.js releases ship the method natively.

For more information on how and when to use Unicode normalization in JavaScript, see JavaScript has a Unicode problem – Accounting for lookalikes.

执笔经年

2020-12-16 14:52

If you're using Node.js, there is a unorm library for this.

https://github.com/walling/unorm
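One way to combine a library like this with native support is to feature-detect first and only fall back when needed. The sketch below is an assumption-laden illustration: `makeNfcNormalizer` and its `fallbackNfc` parameter are hypothetical names, and the fallback is passed in by the caller rather than imported, so check the library's README for the function it actually exports.

```javascript
// Returns a function that normalizes a string to NFC.
// Prefers the native String.prototype.normalize when it exists.
// `fallbackNfc` is a hypothetical parameter: a polyfill function the caller
// supplies for engines that lack native support.
function makeNfcNormalizer(fallbackNfc) {
  if (typeof String.prototype.normalize === 'function') {
    return function (str) {
      return str.normalize('NFC');
    };
  }
  if (typeof fallbackNfc === 'function') {
    return fallbackNfc;
  }
  throw new Error('No Unicode normalization implementation available');
}

// On a modern engine no fallback is needed.
const toNfc = makeNfcNormalizer();
console.log(toNfc('cafe\u0301') === 'caf\u00E9');
```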

逝去的感伤

2020-12-16 14:53
No, there is no Unicode Normalization feature used automatically on—or even available to—JavaScript as per ECMAScript 5. All characters remain unchanged as their original code points, potentially in a non-Normal Form.

For example, try:
```
<script type="text/javascript">
    var a= 'caf\u00E9';     // café, precomposed é (U+00E9)
    var b= 'cafe\u0301';    // café, e + combining acute accent (U+0301)
    alert(a+' '+a.length);  // café 4
    alert(b+' '+b.length);  // café 5
    alert(a==b);            // false
</script>
```

Update: ECMAScript 6 introduced Unicode normalization for JavaScript strings through String.prototype.normalize().

孤街浪徒

2020-12-16 14:54

I've updated @bobince's answer:

```
var cafe4= 'caf\u00E9';
var cafe5= 'cafe\u0301';

console.log(
  cafe4+' '+cafe4.length,                  // café 4
  cafe5+' '+cafe5.length,                  // café 5
  cafe4 === cafe5,                         // false
  cafe4.normalize() === cafe5.normalize()  // true
);
```
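
The same idea can be wrapped in small helpers, sketched here under the assumption that a native `normalize()` is available. `canonicallyEqual` and `dedupeNormalized` are hypothetical names used only for illustration.

```javascript
// Compare two strings under canonical equivalence by normalizing both to NFC.
function canonicallyEqual(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

// Remove entries that differ only in normalization form.
// The returned strings are in NFC, in first-seen order.
function dedupeNormalized(words) {
  const seen = new Set();
  const result = [];
  for (const word of words) {
    const key = word.normalize('NFC');
    if (!seen.has(key)) {
      seen.add(key);
      result.push(key);
    }
  }
  return result;
}

const unique = dedupeNormalized(['caf\u00E9', 'cafe\u0301', 'cafe']);
console.log(canonicallyEqual('caf\u00E9', 'cafe\u0301')); // true
console.log(unique.length);                               // 2
```

Normalizing once at the boundary (on input or before storage) avoids repeating the work on every comparison.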
