I'm under the impression that the JavaScript interpreter assumes that the source code it is interpreting has already been normalized. What, exactly, does the normalizing? It can
ECMAScript 6 introduces String.prototype.normalize(), which takes care of Unicode normalization for you.
unorm is a JavaScript polyfill for this method, so you could already use String.prototype.normalize() even though no engine supported it natively at the time of writing.
For more information on how and when to use Unicode normalization in JavaScript, see JavaScript has a Unicode problem – Accounting for lookalikes.
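As a minimal sketch of what normalize() does, assuming an engine (or polyfill) that implements it: the form argument selects composition ('NFC', the default) or decomposition ('NFD'). The variable names below are illustrative only.

```javascript
// Two encodings of "café": precomposed é (U+00E9) vs. e + combining acute (U+0301).
const precomposed = 'caf\u00E9';
const decomposed = 'cafe\u0301';

// NFC composes base + combining mark into one code point; NFD splits it apart.
const nfc = decomposed.normalize('NFC');
const nfd = precomposed.normalize('NFD');

console.log(nfc.length);          // 4
console.log(nfd.length);          // 5
console.log(nfc === precomposed); // true
console.log(nfd === decomposed);  // true
```

Which form to pick matters less than applying the same form to both sides before comparing.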
If you're using Node.js, there is a unorm library for this:
https://github.com/walling/unorm
No, there is no Unicode Normalization feature used automatically on—or even available to—JavaScript as per ECMAScript 5. All characters remain unchanged as their original code points, potentially in a non-Normal Form.
For example, try:
<script type="text/javascript">
var a= 'café'; // caf\u00E9
var b= 'café'; // cafe\u0301
alert(a+' '+a.length); // café 4
alert(b+' '+b.length); // café 5
alert(a==b); // false
</script>
Update: ECMAScript 6 will introduce Unicode normalization for JavaScript strings.
I've updated @bobince's answer:
var cafe4= 'caf\u00E9';
var cafe5= 'cafe\u0301';
console.log(
cafe4+' '+cafe4.length, // café 4
cafe5+' '+cafe5.length, // café 5
cafe4 === cafe5, // false
cafe4.normalize() === cafe5.normalize() // true
);
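If you compare strings like this often, the pattern can be wrapped in a small helper. This is only a sketch: canonicallyEqual and compatibilityEqual are hypothetical names, not part of any standard API, and it assumes String.prototype.normalize() is available. The second helper uses the compatibility form 'NFKC', which also folds characters such as the "fi" ligature (U+FB01) into plain letters.

```javascript
// Hypothetical helper: true if both strings are canonically equivalent.
function canonicallyEqual(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

// Hypothetical helper: looser check that also folds compatibility characters.
function compatibilityEqual(a, b) {
  return a.normalize('NFKC') === b.normalize('NFKC');
}

console.log(canonicallyEqual('caf\u00E9', 'cafe\u0301')); // true
console.log(canonicallyEqual('\uFB01', 'fi'));            // false
console.log(compatibilityEqual('\uFB01', 'fi'));          // true
```

Canonical comparison is the safer default; the compatibility forms discard distinctions that may matter for display.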