Splitting a JavaScript string into "characters" can be done trivially, but there are problems if you care about Unicode (and you should care about Unicode).
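JavaScript strings are sequences of UTF-16 code units, so a character outside the Basic Multilingual Plane takes up two of them (a surrogate pair), and a naive split cuts it in half.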
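To make the problem concrete, here is a small sketch of what the trivial split does to a character outside the BMP. The sample string is an arbitrary choice for illustration, not something from the question.

```javascript
// "a", U+1D306 (a character outside the BMP), "b"
var sample = "a\uD834\uDF06b";

// The trivial approach: split on the empty string.
var units = sample.split("");

// Three characters as a reader sees them, but four UTF-16 code units.
console.log(sample.length); // 4
console.log(units.length);  // 4

// The middle two entries are the two halves of one surrogate pair.
console.log(units[1].charCodeAt(0).toString(16)); // "d834"
console.log(units[2].charCodeAt(0).toString(16)); // "df06"
```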
In ECMAScript 6 you'll be able to use a string as an iterator to get code points, or you could search a string for /./ug, or you could call codePointAt(i) repeatedly. Unfortunately for..of syntax and regexp flags can't be polyfilled, and calling a polyfilled codePointAt() for every position would be slow, so we can't realistically use this approach for a while yet.
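For environments that do support ES6, the three options might look roughly like the sketch below. The sample string and variable names are illustrative assumptions. The regexp uses [\s\S] rather than . because . does not match line terminators, and codePointAt(i) takes a code unit index, so the loop has to step over the trail surrogate itself.

```javascript
// "a", U+1D306 (outside the BMP), "b"
var sample = "a\uD834\uDF06b";

// 1. String iteration yields one string per code point.
var viaIterator = [];
for (var ch of sample) {
    viaIterator.push(ch.codePointAt(0));
}

// 2. A regexp with the u flag matches whole code points.
var viaRegExp = (sample.match(/[\s\S]/ug) || []).map(function (s) {
    return s.codePointAt(0);
});

// 3. codePointAt(i), skipping the second half of each surrogate pair.
var viaCodePointAt = [];
for (var i = 0; i < sample.length; i++) {
    var cp = sample.codePointAt(i);
    viaCodePointAt.push(cp);
    if (cp > 0xFFFF) i++; // this code point used two code units
}

console.log(viaIterator);    // [ 97, 119558, 98 ]
console.log(viaRegExp);      // [ 97, 119558, 98 ]
console.log(viaCodePointAt); // [ 97, 119558, 98 ]
```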
So doing it the manual way:
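String.prototype.toCodePoints= function() {
    var chars= [];
    for (var i= 0; i<this.length; i++) {
        var c1= this.charCodeAt(i);
        // A lead surrogate followed by a trail surrogate encodes one code point
        if (c1>=0xD800 && c1<0xDC00 && i+1<this.length) {
            var c2= this.charCodeAt(i+1);
            if (c2>=0xDC00 && c2<0xE000) {
                chars.push(0x10000 + ((c1-0xD800)<<10) + (c2-0xDC00));
                i++;  // skip the trail surrogate we just consumed
                continue;
            }
        }
        // BMP character or unpaired surrogate: keep the code unit as-is
        chars.push(c1);
    }
    return chars;
};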
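The surrogate arithmetic is easier to follow with one pair worked through by hand. The sketch below uses U+1D306 as an arbitrary example; the variable names are illustrative only.

```javascript
var c1 = 0xD834; // lead (high) surrogate
var c2 = 0xDF06; // trail (low) surrogate

// Each surrogate carries 10 bits of the offset above 0x10000.
var high10 = (c1 - 0xD800) << 10; // 0x34 << 10 = 0xD000
var low10 = c2 - 0xDC00;          // 0x306

var codePoint = 0x10000 + high10 + low10;
console.log(codePoint.toString(16)); // "1d306"
```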
For the inverse to this see https://stackoverflow.com/a/3759300/18936
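The linked answer is not reproduced here. As a rough idea of what an inverse involves, the sketch below turns an array of code points back into a string by splitting anything above 0xFFFF into a surrogate pair. The function name fromCodePoints is hypothetical, and the input is assumed to contain only valid code points.

```javascript
// Hypothetical helper: array of code points -> string.
function fromCodePoints(codePoints) {
    var units = [];
    for (var i = 0; i < codePoints.length; i++) {
        var cp = codePoints[i];
        if (cp >= 0x10000) {
            // Split the 20-bit offset into lead and trail surrogates.
            var offset = cp - 0x10000;
            units.push(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
        } else {
            units.push(cp);
        }
    }
    // Note: apply() can hit argument-count limits on very long arrays;
    // chunking is left out of this sketch.
    return String.fromCharCode.apply(null, units);
}

console.log(fromCodePoints([0x61, 0x1D306, 0x62]) === "a\uD834\uDF06b"); // true
```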