Is it possible in JavaScript to detect whether a string contains multibyte characters? If so, is it possible to tell which ones?
The problem I'm running into is this (ap
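JavaScript strings are sequences of 16-bit code units (effectively UCS-2), but they can represent Unicode code points outside the Basic Multilingual Plane (the BMP, U+0000 - U+D7FF and U+E000 - U+FFFF) using two 16-bit numbers (a UTF-16 surrogate pair). The first of the pair (the high surrogate) must be in the range U+D800 - U+DBFF, and the second (the low surrogate) must be in the range U+DC00 - U+DFFF.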
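To make the encoding concrete, here is a minimal sketch of how a code point above U+FFFF is split into a surrogate pair. The helper names `toSurrogatePair` and `describeString` are made up for illustration, and the code assumes an ES2015+ environment (for `Array.from` on strings and arrow functions).

```javascript
// Split a code point above U+FFFF into its UTF-16 surrogate pair.
// (Hypothetical helper, for illustration only.)
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits -> U+D800..U+DBFF
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits -> U+DC00..U+DFFF
  return [high, low];
}

// Report how a string is stored: 16-bit code units vs. code points.
// (Hypothetical helper, for illustration only.)
function describeString(str) {
  return {
    codeUnits: str.length,               // .length counts 16-bit code units
    codePoints: Array.from(str).length,  // string iteration goes by code point
    units: str.split("").map(u => u.charCodeAt(0).toString(16).toUpperCase())
  };
}

// U+1F600 (GRINNING FACE) lies outside the BMP.
const pair = toSurrogatePair(0x1F600);   // [0xD83D, 0xDE00]
const grin = String.fromCharCode(pair[0], pair[1]);
const info = describeString(grin);       // 2 code units, 1 code point
```

So a single visible character outside the BMP has a `length` of 2, which is usually where the surprises come from.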
Based on this, it's easy to detect whether a string contains any characters that lie outside the Basic Multilingual Plane (which is what I think you're asking: you want to be able to identify whether a string contains any characters that lie outside the range of code points that JavaScript represents as a single character):
function containsSurrogatePair(str) {
    // Matches any surrogate code unit (high or low), including unpaired ones
    return /[\uD800-\uDFFF]/.test(str);
}
alert( containsSurrogatePair("foo") ); // false
alert( containsSurrogatePair("f
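As for the second part of the question (telling which characters they are), one possible approach is to walk the string by code point and collect everything above U+FFFF. This is only a sketch: `findAstralCharacters` is a made-up name, and it assumes an ES2015+ environment (`for...of` over strings and `codePointAt`).

```javascript
// Collect every character outside the Basic Multilingual Plane,
// together with its code point and its position in the string.
// (Hypothetical helper, for illustration only.)
function findAstralCharacters(str) {
  const found = [];
  let index = 0;                        // position in 16-bit code units
  for (const ch of str) {               // for...of iterates by code point
    const codePoint = ch.codePointAt(0);
    if (codePoint > 0xFFFF) {           // stored as a surrogate pair
      found.push({
        char: ch,
        codePoint: "U+" + codePoint.toString(16).toUpperCase(),
        index: index
      });
    }
    index += ch.length;                 // 1 for BMP characters, 2 for pairs
  }
  return found;
}

const plain = findAstralCharacters("foo");             // []
const mixed = findAstralCharacters("a\uD83D\uDE00b");  // one entry: U+1F600 at index 1
```

Note that an unpaired surrogate is not reported here, since on its own it does not encode a character outside the BMP.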