How can I tell if a string contains multibyte characters in Javascript?

没有蜡笔的小新 2020-12-04 17:18

Is it possible in Javascript to detect if a string contains multibyte characters? If so, is it possible to tell which ones?

The problem I'm running into is this (ap

1 Answer
  •  攒了一身酷
    2020-12-04 17:26

    JavaScript strings are sequences of 16-bit code units (UCS-2 style). A single code unit can hold any code point in the Basic Multilingual Plane (U+0000 - U+D7FF and U+E000 - U+FFFF). Code points outside that plane are represented by two 16-bit code units (a UTF-16 surrogate pair): a high surrogate in the range U+D800 - U+DBFF followed by a low surrogate in the range U+DC00 - U+DFFF.
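
To make the pairing concrete, here is a minimal sketch of the standard UTF-16 arithmetic that splits one code point above U+FFFF into its two surrogates. The helper name `toSurrogatePair` is made up for this illustration and is not a built-in.

```javascript
// Hypothetical helper (not a built-in): split a code point above U+FFFF
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
    if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
        throw new RangeError("code point is not outside the Basic Multilingual Plane");
    }
    var offset = codePoint - 0x10000;       // 20-bit value
    var high = 0xD800 + (offset >> 10);     // top 10 bits
    var low = 0xDC00 + (offset & 0x3FF);    // bottom 10 bits
    return [high, low];
}

var pair = toSurrogatePair(0x1F600);        // [0xD83D, 0xDE00]
var s = String.fromCharCode(pair[0], pair[1]);
// s holds one character but s.length is 2, one per code unit
```

So a string with one such character has a `length` of 2, and both code units fall inside U+D800 - U+DFFF.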

    Based on this, it's easy to detect whether a string contains any characters that lie outside the Basic Multilingual Plane. I think that is what you're asking: you want to identify whether a string contains any characters outside the range of code points that JavaScript represents as a single 16-bit unit:

    function containsSurrogatePair(str) {
        // Matches any surrogate code unit, so an unpaired surrogate also counts.
        return /[\uD800-\uDFFF]/.test(str);
    }
    
    alert( containsSurrogatePair("foo") ); // false
    alert( containsSurrogatePair("f
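
For the second half of the question (telling which characters they are), one possible approach is to match complete surrogate pairs and decode each one. This is only a sketch: the function name `findAstralCharacters` and the shape of the returned objects are assumptions made for illustration, and unpaired surrogates are deliberately skipped.

```javascript
// Hypothetical helper (not a built-in): report every complete surrogate
// pair in str with its position and decoded code point.
function findAstralCharacters(str) {
    var results = [];
    // High surrogate (U+D800 - U+DBFF) followed by low surrogate (U+DC00 - U+DFFF).
    var pairPattern = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
    var match;
    while ((match = pairPattern.exec(str)) !== null) {
        var high = match[0].charCodeAt(0);
        var low = match[0].charCodeAt(1);
        results.push({
            index: match.index,      // position of the high surrogate in str
            character: match[0],     // the two-code-unit character itself
            codePoint: (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000
        });
    }
    return results;
}
```

For example, `findAstralCharacters("a\uD83D\uDE00b")` returns a single entry with `index` 1 and `codePoint` 0x1F600, while `findAstralCharacters("foo")` returns an empty array.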
