Are 6-octet UTF-8 sequences valid?

不思量自难忘° 2021-01-05 00:46

Can UTF-8 use 5- or 6-byte sequences, so that all Unicode characters can be encoded? I'm finding conflicting information in the standards. I need to be able to support every Unico

3 Answers
  •  时光取名叫无心
    2021-01-05 01:06

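    Both UTF-8 and UTF-16 can encode every Unicode character. What UTF-8 is not allowed to encode are the high and low surrogate code points (U+D800 through U+DFFF, which UTF-16 uses in pairs) and values above U+10FFFF, which aren't legal Unicode. Since U+10FFFF fits in 4 bytes, 5- and 6-byte sequences are never needed, and the current definition of UTF-8 (RFC 3629) does not allow them.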

    Note that the Basic Multilingual Plane (BMP) ends at U+FFFF.
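
The limits described above can be checked with a strict decoder. The following is a minimal sketch using Python 3's built-in `utf-8` codec. The helper names `is_valid_utf8` and `can_encode` are made up for this illustration, and the byte strings are hand-built samples, not data from the question.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def can_encode(code_point: int) -> bool:
    """Return True if the code point can be encoded as UTF-8."""
    try:
        # chr() raises ValueError above 0x10FFFF; encoding a surrogate
        # raises UnicodeEncodeError, which is a subclass of ValueError.
        chr(code_point).encode("utf-8")
        return True
    except ValueError:
        return False


# The highest legal code point, U+10FFFF, needs exactly 4 bytes.
highest = chr(0x10FFFF).encode("utf-8")

# Hand-built samples of the old 5- and 6-byte forms (lead bytes 0xF8, 0xFC).
five_byte = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])
six_byte = bytes([0xFC, 0x84, 0x80, 0x80, 0x80, 0x80])

# A surrogate (U+D800) written out in the 3-byte pattern.
encoded_surrogate = bytes([0xED, 0xA0, 0x80])

# A 4-byte pattern that would represent U+110000, one past the limit.
past_limit = bytes([0xF4, 0x90, 0x80, 0x80])

if __name__ == "__main__":
    print(len(highest), is_valid_utf8(highest))
    for sample in (five_byte, six_byte, encoded_surrogate, past_limit):
        print(sample.hex(), is_valid_utf8(sample))
    print(can_encode(0xFFFF), can_encode(0xD800), can_encode(0x110000))
```

Other decoders may be more lenient than Python's strict mode, so treat this as a check against one implementation, not as a conformance test.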
