Isn’t on big endian machines UTF-8's byte order different than on little endian machines? So why then doesn’t UTF-8 require a BOM?

后端 未结 2 850
野性不改
野性不改 2020-12-03 00:51

UTF-8 can contain a BOM. However, it makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order.

2条回答
  •  不思量自难忘°
    2020-12-03 01:28

    To answer c): UTF-16 and UTF-32 represent characters as 16-bit or 32-bit words, so they are not byte-oriented.

    For UTF-8, the smallest unit is a byte, thus it is byte-oriented. The alogrithm reads or writes one byte at a time. A byte is represented the same way on all machines.

    For UTF-16, the smallest unit is a 16-bit word, and for UTF-32, the smallest unit is a 32-bit word. The algorithm reads or writes one word at a time (2 bytes, or 4 bytes). The order of the bytes in each word is different on big-endian and little-endian machines.

提交回复
热议问题