Isn't UTF-8's byte order different on big-endian machines than on little-endian machines? If so, why doesn't UTF-8 require a BOM?

野性不改 2020-12-03 00:51

A UTF-8 stream can begin with a BOM (the bytes EF BB BF), but the BOM says nothing about the endianness of the byte stream; it only acts as a signature marking the data as UTF-8. UTF-8 always has the same byte order.
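To make the point about the BOM concrete, here is a minimal Python sketch. The sample character is an arbitrary choice for illustration; `utf-8-sig` is Python's name for the codec that writes and strips the UTF-8 BOM. It shows that the UTF-8 BOM is a fixed three-byte prefix, while only UTF-16 output actually changes with byte order.

```python
import codecs

text = "é"  # U+00E9, picked arbitrarily for this illustration

utf8 = text.encode("utf-8")          # no BOM
utf8_sig = text.encode("utf-8-sig")  # same bytes, prefixed with the UTF-8 BOM

# The UTF-8 BOM is always the same three bytes, EF BB BF.
assert utf8_sig == codecs.BOM_UTF8 + utf8

# Decoding with utf-8-sig strips the BOM again.
assert utf8_sig.decode("utf-8-sig") == text

# UTF-16 uses 2-byte code units, so its byte order matters and the
# two variants differ; this is where a BOM carries real information.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")
print(utf8.hex(), utf8_sig.hex(), utf16_be.hex(), utf16_le.hex())
```

The UTF-8 bytes come out the same on any machine; only the UTF-16 encodings differ from each other.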

2 Answers
  •  无人及你
    2020-12-03 01:30

    Byte order differs between big-endian and little-endian machines only for words/integers larger than a byte.

    For example, on a big-endian machine a 2-byte short integer stores its 8 most significant bits in the first byte and its 8 least significant bits in the second byte. On a little-endian machine the 8 most significant bits are in the second byte and the 8 least significant bits are in the first byte.

    So, if you write the memory content of such a short int directly to a file/network, the byte ordering within the short int will be different depending on the endianness.

    UTF-8 is byte-oriented: its code unit is a single byte, so endianness is not an issue. The first byte is always the first byte, the second byte is always the second byte, and so on, regardless of endianness.
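The contrast described in this answer can be sketched in Python. This is only an illustration under stated assumptions: the value `0x1234` and the euro sign are arbitrary examples, and `struct.pack` stands in for "writing the memory content of a short int" with an explicit byte order.

```python
import struct
import sys

value = 0x1234  # arbitrary 16-bit example value

big = struct.pack(">H", value)     # big-endian layout: 12 34
little = struct.pack("<H", value)  # little-endian layout: 34 12
native = struct.pack("=H", value)  # whatever order this machine uses

# The multi-byte integer is laid out differently per byte order.
assert big != little
assert native == (big if sys.byteorder == "big" else little)

# UTF-8 is a sequence of single bytes in a fixed order, so the
# encoded result does not depend on the machine's endianness.
text = "€"  # U+20AC, arbitrary example; three bytes in UTF-8
encoded = text.encode("utf-8")
print(big.hex(), little.hex(), encoded.hex())
```

The integer has two possible byte layouts, but the UTF-8 bytes `e2 82 ac` always appear in that order.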
