What's the difference between UTF-8 and UTF-8 without BOM?

佛祖请我去吃肉 2020-11-21 05:45

What's the difference between UTF-8 and UTF-8 without a BOM? Which is better?

21 answers
  •  后悔当初
    2020-11-21 06:17

    From http://en.wikipedia.org/wiki/Byte-order_mark:

    The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF. BOM use is optional, and, if used, should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.
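    To make the quoted definition concrete, here is a minimal Python sketch of how a reader could use the BOM to guess the encoding. The helper name `sniff_bom` is made up for illustration; the byte constants come from the standard `codecs` module.

```python
import codecs

# Longer BOMs go first: the UTF-32 LE BOM starts with the same two bytes
# as the UTF-16 LE BOM, so the order of the checks matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

    In UTF-8 the BOM is always the three bytes EF BB BF, so it cannot say anything about byte order there; it only works as a signature that the text is UTF-8.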

    Always including a BOM in your file ensures that it opens correctly in any editor that supports UTF-8 and the BOM.

    My real problem with the absence of a BOM is the following. Suppose we've got a file which contains:

    abc
    

    Without a BOM, most editors open this as ANSI (the system's legacy code page). So another user of this file opens it and appends some native characters, for example:

    abc-αβγ
    

    Oops... The file is still saved as ANSI, so "αβγ" occupies 3 bytes instead of the 6 it would take in UTF-8. The file is not UTF-8, and this causes problems later on in the development chain.
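    The byte counts can be checked with a short Python sketch. It assumes the "ANSI" code page is Windows-1253 (Greek), which fits the example characters but is not stated above, and `decode_guess` is a hypothetical stand-in for how an editor might pick an encoding.

```python
import codecs

original = "abc"
appended = original + "-αβγ"

utf8 = appended.encode("utf-8")          # 4 ASCII bytes + 2 bytes per Greek letter
ansi = appended.encode("cp1253")         # assumed legacy code page: 1 byte per character
with_bom = appended.encode("utf-8-sig")  # the same UTF-8 bytes preceded by EF BB BF

print(len(utf8), len(ansi), len(with_bom))  # prints: 10 7 13


def decode_guess(data: bytes, legacy: str = "cp1253") -> str:
    """Decode as UTF-8 when a BOM is present, otherwise fall back to the legacy code page."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(legacy)
```

    With this guessing rule, `decode_guess(with_bom)` returns the original text, while `decode_guess(utf8)` misreads the BOM-less UTF-8 bytes as legacy characters. That is the same mix-up as in the scenario above, seen from the reading side.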
