Question
What does it mean when I save a text file as "Unicode" in Notepad? Is it UTF-8, UTF-16, or UTF-32? Thanks in advance.
Answer 1:
In Notepad, as in Windows software in general, “Unicode” as an encoding name means UTF-16 Little Endian (UTF-16LE). (I first thought it was not real UTF-16, because Notepad++ identified it as UCS-2 and showed the content as garbage, but after re-checking with BabelPad, I concluded that Notepad encodes even non-BMP characters correctly.)
Similarly, “Unicode big endian” means UTF-16 Big Endian. And “ANSI” means the system’s native legacy encoding, e.g. the 8-bit windows-1252 encoding in Western versions of Windows.
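To make the three names concrete, here is a minimal Python sketch of the bytes each option would be expected to produce. The sample string is made up for illustration, and the sketch assumes Notepad prefixes its UTF-16 output with a byte order mark (BOM) and that "ANSI" is windows-1252, as on a Western system.

```python
import codecs

# Hypothetical sample: 'A', 'é', and a non-BMP character (U+1F600).
text = "A\u00e9\U0001F600"

# "Unicode" in Notepad: UTF-16LE, assumed here to start with the BOM FF FE.
unicode_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# "Unicode big endian": UTF-16BE, assumed here to start with the BOM FE FF.
unicode_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# "ANSI" on a Western system: windows-1252 (cp1252 in Python).
# U+1F600 has no cp1252 representation, so only the first two characters are encoded.
ansi = "A\u00e9".encode("cp1252")

print(unicode_le.hex(" "))  # ff fe 41 00 e9 00 3d d8 00 de
print(unicode_be.hex(" "))  # fe ff 00 41 00 e9 d8 3d de 00
print(ansi.hex(" "))        # 41 e9
```

The non-BMP character takes four bytes (a surrogate pair), which is what distinguishes real UTF-16 from UCS-2.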
Answer 2:
All of these formats are "Unicode". But editors on Mac and Windows usually mean UTF-8 by that, because it is ASCII-compatible for codes below 128, IIRC. UTF-8 can represent far more than the 256 values that fit in a single 8-bit byte: a lead byte with its high bits set signals that the following byte or bytes belong to the same character.
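The multi-byte mechanism can be seen by printing the bits of a few characters. This is only an illustrative sketch; `describe_utf8` is a hypothetical helper name, not part of any library.

```python
def describe_utf8(ch: str) -> list:
    """Return each UTF-8 byte of ch as an 8-bit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# ASCII: one byte, high bit clear.
print(describe_utf8("A"))   # ['01000001']
# Two-byte sequence: lead byte starts with 110, continuation byte with 10.
print(describe_utf8("é"))   # ['11000011', '10101001']
# Three-byte sequence: lead byte starts with 1110.
print(describe_utf8("€"))   # ['11100010', '10000010', '10101100']
```

The number of leading 1 bits in the first byte gives the length of the sequence, and every following byte starts with the bits 10.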
If you look at the output in a terminal, say with vi, and you see a gap between every two characters, you are probably looking at UTF-16: there most characters take two bytes, and for ASCII-range text one of the two is a NUL byte. If the characters have no gaps between them, that is an indication of UTF-8.
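The same visual check can be roughly automated by looking at the first bytes of the file. The following is a rough heuristic sketch, not a reliable detector; `guess_encoding` is a hypothetical helper, and the NUL-byte test only works for text that is mostly in the ASCII range.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Guess a Python codec name from a BOM or, failing that, from NUL bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Check UTF-32 before UTF-16: the UTF-32LE BOM begins with the UTF-16LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if len(data) >= 2:
        odd, even = data[1::2], data[0::2]
        # ASCII-range UTF-16LE text has a NUL in every second position.
        if odd.count(0) == len(odd):
            return "utf-16-le"
        # ASCII-range UTF-16BE text has a NUL in every first position.
        if even.count(0) == len(even):
            return "utf-16-be"
    # No BOM and no NUL pattern: could be UTF-8 or a legacy 8-bit encoding.
    return "unknown"

print(guess_encoding(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))  # utf-16
print(guess_encoding("hi".encode("utf-16-le")))                        # utf-16-le
print(guess_encoding(b"hi"))                                           # unknown
```

A file saved from Notepad could be checked with `guess_encoding(open(path, "rb").read(4))`, where `path` is whatever file you are inspecting.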
Source: https://stackoverflow.com/questions/13894898/unicode-file-in-notepad