How can I detect the encoding/codepage of a text file

后端 未结 20 1694
梦如初夏
梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20条回答
  •  不要未来只要你来
    2020-11-21 23:13

    You can't detect the codepage, you need to be told it. You can analyse the bytes and guess it, but that can give some bizarre (sometimes amusing) results. I can't find it now, but I'm sure Notepad can be tricked into displaying English text in Chinese.

    Anyway, this is what you need to read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

    Specifically Joel says:

    The Single Most Important Fact About Encodings

    If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that "plain" text is ASCII. There Ain't No Such Thing As Plain Text.

    If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.

提交回复
热议问题