How can I detect the encoding/codepage of a text file?

梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20 answers
  •  半阙折子戏
    2020-11-21 23:21

    If you can link to a C library, you can use libenca. See http://cihar.com/software/enca/. From the man page:

    Enca reads given text files, or standard input when none are given, and uses knowledge about their language (must be supported by you) and a mixture of parsing, statistical analysis, guessing and black magic to determine their encodings.

    It's GPL v2.
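If linking a GPL C library is not an option, a much cruder check can be sketched with the Python standard library alone. This is not how Enca works and it is far less accurate: it only looks for a byte-order mark, then tries strict UTF-8, then falls back to a caller-supplied list of legacy encodings. The names `guess_encoding`, `BOMS` and `fallbacks`, and the default fallback of `cp1252`, are assumptions made for illustration.

```python
import codecs

# Byte-order marks, checked longest first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, fallbacks=("cp1252",)):
    """Return a plausible codec name for the bytes in `data`, or None.

    Hypothetical helper: a heuristic guess, not a reliable detection.
    """
    # 1. A byte-order mark is the strongest hint available.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # 2. Non-UTF-8 text rarely happens to be valid UTF-8, so a clean
    #    strict decode is good evidence. Pure ASCII also ends up here.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Try the caller's legacy encodings in order (assumed list).
    #    The first one that decodes without error wins, which does not
    #    prove it is the right one.
    for name in fallbacks:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue

    return None
```

To use it, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. Because many single-byte code pages accept almost any byte sequence, the order of `fallbacks` effectively decides the result for legacy files; that ambiguity is what statistical tools such as Enca try to resolve.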
