How can I detect the encoding/codepage of a text file?

梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20 answers
  •  萌比男神i
    2020-11-21 23:16

    Thanks @Erik Aronesty for mentioning uchardet.

    Meanwhile, the (same?) tool exists for Linux: chardet.
    Or, on Cygwin, you may want to use chardetect.

    See the chardetect man page: https://www.commandlinux.com/man-page/man1/chardetect.1.html

    The tool heuristically detects (guesses) the character encoding of each given file and reports the name of the detected encoding along with a confidence level.
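For anyone who wants the same guess from code rather than the command line, here is a minimal Python sketch. It assumes the third-party chardet package is installed (pip install chardet) and uses its chardet.detect() function. The helper name guess_encoding and the crude fallback branch are my own additions for illustration: the fallback only checks for a UTF-8 BOM or a clean UTF-8 decode, and its confidence numbers are arbitrary placeholders, not anything chardet computes.

```python
import codecs

try:
    import chardet  # third-party package; assumed to be installed
except ImportError:
    chardet = None  # the crude stdlib fallback below is used instead


def guess_encoding(data: bytes):
    """Return (encoding_name, confidence) for raw bytes. A guess, not a guarantee."""
    if chardet is not None:
        # chardet.detect() returns a dict with 'encoding' and 'confidence' keys.
        result = chardet.detect(data)
        return result["encoding"], result["confidence"]

    # Hypothetical fallback (NOT chardet's algorithm): BOM check, then strict UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8-SIG", 1.0
    try:
        data.decode("utf-8")
        return "utf-8", 0.5  # arbitrary placeholder confidence
    except UnicodeDecodeError:
        return None, 0.0


# Usage: feed it the raw bytes of a file (here an in-memory sample with a BOM).
sample = codecs.BOM_UTF8 + "naïve café".encode("utf-8")
encoding, confidence = guess_encoding(sample)
print(encoding, confidence)
```

For real files, read them in binary mode (open(path, "rb")) and pass the bytes in. Treat a low confidence as "ask the sender", since short or mostly-ASCII files give the detector very little to work with.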
