In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the
Thanks @Erik Aronesty for mentioning uchardet.
Meanwhile the (same?) tool exists for Linux: chardet.
Or, on Cygwin you may want to use: chardetect.
See the chardet man page: https://www.commandlinux.com/man-page/man1/chardetect.1.html
This will heuristically detect (guess) the character encoding for each given file and will report the name and confidence level for each file's detected character encoding.
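If the guess has to happen inside the application rather than on the command line, the same idea can be roughed out in a few lines of Python. This is only a minimal sketch and not how chardet or uchardet work internally: it checks for a byte order mark, then tries strict UTF-8, and otherwise falls back to a single legacy encoding. The names guess_encoding and read_text and the cp1252 fallback are assumptions made for illustration.

```python
import codecs

# Byte order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return a best-guess codec name for data (hypothetical helper)."""
    # 1. A BOM, when present, is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Bytes that decode as strict UTF-8 are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 3. Otherwise assume one legacy encoding (an assumption, not a detection).
        return fallback


def read_text(path, fallback: str = "cp1252"):
    """Read a file as text and return (text, encoding_used)."""
    with open(path, "rb") as f:
        data = f.read()
    encoding = guess_encoding(data, fallback)
    # errors="replace" keeps the read from failing on bytes the codec cannot map.
    return data.decode(encoding, errors="replace"), encoding
```

Unlike the tools above, this reports no confidence level and cannot tell one legacy 8-bit encoding from another, so it is only suitable when the set of likely encodings is small and known in advance.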