In our application, we receive text files (.txt
, .csv
, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the
If you can link to a C library, you can use libenca
. See http://cihar.com/software/enca/. From the man page:
Enca reads given text files, or standard input when none are given, and uses knowledge about their language (must be supported by you) and a mixture of parsing, statistical analysis, guessing and black magic to determine their encodings.
It's GPL v2.