With reference to the following thread: Java App : Unable to read iso-8859-1 encoded file correctly
What is the best way to programmatically determine the correct cha
Here are my favorites:
TikaEncodingDetector
Dependency:
<dependency>
    <groupId>org.apache.any23</groupId>
    <artifactId>apache-any23-encoding</artifactId>
    <version>1.1</version>
</dependency>
Sample:
// TikaEncodingDetector lives in org.apache.any23.encoding; it returns a charset name as a String.
public static Charset guessCharset(InputStream is) throws IOException {
    return Charset.forName(new TikaEncodingDetector().guessEncoding(is));
}
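One practical point with a stream-based detector like the one above: detection may read from the stream, so the same bytes have to be available again for decoding. The following is a minimal sketch, assuming the input fits in memory and Java 9+ (for `InputStream.readAllBytes`). The `Detector` interface and the `BufferedDetection` class are hypothetical names introduced for illustration; in practice the detector argument could be a method reference to `guessCharset`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BufferedDetection {

    // Hypothetical stand-in for any stream-based detector,
    // e.g. a guessCharset(InputStream) method.
    interface Detector {
        Charset detect(InputStream in) throws IOException;
    }

    // Buffers the whole input, lets the detector inspect a copy of the bytes,
    // then decodes the same bytes with the detected charset.
    static String readText(InputStream in, Detector detector) throws IOException {
        byte[] data = in.readAllBytes(); // Java 9+
        Charset charset = detector.detect(new ByteArrayInputStream(data));
        return new String(data, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // A fixed answer is used here only so the example has no dependencies.
        Detector alwaysLatin1 = in -> StandardCharsets.ISO_8859_1;
        System.out.println(readText(new ByteArrayInputStream(latin1), alwaysLatin1));
    }
}
```

For large files, buffering everything is wasteful; detecting on a bounded prefix and then reopening the file would be the usual alternative.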
GuessEncoding
Dependency:
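<dependency>
    <groupId>org.codehaus.guessencoding</groupId>
    <artifactId>guessencoding</artifactId>
    <version>1.4</version>
    <type>jar</type>
</dependency>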
Sample:
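// CharsetToolkit lives in com.glaforge.i18n.io; 4096 is the buffer length, UTF-8 the default charset.
public static Charset guessCharset2(File file) throws IOException {
    return CharsetToolkit.guessEncoding(file, 4096, StandardCharsets.UTF_8);
}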
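If adding a dependency is not an option, a much cruder check can be done with the JDK alone. This is not what either library above does; it is a sketch of a different, simpler technique: try a strict UTF-8 decode and fall back to a caller-supplied charset when the bytes are malformed. The class and method names (`StrictUtf8Check`, `guessUtf8OrFallback`) are made up for this example. It can only answer "valid UTF-8 or not", so it cannot tell ISO-8859-1 from other single-byte encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Check {

    // Returns UTF-8 if the bytes decode as UTF-8 without errors,
    // otherwise the fallback chosen by the caller.
    static Charset guessUtf8OrFallback(byte[] data, Charset fallback) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Malformed byte sequence: the data is not valid UTF-8.
            return fallback;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessUtf8OrFallback(utf8, StandardCharsets.ISO_8859_1));
        System.out.println(guessUtf8OrFallback(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Pure ASCII input is valid UTF-8, so it is reported as UTF-8 here, which is harmless because ASCII decodes identically in both charsets.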