Java : How to determine the correct charset encoding of a stream

花落未央 2020-11-22 02:06

With reference to the following thread: Java App : Unable to read iso-8859-1 encoded file correctly

What is the best way to programmatically determine the correct charset encoding of a stream?

15 Answers
  •  余生分开走
    2020-11-22 02:35

    In plain Java:

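    import java.io.IOException;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    static List<String> readWithFallback(Path path) throws IOException {
        // ISO-8859-1 maps every byte to a character and never fails, so it must be tried last.
        // UTF-16/UTF-16BE/UTF-16LE are left out: a strict UTF-16 decode accepts most
        // even-length 8-bit text, so placed before ISO-8859-1 they would shadow it.
        final String[] encodings = { "US-ASCII", "UTF-8", "ISO-8859-1" };

        for (String encoding : encodings) {
            try {
                // readAllLines decodes strictly and throws MalformedInputException
                // (a CharacterCodingException) on bytes that are invalid in this charset
                return Files.readAllLines(path, Charset.forName(encoding));
            } catch (CharacterCodingException cce) {
                System.out.println(encoding + " failed, trying next.");
            }
        }
        // Other IOExceptions (e.g. missing file) propagate instead of being retried
        throw new IllegalStateException("unreachable: ISO-8859-1 decodes any input");
    }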
    

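    This approach tries the encodings one by one until one works or we run out of them. Keep in mind that ISO-8859-1 maps every possible byte to a character, so decoding with it never fails: it only makes sense as the last-resort fallback, and any charset listed after it is never tried. A successful decode also only proves that the bytes are valid in that charset, not that it is the one the file was written in. (BTW, the candidates come from the charsets that every Java platform is required to implement, see https://docs.oracle.com/javase/9/docs/api/java/nio/charset/Charset.html)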
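    Since the question asks about a stream rather than a file, here is a sketch of the same trial-decoding idea applied to an `InputStream`, with one addition that is not part of the answer above: a byte-order-mark (BOM) check, which is a more dependable way to spot UTF-16 than trial decoding. The class and method names (`CharsetGuess`, `guessCharset`) are hypothetical, the candidate list is an assumption, and the whole stream is read into memory with `InputStream.readAllBytes()` (Java 9+).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetGuess {
    // Assumed candidates, tried in order when there is no BOM.
    // ISO-8859-1 accepts any byte, so it must come last.
    private static final Charset[] CANDIDATES = {
        StandardCharsets.US_ASCII, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1
    };

    // Hypothetical helper: reads the whole stream into memory (Java 9+) and guesses its charset.
    static Charset guessCharset(InputStream in) throws IOException {
        return guessCharset(in.readAllBytes());
    }

    static Charset guessCharset(byte[] b) {
        // 1. Byte-order marks. Java's UTF-8 decoder keeps a UTF-8 BOM as U+FEFF in the output.
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && ((b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF
                || (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)) {
            return StandardCharsets.UTF_16; // the "UTF-16" decoder uses the BOM to pick the byte order
        }
        // 2. No BOM: return the first candidate that decodes every byte without error.
        for (Charset cs : CANDIDATES) {
            try {
                // newDecoder() defaults to CodingErrorAction.REPORT, so invalid input throws
                cs.newDecoder().decode(ByteBuffer.wrap(b));
                return cs;
            } catch (CharacterCodingException e) {
                // not valid in this charset; try the next candidate
            }
        }
        throw new AssertionError("unreachable: ISO-8859-1 decodes any byte sequence");
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";
        Charset[] encodings = { StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, StandardCharsets.UTF_16 };
        for (Charset cs : encodings) {
            InputStream in = new ByteArrayInputStream(text.getBytes(cs));
            System.out.println(cs.name() + " -> " + guessCharset(in).name());
        }
        // prints, one per line: ISO-8859-1 -> ISO-8859-1, UTF-8 -> UTF-8, UTF-16 -> UTF-16
    }
}
```

    The limitation is the same as for the file version: without a BOM, UTF-16 is not recognized, so for example "café" encoded as UTF-16LE with no BOM falls through to ISO-8859-1. For very large streams one could decode only a prefix, at the cost of possibly missing an invalid byte further on.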
