Java : How to determine the correct charset encoding of a stream

花落未央 2020-11-22 02:06

With reference to the following thread: Java App : Unable to read iso-8859-1 encoded file correctly

What is the best way to programmatically determine the correct cha

15 answers
  •  耶瑟儿~
    2020-11-22 02:12

    Here are my favorites:

    TikaEncodingDetector

    Dependency:

    <dependency>
      <groupId>org.apache.any23</groupId>
      <artifactId>apache-any23-encoding</artifactId>
      <version>1.1</version>
    </dependency>

    Sample:

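    // Needs java.io.IOException, java.io.InputStream, java.nio.charset.Charset
    // and org.apache.any23.encoding.TikaEncodingDetector.
    public static Charset guessCharset(InputStream is) throws IOException {
      // guessEncoding returns the charset name as a String.
      return Charset.forName(new TikaEncodingDetector().guessEncoding(is));
    }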

    GuessEncoding

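    Dependency:

    <dependency>
      <groupId>org.codehaus.guessencoding</groupId>
      <artifactId>guessencoding</artifactId>
      <version>1.4</version>
      <type>jar</type>
    </dependency>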

    Sample:

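    // Needs java.io.File, java.io.IOException, java.nio.charset.Charset,
    // java.nio.charset.StandardCharsets and com.glaforge.i18n.io.CharsetToolkit.
    public static Charset guessCharset2(File file) throws IOException {
      // Examines up to 4096 bytes; UTF-8 is the default charset passed in for when no guess can be made.
      return CharsetToolkit.guessEncoding(file, 4096, StandardCharsets.UTF_8);
    }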
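    If adding a dependency is not an option, a much cruder check is possible with the JDK alone. The sketch below is not what either library above does: it only tests whether the bytes decode without errors under each candidate charset, in order, using a strict CharsetDecoder. The class and method names (StrictDecodeCheck, decodesCleanly, firstCleanCharset) are made up for illustration, and the candidate list is an assumption you would adapt to your data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class StrictDecodeCheck {

    /** Returns true if the bytes decode under the charset with no malformed or unmappable input. */
    public static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of substituting U+FFFD
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the first candidate, in list order, under which the bytes decode cleanly. */
    public static Optional<Charset> firstCleanCharset(byte[] bytes, List<Charset> candidates) {
        return candidates.stream()
                         .filter(c -> decodesCleanly(bytes, c))
                         .findFirst();
    }

    public static void main(String[] args) {
        // "café" encoded as ISO-8859-1: the lone 0xE9 byte is not valid UTF-8.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Order matters: ISO-8859-1 accepts every byte sequence, so it must come last.
        List<Charset> candidates =
            Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(firstCleanCharset(latin1, candidates));
    }
}
```

    This only rules charsets out. Many single-byte encodings accept any input, so a clean decode does not prove the guess is right, which is why the statistical detectors above are usually the better choice.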
