Java Text File Encoding

逝去的感伤 2020-12-17 18:54

I have a text file and it can be ANSI (with ISO-8859-2 charset), UTF-8, UCS-2 Big or Little Endian.

Is there any way to detect the encoding of the file to read it pr

4 Answers
  •  清酒与你
    2020-12-17 19:13

    Yes, there are several ways to detect character encoding in Java. Take a look at jchardet, which is based on the Mozilla algorithm. There's also cpdetector and ICU4J, a project by IBM. I'd look at ICU4J first, as it seems to be more reliable than the other two. All of them work by statistical analysis of the file's bytes. ICU4J also reports a confidence level for each encoding it detects, so you can use that to decide whether to trust the guess for a file like yours. It works pretty well.
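
    If pulling in a library is not an option, a much cruder check may be enough for the four candidates in the question. The sketch below is not the statistical approach those libraries use. It looks for a byte order mark, then tries a strict UTF-8 decode, and falls back to ISO-8859-2. The class name EncodingGuess and the method guess are made up for illustration, and it assumes the UCS-2 files start with a BOM and that nothing outside the four listed encodings shows up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jchardet, cpdetector or ICU4J.
public class EncodingGuess {

    /** Guesses among UTF-8, UTF-16 BE/LE (with BOM) and ISO-8859-2. */
    public static Charset guess(byte[] data) {
        // 1. A byte order mark, when present, settles the question.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // 2. No BOM: decode strictly as UTF-8 and report any malformed byte
        //    instead of silently replacing it.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strictUtf8.decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // 3. Assumption: anything that is not valid UTF-8 is ISO-8859-2.
            return Charset.forName("ISO-8859-2");
        }
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

    You would call it as EncodingGuess.guess(Files.readAllBytes(path)) and pass the result to new String(bytes, charset). Note that Java's decoders keep the BOM as a leading U+FEFF character, so strip it if it matters. Pure ASCII files come back as UTF-8, which is harmless since the bytes decode identically.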
