How to check the charset of a string in Java?

梦谈多话 2020-12-07 00:29

In my application I'm getting the user info from LDAP, and sometimes the full username comes back in the wrong charset. For example:

ТеÑÑ61 ТеÑÑовиÑ61
         


        
5 answers
  •  忘掉有多难
    2020-12-07 00:59

    Strings in Java, AFAIK, do not retain their original encoding: they are always stored internally in some Unicode form. What you want is to detect the charset of the original stream/bytes, which is why I think your String.getBytes() call comes too late.
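
To illustrate the point, here is a minimal sketch using only the JDK. It assumes, purely for demonstration, that UTF-8 bytes were decoded as ISO-8859-1 somewhere upstream; the class and method names are hypothetical and the real LDAP client may be making a different mistake.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates the bug: UTF-8 bytes decoded with the wrong charset (ISO-8859-1 is an assumption).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses that specific mistake. This only works because ISO-8859-1 maps
    // every byte to exactly one char, so the original bytes can be recovered.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0422\u0435\u0441\u0442"; // four Cyrillic letters
        String garbled = misdecode(original);
        // The garbled String holds one char per UTF-8 byte and has no memory of UTF-8.
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

Once the String exists, nothing in it says which charset produced it; the repair succeeds only because the wrong charset is known and lossless. With a lossy mis-decoding (replacement characters, unmapped bytes) the original bytes are gone.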

    Ideally, if you can get hold of the input stream you are reading from, you can run it through something like this: http://code.google.com/p/juniversalchardet/

    There are plenty of other charset detectors out there as well.
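
If pulling in a detector library is not an option, a rough JDK-only check is to decode the raw bytes strictly and see whether a candidate charset rejects them. This is a sketch, not a real detector: the class and method names are hypothetical, and the sample bytes are just illustrative input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeCheck {
    // Returns true if the bytes form a valid sequence in the given charset.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Four Cyrillic letters encoded as UTF-8.
        byte[] utf8 = "\u0422\u0435\u0441\u0442".getBytes(StandardCharsets.UTF_8);
        // The same four letters as single bytes in a Cyrillic code page.
        byte[] singleByte = {(byte) 0xD2, (byte) 0xE5, (byte) 0xF1, (byte) 0xF2};

        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(singleByte, StandardCharsets.UTF_8));
    }
}
```

A rejection is strong evidence that the bytes are not in that charset, but acceptance proves little: most single-byte charsets accept almost any byte sequence, which is why statistical detectors exist.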
