Convert Unicode to ASCII without changing the string length (in Java)

庸人自扰 2020-12-01 16:58

What is the best way to convert a string from Unicode to ASCII without changing its length (that is very important in my case)? Also the characters without any conversion

5 answers
  •  星月不相逢
    2020-12-01 17:38

    Caveat: I don't know Java. Just a bit about character sets.

    You are not stating which character set you are using exactly.

    But no matter which you use, it's impossible to convert a Unicode string to ASCII and retain the original byte length and character positions, simply because Unicode encodings use multiple bytes for some characters (obviously).
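
    A minimal Java sketch of the length issue, assuming "length" may mean either characters or bytes. `String.length()` counts UTF-16 code units, while the encoded size depends on the charset. The class and method names here are made up for illustration.

    ```java
    import java.nio.charset.StandardCharsets;

    public class LengthDemo {
        // Number of UTF-16 code units, which is what String.length() reports.
        static int charLength(String s) {
            return s.length();
        }

        // Number of bytes the string occupies once encoded as UTF-8.
        static int utf8ByteLength(String s) {
            return s.getBytes(StandardCharsets.UTF_8).length;
        }

        public static void main(String[] args) {
            // "Göteborg", written with an escape so the source file encoding does not matter.
            String city = "G\u00f6teborg";
            System.out.println(charLength(city));     // 8
            System.out.println(utf8ByteLength(city)); // 9, because ö takes two bytes in UTF-8
        }
    }
    ```

    So whether the length can stay the same depends on which of the two counts has to be preserved.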

    The only exception I know of would be a UTF-8 string that contains only ASCII characters: this string is already identical in UTF-8 and ASCII, because UTF-8 uses multibyte sequences only when necessary. (I don't know about the other Unicode encodings; there may be other variable-width ones.)
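
    This can be checked in Java with a small sketch that encodes the same string both ways and compares the bytes. The helper name is hypothetical. Note that `String.getBytes(Charset)` replaces unmappable characters with the charset's replacement byte (`?` for US-ASCII) instead of failing.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class AsciiSubsetDemo {
        // True when the string encodes to exactly the same bytes in US-ASCII and UTF-8,
        // which is the case when it contains only ASCII characters.
        static boolean sameBytesInAsciiAndUtf8(String s) {
            byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            return Arrays.equals(ascii, utf8);
        }

        public static void main(String[] args) {
            System.out.println(sameBytesInAsciiAndUtf8("Goteborg"));      // true
            System.out.println(sameBytesInAsciiAndUtf8("G\u00f6teborg")); // false
        }
    }
    ```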

    The only workaround I can see is adding a space after any special character that was replaced by an ASCII one, but that will screw up the string ("Göteborg" in UTF-8 would have to become "Go teborg" to keep the length).
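
    A rough Java sketch of that padding workaround, under a few assumptions: the length to preserve is the UTF-8 byte length, accents are stripped via canonical decomposition (`java.text.Normalizer`, form NFD), and characters with no ASCII base letter become `?`. The class and method names are invented for this example.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.text.Normalizer;

    public class PaddedAsciiSketch {
        // Replaces each non-ASCII code point with an ASCII stand-in, then pads with
        // spaces so the result has as many characters as the input has UTF-8 bytes.
        static String toPaddedAscii(String input) {
            StringBuilder out = new StringBuilder();
            input.codePoints().forEach(cp -> {
                if (cp < 0x80) {
                    out.append((char) cp); // already ASCII, copy unchanged
                    return;
                }
                String original = new String(Character.toChars(cp));
                int width = original.getBytes(StandardCharsets.UTF_8).length;
                // Decompose (e.g. ö becomes o + combining diaeresis) and drop everything non-ASCII.
                String base = Normalizer.normalize(original, Normalizer.Form.NFD)
                        .replaceAll("[^\\x00-\\x7F]", "");
                if (base.isEmpty()) {
                    base = "?"; // assumption: no ASCII equivalent, use a placeholder
                }
                if (base.length() > width) {
                    base = base.substring(0, width); // defensive: never exceed the original width
                }
                out.append(base);
                for (int i = base.length(); i < width; i++) {
                    out.append(' '); // pad up to the original byte width
                }
            });
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(toPaddedAscii("G\u00f6teborg")); // Go teborg
        }
    }
    ```

    If only the character count matters (as with `String.length()` in Java), the padding step is unnecessary and a one-for-one replacement is enough.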

    Maybe you could elaborate on what you need to achieve, so people here can suggest workarounds.
