Is there a way to get rid of accents and convert a whole string to regular letters?

爱一瞬间的悲伤 2020-11-22 04:58

Is there a better way to remove accents and turn those letters into plain ones than calling the String.replaceAll() method and replacing the letters one by one?

12 Answers
  •  轮回少年
    2020-11-22 05:27

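    I think the best solution is to convert each char to its hex code and replace it with another hex code. The reason is that the same letter can be typed in two Unicode forms:

    Composite Unicode (a base letter followed by one or more combining marks)
    Precomposed Unicode (a single code point for the whole letter)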
    

    For example "Ồ" written by Composite Unicode is different from "Ồ" written by Precomposed Unicode. You can copy my sample chars and convert them to see the difference.

    In Composite Unicode, "Ồ" is made up of 2 chars: Ô (U+00D4) and the combining grave accent ̀ (U+0300)
    In Precomposed Unicode, "Ồ" is a single char (U+1ED2)
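
The difference between the two forms can be checked with a small sketch like the one below. The class name `UnicodeFormsDemo` and the helper `toHex` are made up for illustration. The NFC step uses the standard `java.text.Normalizer`, which this answer does not mention, so treat it as an assumption about how one might bring both forms to a common representation.

```java
import java.text.Normalizer;

class UnicodeFormsDemo {
    // Hypothetical helper: lists the code points of s as space-separated "U+XXXX" values.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String composite = "\u00D4\u0300"; // Ô followed by a combining grave accent
        String precomposed = "\u1ED2";     // the single code point Ồ

        System.out.println(toHex(composite));              // U+00D4 U+0300
        System.out.println(toHex(precomposed));            // U+1ED2
        System.out.println(composite.equals(precomposed)); // false

        // Assumption: NFC normalization composes both inputs to the same string.
        String a = Normalizer.normalize(composite, Normalizer.Form.NFC);
        String b = Normalizer.normalize(precomposed, Normalizer.Form.NFC);
        System.out.println(a.equals(b));                   // true
    }
}
```

The two strings look identical on screen but compare as unequal, which is why a replacement rule written for one form can silently miss the other.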
    

    I have developed this feature for several banks, to convert data before sending it to the core banking system (which usually doesn't support Unicode), and ran into this issue when end users entered data with different Unicode typing methods. So I think converting to hex and replacing is the most reliable way.
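
A minimal sketch of the hex-replacement idea might look like this. It is not the code used at the banks. The class `HexReplaceSketch`, the method `toAscii`, and the four table entries are hypothetical, and a real table would need an entry for every character you expect. Normalizing to NFC first is an added assumption, so that Composite and Precomposed input both hit the same table key.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class HexReplaceSketch {
    // Hypothetical lookup table: code point (hex) -> plain replacement.
    private static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(0x1ED2, "O"); // Ồ
        TABLE.put(0x00D4, "O"); // Ô
        TABLE.put(0x00E9, "e"); // é
        TABLE.put(0x0111, "d"); // đ
    }

    static String toAscii(String input) {
        // Assumption: compose first so both typing forms map to one code point.
        String nfc = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(nfc.length());
        nfc.codePoints().forEach(cp -> {
            String replacement = TABLE.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp); // unmapped characters pass through unchanged
            }
        });
        return out.toString();
    }
}
```

With this table, `toAscii("\u00D4\u0300")` and `toAscii("\u1ED2")` both return `"O"`. Characters missing from the table are copied as-is, so anything the downstream system cannot accept still needs its own entry.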
