How to change utf-8mb4 to UTF-8 in groovy?

ε祈祈猫儿з submitted on 2020-01-17 12:26:04

Question


I have a problem handling string input from users' mobile devices. The input can contain utf8mb4 characters (smileys, emoji, etc.), which causes an error in my backend (MySQL) because it only accepts utf8 input.

How can I convert all utf8mb4 input to utf8?

def utf8mb4string = '👳👳👳👳👳👳👳';
// convert utf8mb4string to utf8
// logic here
// possible utf8 result, maybe: '�������'

I also found a similar question, "How would I convert UTF-8mb4 to UTF-8?", but it has no clear answer yet, especially not for an implementation in Groovy.


Answer 1:


You can't store characters (like "man with turban") from outside the Basic Multilingual Plane (BMP) with MySQL's poorly named "utf8" encoding. You need to specify "utf8mb4" instead.
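
To see why such a character is rejected, it can help to check whether a string contains any code point outside the BMP. The following is a minimal sketch, not part of the original answer; the class name Utf8mb4Check and the method needsUtf8mb4 are made up for illustration. It relies only on the standard String.codePoints() and Character.isSupplementaryCodePoint() calls.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class Utf8mb4Check {
    // True if the string contains any code point outside the BMP,
    // i.e. one that is encoded as four bytes in UTF-8.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String turban = "\uD83D\uDC73"; // U+1F473, the emoji from the question
        System.out.println(turban.getBytes(StandardCharsets.UTF_8).length); // 4
        System.out.println(needsUtf8mb4(turban));        // true
        System.out.println(needsUtf8mb4("h\u00E9llo"));  // false
    }
}
```

A check like this could be used to decide whether a string needs any conversion at all before it is sent to a "utf8" column.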

If you don't care to store those characters, and want to replace or discard them, you'd have to iterate over the string, and build a new string (in Java):

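// Requires: import java.util.stream.IntStream;
// Map every code point outside the BMP to U+FFFD (the replacement character).
IntStream converted = utf8mb4string.codePoints().map(cp -> Character.isBmpCodePoint(cp) ? cp : '\uFFFD');
String str = converted.collect(StringBuilder::new, (buf, ch) -> buf.append((char) ch), StringBuilder::append).toString();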

Or, in Groovy syntax:

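// Called once per UTF-16 code unit; each surrogate pair becomes a single U+FFFD.
def transform = { String it ->
  char ch = it.charAt(0)
  if (Character.isHighSurrogate(ch))
    return '\uFFFD'   // first half of a pair: emit the replacement character
  else if (Character.isLowSurrogate(ch))
    return ''         // second half of a pair: drop it
  else
    return it         // BMP character: keep unchanged
}
utf8mb4string.collect(transform).join()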
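
The answer also mentions discarding the characters instead of replacing them. A minimal Java sketch of that variant follows; the class name BmpFilter and the method stripSupplementary are hypothetical, chosen only for this example. It filters the code points rather than mapping them to U+FFFD.

```java
// Hypothetical helper class, named for illustration only.
class BmpFilter {
    // Drops every code point outside the BMP; BMP characters are kept unchanged.
    // Note: an unpaired surrogate counts as a BMP code point here and is kept.
    static String stripSupplementary(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints()
         .filter(Character::isBmpCodePoint)
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // "a", U+1F473, "b": the emoji is removed
        System.out.println(stripSupplementary("a\uD83D\uDC73b")); // ab
    }
}
```

Whether to replace or discard is a design choice: U+FFFD keeps a visible marker where a character was lost, while stripping keeps the stored text shorter but hides the loss.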


Source: https://stackoverflow.com/questions/29635294/how-to-change-utf-8mb4-to-utf-8-in-groovy
