Question:
Currently, I have a problem with string input received from users' mobile devices. The input contains utf8mb4 characters (smileys, emoji, etc.), which causes errors in my backend (MySQL) because it only accepts utf8 input.
How can I replace all the utf8mb4 characters in the input so that it is valid utf8?
def utf8mb4string = '👳👳👳👳👳👳👳';
// parse the utf8mb4string to utf8
// logic here
// expected utf8 result, something like: '�������'
I also found a similar question, "How would I convert UTF-8mb4 to UTF-8?", but it has no clear answer yet, and in particular no implementation in Groovy.
Answer 1:
You can't store characters from outside the Basic Multilingual Plane (BMP), such as "man with turban", with MySQL's poorly named "utf8" encoding, which uses at most three bytes per character; these characters need four bytes in UTF-8. You need to specify "utf8mb4" instead.
If you don't need to store those characters and want to replace or discard them instead, you have to iterate over the string and build a new one (in Java):
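To see why the BMP boundary matters, here is a small Java sketch (callable from Groovy as well) that measures how many UTF-8 bytes a code point takes and checks whether a string contains anything that needs four. The class and method names (`Utf8Width`, `utf8Length`, `needsUtf8mb4`) are hypothetical, chosen only for illustration; the check relies on the fact that every supplementary (non-BMP) code point encodes to exactly four UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Width {
    // Number of bytes the given code point occupies when encoded as UTF-8.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point lies outside the BMP, i.e. needs 4 UTF-8 bytes
    // and therefore cannot be stored in a MySQL "utf8" (3-byte) column.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(cp -> !Character.isBmpCodePoint(cp));
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('A'));                // 1
        System.out.println(utf8Length(0x20AC));             // 3 (EURO SIGN)
        System.out.println(utf8Length(0x1F473));            // 4 (MAN WITH TURBAN)
        System.out.println(needsUtf8mb4("caf\u00e9"));      // false
        System.out.println(needsUtf8mb4("\uD83D\uDC73"));   // true
    }
}
```

A check like `needsUtf8mb4` can be used to reject or log offending input before it reaches the database, instead of letting the insert fail.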
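import java.util.stream.IntStream;
// Replace every supplementary (non-BMP) code point with U+FFFD REPLACEMENT CHARACTER
IntStream converted = utf8mb4string.codePoints().map(cp -> Character.isBmpCodePoint(cp) ? cp : '\uFFFD');
String str = converted.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString();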
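Since the answer mentions either replacing or discarding the characters, here is a minimal sketch of a single helper that does both, assuming a boolean flag is an acceptable way to choose. `BmpFilter` and `sanitize` are hypothetical names, not part of any library.

```java
public class BmpFilter {
    // Replace each supplementary (non-BMP) code point with U+FFFD,
    // or drop it entirely when discard is true. BMP characters pass through unchanged.
    static String sanitize(String input, boolean discard) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (Character.isBmpCodePoint(cp)) {
                out.append((char) cp);      // fits in one UTF-16 char, safe to cast
            } else if (!discard) {
                out.append('\uFFFD');       // one replacement char per dropped code point
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "hi \uD83D\uDC73!";
        System.out.println(sanitize(s, true));              // hi !
        System.out.println(sanitize(s, false).length());    // 5
    }
}
```

Appending into one `StringBuilder` avoids the three-argument `IntStream.collect` call; both versions walk the string by code point, so a surrogate pair is always handled as one character.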
Or, in Groovy syntax:
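// Groovy iterates a String one UTF-16 code unit (char) at a time,
// so each supplementary character arrives as a high/low surrogate pair.
def transform = { String s ->
    char ch = s.charAt(0)
    if (Character.isHighSurrogate(ch))
        return '\uFFFD'   // first half of the pair: emit one replacement character
    else if (Character.isLowSurrogate(ch))
        return ''         // second half: drop it
    else
        return s          // BMP character: keep as-is
}
def result = utf8mb4string.collect(transform).join()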
Source: https://stackoverflow.com/questions/29635294/how-to-change-utf-8mb4-to-utf-8-in-groovy