I am facing a situation where I get surrogate characters in text that I am saving to MySQL 5.1. As UTF-16 is not supported there, I want to remove these surrogate pairs.
Here are a couple of things:
Character.isSurrogate(char c):
A char value is a surrogate code unit if and only if it is either a low-surrogate code unit or a high-surrogate code unit.
Checking for pairs seems pointless. Why not just remove all surrogates?
x == false is equivalent to !x
StringBuilder is a better choice than StringBuffer when you don't need synchronization (for example, a variable that never leaves local scope).
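To make the first point concrete, here is a small sketch of what surrogate code units look like from Java's side. The class and method names are made up for illustration, and U+1F600 is just a sample character outside the Basic Multilingual Plane. It uses Character.isSurrogate, so it assumes Java 7 or later.

```java
class SurrogateUnitsDemo {
    // Counts how many char values in s are surrogate code units (high or low).
    static int countSurrogateUnits(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) { // Java 7+
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so Java stores it as two chars.
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(s.length());                      // 4 chars
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(countSurrogateUnits(s));          // 2 surrogate code units
    }
}
```

One supplementary character always contributes exactly two surrogate chars, which is why removing every surrogate char also removes every pair.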
I suggest this:
public static String removeSurrogates(String query) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < query.length(); i++) {
        char c = query.charAt(i);
        // In Java 7+, this condition is simply !Character.isSurrogate(c)
        if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
            sb.append(c);
        }
    }
    return sb.toString();
}
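If you want to try it out, here is a self-contained sketch that wraps the same logic in a runnable class. The class name and the sample inputs are my own, not from your code. Note that it drops unpaired surrogates too, which I'm assuming is what you want for the database.

```java
class RemoveSurrogatesDemo {
    // Copies every char that is not a surrogate code unit into a new string.
    static String removeSurrogates(String query) {
        StringBuilder sb = new StringBuilder(query.length());
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A supplementary character (U+1F600) is stored as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(removeSurrogates("Hi " + emoji + "!")); // prints "Hi !"

        // A lone high surrogate with no partner is removed as well.
        String lone = "x" + (char) 0xD83D + "y";
        System.out.println(removeSurrogates(lone)); // prints "xy"

        // Characters inside the BMP, such as U+00E9, pass through unchanged.
        System.out.println(removeSurrogates("caf\u00E9").length()); // prints 4
    }
}
```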
if statement
You asked about this statement:
if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
    sb.append(c);
}
One way to understand it is to break each operation into its own function, so you can see that the combination does what you'd expect:
static boolean isSurrogate(char c) {
    return Character.isHighSurrogate(c) || Character.isLowSurrogate(c);
}
static boolean isNotSurrogate(char c) {
    return !isSurrogate(c);
}
...
if (isNotSurrogate(c)) {
    sb.append(c);
}
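If the negated "or" still looks odd, another way to read it is via De Morgan's law: !(a || b) is the same as !a && !b, i.e. "not a high surrogate and not a low surrogate". Since char only has 65,536 values, you can check that exhaustively. This is only a sketch; the class and method names are invented for the example.

```java
class NegationCheck {
    // The form used in the answer: negate the combined "or".
    static boolean isNotSurrogate(char c) {
        return !(Character.isHighSurrogate(c) || Character.isLowSurrogate(c));
    }

    // The De Morgan form: negate each test and combine with "and".
    static boolean isNotSurrogateDeMorgan(char c) {
        return !Character.isHighSurrogate(c) && !Character.isLowSurrogate(c);
    }

    // Counts char values for which the two forms disagree.
    static int countMismatches() {
        int mismatches = 0;
        // Loop with an int so the counter cannot wrap around at 0xFFFF.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (isNotSurrogate(c) != isNotSurrogateDeMorgan(c)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Counts char values that the condition would filter out.
    static int countSurrogates() {
        int surrogates = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            if (!isNotSurrogate((char) i)) {
                surrogates++;
            }
        }
        return surrogates;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches()); // 0: both forms always agree
        System.out.println(countSurrogates()); // 2048: the range U+D800..U+DFFF
    }
}
```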