How to remove surrogate characters in Java?

时光说笑 2020-12-14 04:16

I am facing a situation where I get surrogate characters in text that I am saving to MySQL 5.1. As UTF-16 is not supported there, I want to remove these surrogate pai

5 answers

南方客 (original poster)

2020-12-14 04:27
Java strings are stored as sequences of 16-bit chars, but what they represent is sequences of Unicode characters. In Unicode terminology, they are stored as UTF-16 code units but model code points. It is therefore somewhat meaningless to talk about removing surrogates, which do not exist in the character / code point representation (unless you have rogue single surrogates, in which case you have other problems).
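
To make the code unit / code point distinction concrete, here is a minimal sketch. The class name `CodeUnitDemo` and its helper methods are hypothetical names chosen for illustration; only `String.length()`, `String.codePointCount`, and the `Character.isHighSurrogate` / `Character.isLowSurrogate` checks come from the standard library.

```java
class CodeUnitDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points the string models.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if the string contains a "rogue" surrogate that is not part of a well-formed pair.
    static boolean hasLoneSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return true; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no high surrogate before it
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (one supplementary character), 'b'
        System.out.println(codeUnits(s));        // 4
        System.out.println(codePoints(s));       // 3
        System.out.println(hasLoneSurrogate(s)); // false
    }
}
```

The supplementary character occupies two chars in the string but counts as a single code point, which is why the two totals differ.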

Rather, what you want to do is remove any characters that will require surrogates when encoded, that is, any character that lies beyond the Basic Multilingual Plane (code points above U+FFFF). You can do that with a simple regular expression:
```
return query.replaceAll("[^\u0000-\uffff]", "");
```
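
For context, the one-liner can be wrapped in a complete method. The following is a sketch only: the class name `SupplementaryStripper` and both method names are hypothetical, and the loop variant is an alternative added here for comparison, not part of the original answer. Both variants leave lone surrogates in place, since those fall inside the U+0000 to U+FFFF range.

```java
class SupplementaryStripper {
    // Regex variant: the same expression as in the answer. Java's regex engine
    // matches by code point, so a surrogate pair is treated as one character above U+FFFF.
    static String stripWithRegex(String text) {
        return text.replaceAll("[^\u0000-\uffff]", "");
    }

    // Loop variant (an alternative, not from the answer): walk the string by
    // code point and keep only those inside the Basic Multilingual Plane.
    static String stripWithLoop(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isBmpCodePoint(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(stripWithRegex(input)); // ab
        System.out.println(stripWithLoop(input));  // ab
    }
}
```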