remove non-UTF-8 characters from xml with declared encoding=utf-8 - Java

再見小時候 2020-12-13 14:42

I have to handle this scenario in Java:

I'm getting a request in XML form from a client with declared encoding=utf-8. Unfortunately, it may contain non-UTF-8 characters...

6 answers

隐瞒了意图╮ (OP)

2020-12-13 15:33
UTF-8 is an encoding; Unicode is a character set. The GBP symbol (£) is most definitely in the Unicode character set, so it is certainly representable in UTF-8.
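
What usually breaks the parser is not the character itself but bytes written in a different encoding. The sketch below compares the bytes for £ (U+00A3) in UTF-8 and in ISO-8859-1, assuming, hypothetically, that the client actually wrote its text in ISO-8859-1 while declaring UTF-8. The class and method names (`PoundEncoding`, `hex`) are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class PoundEncoding {
    // Format each byte as two-digit uppercase hex, separated by spaces
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the GBP sign, code point U+00A3
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

This prints C2 A3 for UTF-8 and the single byte A3 for ISO-8859-1. A lone 0xA3 matches the UTF-8 continuation-byte pattern 10xxxxxx with no lead byte in front of it, so a UTF-8 decoder reports it as malformed input.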

If you do in fact mean UTF-8, and you are actually trying to remove byte sequences that are not a valid UTF-8 encoding of any character, you can decode the input with a CharsetDecoder that ignores malformed input:
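```
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// IGNORE drops invalid byte sequences instead of throwing
CharsetDecoder utf8Decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.IGNORE)
        .onUnmappableCharacter(CodingErrorAction.IGNORE);
ByteBuffer bytes = ...; // raw bytes of the request
CharBuffer parsed = utf8Decoder.decode(bytes); // declared to throw CharacterCodingException
```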
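
A self-contained version of the same idea might look like the sketch below. It decodes the raw request bytes as UTF-8 with both error actions set to IGNORE, then re-encodes the result so the output is guaranteed to be valid UTF-8. The class `StripInvalidUtf8`, the method `stripInvalidUtf8`, and the sample XML (with a stray ISO-8859-1 byte 0xA3) are hypothetical, chosen only to illustrate the technique.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StripInvalidUtf8 {
    // Decode as UTF-8, silently dropping malformed byte sequences,
    // then re-encode so the returned bytes are valid UTF-8.
    static byte[] stripInvalidUtf8(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(raw));
            return chars.toString().getBytes(StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are IGNORE, so decode() does not report errors
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical request body: declared UTF-8, but contains a lone ISO-8859-1 byte 0xA3
        byte[] prefix = "<?xml version=\"1.0\" encoding=\"utf-8\"?><price>".getBytes(StandardCharsets.UTF_8);
        byte[] suffix = "10</price>".getBytes(StandardCharsets.UTF_8);
        byte[] raw = new byte[prefix.length + 1 + suffix.length];
        System.arraycopy(prefix, 0, raw, 0, prefix.length);
        raw[prefix.length] = (byte) 0xA3;
        System.arraycopy(suffix, 0, raw, prefix.length + 1, suffix.length);

        byte[] cleaned = stripInvalidUtf8(raw);
        System.out.println(new String(cleaned, StandardCharsets.UTF_8));
    }
}
```

Running it prints `<?xml version="1.0" encoding="utf-8"?><price>10</price>`: the invalid byte is simply removed. Using CodingErrorAction.REPLACE instead would keep a U+FFFD marker where data was lost. If you know the client really sends ISO-8859-1, decoding with that charset would preserve the £ rather than discard it.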