remove non-UTF-8 characters from xml with declared encoding=utf-8 - Java

后端 未结 6 1564
再見小時候
再見小時候 2020-12-13 14:42

I have to handle this scenario in Java:

I\'m getting a request in XML form from a client with declared encoding=utf-8. Unfortunately it may contain not utf-8 charact

6条回答
  •  隐瞒了意图╮
    2020-12-13 15:33

    UTF-8 is an encoding; Unicode is a character set. But the GBP symbol is most definitely in the Unicode character set and therefore most certainly representable in UTF-8.

    If you do in fact mean UTF-8, and you are actually trying to remove byte sequences that are not the valid encoding of a character in UTF-8, then...

    CharsetDecoder utf8Decoder = Charset.forName("UTF-8").newDecoder();
    utf8Decoder.onMalformedInput(CodingErrorAction.IGNORE);
    utf8Decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
    ByteBuffer bytes = ...;
    CharBuffer parsed = utf8Decoder.decode(bytes);
    ...
    

提交回复
热议问题