Java DOM transforming and parsing arbitrary strings with invalid XML characters?

南笙 2021-01-19 06:02

First of all, I want to mention that this is not a duplicate of How to parse invalid (bad / not well-formed) XML? because I don't have a given invalid (or not well-formed) X

3 answers
  •  遇见更好的自我
    2021-01-19 06:44

    One technique is to encode the whole string as Base64-encoded-UTF8.

    But if the "special" characters are rare, that's a significant sacrifice in readability and file size.
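A minimal sketch of the Base64 approach, using only `java.util.Base64` from the JDK. The class and method names (`Base64Text`, `encode`, `decode`) are made up for illustration. One assumption to be aware of: `String.getBytes(UTF_8)` replaces unpaired surrogates with `?`, so this sketch only round-trips strings that are valid UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Text {

    // Encode an arbitrary string so that the result contains only
    // characters that are always legal in XML text (A-Z, a-z, 0-9, +, /, =).
    static String encode(String raw) {
        byte[] utf8 = raw.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Reverse of encode: Base64 text back to the original string.
    static String decode(String encoded) {
        byte[] utf8 = Base64.getDecoder().decode(encoded);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "a\u0000b";                       // contains NUL, illegal in XML 1.0
        String encoded = encode(raw);
        System.out.println(encoded);                   // safe to put in a text node
        System.out.println(decode(encoded).equals(raw));
    }
}
```

The encoded value can be stored with `setTextContent` like any other text; the reader has to know that the element holds Base64 and call `decode` after parsing.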

    Another technique is to represent each special character as a processing instruction that carries its codepoint number, for example 0 for the NUL character.
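The processing-instruction idea could look roughly like this with the JDK's DOM API. This is a sketch under assumptions, not the answerer's exact scheme: the target name `U`, the hexadecimal data format, and the names `PiEscaper`, `write`, `read` and `roundTrip` are all hypothetical. Characters outside the XML 1.0 `Char` production become processing instructions; everything else stays in ordinary text nodes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PiEscaper {

    static final String TARGET = "U"; // hypothetical PI target, chosen for this sketch

    // XML 1.0 "Char" production: the codepoints allowed in a document.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Append raw to parent as text nodes, with one PI per illegal codepoint.
    static void write(Element parent, String raw) {
        Document doc = parent.getOwnerDocument();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);
            i += Character.charCount(cp);
            if (isXmlChar(cp)) {
                run.appendCodePoint(cp);
                continue;
            }
            if (run.length() > 0) {
                parent.appendChild(doc.createTextNode(run.toString()));
                run.setLength(0);
            }
            parent.appendChild(doc.createProcessingInstruction(TARGET, Integer.toHexString(cp)));
        }
        if (run.length() > 0) {
            parent.appendChild(doc.createTextNode(run.toString()));
        }
    }

    // Rebuild the original string from the children written by write().
    static String read(Element parent) {
        StringBuilder out = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(n.getNodeValue());
            } else if (type == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) n;
                if (TARGET.equals(pi.getTarget())) {
                    out.appendCodePoint(Integer.parseInt(pi.getData().trim(), 16));
                }
            }
        }
        return out.toString();
    }

    // Build a document, serialize it, parse it again, and read the value back.
    static String roundTrip(String raw) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("value");
            doc.appendChild(root);
            write(root, raw);

            StringWriter xml = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(xml));

            Document parsed = builder.parse(new InputSource(new StringReader(xml.toString())));
            return read(parsed.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("a\u0000b").equals("a\u0000b"));
    }
}
```

The readable parts of the string stay readable in the file, which is the advantage over Base64 when special characters are rare. The cost is that every consumer has to understand the processing-instruction convention.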

    Another would be to use backslash escaping, for example \u0000 for codepoint 0, and of course \\ for backslash itself. This has the advantage that you can probably find existing library routines that do this for you (for example JSON conversion libraries). I can't imagine why your requirements say you can't use such libraries; but if you really can't, then it's not hard to write the code yourself.
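If no library is allowed, a hand-written version of the backslash scheme might look like the sketch below. The names `BackslashEscaper`, `escape` and `unescape` are invented for this example, and the choice to escape exactly the codepoints outside the XML 1.0 `Char` production (plus the backslash itself) is an assumption about what counts as "special".

```java
public class BackslashEscaper {

    // XML 1.0 "Char" production: the codepoints allowed in a document.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Double every backslash and replace each illegal codepoint with a
    // backslash, the letter u, and four hex digits.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\\') {
                out.append("\\\\");
            } else if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Illegal codepoints are always in the BMP, so four digits suffice.
                out.append(String.format("\\u%04x", cp));
            }
        }
        return out.toString();
    }

    // Reverse of escape; rejects malformed input instead of guessing.
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= escaped.length()) {
                throw new IllegalArgumentException("dangling backslash at index " + i);
            }
            char next = escaped.charAt(i + 1);
            if (next == '\\') {
                out.append('\\');
                i += 1;                       // skip the second backslash
            } else if (next == 'u' && i + 6 <= escaped.length()) {
                String hex = escaped.substring(i + 2, i + 6);
                out.append((char) Integer.parseInt(hex, 16));
                i += 5;                       // skip the u and the four hex digits
            } else {
                throw new IllegalArgumentException("bad escape at index " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "a\u0000b\\c";
        String safe = escape(raw);            // legal to store in a text node
        System.out.println(safe);
        System.out.println(unescape(safe).equals(raw));
    }
}
```

The escaped string can be written with `setTextContent` and recovered by calling `unescape` on the parsed text. Because the backslash itself is escaped, a literal backslash followed by `u0000` in the input is not confused with an escaped NUL.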
