I have searched Stack Overflow for this problem and found a few topics, but I feel like there isn't really a solid answer for me on this.
I have a form that users
You could HTML-parse the text and have it re-escaped with the respective numeric entities only (like: `&nbsp;` → `&#160;`). In any case, simply using unsanitized user input is a bad idea.
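A minimal sketch of that re-escaping step, in Python. The function name `to_numeric_entities` is made up for illustration, and the sketch assumes the input is plain text content (not markup you want to keep), so any `<`, `>`, or `&` left after decoding gets escaped again:

```python
import html
from xml.sax.saxutils import escape


def to_numeric_entities(text: str) -> str:
    """Re-express HTML named entities as XML-safe numeric references.

    Hypothetical helper: assumes `text` is text content, not markup.
    """
    # Decode every HTML entity (named or numeric) to its actual character.
    decoded = html.unescape(text)
    # Re-escape the characters that are special in XML text: & < >
    escaped = escape(decoded)
    # Replace anything outside ASCII with a decimal reference like &#160;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("caf&eacute;&nbsp;&amp; bar &rarr; <here>"))
```

Whether you want ASCII-only output depends on your file encoding; if the XML is written as UTF-8, the last step is optional.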
All of the numeric entities are allowed in XML (as long as they refer to a character XML itself permits); only the named ones known from HTML do not work (with the exception of `&amp;`, `&quot;`, `&lt;`, `&gt;`, and `&apos;`).
Most of the time, though, you can just write the actual character (`&ouml;` → `ö`) to the XML file, so there is no need to use an entity reference at all. If you are using a DOM API to manipulate your XML (and you should!), this is your safest bet.
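To illustrate the DOM route, here is a small Python sketch using the standard library's `xml.dom.minidom`. The element name `comment` and the function `build_comment_xml` are placeholders, not anything from the question. The point is that the text node takes the raw characters and the serializer handles the escaping:

```python
from xml.dom.minidom import Document, parseString


def build_comment_xml(user_text: str) -> str:
    """Wrap user text in a <comment> element (hypothetical element name)."""
    doc = Document()
    root = doc.createElement("comment")
    doc.appendChild(root)
    # Pass the actual characters; the DOM escapes & and < when serializing.
    root.appendChild(doc.createTextNode(user_text))
    return doc.toxml()


original = "Björn & Zoë say 1 < 2"
xml_out = build_comment_xml(original)
print(xml_out)

# Parsing the result back yields the original text unchanged.
round_trip = parseString(xml_out).documentElement.firstChild.data
```

If the input still contains HTML named entities, decode them first (for example with `html.unescape`) before handing the text to the DOM.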
Finally (this is the lazy developer solution) you could build a broken XML file (i.e. not well-formed, with entity errors) and just pass it through tidy for the necessary fix-ups. This may work or may fail depending on just how broken the whole thing is. In my experience, tidy is pretty smart, though, and lets you get away with a lot.