I need to dynamically construct an XPath query for an element attribute, where the attribute value is provided by the user. I'm unsure how to go about cleaning or sanitizi
I'd create a single-element XML document using a DOM, use the DOM to set the element's text to the provided value, and then grab the text out of the DOM's string representation of the XML. This will guarantee that all of the character escaping is done properly, and not just the character escaping that I'm happening to think about offhand.
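To make that concrete, here is a minimal sketch in Python using the standard library's `xml.dom.minidom`. The choice of Python and minidom is my assumption (the question doesn't name a platform), and the helper name `escape_via_dom` and the wrapper element name `v` are made up for illustration.

```python
from xml.dom.minidom import Document


def escape_via_dom(value: str) -> str:
    """Return `value` as the DOM serializes it inside a text node.

    Hypothetical helper: the name and the wrapper element "v" are
    illustrative only.
    """
    doc = Document()
    elem = doc.createElement("v")
    elem.appendChild(doc.createTextNode(value))
    doc.appendChild(elem)

    # Serialize just the element, e.g. <v>a &amp; b</v>
    xml = elem.toxml()

    # Strip the wrapper tags to leave only the escaped text.
    if xml == "<v/>":
        return ""
    return xml[len("<v>"):-len("</v>")]


if __name__ == "__main__":
    print(escape_via_dom("a & b < c"))
```

The point of the design is that the escaping rules live in the DOM implementation rather than in a hand-written list of replacements.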
Edit: The reason I would use the DOM in situations like this is that the people who wrote the DOM have read the XML recommendation and I haven't (at least, not with the level of care they have). To pick a trivial example, the DOM will report a parse error if the text contains a character that XML doesn't allow (like #x8), because the DOM's authors have implemented section 2.2 of the XML recommendation.
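As a rough illustration of the #x8 case, again assuming Python's standard library: minidom does not validate characters when you build nodes by hand, so this sketch goes through the parser instead, which is where the check happens in that implementation. The function name `is_legal_xml_text` is hypothetical.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape


def is_legal_xml_text(value: str) -> bool:
    """Check whether `value` survives a parse as XML character data.

    Hypothetical helper. Markup characters are escaped first so that
    only disallowed characters can make the parse fail.
    """
    candidate = "<v>" + escape(value) + "</v>"
    try:
        parseString(candidate)
    except ExpatError:
        # The parser rejected a character that XML does not allow.
        return False
    except UnicodeError:
        # Assumption: input that cannot be encoded for the parser
        # (for example a lone surrogate) may surface here instead.
        return False
    return True


if __name__ == "__main__":
    print(is_legal_xml_text("fine & dandy"))
    print(is_legal_xml_text("bad\x08char"))
```

Other DOM implementations may report the problem at a different point (on node creation or on serialization), so treat the parse step as one possible way to trigger the check.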
Now, I might say, "well, I'll just get the list of invalid characters from the XML recommendation, and strip them out of the input." Sure. Let's just look at the XML recommendation and...um, what the heck are the Unicode surrogate blocks? What kind of code do I have to write to get rid of them? Can they even get into my text in the first place?
Let's suppose I figure that out. Are there other aspects of how the XML recommendation specifies character representations that I don't know about? Probably. Will these have an impact on what I'm trying to implement? Maybe.
If I let the DOM do the character encoding for me, I don't have to worry about any of that stuff.