Remove all hexadecimal characters before loading string into XML Document Object?

Submitted by Deadly on 2020-02-26 11:57:08

Question


I have an XML string that is posted to an ashx handler on the server. The string is built on the client side from several entries made on a form. Occasionally users copy and paste text from other sources into the web form. When I try to load the XML string into an XmlDocument object using xmldoc.LoadXml(xmlStr), I get the following exception:

System.Xml.XmlException = {"'', hexadecimal value 0x0B, is an invalid character. Line 2, position 1."}

In debug mode I can see the rogue character (sorry, I'm not sure of its official name).

My question is: how can I sanitise the XML string before I attempt to load it into the XmlDocument object? Do I need a custom function to strip out these sorts of characters one by one, or can I use some native .NET 4 class to remove them?


Answer 1:


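Here is an example that removes characters that are not valid in XML 1.0, using Regex:

// Requires: using System.Text.RegularExpressions; using System.Xml;
xmlString = CleanInvalidXmlChars(xmlString);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlString);

public static string CleanInvalidXmlChars(string text)
{
    // Matches everything outside the XML 1.0 Char production: control characters
    // other than tab/LF/CR, U+FFFE, U+FFFF, and unpaired surrogates. \u is used for
    // 16-bit code units because .NET's \x escape takes exactly two hex digits.
    string re = @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]"
              + @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])"
              + @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]";
    return Regex.Replace(text, re, "");
}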
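
The question also asks whether a native .NET 4 class can do this. A minimal sketch of a regex-free variant, assuming XmlConvert.IsXmlChar (available since .NET 4.0) is acceptable for the per-character check; the names XmlSanitizer and RemoveInvalidXmlChars are hypothetical and not part of the original answer:

```csharp
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: returns a copy of text without the characters
    // that XML 1.0 forbids, keeping valid surrogate pairs intact.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (i + 1 < text.Length && char.IsSurrogatePair(c, text[i + 1]))
            {
                // A high/low surrogate pair encodes a character in
                // U+10000..U+10FFFF, which XML allows: keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (!char.IsSurrogate(c) && XmlConvert.IsXmlChar(c))
            {
                // Ordinary valid character; lone surrogates and control
                // characters such as 0x0B fall through and are dropped.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

It would be called in the same place as CleanInvalidXmlChars above, for example xmlDoc.LoadXml(XmlSanitizer.RemoveInvalidXmlChars(xmlString)).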



Answer 2:


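Another way to avoid the exception, without a separate pass over the string, is to turn off the CheckCharacters flag in XmlReaderSettings. Note that, according to the documentation for this property, setting it to false turns off checking of character entity references (such as &#x0B;); text content is still validated, so a raw invalid character in the string may still be rejected.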

var xmlDoc = new XmlDocument();
var xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };
using (var stringReader = new StringReader(xml)) {
    using (var xmlReader = XmlReader.Create(stringReader, xmlReaderSettings)) {
        xmlDoc.Load(xmlReader);
    }
}
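
Whether this setting helps depends on how the invalid character reaches the parser. A minimal sketch for checking that against your own input; XmlLoadProbe and TryLoad are hypothetical names introduced here, and the body simply wraps the code above in a try/catch:

```csharp
using System.IO;
using System.Xml;

public static class XmlLoadProbe
{
    // Hypothetical helper: tries to load xml with the given CheckCharacters
    // setting and reports the parser's message when loading fails.
    public static bool TryLoad(string xml, bool checkCharacters, out string error)
    {
        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
        try
        {
            using (var stringReader = new StringReader(xml))
            using (var xmlReader = XmlReader.Create(stringReader, settings))
            {
                new XmlDocument().Load(xmlReader);
            }
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // The message names the offending value with its line and position.
            error = ex.Message;
            return false;
        }
    }
}
```

Calling it with checkCharacters set to false, once on a string containing the entity &#x0B; and once on a string containing a raw 0x0B character, shows which of the two cases the setting covers in your environment. The documentation suggests only the first, in which case the raw character still has to be stripped before loading.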


Source: https://stackoverflow.com/questions/19399075/remove-all-hexadecimal-characters-before-loading-string-into-xml-document-object