Easiest way to remove invalid characters from an XML file?

Submitted by ぐ巨炮叔叔 on 2019-12-24 08:35:16

Question


I have an XML file with invalid characters. I searched the internet and haven't found any way other than reading the file as a text file and replacing the invalid characters one by one.

Can somebody please tell me the easiest way to remove invalid characters from an XML file?

Example XML stream:

<Year>where 12 > 13 occures </Year>

Answer 1:


I would try HtmlAgilityPack. It is at least better than trying to parse the file manually.

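using System;
using System.IO;
using System.Xml;

// Parse the malformed markup with HtmlAgilityPack's lenient HTML parser.
HtmlAgilityPack.HtmlDocument hdoc = new HtmlAgilityPack.HtmlDocument();
hdoc.LoadHtml("<Year>where 12 > 13 occures </Year>");

using (StringWriter wr = new StringWriter())
{
    // Re-serialize through an XmlWriter, which escapes special characters.
    using (XmlWriter xmlWriter = XmlWriter.Create(wr,
            new XmlWriterSettings() { OmitXmlDeclaration = true }))
    {
        hdoc.Save(xmlWriter);
    }
    // The XmlWriter has been disposed (and therefore flushed) at this point.
    Console.WriteLine(wr.ToString());
}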

This outputs the following (note that the element name comes out in lowercase):

<year>where 12 &gt; 13 occures </year>



Answer 2:


Start by thinking of the question differently. Your problem is that the input isn't valid XML. So you actually want to remove invalid characters from a non-XML file. That might sound pedantic, but it immediately indicates that tools designed for processing XML will be no use to you, because your input is not XML.

Fixing the problem at source is always better than trying to repair the damage later. But if you are going to embark on a repair strategy, the first thing is to define precisely what faults in the data you want to repair and how you intend to repair them. It's also a good idea to say clearly what constraints you apply to the solution: for example, does it matter if your repair accidentally changes the contents of any comments or CDATA sections?

Once you have defined your repair strategy (e.g. "replace any & by &amp; if it is not immediately followed by either #nn; or #xnn; or a name followed by ';'"), coding it up becomes quite straightforward.
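
As a rough illustration of that example rule, here is a minimal C# sketch using a regular expression with a negative lookahead. The class and method names (AmpersandRepair, EscapeBareAmpersands) are made up for this example, and the name pattern is an ASCII-only simplification of what XML allows in a name. It also makes no attempt to skip comments or CDATA sections, and it does not check whether a named entity is actually declared, so treat it as a starting point rather than a complete repair tool.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
public static class AmpersandRepair
{
    // Matches an '&' that is NOT immediately followed by one of:
    //   #nn;    decimal character reference
    //   #xhh;   hexadecimal character reference
    //   name;   entity reference (ASCII-only approximation of an XML name)
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!#[0-9]+;|#x[0-9A-Fa-f]+;|[A-Za-z_:][A-Za-z0-9_:.\-]*;)",
        RegexOptions.Compiled);

    // Replaces every bare '&' in the input with "&amp;" and leaves
    // anything that already looks like a reference untouched.
    public static string EscapeBareAmpersands(string text)
    {
        return BareAmpersand.Replace(text, "&amp;");
    }
}
```

For example, AmpersandRepair.EscapeBareAmpersands("Tom & Jerry &amp; co") returns "Tom &amp; Jerry &amp; co": the bare ampersand is escaped and the existing &amp; is left alone. Other faults, such as a stray '<' in text content, would need their own precisely defined rules in the same way.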



Source: https://stackoverflow.com/questions/9681084/easiest-way-to-remove-invalid-characters-from-a-xml-file
