XDocument: is it possible to force the load of a malformed XML file?

Submitted by 不羁的心 on 2019-12-19 20:00:20

Question


I have a malformed XML file: the root element is never closed, because its end tag is missing at the end of the file.

When I try to load the malformed XML file in C#:

StreamReader sr = new StreamReader(path);
batchFile = XDocument.Load(sr); // throws an exception

I get an exception "Unexpected end of file has occurred. The following elements are not closed: batch. Line 54, position 1."

Is it possible to ignore the missing closing tag or to force the load? I noticed that all my XML tools (like XML Notepad) automatically fix or ignore the problem. I cannot fix the XML file at its source: it comes from third-party software, and sometimes the file is correct.


Answer 1:


You can't do it with XDocument, because this class loads the whole document into memory and parses it completely.
It is possible, however, to process the document with XmlReader: it lets you read and process the document node by node, and you only get the missing-tag exception at the very end.
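The XmlReader approach could be sketched as follows. This is a minimal illustration rather than the answerer's own code: the class name TolerantReader and the method ReadElementNames are hypothetical, and the sketch only collects element names to show that every node before the missing end tag is still delivered.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class TolerantReader
{
    // Hypothetical helper: reads nodes one by one and records element names.
    // If the parser fails (for example on the missing end tag), the exception
    // is handed back through 'error' and the names read so far are returned.
    public static List<string> ReadElementNames(TextReader input, out XmlException error)
    {
        var names = new List<string>();
        error = null;
        using (XmlReader reader = XmlReader.Create(input))
        {
            try
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        names.Add(reader.Name);
                    }
                }
            }
            catch (XmlException ex)
            {
                // Raised at the end of the input when the root is not closed.
                error = ex;
            }
        }
        return names;
    }
}
```

Called with a StreamReader over the file from the question, this would return the elements that were read and leave the "unexpected end of file" exception in error for logging or inspection.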




Answer 2:


I suggest using Tidy.NET to clean up messy input.

Tidy.NET has a nice API for getting a list of problems (a TidyMessageCollection) in your 'XML', and you can use it to fix the text stream in memory. The simplest approach is to fix one error at a time, though that will not perform well when there are many errors. Alternatively, you can fix the errors in reverse document order, so that the offsets of the remaining messages stay valid while you apply the fixes.

Here is an example to convert HTML input into XHTML:

using System.IO;
using System.Text;
using TidyNet; // Tidy.NET namespace

Tidy tidy = new Tidy();

/* Set the options you want */
tidy.Options.DocType = DocType.Strict;
tidy.Options.DropFontTags = true;
tidy.Options.LogicalEmphasis = true;
tidy.Options.Xhtml = true;
tidy.Options.XmlOut = true;
tidy.Options.MakeClean = true;
tidy.Options.TidyMark = false;

/* Declare the parameters that are needed */
TidyMessageCollection tmc = new TidyMessageCollection();
MemoryStream input = new MemoryStream();
MemoryStream output = new MemoryStream();

/* Copy the markup into the input stream and rewind it before parsing */
byte[] byteArray = Encoding.UTF8.GetBytes("Put your HTML here...");
input.Write(byteArray, 0, byteArray.Length);
input.Position = 0;
tidy.Parse(input, output, tmc);

/* The cleaned-up markup is in the output stream; tmc holds the reported problems */
string result = Encoding.UTF8.GetString(output.ToArray());



Answer 3:


What you could do is add the closing tag to the XML in memory and then load it.

That is, after reading the file content through the StreamReader, manipulate the text before handing it to the XML loader.
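A minimal sketch of this idea, assuming the only defect is the missing end tag of the root element (named batch in the question's error message). The class LenientXml and the method ParseWithClosingTag are hypothetical names introduced for illustration; the input is parsed as-is first, so files that happen to be correct are left untouched.

```csharp
using System.Xml;
using System.Xml.Linq;

static class LenientXml
{
    // Hypothetical helper: tries a normal parse first and, only if that fails,
    // retries with the root's end tag appended. Assumes rootName is the name
    // of the unclosed root element and that nothing else is wrong.
    public static XDocument ParseWithClosingTag(string xml, string rootName)
    {
        try
        {
            return XDocument.Parse(xml);
        }
        catch (XmlException)
        {
            // If the repaired text is still malformed, this second parse
            // throws and the caller sees the real error.
            return XDocument.Parse(xml + "</" + rootName + ">");
        }
    }
}
```

With the file from the question, the call might look like LenientXml.ParseWithClosingTag(File.ReadAllText(path), "batch"). Hard-coding the root name keeps the sketch short; a more general repair would need to track which elements are still open.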



Source: https://stackoverflow.com/questions/5700618/xdocument-is-it-possible-to-force-the-load-of-a-malformed-xml-file
