Question
I have a program which runs tests and generates a grid-view with all the results in it, and also an XML log file. The program also has the functionality to load logs to replicate the grid-view.
Since the program writes to the log file as it's executing, the log file will be missing closing tags if the program crashes. I still want to be able to load these XML files, though, as they still contain lots of valuable data that can help me find out what caused the crash.
I was thinking maybe going through the XML file and closing off any unclosed XML tag, or maybe writing some kind of "Dirty" XML reader that would pretend every tag was closed. Any ideas on what I could do or how I should proceed?
Edit:
<Root>
  <Parent>
     <Child Name="One">
        <Foo>...</Foo>
        <Bar>...</Bar>
        <Baz>...</Baz>
     </Child>
     <Child Name="Two">
        <Foo>...</Foo>
        <Bar>...</Bar>
 !-- Crash happens here --!
From this I would still like to produce
 Child   Foo   Bar   Baz
 One     ...   ...   ...
 Two     ...   ...    /
Answer 1:
Presumably it's all valid until it's truncated... so using XmlReader could work... just be prepared to handle it going bang when it reaches the truncation point.
Now the XmlReader API isn't terribly pleasant (IMO) so you might want to move to the start of some interesting data (which would have to be complete in itself) and then call the XNode.ReadFrom(XmlReader) method to get that data in a simple-to-use form. Then move to the start of the next element and do the same, etc.
Sample code:
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
class Program
{
    static void Main(string[] args)
    {
        using (XmlReader reader = XmlReader.Create("test.xml"))
        {
            while (true)
            {
                while (reader.NodeType != XmlNodeType.Element ||
                    reader.LocalName != "Child")
                {
                    if (!reader.Read())
                    {
                        // No more nodes: stop instead of looping forever.
                        Console.WriteLine("Finished!");
                        return;
                    }
                }
                XElement element = (XElement) XNode.ReadFrom(reader);
                Console.WriteLine("Got child: {0}", element.Value);
            }
        }
    }
}
Sample XML:
<Root>
  <Parent>
    <Child>First child</Child>
    <Child>Second child</Child>
    <Child>Broken
Sample output:
Got child: First child
Got child: Second child
Unhandled Exception: System.Xml.XmlException: Unexpected end of file has occurred. The following elements are not closed: Child, Parent, Root. Line 5, position 18.
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XElement.ReadElementFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XNode.ReadFrom(XmlReader reader)
   at Program.Main(String[] args)
So obviously you'd want to catch the exception, but you can see that it managed to read the first two elements correctly.
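One way to catch that exception is sketched below. It is only an illustration of the idea above, not a definitive implementation: the class name PartialLogReader and the method ReadCompleteElements are made up for this example. It collects every complete element with the given name and stops quietly at the truncation point.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class PartialLogReader
{
    // Hypothetical helper: returns every complete <elementName> element
    // found before the point where the file was cut off.
    public static List<XElement> ReadCompleteElements(TextReader input, string elementName)
    {
        var results = new List<XElement>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            try
            {
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element &&
                        reader.LocalName == elementName)
                    {
                        // ReadFrom consumes the whole element and leaves the
                        // reader on the node after it, so no Read() here.
                        results.Add((XElement) XNode.ReadFrom(reader));
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
            catch (XmlException)
            {
                // Truncated input: keep the elements read so far.
            }
        }
        return results;
    }
}
```

Calling PartialLogReader.ReadCompleteElements(File.OpenText("test.xml"), "Child") on the sample XML would return the first two Child elements. Note that an element that is itself cut off is discarded as a whole, so in the question's log the second Child would be lost; reading a smaller unit such as Foo or Bar keeps more of the data.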
Answer 2:
As a last resort, and depending on what you're doing, you could use an HTML reader like HtmlAgilityPack (available on NuGet) or SGMLReader. SGMLReader will actually convert the input to an XmlDocument, so that might be closer to what you're looking for.
Of course, HTML isn't XML so you get what you get when using this method.
Answer 3:
Nothing in the Framework does this by default, nor is there a good solution available that will somehow parse arbitrary invalid XML.
The most sensible thing you can do is fix the XML before you start reading it. Since only the end is cut off, you should be able to work out which tags are still open and close them.
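A variation on "close whatever is still open" is sketched below. Rather than editing the file text, it rebuilds the tree node by node with XmlReader and simply stops at the truncation point, so any element that was still open ends up implicitly closed in the resulting XElement. The names TruncatedXml and LoadPartial are hypothetical, and the sketch assumes the log uses no XML namespaces, as in the sample from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class TruncatedXml
{
    // Hypothetical helper: rebuilds as much of the tree as can be parsed.
    // Assumption: element and attribute names carry no namespaces.
    public static XElement LoadPartial(TextReader input)
    {
        XElement root = null;
        var open = new Stack<XElement>(); // elements whose end tag is not seen yet

        using (XmlReader reader = XmlReader.Create(input))
        {
            try
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            var element = new XElement(reader.LocalName);
                            // Must be read before moving onto the attributes.
                            bool selfClosing = reader.IsEmptyElement;
                            while (reader.MoveToNextAttribute())
                            {
                                element.SetAttributeValue(reader.LocalName, reader.Value);
                            }
                            if (open.Count > 0) open.Peek().Add(element);
                            else root = element;
                            if (!selfClosing) open.Push(element);
                            break;

                        case XmlNodeType.Text:
                        case XmlNodeType.CDATA:
                            if (open.Count > 0) open.Peek().Add(new XText(reader.Value));
                            break;

                        case XmlNodeType.EndElement:
                            open.Pop();
                            break;
                    }
                }
            }
            catch (XmlException)
            {
                // Truncation point reached: keep what was built so far.
            }
        }
        return root; // null if not even the root start tag could be read
    }
}
```

With the log from the question, the returned tree would contain both Child elements, the second one holding only Foo and Bar. The grid can then be filled per Child, treating a missing Baz (where Element("Baz") returns null) as the "/" cell. A node that is cut off part-way through may be dropped.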
Source: https://stackoverflow.com/questions/9703852/reading-xml-with-unclosed-tags-in-c-sharp