Why does XmlSerializer throw an exception and raise a ValidationEvent when a schema validation error occurs inside IXmlSerializable.ReadXml()?

Posted by 核能气质少年 on 2021-02-17 04:59:46

Question


I have written some tests that read an XML file and validate it against an XSD schema. My data objects use a mix of attribute-based classes and custom IXmlSerializable implementations, and I am using XmlSerializer to perform deserialization.

My test inserts an unknown element into the XML so that it does not conform to the schema. I then check whether the validation event fires.

If the unknown element is placed in the XML as a child of one of the attribute-based data classes (i.e. the properties are decorated with XmlAttribute and XmlElement attributes), then the validation event fires correctly.

If, however, the unknown element is placed in the XML as a child of one of the IXmlSerializable classes, then a System.InvalidOperationException is thrown, although the validation event still fires.

The code inside the custom collection's ReadXmlElements creates a new XmlSerializer to read in the child items. The InvalidOperationException is thrown by its Deserialize call.

If I place a try..catch block around this call, the code gets stuck in an endless loop. The only solution appears to be to put a try..catch block around the top-level XmlSerializer.Deserialize call (as shown in the test).

Does anyone know why XmlSerializer behaves this way? Ideally I would like to catch the exception where it is thrown rather than in a top-level exception handler, so there is a secondary question: why does the code get stuck in an endless loop if a try..catch block is added to the collection class?

Here is the exception that is thrown:

System.InvalidOperationException: There is an error in XML document (13, 10). ---> System.InvalidOperationException: There is an error in XML document (13, 10). ---> System.InvalidOperationException: <UnknownElement xmlns='example'> was not expected.
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderGroup.Read1_Group()
   --- End of inner exception stack trace ---
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader)
   at XmlSerializerTest.EntityCollection~1.ReadXmlElements(XmlReader reader) in C:\source\repos\XmlSerializerTest\XmlSerializerTest\EntityCollection.cs:line 55
   at XmlSerializerTest.EntityCollection~1.ReadXml(XmlReader reader) in C:\Users\NGGMN9O\source\repos\XmlSerializerTest\XmlSerializerTest\EntityCollection.cs:line 41
   at System.Xml.Serialization.XmlSerializationReader.ReadSerializable(IXmlSerializable serializable, Boolean wrappedAny)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderExample.Read2_Example(Boolean isNullable, Boolean checkType)
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderExample.Read3_Example()
   --- End of inner exception stack trace ---
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader)
   at XmlSerializerTest.StackOverflowExample.InvalidElementInGroupTest() in C:\source\repos\XmlSerializerTest\XmlSerializerTest\XmlSerializerTest.cs:line 35

Schema.xsd

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:local="example"
           attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           targetNamespace="example"
           version="1.0"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!--  Attribute Groups -->
  <xs:attributeGroup name="Identifiers">
    <xs:attribute name="Id"
                  type="xs:string"
                  use="required" />
    <xs:attribute name="Name"
                  type="xs:string"
                  use="required" />
  </xs:attributeGroup>
  <!-- Complex Types -->
  <xs:complexType abstract="true"
                  name="Entity">
    <xs:sequence>
      <xs:element name="Description"
                  type="xs:string"
                  minOccurs="0"
                  maxOccurs="1" />
    </xs:sequence>
    <xs:attributeGroup ref="local:Identifiers" />
  </xs:complexType>
  <xs:complexType name="DerivedEntity">
    <xs:complexContent>
      <xs:extension base="local:Entity">
        <xs:attribute name="Parameter"
                      use="required" />
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Groups">
      <xs:sequence>
          <xs:element name="Group" type="local:Group" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Group">
    <xs:complexContent>
      <xs:extension base="local:Entity">
        <xs:sequence>
          <xs:element name="DerivedEntity"
                      type="local:DerivedEntity"
                      minOccurs="0"
                      maxOccurs="unbounded" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <!-- Main Schema Definition -->
  <xs:element name="Example">
      <xs:complexType>
          <xs:sequence>
              <xs:element name="Groups"
                          type="local:Groups"
                          minOccurs="1"
                          maxOccurs="1" />
          </xs:sequence>
      </xs:complexType>
  </xs:element>
</xs:schema>

InvalidElementInGroup.xml

<?xml version="1.0"?>
<Example xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="example">
    <Groups>
        <Group Name="abc" Id="123">
            <DerivedEntity Id="123" Name="xyz" Parameter="ijk">
                <Description>def</Description>
            </DerivedEntity>
            <DerivedEntity Id="234" Name="bob" Parameter="12"/>
        </Group>
        <Group Name="def" Id="124">
            <Description>This is a description.</Description>
        </Group>
        <UnknownElement/>
    </Groups>
</Example>

The Implementation

Note: The code shown in this example is not the production code. I know that I could just use a List<T> implementation, which supports serialization without needing to implement IXmlSerializable.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

namespace XmlSerializerTest
{
    public class Example
    {
        public Example()
        {
            Groups = new Groups();
        }

        public Groups Groups { get; set; }
    }

    public class Groups : EntityCollection<Group>
    {

    }
    public class Group : Entity, IXmlSerializable
    {
        private EntityCollection<DerivedEntity> entityCollection;

        public Group()
        {
            this.entityCollection = new EntityCollection<DerivedEntity>();
        }

        #region IXmlSerializable Implementation

        public XmlSchema GetSchema()
        {
            return null;
        }

        public void ReadXml(XmlReader reader)
        {
            reader.MoveToContent();

            // Read the attributes
            ReadXmlAttributes(reader);

            // Consume the start element
            bool isEmptyElement = reader.IsEmptyElement;
            reader.ReadStartElement();
            if (!isEmptyElement)
            {
                ReadXmlElements(reader);
                reader.ReadEndElement();
            }
        }

        /// <summary>
        /// Reads the XML elements.
        /// </summary>
        /// <param name="reader">The reader.</param>
        public override void ReadXmlElements(XmlReader reader)
        {
            // Handle the optional base class description element
            base.ReadXmlElements(reader);

            entityCollection.ReadXmlElements(reader);
        }

        public void WriteXml(XmlWriter writer)
        {
            throw new NotImplementedException();
        }

        #endregion
    }

    public class EntityCollection<T> : IXmlSerializable, IList<T> where T : Entity
    {
        private List<T> childEntityField;

        public EntityCollection()
        {
            childEntityField = new List<T>();
        }

        #region IXmlSerializable Implementation

        public XmlSchema GetSchema()
        {
            return null;
        }

        public void ReadXml(XmlReader reader)
        {
            reader.MoveToContent();

            // Read the attributes
            ReadXmlAttributes(reader);

            // Consume the start element
            bool isEmptyElement = reader.IsEmptyElement;
            reader.ReadStartElement();
            if (!isEmptyElement)
            {
                ReadXmlElements(reader);
                reader.ReadEndElement();
            }
        }

        public virtual void ReadXmlAttributes(XmlReader reader)
        {
        }

        public virtual void ReadXmlElements(XmlReader reader)
        {
            XmlSerializer deserializer = new XmlSerializer(typeof(T), "example");
            while (reader.IsStartElement())
            {
                T item = (T)deserializer.Deserialize(reader);  // throws an InvalidOperationException if an unknown element is encountered.
                if (item != null)
                {
                    Add(item);
                }
            }
        }

        public void WriteXml(XmlWriter writer)
        {
            throw new NotImplementedException();
        }
        #endregion

        #region IList Implementation

        public IEnumerator<T> GetEnumerator()
        {
            return childEntityField.GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return ((IEnumerable)childEntityField).GetEnumerator();
        }

        public void Add(T item)
        {
            childEntityField.Add(item);
        }

        public void Clear()
        {
            childEntityField.Clear();
        }

        public bool Contains(T item)
        {
            return childEntityField.Contains(item);
        }

        public void CopyTo(T[] array, int arrayIndex)
        {
            childEntityField.CopyTo(array, arrayIndex);
        }

        public bool Remove(T item)
        {
            return childEntityField.Remove(item);
        }

        public int Count => childEntityField.Count;

        public bool IsReadOnly => ((ICollection<T>)childEntityField).IsReadOnly;

        public int IndexOf(T item)
        {
            return childEntityField.IndexOf(item);
        }

        public void Insert(int index, T item)
        {
            childEntityField.Insert(index, item);
        }

        public void RemoveAt(int index)
        {
            childEntityField.RemoveAt(index);
        }

        public T this[int index]
        {
            get => childEntityField[index];
            set => childEntityField[index] = value;
        }

        #endregion
    }

    [System.Xml.Serialization.XmlIncludeAttribute(typeof(DerivedEntity))]
    public abstract class Entity
    {

        public string Description { get; set; }

        public string Id { get; set; }

        public string Name { get; set; }

        public virtual void ReadXmlAttributes(XmlReader reader)
        {
            Id = reader.GetAttribute("Id");
            Name = reader.GetAttribute("Name");
        }

        public virtual void ReadXmlElements(XmlReader reader)
        {
            if (reader.IsStartElement("Description"))
            {
                Description = reader.ReadElementContentAsString();
            }
        }
    }

    public class DerivedEntity : Entity
    {
        public string Parameter { get; set; }
    }
}

The Test

namespace XmlSerializerTest
{
    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Schema;
    using System.Xml.Serialization;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class StackOverflowExample
    {
        [TestMethod]
        [DeploymentItem(@"Schema.xsd")]
        [DeploymentItem(@"InvalidElementInGroup.xml")]
        public void InvalidElementInGroupTest()
        {
            // Open the file
            FileStream stream = new FileStream("InvalidElementInGroup.xml", FileMode.Open);

            // Configure settings
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.Schemas.Add(null, @"Schema.xsd");
            settings.ValidationType = ValidationType.Schema;
            settings.ValidationEventHandler += OnValidationEvent;

            XmlSerializer xmlDeserializer = new XmlSerializer(typeof(Example), "example");

            // Deserialize from the stream
            stream.Position = 0;
            XmlReader xmlReader = XmlReader.Create(stream, settings);

            try
            {
                Example deserializedObject = (Example)xmlDeserializer.Deserialize(xmlReader);
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception: " + e);
            }
        }

        private void OnValidationEvent(object sender, ValidationEventArgs e)
        {
            Console.WriteLine("Validation Event: " + e.Message);
        }
    }
}

Answer 1:


Your basic problem is that you have an abstract base class, Entity, whose inheritors sometimes implement IXmlSerializable and sometimes don't. When they do, they are stored in a collection that also implements IXmlSerializable and that mingles collection properties with collection children within its XML. Somewhere in the process of reading this XML you don't advance your XmlReader correctly, and deserialization fails.

When implementing IXmlSerializable you need to adhere to the rules stated in this answer to "Proper way to implement IXmlSerializable?" by Marc Gravell, as well as to the documentation:

For IXmlSerializable.WriteXml(XmlWriter):

The WriteXml implementation you provide should write out the XML representation of the object. The framework writes a wrapper element and positions the XML writer after its start. Your implementation may write its contents, including child elements. The framework then closes the wrapper element.

For IXmlSerializable.ReadXml(XmlReader):

The ReadXml method must reconstitute your object using the information that was written by the WriteXml method.

When this method is called, the reader is positioned on the start tag that wraps the information for your type. That is, directly on the start tag that indicates the beginning of a serialized object. When this method returns, it must have read the entire element from beginning to end, including all of its contents. Unlike the WriteXml method, the framework does not handle the wrapper element automatically. Your implementation must do so. Failing to observe these positioning rules may cause code to generate unexpected runtime exceptions or corrupt data.

Notice specifically that ReadXml() must entirely consume the container element. This turns out to be problematic in inheritance scenarios; is the base class responsible for consuming the outer element or the derived class? Furthermore, if some derived class improperly positions the XmlReader during reading, this may pass unnoticed by unit tests but cause subsequent data in the XML file to be ignored or corrupted in production.
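As a minimal sketch of these positioning rules (this is not the asker's code; Point2D, Pair and ReadXmlContractSketch are hypothetical names introduced only for illustration), the ReadXml below consumes the wrapper element itself, including the empty-element case, and returns with the reader just past the end tag. Round-tripping two adjacent values is one way to check that the first ReadXml call neither over-reads nor under-reads:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type used only to illustrate the ReadXml positioning contract.
public class Point2D : IXmlSerializable
{
    public int X { get; set; }
    public string Label { get; set; }

    public XmlSchema GetSchema() { return null; }

    public void ReadXml(XmlReader reader)
    {
        reader.MoveToContent();                 // reader is now ON the wrapper start tag
        string x = reader.GetAttribute("X");
        if (x != null)
            X = XmlConvert.ToInt32(x);

        bool isEmpty = reader.IsEmptyElement;   // must be checked BEFORE consuming the tag
        reader.ReadStartElement();              // consumes <A ...> or <A ... />
        if (isEmpty)
            return;

        XmlNodeType type;
        while ((type = reader.MoveToContent()) != XmlNodeType.EndElement
               && type != XmlNodeType.None)
        {
            if (type == XmlNodeType.Element && reader.LocalName == "Label")
                Label = reader.ReadElementContentAsString(); // also consumes </Label>
            else
                reader.Skip();                  // unknown child or text: step over it
        }
        reader.ReadEndElement();                // consume the wrapper end tag
    }

    public void WriteXml(XmlWriter writer)
    {
        // The framework has already written the wrapper start tag.
        writer.WriteAttributeString("X", XmlConvert.ToString(X));
        if (Label != null)
            writer.WriteElementString("Label", Label);
    }
}

// Hypothetical attribute-based container holding two IXmlSerializable members.
public class Pair
{
    public Point2D A { get; set; }
    public Point2D B { get; set; }
}

public static class ReadXmlContractSketch
{
    // Serializes and then deserializes a Pair; B only comes back intact
    // if A.ReadXml left the reader exactly after </A>.
    public static Pair RoundTrip(Pair input)
    {
        var serializer = new XmlSerializer(typeof(Pair));
        var buffer = new StringWriter();
        serializer.Serialize(buffer, input);
        using (var reader = XmlReader.Create(new StringReader(buffer.ToString())))
        {
            return (Pair)serializer.Deserialize(reader);
        }
    }
}
```

Because B has no Label, it is written as an empty element, so the same round trip also exercises the IsEmptyElement branch.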

Thus it makes sense to create an extension framework for reading and writing IXmlSerializable objects whose base and derived classes all have custom (de)serialization logic, in which the processing of the container element, each attribute, and each child element is separated:

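// Usings required by the code in this answer when it is compiled as a single file.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public static class XmlSerializationExtensions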
{
    public static void ReadIXmlSerializable(XmlReader reader, Func<XmlReader, bool> handleXmlAttribute, Func<XmlReader, bool> handleXmlElement, Func<XmlReader, bool> handleXmlText)
    {
        //https://docs.microsoft.com/en-us/dotnet/api/system.xml.serialization.ixmlserializable.readxml?view=netframework-4.8#remarks
        //When this method is called, the reader is positioned on the start tag that wraps the information for your type. 
        //That is, directly on the start tag that indicates the beginning of a serialized object. 
        //When this method returns, it must have read the entire element from beginning to end, including all of its contents. 
        //Unlike the WriteXml method, the framework does not handle the wrapper element automatically. Your implementation must do so. 
        //Failing to observe these positioning rules may cause code to generate unexpected runtime exceptions or corrupt data.
        reader.MoveToContent();
        if (reader.NodeType != XmlNodeType.Element)
            throw new XmlException(string.Format("Invalid NodeType {0}", reader.NodeType));
        if (reader.HasAttributes)
        {
            for (int i = 0; i < reader.AttributeCount; i++)
            {
                reader.MoveToAttribute(i);
                handleXmlAttribute(reader);
            }
            reader.MoveToElement(); // Moves the reader back to the element node.
        }
        if (reader.IsEmptyElement)
        {
            reader.Read();
            return;
        }
        reader.ReadStartElement(); // Advance to the first sub element of the wrapper element.
        while (reader.NodeType != XmlNodeType.EndElement)
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                using (var subReader = reader.ReadSubtree())
                {
                    subReader.MoveToContent();
                    handleXmlElement(subReader);
                }
                // ReadSubtree() leaves the reader positioned ON the end of the element, so read that also.
                reader.Read();
            }
            else if (reader.NodeType == XmlNodeType.Text || reader.NodeType == XmlNodeType.CDATA)
            {
                var type = reader.NodeType;
                handleXmlText(reader);
                // Ensure that the reader was not advanced.
                if (reader.NodeType != type)
                    throw new XmlException(string.Format("handleXmlText incorrectly advanced the reader to a new node {0}", reader.NodeType));
                reader.Read();
            }
            else // Whitespace, comment
            {
                // Skip() leaves the reader positioned AFTER the end of the node.
                reader.Skip();
            }
        }
        // Move past the end of the wrapper element
        reader.ReadEndElement();
    }

    public static void WriteIXmlSerializable(XmlWriter writer, Action<XmlWriter> writeAttributes, Action<XmlWriter> writeNodes)
    {
        //https://docs.microsoft.com/en-us/dotnet/api/system.xml.serialization.ixmlserializable.writexml?view=netframework-4.8#remarks
        //The WriteXml implementation you provide should write out the XML representation of the object. 
        //The framework writes a wrapper element and positions the XML writer after its start. Your implementation may write its contents, including child elements. 
        //The framework then closes the wrapper element.
        writeAttributes(writer);
        writeNodes(writer);
    }
}

Then, modify your data model as follows:

public class Constants
{
    public const string ExampleNamespace = "example";
}

[XmlRoot(Namespace = Constants.ExampleNamespace)]
public class Example
{
    public Example()
    {
        Groups = new Groups();
    }

    public Groups Groups { get; set; }
}

public class Groups : EntityCollection<Group>
{

}

public class EntityCollection<T> : IXmlSerializable, IList<T> where T : Entity
{
    private List<T> childEntityField;

    public EntityCollection()
    {
        childEntityField = new List<T>();
    }

    #region IXmlSerializable Implementation

    public XmlSchema GetSchema() { return null; }

    protected internal virtual bool HandleXmlAttribute(XmlReader reader) { return false; }

    protected internal virtual void WriteAttributes(XmlWriter writer) { }

    protected internal virtual bool HandleXmlElement(XmlReader reader)
    {
        var serializer = new XmlSerializer(typeof(T), Constants.ExampleNamespace);
        if (serializer.CanDeserialize(reader))
        {
            T item = (T)serializer.Deserialize(reader);
            if (item != null)
                Add(item);
            return true;
        }
        return false;
    }

    protected internal virtual void WriteNodes(XmlWriter writer)
    {
        var serializer = new XmlSerializer(typeof(T), Constants.ExampleNamespace);
        foreach (var item in this)
        {
            serializer.Serialize(writer, item);
        }
    }

    public void ReadXml(XmlReader reader)
    {
        XmlSerializationExtensions.ReadIXmlSerializable(reader, r => HandleXmlAttribute(r), r => HandleXmlElement(r), r => false);
    }

    public void WriteXml(XmlWriter writer)
    {
        XmlSerializationExtensions.WriteIXmlSerializable(writer, w => WriteAttributes(w), w => WriteNodes(w));
    }

    #endregion

    #region IList Implementation

    public IEnumerator<T> GetEnumerator()
    {
        return childEntityField.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable)childEntityField).GetEnumerator();
    }

    public void Add(T item)
    {
        childEntityField.Add(item);
    }

    public void Clear()
    {
        childEntityField.Clear();
    }

    public bool Contains(T item)
    {
        return childEntityField.Contains(item);
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        childEntityField.CopyTo(array, arrayIndex);
    }

    public bool Remove(T item)
    {
        return childEntityField.Remove(item);
    }

    public int Count { get { return childEntityField.Count; } }

    public bool IsReadOnly { get { return ((ICollection<T>)childEntityField).IsReadOnly; } }

    public int IndexOf(T item)
    {
        return childEntityField.IndexOf(item);
    }

    public void Insert(int index, T item)
    {
        childEntityField.Insert(index, item);
    }

    public void RemoveAt(int index)
    {
        childEntityField.RemoveAt(index);
    }

    public T this[int index]
    {
        get { return childEntityField[index]; }
        set { childEntityField[index] = value; }
    }

    #endregion
}

public class Group : Entity, IXmlSerializable
{
    private EntityCollection<DerivedEntity> entityCollection;

    public Group()
    {
        this.entityCollection = new EntityCollection<DerivedEntity>();
    }

    #region IXmlSerializable Implementation

    public XmlSchema GetSchema()
    {
        return null;
    }

    protected override bool HandleXmlElement(XmlReader reader)
    {
        if (base.HandleXmlElement(reader))
            return true;
        return entityCollection.HandleXmlElement(reader);
    }

    protected override void WriteNodes(XmlWriter writer)
    {
        base.WriteNodes(writer);
        entityCollection.WriteNodes(writer);
    }

    protected override bool HandleXmlAttribute(XmlReader reader)
    {
        if (base.HandleXmlAttribute(reader))
            return true;
        if (entityCollection.HandleXmlAttribute(reader))
            return true;
        return false;
    }

    protected override void WriteAttributes(XmlWriter writer)
    {
        base.WriteAttributes(writer);
        entityCollection.WriteAttributes(writer);
    }

    public void ReadXml(XmlReader reader)
    {
        XmlSerializationExtensions.ReadIXmlSerializable(reader, r => HandleXmlAttribute(r), r => HandleXmlElement(r), r => false);
    }

    public void WriteXml(XmlWriter writer)
    {
        XmlSerializationExtensions.WriteIXmlSerializable(writer, w => WriteAttributes(w), w => WriteNodes(w));
    }

    #endregion
}

public class DerivedEntity : Entity
{
    [XmlAttribute]
    public string Parameter { get; set; }
}

[System.Xml.Serialization.XmlIncludeAttribute(typeof(DerivedEntity))]
public abstract class Entity
{
    [XmlElement]
    public string Description { get; set; }

    [XmlAttribute]
    public string Id { get; set; }

    [XmlAttribute]
    public string Name { get; set; }

    protected virtual void WriteAttributes(XmlWriter writer)
    {
        if (Id != null)
            writer.WriteAttributeString("Id", Id);
        if (Name != null)
            writer.WriteAttributeString("Name", Name);
    }

    protected virtual bool HandleXmlAttribute(XmlReader reader)
    {
        if (reader.LocalName == "Id")
        {
            Id = reader.Value;
            return true;
        }
        else if (reader.LocalName == "Name")
        {
            Name = reader.Value;
            return true;
        }
        return false;
    }

    protected virtual void WriteNodes(XmlWriter writer)
    {
        if (Description != null)
        {
            writer.WriteElementString("Description", Description);
        }
    }

    protected virtual bool HandleXmlElement(XmlReader reader)
    {
        if (reader.LocalName == "Description")
        {
            Description = reader.ReadElementContentAsString();
            return true;
        }
        return false;
    }
}

And you will be able to deserialize and re-serialize Example successfully. Demo fiddle here.
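The guard used in EntityCollection<T>.HandleXmlElement above can be looked at in isolation. The following is only a sketch under assumptions: Item, the <Items> wrapper, and the decision to skip unrecognised elements are all hypothetical and chosen for illustration (the answer's code returns false instead of skipping). It also hints at the likely cause of the endless loop in the question: a failed Deserialize call does not appear to move the reader past the offending element, so swallowing the exception inside a while (reader.IsStartElement()) loop leaves the reader on the same node.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical attribute-based child type.
public class Item
{
    [XmlAttribute]
    public string Id { get; set; }
}

public static class CanDeserializeSketch
{
    // Reads every <Item> child of a hypothetical <Items> wrapper.
    // Names of children the serializer cannot read are added to 'skipped'.
    public static List<string> ReadItems(string xml, List<string> skipped)
    {
        var ids = new List<string>();
        var serializer = new XmlSerializer(typeof(Item));
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            reader.ReadStartElement("Items");       // consume the wrapper start tag
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                if (serializer.CanDeserialize(reader))
                {
                    // Deserialize leaves the reader after the element it read.
                    var item = (Item)serializer.Deserialize(reader);
                    ids.Add(item.Id);
                }
                else
                {
                    // Explicitly advance past the element; without this the
                    // loop would never make progress.
                    skipped.Add(reader.LocalName);
                    reader.Skip();
                }
            }
            reader.ReadEndElement();                // consume </Items>
        }
        return ids;
    }
}
```

Whether to skip, record, or reject an unrecognised element is a design choice; the point of the sketch is only that the reader has to be advanced explicitly in every branch of the loop.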

Notes:

  • Seriously consider simplifying this architecture. This is all too complicated.

  • A single validation event will be correctly raised for <UnknownElement/> inside <Groups>, as no such element appears in the schema.

  • XmlSerializer.Deserialize() will throw an InvalidOperationException when the root XML element name and namespace do not match the expected name and namespace. You can check whether the name and namespace are correct by calling XmlSerializer.CanDeserialize(XmlReader).

  • Be sure to test deserialization of XML with and without indentation. Sometimes a ReadXml() method will advance the reader one node too far, but if the XML contains insignificant indentation (i.e. formatting) then no harm will be done as only an insignificant whitespace node gets skipped.

  • When Entity.HandleXmlElement(XmlReader reader) is overridden in a derived class, the base class method should be called first. If the base class method handles the element, true is returned and the derived class should not try to handle it. Similarly, if the derived class handles the element, true should be returned to more derived classes indicating the element was handled. false is returned when neither the class nor the base class could handle the element.

  • XmlReader.ReadSubtree() can be used to ensure that some derived class cannot misposition the XmlReader inside HandleXmlElement(XmlReader reader).

  • If you use any constructor other than new XmlSerializer(Type) and new XmlSerializer(Type, String) to construct an XmlSerializer, you must construct it only once and cache it statically to avoid a severe memory leak. For the reason, see the documentation and "Memory Leak using StreamReader and XmlSerializer". Your sample code does not construct a serializer in this way, but your production code may.
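The caching advice in the last note could look roughly like the sketch below. It is an assumption-laden illustration rather than part of the original answer: CachedSerializers and Sample are hypothetical names, and the XmlSerializer(Type, XmlRootAttribute) constructor is used only as an example of a constructor outside the two that the note lists as safe.

```csharp
using System;
using System.Collections.Concurrent;
using System.Xml.Serialization;

// Hypothetical payload type for the usage example.
public class Sample
{
    public string Name { get; set; }
}

// Hypothetical static cache: one XmlSerializer per (type, root name, namespace).
public static class CachedSerializers
{
    private static readonly ConcurrentDictionary<Tuple<Type, string, string>, XmlSerializer> Cache =
        new ConcurrentDictionary<Tuple<Type, string, string>, XmlSerializer>();

    public static XmlSerializer For(Type type, string rootName, string rootNamespace)
    {
        var key = Tuple.Create(type, rootName, rootNamespace);
        // The factory only runs when the key is missing, so repeated calls
        // with the same arguments reuse the instance stored in the cache.
        return Cache.GetOrAdd(key, k =>
            new XmlSerializer(k.Item1, new XmlRootAttribute(k.Item2) { Namespace = k.Item3 }));
    }
}
```

A caller would then write CachedSerializers.For(typeof(Sample), "Sample", "example") wherever it would otherwise have constructed a new serializer with a root override.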



Source: https://stackoverflow.com/questions/60449088/why-does-xmlserializer-throws-an-exception-and-raise-a-validationevent-when-a-sc
