XmlReader read continually

Submitted by 久未见 on 2021-02-10 03:20:02

Question


I have a very large XML file. Here is a simplified version of its format:

<?xml version='1.0' encoding='UTF-8'?>
<Sender>
 <SenderID>571099948</SenderID>
 <Sponsors>
  <Sponsor>
    <SponsorID>TEST01</SponsorID>
    <Contracts>
      <Contract>
        <ContractID>000001</ContractID>
        <Member>
          <SSN>1111111111</SSN>
          <Gender>M</Gender>
          <Benefits>
            <Benefit BenefitType="AAA">
            </Benefit>
            <Benefit BenefitType="BBB">
            </Benefit>
          </Benefits>
        </Member>
        <Member>
          <SSN>4444444444</SSN>
          <Gender>F</Gender>
          <Benefits>
            <Benefit BenefitType="AAA">
            </Benefit>
          </Benefits>
        </Member>
      </Contract>
      <Contract>
        <ContractID>0000002</ContractID>
        <Member>
          <SSN>2222222222</SSN>
          <Gender>F</Gender>
          <Benefits>
            <Benefit BenefitType="CCC">
            </Benefit>
            <Benefit BenefitType="DDD">
            </Benefit>
          </Benefits>
        </Member>
      </Contract>
      <Contract>
        <ContractID>0000003</ContractID>
        <Member>
          <SSN>333333333</SSN>
          <Gender>F</Gender>
          <Benefits> 
            <Benefit BenefitType="CCC">
            </Benefit>
          </Benefits>
        </Member>
      </Contract>
    </Contracts>
  </Sponsor>
  <Sponsor>
    <SponsorID>TEST02</SponsorID>
    <Contracts>
      <Contract>
        <ContractID>0000011</ContractID>
        <Member>
          <SSN>1111111111</SSN>
          <Gender>M</Gender>
          <Benefits>
          </Benefits>
        </Member>
      </Contract>
      <Contract>
        <ContractID>0000002</ContractID>
        <Member>
          <SSN>2222222222</SSN>
          <Gender>F</Gender>
          <Benefits>
          </Benefits>
        </Member>
      </Contract>
    </Contracts>
  </Sponsor>
</Sponsors>
</Sender>

I want to get all the information in each Contract node, as well as the SponsorID from its parent Sponsor node. Here is the code I use to read the XML file piece by piece with XmlReader:

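    // Requires: using System.Collections.Generic; using System.Xml; using System.Xml.Linq;
    static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(inputUrl))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // XNode.ReadFrom consumes the whole element and leaves the reader on
                    // the node after it, so Read() is not called here; otherwise an
                    // immediately adjacent sibling element would be skipped.
                    XElement el = XNode.ReadFrom(reader) as XElement;
                    if (el != null)
                    {
                        yield return el;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }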

Here is the problem. I cannot use the following, because a single Sponsor subtree may be too large to fit in memory:

var sponsor = SimpleStreamAxis(file, "Sponsor");

I cannot use this either, because a Contract node on its own does not tell me its SponsorID:

var contract = SimpleStreamAxis(file, "Contract");

Is there a way to read the SponsorID of a Sponsor, move the cursor forward and read all the Contract nodes under that Sponsor, then move on to the next Sponsor, read its SponsorID and Contract nodes, and so on?


Answer 1:


Try this:

// Requires: using System; using System.Xml; using System.Xml.Linq;
using (XmlReader xmlReader = XmlReader.Create("file.xml"))
{
    // Each ReadToFollowing call advances to the next SponsorID in document order.
    while (xmlReader.ReadToFollowing("SponsorID"))
    {
        string sponsorId = xmlReader.ReadElementContentAsString();

        // process SponsorID
        Console.WriteLine(sponsorId);

        if (xmlReader.ReadToFollowing("Contract"))
        {
            do
            {
                // ReadSubtree limits loading to this one Contract element; disposing
                // the subtree reader leaves xmlReader on the </Contract> end tag.
                using (XmlReader contractSubtree = xmlReader.ReadSubtree())
                {
                    XElement contractElement = XElement.Load(contractSubtree);

                    // process Contract
                    Console.WriteLine(contractElement.Element("ContractID"));
                }
            } while (xmlReader.ReadToNextSibling("Contract"));
        }
    }
}
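For reference, here is a sketch of the same ReadToFollowing / ReadSubtree / ReadToNextSibling approach packaged as an iterator that yields (SponsorID, Contract) pairs instead of printing them. The class name SponsorContractReader and the tuple shape are hypothetical, not part of the answer. Like the answer, it assumes each Sponsor's SponsorID comes before its Contract elements and that every Sponsor contains at least one Contract; otherwise ReadToFollowing("Contract") would run on into the next Sponsor's contracts and attribute them to the wrong SponsorID.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SponsorContractReader
{
    // Hypothetical helper built on the approach in Answer 1. Yields each Contract
    // element together with the SponsorID that precedes it. Assumes every Sponsor
    // has a SponsorID before its Contracts and contains at least one Contract.
    public static IEnumerable<(string SponsorId, XElement Contract)> Read(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Jump to the next SponsorID anywhere after the current position.
            while (reader.ReadToFollowing("SponsorID"))
            {
                // Reads the text content and moves past </SponsorID>.
                string sponsorId = reader.ReadElementContentAsString();

                if (!reader.ReadToFollowing("Contract"))
                    yield break;

                do
                {
                    // Materialize only this Contract; disposing the subtree reader
                    // leaves the outer reader on </Contract>.
                    using (XmlReader contractReader = reader.ReadSubtree())
                    {
                        yield return (sponsorId, XElement.Load(contractReader));
                    }
                } while (reader.ReadToNextSibling("Contract"));
            }
        }
    }
}
```

Any TextReader can be passed in, for example a StreamReader over the large file; with default settings XmlReader.Create does not close it, so the caller remains responsible for disposing it. Only one Contract element is loaded into an XElement at a time.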



Answer 2:


Yes, this can be done assuming that SponsorID always precedes the Contract nodes.

The basic idea is to read through the XML file until you find elements named "SponsorID" or "Contract", then yield them for higher-level processing:

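    // Requires: using System.Collections.Generic; using System.Xml; using System.Xml.Linq;
    public static IEnumerable<XElement> StreamNamedElements(XmlReader reader, IEnumerable<XName> names)
    {
        var nameSet = new HashSet<XName>(names);

        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
            {
                // XNode.ReadFrom leaves the reader on the node after the element,
                // so Read() is not called here; otherwise an adjacent sibling would be skipped.
                XElement el = XNode.ReadFrom(reader) as XElement;
                if (el != null)
                    yield return el;
            }
            else
            {
                reader.Read();
            }
        }
    }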

In cases where SponsorID is always present and precedes Contract, this will enumerate through these elements correctly. However, if a sponsor ID is missing or out of order, the sponsor ID from a previous sponsor might get picked up. This error can be trapped by restricting the scope of each "SponsorID" to the containing "Sponsor" element using ReadSubtree():

    public static IEnumerable<XmlReader> StreamNamedSubtrees(XmlReader reader, IEnumerable<XName> names)
    {
        var nameSet = new HashSet<XName>(names);

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
            {
                var subReader = reader.ReadSubtree();
                yield return subReader;
                ((IDisposable)subReader).Dispose(); // Be sure to advance to the end of the subtree if the caller did not.
            }
        }
    }
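The comment on the Dispose call relies on documented XmlReader.ReadSubtree behavior: once the subtree reader is closed, the original reader sits on the end tag of the subtree's root element, even if the caller never read from the subtree. A small sketch to observe this (SubtreeDisposeDemo and PositionAfterDispose are hypothetical names):

```csharp
using System;
using System.IO;
using System.Xml;

public static class SubtreeDisposeDemo
{
    // Hypothetical demo: position the outer reader on the first <elementName>
    // start tag, create a subtree reader, dispose it without reading anything,
    // and report where the outer reader ended up.
    public static (XmlNodeType NodeType, string Name) PositionAfterDispose(string xml, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            if (!reader.ReadToFollowing(elementName))
                throw new ArgumentException("Element not found: " + elementName);

            using (XmlReader subtree = reader.ReadSubtree())
            {
                // Intentionally left empty: the subtree is never read.
            }

            // Disposing the subtree reader advanced the outer reader to the
            // end tag of the subtree root (for a non-empty element).
            return (reader.NodeType, reader.Name);
        }
    }
}
```

This is why StreamNamedSubtrees can simply call reader.Read() on its next iteration: it always resumes after the Sponsor end tag, whether or not the caller consumed the subtree.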

Then use the two methods together like this:

        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr))
        {
            foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" }))
            {
                XElement sponsorID = null;
                foreach (var el in StreamNamedElements(subReader, new[] { (XName)"SponsorID", (XName)"Contract" }))
                {
                    if (el.Name == "SponsorID")
                    {
                        sponsorID = el;
                    }
                    else if (el.Name == "Contract")
                    {
                        if (sponsorID == null)
                            throw new InvalidOperationException();
                        // Example "higher processing"
                        Debug.WriteLine(string.Format("{0}: {1}", sponsorID.Value, el.ToString()));
                    }
                }
            }
        }
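A caveat that applies to both SimpleStreamAxis in the question and StreamNamedElements in this answer: XNode.ReadFrom consumes the whole element and leaves the reader on the node after it. A loop of the form while (reader.Read()) { ... XNode.ReadFrom(reader) ... } therefore steps over that node, and when two matching elements are adjacent with no whitespace between them, the second one is silently skipped. The sketch below (hypothetical class ReadFromSkipDemo) contrasts that loop shape with one that only calls Read() when ReadFrom was not used:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ReadFromSkipDemo
{
    // Naive shape: Read() runs again right after XNode.ReadFrom has already
    // advanced the reader, so an immediately adjacent sibling is skipped.
    public static List<string> Naive(string xml, string name)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    if (XNode.ReadFrom(reader) is XElement el)
                        result.Add(el.Value);
                }
            }
        }
        return result;
    }

    // Safe shape: advance with Read() only when ReadFrom did not already do so.
    public static List<string> Safe(string xml, string name)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    if (XNode.ReadFrom(reader) is XElement el)
                        result.Add(el.Value);
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return result;
    }
}
```

With indented input such as the sample in the question, the whitespace node between siblings absorbs the extra Read(), which is why the naive loop usually appears to work.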


Source: https://stackoverflow.com/questions/31062289/xmlreader-read-continually
