Efficient Parsing of XML

Submitted by 北战南征 on 2019-12-29 09:32:22

Question


Good day,

I'm writing a C# .NET program to manage the products in my store.

From a given link I can retrieve an XML file that contains all the products I could list on my storefront.

The XML structure looks like this:

<Product StockCode="103-10440">
    <lastUpdated><![CDATA[Fri, 20 May 2016 17:00:03 GMT]]></lastUpdated>
    <StockCode><![CDATA[103-10440]]></StockCode>
    <Brand><![CDATA[3COM]]></Brand>
    <BrandID><![CDATA[14]]></BrandID>
    <ProdName><![CDATA[BIG FLOW BLOWING JUNCTION FLEX BLOCK, TAKES 32, 40]]>     </ProdName>
    <ProdDesc/>
    <Categories>
        <TopCat><![CDATA[Accessories]]></TopCat>
        <TopCatID><![CDATA[24]]></TopCatID>
    </Categories>
    <ProdImg/>
    <ProdPriceExclVAT><![CDATA[30296.79]]></ProdPriceExclVAT>
    <ProdQty><![CDATA[0]]></ProdQty>
    <ProdExternalURL><![CDATA[http://pinnacle.eliance.co.za/#!/product/4862]]></ProdExternalURL>
</Product>

These are the entries I need:

  • lastUpdated
  • StockCode
  • Brand
  • ProdName
  • ProdDesc
  • TopCat (nested inside the Categories element)
  • ProdImg
  • ProdPriceExclVAT
  • ProdQty
  • ProdExternalURL

This is straightforward to handle, and in fact I already did:

public ProductList Parse() {

    XmlDocument doc = new XmlDocument();
    doc.Load(XMLLink);

    XmlNodeList ProductNodeList = doc.GetElementsByTagName("Product");
    foreach (XmlNode node in ProductNodeList) {
        Product Product = new Product();

        for (int i = 0; i < node.ChildNodes.Count; i++) {
            if (node.ChildNodes[i].Name == "StockCode") {
                Product.VariantSKU = Convert.ToString(node.ChildNodes[i].InnerText);
            }
            if (node.ChildNodes[i].Name == "Brand") {
                Product.Vendor = Convert.ToString(node.ChildNodes[i].InnerText);
            }
            if (node.ChildNodes[i].Name == "ProdName") {
                Product.Title = Convert.ToString(node.ChildNodes[i].InnerText);
                Product.SEOTitle = Product.Title;
                Product.Handle = Product.Title;
            }
            if (node.ChildNodes[i].Name == "ProdDesc") {
                Product.Body = Convert.ToString(node.ChildNodes[i].InnerText);
                Product.SEODescription = Product.Body;
                if (Product.Body == "") {
                    Product.Body = "ERROR";
                    Product.SEODescription = "ERROR";
                }
            }
            if (node.ChildNodes[i].Name == "Categories") {
                if (!tempList.Categories.Contains(node.ChildNodes[i].ChildNodes[0].InnerText)) {
                    if (!tempList.Categories.Contains("All")) {
                        tempList.Categories.Add("All");
                    }
                    tempList.Categories.Add(node.ChildNodes[i].ChildNodes[0].InnerText);
                }

                Product.Type = Convert.ToString(node.ChildNodes[i].ChildNodes[0].InnerText);
            }
            if (node.ChildNodes[i].Name == "ProdImg") {
                Product.ImageSrc = Convert.ToString(node.ChildNodes[i].InnerText);
                if (Product.ImageSrc == "") {
                    Product.ImageSrc = "ERROR";
                }
                Product.ImageAlt = Product.Title;
            }
            if (node.ChildNodes[i].Name == "ProdPriceExclVAT") {
                float baseprice = float.Parse(node.ChildNodes[i].InnerText);
                double Costprice = ((baseprice * 0.14) + (baseprice * 0.15) + baseprice);
                Product.VariantPrice = Costprice.ToString("0.##");
            }
        }
        Product.Supplier = "Pinnacle";
        if (!tempList.Suppliers.Contains(Product.Supplier)) {
            tempList.Suppliers.Add(Product.Supplier);
        }
        tempList.Products.Add(Product);
    }
    return tempList;
}
} // closes the enclosing class, whose declaration is not shown
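
A side note on the price calculation in the loop above: adding 14% and 15% of the base price to the base price is the same as multiplying by 1.29, and float.Parse without a culture argument depends on the machine's regional settings. The following is only a sketch of that one step. PriceSketch and ToVariantPrice are hypothetical names, and the switch to decimal and InvariantCulture is an assumption about what is wanted, not part of the original code.

```csharp
using System;
using System.Globalization;

public static class PriceSketch
{
    // Hypothetical helper: applies the same 14% + 15% markup as the loop above
    // (base price * 1.29), using decimal instead of float/double.
    public static string ToVariantPrice(string rawPrice)
    {
        // InvariantCulture makes "30296.79" parse identically on every machine,
        // including ones whose regional settings use a decimal comma.
        decimal basePrice = decimal.Parse(rawPrice, NumberStyles.Number, CultureInfo.InvariantCulture);
        decimal costPrice = basePrice * 1.29m;
        // Assumption: the output should also use a decimal point regardless of culture.
        return costPrice.ToString("0.##", CultureInfo.InvariantCulture);
    }
}
```

With the sample product, PriceSketch.ToVariantPrice("30296.79") returns "39082.86".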

The problem, however, is that this approach takes about 10 seconds to finish, and this is only the first of several such files that I have to parse.

I am looking for the most efficient way to parse this XML file and extract the data from all the fields listed above.

EDIT: I benchmarked the code twice, once with a pre-downloaded copy of the file and once downloading the file from the server at runtime:

  • With local copy: 5 seconds.

  • Without local copy: 7.30 seconds.
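
Taken at face value, the two timings differ by about 2.3 seconds, which would be the download; the remaining 5 seconds or so go to loading and walking the document. One way to confirm that split is to time the parsing step alone on text that is already in memory. The sketch below does that with Stopwatch. TimingSketch, Measure and TimeParse are hypothetical names, and it only times XmlDocument.LoadXml, not the field-copying loop.

```csharp
using System;
using System.Diagnostics;
using System.Xml;

public static class TimingSketch
{
    // Runs an action once and returns how long it took.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Times only the DOM load for XML text that is already in memory,
    // so network and disk time are excluded from the result.
    public static TimeSpan TimeParse(string xml)
    {
        return Measure(() =>
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
        });
    }
}
```

Reading the file into a string once (for example with File.ReadAllText) and passing it to TimingSketch.TimeParse gives a number that can be compared directly against the 5-second local run.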


Answer 1:


With large XML files you should use an XmlReader, which streams through the document instead of loading all of it into memory the way XmlDocument does. The code below reads one Product at a time.

using System;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Dispose the reader (and the underlying file handle) when done.
            using (XmlReader reader = XmlReader.Create("filename"))
            {
                while (!reader.EOF)
                {
                    // ReadFrom leaves the reader on the node after </Product>,
                    // which may already be the next <Product>; only skip ahead otherwise.
                    if (reader.Name != "Product")
                    {
                        reader.ReadToFollowing("Product");
                    }
                    if (!reader.EOF)
                    {
                        // Materialize just this one Product as an XElement.
                        XElement product = (XElement)XNode.ReadFrom(reader);
                        string lastUpdated = (string)product.Element("lastUpdated");
                    }
                }
            }
        }
    }
}
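
The loop above stops at reading lastUpdated. As a rough sketch of how the same streaming approach could collect every field listed in the question, including TopCat nested under Categories, something like the following might work. ProductRecord, StreamingSketch and ReadProducts are hypothetical names chosen for illustration; they are not the asker's Product or ProductList types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical container for the fields listed in the question.
public class ProductRecord
{
    public string LastUpdated, StockCode, Brand, ProdName, ProdDesc, TopCat,
                  ProdImg, ProdPriceExclVAT, ProdQty, ProdExternalURL;
}

public static class StreamingSketch
{
    public static List<ProductRecord> ReadProducts(TextReader input)
    {
        var products = new List<ProductRecord>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Product")
                {
                    // Loads only this Product subtree; afterwards the reader sits
                    // on whatever node follows the closing </Product> tag.
                    XElement p = (XElement)XNode.ReadFrom(reader);
                    products.Add(new ProductRecord
                    {
                        // Casting a missing element to string yields null;
                        // an empty element such as <ProdDesc/> yields "".
                        LastUpdated = (string)p.Element("lastUpdated"),
                        StockCode = (string)p.Element("StockCode"),
                        Brand = (string)p.Element("Brand"),
                        // The sample has stray spaces after the CDATA section.
                        ProdName = ((string)p.Element("ProdName"))?.Trim(),
                        ProdDesc = (string)p.Element("ProdDesc"),
                        TopCat = (string)p.Element("Categories")?.Element("TopCat"),
                        ProdImg = (string)p.Element("ProdImg"),
                        ProdPriceExclVAT = (string)p.Element("ProdPriceExclVAT"),
                        ProdQty = (string)p.Element("ProdQty"),
                        ProdExternalURL = (string)p.Element("ProdExternalURL")
                    });
                }
                else
                {
                    // Not a Product start tag: advance by one node.
                    reader.Read();
                }
            }
        }
        return products;
    }
}
```

It could be called as StreamingSketch.ReadProducts(File.OpenText("filename")). Only one Product subtree is held as an XElement at a time, which is the point of the XmlReader approach.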


Source: https://stackoverflow.com/questions/37503602/efficient-parsing-of-xml
