Complex nested XML Parsing in C#

Posted by 此生再无相见时 on 2020-01-05 10:30:12

Question


<ndActivityLog repositoryId="AA-AAAA1AAA" repositoryName="Company Name" startDate="2013-07-05" endDate="2013-07-06">
    <activity date="2013-07-05T06:42:35" name="open" host="00.00.00.00">
        <user id="joebloggs@email.com" name="Joe Bloggs" memberType="I" /> 
        <storageObject docId="0000-0000-0000" name="Opinion" size="356864" fileExtension="doc">
            <cabinet name="Client and Matters">NG-5MIYABBV</cabinet> 
            <DocumentType>Legal Document</DocumentType> 
            <Author>Joe Bloggs</Author> 
            <Matter>1001</Matter> 
            <Client>R1234</Client> 
        </storageObject>
    </activity>
</ndActivityLog>

This is an example of the XML. There are around 4,000 "activity" elements in the document, with varying levels of content. Some have the "Client" and "Matter" elements; others don't. Thinking of it as a table, the missing values would be blank cells, but the column headers would still be there.

I essentially need to parse this into a SQL database while keeping the data structure. In addition, if an element is missing from a given activity, the result needs to record that fact and leave it as a "blank cell".

var doc = XDocument.Load(path + "\\" + file + ".xml");

var root = doc.Root;
foreach (XElement el in root.Elements())
{
    // Console.WriteLine(el.Nodes());
    // Console.WriteLine(el.Value);
    // Console.WriteLine("  Attributes:");
    foreach (XAttribute attr in el.Attributes())
    {
        Console.WriteLine(attr);
        // Console.WriteLine(el.Elements("id"));
    }

    Console.WriteLine("---------------------------");

    // foreach (XElement element in el.Elements())
    // {
    //     Console.WriteLine("    {0}: {1}", element.Name, element.Value);
    // }
}

// hold console open
Console.ReadLine();

This is my code so far. Its output is shown below:

date="2013-07-06T17:07:42"
name="open"
host="213.146.142.50"

I need to extract every piece of information so that I can store it in a table layout. I'm fairly new to XML parsing, so any help would be appreciated.


Answer 1:


Only you know the permitted element names (cabinet...Client). The simple brute-force way is to look up each expected name explicitly; then you know which ones are missing and can set those cells to empty. A foreach only iterates over what is actually present on each element, so it cannot guess the missing ones.
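
A minimal sketch of the brute-force lookup described above, assuming the expected child elements of storageObject are the five names that appear in the question's sample. The RowExtractor class and the ExpectedFields array are hypothetical names introduced for illustration, not part of the original post. Each expected name is looked up explicitly, and a missing element becomes null, which stands in for the blank cell.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

static class RowExtractor
{
    // Assumed column list, taken from the sample XML in the question.
    static readonly string[] ExpectedFields =
        { "cabinet", "DocumentType", "Author", "Matter", "Client" };

    // Returns one entry per expected field; missing elements map to null.
    public static Dictionary<string, string> ExtractRow(XElement activity)
    {
        var row = new Dictionary<string, string>();
        XElement storageObject = activity.Element("storageObject");

        foreach (string field in ExpectedFields)
        {
            // Element() returns null when the child is absent, and the explicit
            // XElement-to-string conversion passes that null through.
            row[field] = (string)storageObject?.Element(field);
        }

        return row;
    }
}
```

Because every row carries the same keys, the "column headers" stay fixed even when an activity has no Client or Matter element.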




Answer 2:


I think you could solve your problem in the following way:

  1. You create a class called BaseNode.

  2. You create classes which extend BaseNode for all of your entity types.

  3. You create a set of rules which determine the preferred entity type for a given node.

  4. You create a generateEntity method in your BaseNode class.

  5. You use this algorithm (this is not code, so do not try to compile it)

parseXML(node)
    for each child in node.children do
        BaseNode.generateEntity(child)
        if (child.hasChildren())
            parseXML(child)
        end if
    end for
end parseXML

Of course, you have to store and parse the generated entities.
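
As a rough illustration of the recursive walk above, here is a sketch using LINQ to XML. The XmlWalker class and its visit callback are hypothetical; the callback stands in for the generateEntity step, whose rules the answer leaves unspecified.

```csharp
using System;
using System.Xml.Linq;

static class XmlWalker
{
    // Calls visit on the given element, then recurses into each child element,
    // so the whole subtree is covered depth-first.
    public static void Walk(XElement node, int depth, Action<XElement, int> visit)
    {
        visit(node, depth);

        foreach (XElement child in node.Elements())
        {
            Walk(child, depth + 1, visit);
        }
    }
}
```

A call such as XmlWalker.Walk(doc.Root, 0, (el, depth) => Console.WriteLine(new string(' ', depth * 2) + el.Name)) would print the element names indented by nesting level; an entity factory could be plugged into the same callback.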




Answer 3:


I am not saying this is the best or correct method to solve your particular problem, however, I am providing it as an abridged example of what you could do (hence the lack of exception/error handling etc).

namespace so.consoleapp
{
    using System;
    using System.Collections.Generic;
    using System.Xml.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            var doc = XElement.Load("file.xml");
            var activityElements = doc.Elements("activity");

            ICollection<Activity> collectionOfActivities = new List<Activity>();
            foreach (var activityElement in activityElements)
            {
                // Any of these nodes may be absent. The ?. operator and the explicit
                // string conversions turn a missing element or attribute into null
                // instead of throwing a NullReferenceException.
                var storageObjectElement = activityElement.Element("storageObject");
                var newStorageObject = new StorageObject
                {
                    Client = (string)storageObjectElement?.Element("Client"),
                    Author = (string)storageObjectElement?.Element("Author")
                };

                var userElement = activityElement.Element("user");
                var newUser = new User
                {
                    Id = (string)userElement?.Attribute("id"),
                    Name = (string)userElement?.Attribute("name"),
                    MemberType = (string)userElement?.Attribute("memberType")
                };

                collectionOfActivities.Add
                (
                    new Activity
                    {
                        Date = (string)activityElement.Attribute("date"),
                        Name = (string)activityElement.Attribute("name"),
                        Host = (string)activityElement.Attribute("host"),
                        User = newUser,
                        StorageObject = newStorageObject
                    }
                );
            }

            Console.ReadLine();
        }
    }

    // One <activity> element with its nested user and storage object.
    class Activity
    {
        public string Date { get; set; }
        public string Name { get; set; }
        public string Host { get; set; }
        public User User { get; set; }
        public StorageObject StorageObject { get; set; }
    }

    class User
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public string MemberType { get; set; }
    }

    class StorageObject
    {
        public string Client { get; set; }
        public string Author { get; set; }
    }
}



Answer 4:


Try something like this. Create a new Windows Forms Application, add one DataGridView control to the form, and use code-behind like the following:

private void Form1_Load(object sender, EventArgs e)
        {
            populate_datagrid(dataGridView1);
        }

        private void populate_datagrid(DataGridView dataGridView1)
        {
            String xml_string = @"<ndActivityLog repositoryId=""AA-AAAA1AAA"" repositoryName=""Company Name"" startDate=""2013-07-05"" endDate=""2013-07-06"">
                                    <activity date=""2013-07-05T06:42:35"" name=""open"" host=""00.00.00.00"">
                                        <user id=""joebloggs@email.com"" name=""Joe Bloggs"" memberType=""I"" /> 
                                        <storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""356864"" fileExtension=""doc"">
                                            <cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet> 
                                            <DocumentType>Legal Document</DocumentType> 
                                            <Author>Joe Bloggs</Author> 
                                            <Matter>1001</Matter> 
                                            <Client>R1234</Client> 
                                        </storageObject>
                                    </activity>
                                    <activity date=""2013-06-05T06:42:35"" name=""close"" host=""00.00.00.00"">
                                        <user id=""abc@bca.com"" name=""abc"" memberType=""I"" /> 
                                        <storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""25630"" fileExtension=""doc"">
                                            <cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet> 
                                            <DocumentType>Legal Document</DocumentType> 
                                            <Author>abc</Author> 
                                            <Client>R1234</Client> 
                                        </storageObject>
                                    </activity>
                                    <activity date=""2013-06-05T06:42:35"" name=""unknown"" host=""00.00.00.00"">
                                        <user id=""bca@abc.com"" name=""bca"" memberType=""I"" /> 
                                        <storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""45875"" fileExtension=""doc"">
                                            <cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet> 
                                            <DocumentType>Legal Document</DocumentType> 
                                            <Author>bca</Author> 
                                            <Matter>1001</Matter> 
                                        </storageObject>
                                    </activity>
                                    <activity date=""2013-06-05T06:42:35"" name=""open"" host=""00.00.00.00"">
                                        <user id=""cab@abc.com"" name=""cab"" memberType=""I"" /> 
                                        <storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""45875"" fileExtension=""doc"">
                                            <cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet> 
                                            <DocumentType>Legal Document</DocumentType>
                                        </storageObject>
                                    </activity>
                                </ndActivityLog>";

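            // FirstOrDefault() yields null when the element is absent, and the explicit
            // XElement-to-string conversion passes that null on to the "n/a" fallback.
            // Matter is nested inside storageObject, so it must be found via Descendants().
            var query = from XElement c in System.Xml.Linq.XElement.Parse(xml_string).Descendants("activity")
                        select new
                        {
                            user = c.Elements("user").First().Attribute("name").Value,
                            author = (string)c.Descendants("Author").FirstOrDefault() ?? "n/a",
                            matter = (string)c.Descendants("Matter").FirstOrDefault() ?? "n/a"
                        };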

            dataGridView1.DataSource = query.ToList();

        }
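
If the goal is a database table rather than a grid, the same idea can be sketched with a DataTable, using DBNull.Value for the blank cells instead of "n/a". This is only a sketch under assumptions: the ActivityTable class, its column names, and the choice of columns are hypothetical and not taken from the original post.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Xml.Linq;

static class ActivityTable
{
    // Builds one row per <activity>; missing values are stored as DBNull.
    public static DataTable Build(XElement log)
    {
        var table = new DataTable("Activity");
        foreach (string column in new[] { "Date", "User", "Author", "Matter", "Client" })
        {
            table.Columns.Add(column, typeof(string));
        }

        foreach (XElement activity in log.Elements("activity"))
        {
            table.Rows.Add(
                Cell((string)activity.Attribute("date")),
                Cell((string)activity.Element("user")?.Attribute("name")),
                Cell((string)activity.Descendants("Author").FirstOrDefault()),
                Cell((string)activity.Descendants("Matter").FirstOrDefault()),
                Cell((string)activity.Descendants("Client").FirstOrDefault()));
        }

        return table;
    }

    // Maps a null string to DBNull.Value so the cell is a database NULL.
    static object Cell(string value) => (object)value ?? DBNull.Value;
}
```

A table built this way keeps the same columns for every activity, and it could then be passed to a bulk-load API such as SqlBulkCopy.WriteToServer, assuming the destination table has matching columns.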

Hope this helps.



Source: https://stackoverflow.com/questions/18082610/complex-nested-xml-parsing-in-c-sharp
