Question
<ndActivityLog repositoryId="AA-AAAA1AAA" repositoryName="Company Name" startDate="2013-07-05" endDate="2013-07-06">
  <activity date="2013-07-05T06:42:35" name="open" host="00.00.00.00">
    <user id="joebloggs@email.com" name="Joe Bloggs" memberType="I" />
    <storageObject docId="0000-0000-0000" name="Opinion" size="356864" fileExtension="doc">
      <cabinet name="Client and Matters">NG-5MIYABBV</cabinet>
      <DocumentType>Legal Document</DocumentType>
      <Author>Joe Bloggs</Author>
      <Matter>1001</Matter>
      <Client>R1234</Client>
    </storageObject>
  </activity>
</ndActivityLog>
This is an example of the XML. There are around 4,000 "activity" elements in the document, with varying content: some have the "Client" and "Matter" elements, others don't. Thinking of it as a table, these would be blank cells, but the column headers are still there.
I need to load this into an SQL database while keeping the data structure. In addition, if an element doesn't exist in a given activity, that fact has to be recorded and the corresponding cell left blank.
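using System;
using System.IO;
using System.Xml.Linq;

class LogDump
{
    // The class and method wrapper are illustrative; only the body was in the original snippet.
    static void DumpAttributes(string path, string file)
    {
        var doc = XDocument.Load(Path.Combine(path, file + ".xml"));
        var root = doc.Root;
        foreach (XElement el in root.Elements())
        {
            // Console.WriteLine(el.Nodes());
            // Console.WriteLine(el.Value);
            // Console.WriteLine(" Attributes:");

            // Only the attributes of <activity> itself are printed; its child
            // elements (user, storageObject, ...) are never visited.
            foreach (XAttribute attr in el.Attributes())
            {
                Console.WriteLine(attr);
                // Console.WriteLine(el.Elements("id"));
            }
            Console.WriteLine("---------------------------");
            // foreach (XElement element in el.Elements())
            // {
            //     Console.WriteLine(" {0}: {1}", element.Name, element.Value);
            // }
        }
        // hold console open
        Console.ReadLine();
    }
}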
That is my code so far. Its output for each activity looks like this:
date="2013-07-06T17:07:42"
name="open"
host="213.146.142.50"
I need every piece of information extracted so that I can store it in a table layout. I'm fairly new to XML parsing, so any help would be appreciated.
Answer 1:
Only you know the permitted element names (cabinet through Client). The simple brute-force way is to extract each of the expected elements explicitly; then you know which ones are missing and can set the corresponding cell to empty. A foreach loop only iterates over what is actually present on each element; it cannot guess the missing ones.
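A minimal sketch of that brute-force approach, assuming the expected children of storageObject are exactly the ones in the sample (the ExpectedChildren list, the ActivityFlattener class and its Flatten method are hypothetical names). Each activity becomes one row, and every expected name gets an entry even when the element is absent:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class ActivityFlattener
{
    // Hypothetical list of the "columns" expected under <storageObject>.
    // Only you know the full set; these are the ones in the sample.
    static readonly string[] ExpectedChildren =
        { "cabinet", "DocumentType", "Author", "Matter", "Client" };

    // Returns one row per <activity>; a missing element maps to null
    // (the "blank cell") instead of being silently skipped.
    public static List<Dictionary<string, string>> Flatten(XElement log)
    {
        var rows = new List<Dictionary<string, string>>();
        foreach (var activity in log.Elements("activity"))
        {
            var row = new Dictionary<string, string>
            {
                ["date"] = (string)activity.Attribute("date"),
                ["host"] = (string)activity.Attribute("host"),
                ["user"] = (string)activity.Element("user")?.Attribute("id")
            };

            var storage = activity.Element("storageObject");
            foreach (var name in ExpectedChildren)
            {
                // The explicit (string) conversion of a null XElement yields null
                // rather than throwing, so absent elements become null cells.
                row[name] = (string)storage?.Element(name);
            }
            rows.Add(row);
        }
        return rows;
    }

    static void Main()
    {
        var log = XElement.Parse(
            "<ndActivityLog>" +
            "<activity date='2013-07-05T06:42:35' host='00.00.00.00'>" +
            "<user id='joebloggs@email.com' />" +
            "<storageObject><Author>Joe Bloggs</Author><Client>R1234</Client></storageObject>" +
            "</activity></ndActivityLog>");

        foreach (var row in Flatten(log))
        {
            foreach (var kv in row)
                Console.WriteLine("{0} = {1}", kv.Key, kv.Value ?? "(blank)");
        }
    }
}
```

Because the column list drives the inner loop rather than the XML, a missing Matter or Client shows up as a null entry instead of simply not appearing.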
Answer 2:
I think you could solve your problem in the following way:
You create a class called BaseNode.
You create classes that extend BaseNode, one for each of your entity types.
You create a set of rules that, based on the node, determine the preferred entity type.
You create a generateEntity method in your BaseNode class.
You use this algorithm (it is pseudocode, so do not try to compile it):
parseXML(node)
    for each child in node.children do
        BaseNode.generateEntity(child)
        if (child.hasChildren())
            parseXML(child)
        end if
    end for
end parseXML
Of course, you still have to store and process the generated entities.
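A compact C# sketch of this design, under stated assumptions: BaseNode, generateEntity (written GenerateEntity here) and the recursive walk come from the answer, while ActivityNode, UserNode, GenericNode, the Rules dictionary and the Parser class are hypothetical names chosen for illustration. The "set of rules" is modelled as a dictionary from element name to a factory, and elements without a rule fall back to a generic entity:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

abstract class BaseNode
{
    public string ElementName { get; protected set; }

    // Hypothetical rule set: element name -> factory for the preferred entity type.
    static readonly Dictionary<string, Func<XElement, BaseNode>> Rules =
        new Dictionary<string, Func<XElement, BaseNode>>
        {
            ["activity"] = e => new ActivityNode(e),
            ["user"] = e => new UserNode(e),
        };

    public static BaseNode GenerateEntity(XElement element)
    {
        Func<XElement, BaseNode> factory;
        return Rules.TryGetValue(element.Name.LocalName, out factory)
            ? factory(element)
            : new GenericNode(element);
    }
}

class ActivityNode : BaseNode
{
    public string Date { get; }
    public ActivityNode(XElement e) { ElementName = "activity"; Date = (string)e.Attribute("date"); }
}

class UserNode : BaseNode
{
    public string Id { get; }
    public UserNode(XElement e) { ElementName = "user"; Id = (string)e.Attribute("id"); }
}

class GenericNode : BaseNode
{
    public string Value { get; }
    public GenericNode(XElement e)
    {
        ElementName = e.Name.LocalName;
        // Only leaf elements carry a text value worth keeping.
        Value = e.HasElements ? null : e.Value;
    }
}

static class Parser
{
    // Recursive walk matching the pseudocode: one entity per child,
    // then descend into that child if it has children of its own.
    public static void ParseXml(XElement node, List<BaseNode> entities)
    {
        foreach (var child in node.Elements())
        {
            entities.Add(BaseNode.GenerateEntity(child));
            if (child.HasElements)
                ParseXml(child, entities);
        }
    }

    static void Main()
    {
        var root = XElement.Parse(
            "<ndActivityLog><activity date='2013-07-05T06:42:35'>" +
            "<user id='joebloggs@email.com' />" +
            "<storageObject><Client>R1234</Client></storageObject>" +
            "</activity></ndActivityLog>");

        var entities = new List<BaseNode>();
        ParseXml(root, entities);
        foreach (var entity in entities)
            Console.WriteLine("{0}: {1}", entity.ElementName, entity.GetType().Name);
    }
}
```

Keeping the rules in one table means adding a new entity type is a one-line change, and the walk itself never needs to know which types exist.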
Answer 3:
I am not saying this is the best or correct way to solve your particular problem; I am providing it as an abridged example of what you could do (hence the lack of exception/error handling, etc.).
namespace so.consoleapp
{
    using System;
    using System.Collections.Generic;
    using System.Xml.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            var doc = XElement.Load("file.xml");
            var activityElements = doc.Elements("activity");
            ICollection<Activity> collectionOfActivities = new List<Activity>();

            foreach (var activityElement in activityElements)
            {
                var storageObjectElement = activityElement.Element("storageObject");
                var userElement = activityElement.Element("user");

                // The explicit (string) conversion returns null when the element or
                // attribute is missing, instead of throwing a NullReferenceException
                // the way .Value would. A null property is the "blank cell".
                var newStorageObject = new StorageObject
                {
                    Client = (string)storageObjectElement?.Element("Client"),
                    Author = (string)storageObjectElement?.Element("Author")
                };

                var newUser = new User
                {
                    Id = (string)userElement?.Attribute("id"),
                    Name = (string)userElement?.Attribute("name"),
                    MemberType = (string)userElement?.Attribute("memberType")
                };

                collectionOfActivities.Add(new Activity
                {
                    Date = (string)activityElement.Attribute("date"),
                    Name = (string)activityElement.Attribute("name"),
                    Host = (string)activityElement.Attribute("host"),
                    User = newUser,
                    StorageObject = newStorageObject
                });
            }

            Console.WriteLine("Parsed {0} activities.", collectionOfActivities.Count);
            Console.ReadLine();
        }
    }

    class Activity
    {
        public string Date { get; set; }
        public string Name { get; set; }
        public string Host { get; set; }
        public User User { get; set; }
        public StorageObject StorageObject { get; set; }
    }

    class User
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public string MemberType { get; set; }
    }

    class StorageObject
    {
        public string Client { get; set; }
        public string Author { get; set; }
    }
}
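Since the question ultimately targets an SQL table, one possible next step (a sketch, not part of the original answer) is to flatten the parsed data into a System.Data.DataTable, mapping every missing value to DBNull.Value. The ActivityTable class, its Build and Cell methods, and the column names are hypothetical; the same (string) conversion keeps absent elements from throwing:

```csharp
using System;
using System.Data;
using System.Xml.Linq;

static class ActivityTable
{
    // Hypothetical column set; extend it with every element/attribute you expect.
    public static DataTable Build(XElement log)
    {
        var table = new DataTable("Activity");
        table.Columns.Add("Date", typeof(string));
        table.Columns.Add("Host", typeof(string));
        table.Columns.Add("UserId", typeof(string));
        table.Columns.Add("Client", typeof(string));
        table.Columns.Add("Matter", typeof(string));

        foreach (var a in log.Elements("activity"))
        {
            var storage = a.Element("storageObject");
            table.Rows.Add(
                Cell((string)a.Attribute("date")),
                Cell((string)a.Attribute("host")),
                Cell((string)a.Element("user")?.Attribute("id")),
                Cell((string)storage?.Element("Client")),
                Cell((string)storage?.Element("Matter")));
        }
        return table;
    }

    // A missing value becomes DBNull.Value, which ADO.NET sends to the database as NULL.
    static object Cell(string value) => (object)value ?? DBNull.Value;

    static void Main()
    {
        var log = XElement.Parse(
            "<ndActivityLog>" +
            "<activity date='2013-07-05T06:42:35' host='00.00.00.00'>" +
            "<user id='joebloggs@email.com' />" +
            "<storageObject><Matter>1001</Matter><Client>R1234</Client></storageObject>" +
            "</activity>" +
            "<activity date='2013-06-05T06:42:35' host='00.00.00.00' />" +
            "</ndActivityLog>");

        DataTable table = Build(log);
        foreach (DataRow row in table.Rows)
            Console.WriteLine(string.Join(", ", row.ItemArray));
    }
}
```

A DataTable shaped like this could then be handed to SqlBulkCopy.WriteToServer, assuming a destination table with matching columns already exists; each missing element would then end up as NULL in its column.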
Answer 4:
Try something like this: create a new Windows Forms application, add a DataGridView control to the form, and use code-behind like the following:
// Requires "using System.Linq;" at the top of the form's code file.
private void Form1_Load(object sender, EventArgs e)
{
    populate_datagrid(dataGridView1);
}

private void populate_datagrid(DataGridView dataGridView1)
{
    String xml_string = @"<ndActivityLog repositoryId=""AA-AAAA1AAA"" repositoryName=""Company Name"" startDate=""2013-07-05"" endDate=""2013-07-06"">
<activity date=""2013-07-05T06:42:35"" name=""open"" host=""00.00.00.00"">
<user id=""joebloggs@email.com"" name=""Joe Bloggs"" memberType=""I"" />
<storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""356864"" fileExtension=""doc"">
<cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet>
<DocumentType>Legal Document</DocumentType>
<Author>Joe Bloggs</Author>
<Matter>1001</Matter>
<Client>R1234</Client>
</storageObject>
</activity>
<activity date=""2013-06-05T06:42:35"" name=""close"" host=""00.00.00.00"">
<user id=""abc@bca.com"" name=""abc"" memberType=""I"" />
<storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""25630"" fileExtension=""doc"">
<cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet>
<DocumentType>Legal Document</DocumentType>
<Author>abc</Author>
<Client>R1234</Client>
</storageObject>
</activity>
<activity date=""2013-06-05T06:42:35"" name=""unknown"" host=""00.00.00.00"">
<user id=""bca@abc.com"" name=""bca"" memberType=""I"" />
<storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""45875"" fileExtension=""doc"">
<cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet>
<DocumentType>Legal Document</DocumentType>
<Author>bca</Author>
<Matter>1001</Matter>
</storageObject>
</activity>
<activity date=""2013-06-05T06:42:35"" name=""open"" host=""00.00.00.00"">
<user id=""cab@abc.com"" name=""cab"" memberType=""I"" />
<storageObject docId=""0000-0000-0000"" name=""Opinion"" size=""45875"" fileExtension=""doc"">
<cabinet name=""Client and Matters"">NG-5MIYABBV</cabinet>
<DocumentType>Legal Document</DocumentType>
</storageObject>
</activity>
</ndActivityLog>";

    // Author and Matter sit under <storageObject>, so both are looked up with
    // Descendants; Elements("Matter") on <activity> would find nothing and First() would throw.
    var query = from XElement c in System.Xml.Linq.XElement.Parse(xml_string).Descendants("activity")
                select new
                {
                    user = c.Elements("user").First().Attribute("name").Value,
                    author = c.Descendants("Author").Any() ? c.Descendants("Author").First().Value : "n/a",
                    matter = c.Descendants("Matter").Any() ? c.Descendants("Matter").First().Value : "n/a"
                };

    dataGridView1.DataSource = query.ToList();
}
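The conditional lookups in the query above can also be written without a separate existence check. In this console sketch (the ActivityQuery class and its Describe method are hypothetical), FirstOrDefault() yields null when nothing matches, the explicit (string) conversion turns a null element into a null string, and ?? supplies the "n/a" placeholder:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ActivityQuery
{
    // One line per <activity>: user name | Author | Matter, with "n/a" for gaps.
    // Each lookup enumerates the descendants once instead of twice.
    public static string[] Describe(XElement log)
    {
        return log.Descendants("activity")
            .Select(c => string.Format("{0} | {1} | {2}",
                (string)c.Element("user")?.Attribute("name") ?? "n/a",
                (string)c.Descendants("Author").FirstOrDefault() ?? "n/a",
                (string)c.Descendants("Matter").FirstOrDefault() ?? "n/a"))
            .ToArray();
    }

    static void Main()
    {
        var log = XElement.Parse(
            "<ndActivityLog>" +
            "<activity><user name='Joe Bloggs' /><storageObject>" +
            "<Author>Joe Bloggs</Author><Matter>1001</Matter></storageObject></activity>" +
            "<activity><user name='cab' /><storageObject /></activity>" +
            "</ndActivityLog>");

        foreach (var line in Describe(log))
            Console.WriteLine(line);
    }
}
```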
Hope this helps.
Source: https://stackoverflow.com/questions/18082610/complex-nested-xml-parsing-in-c-sharp