I need to do some processing on fairly large XML files (large here being potentially upwards of a gigabyte) in C#, including performing some complex XPath queries. The problem is that the standard approach of loading the whole document into memory (for example via XPathDocument) isn't feasible at this size.
It seems that you have already tried XPathDocument and could not accommodate the parsed XML document in memory.
If this is the case, before starting to split the file (which is ultimately the right decision!) you may try using the Saxon XSLT/XQuery processor. It has a very efficient in-memory representation of a loaded XML document (the "tinytree" model). In addition, Saxon SA (the schema-aware version, which isn't free) has some streaming extensions. Read more about this in the Saxon documentation.
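For illustration, a minimal sketch of evaluating an XPath query through Saxon's .NET API (the Saxon.Api namespace); the file path and the query are placeholders, and exact class names can vary between Saxon versions:

    using System;
    using Saxon.Api;

    class SaxonXPathDemo
    {
        static void Main()
        {
            // Saxon builds the document into its compact "tinytree" model,
            // which is much smaller than an equivalent DOM.
            Processor proc = new Processor();
            DocumentBuilder builder = proc.NewDocumentBuilder();
            XdmNode doc = builder.Build(new Uri("file:///data/big.xml")); // placeholder path

            // Compile and evaluate an XPath expression against the loaded tree.
            XPathCompiler xpath = proc.NewXPathCompiler();
            XPathSelector selector = xpath.Compile("//order[total > 1000]/id").Load(); // placeholder query
            selector.ContextItem = doc;

            foreach (XdmItem item in selector.Evaluate())
                Console.WriteLine(item);
        }
    }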
XPathReader is the answer. It isn't part of the .NET Framework itself, but it is available for download from Microsoft. Here is an MSDN article describing it.
If you construct an XPathReader with an XmlTextReader, you get the efficiency of a streaming read with the convenience of XPath expressions.
I haven't used it on gigabyte-sized files, but I have used it on files that are tens of megabytes, which is usually enough to slow down DOM-based solutions.
Quoting from the MSDN article: "The XPathReader provides the ability to perform XPath over XML documents in a streaming manner".
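A sketch of the matching pattern based on the MSDN sample; the namespace and exact signatures come from that download and may differ in the version you grab, and the document shape here is hypothetical:

    using System;
    using System.Xml;
    using Microsoft.Xml.XPath; // namespace from the XPathReader download; may vary

    class XPathReaderDemo
    {
        static void Main()
        {
            // Register the XPath expressions to match while streaming.
            XPathCollection queries = new XPathCollection();
            int totalQuery = queries.Add("/orders/order/total"); // hypothetical document shape

            // Wrap a forward-only XmlTextReader; only matching nodes are surfaced,
            // so memory use stays flat regardless of file size.
            XPathReader reader = new XPathReader(new XmlTextReader("big.xml"), queries);
            while (reader.ReadUntilMatch())
            {
                if (reader.Match(totalQuery))
                    Console.WriteLine(reader.ReadString());
            }
        }
    }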
How about just reading the whole thing into a database and then working against that temporary database? That might be better, because your queries could then be done more efficiently in T-SQL.
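As a sketch of that idea against SQL Server: stage the document in an xml column, then shred it with nodes()/value() in T-SQL. The table name, connection string, and document shape below are all assumptions (and a production version would stream the file in rather than reading it as one string):

    using System;
    using System.Data.SqlClient;
    using System.IO;

    class XmlToSqlDemo
    {
        static void Main()
        {
            // Assumes a staging table: CREATE TABLE XmlStage (Doc xml NOT NULL)
            using (var conn = new SqlConnection("Server=.;Database=Scratch;Integrated Security=true"))
            {
                conn.Open();

                // Stage the document in an xml column.
                var insert = new SqlCommand("INSERT INTO XmlStage (Doc) VALUES (@doc)", conn);
                insert.Parameters.AddWithValue("@doc", File.ReadAllText("big.xml"));
                insert.ExecuteNonQuery();

                // Shred it with a path expression in T-SQL.
                var query = new SqlCommand(
                    @"SELECT n.value('(total)[1]', 'decimal(10,2)')
                      FROM XmlStage CROSS APPLY Doc.nodes('/orders/order') AS t(n)", conn);
                using (var rdr = query.ExecuteReader())
                    while (rdr.Read())
                        Console.WriteLine(rdr.GetDecimal(0));
            }
        }
    }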
Since in your case the data size can run into gigabytes, have you considered using ADO.NET with XML as a database? In addition, the memory footprint would not be huge.
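If that direction appeals, DataSet.ReadXml is the quickest ADO.NET way to get a relational view of an XML file; note it still materializes the data in memory, so it suits documents that actually fit. A sketch with assumed element names:

    using System;
    using System.Data;

    class DataSetDemo
    {
        static void Main()
        {
            // ReadXml infers a relational schema and loads repeating
            // elements as rows that can be filtered like tables.
            var ds = new DataSet();
            ds.ReadXml("big.xml"); // hypothetical file; table/column names below are assumptions

            // Without a schema, inferred columns are strings, so convert before comparing.
            DataRow[] expensive = ds.Tables["order"].Select("Convert(total, 'System.Decimal') > 1000");
            foreach (DataRow row in expensive)
                Console.WriteLine(row["id"]);
        }
    }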
Another approach would be using LINQ to XML with its streaming facilities, such as XStreamingElement for deferred output and XNode.ReadFrom over an XmlReader for input. Hope this helps.
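For instance, here is the common streaming pattern with XNode.ReadFrom, yielding one XElement at a time so the full document is never held in memory; the document shape (orders/order/total) is assumed:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml;
    using System.Xml.Linq;

    class StreamingLinqDemo
    {
        // Yield matching elements one at a time from a forward-only reader.
        static IEnumerable<XElement> StreamElements(string path, string elementName)
        {
            using (XmlReader reader = XmlReader.Create(path))
            {
                reader.MoveToContent();
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                        // ReadFrom consumes the element and advances the reader past it.
                        yield return (XElement)XNode.ReadFrom(reader);
                    else
                        reader.Read();
                }
            }
        }

        static void Main()
        {
            // LINQ query over the streamed elements; nothing is buffered.
            var bigOrders =
                from order in StreamElements("big.xml", "order")
                where (decimal)order.Element("total") > 1000m
                select order;

            foreach (XElement order in bigOrders)
                Console.WriteLine(order);
        }
    }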