Streaming XPath evaluation

萌比男神i 2020-12-13 21:06

Are there any production-ready libraries for streaming evaluation of XPath expressions against a given XML document? My investigations show that most existing solutions loa

7 answers
  • 2020-12-13 21:31

    Try Joost.

  • 2020-12-13 21:35

    I think I'll go for custom code. The .NET library gets us quite close to the target if one just wants to read a few paths from the XML document.

    All the solutions I have seen so far support only a subset of XPath, and this one does too. The subset is really small, though. :)

    This C# code reads an XML file and counts the nodes at an explicitly given path. You can also work with attributes easily, using the xr["attrName"] syntax.

      // asArgs is not defined in this snippet; presumably it is the program's
      // argument array, with asArgs[1] holding the path of the XML file.
      int c = 0;
      var lstPath = new System.Collections.Generic.List<string>();
      var sbPath = new System.Text.StringBuilder();
      using (var r = new System.IO.StreamReader(asArgs[1]))
      using (var xr = System.Xml.XmlReader.Create(r, new System.Xml.XmlReaderSettings())) {
        while (xr.Read()) {
          //Console.WriteLine("type " + xr.NodeType);
          // lstPath is the stack of currently open elements.
          if (xr.NodeType == System.Xml.XmlNodeType.Element) {
            lstPath.Add(xr.Name);
          }

          // This takes some time. If 1 unit is the time needed to parse the file,
          // then this takes about 1.0.
          sbPath.Clear();
          foreach (string n in lstPath) {
            sbPath.Append('/');
            sbPath.Append(n);
          }
          // This takes about 0.6 time units.
          string sPath = sbPath.ToString();

          // An element is complete at its end tag, or at once if it is empty (<x/>).
          if (xr.NodeType == System.Xml.XmlNodeType.EndElement
              || xr.IsEmptyElement) {
            if (xr.Name == "someElement" && lstPath[0] == "main")
              c++;
            // Or test a simple XPath explicitly:
            // if (sPath == "/main/someElement")
            lstPath.RemoveAt(lstPath.Count - 1);
          }
        }
      }
    
  • 2020-12-13 21:40

    There are several options:

    • DataDirect Technologies sells an XQuery implementation that employs projection and streaming where possible. It can handle files in the multi-gigabyte range, i.e. larger than available memory. It's a thread-safe library, so it's easy to integrate. Java-only.

    • Saxon has an open-source edition and a modestly priced commercial cousin, which will do streaming in some contexts. Java, but with a .NET port as well.

    • MarkLogic and eXist are XML databases that, if your XML is loaded into them, will process XPaths in a fairly intelligent fashion.

  • 2020-12-13 21:45

    XSLT 3.0 provides a streaming mode of processing, and this will become a standard once the XSLT 3.0 specification becomes a W3C Recommendation.

    At the time of writing this answer (May 2011), Saxon provides some support for XSLT 3.0 streaming.

  • 2020-12-13 21:47

    Though I have no practical experience with it, I think it is worth mentioning QuiXProc ( http://code.google.com/p/quixproc/ ). It is a streaming approach to XProc, and it uses libraries that provide streaming support for XPath, amongst others.

  • 2020-12-13 21:52

    Would this be practical for a complete XPath implementation, given that XPath syntax allows for:

    /AAA/XXX/following::*
    

    and

    /AAA/BBB/following-sibling::*
    

    which imply look-ahead requirements? That is, from a particular node you're going to have to load the rest of the document anyway.

    The documentation for the Nux library (specifically StreamingPathFilter) makes this point and references some implementations that rely on a subset of XPath. Nux claims to offer some streaming query capability, but given the above, there will be some limitations in terms of XPath coverage.
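
    To make the "subset of XPath" point concrete, here is a minimal sketch of a streaming matcher that handles only absolute child-axis paths such as /AAA/BBB. It uses Python's standard xml.etree.ElementTree.iterparse and is not Nux's API; count_path_matches and the sample document are hypothetical names made up for illustration. Namespaces, predicates, wildcards and axes such as following:: are deliberately left out, since those are where look-ahead or buffering starts to matter.

```python
import io
import xml.etree.ElementTree as ET


def count_path_matches(source, path):
    """Count elements whose absolute path equals `path`, e.g. "/AAA/BBB".

    Assumption: only the child axis is supported (no predicates, no
    wildcards, no following:: or following-sibling:: axes).
    """
    target = [step for step in path.split("/") if step]
    stack = []  # tag names of the currently open elements
    count = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:  # "end": the element and its subtree have been read completely
            if stack == target:
                count += 1
            stack.pop()
            # Release the finished element's text and children. The emptied
            # element itself stays attached to its parent, so memory use is
            # reduced rather than strictly constant.
            elem.clear()
    return count


sample = b"<AAA><BBB/><XXX><BBB/></XXX><BBB>text</BBB></AAA>"
matches = count_path_matches(io.BytesIO(sample), "/AAA/BBB")
print(matches)
```

    The only state carried between events is the stack of open element names, which is why this subset streams easily. An expression like /AAA/BBB/following-sibling::* would need extra state that outlives the matched node, and a general XPath engine cannot always avoid buffering.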
