Finding the lowest common ancestor of an XML node-set

Question

I have a node-set built with xsl:key in XSLT, and I would like to find the lowest common ancestor (LCA) of all of the nodes in it. Any ideas?

I know about the Kaysian method for intersections and XPath 2.0's intersect operator, but these seem geared towards finding the LCA of just a pair of elements, and I don't know in advance how many items will be in each node-set.

I was wondering if there might be a solution using a combination of the 'every' and 'intersect' expressions, but I haven't been able to think of one yet!

Thanks in advance, Tom

Answer 1:

Here is a bottom-up approach:

 <xsl:function name="my:lca" as="node()?">
  <xsl:param name="pSet" as="node()*"/>

  <xsl:sequence select=
   "if(not($pSet))
      then ()
      else
       if(not($pSet[2]))
         then $pSet[1]
         else
           if($pSet intersect $pSet/ancestor::node())
             then
               my:lca($pSet[not($pSet intersect ancestor::node())])
             else
               my:lca($pSet/..)
   "/>
 </xsl:function>

A test:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="my:my">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:variable name="vSet1" select=
      "//*[self::A.1.1 or self::A.2.1]"/>

    <xsl:variable name="vSet2" select=
      "//*[self::B.2.2.1 or self::B.1]"/>

    <xsl:variable name="vSet3" select=
      "$vSet1 | //B.2.2.2"/>

 <xsl:template match="/">
<!---->
     <xsl:sequence select="my:lca($vSet1)/name()"/>
     =========

     <xsl:sequence select="my:lca($vSet2)/name()"/>
     =========

     <xsl:sequence select="my:lca($vSet3)/name()"/>

 </xsl:template>

 <xsl:function name="my:lca" as="node()?">
  <xsl:param name="pSet" as="node()*"/>

  <xsl:sequence select=
   "if(not($pSet))
      then ()
      else
       if(not($pSet[2]))
         then $pSet[1]
         else
           if($pSet intersect $pSet/ancestor::node())
             then
               my:lca($pSet[not($pSet intersect ancestor::node())])
             else
               my:lca($pSet/..)
   "/>
 </xsl:function>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<t>
    <A>
        <A.1>
            <A.1.1/>
            <A.1.2/>
        </A.1>
        <A.2>
            <A.2.1/>
        </A.2>
        <A.3/>
    </A>
    <B>
        <B.1/>
        <B.2>
            <B.2.1/>
            <B.2.2>
                <B.2.2.1/>
                <B.2.2.2/>
            </B.2.2>
        </B.2>
    </B>
</t>

the wanted, correct result is produced for all three cases:

     A
     =========

     B
     =========

     t

Update: I have what I think is probably the most efficient algorithm.

The idea is that the LCA of a node-set is the same as the LCA of just two nodes of this node-set: the "leftmost" and the "rightmost" ones, that is, the first and the last node in document order. The proof that this is correct is left as an exercise for the reader :)

Here is a complete XSLT 2.0 implementation:

<xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:my="my:my">
        <xsl:output omit-xml-declaration="yes" indent="yes"/>

        <xsl:variable name="vSet1" select=
          "//*[self::A.1.1 or self::A.2.1]"/>

        <xsl:variable name="vSet2" select=
          "//*[self::B.2.2.1 or self::B.1]"/>

        <xsl:variable name="vSet3" select=
          "$vSet1 | //B.2.2.2"/>

     <xsl:template match="/">
         <xsl:sequence select="my:lca($vSet1)/name()"/>
         =========

         <xsl:sequence select="my:lca($vSet2)/name()"/>
         =========

         <xsl:sequence select="my:lca($vSet3)/name()"/>

     </xsl:template>

     <xsl:function name="my:lca" as="node()?">
      <xsl:param name="pSet" as="node()*"/>

      <xsl:sequence select=
       "if(not($pSet))
          then ()
          else
           if(not($pSet[2]))
             then $pSet[1]
             else
              for $n1 in $pSet[1],
                  $n2 in $pSet[last()]
               return my:lca2nodes($n1, $n2)
       "/>
     </xsl:function>

     <xsl:function name="my:lca2nodes" as="node()?">
      <xsl:param name="pN1" as="node()"/>
      <xsl:param name="pN2" as="node()"/>

      <xsl:variable name="n1" select=
       "($pN1 | $pN2)
                    [count(ancestor-or-self::node())
                    eq
                     min(($pN1 | $pN2)/count(ancestor-or-self::node()))
                    ]
                     [1]"/>

      <xsl:variable name="n2" select="($pN1 | $pN2) except $n1"/>

      <xsl:sequence select=
       "$n1/ancestor-or-self::node()
                 [exists(. intersect $n2/ancestor-or-self::node())]
                     [1]"/>
     </xsl:function>
</xsl:stylesheet>

When this transformation is performed on the same XML document (above), the same correct result is produced, but much faster, especially if the node-set is large:

 A
 =========

 B
 =========

 t
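
One caveat with the two-node shortcut: $pSet[1] and $pSet[last()] are the leftmost and rightmost nodes only when the sequence arrives in document order. That holds for the path expressions used in the test, but a caller can pass a sequence in any order. Below is a minimal sketch of a defensive variant. The function name my:lca-ordered is hypothetical (it is not part of the answer above), all nodes are assumed to come from one document, and the plain intersect of the two ancestor-or-self chains is a simplification that stands in for my:lca2nodes:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="my:my"
    exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: LCA of a sequence given in any order -->
  <xsl:function name="my:lca-ordered" as="node()?">
    <xsl:param name="pSet" as="node()*"/>

    <!-- A union with the empty sequence returns the nodes
         in document order, with duplicates removed -->
    <xsl:variable name="vOrdered" as="node()*" select="$pSet | ()"/>

    <!-- intersect also returns document order, so [last()] picks the
         deepest node that is on both ancestor-or-self chains -->
    <xsl:sequence select="
      ($vOrdered[1]/ancestor-or-self::node()
         intersect
       $vOrdered[last()]/ancestor-or-self::node())[last()]"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- reverse() deliberately passes the nodes out of document order -->
    <xsl:value-of select=
      "name(my:lca-ordered(reverse(//A.1.1 | //A.2.1 | //B.2.2.2)))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document above, this should print t even though reverse() hands the nodes over back to front. An empty sequence yields an empty result and a single node yields that node, so no special cases are needed.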

Answer 2:

I tried the following:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:mf="http://example.com/mf"
  exclude-result-prefixes="xs mf"
  version="2.0">

  <xsl:output method="html" indent="yes"/>

  <xsl:function name="mf:lca" as="node()?">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:variable name="all-ancestors" select="$nodes/ancestor::node()"/>
    <xsl:sequence
      select="$all-ancestors[every $n in $nodes satisfies exists($n/ancestor::node() intersect .)][last()]"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:sequence select="mf:lca(//foo)"/>
  </xsl:template>

</xsl:stylesheet>

Tested with the sample

<root>
  <anc1>
    <anc2>
      <foo/>
      <bar>
        <foo/>
      </bar>
      <bar>
        <baz>
          <foo/>
        </baz>
      </bar>
    </anc2>
  </anc1>
</root>

I get the anc2 element, but I haven't tested with more complex input and don't have the time right now. Maybe you can try it with your sample data and report back whether you get the results you want.
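
Note that mf:lca looks only at proper ancestors (ancestor::node()), so a node never counts as its own ancestor: for a set made of an element and one of its descendants, it returns the element's parent. Whether that is the wanted behaviour depends on how the LCA is defined. For the other definition (the one my:lca in Answer 1 follows), here is a sketch with the hypothetical name mf:lca-or-self, which is not part of the answer above:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="mf">
  <xsl:output method="text"/>

  <!-- Hypothetical variant of mf:lca: a node may be its own ancestor -->
  <xsl:function name="mf:lca-or-self" as="node()?">
    <xsl:param name="nodes" as="node()*"/>

    <!-- Keep the candidates in a variable: predicates on a variable see
         document order, so [last()] is the deepest common node.
         Written directly on the reverse axis step, [last()] would
         count positions from each node upwards instead. -->
    <xsl:variable name="candidates" as="node()*"
                  select="$nodes/ancestor-or-self::node()"/>

    <xsl:sequence select="
      $candidates
        [every $n in $nodes
           satisfies exists($n/ancestor-or-self::node() intersect .)]
        [last()]"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select=
      "name(mf:lca-or-self(//bar[1] | //bar[1]/foo))"/>
  </xsl:template>
</xsl:stylesheet>
```

With the sample above, mf:lca(//bar[1] | //bar[1]/foo) gives anc2, while this variant should print bar.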

Answer 3:

Martin's solution will work, but I think it could be quite expensive in some situations, with a lot of elimination of duplicates. I'd be inclined to use an approach that finds the LCA of two nodes, and then use this recursively, on the theory that LCA(x,y,z) = LCA(LCA(x,y),z) [a theory which I leave the reader to prove...].

Now LCA(x,y) can be found fairly efficiently by looking at the sequences x/ancestor-or-self::node() and y/ancestor-or-self::node(), truncating both sequences to the length of the shorter, and then finding the last node that is in both: in XQuery notation:

( let $ax := $x/ancestor-or-self::node()
  let $ay := $y/ancestor-or-self::node()
  let $len := min((count($ax), count($ay)))
  (: walk the positions from $len down to 1 :)
  for $i in reverse(1 to $len)
  where $ax[$i] is $ay[$i]
  return $ax[$i]
)[1]
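
For use from XSLT 2.0, the pairwise idea and the recursion LCA(x,y,z) = LCA(LCA(x,y),z) could be sketched as below. The function names my:lca-pair and my:lca-fold are hypothetical, and all nodes are assumed to belong to the same document (otherwise the pair function returns nothing and the fold would silently skip a node):

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="my:my"
    exclude-result-prefixes="my">
  <xsl:output method="text"/>

  <!-- LCA of two nodes: the last position at which the two
       root-to-node chains hold the same node -->
  <xsl:function name="my:lca-pair" as="node()?">
    <xsl:param name="x" as="node()"/>
    <xsl:param name="y" as="node()"/>
    <xsl:variable name="ax" as="node()*"
                  select="$x/ancestor-or-self::node()"/>
    <xsl:variable name="ay" as="node()*"
                  select="$y/ancestor-or-self::node()"/>
    <xsl:variable name="len" select="min((count($ax), count($ay)))"/>
    <!-- Scan positions from $len down to 1 and keep the first match -->
    <xsl:sequence select="
      (for $i in reverse(1 to $len)
         return $ax[$i][. is $ay[$i]])[1]"/>
  </xsl:function>

  <!-- Fold: replace the first two nodes by their LCA, then recurse -->
  <xsl:function name="my:lca-fold" as="node()?">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:sequence select="
      if (count($nodes) le 1)
        then $nodes
        else my:lca-fold((my:lca-pair($nodes[1], $nodes[2]),
                          subsequence($nodes, 3)))"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select=
      "name(my:lca-fold(//A.1.1 | //A.2.1 | //B.2.2.2))"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the sample document from Answer 1, the first step reduces A.1.1 and A.2.1 to A, and the second reduces A and B.2.2.2 to t, so the stylesheet should print t. The sequence gets shorter on every call, so the recursion depth is bounded by the size of the node-set.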

Source: https://stackoverflow.com/questions/8742002/finding-the-lowest-common-ancestor-of-an-xml-node-set

Tags

xml

xslt

xpath

xslt-2.0