XPath query to get the ancestor nodes based on element value

Question

I am trying to find the names of all elements that satisfy the following two rules.

1. The element must have a <set>erase</set> child.

2. If two or more elements in the same hierarchy have <set>erase</set> (e.g. both <b> and <d>), then only the name of the topmost one should be printed (i.e. <b> in this case).

So the required result for the XML below is:

b, y, p

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a>
    <b>
        <set>erase</set>
        <d>
            <set>erase</set>
        </d>
    </b>

    <c>
        <x>
        </x>
    </c>

    <e>
        <y>
            <set>erase</set>
            <q>
            </q>
        </y>
        <z>
            <p>
                <set>erase</set>
            </p>
        </z>
    </e>
</a>

When I use the query (//set[contains(.,'erase')])[1], I get only node b in the result set.
When I use the query //set[contains(.,'erase')], I get all of the nodes b, d, y, p in the result set.

Can anyone help me find the query that results in the node list b, y and p?

Here is the Java code snippet I used.

    XPath xpath = factory.newXPath();
    String query = "//set[contains(.,'erase')]";
    XPathExpression expr = null;
    try {
        expr = xpath.compile(query);
    } catch (XPathExpressionException e) {
        e.printStackTrace();
    }
    Object result = null;
    try {
        result = expr.evaluate(doc, XPathConstants.NODESET);
    } catch (XPathExpressionException e) {
        e.printStackTrace();
    }
    NodeList nodes = (NodeList) result;

    for (int i = 0; i < nodes.getLength(); i++) {
        // Start from the parent of the matched <set> element.
        Node n = nodes.item(i).getParentNode();
        String x = n.getNodeName();
        // Walk up, prefixing each ancestor's name, until the root element
        // (named after the request class) is reached. The parent check
        // avoids a NullPointerException if that name is never found.
        while (!n.getNodeName().equalsIgnoreCase(request.getClass().getSimpleName())
                && n.getParentNode() != null) {
            n = n.getParentNode();
            x = n.getNodeName() + "." + x;
        }
        System.out.println("Path: " + x);
    }

Output:

a.b
a.b.d
a.e.y
a.e.z.p

Could anyone help me figure out a query that returns only a.b, a.e.y and a.e.z.p? Let me know if you need more details or another use case.

Answer 1:

One expression that selects exactly the wanted elements is:

      //*[set[. = 'erase' and not(node()[2])]
         and
          not(ancestor::*
                 [set
                    [. = 'erase' and not(node()[2])]
                 ]
              )
          ]
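Since the question evaluates XPath from Java, here is a minimal sketch of running the expression above through the standard JAXP API. The class name `TopmostEraseFinder` and the method `select` are made up for this illustration, and the XML string is a compact version of the test document used in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question's code.
class TopmostEraseFinder {
    // The expression from this answer: an element with a <set> child whose only
    // content is the text 'erase', and no ancestor that has such a <set> child.
    static final String QUERY =
        "//*[set[. = 'erase' and not(node()[2])]"
      + " and not(ancestor::*[set[. = 'erase' and not(node()[2])]])]";

    // Parses the XML and returns the names of the selected elements.
    static List<String> select(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(QUERY, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<a><b><set>erase</set><set><a/>erase</set><d><set>erase</set></d></b>"
          + "<c><x/></c>"
          + "<e><y><set>erase</set><q/></y><z><p><set>erase</set></p></z></e></a>";
        System.out.println(select(xml));
    }
}
```

With this input the selected names should be b, y and p: d is dropped because its ancestor b already has a matching set child.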

XSLT-based verification:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:template match="/">
         <xsl:for-each select=
         "//*[set[. = 'erase' and not(node()[2])]
             and
              not(ancestor::*
                     [set
                        [. = 'erase' and not(node()[2])]
                     ]
                  )
              ]">

          <xsl:value-of select="name()"/>
          <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>
     </xsl:template>
</xsl:stylesheet>

This transformation, when applied to the XML document provided by Sean B. Durkin:

<a>
    <b>
        <set>erase</set>
        <set>
            <a/>erase
        </set>
        <d>
            <set>erase</set>
        </d>
    </b>
    <c>
        <x>         </x>
    </c>
    <e>
        <y>
            <set>erase</set>
            <q>             </q>
        </y>
        <z>
            <p>
                <set>erase</set>
            </p>
        </z>
    </e>
</a>

evaluates the XPath expression above and outputs the names of the selected elements. The wanted, correct result is produced:

b
y
p

Note that the following two expressions are incorrect:

*[set[text()='erase']][not(ancestor::*[set[text()='erase']])]

Or:

*[set[text()='erase']][ancestor::*[set[text()!='erase']]]

These two expressions suffer from more than one problem:

1. They are relative expressions, so regardless of the initial context they are applied with, they cannot select all wanted elements in a hierarchy of undefined depth and structure.

2. set[text()='erase'] selects not only an element of the form:

...

<set>erase</set>

but also elements of the form:

<set>
xyz
 <a/>erase</set>

3. Similarly:

set[text()!='erase']

selects elements of the form:

<set>
xyz
 <a/>erase</set>
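The difference between the comparisons can be checked from Java as well. The following is a small sketch using JAXP; the class `TextComparisonPitfall` and its `selects` helper are names invented for this example. It relies on the XPath 1.0 rule that comparing a node-set with a string is true as soon as any one node in the set satisfies the comparison.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, introduced only for this illustration.
class TextComparisonPitfall {
    // Returns true when the XPath expression selects at least one node in the XML.
    static boolean selects(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate("boolean(" + expr + ")", doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Mixed content: two text nodes ("\nxyz\n " and "erase") around an <a/> child.
        String mixed = "<set>\nxyz\n <a/>erase</set>";
        String strict = "/set[. = 'erase' and not(node()[2])]";

        // One text node equals 'erase', so the '=' test succeeds.
        System.out.println(selects(mixed, "/set[text()='erase']"));
        // Another text node differs from 'erase', so the '!=' test succeeds too.
        System.out.println(selects(mixed, "/set[text()!='erase']"));
        // The strict test rejects the mixed-content element...
        System.out.println(selects(mixed, strict));
        // ...and accepts the plain one.
        System.out.println(selects("<set>erase</set>", strict));
    }
}
```

The first two checks both succeed for the same element, which is why neither text()='erase' nor text()!='erase' is a reliable test for "the content is exactly erase".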

Answer 2:

This is my second attempt:

//*[                    set[count(node())=1 and text()='erase'] and
      not( ancestor::*[ set[count(node())=1 and text()='erase']])
   ]

This selection passes the test case shown in my first answer.

Answer 3:

The following XPath selects the nodes that you want:

//*[set[text()='erase']][not(ancestor::*[set[text()='erase']])]

I tested it with the following stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>

    <xsl:template match="@*|text()" />

    <xsl:template match="//*[set[text()='erase']][not(ancestor::*[set[text()='erase']])]">
        <xsl:text>(</xsl:text>
        <xsl:for-each select="self::*|ancestor::*">
            <xsl:value-of select="name()"/>
            <xsl:text>.</xsl:text>
        </xsl:for-each>
        <xsl:text>) </xsl:text>
    </xsl:template>

</xsl:stylesheet>

It produced the output:

(a.b.) (a.e.y.) (a.e.z.p.)
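The same dotted paths can also be built on the Java side, which is what the question's loop was attempting. Below is a rough sketch, assuming the document is available as a string; `DottedPathPrinter`, `dottedPath` and `paths` are illustrative names only. It stops walking up once the parent is no longer an element, so it does not depend on knowing the root element's name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for this illustration.
class DottedPathPrinter {
    // The expression from this answer.
    static final String QUERY =
        "//*[set[text()='erase']][not(ancestor::*[set[text()='erase']])]";

    // Builds "a.e.y" style paths by prefixing the name of each element ancestor.
    static String dottedPath(Node node) {
        StringBuilder path = new StringBuilder(node.getNodeName());
        // The loop ends at the Document node, which is not an Element.
        for (Node p = node.getParentNode(); p instanceof Element; p = p.getParentNode()) {
            path.insert(0, p.getNodeName() + ".");
        }
        return path.toString();
    }

    // Evaluates the query and returns the dotted path of every selected element.
    static List<String> paths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(QUERY, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(dottedPath(nodes.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<a><b><set>erase</set><d><set>erase</set></d></b>"
          + "<c><x/></c>"
          + "<e><y><set>erase</set><q/></y><z><p><set>erase</set></p></z></e></a>";
        for (String p : paths(xml)) {
            System.out.println("Path: " + p);
        }
    }
}
```

For the question's document this should list a.b, a.e.y and a.e.z.p, one per line.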

Answer 4:

Or this slight tweak on Harpo's answer?

*[set[text()='erase']][ancestor::*[set[text()!='erase']]]

Following my comment on Novatchev's answer, please consider this useful test case:

This is a change from the questioner's demo document. I have added another node.

<?xml version="1.0"?>
<a>
    <b>
        <set>erase</set>
        <set><a/>erase</set>
        <d>
            <set>erase</set>
        </d>
    </b>
    <c>
        <x>
        </x>
    </c>
    <e>
        <y>
            <set>erase</set>
            <q>
            </q>
        </y>
        <z>
            <p>
                <set>erase</set>
            </p>
        </z>
    </e>
</a>

The answer should be:

b
y
p

Source: https://stackoverflow.com/questions/9271001/xpath-query-to-get-the-ancester-nodes-based-on-element-value

Tags

xml

xslt

xpath

xml-parsing