How to get xpath from an XmlNode instance

后端 未结 14 1112
难免孤独
难免孤独 2020-11-30 18:17

Could someone supply some code that would get the xpath of a System.Xml.XmlNode instance?

Thanks!

14条回答
  •  情书的邮戳
    2020-11-30 18:52

    Jon's correct that there are any number of XPath expressions that will yield the same node in an an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node position in the predicate, e.g.:

    /node()[0]/node()[2]/node()[6]/node()[1]/node()[2]
    

    Obviously, this expression isn't using element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (because attributes aren't nodes and don't have position; you can only find them by name), but it will find all other node types.

    To build this expression, you need to write a method that returns a node's position in its parent's child nodes, because XmlNode doesn't expose that as a property:

    static int GetNodePosition(XmlNode child)
    {
       for (int i=0; i

    (There's probably a more elegant way to do that using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)

    Then you can write a recursive method like this:

    static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );            
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        // the path to a node is the path to its parent, plus "/node()[n]", where 
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            GetNodePosition(node)
            );
    }
    

    As you can see, I hacked in a way for it to find attributes as well.

    Jon slipped in with his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure that the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm going to make is a pretty important one for anyone who works with XML to think about.

    I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to using is structured this way. You can spot these developers because they're using the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)

    This feels like it's a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() function in XPath) that are specifically designed to treat all node types generically.

    There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements are, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"

提交回复
热议问题