XPath text with children

醉话见心 2021-01-23 07:16

Given this html:

2 Answers
  •  我在风中等你
    2021-01-23 08:18

    @Tomalak is correct in saying that XPath generally cannot select that which is not there.

    However, in this case, the results you want are the string values of li elements. As you've found,

    string(//ul/li)
    

    gets you close but only returns the first desired string.

    This points to a shortcoming in XPath 1.0 that was addressed in XPath 2.0.

    In XPath 1.0, you have to iterate over the nodeset selected by //ul/li outside of XPath -- in XSLT, Python, Java, etc.
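
    As a rough illustration of iterating outside of XPath, here is a minimal Python sketch. It uses the standard library's ElementTree (which supports only a subset of XPath) rather than Scrapy, and the markup in SAMPLE is a hypothetical reconstruction, since the question's HTML is not reproduced here. The helper name li_string_values is likewise made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an assumption standing in for the question's HTML.
SAMPLE = (
    "<html><body><ul>"
    '<li>This is <a href="#">a link</a></li>'
    '<li>This is <a href="#">another link</a>.</li>'
    "</ul></body></html>"
)


def li_string_values(markup):
    """Return the string value of every ul/li element in the markup."""
    root = ET.fromstring(markup)
    # Select the li elements with a path expression, then compute each
    # string value in Python by joining all descendant text.
    return ["".join(li.itertext()) for li in root.findall(".//ul/li")]


values = li_string_values(SAMPLE)
for value in values:
    print(value)
```

    With Scrapy the same idea would apply: select the li nodes first, then take the string value of each one in a loop.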

    In XPath 2.0, the last location step can be a function call, so you can use

    //ul/li/string()
    

    to directly return

    This is a link
    This is another link.
    

    as requested.

    This is more educational than practical if you're stuck with Scrapy, which only supports XPath 1.0, but knowing

    • in XPath 1.0, string() applied to a nodeset returns the string value of only the first node (in document order),
    • XPath 2.0 allows the last location step to be a function, and
    • there's a difference between text() nodes and string values

    is generally helpful in reasoning about XPath text selections.
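
    The difference between text() nodes and string values can be sketched in Python as follows. This is only an approximation using the standard library's ElementTree, which has no text() node type of its own; the li markup is a hypothetical example, and direct_text_nodes and string_value are illustrative helper names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical li element: an assumption, not the question's actual markup.
li = ET.fromstring('<li>This is <a href="#">another link</a>.</li>')


def direct_text_nodes(element):
    """Approximate li/text(): only the text directly under the element."""
    # ElementTree stores direct text as element.text plus each child's tail.
    parts = [element.text] + [child.tail for child in element]
    return [part for part in parts if part]


def string_value(element):
    """Approximate string(li): all descendant text, concatenated."""
    return "".join(element.itertext())


print(direct_text_nodes(li))  # ['This is ', '.']
print(string_value(li))       # This is another link.
```

    The direct text nodes skip the text inside the a element, while the string value includes it, which is why selecting text() alone loses the link text.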
