Extract text between tags with XPath including markup

Posted by Anonymous (unverified) on 2019-12-03 00:52:01

Question:

I have the following piece of XML:

...<span class="st">In Tim <em>Power</em>: Politieman...</span>... 

I want to extract the part between the <span> tags. For this I use XPath:

   /span[@class="st"] 

This, however, extracts everything, including the <span> element itself, and

  /span[@class="st"]/text() 

returns a list of two text nodes: one containing "In Tim " and the other ": Politieman...". The <em>..</em> element is not included and effectively acts as a separator.

Is there a pure XPath solution which returns:

In Tim <em>Power</em>: Politieman... 

EDIT: Thanks to @helderdarocha and @TextGeek. It seems non-trivial to extract the text, including the <em> markup, with XPath alone.

The /span[@class="st"]/node() solution returns a list of the individual child nodes, from which it is trivial to build a string in Python.
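As a rough illustration of that last step, here is a minimal sketch using only Python's standard library. It assumes a hypothetical <div> wrapper around the <span> so that the snippet parses, and inner_markup is a made-up helper name. The standard library's ElementTree supports only a subset of XPath and has no node() test, so the sketch walks the children directly instead of evaluating /node().

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical wrapper document; only the <span> comes from the question.
DOC = '<div><span class="st">In Tim <em>Power</em>: Politieman...</span></div>'


def inner_markup(element):
    """Serialize an element's content (text and child markup) without its own tags."""
    # Leading text before the first child, re-escaped for output.
    parts = [escape(element.text or "")]
    for child in element:
        # tostring() on a child also emits the text that follows it (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(DOC)
span = root.find(".//span[@class='st']")
result = inner_markup(span)
print(result)
```

The markup in the result comes from serializing the nodes again, not from XPath itself.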

Answer 1:

To get any child node you can use:

/span[@class="st"]/node() 

This will return:

  1. Two child text nodes
  2. The full <em> node (element and contents).

If you actually want all the text() nodes, including the ones inside em, then get all the text() descendants:

/span[@class="st"]//text() 

or

/span[@class="st"]/descendant::text() 

This will return three text nodes, including the text inside <em>, but not the <em> element itself.
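The difference between the two text() variants can be sketched in Python. This is only an approximation: the standard library's ElementTree does not support text() as a path step, so the sketch emulates both expressions with the text and tail attributes and with itertext(). The <div> wrapper is an assumption added to make the snippet a well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document; only the <span> comes from the question.
DOC = '<div><span class="st">In Tim <em>Power</em>: Politieman...</span></div>'

root = ET.fromstring(DOC)
span = root.find(".//span[@class='st']")

# Rough equivalent of span/text(): only text nodes that are direct children.
child_text = [t for t in [span.text] + [c.tail for c in span] if t]

# Rough equivalent of span//text(): every descendant text node, in document order.
all_text = list(span.itertext())

print(child_text)
print(all_text)
```

Joining all_text gives the plain text without any markup, which is what the descendant axis returns in node form.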



Answer 2:

Sounds like you want the equivalent of the JavaScript DOM innerHTML property, but for XML. I don't think there's a way to do that in pure XPath.

XPath doesn't really operate on markup strings like "<em>" and "</em>" at all -- it works with a tree of Node objects (there might possibly be an XPath implementation that tries to work directly off markup, but I doubt it). Most XPath implementations wouldn't even have the 4 characters "<em>" anywhere (except maybe kept around for printing error messages or something), and of course the DOM could have been built from scratch rather than from XML or other input in the first place. Likewise, XPath doesn't really figure on handing back marked-up strings, but lists of nodes.
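To make the point about trees versus markup concrete, here is a small Python sketch (standard library only, not tied to any particular XPath engine) that builds the same structure from scratch. No "<em>" string exists anywhere until the tree is serialized.

```python
import xml.etree.ElementTree as ET

# Build the tree node by node; no markup string is involved at this stage.
span = ET.Element("span", {"class": "st"})
span.text = "In Tim "
em = ET.SubElement(span, "em")
em.text = "Power"
em.tail = ": Politieman..."  # text that follows the <em> element

# The angle brackets only appear when the tree is serialized.
markup = ET.tostring(span, encoding="unicode")
print(markup)
```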

In XSLT or XQuery you can do this easily, but not in XPath by itself, unless I'm missing something.

-s


