XPath to extract text between a specific div tag and the next div

江枫思渺然 submitted on 2021-02-11 14:21:04

Question


I want to extract the text in the <p> tags between the div tag 'Heading1' and the next div tag, in the example below. I can't use 'Heading2' to isolate the next div, because that text may change.

library(XML)
# create example html
html <- '
<div class="AAA">
<div class="AAA">Heading1</div>
</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">
<div class="AAA">Heading2</div> <!-- Do not always know what this heading is -->
</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">
<div class="AAA">Heading3</div>
</div>'

doc <- htmlParse(html)

xpath <- "//p[preceding::div[@class='AAA' and contains(., 'Heading1')]]"

xpathSApply(doc, xpath, xmlValue)

This works up to a point: it returns every <p> that follows Heading1, including the ones after Heading2. I'm stuck on limiting the XPath so that it stops at the next div. I tried the following, thinking it would stop at the next div.

"//p[preceding::div[@class='AAA' and contains(., 'Heading1')] and following::div[position()=1]]"

Answer 1:


I don't think it's necessary to test the next div. You should be able to do something like this...

//p[preceding-sibling::div[1][normalize-space()='Heading1']]

or this if the class matters...

//p[preceding-sibling::div[1][@class='AAA'][normalize-space()='Heading1']]

or this if you need to still use contains()...

//p[preceding-sibling::div[1][@class='AAA'][contains(normalize-space(),'Heading1')]]
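As a rough check, the first expression can be run against the example HTML from the question. The sketch below wraps it in a helper called p_after_heading, which is a hypothetical name introduced here for illustration and not part of the XML package. It assumes the heading text contains no single quotes, since the text is pasted directly into the XPath string.

```r
library(XML)

# Same structure as the example HTML in the question
html <- '
<div class="AAA">
<div class="AAA">Heading1</div>
</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">
<div class="AAA">Heading2</div>
</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">
<div class="AAA">Heading3</div>
</div>'

doc <- htmlParse(html, asText = TRUE)

# Hypothetical helper (not part of the XML package): return the text of every
# <p> whose nearest preceding sibling <div> has `heading` as its normalized text.
# Assumption: `heading` contains no single quotes.
p_after_heading <- function(doc, heading) {
  xpath <- sprintf("//p[preceding-sibling::div[1][normalize-space()='%s']]", heading)
  as.character(xpathSApply(doc, xpath, xmlValue))
}

first_block  <- p_after_heading(doc, "Heading1")
second_block <- p_after_heading(doc, "Heading2")
print(first_block)
print(second_block)
```

Because preceding-sibling::div[1] selects only the nearest preceding sibling div, each <p> is matched against the heading block directly above it. That is why the next div never has to be named, and why the same helper also works for 'Heading2'.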



Answer 2:


Try this one

//p[preceding-sibling::div[div="Heading1"] and count(preceding-sibling::div[div])=1]
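A minimal sketch of running this expression on the example HTML from the question is shown below. The variable names are chosen here for illustration. The count() test keeps only <p> elements with exactly one preceding sibling div block, so this form appears to rely on 'Heading1' being the first such block among the siblings.

```r
library(XML)

# Same structure as the example HTML in the question
html <- '
<div class="AAA">
<div class="AAA">Heading1</div>
</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">
<div class="AAA">Heading2</div>
</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">
<div class="AAA">Heading3</div>
</div>'

doc <- htmlParse(html, asText = TRUE)

# First condition: some preceding sibling <div> has a child <div> equal to 'Heading1'.
# Second condition: exactly one preceding sibling <div> with a child <div> exists,
# which excludes every <p> that comes after the second heading block.
xpath <- "//p[preceding-sibling::div[div='Heading1'] and count(preceding-sibling::div[div])=1]"

result <- as.character(xpathSApply(doc, xpath, xmlValue))
print(result)
```

To target a later heading with this approach, the count would have to change to match that heading's position, so the preceding-sibling::div[1] form from the first answer is probably easier to reuse.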


Source: https://stackoverflow.com/questions/65200248/xpath-to-extract-text-between-specific-div-tag-and-next-div
