Question
I want to extract the text in the <p> elements between the div tag 'Heading1' and the next div tag, in the example below. I can't use 'Heading2' to isolate the next div, as this text may change.
library(XML)
# create example html
html <- '
<div class="AAA">
<div class="AAA">Heading1</div>
</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">
<div class="AAA">Heading2</div> <!-- Do not always know what this heading is -->
</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">
<div class="AAA">Heading3</div>
</div>'
doc <- htmlParse(html)
xpath <- "//p[preceding::div[@class='AAA' and contains(., 'Heading1')]]"
xpathSApply(doc, xpath, xmlValue)
This works up to here, but I'm stuck on limiting the XPath at the next div. I have tried the following, thinking it would stop at the next div:
"//p[preceding::div[@class='AAA' and contains(., 'Heading1')] and following::div[position()=1]]"
Answer 1:
I don't think it's necessary to test the next div. You should be able to do something like this...
//p[preceding-sibling::div[1][normalize-space()='Heading1']]
or this if the class matters...
//p[preceding-sibling::div[1][@class='AAA'][normalize-space()='Heading1']]
or this if you still need to use contains()...
//p[preceding-sibling::div[1][@class='AAA'][contains(normalize-space(),'Heading1')]]
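As a quick check, the three expressions above can be run against the example document from the question. This is a minimal sketch, assuming the XML package is installed; the variable names (xp_exact, xp_class, xp_contains) are only illustrative and not part of the original post.

```r
library(XML)

# Example HTML copied from the question
html <- '
<div class="AAA">
<div class="AAA">Heading1</div>
</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">
<div class="AAA">Heading2</div>
</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">
<div class="AAA">Heading3</div>
</div>'
doc <- htmlParse(html, asText = TRUE)

# preceding-sibling::div[1] is the nearest div sibling before each <p>,
# so a <p> matches only if that nearest div is the 'Heading1' block.
xp_exact    <- "//p[preceding-sibling::div[1][normalize-space()='Heading1']]"
xp_class    <- "//p[preceding-sibling::div[1][@class='AAA'][normalize-space()='Heading1']]"
xp_contains <- "//p[preceding-sibling::div[1][@class='AAA'][contains(normalize-space(),'Heading1')]]"

res_exact    <- xpathSApply(doc, xp_exact, xmlValue)
res_class    <- xpathSApply(doc, xp_class, xmlValue)
res_contains <- xpathSApply(doc, xp_contains, xmlValue)

print(res_exact)
# [1] "text1 I want" "text2 I want" "text3 I want"
```

Because the test is on the nearest preceding div rather than on the following one, the text of the next heading never needs to be known.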
Answer 2:
Try this one
//p[preceding-sibling::div[div="Heading1"] and count(preceding-sibling::div[div])=1]
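The expression above can be tried on the question's example in the same way. This is a minimal sketch, assuming the XML package is installed; xp_first and xp_second are illustrative names. Note that count(...)=1 ties the match to the first heading block among the siblings, so swapping in 'Heading2' without also changing the count appears to select nothing.

```r
library(XML)

# Example HTML copied from the question
html <- '
<div class="AAA">
<div class="AAA">Heading1</div>
</div>
<p>text1 I want</p>
<p>text2 I want</p>
<p>text3 I want</p>
<div class="AAA">
<div class="AAA">Heading2</div>
</div>
<p>more text</p>
<p>more text</p>
<p>more text</p>
<div class="AAA">
<div class="AAA">Heading3</div>
</div>'
doc <- htmlParse(html, asText = TRUE)

# A <p> matches if a preceding sibling div wraps a child div whose text is
# 'Heading1', and exactly one such wrapper div precedes it.
xp_first <- "//p[preceding-sibling::div[div='Heading1'] and count(preceding-sibling::div[div])=1]"
res_first <- xpathSApply(doc, xp_first, xmlValue)
print(res_first)
# [1] "text1 I want" "text2 I want" "text3 I want"

# Same pattern with 'Heading2': the paragraphs after Heading2 have two
# preceding wrapper divs, so the count test fails for them.
xp_second <- "//p[preceding-sibling::div[div='Heading2'] and count(preceding-sibling::div[div])=1]"
n_second <- length(getNodeSet(doc, xp_second))
print(n_second)
```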
Source: https://stackoverflow.com/questions/65200248/xpath-to-extract-text-between-specific-div-tag-and-next-div