Nokogiri for selecting text and HTML between unique sets of tags

Submitted by 白昼怎懂夜的黑 on 2019-12-11 09:17:30

Question


I am trying to use Nokogiri to extract the text in-between two unique sets of tags.

What is the best way to get the text within the p-tag in between <h2 class="point">The problem</h2> and <h2 class="point">The solution</h2>, and then all of the HTML between <h2 class="point">The solution</h2> and <div class="frame box sketh">?

Sample of the full html:

<h2 class="point">The problem</h2>
<p>TEXT I WANT </p>
<h2 class="point">The solution</h2>
HTML I WANT with it's own set of tags (but never an <h2> or <div>)
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Thank you!


Answer 1:


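require 'nokogiri'

# DATA reads everything after __END__ in this file.
doc = Nokogiri.HTML(DATA)
# Print the text of each sibling node matched by the XPath.
doc.search('//h2/following-sibling::node()[name() != "h2" and name() != "div" and text() != "\n"]').each do |block|
  p block.text
end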

__END__
<h2 class="point">The problem</h2>
<p>TEXT I WANT</p>
<h2 class="point">The solution</h2>
<div>dont capture this</div>
<span>HTML I WANT with it's <p>own set <b>of</b> tags</p></span>
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Output:

"TEXT I WANT"
"HTML I WANT with it's own set of tags"

This XPath selects every following sibling node of an h2, skipping any node that is an h2, is a div, or contains nothing but the string "\n".




Answer 2:


Here is how you can get the text of the p tags between two h2 elements that have the class point:

//h2[@class="point"][1]/following-sibling::p[./following-sibling::h2[@class="point"]]/text()

For the second part, explore w3schools and adapt the first expression as an example.



Source: https://stackoverflow.com/questions/12478272/nokogiri-for-selecting-text-and-html-between-between-unique-sets-of-tags
