Watir scraping sequential elements : so simple, but no

倖福魔咒の posted on 2019-12-10 12:24:30

Question


This is so simple... I want to scrape a web page like this with Watir (a Ruby gem):

<div class="Time">time1</div> 
<div class="Locus">locus1</div>
<div class="Locus">locus2</div>
<div class="Time">time2</div>
<div class="Locus">locus3</div>
<div class="Time">time3</div>
<div class="Locus">locus4</div>
<div class="Locus">locus5</div>
<div class="Locus">locus6</div>
<div class="Time">time4</div>
etc..

The result should be an array like this:

time1 locus1
time1 locus2
time2 locus3
time3 locus4
time3 locus5
time3 locus6
time4 xxx

All the divs are at the same level (not nested). I can't find a way to do this with the Watir methods. Thanks for your help.


Answer 1:


For each Locus element, you can retrieve the preceding Time element via the #preceding_sibling method:

result = browser.divs(class: 'Locus').map do |div|
  time = div.preceding_sibling(class: 'Time').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]

Note that if the list is long, you may want to retrieve the HTML via Watir but then do the parsing in Nokogiri. This would save a lot of execution time, but at the cost of readability.

require 'nokogiri'

doc = Nokogiri::HTML.parse(browser.html) # where `browser` is the usual Watir::Browser
result = doc.css('.Locus').map do |div|
  # [1] on the reverse preceding-sibling axis selects the nearest preceding Time div
  time = div.at('./preceding-sibling::div[@class="Time"][1]').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]
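Neither snippet above yields the trailing "time4 xxx" row from the question, because both start from the Locus elements. One possible way to cover it is a single pass over the sibling divs in document order, carrying the most recent Time forward. The sketch below is plain Ruby with no Watir or Nokogiri calls: the `divs` array is a hypothetical stand-in for the class and text of each div on the page, and the helper name `pair_times_with_loci` and the `'xxx'` placeholder are assumptions taken from the expected output in the question.

```ruby
# Hypothetical stand-in for the page: [class, text] pairs in document order.
divs = [
  ['Time', 'time1'], ['Locus', 'locus1'], ['Locus', 'locus2'],
  ['Time', 'time2'], ['Locus', 'locus3'],
  ['Time', 'time3'], ['Locus', 'locus4'], ['Locus', 'locus5'], ['Locus', 'locus6'],
  ['Time', 'time4']
]

# Pairs each Locus with the most recent Time seen so far.
# A Time that is not followed by any Locus is emitted with the placeholder
# (an assumption based on the "time4 xxx" row in the question).
def pair_times_with_loci(divs, placeholder: 'xxx')
  result = []
  current_time = nil
  has_locus = true # nothing is pending before the first Time
  divs.each do |css_class, text|
    if css_class == 'Time'
      # Flush the previous Time if no Locus followed it.
      result << "#{current_time} #{placeholder}" unless has_locus
      current_time = text
      has_locus = false
    elsif css_class == 'Locus'
      result << "#{current_time} #{text}"
      has_locus = true
    end
  end
  # Flush the last Time if the list ends without a Locus.
  result << "#{current_time} #{placeholder}" unless has_locus
  result
end

result = pair_times_with_loci(divs)
p result
```

Because every div is visited once, this avoids a sibling lookup per Locus, which may matter on long lists. In a real script the pairs would have to be built from the page first, for example from the parsed HTML.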


Source: https://stackoverflow.com/questions/51315038/watir-scraping-sequential-elements-so-simple-but-no
