How do I scrape data from a page that loads specific data after the main page load?

Submitted by 只谈情不闲聊 on 2019-12-04 17:23:35

I am not sure how to do it with Open-URI, but if you want to use Watir-Webdriver, the following works.

require 'watir-webdriver'
b = Watir::Browser.new
b.goto('http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358')
puts b.h3(:class, 'order-num').when_present.text

Note that when_present is called on the h3 element. This makes the script wait for the h3 to appear before it tries to read the text. If you know that parts of the page take time to load, adding an explicit wait like this usually solves the problem.
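To make the idea of an explicit wait concrete, here is a minimal sketch in plain Ruby. It is not Watir's implementation: `wait_until_present` is a hypothetical helper, and its `timeout` and `interval` parameters are assumptions chosen for illustration. It simply polls a block until the block returns something truthy or the time runs out.

```ruby
# Hypothetical helper (not part of Watir): poll the given block until it
# returns a truthy value, or raise once the timeout has elapsed.
def wait_until_present(timeout: 30, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout} seconds"
    end
    sleep interval
  end
end

# Simulated late-loading content: the value only "appears" on the third check.
attempts = 0
text = wait_until_present(timeout: 2, interval: 0.01) do
  attempts += 1
  attempts >= 3 ? 'loaded' : nil
end
puts text
```

In a real script the block would look up the element instead of counting attempts, which is roughly the job when_present does for you.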

Try installing Capybara-webkit (make sure you have QtWebKit installed, otherwise the gem install will fail). This gives you a headless solution. Then try this:

require 'capybara-webkit'
require 'capybara/dsl'
require 'nokogiri'
require 'open-uri'

url = 'http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358'
# Include the Capybara DSL and switch the current driver to webkit
include Capybara::DSL
Capybara.current_driver = :webkit
visit(url)
doc = Nokogiri::HTML.parse(body)

Then parse the body as you normally would. To suppress all of the error messages, try this:

Capybara.register_driver :webkit do |app|
  Capybara::Driver::Webkit.new(app, :stdout => nil)
end

Following @benaneesh's answer, I had to make slight modifications to get it to work in my Ruby script without showing the unknown URL messages:

require 'capybara-webkit'
require 'capybara/dsl'
require 'nokogiri'
require 'open-uri'

include Capybara::DSL
Capybara.current_driver = :webkit

Capybara::Webkit.configure do |config|
  config.block_unknown_urls
  config.allow_url("*mysite.com")
end

#... rest of code