Adjusting timeouts for Nokogiri connections

主宰稳场 提交于 2019-12-22 18:37:32

问题


Why nokogiri waits for couple of secongs (3-5) when the server is busy and I'm requesting pages one by one, but when these request are in a loop, nokogiri does not wait and throws the timeout message. I'm using timeout block wrapping the request, but nokogiri does not wait for that time at all. Any suggested procedure on this?

# this is a method from the eng class
def get_page(url,page_type)
 begin
  timeout(10) do
    # Get a Nokogiri::HTML::Document for the page we’re interested in...
    @@doc = Nokogiri::HTML(open(url))
  end
 rescue Timeout::Error
  puts "Time out connection request"
  raise
  end
end

 # this is a snippet from the main app calling eng class
 # receives a hash with urls and goes throgh asking one by one
 def retrieve_in_loop(links)
  (0..links.length).each do |idx|
    url = links[idx]
    puts "Visiting link #{idx} of #{links.length}"
    puts "link: #{url}"
    begin
        @@eng.get_page(url, product)
    rescue Exception => e
        puts "Error getting url: #{idx} #{url}"
        puts "This link will be skeeped. Continuing with next one"
    end
  end
end

回答1:


The timeout block is simply the max time that that code has to execute inside the block without triggering an exception. It does not affect anything inside Nokogiri or OpenURI.

You can set the timeout to a year, but OpenURI can still time out whenever it likes.

So your problem is most likely that OpenURI is timing out on the connection attempt itself. Nokogiri has no timeouts; it's just a parser.

Adjusting read timeout

The only timeout you can adjust on OpenURI is the read timeout. It seems you cannot change the connection timeout through this method:

open(url, :read_timeout => 10)

Adjusting connection timeout

To adjust the connection timeout you would have to go with Net::HTTP directly instead:

uri = URI.parse(url)

http = Net::HTTP.new(uri.host, uri.port)
http.open_timeout = 10
http.read_timeout = 10

response = http.get(uri.path)

Nokogiri.parse(response.body)

You can also take a look at some additional discussion here:

Ruby Net::HTTP time out
Increase timeout for Net::HTTP



来源:https://stackoverflow.com/questions/14498653/adjusting-timeouts-for-nokogiri-connections

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!