How to handle 404 not found errors in Nokogiri

Submitted by 微笑、不失礼 on 2019-12-12 07:50:13

Question


I am using Nokogiri to scrape web pages. A few of the URLs have to be guessed, and they return a 404 Not Found error when they don't exist. Is there a way to capture this exception?

http://yoursite/page/38475 #=> page number 38475 doesn't exist

I tried the following, which didn't work:

url = "http://yoursite/page/38475"
doc = Nokogiri::HTML(open(url)) do
  begin
    rescue Exception => e
      puts "Try again later"
  end
end

Answer 1:


It doesn't work because you are not rescuing the part of the code that raises the error when the server returns a 404 status, which is the open(url) call. The following code should work:

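require 'nokogiri'
require 'open-uri'

url = 'http://yoursite/page/38475'
begin
  # URI.open (from open-uri) raises OpenURI::HTTPError on 4xx/5xx responses,
  # so this call has to be inside the begin block
  file = URI.open(url)
  # The block accepted by Nokogiri::HTML only configures parse options,
  # so use the returned document directly
  doc = Nokogiri::HTML(file)
  # handle doc
rescue OpenURI::HTTPError => e
  # e.io.status is an array such as ["404", "Not Found"]
  if e.io.status.first == '404'
    # handle 404 error
  else
    raise
  end
end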
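If many guessed URLs need this treatment, the same rescue can be moved into a small helper. This is one possible sketch, not part of the original answer: `fetch_or_nil` and its `opener:` keyword are hypothetical names. The helper returns the page body as a String (ready to pass to `Nokogiri::HTML`) or nil on a 404, re-raises every other HTTP error, and checks the numeric code in `e.io.status` rather than comparing the message text, since servers may word the reason phrase differently.

```ruby
require 'open-uri'

# Hypothetical helper: returns the response body as a String, or nil when the
# server answers 404. Any other HTTP error is re-raised unchanged.
# The opener: keyword defaults to URI.open and is injectable only so the
# logic can be tried without network access.
def fetch_or_nil(url, opener: URI.method(:open))
  # With a block, URI.open yields the response IO and returns the block's value
  opener.call(url, &:read)
rescue OpenURI::HTTPError => e
  # e.io.status is a two-element array such as ["404", "Not Found"]
  return nil if e.io.status.first == '404'

  raise # other statuses (403, 500, ...) should not be hidden
end
```

A caller might write `html = fetch_or_nil('http://yoursite/page/38475')`, skip the page when `html` is nil, and otherwise call `Nokogiri::HTML(html)`. In normal use the `URI.open` default applies; the `opener:` parameter exists only to make the helper testable offline.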

BTW, about rescuing Exception, see the Stack Overflow question "Why is it a bad style to `rescue Exception => e` in Ruby?"
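In short, `rescue Exception` also catches exceptions such as `SystemExit` (raised by `exit`) and `Interrupt` (Ctrl-C), which you rarely want to swallow. A rescue clause that names no class catches only `StandardError` and its subclasses, and `OpenURI::HTTPError` is one of them, so narrowing the clause loses nothing here. A minimal sketch, where `guarded` is a hypothetical method used only for illustration:

```ruby
require 'open-uri'
require 'stringio'

# Hypothetical wrapper: a bare `rescue => e` is shorthand for
# `rescue StandardError => e`, so only StandardError subclasses are caught.
def guarded
  yield
  :ok
rescue => e
  "rescued #{e.class}"
end

# OpenURI::HTTPError inherits from StandardError, so it is still rescued.
puts(guarded { raise OpenURI::HTTPError.new('404 Not Found', StringIO.new) })
# prints: rescued OpenURI::HTTPError

# SystemExit is not a StandardError, so it passes through guarded
# instead of being swallowed as it would be by `rescue Exception`.
begin
  guarded { exit 1 }
rescue SystemExit => e
  puts "exit still propagated (status #{e.status})"
end
# prints: exit still propagated (status 1)
```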



Source: https://stackoverflow.com/questions/18270596/how-to-handle-404-not-found-errors-in-nokogiri
