How to GET a URL with User-Agent and timeout through some Proxy in Ruby?

Submitted by 帅比萌擦擦* on 2019-12-08 00:53:50

Question


How do I GET a URL when the request has to go through a proxy, has to time out after at most n seconds, and has to send a User-Agent header?

   require 'nokogiri'
   require 'net/http'
   require 'uri'

   def get_with_max_wait(param, proxy, timeout)
     uri = URI.parse("http://example.com/?p=#{param}")
     proxy_uri = URI.parse(proxy)
     # Connect to the target host through the proxy.
     http = Net::HTTP.new(uri.host, uri.port, proxy_uri.host, proxy_uri.port)
     http.open_timeout = timeout # seconds to wait for the connection to open
     http.read_timeout = timeout # seconds to wait for each read
     # Request only the path and query; the host is already set on the connection.
     response = http.get(uri.request_uri)
     doc = Nokogiri.parse(response.body)
     doc.css(".css .goes .here")[0].content.strip
   end

The code above gets a URL through a proxy with a timeout, but it does not send a User-Agent. How do I add the User-Agent header?


Answer 1:


You can use open-uri and pass the User-Agent as an option to its open function.

Below is an example that stores the User-Agent string in a variable and passes it to open along with the proxy and the timeout.

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2"

    url = "http://www.somedomain.com/somepage/"

    # open-uri options (:proxy, :read_timeout) are symbols; string keys are sent as HTTP headers.
    # URI.open replaces the bare open(url) form, which Ruby 3.0 no longer supports for URLs.
    @doc = Nokogiri::HTML(URI.open(url, :proxy => 'http://(ip_address):(port)', 'User-Agent' => user_agent, :read_timeout => 10), nil, "UTF-8")

open-uri also has a read_timeout option, given in seconds (10 in the example above).

You can review the open-uri documentation for the other available options.
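
If you would rather keep the Net::HTTP approach from the question, the User-Agent can be passed as a request header. The following is only a sketch under a few assumptions: the helper names (build_proxied_http, build_get, get_body) are made up for illustration, and returning nil on a timeout is just one possible way to handle it.

    require 'net/http'
    require 'uri'

    # Build a connection that goes through the proxy and gives up after
    # `timeout` seconds. Nothing is sent until a request is made.
    def build_proxied_http(uri, proxy_url, timeout)
      proxy = URI.parse(proxy_url)
      http = Net::HTTP.new(uri.host, uri.port, proxy.host, proxy.port)
      http.use_ssl = (uri.scheme == 'https')
      http.open_timeout = timeout # seconds to wait for the connection to open
      http.read_timeout = timeout # seconds to wait for each read
      http
    end

    # Build a GET request that carries a custom User-Agent header.
    def build_get(uri, user_agent)
      Net::HTTP::Get.new(uri.request_uri, 'User-Agent' => user_agent)
    end

    # Hypothetical helper: return the response body, or nil on a timeout.
    def get_body(url, proxy_url, timeout, user_agent)
      uri = URI.parse(url)
      http = build_proxied_http(uri, proxy_url, timeout)
      http.request(build_get(uri, user_agent)).body
    rescue Net::OpenTimeout, Net::ReadTimeout
      nil
    end

It could then be called as get_body("http://example.com/?p=1", "http://127.0.0.1:8080", 5, "MyAgent/1.0"), where the proxy address and the User-Agent string are placeholders. The shorter form http.get(uri.request_uri, 'User-Agent' => user_agent) sends the same header without building the request object yourself.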



Source: https://stackoverflow.com/questions/24383940/how-to-get-a-url-with-user-agent-and-timeout-through-some-proxy-in-ruby
