open-uri

Iterating through multiple URLs to parse HTML with Nokogiri

醉酒当歌 submitted on 2019-12-23 03:15:08
Question: I'm trying to scrape the names and prices of items from multiple vendors using Nokogiri. I pass the CSS selectors (used to find the names and prices) to Nokogiri as method arguments. How can I pass multiple URLs to the "scrape" method while also passing the other arguments (e.g. vendor, item_path)? Or am I going about this the completely wrong way? Here is the code:
    require 'rubygems' # Load Ruby Gems
    require 'nokogiri' # Load Nokogiri
    require 'open-uri' # Load Open
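One possible shape for this, offered as a sketch rather than the asker's actual code: keep each vendor's URL and selectors together in one hash, then iterate over the list and pass each hash to the scrape method as keyword arguments. The SOURCES list, the example URLs, the selectors, and the scrape_all helper are all hypothetical. The sketch assumes Ruby 2.5 or later (for URI.open) and an installed nokogiri gem.

```ruby
require 'open-uri'

# Hypothetical vendor list: every URL and CSS selector here is a placeholder.
SOURCES = [
  { vendor: 'Vendor A', url: 'http://vendor-a.example/items',
    item_path: '.item .name', price_path: '.item .price' },
  { vendor: 'Vendor B', url: 'http://vendor-b.example/catalog',
    item_path: 'h2.title', price_path: 'span.cost' }
].freeze

# Fetch one page and pair each item name with its price.
def scrape(url:, vendor:, item_path:, price_path:)
  require 'nokogiri' # third-party gem, loaded only when a page is parsed
  doc    = Nokogiri::HTML(URI.open(url).read)
  names  = doc.css(item_path).map { |node| node.text.strip }
  prices = doc.css(price_path).map { |node| node.text.strip }
  names.zip(prices).map do |name, price|
    { vendor: vendor, name: name, price: price }
  end
end

# Run the scraper once per source, splatting each hash into keyword arguments.
# `scraper` is injectable so the iteration can be tried without a network.
def scrape_all(sources, scraper: method(:scrape))
  sources.flat_map { |source| scraper.call(**source) }
end
```

Calling scrape_all(SOURCES) would then return one flat array of hashes, which can be printed or written out in a single loop. Adding a vendor means adding one hash to the list rather than another method call.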

Adjusting timeouts for Nokogiri connections

主宰稳场 submitted on 2019-12-22 18:37:32
Question: Why does Nokogiri wait a couple of seconds (3-5) when the server is busy and I request pages one by one, but when the requests are in a loop, it does not wait and throws the timeout message? I'm wrapping the request in a timeout block, but Nokogiri does not wait for that time at all. Any suggested procedure for this?
    # this is a method from the eng class
    def get_page(url,page_type)
      begin
        timeout(10) do
          # Get a Nokogiri::HTML::Document for the page we’re interested in...
          @@doc =
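Nokogiri only parses the document; the waiting happens in whatever fetches the page, which here is presumably open-uri. A minimal sketch of one approach, assuming Ruby 2.5 or later: pass open-uri's own open_timeout and read_timeout options instead of wrapping the call in a timeout block, and retry a few times before giving up. The helper name fetch_with_retry, its parameters, and the 10-second limits are hypothetical choices, not taken from the question.

```ruby
require 'open-uri'
require 'timeout'

# Fetch a URL with explicit connect/read limits, retrying on timeouts.
# `fetch` is injectable so the retry logic can be exercised without a network.
def fetch_with_retry(url, attempts: 3, wait: 2, fetch: nil)
  fetch ||= ->(u) { URI.open(u, open_timeout: 10, read_timeout: 10).read }
  tries = 0
  begin
    tries += 1
    fetch.call(url)
  rescue Timeout::Error
    # Net::OpenTimeout and Net::ReadTimeout are both subclasses of Timeout::Error.
    raise if tries >= attempts
    sleep(wait) # give a busy server a moment before the next attempt
    retry
  end
end
```

The returned string can be handed to Nokogiri::HTML as before. Pausing between attempts inside a loop is a design choice here; whether it resolves the behavior described above depends on the server.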

Ruby Proxy Authentication GET/POST with OpenURI or net/http

吃可爱长大的小学妹 submitted on 2019-12-22 05:51:07
Question: I'm using Ruby 1.9.3. I want to fetch a URL with open-uri and send a POST with Net::HTTP, and I'm trying to use proxy authentication for both. Trying to do a POST request with net/http:
    require 'net/http'
    require 'open-uri'
    http = Net::HTTP.new("google.com", 80)
    headers = { 'User-Agent' => 'Ruby 193'}
    resp, data = http.post("/", "name1=value1&name2=value2", headers)
    puts data
And for open-uri, which I can't get to do a POST, I use:
    data = open("http://google.com/","User-Agent"=> "Ruby 193").read
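A hedged sketch of how proxy credentials can be supplied on both sides: Net::HTTP.new accepts the proxy address, port, user, and password after the target host and port, and open-uri accepts a proxy_http_basic_authentication option. The proxy host, port, credentials, and the helper names below are placeholders, not values from the question. The sketch assumes Ruby 2.5 or later for URI.open.

```ruby
require 'net/http'
require 'open-uri'

# All proxy details below are placeholders, not values from the question.
PROXY_HOST = 'proxy.example.com'
PROXY_PORT = 8080
PROXY_USER = 'user'
PROXY_PASS = 'secret'

# Build a Net::HTTP object that talks through the authenticating proxy.
def proxied_http(host, port = 80)
  Net::HTTP.new(host, port, PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)
end

# POST a form-encoded body through the proxy; returns a Net::HTTPResponse.
def proxied_post(host, path, form_body, headers = {})
  proxied_http(host).post(path, form_body, headers)
end

# GET through the same proxy with open-uri. String keys are sent as request
# headers; the symbol key is an open-uri option.
def proxied_get(url, headers = {})
  options = headers.merge(
    proxy_http_basic_authentication:
      [URI("http://#{PROXY_HOST}:#{PROXY_PORT}"), PROXY_USER, PROXY_PASS]
  )
  URI.open(url, options).read
end
```

In Ruby 1.9 and later, post returns a single response object, so the body is read with response.body rather than from a second return value.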

Display HTTP headers using OpenURI?

穿精又带淫゛_ submitted on 2019-12-22 04:35:29
Question: With OpenURI, I can do the following:
    require 'open-uri'
    #check status
    open('http://google.com').status
    #get entire html
    open('http://google.com').read
Is it possible to get the HTTP headers of a request so things can be debugged, something like curl's curl -I http://google.com ?
    $ curl -I google.com
    HTTP/1.1 301 Moved Permanently
    Location: http://www.google.com/
    Content-Type: text/html; charset=UTF-8
    Date: Mon, 17 Dec 2012 14:28:17 GMT
    Expires: Wed, 16 Jan 2013 14:28:17 GMT
    Cache-Control:
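The object returned by open-uri exposes the final response's headers through its meta method, which may already be enough for debugging. For something closer to curl -I, a HEAD request can be sent with net/http and the headers printed one per line. The following is a sketch under that assumption; the helper names head_request and header_lines are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Send a HEAD request, the rough equivalent of `curl -I`.
# Redirects are not followed, so a 301 is returned as-is.
def head_request(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
end

# Render a response's status line and headers as curl-style text lines.
def header_lines(response)
  lines = ["HTTP/#{response.http_version} #{response.code} #{response.message}"]
  response.each_capitalized { |name, value| lines << "#{name}: #{value}" }
  lines
end
```

Usage would look like puts header_lines(head_request('http://google.com')). Unlike open-uri, this shows the redirect response itself rather than the page it leads to.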

open-uri is not redirecting http to https

为君一笑 submitted on 2019-12-19 13:12:07
Question: I am using Hpricot and OpenURI to parse webpages and extract URLs from them. When I get a link like "http:rapidshare.com", it is not redirected to https. This is the error I got:
    /home/leonidus/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/open-uri.rb:216:in `open_loop': redirection forbidden: http:.................=> https:......................... . .
I tried to use the exception handler OPENURI::HTTPREDIRECT, but I still get the same error. I tried all the blogs but it is not
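The "redirection forbidden" error comes from open-uri refusing to follow a redirect from http to https in this Ruby version; to my knowledge, Ruby 2.4 and later allow that particular hop. One workaround, sketched here as an assumption rather than a known fix for the asker's setup, is to follow redirects by hand with net/http and pass the final body to the parser. The helper name fetch_following_redirects and its parameters are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Follow redirects by hand, including http -> https hops.
# `get` is injectable so the loop can be tested without a network.
def fetch_following_redirects(url, limit: 5, get: Net::HTTP.method(:get_response))
  uri = URI(url)
  limit.times do
    response = get.call(uri)
    location = response['location']
    # Stop at the first response that is not a redirect with a target.
    return response unless response.is_a?(Net::HTTPRedirection) && location
    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, location)
  end
  raise "too many redirects (limit #{limit})"
end
```

The body of the returned response (response.body) can then be given to Hpricot or any other parser. The redirect limit guards against loops between two URLs.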

How do I make a POST request with open-uri?

旧城冷巷雨未停 submitted on 2019-12-17 17:56:47
Question: Is it possible to make a POST request from Ruby with open-uri?

Answer 1: Unfortunately, open-uri only supports the GET verb. You can either drop down a level and use net/http, or use rest-open-uri, which was designed to support POST and other verbs. Run gem install rest-open-uri to install it.

Answer 2:
    require 'open-uri'
    require 'net/http'
    params = {'param1' => 'value1', 'param2' => 'value2'}
    url = URI.parse('http://thewebsite.com/thepath')
    resp, data = Net::HTTP.post_form(url, params)
    puts
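A self-contained variant of the net/http approach, written as a sketch: build the form-encoded request separately from sending it, so the request can be inspected before anything goes over the wire. The helper names build_post and send_post are hypothetical. Note that in Ruby 1.9 and later, Net::HTTP.post_form and Net::HTTP#request return a single response object, so the body is read with response.body.

```ruby
require 'net/http'
require 'uri'

# Build a form-encoded POST request without sending it.
def build_post(url, params)
  uri = URI(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(params) # sets the body and the Content-Type header
  request
end

# Send the request and return the Net::HTTPResponse.
def send_post(url, params)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_post(url, params))
  end
end
```

Usage would be along the lines of response = send_post('http://thewebsite.com/thepath', 'param1' => 'value1') followed by puts response.body.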

Ruby 2 Upgrade Breaks Nokogiri and/or open-uri Encoding?

风流意气都作罢 submitted on 2019-12-14 04:17:09
Question: I have a mystery to solve while upgrading our Rails 3.2 app from Ruby 1.9 to Ruby 2.1.2. Nokogiri seems to break, in that its behavior changes when used with open-uri. No gem versions changed, only the Ruby version (this is all on OS X Mavericks, using brew, gcc4, etc.). Steps to reproduce:
    $ ruby -v
    ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-darwin13.1.0]
    $ rails console
    Connecting to database specified by database.yml
    Loading development environment (Rails 3.2.18)
    > feed =