open-uri | 易学教程 (Easy-Learn Tutorials)

XML => HTML with Hpricot and Rails

Question: I have never worked with web services and Rails, and obviously this is something I need to learn. I have chosen to use Hpricot because it looks great. Anyway, _why has been nice enough to provide the following example on the Hpricot website:

    #!ruby
    require 'hpricot'
    require 'open-uri'
    # load the RedHanded home page
    doc = Hpricot(open("http://redhanded.hobix.com/index.html"))
    # change the CSS class on links
    (doc/"span.entryPermalink").set("class", "newLinks")
    # remove the sidebar
    (doc/"#sidebar"
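
Hpricot is no longer maintained, so the same XML => HTML idea may be easier to try today with REXML, which is bundled with standard Ruby installations. This is a minimal sketch, assuming a feed whose entries look like <item><title>...</title></item>; those element names and the helper name are hypothetical. Titles are HTML-escaped with ERB::Util.html_escape.

```ruby
require 'rexml/document'
require 'erb'

# Hypothetical helper: render each <item><title> of an XML document as an HTML list item.
# Text is HTML-escaped, so markup inside titles cannot break the generated page.
def xml_titles_to_html(xml)
  doc = REXML::Document.new(xml)
  items = REXML::XPath.match(doc, "//item/title").map do |title|
    "  <li>#{ERB::Util.html_escape(title.text.to_s)}</li>"
  end
  "<ul>\n#{items.join("\n")}\n</ul>"
end

# Usage (not run here, needs require 'open-uri'):
#   xml_titles_to_html(URI.open(feed_url, &:read))
```

In a Rails view, the returned string would have to be marked with raw or html_safe, otherwise Rails escapes it a second time.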

HTTP redirection loop RuntimeError in open_uri_redirections gem

Question: Thanks for your time. I'm working on a Ruby script that parses a CSV of URLs and evaluates each one on a variety of dimensions, checking whether certain tags and attributes are present or conform to certain patterns. I'm using Nokogiri, open-uri, and open_uri_redirections, a patch for open-uri that allows the script to follow redirections. On a handful of problematic domains, the script hits a runtime error:

    Loading https://www.exampleproblemdomain.com
    C:/Ruby24-x64/lib/ruby/2.4.0
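
One way to see what a problematic domain is doing is to follow redirects by hand with Net::HTTP, recording each hop so that a loop is reported as an error the CSV loop can rescue, instead of ending the run. This is a minimal sketch; RedirectLoopError and the method names are invented for illustration, and the fetcher is injectable so the loop detection can be tried without a network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class for this sketch.
class RedirectLoopError < StandardError; end

# Default fetcher: one GET request, returning [status code, Location header].
# Net::HTTP does not follow redirects on its own.
DEFAULT_FETCH = lambda do |uri|
  response = Net::HTTP.get_response(uri)
  [response.code.to_i, response["location"]]
end

# Follow redirects manually, failing fast on a repeated URL or after `limit` hops.
def resolve_final_url(url, limit: 10, fetch: DEFAULT_FETCH)
  uri = URI(url)
  seen = []
  limit.times do
    raise RedirectLoopError, "redirect loop at #{uri}" if seen.include?(uri.to_s)
    seen << uri.to_s
    status, location = fetch.call(uri)
    return uri.to_s unless (300..399).cover?(status) && location
    uri = URI.join(uri.to_s, location) # resolves relative Location headers
  end
  raise RedirectLoopError, "more than #{limit} redirects starting at #{url}"
end
```

Rescuing RedirectLoopError per CSV row lets the script log the domain and move on. Some sites only loop for clients that do not send back the cookie set on the first response; this sketch does not handle cookies.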

How can I QUICKLY get a string from one of the first couple lines of a long CSV at a remote URL?

Question: I'm working on an assignment where I retrieve several stock prices online using Yahoo's stock price system. Unfortunately, the Yahoo API I'm required to use returns a .csv file that apparently contains a line for every single day the stock has been traded: at least 5,000 lines for the stocks I'm working with, and over 10,000 lines for some of them (example). I only care about the current price, though, which is on the second line. I'm currently doing this:

    require
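
Rather than downloading the whole file, the response body can be streamed with Net::HTTP and abandoned as soon as the wanted line has arrived. This is a minimal sketch with hypothetical method names; how much time it saves depends on the server sending the body incrementally.

```ruby
require 'net/http'
require 'uri'

# Return line n (0-based) from a sequence of body chunks, consuming only as many
# chunks as needed. Handles both "\n" and "\r\n" line endings.
def nth_line_from_chunks(chunks, n)
  buffer = +""
  chunks.each do |chunk|
    buffer << chunk
    lines = buffer.split(/\r?\n/, -1)
    # The last element may still be incomplete, so only trust the ones before it.
    return lines[n] if lines.length - 1 > n
  end
  buffer.split(/\r?\n/)[n] # body ended early: use whatever arrived
end

# Stream the CSV and stop reading once line n is complete (n = 1 is the first data row).
def remote_csv_line(url, n = 1)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request_get(uri.request_uri) do |response|
      return nth_line_from_chunks(response.enum_for(:read_body), n)
    end
  end
end
```

Returning from inside the block ends the request early; Net::HTTP.start closes the connection on the way out. Splitting the returned row on commas then gives its individual fields.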

HTML is read before fully loaded using open-uri and nokogiri

Question: I'm using open-uri and Nokogiri with Ruby to do some simple web crawling. The problem is that sometimes the HTML is read before it is fully loaded. In such cases, I cannot fetch any content other than the loading icon and the nav bar. What is the best way to tell open-uri or Nokogiri to wait until the page is fully loaded? Currently my script looks like this:

    require 'nokogiri'
    require 'open-uri'
    url = "https://www.the-page-i-wanna-crawl.com"
    doc = Nokogiri::HTML(open(url, ssl_verify_mode: OpenSSL
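
open-uri does not return early: it reads the complete HTTP response before handing it over. If the browser shows more than the script sees, that content is most likely inserted afterwards by JavaScript, which neither open-uri nor Nokogiri executes, so waiting longer will not help. One option is to request the data endpoint that the page's script calls (visible in the browser's developer tools); another is a headless browser. A minimal sketch of the first option, where the endpoint URL and the "items"/"title" JSON keys are hypothetical:

```ruby
require 'open-uri'
require 'json'

# Hypothetical data endpoint, found in the browser's developer tools (Network tab)
# as the request the page's JavaScript makes after the initial HTML loads.
DATA_URL = "https://www.the-page-i-wanna-crawl.com/api/items.json"

# Pull the fields of interest out of the JSON payload ("items"/"title" are assumed keys).
def parse_items(json_text)
  JSON.parse(json_text).fetch("items", []).map { |item| item["title"] }
end

# Fetch and parse in one step (URI.open requires Ruby 2.5 or later).
def fetch_items(url = DATA_URL)
  URI.open(url) { |io| parse_items(io.read) }
end
```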

Open URI - Invalid URI Error, encoding/escaping has no effect

Question: I'm building out a YahooFinance API and keep hitting a brick wall when trying to use open-uri.

Code:

    uri = ("http://ichart.finance.yahoo.com/table.csv?s=#{URI.escape(code)}&a=#{start_month}&b=#{start_day}&c=#{start_year}&d=#{end_month}&e=#{end_day}&f=#{end_year}&g=d&ignore=.csv")
    puts "#{uri}"
    conn = open(uri)

Error:

    `split': bad URI(is not URI?): http://ichart.finance.yahoo.com/table.csv?s=%255EIXIC&a=00&b=1&c=1994&d=09&e=14&f=2014&g=d&ignore=.csv} (URI::InvalidURIError)

I have tried URI
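
Two details in the error message point at the cause. %255E is what %5E becomes when it is escaped a second time (the % itself turns into %25), so the ticker ^IXIC was escaped twice; and the URL ends in a stray } that is not a valid URI character. Letting URI.encode_www_form build the query escapes each value exactly once (URI.escape, used in the question, was removed in Ruby 3.0). A minimal sketch; the parameter names are copied from the question and the method name is hypothetical:

```ruby
require 'uri'

# Build the Yahoo CSV URL with every query value escaped exactly once.
# Pass the raw ticker (e.g. "^IXIC"), not an already-escaped one.
def yahoo_csv_uri(code, start_month, start_day, start_year, end_month, end_day, end_year)
  uri = URI("http://ichart.finance.yahoo.com/table.csv")
  uri.query = URI.encode_www_form(
    s: code, a: start_month, b: start_day, c: start_year,
    d: end_month, e: end_day, f: end_year, g: "d", ignore: ".csv"
  )
  uri
end
```

For example, yahoo_csv_uri("^IXIC", "00", 1, 1994, "09", 14, 2014) produces the query string s=%5EIXIC&a=00&b=1&c=1994&d=09&e=14&f=2014&g=d&ignore=.csv, which URI.parse accepts.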

Ruby open-uri proxy authentication fails

Question: I'm writing a plain Ruby script to scrape a website using Nokogiri. Whenever I pass proxy options to open-uri's open() method, it returns 407 Proxy Authentication Required, even though my options do include the authentication details. Here's my code:

    proxy_url = URI.parse("http://12.34.567.89:PORT")
    session = Nokogiri::HTML(open("http://google.com", :proxy_http_basic_authentication =>[proxy_url, "username", "password"]

Note: As my proxy is a premium one, I have replaced the real proxy credentials with fake ones
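
The option name and its [proxy URI, user, password] shape in the question match open-uri's documented proxy_http_basic_authentication option, so the credentials and the proxy itself are worth checking separately. A minimal sketch, assuming Ruby 2.5+ for URI.open; proxy.example:8080 and the method names are placeholders. The Net::HTTP variant passes the same credentials explicitly, which can help tell an open-uri problem apart from a rejected login.

```ruby
require 'open-uri'
require 'net/http'
require 'uri'

# Placeholder proxy address; substitute the real host and numeric port.
PROXY = URI("http://proxy.example:8080")

# open-uri: the option takes [proxy URI, user, password].
def fetch_via_open_uri(url, user, pass)
  URI.open(url, proxy_http_basic_authentication: [PROXY, user, pass], &:read)
end

# Net::HTTP: proxy host, port, user and password are positional arguments 3 to 6.
def proxied_http(url, user, pass)
  target = URI(url)
  Net::HTTP.new(target.host, target.port, PROXY.host, PROXY.port, user, pass)
end

# Usage (not run here):
#   proxied_http("http://google.com", "username", "password").start { |h| h.get("/").code }
```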

Get an image from a URL with cookies in Ruby

Question: As you know, some captchas are generated using the user session, and I need to somehow save this image to my computer to test our app. But how, and what is the better choice? For example, for an HTTP GET I have the following code, with cookies:

    http = Net::HTTP.new('***', 443)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    path = '****'
    # GET request -> so the host can set his cookies
    resp, data = http.get(path)
    body_text = resp.body
    #puts "Body = #{body_text}"
    cookie = resp.response['set
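
The same Net::HTTP connection can first load the page, so the server sets its session cookie, and then request the captcha image with that cookie, writing the bytes with File.binwrite. A minimal sketch; the paths and method names are hypothetical, the cookie handling keeps only name=value pairs (no expiry or path rules), and certificate verification is left at Net::HTTP's default rather than VERIFY_NONE.

```ruby
require 'net/http'
require 'uri'

# Collapse Set-Cookie header values into one Cookie request header, keeping only
# the name=value part of each (attributes such as Path or Expires are dropped).
def cookie_header(set_cookie_values)
  Array(set_cookie_values).map { |c| c.split(";", 2).first.strip }.join("; ")
end

# Load the page first so the server sets its session cookie, then request the
# captcha with that cookie and save the raw bytes. All paths are hypothetical.
def save_captcha(host, page_path, image_path, out_file)
  Net::HTTP.start(host, 443, use_ssl: true) do |http|
    page = http.get(page_path)
    cookies = cookie_header(page.get_fields("set-cookie"))
    image = http.get(image_path, "Cookie" => cookies)
    File.binwrite(out_file, image.body)
  end
end
```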

Trouble opening UTF-8 URIs with Ruby's 'open-uri'

Question: I'm trying to get Danish location addresses from the Google Maps web services API with Ruby and open-uri. Trying to get Ærø, Denmark: http://maps.googleapis.com/maps/api/geocode/json?address=ærø&sensor=false&region=dk works in Chrome but not with open-uri:

    require 'rubygems'
    require "open-uri"
    require 'json'
    uri = "http://maps.googleapis.com/maps/api/geocode/json?address=ærø&sensor=false&region=dk"
    response = open(uri)
    array = JSON.parse(response)
    pp array

Here it yields /usr/lib/ruby/1.8/uri
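
Chrome percent-encodes the non-ASCII letters before sending the request, while open-uri parses the string as a URI first, and a URI may not contain raw characters such as æ or ø. Building the query with URI.encode_www_form encodes them as their UTF-8 bytes (%C3%A6, %C3%B8). Note also that JSON.parse needs a string, so the response has to be read first. A minimal sketch, assuming Ruby 2.5 or later (the question's Ruby 1.8 predates URI.open); the method names are hypothetical.

```ruby
require 'open-uri'
require 'json'
require 'uri'

# Percent-encode the address (and the other parameters) as UTF-8.
def geocode_uri(address)
  uri = URI("http://maps.googleapis.com/maps/api/geocode/json")
  uri.query = URI.encode_www_form(address: address, sensor: "false", region: "dk")
  uri
end

# Read the body into a String before handing it to JSON.parse.
def geocode(address)
  JSON.parse(URI.open(geocode_uri(address), &:read))
end
```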

open_uri / Nokogiri redirection problems

Question: I am using Nokogiri to scrape a web page, which works fine unless the page has a redirection loop. So when I scrape this site: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/ I get this error:

    /home/balint/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop': redirection forbidden: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/ -> http://www.facebook.com/cardcomplete (RuntimeError)

But when I try to scrape this site I get the same error but now it is
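
The message says "redirection forbidden", which is different from a loop: open-uri deliberately refuses to follow a redirect from https to http, and this one goes from an https cardcomplete.com URL to an http facebook.com URL. Passing redirect: false makes open-uri raise OpenURI::HTTPRedirect instead, whose uri attribute holds the absolute target, so the script can decide for itself whether to follow it. A minimal sketch; only follow such downgrades where sending the request over plain HTTP is acceptable, and the opener parameter is a hypothetical hook for testing.

```ruby
require 'open-uri'

# Default opener: fetch one URL without letting open-uri follow any redirect.
NO_REDIRECT_OPENER = ->(url) { URI.open(url, redirect: false, &:read) }

# Follow redirects manually, including https -> http ones that open-uri refuses.
def read_following_redirects(url, limit: 5, opener: NO_REDIRECT_OPENER)
  limit.times do
    begin
      return opener.call(url)
    rescue OpenURI::HTTPRedirect => e
      url = e.uri.to_s # absolute target of the Location header
    end
  end
  raise "more than #{limit} redirects"
end
```

The returned HTML can then be passed to Nokogiri::HTML as before.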

`open_http': 403 Forbidden (OpenURI::HTTPError) for the string “Steve_Jobs” but not for any other string

Question: I was going through the Ruby tutorials at http://ruby.bastardsbook.com/ and encountered the following code:

    require "open-uri"
    remote_base_url = "http://en.wikipedia.org/wiki"
    r1 = "Steve_Wozniak"
    r2 = "Steve_Jobs"
    f1 = "my_copy_of-" + r1 + ".html"
    f2 = "my_copy_of-" + r2 + ".html"
    # read the first url
    remote_full_url = remote_base_url + "/" + r1
    rpage = open(remote_full_url).read
    # write the first file to disk
    file = open(f1, "w")
    file.write(rpage)
    file.close
    # read the first url
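
One commonly reported cause of 403 responses from Wikipedia is the request's User-Agent: Wikimedia asks automated clients to send a descriptive one, and requests without it can be refused. The excerpt does not show why only Steve_Jobs fails, so this is a guess to try, not a diagnosis. A minimal sketch that sends a hypothetical User-Agent through open-uri's header option, loops over the titles instead of duplicating code, and records HTTP errors per title instead of stopping:

```ruby
require 'open-uri'

BASE_URL = "http://en.wikipedia.org/wiki"
# Hypothetical User-Agent; replace the contact details with your own.
HEADERS = { "User-Agent" => "bastardsbook-exercise/0.1 (you@example.com)" }

def fetch_page(title)
  URI.open("#{BASE_URL}/#{title}", HEADERS, &:read)
end

# Save each article as my_copy_of-<title>.html in dir; map each title to
# :saved or to the HTTP error message (e.g. "403 Forbidden").
def save_copies(titles, dir: ".", fetch: method(:fetch_page))
  titles.each_with_object({}) do |title, results|
    begin
      html = fetch.call(title)
      File.write(File.join(dir, "my_copy_of-#{title}.html"), html)
      results[title] = :saved
    rescue OpenURI::HTTPError => e
      results[title] = e.message
    end
  end
end

# Usage (not run here): p save_copies(["Steve_Wozniak", "Steve_Jobs"])
```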