open-uri

XML => HTML with Hpricot and Rails

倾然丶 夕夏残阳落幕 submitted on 2019-12-14 02:29:37
Question: I have never worked with web services and Rails, and obviously this is something I need to learn. I have chosen to use Hpricot because it looks great. Anyway, _why has been nice enough to provide the following example on the Hpricot website: #!ruby require 'hpricot' require 'open-uri' # load the RedHanded home page doc = Hpricot(open("http://redhanded.hobix.com/index.html")) # change the CSS class on links (doc/"span.entryPermalink").set("class", "newLinks") # remove the sidebar (doc/"#sidebar"
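Hpricot is no longer actively maintained (Nokogiri is the usual replacement), so this sketch swaps in REXML from Ruby's standard library to make the same two edits as _why's example: changing the class on `span.entryPermalink` elements and removing the `#sidebar` element. It assumes well-formed XHTML, because REXML is an XML parser and rejects sloppy HTML. The sample markup and the `restyle` helper are invented for illustration.

```ruby
require 'rexml/document'

# Invented sample page; REXML needs well-formed XHTML.
SAMPLE_XHTML = <<~XML
  <html>
    <body>
      <div id="sidebar"><p>Archive</p></div>
      <span class="entryPermalink"><a href="/a">A</a></span>
      <span class="entryPermalink"><a href="/b">B</a></span>
    </body>
  </html>
XML

# Applies the Hpricot example's two edits using REXML and XPath.
def restyle(xhtml)
  doc = REXML::Document.new(xhtml)

  # Counterpart of (doc/"span.entryPermalink").set("class", "newLinks")
  REXML::XPath.match(doc, "//span[@class='entryPermalink']").each do |span|
    span.add_attribute("class", "newLinks") # replaces the existing value
  end

  # Counterpart of removing the element with id="sidebar"
  REXML::XPath.match(doc, "//*[@id='sidebar']").each(&:remove)

  doc.to_s
end
```

On a fetched page this would be `restyle(URI.open(url).read)`, provided the page really is valid XHTML. Otherwise, an HTML-tolerant parser such as Nokogiri is the safer choice.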

HTTP redirection loop RuntimeError in open_uri_redirections gem

血红的双手。 submitted on 2019-12-12 22:19:57
Question: Thanks for your time. I'm working on a Ruby script that parses a CSV of URLs and evaluates each one on a variety of dimensions to see whether certain tags and attributes are present or conform to a certain pattern. I'm using Nokogiri, open-uri, and open_uri_redirections, the patch for open-uri that allows the script to follow redirections. On a handful of problematic domains, the script encounters a runtime error: Loading https://www.exampleproblemdomain.com C:/Ruby24-x64/lib/ruby/2.4.0
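The `HTTP redirection loop` RuntimeError is raised when a redirect chain comes back to a URL it has already visited. One common cause is a site that sets a cookie and redirects to itself, expecting that cookie on the next request. open-uri keeps no cookies between hops, so the chain never ends. Below is a minimal sketch of following redirects by hand with Net::HTTP, so that the script can log the failing domain and move on instead of crashing. `RedirectLoopError`, `resolve`, and the `fetch:` injection point are hypothetical names. The injection point exists only so that the logic can be tried without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class for this sketch.
class RedirectLoopError < StandardError; end

# Default fetcher: one GET, no automatic redirect handling.
HTTP_GET = ->(uri) { Net::HTTP.get_response(uri) }

# Follows up to `limit` redirects, raising RedirectLoopError when a URL repeats
# or the limit runs out. Returns the final URI and the non-redirect response.
def resolve(url, limit: 10, fetch: HTTP_GET)
  uri = URI(url)
  seen = []
  limit.times do
    raise RedirectLoopError, "redirection loop at #{uri}" if seen.include?(uri.to_s)
    seen << uri.to_s

    response = fetch.call(uri)
    return [uri, response] unless response.is_a?(Net::HTTPRedirection)

    # Location may be relative, so resolve it against the current URI.
    uri = uri + response["location"]
  end
  raise RedirectLoopError, "more than #{limit} redirects from #{url}"
end
```

Inside the CSV loop, `rescue RedirectLoopError => e` lets the script record the domain and continue with the next row.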

How can I QUICKLY get a string from one of the first couple lines of a long CSV at a remote URL?

让人想犯罪 __ submitted on 2019-12-12 19:18:47
Question: I'm working on an assignment where I retrieve several stock prices online using Yahoo's stock price system. Unfortunately, the Yahoo API I'm required to use returns a .csv file that apparently contains a line for every single day the stock has been traded, which is at least 5,000 lines for the stocks I'm working with and over 10,000 lines for some of them (example). I only care about the current price, though, which is in the second line. I'm currently doing this: require
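open-uri reads the entire response body before it returns, so most of the time goes into downloading lines that are never used. Below is a minimal sketch that uses Net::HTTP's streaming `read_body` instead and stops after the first `n` lines. `first_lines_from_chunks` and `remote_first_lines` are hypothetical helper names. Once enough lines have arrived, the connection is closed and the rest of the file is discarded.

```ruby
require 'net/http'
require 'uri'

# Collects the first `n` lines from a stream of string chunks and stops
# pulling chunks as soon as enough newlines have been seen.
def first_lines_from_chunks(chunks, n)
  buffer = +""
  chunks.each do |chunk|
    buffer << chunk
    break if buffer.count("\n") >= n
  end
  buffer.lines.first(n).map(&:chomp)
end

# Streams the CSV with Net::HTTP and returns early, which closes the
# connection before the remaining lines are read.
def remote_first_lines(url, n = 2)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request_get(uri.request_uri) do |response|
      return first_lines_from_chunks(response.enum_for(:read_body), n)
    end
  end
end
```

With this helper, `remote_first_lines(csv_url).last` would be the second line, the one holding the current price.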

HTML is read before fully loaded using open-uri and nokogiri

纵饮孤独 submitted on 2019-12-12 08:47:44
Question: I'm using open-uri and Nokogiri with Ruby to do some simple web crawling. One problem is that the HTML is sometimes read before the page has fully loaded. In such cases, I cannot fetch any content other than the loading icon and the nav bar. What is the best way to tell open-uri or Nokogiri to wait until the page is fully loaded? Currently my script looks like this: require 'nokogiri' require 'open-uri' url = "https://www.the-page-i-wanna-crawl.com" doc = Nokogiri::HTML(open(url, ssl_verify_mode: OpenSSL

Open URI: Invalid URI error, encoding/escaping has no effect

岁酱吖の submitted on 2019-12-12 05:48:47
Question: I'm building out a Yahoo Finance API and keep hitting a brick wall when trying to use open-uri. Code: uri = ("http://ichart.finance.yahoo.com/table.csv?s=#{URI.escape(code)}&a=#{start_month}&b=#{start_day}&c=#{start_year}&d=#{end_month}&e=#{end_day}&f=#{end_year}&g=d&ignore=.csv") puts "#{uri}" conn = open(uri) Error: `split': bad URI(is not URI?): http://ichart.finance.yahoo.com/table.csv?s=%255EIXIC&a=00&b=1&c=1994&d=09&e=14&f=2014&g=d&ignore=.csv} (URI::InvalidURIError) I have tried URI
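The `%255EIXIC` in the error is `^IXIC` escaped twice (`^` becomes `%5E`, then `%` becomes `%25`). The trailing `}` is not part of the string built in the snippet, so it is worth checking where that brace comes from. Below is a minimal sketch that builds the query with `URI.encode_www_form` instead of `URI.escape`, which is obsolete and was removed in Ruby 3.0. The parameter names are copied from the question's URL, `history_uri` is a hypothetical helper, and the endpoint itself may no longer be served.

```ruby
require 'uri'

# Builds the CSV URL from the question. URI.encode_www_form escapes each
# value exactly once, so "^IXIC" becomes "%5EIXIC", not "%255EIXIC".
def history_uri(code, start_month:, start_day:, start_year:,
                end_month:, end_day:, end_year:)
  uri = URI("http://ichart.finance.yahoo.com/table.csv")
  uri.query = URI.encode_www_form(
    s: code,
    a: start_month, b: start_day, c: start_year,
    d: end_month, e: end_day, f: end_year,
    g: "d", ignore: ".csv"
  )
  uri
end
```

The resulting `URI` object can then be passed to `URI.open` (Ruby 2.5 and later) rather than to bare `open`.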

Ruby open-uri proxy authentication fails

可紊 submitted on 2019-12-11 11:33:53
Question: I'm writing a standalone Ruby script to scrape a website using Nokogiri. Whenever I pass proxy options to the open-uri open() method, it returns 407 Proxy Authentication Required, but my options do include the authentication details. Here's my code: proxy_url = URI.parse("http://12.34.567.89:PORT") session = Nokogiri::HTML(open("http://google.com", :proxy_http_basic_authentication =>[proxy_url, "username", "password"] Note: As my proxy is premium, I have replaced the real proxy credentials with fake ones
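The `:proxy_http_basic_authentication => [proxy_uri, user, password]` form in the question is the option open-uri documents, so the call shape itself looks right. One way to rule open-uri out is to talk to the proxy with Net::HTTP directly, which takes the proxy host, port, user, and password as extra arguments to `Net::HTTP.new`. Below is a minimal sketch. The proxy address and credentials are placeholders, and `proxied_http` is a hypothetical helper.

```ruby
require 'net/http'
require 'uri'

# Placeholder proxy; substitute the real host and port.
PROXY = URI("http://proxy.example.com:8080")

# Builds (but does not open) a Net::HTTP connection that goes through the
# proxy and sends Proxy-Authorization with the given credentials.
def proxied_http(target, user:, password:, proxy: PROXY)
  uri = URI(target)
  http = Net::HTTP.new(uri.host, uri.port, proxy.host, proxy.port, user, password)
  http.use_ssl = (uri.scheme == "https")
  http
end

# Usage (performs a real request):
#   http = proxied_http("http://google.com", user: "username", password: "password")
#   response = http.start { |h| h.get("/") }
```

If this also returns 407, the credentials or the proxy's authentication scheme are the likelier culprits rather than open-uri.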

Get image from url with cookies in ruby

吃可爱长大的小学妹 submitted on 2019-12-11 09:27:38
Question: As you know, some captchas are generated using the user session, and I need to save such an image to my computer to test our app. How can I do that, and which approach is better? For example, with a plain HTTP GET I have the following code, with cookies: http = Net::HTTP.new('***', 443) http.use_ssl = true http.verify_mode = OpenSSL::SSL::VERIFY_NONE path = '****' # GET request -> so the host can set his cookies resp, data = http.get(path) body_text = resp.body #puts "Body = #{body_text}" cookie = resp.response['set
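Below is a minimal sketch with Net::HTTP that replays the session cookie from the first response on the image request and writes the bytes in binary mode. The paths, the file name, and the `cookie_header`/`save_captcha` helpers are hypothetical. The helper keeps only each cookie's `name=value` pair and drops attributes such as `Path` and `Expires`, which suffices for a single host but is not a full cookie jar. Unlike the question's code, it leaves certificate verification on.

```ruby
require 'net/http'
require 'uri'

# Turns Set-Cookie header values into one Cookie request header,
# keeping only the name=value part of each cookie.
def cookie_header(set_cookie_values)
  Array(set_cookie_values).map { |c| c.split(";", 2).first.strip }.join("; ")
end

# Loads the page that starts the session, then fetches the captcha image
# on the same connection with the session cookie attached.
def save_captcha(host, page_path, image_path, file_name)
  Net::HTTP.start(host, 443, use_ssl: true) do |http|
    page = http.get(page_path)
    cookies = cookie_header(page.get_fields("set-cookie"))
    image = http.get(image_path, "Cookie" => cookies)
    File.binwrite(file_name, image.body) # binary mode keeps image bytes intact
  end
end
```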

Trouble opening UTF-8 URIs with Ruby's 'open-uri'

南楼画角 submitted on 2019-12-10 21:17:22
Question: I'm trying to get Danish location addresses from the Google Maps web services API with Ruby and open-uri. Fetching Ærø, Denmark: http://maps.googleapis.com/maps/api/geocode/json?address=ærø&sensor=false&region=dk works in Chrome but not with open-uri: require 'rubygems' require "open-uri" require 'json' uri = "http://maps.googleapis.com/maps/api/geocode/json?address=ærø&sensor=false&region=dk" response = open(uri) array = JSON.parse(response) pp array Here it yields /usr/lib/ruby/1.8/uri
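The URL contains raw non-ASCII characters. Browsers percent-encode these silently, but Ruby's `URI` parser rejects them, which is most likely what the truncated error reports. Below is a minimal sketch that builds the query with `URI.encode_www_form` (available since Ruby 1.9.2, so newer than the 1.8 install in the question). It also reads the response before parsing, because `JSON.parse` expects a String rather than the IO object that open-uri returns. `geocode_uri` and `geocode` are hypothetical names, and the endpoint is copied from the question and may no longer answer as it did.

```ruby
require 'uri'
require 'open-uri'
require 'json'

# Builds the geocoding URL with the address percent-encoded as UTF-8.
def geocode_uri(address)
  uri = URI("http://maps.googleapis.com/maps/api/geocode/json")
  uri.query = URI.encode_www_form(address: address, sensor: "false", region: "dk")
  uri
end

# Fetches and parses the result; JSON.parse needs a String, hence .read.
def geocode(address)
  JSON.parse(URI.open(geocode_uri(address)).read)
end
```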

open_uri / Nokogiri redirection problems

给你一囗甜甜゛ submitted on 2019-12-10 18:55:19
Question: I am using Nokogiri to scrape web pages, which works fine unless the page has a redirection loop. When I scrape this site: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/ I get this error /home/balint/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop': redirection forbidden: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/ -> http://www.facebook.com/cardcomplete (RuntimeError) But when I try to scrape this site I get the same error but now it is
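The message comes from a safety check in open-uri rather than from a loop: open-uri refuses to follow a redirect from `https` to `http`, which here is the hop to the Facebook page. If the scraper should decide for itself, one option is to read the `Location` header yourself (for example with Net::HTTP, which never follows redirects automatically) and apply an explicit rule. Below is a minimal sketch of such a rule. `follow_allowed?` and its `allow_downgrade:` flag are hypothetical.

```ruby
require 'uri'

# Decides whether a scraper should follow a redirect from `from` to `location`.
# Only http(s) targets are allowed; https -> http requires explicit opt-in.
def follow_allowed?(from, location, allow_downgrade: false)
  source = URI(from)
  target = source + location # also resolves relative Location values
  return false unless %w[http https].include?(target.scheme)
  return allow_downgrade if source.scheme == "https" && target.scheme == "http"
  true
end
```

Keeping downgrades opt-in preserves the reason open-uri blocks them (not leaking data from a secure page over plain HTTP) while still letting a crawler skip or follow off-site links deliberately.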

`open_http': 403 Forbidden (OpenURI::HTTPError) for the string “Steve_Jobs” but not for any other string

浪子不回头ぞ submitted on 2019-12-10 02:23:50
Question: I was going through the Ruby tutorials provided at http://ruby.bastardsbook.com/ and encountered the following code: require "open-uri" remote_base_url = "http://en.wikipedia.org/wiki" r1 = "Steve_Wozniak" r2 = "Steve_Jobs" f1 = "my_copy_of-" + r1 + ".html" f2 = "my_copy_of-" + r2 + ".html" # read the first url remote_full_url = remote_base_url + "/" + r1 rpage = open(remote_full_url).read # write the first file to disk file = open(f1, "w") file.write(rpage) file.close # read the first url
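A 403 from Wikipedia for one article but not another is hard to diagnose from the snippet alone. One commonly suggested fix is to send a descriptive `User-Agent` header, because Wikimedia's User-Agent policy asks scripted clients to identify themselves, and requests with generic agents may be refused. Below is a minimal sketch, assuming that is the cause here. The agent string is a placeholder, `save_page` and `local_copy_name` are hypothetical helpers, and the base URL uses `https` because Wikipedia redirects plain `http` requests.

```ruby
require 'open-uri'

BASE_URL = "https://en.wikipedia.org/wiki"
# Placeholder agent string: name your script and give a way to contact you.
HEADERS = { "User-Agent" => "BastardsBookExercise/0.1 (you@example.com)" }

# Local file name scheme from the tutorial: my_copy_of-<title>.html
def local_copy_name(title)
  "my_copy_of-#{title}.html"
end

# Downloads one article with the custom header (string keys are sent as
# HTTP headers by open-uri) and writes it next to the script.
def save_page(title)
  html = URI.open("#{BASE_URL}/#{title}", HEADERS, &:read)
  File.write(local_copy_name(title), html)
end

# Usage (performs real requests):
#   %w[Steve_Wozniak Steve_Jobs].each { |title| save_page(title) }
```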