nokogiri | 易学教程

Segmentation fault when I run rails S (can't compile nokogiri)

Question: I have been in configuration hell for two days and have tried just about everything on Stack Overflow to fix it. I feel like some of the things I tried may have made matters worse. I was using RVM, then I tried rbenv, and now I am back on RVM. I am not sure whether remnants of rbenv are causing this, but I followed the instructions to remove it completely. The error I am currently getting is here: https://gist.github.com/EvanTedesco/d4288cfb1f8dfc5a1e03

Ruby Watir Gem, Timing Out on Form Input

Question: I'm practicing web scraping using the Watir, Mechanize and Nokogiri gems. I'm running into an issue with my Watir script. My plan is to get a list of flight prices from http://tripadvisor.com/. When I run the script, the Chrome browser opens as it should and the script fills out the first parts of the form (origin and destination), and then it halts. Here is the error message I'm getting: This code has slept for the duration of the default timeout waiting for an Element to be present.

Ruby Conditional argument to method

Question: I have some 'generic' methods that extract data based on CSS selectors that are usually the same across many websites. However, I have another method that accepts the CSS selector for a given website as an argument. I need to call the get_title method if the title_selector argument is not passed. How can I do that? The scrape method that accepts CSS selectors as arguments: def scrape(urls, item_selector, title_selector, price_selector, image_selector) collection = [] urls.each do |url| doc = Nokogiri::HTML(open(url)
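One possible way to handle the optional selector, sketched in plain Ruby: give `title_selector` a default of `nil` and fall back to the generic method when it is omitted. `FakeItem`, `text_at`, `extract_title` and the selector list inside `get_title` are hypothetical stand-ins for illustration only; in the real scraper the item would be a Nokogiri node rather than a hash wrapper.

```ruby
# Hypothetical stand-in for a parsed item: maps CSS selectors to text.
# In the real scraper this would be a Nokogiri node.
class FakeItem
  def initialize(texts)
    @texts = texts
  end

  # Returns the text stored for a selector, or nil if there is none.
  def text_at(selector)
    @texts[selector]
  end
end

# Generic fallback: tries selectors assumed to be common across sites.
def get_title(item)
  %w[h1 .title].each do |selector|
    text = item.text_at(selector)
    return text if text
  end
  nil
end

# title_selector defaults to nil, so callers may leave it out.
def extract_title(item, title_selector = nil)
  return get_title(item) if title_selector.nil?

  item.text_at(title_selector)
end

item = FakeItem.new({ "h1" => "Generic title", ".product-name" => "Site-specific title" })
puts extract_title(item)                   # Generic title
puts extract_title(item, ".product-name")  # Site-specific title
```

The same default could be applied to `title_selector` in the `scrape` signature itself, provided the optional parameters come after the required ones or are turned into keyword arguments.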

How do I loop through items in XML in Nokogiri in Ruby? [closed]

Question: Closed. This question is off-topic and is not currently accepting answers. Closed 5 years ago. Given I have XML from this address, how would I loop through the search/events/event events? That is, how do I loop through the event items and print their details to the screen? So far I have the following code: get_xml('/events/search', :location => 'London, United Kingdom', :date => 'Today').at('events') get_xml

Using the Mechanize gem with the Nokogiri gem?

Question: I'm trying to scrape a website that requires authentication in order to get an element on a page with an id of #cellTotal. Right now, using Mechanize, I have logged into the page I want to access, but using basic Nokogiri functions like @selector = page.css("#cellTotal").text gives me this error: undefined method `css' for #<Mechanize::Page:0x61234f8> Here is what I have so far: agent = Mechanize.new agent.get("example.com") agent.page.forms[0]["username_field"] = "username" agent.page.forms[0][

nokogiri not installing correctly for ruby-1.9.1

Question: Hi everyone, I have recently installed ruby-1.9.1 using rvm. I have tried installing nokogiri following this guide: https://github.com/tenderlove/nokogiri/wiki/what-to-do-if-libxml2-is-being-a-jerk but I am still getting the following error once the gem is installed: HI. You're using libxml2 version 2.6.16 which is over 4 years old and has plenty of bugs. We suggest that for maximum HTML/XML parsing pleasure, you upgrade your version of libxml2 and re-install nokogiri

How do I remove HTTP links with ActiveSupport's “starts_with” using Nokogiri?

Question: When I try this: item.css("a").each do |a| if !a.starts_with? 'http://' a.replace a.content end end I get: NoMethodError: undefined method 'starts_with?' for #<Nokogiri::XML::Element:0x1b48a60> EDIT: I'm sure there is a cleaner way, but this seems to be working. item.css("a").each do |a| unless a["href"].blank? if !a["href"].starts_with? 'http://' a.replace a.content end end end Answer 1: The problem is that you're trying to use the starts_with? method on an object that doesn't implement it. item.css("a")
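A minimal sketch of the point the answer makes: the check belongs on the `href` string, not on the element. `http_link?` is a hypothetical helper name. It uses core Ruby's `String#start_with?` (ActiveSupport's `starts_with?` is an alias for it), and `to_s` covers anchors that have no `href` at all.

```ruby
# Hypothetical helper: true when the href is an absolute http:// link.
# href may be nil for anchors without an href attribute, hence to_s.
def http_link?(href)
  href.to_s.start_with?("http://")
end

puts http_link?("http://example.com/page")  # true
puts http_link?("/relative/path")           # false
puts http_link?(nil)                        # false
```

Inside the loop from the question, the condition would then read `a.replace(a.content) unless http_link?(a["href"])`.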

Function 'xsltParseStylesheetDoc' not found in [libxml2.so]

Question: This error comes up in Red Hat Enterprise Linux Server 5.4 (64-bit). Linux rhl-64-tibbr5 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux There is also this error in the stack trace: uninitialized constant Nokogiri::VERSION_INFO More version details: jruby-1.4.0RC1 ruby/gems/1.8/gems/activesupport-2.3.4 Any idea? Answer 1: After wasting a few hours on this issue, we realized we didn't need nokogiri in our application. So we got rid of it and these errors

How to scrape data using Ruby which is generated by a Javascript function?

Question: I am trying to scrape the data URL link for the latest date (the first row of the table) from this page. But it seems the content of the table is generated by a JavaScript function. I tried using Nokogiri to get it, but in vain, as Nokogiri cannot scrape JavaScript-generated content. Then I tried to get only the script part with Nokogiri by using: url = "http://www.sgx.com/wps/portal/sgxweb/home/marketinfo/historical_data/derivatives/daily_data" doc = Nokogiri::HTML(open(url)) js = doc.css("script").text

ruby (1.8.7): How to get rid of non-printable chars while scraping?

Question: I'm trying to parse an HTML page with Nokogiri but I'm having some issues with text. Mainly, I cannot get rid of unwanted chars. While parsing, whenever I obtain a String I try to clean it as much as possible by converting runs of non-printable chars to single spaces. I use this method, without success after a lot of modifications: def clear_string(str) CGI::unescapeHTML(str).gsub(/\s+/mu," ").strip end For instance, suppose this HTML fragment (copy-pasted from http://www.gisa.cat/gisa/servlet
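One common reason a cleanup like this misses characters is that `\s` only matches ASCII whitespace, so a non-breaking space (U+00A0) passes through untouched. The sketch below is a variant of `clear_string`, not the method from the question. It assumes Ruby 1.9 or later with UTF-8 strings (the question targets 1.8.7, where this does not apply as written), in which the POSIX classes `[[:space:]]` and `[[:cntrl:]]` also cover Unicode whitespace and control characters.

```ruby
require "cgi"

# Variant of the question's clear_string, assuming Ruby 1.9+ and UTF-8 input.
# [[:space:]] also matches Unicode whitespace such as U+00A0, and
# [[:cntrl:]] matches control characters; each run becomes one space.
def clear_string(str)
  CGI.unescapeHTML(str)
     .gsub(/[[:space:][:cntrl:]]+/, " ")
     .strip
end

puts clear_string("  Hello\u00A0\u00A0world\t\n &amp; more ")  # Hello world & more
```

Whether this covers the characters in the fragment from the question cannot be confirmed from the excerpt; inspecting the offending string with `str.codepoints` would show which code points actually need handling.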