nokogiri | 易学教程

Installing nokogiri - Failed to build gem native extension

Question: While installing Nokogiri on Ubuntu 12, I got an error: Installing nokogiri (1.4.4) with native extensions Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension. /usr/bin/ruby1.9.1 extconf.rb extconf.rb:10: Use RbConfig instead of obsolete and deprecated Config. checking for libxml/parser.h... yes checking for libxslt/xslt.h... yes checking for libexslt/exslt.h... yes checking for iconv_open() in iconv.h... yes checking for xmlParseDoc() in -lxml2... yes checking

Nokogiri Xpath to retrieve text after within <TD> and

Question: I have the following HTML and would like to know how to use XPath to retrieve all the info: - Name (first, last) - Nick Name - email - shipping address... Primarily, retrieve text after . Many thanks in advance. <table> <tr> <td valign="top" width="50%" align="left"> Buyer FirstName LastName NickName First.Last@SomeCompany.com</td> <tr><td valign="top" width="40%" align="left"> Shipping address - confirmed FirstName LastName<br

How to get the raw HTML source code for a page by using Ruby or Nokogiri?

Question: I'm using Nokogiri (Ruby Xpath library) to grep contents on web pages. Then I found problems with some web pages, such as Ajax web pages, and that means when I view source code I won't be seeing the exact contents such as <table>, etc. How can I get the HTML code for the actual content? Answer 1: Don't use Nokogiri at all if you want the raw source of a web page. Just fetch the web page directly as a string, and then do not feed that to Nokogiri. For example: require 'open-uri' html = open('http:
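The answer's example is cut off, so here is a minimal sketch of the same idea rather than the answerer's exact code. It uses `URI.open`, because on Ruby 3.0 and later a bare `open` no longer accepts URLs. The helper name `fetch_raw_html` is hypothetical.

```ruby
require "open-uri"

# Returns the response body as a String, exactly as the server sent it.
# Nothing is parsed here, so nothing gets normalized or re-serialized.
# (fetch_raw_html is an illustrative name, not part of any library.)
def fetch_raw_html(url)
  URI.open(url, &:read)
end
```

Usage would look like `html = fetch_raw_html("http://example.com/")`, where the URL is a placeholder. Note that this returns only what the server sends; content that the page later builds with JavaScript will not be in the string.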

How to scrape pages which have lazy loading

Question: Here is the code which I used for parsing a web page. I did it in the Rails console, but I am not getting any output there. The site which I want to scrape has lazy loading: require 'nokogiri' require 'open-uri' page = 1 while true url = "http://www.justdial.com/functions"+"/ajxsearch.php?national_search=0&act=pagination&city=Delhi+%2F+NCR&search=Pandits"+"&where=Delhi+Cantt&catid=1195&psearch=&prid=&page=#{page}" doc = Nokogiri::HTML(open(url)) doc = Nokogiri::HTML(doc.at_css('
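The excerpt is cut off before any exit condition appears, so it is not clear how the original loop ends. The following is a sketch of a paginated fetch with an explicit stop, under the assumption that the endpoint returns an empty body once the pages run out; the real site's behavior is not shown in the question. `page_url`, `fetch_all_pages`, and the `fetcher` callable are hypothetical names.

```ruby
require "uri"

# Builds the URL for one page of results from a base URL and query parameters.
def page_url(base, params, page)
  "#{base}?#{URI.encode_www_form(params.merge(page: page))}"
end

# Requests page 1, 2, 3... and stops at the first empty page or at max_pages.
# `fetcher` is any callable that takes a URL and returns the body as a String,
# which keeps the loop testable without touching the network.
def fetch_all_pages(base, params, fetcher, max_pages: 50)
  bodies = []
  (1..max_pages).each do |page|
    body = fetcher.call(page_url(base, params, page))
    break if body.nil? || body.strip.empty? # assumed end-of-results signal
    bodies << body
  end
  bodies
end
```

In real use the `fetcher` could be something like `->(url) { URI.open(url, &:read) }` after `require "open-uri"`, and each returned body could then be handed to `Nokogiri::HTML`. The `max_pages` cap is a safety net in case the end-of-results assumption does not hold.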

Getting nokogiri to use a newer version of libxml2

Question: I've been trying to get Nokogiri installed on my computer (Mountain Lion) to use with rspec and capybara, but for the life of me, I can't get it to run properly. From what I can tell, the issue is with nokogiri using the wrong version of libxml2. I've so far tried uninstalling and reinstalling libxml2 using Homebrew (making sure it's the most recent one), uninstalling and reinstalling nokogiri using bundle, and specifying the exact path to the libxml2 files that Homebrew installed when

Can I incorporate system libraries (e.g. libxml2) I compile against into a gem (e.g. nokogiri) that I can deploy to Heroku?

Question: Nokogiri has a problem with translating to and from UTF-8 characters that turns out to come from libxml2, specifically version 2.7.6, which is the highest supported version on Ubuntu 10.04 LTS. The bug is fixed in version 2.7.7 and up, but since our app is hosted on Heroku (bamboo-ree-1.8.7 stack, based on Ubuntu 10.04), we have to use version 2.7.6, and continue to experience the bug, unless: someone can/has hacked nokogiri to get around the problem; Canonical bumps the supported libxml2

Nokogiri leaving HTML entities untouched

Question: I want Nokogiri to leave HTML entities untouched, but it seems to be converting the entities into the actual symbol. For example: Nokogiri::HTML.fragment('&reg;').to_s results in: "®" Nothing seems to return the original HTML back to me. The .inner_html, .text, .content methods all return '®' instead of '&reg;'. Is there a way for Nokogiri to leave these HTML entities untouched? I've already searched Stack Overflow and found similar questions, but nothing exactly like this one. Answer 1: Not

unable to install nokogiri in ubuntu 12.04

Question: I am trying to set up a Rails environment on my new Ubuntu machine, but I am having trouble installing the nokogiri gem. I have installed the libxslt and libxml2 libraries through the rvm pkg command as well as with apt-get. I think the error is telling me that libxslt is missing. Building native extensions. This could take a while... ERROR: Error installing nokogiri: ERROR: Failed to build gem native extension. /home/hacker5/.rvm/rubies/ruby-1.9.3-p194/bin/ruby extconf.rb --with-xslt-include=/usr/include/libxslt

How do I create XML using Nokogiri::XML::Builder with a hyphen in the element name?

Question: I am trying to build an XML document using Nokogiri. Some of the elements have hyphens in them. Here's an example: require "nokogiri" builder = Nokogiri::XML::Builder.new do |xml| xml.foo_bar "hello" end puts builder.to_xml Which produces: <?xml version="1.0"?> <foo_bar>hello</foo_bar> However, when I try: builder = Nokogiri::XML::Builder.new do |xml| xml.foo-bar "hello" end I get: syntax error, unexpected tSTRING_BEG, expecting kDO or '{' or '(' xml.foo-bar "hello" Now I realise this is

Is it possible to plug a JavaScript engine with Ruby and Nokogiri?

Question: I'm writing an application to crawl some websites and scrape data from them. I'm using Ruby, Curl and Nokogiri to do this. In most cases it's straightforward and I only need to ping a URL and parse the HTML data. The setup works perfectly fine. However, in some scenarios, the websites retrieve data based on user input on some radio buttons. This invokes some JavaScript which fetches some more data from the server. The generated URL and posted data are determined by JavaScript code. Is it