nokogiri | 易学教程

How to prevent Nokogiri from adding <DOCTYPE> tags?

Question: I noticed something strange using Nokogiri recently. All of the HTML I had been parsing had been given start and end <html> and <body> tags, preceded by a DOCTYPE:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>

How can I prevent Nokogiri from doing this? For example, when I do:

    doc = Nokogiri::HTML("<div>some content</div>")
    doc.to_s

or:

    doc.to_html

I get the content wrapped in the extra tags:

    <html blah><body><div>some content</div></body></html>

Answer 1: The problem…

Find and replace HTML tags

Question: I have the following HTML:

    <html>
    <body>
    <h1>Foo</h1>
    <p>The quick brown fox.</p>
    <h1>Bar</h1>
    <p>Jumps over the lazy dog.</p>
    </body>
    </html>

I'd like to change it into the following HTML:

    <html>
    <body>
    <p class="title">Foo</p>
    <p>The quick brown fox.</p>
    <p class="title">Bar</p>
    <p>Jumps over the lazy dog.</p>
    </body>
    </html>

How can I find and replace certain HTML tags? I can use the Nokogiri gem.

Answer 1: Try this:

    require 'nokogiri'
    html_text = "<html><body><h1>Foo</h1><p>The quick brown fox

OS X Lion, Attempting Nokogiri install - libxml2 is missing

Question:

    sudo gem install nokogiri
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.
    /Users/sajeev86/.rvm/rubies/ruby-1.8.7-p352/bin/ruby extconf.rb
    checking for libxml/parser.h... no
    -----
    libxml2 is missing. please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.
    -----
    *** extconf.rb failed ***
    Could not create Makefile due to some reason, probably lack of necessary

Nokogiri, open-uri, and Unicode Characters

I'm using Nokogiri and open-uri to grab the contents of the title tag on a webpage, but am having trouble with accented characters. What's the best way to deal with these? Here's what I'm doing:

    require 'open-uri'
    require 'nokogiri'
    doc = Nokogiri::HTML(open(link))
    title = doc.at_css("title")

At this point, the title looks like this: Rag\303\271

Instead of: Ragù

How can I have Nokogiri return the proper character (e.g. ù in this case)? Here's an example URL: http://www.epicurious.com/recipes/food/views/Tagliatelle-with-Duck-Ragu-242037

When you say "looks like this," are you viewing this value

Installing Nokogiri on OSX 10.10 Yosemite

I recently upgraded to the 10.10 Yosemite beta, but I'm having trouble getting Nokogiri installed. I'm using RVM and Ruby 1.9.3. I've also followed the steps here and tried following the instructions on Nokogiri's homepage. I've installed libxml2 (2.9.1) and libxslt (1.1.28) via Homebrew, and have tried using the command line tools from both my Xcode 5 install and the Xcode 6 beta.

    gem install nokogiri -v '1.5.5'
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.
    /Users/grantdavis/.rvm/rubies/ruby-1.9.3-p362/bin/ruby

How to access attributes using Nokogiri

Question: I have a simple task: accessing the values of some attributes. This is a simple script that uses Nokogiri::XML::Builder to create a simple XML document:

    require 'nokogiri'
    builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
      xml.Placement(:messageId => "392847-039820-938777", :system => "MOD", :version => "2.0") {
        xml.objects {
          xml.object(:myattribute => "99", :anotherattrib => "333")
          xml.nextobject_ '9387toot'
          xml.Entertainment "Last Man Standing"
        }
      }
    end
    puts builder.to_xml
    puts

Error installing nokogiri: Failed to build gem native extension & libiconv is missing (OSX)

Question: I tried to clone this repo and run bundle install. The bundle process failed and threw this error:

    ...
    Installing nokogiri 1.6.2.1 with native extensions
    Building nokogiri using packaged libraries.
    Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
    /Users/zulhilmizainudin/.rvm/rubies/ruby-2.2.0/bin/ruby -r ./siteconf20151130-43880-pntnc6.rb extconf.rb
    Building nokogiri using packaged libraries.
    -----
    libiconv is missing. please visit http://nokogiri.org/tutorials/installing

Mac user and getting WARNING: Nokogiri was built against LibXML version 2.7.8, but has dynamically loaded 2.7.3

I have done all kinds of research and tried many different things. I know this question has been answered many times, but none of the suggested solutions are working for me. After upgrading to Lion I am getting segmentation faults in Ruby. I'm fairly confident it's Nokogiri. So I installed libxml2 via Homebrew. I ran brew link libxml2. Then I reinstalled Nokogiri using that version of the library. For proof:

    $ nokogiri -v
    # Nokogiri (1.5.0)
    ---
    warnings: []
    nokogiri: 1.5.0
    ruby:
      version: 1.9.2
      platform: x86_64-darwin11.0.0
      description: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64

Convert a Nokogiri document to a Ruby Hash

Is there an easy way to convert a Nokogiri XML document to a Hash? Something like Rails' Hash.from_xml.

I use this code with libxml-ruby (1.1.3). I have not used Nokogiri myself, but I understand that it is built on libxml2 as well. I would also encourage you to look at ROXML ( http://github.com/Empact/roxml/tree ), which maps XML elements to Ruby objects; it is built atop libxml.

    # USAGE: Hash.from_libxml(YOUR_XML_STRING)
    require 'xml/libxml'
    # adapted from
    # http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0
    class Hash
      class << self
        def from_libxml(xml, strict=true)
          begin

extract single string from HTML using Ruby/Mechanize (and Nokogiri)

Question: I am extracting data from a forum. My script based on is working fine. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post. I cannot get it to work. I used FireXPath to determine the XPath. Sample code:

    require 'rubygems'
    require 'mechanize'
    post_agent = WWW::Mechanize.new
    post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')
    puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip
    puts post_page