nokogiri

How to prevent Nokogiri from adding <DOCTYPE> tags?

坚强是说给别人听的谎言 submitted on 2019-11-27 04:33:03
Question: I noticed something strange using Nokogiri recently. All of the HTML I had been parsing had been given start and end <html> and <body> tags:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>

How can I prevent Nokogiri from doing this? I.e., when I do:

doc = Nokogiri::HTML("<div>some content</div>")
doc.to_s

or:

doc.to_html

instead of the original markup I get:

<html blah><body><div>some content</div></body></html>

Answer 1: The problem

Find and replace HTML tags

ぐ巨炮叔叔 submitted on 2019-11-27 04:31:10
Question: I have the following HTML:

<html> <body> <h1>Foo</h1> <p>The quick brown fox.</p> <h1>Bar</h1> <p>Jumps over the lazy dog.</p> </body> </html>

I'd like to change it into the following HTML:

<html> <body> <p class="title">Foo</p> <p>The quick brown fox.</p> <p class="title">Bar</p> <p>Jumps over the lazy dog.</p> </body> </html>

How can I find and replace certain HTML tags? I can use the Nokogiri gem.

Answer 1: Try this:

require 'nokogiri'
html_text = "<html><body><h1>Foo</h1><p>The quick brown fox

OS X Lion, Attempting Nokogiri install - libxml2 is missing

孤街浪徒 submitted on 2019-11-27 04:15:34
Question:

sudo gem install nokogiri
Building native extensions. This could take a while...
ERROR: Error installing nokogiri:
ERROR: Failed to build gem native extension.
/Users/sajeev86/.rvm/rubies/ruby-1.8.7-p352/bin/ruby extconf.rb
checking for libxml/parser.h... no
-----
libxml2 is missing. please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.
-----
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary

Nokogiri, open-uri, and Unicode Characters

我怕爱的太早我们不能终老 submitted on 2019-11-27 03:41:10
I'm using Nokogiri and open-uri to grab the contents of the title tag on a webpage, but am having trouble with accented characters. What's the best way to deal with these? Here's what I'm doing:

require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(open(link))
title = doc.at_css("title")

At this point, the title looks like this: Rag\303\271 instead of: Ragù

How can I have Nokogiri return the proper character (e.g. ù in this case)? Here's an example URL: http://www.epicurious.com/recipes/food/views/Tagliatelle-with-Duck-Ragu-242037

Answer: When you say "looks like this," are you viewing this value

Installing Nokogiri on OSX 10.10 Yosemite

一笑奈何 submitted on 2019-11-27 02:49:15
I recently upgraded to the 10.10 Yosemite beta, but I'm having trouble getting Nokogiri installed. I'm using RVM and Ruby 1.9.3. I've also followed the steps here and tried following the instructions on Nokogiri's homepage. I've installed libxml2 (2.9.1) and libxslt (1.1.28) via Homebrew, and have tried using the command line tools from both my Xcode 5 install and the Xcode 6 beta.

gem install nokogiri -v '1.5.5'
Building native extensions. This could take a while...
ERROR: Error installing nokogiri:
ERROR: Failed to build gem native extension.
/Users/grantdavis/.rvm/rubies/ruby-1.9.3-p362/bin/ruby

How to access attributes using Nokogiri

最后都变了- submitted on 2019-11-27 01:43:53
Question: I have a simple task of accessing the values of some attributes. This is a simple script that uses Nokogiri::XML::Builder to create a simple XML doc:

require 'nokogiri'
builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
  xml.Placement(:messageId => "392847-039820-938777", :system => "MOD", :version => "2.0") {
    xml.objects {
      xml.object(:myattribute => "99", :anotherattrib => "333")
      xml.nextobject_ '9387toot'
      xml.Entertainment "Last Man Standing"
    }
  }
end
puts builder.to_xml
puts

Error installing nokogiri: Failed to build gem native extension & libiconv is missing (OSX)

笑着哭i submitted on 2019-11-27 00:16:42
Question: I tried to clone this repo and run bundle install. The bundle process failed and threw this error:

...
Installing nokogiri 1.6.2.1 with native extensions
Building nokogiri using packaged libraries.
Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
/Users/zulhilmizainudin/.rvm/rubies/ruby-2.2.0/bin/ruby -r ./siteconf20151130-43880-pntnc6.rb extconf.rb
Building nokogiri using packaged libraries.
-----
libiconv is missing. please visit http://nokogiri.org/tutorials/installing

Mac user and getting WARNING: Nokogiri was built against LibXML version 2.7.8, but has dynamically loaded 2.7.3

风格不统一 submitted on 2019-11-26 19:35:26
I have done all kinds of research and tried many different things. I know this question has been answered many times, but none of the suggested solutions are working for me. After upgrading to Lion I am getting segmentation faults in Ruby. I'm fairly confident it's Nokogiri. So I installed libxml2 via Homebrew, ran brew link libxml2, and then reinstalled Nokogiri against that version of the library. For proof:

$ nokogiri -v
# Nokogiri (1.5.0)
---
warnings: []
nokogiri: 1.5.0
ruby:
  version: 1.9.2
  platform: x86_64-darwin11.0.0
  description: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64

Convert a Nokogiri document to a Ruby Hash

◇◆丶佛笑我妖孽 submitted on 2019-11-26 18:33:28
Is there an easy way to convert a Nokogiri XML document to a Hash? Something like Rails' Hash.from_xml.

Answer: I use this code with libxml-ruby (1.1.3). I have not used Nokogiri myself, but as I understand it, it is built on the same underlying libxml2 C library. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree), which maps XML elements to Ruby objects; it is built atop libxml.

# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0
class Hash
  class << self
    def from_libxml(xml, strict=true)
      begin

Extract a single string from HTML using Ruby/Mechanize (and Nokogiri)

百般思念 submitted on 2019-11-26 18:30:27
Question: I am extracting data from a forum. My script based on is working fine. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post, but I cannot get it to work. I used FireXPath to determine the XPath. Sample code:

require 'rubygems'
require 'mechanize'
post_agent = WWW::Mechanize.new
post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')
puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]/text()').to_s.strip
puts post_page