nokogiri | 易学教程

Adjusting timeouts for Nokogiri connections

Question: Why does Nokogiri wait a few seconds (3-5) when the server is busy and I request pages one by one, but when the same requests run in a loop, Nokogiri does not wait and raises the timeout error right away? I wrap the request in a timeout block, but Nokogiri does not wait for that long at all. Is there a recommended way to handle this?

    # this is a method from the eng class
    def get_page(url,page_type)
      begin
        timeout(10) do
          # Get a Nokogiri::HTML::Document for the page we're interested in...
          @@doc =
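Nokogiri only parses markup; any waiting happens in whichever HTTP client fetches the page. The excerpt is cut off before the fetch, so the following is a minimal sketch that assumes Net::HTTP does the fetching. The helper names `with_retries` and `fetch_body`, and the timeout and retry values, are hypothetical and not taken from the question.

```ruby
require 'net/http'
require 'uri'
require 'timeout'

# Hypothetical helper: re-run the block when it times out.
# Net::OpenTimeout and Net::ReadTimeout both inherit from Timeout::Error.
def with_retries(attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield
  rescue Timeout::Error
    raise if tries >= attempts # give up after the last attempt
    sleep(wait)
    retry
  end
end

# Hypothetical helper: fetch a page body with explicit connect and read timeouts.
def fetch_body(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: open_timeout,
                  read_timeout: read_timeout) do |http|
    http.get(uri.request_uri).body
  end
end
```

Under these assumptions, a call such as `with_retries(attempts: 3, wait: 2) { fetch_body(url) }` would return the body, which could then be handed to `Nokogiri::HTML`.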

Strip all tbody tags without destroying their children

Question: This Ruby code using Nokogiri

    doc.xpath("//tbody").remove

removes the children of each <tbody> as well as the <tbody> elements themselves. I only want to remove the <tbody> tags from the document, leaving their children in place. How can I achieve this?

Answer 1:

    require 'rubygems'
    require 'nokogiri'

    html = Nokogiri::HTML(DATA)
    html.xpath('//table/tbody').each do |tbody|
      # move each child up to the table, then drop the empty tbody
      tbody.children.each do |child|
        child.parent = tbody.parent
      end
      tbody.remove
    end
    puts html.xpath('//table').to_s
    __END__
    <table border="0

How to extract text from <script> tag by using nokogiri and mechanize?

Question: This is part of the source code of a booking website:

    <script>
    booking.ensureNamespaceExists('env');
    booking.env.b_map_center_latitude = 53.36480155016638;
    booking.env.b_map_center_longitude = -2.2752803564071655;
    booking.env.b_hotel_id = '35523';
    booking.env.b_query_params_no_ext = '?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaFCIAQGYAS64AQTIAQTYAQHoAQH4AQs;sid=e1c9e4c7a000518d8a3725b9bb6e5306;dcid=1';
    </script>

I want to extract booking.env.b_hotel_id, so that I would get the value

How do I scrape HTML between two HTML comments using Nokogiri?

Question: I have some HTML pages where the content to be extracted is marked with HTML comments, like below.

    <html> .....  <div>some text</div> <div><p>Some more elements</p></div>  ... </html>

I am using Nokogiri and trying to extract the HTML between the two comments. I want to extract the full elements between these two HTML comments:

    <div>some text</div>
    <div><p>Some more elements</p></div>

I can get the text

Use Nokogiri to replace <img src /> tags with <%= image_tag %>?

Question: How can I use Nokogiri to replace all img tags with image_tag calls? The goal is to use Rails' ability to plug in the correct asset server automatically.

    require 'nokogiri'

    class ToImageTag
      def self.convert
        Dir.glob("app/views/**/*").each do |filename|
          doc = Nokogiri::HTML(File.open(filename))
          doc.xpath("//img").each do |img_tags|
            # grab the src and all the attributes and move them to ERB
          end
          # rewrite the file
        end
      rescue => err
        puts "Exception: #{err}"
      end
    end

Answer 1: Somewhat inspired by maerics'

How do I merge two XML files into one using Nokogiri?

Question: I have two XML files and want to merge them, but the tags that are already there should not be changed.

XML 1:

    <?xml version="1.0"?>
    <formX xmlns="sdu:x">
      <identify>
        <mat>8</mat>
      </identify>
    </formX>

XML 2:

    <?xml version="1.0"?>
    <formX xmlns="sdu:x">
      <identify>
        <mat>9999</mat>
        <name>John Smith</name>
      </identify>
    </formX>

I want the result to be like this:

    <?xml version="1.0"?>
    <formX xmlns="sdu:x">
      <identify>
        <mat>8</mat>
        <name>John Smith</name>
      </identify>
    </formX>

The previous tags should

Nokogiri Segmentation fault?

Question: I am trying to run a simple Ruby script from a Railscast, and when I run it I get the following segmentation fault. I have tried uninstalling and reinstalling Nokogiri and LibXML, with no change. Is there any way to fix the Ruby 1.8.7 version? Maybe that is the problem?

    $ ruby -v
    ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]

    /Users/da/.rvm/gems/ruby-1.9.2-p180/gems/nokogiri-1.4.4/lib/nokogiri/nokogiri.bundle: [BUG] Segmentation fault
    ruby 1.8.7 (2009-06-12

How do I use xpath on nodes with a prefix but without a namespace?

Question: I have an XML file that I need to parse. I have no control over the format of the file and cannot change it. The file uses a prefix (call it a), but it doesn't define a namespace for that prefix anywhere. I can't seem to use XPath to query for nodes with the a prefix. Here are the contents of the XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <a:root>
      <a:thing>stuff0</a:thing>
      <a:thing>stuff1</a:thing>
      <a:thing>stuff2</a:thing>
      <a:thing>stuff3</a:thing>
      <a:thing>stuff4</a:thing

Is it possible to parse a stylesheet with Nokogiri?

Question: I've spent my requisite two hours Googling this and cannot find any good answers, so let's see if humans can beat Google's computers. I want to parse a stylesheet in Ruby so that I can apply those styles to elements in my document (to make the styles inline). So, I want to take something like

    <style> .mystyle { color:white; } </style>

and be able to extract it into a Nokogiri object of some sort. The Nokogiri class "CSS::Parser" (http://nokogiri.rubyforge.org/nokogiri/Nokogiri/CSS/Parser

libxml2 missing for nokogiri gem on Windows 8 x64 with Ruby 1.9.3

Question: While searching for similar issues, I found that Nokogiri does not yet support x64 with Ruby 2.0. However, although I'm on a Windows x64 machine, my Ruby version is ruby 1.9.3p392 (2013-02-22) [i386-mingw32] from railsinstaller.org (with Rails 3.2.13). This also means DevKit is already installed. Running gem install nokogiri --pre gives this error:

    Temporarily enhancing PATH to include DevKit...
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: