nokogiri

Adjusting timeouts for Nokogiri connections

醉酒当歌 submitted on 2019-12-22 18:37:00
Question: Why does Nokogiri wait a couple of seconds (3-5) when the server is busy and I request pages one by one, but when these requests are in a loop, Nokogiri does not wait and throws the timeout message? I'm wrapping the request in a timeout block, but Nokogiri does not wait for that time at all. Is there a suggested procedure for this?

    # this is a method from the eng class
    def get_page(url,page_type)
      begin
        timeout(10) do
          # Get a Nokogiri::HTML::Document for the page we’re interested in...
          @@doc =
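One possible way to make a loop of requests more tolerant of a busy server is to retry after a timeout instead of failing on the first one. The following is only a sketch: the method in the question is truncated, so the real fetch is replaced by a stand-in block, and `with_retries` together with its `limit`, `attempts` and `pause` parameters are hypothetical names. It uses `Timeout.timeout` from the standard library.

```ruby
require 'timeout'

# Hypothetical helper: run the block under a time limit and retry
# with a growing pause whenever the limit is exceeded.
def with_retries(limit:, attempts: 3, pause: 0.5)
  tries = 0
  begin
    tries += 1
    Timeout.timeout(limit) { yield tries }
  rescue Timeout::Error
    raise if tries >= attempts   # give up after the last attempt
    sleep(pause * tries)         # back off a little longer each time
    retry
  end
end

# Stand-in for the real page fetch; the first attempt is made slow
# on purpose so that the retry path is exercised.
result = with_retries(limit: 0.2, pause: 0.05) do |attempt|
  sleep(0.5) if attempt == 1
  "fetched on attempt #{attempt}"
end

puts result
```

In real code the block body would hold the HTTP request and the Nokogiri parse, and the limits would be chosen to suit the server.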

Strip all tbody tags without destroying their children

假装没事ソ submitted on 2019-12-22 18:28:20
Question: This Ruby code using Nokogiri

    doc.xpath("//tbody").remove

removes the children of each <tbody> as well as the <tbody> elements themselves. I only want to remove all <tbody> tags from the document, leaving their children in place. How can I achieve this?

Answer 1:

    require 'rubygems'
    require 'nokogiri'
    html = Nokogiri::HTML(DATA)
    html.xpath('//table/tbody').each do |tbody|
      tbody.children.each do |child|
        child.parent = tbody.parent
      end
      tbody.remove
    end
    puts html.xpath('//table').to_s
    __END__
    <table border="0

How to extract text from <script> tag by using nokogiri and mechanize?

a 夏天 submitted on 2019-12-22 18:27:27
Question: This is part of the source code of a booking website:

    <script>
    booking.ensureNamespaceExists('env');
    booking.env.b_map_center_latitude = 53.36480155016638;
    booking.env.b_map_center_longitude = -2.2752803564071655;
    booking.env.b_hotel_id = '35523';
    booking.env.b_query_params_no_ext = '?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaFCIAQGYAS64AQTIAQTYAQHoAQH4AQs;sid=e1c9e4c7a000518d8a3725b9bb6e5306;dcid=1';
    </script>

And I want to extract booking.env.b_hotel_id, so that I would get the value
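Since the content of a <script> tag is plain text as far as an HTML parser is concerned, one common approach is to take the script body as a string and pull the value out with a regular expression. This is a sketch under that assumption; the script text is hard-coded here so the example stays self-contained, whereas in practice it would come from something like `doc.css('script').map(&:text)`.

```ruby
# Script body as it might come back from the parser
# (hard-coded here; shortened from the snippet in the question).
script_text = <<~JS
  booking.ensureNamespaceExists('env');
  booking.env.b_map_center_latitude = 53.36480155016638;
  booking.env.b_map_center_longitude = -2.2752803564071655;
  booking.env.b_hotel_id = '35523';
JS

# Capture whatever sits between the single quotes after b_hotel_id.
hotel_id = script_text[/booking\.env\.b_hotel_id\s*=\s*'([^']*)'/, 1]

puts hotel_id
```

`String#[]` with a regexp and a group index returns nil when the pattern does not match, so a missing assignment is easy to detect.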

How do I scrape HTML between two HTML comments using Nokogiri?

随声附和 submitted on 2019-12-22 18:08:24
Question: I have some HTML pages where the content to be extracted is marked with HTML comments, as below.

    <html>
    .....
    <!-- begin content -->
    <div>some text</div>
    <div><p>Some more elements</p></div>
    <!-- end content -->
    ...
    </html>

I am using Nokogiri and trying to extract the HTML between the <!-- begin content --> and <!-- end content --> comments. I want to extract the full elements between these two HTML comments:

    <div>some text</div>
    <div><p>Some more elements</p></div>

I can get the text

Use Nokogiri to replace <img src /> tags with <%= image_tag %>?

不羁岁月 submitted on 2019-12-22 11:28:02
Question: How can I use Nokogiri to replace all img tags with image_tag calls? This is to make use of Rails' ability to plug in the correct asset server automatically.

    require 'nokogiri'
    class ToImageTag
      def self.convert
        Dir.glob("app/views/**/*").each do |filename|
          doc = Nokogiri::HTML(File.open(filename))
          doc.xpath("//img").each do |img_tags|
            # grab the src and all the attributes and move them to ERB
          end
          # rewrite the file
        end
      rescue => err
        puts "Exception: #{err}"
      end
    end

Answer 1: Somewhat inspired by maerics'

How do I merge two XML files into one using Nokogiri?

北城余情 submitted on 2019-12-22 10:00:12
Question: I have two XML files and want to merge them, but the tags that are already there should not be changed:

XML 1:

    <?xml version="1.0"?>
    <formX xmlns="sdu:x">
      <identify>
        <mat>8</mat>
      </identify>
    </formX>

XML 2:

    <?xml version="1.0"?>
    <formX xmlns="sdu:x">
      <identify>
        <mat>9999</mat>
        <name>John Smith</name>
      </identify>
    </formX>

I want the result to be like this:

    <?xml version="1.0"?>
    <formX xmlns="sdu:x">
      <identify>
        <mat>8</mat>
        <name>John Smith</name>
      </identify>
    </formX>

The previous tags should

Nokogiri Segmentation fault?

梦想与她 submitted on 2019-12-22 07:25:58
Question: I am trying to run a simple Ruby script from a Railscast, and when I run my program I get the following segmentation fault error. I have tried uninstalling and reinstalling Nokogiri and LibXML, and still nothing. Is there any way to fix the Ruby 1.8.7 version? Maybe that is the problem?

    $ ruby -v
    ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]

    /Users/da/.rvm/gems/ruby-1.9.2-p180/gems/nokogiri-1.4.4/lib/nokogiri/nokogiri.bundle: [BUG] Segmentation fault
    ruby 1.8.7 (2009-06-12

How do I use xpath on nodes with a prefix but without a namespace?

断了今生、忘了曾经 submitted on 2019-12-22 05:54:15
Question: I have an XML file that I need to parse. I have no control over the format of the file and cannot change it. The file makes use of a prefix (call it a), but it doesn't define a namespace for that prefix anywhere. I can't seem to use XPath to query for nodes with the a prefix. Here are the contents of the XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <a:root>
      <a:thing>stuff0</a:thing>
      <a:thing>stuff1</a:thing>
      <a:thing>stuff2</a:thing>
      <a:thing>stuff3</a:thing>
      <a:thing>stuff4</a:thing

Is it possible to parse a stylesheet with Nokogiri?

本小妞迷上赌 submitted on 2019-12-22 04:18:08
Question: I've spent my requisite two hours Googling this and cannot find any good answers, so let's see if humans can beat Google's computers. I want to parse a stylesheet in Ruby so that I can apply those styles to elements in my document (to make the styles inlined). So, I want to take something like

    <style>
    .mystyle { color:white; }
    </style>

and be able to extract it into a Nokogiri object of some sort. The Nokogiri class "CSS::Parser" (http://nokogiri.rubyforge.org/nokogiri/Nokogiri/CSS/Parser

libxml2 missing for nokogiri gem on Windows 8 x64 with Ruby 1.9.3

老子叫甜甜 submitted on 2019-12-22 03:22:47
Question: What I found when searching for similar issues was that Nokogiri does not yet have x64 support with Ruby 2.0. However, although I'm on a Windows x64 machine, my Ruby version is ruby 1.9.3p392 (2013-02-22) [i386-mingw32] from railsinstaller.org (with Rails 3.2.13). This also means DevKit is already installed. gem install nokogiri --pre gives this error:

    Temporarily enhancing PATH to include DevKit...
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: