nokogiri | 易学教程

Parsing Large XML files w/ Ruby & Nokogiri

Question: I have a large XML file (about 10K rows) that I need to parse regularly. It is in this format: <summarysection> <totalcount>10000</totalcount> </summarysection> <items> <item> <cat>Category</cat> <name>Name 1</name> <value>Val 1</value> </item> ...... 10,000 more times </items> What I'd like to do is parse each of the individual nodes using Nokogiri to count the number of items in one category. Then I'd like to subtract that number from the totalcount value to get an output that reads "Count of

How to handle 404 not found errors in Nokogiri

Question: I am using Nokogiri to scrape web pages. A few URLs have to be guessed, and they return a 404 Not Found error when the page doesn't exist. Is there a way to capture this exception? http://yoursite/page/38475 #=> page number 38475 doesn't exist I tried the following, which didn't work: url = "http://yoursite/page/38475" doc = Nokogiri::HTML(open(url)) do begin rescue Exception => e puts "Try again later" end end

Answer 1: It doesn't work because you are not rescuing the part of the code (the open(url) call) that raises
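The point of the answer, that the `rescue` has to wrap the call that actually raises, can be sketched as below. This is only an illustration under stated assumptions: the helper name `fetch_body` and its `opener:` parameter are hypothetical (the parameter exists so the behaviour can be exercised without a network connection), and `URI.open` from `open-uri` stands in for the bare `open(url)` in the question, which no longer opens URLs as of Ruby 3.0.

```ruby
require 'open-uri'

# Hypothetical helper: returns the response body as a String, or nil when the
# server answers with an HTTP error status such as 404.
# `opener` is an assumed injection point; by default it is URI.open.
def fetch_body(url, opener: URI.method(:open))
  # The rescue clause below covers this call, which is the one that raises.
  opener.call(url).read
rescue OpenURI::HTTPError => e
  warn "Skipping #{url}: #{e.message}"
  nil
end
```

A caller could then parse only the pages that exist, for example with `body = fetch_body(url)` followed by `doc = Nokogiri::HTML(body) if body`. Rescuing `OpenURI::HTTPError` rather than `Exception` keeps unrelated failures visible.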

How to wrap Nokogiri nodeset in ONE span

Question: My goal is to wrap all paragraphs after the first one in a single span. I'm trying to figure out how to wrap a node set in one span, but .wrap() wraps each node in its own span. That is, I want: <p>First</p> <p>Second</p> <p>Third</p> To become: <p>First</p> <span> <p>Second</p> <p>Third</p> </span> Any sample code to help? Thanks!

Answer 1: I'd do it as below: require 'nokogiri' doc = Nokogiri::HTML::DocumentFragment.parse(<<-html) <p>First</p> <p>Second</p> <p>Third</p> html nodeset =

How to get XML parent attribute value

Question: I have multiple statements like: <House name="test1"> <Room id="test2" name="test3" > <test name="test4" param="test5"> <blah id="test6" name="test7"> </blah> </test> </Room> </House> When the blah name is some particular value, such as test7, I need the corresponding Room name. How do I achieve that?

Answer 1: I have never used Nokogiri, but I tried this and it seems to work: xml_doc.css('blah[name="test7"]').first.ancestors("Room").first['name'] => "test3" Just check for nils. 2.3.1 :132 > xml_doc.css('blah

FF Xpather to Nokogiri — Can I just copy and paste?

Question: I was doing this manually, then I got stuck, and I can't figure out why it's not working. I downloaded XPather and it gives me /html/body/center/table/tbody/tr[3]/td/table as the path to the item I want. I have manually confirmed that this is correct, but when I paste it into my code, all it does is return nil. Here is my code: a = parentdoc.at_xpath("//html/body/center/table/tbody/tr[3]/td/table[1]") puts a If I do something like this: a = parentdoc.at_xpath("//html/body/center") puts a

Nokogiri Ruby HTML Parser

Question: I'm running into problems scraping across multiple pages with Nokogiri. I need to be able to narrow down the results of what I am searching for based on the qualified hrefs first. So here is a script to get all of the hrefs I'm interested in obtaining. However, I'm having trouble parsing out the titles of the articles so that I can link to them. It would be great to know that I can manually inspect the elements so that I have the links I want and whenever I find a link I want I can also grab

Nokogiri (Ruby): Extract tag contents for a specific attribute inside each node

Question: I have an XML document with the following structure: <Root> <Batch name="value"> <Document id="ID1"> <Tags> <Tag id="ID11" name="name11">Contents</Tag> <Tag id="ID12" name="name12">Contents</Tag> </Tags> </Document> <Document id="ID2"> <Tags> <Tag id="ID21" name="name21">Contents</Tag> <Tag id="ID22" name="name22">Contents</Tag> </Tags> </Document> </Batch> </Root> I want to extract the contents of specific tags for each Document node, using something like this: xml.xpath('//Document/Tags').each do

Install mechanize with Ruby 2.3 on Windows 7 got error

Question: I'm trying to install Mechanize with Ruby 2.3 on Windows 7, but I get the following error. Could anyone point me in the right direction? PS C:\DevKit> ruby --version ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32] PS C:\DevKit> gem install mechanize Fetching: net-http-digest_auth-1.4.gem (100%) Successfully installed net-http-digest_auth-1.4 Fetching: net-http-persistent-2.9.4.gem (100%) Successfully installed net-http-persistent-2.9.4 Fetching: mime-types-2.99.1.gem (100%)

Using Nokogiri's CSS method to get all elements within an alt tag

Question: I am trying to use Nokogiri's CSS method to get some names from my HTML. This is an example of the HTML: <section class="container partner-customer padding-bottom--60"> <div> <div> <a id="technologies"></a> <h4 class="center-align">The Team</h4> </div> </div> <div class="consultant list-across wrap"> <div class="engineering"> <img class="" src="https://v0001.jpg" alt="Person 1"/> <p>Person 1<br>Founder, Chairman & CTO</p> </div> <div class="engineering"> <img class="" src="https://v0002.png"

Error installing “nokogiri” in a Ruby on Rails application?

Question: I've been following along with the Lynda.com Ruby on Rails course, and I did everything just as shown in the videos. I am trying to run the rails server command, which should default to WEBrick, correct? When I run the command, it fails in the nokogiri.rb file. Line 29, where the error happens, reads: require 'nokogiri/nokogiri' That is the line my command prompt chokes on when running the rails server command. Any idea what could be causing this? If so