nokogiri | 易学教程

Modifying text inside html nodes - nokogiri

Question: Let's say I have the following HTML: <ul><li>Bullet 1.</li> <li>Bullet 2.</li> <li>Bullet 3.</li> <li>Bullet 4.</li> <li>Bullet 5.</li></ul> What I want to do is replace every period, question mark, or exclamation mark that is inside an HTML node with itself followed by an asterisk, then convert the result back to HTML. So the result would be: <ul><li>Bullet 1.*</li> <li>Bullet 2.*</li> <li>Bullet 3.*</li> <li>Bullet 4.*</li> <li>Bullet 5.*</li></ul> I've been messing around with this a bit in

How to get Meta Keywords using Nokogiri?

Question: I'm using Nokogiri for an assignment and I'm struggling to figure this out. It's hurting my brain. Any steps, hints, or examples leading to the solution would be lovely.

Answer 1: Here is a simple example:

    require 'rubygems'
    require 'nokogiri'

    doc = Nokogiri::HTML("<html><head><meta name=\"Keywords\" content=\"one, two, three\"></head><body></body></html>")

    # Print the content attribute of every <meta name="Keywords"> tag.
    doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
      puts attr.value
    end

Source: https://stackoverflow.com/questions/9554053/how-to-get

An error occurred while installing nokogiri (1.5.2)

Question: When I try to run a Ruby on Rails project, I get an error:

    An error occurred while installing nokogiri (1.5.2), and bundle cannot continue. Make sure that 'gem install nokogiri -v 1.5.2 succeed before building.

I'm working on Ubuntu 10.10. My co-worker is on Windows/RVM and does not have this problem.

Edit: gem_make.out

    /opt/bitnami/ruby/bin/ruby extconf.rb
    extconf.rb:10: Use RbConfig instead of obsolete and deprecated Config.
    checking for libxml/parser.h... yes
    checking for libxslt/xslt.h...

How to parse a HTML table with Nokogiri?

Question: I'm trying to parse a table but I don't know how to save the data from it. I want the data from each row saved like this:

    ['Raw name 1', 2,094, 0,017, 0,098, 0,113, 0,452]

The sample table is:

    html = <<EOT
    <table class="open">
      <tr>
        <th>Table name</th>
        <th>Column name 1</th>
        <th>Column name 2</th>
        <th>Column name 3</th>
        <th>Column name 4</th>
        <th>Column name 5</th>
      </tr>
      <tr>
        <th>Raw name 1</th>
        <td>2,094</td>
        <td>0,017</td>
        <td>0,098</td>
        <td>0,113</td>
        <td>0,452</td>
      </tr>
      . . .
      <tr>

Scraping pages that do not seem to have URLs

Question: I'm trying to scrape these listings to give these job listings more exposure on a site that belongs to a client of mine. The issue is that I need to be able to link to the specific job listing so that the job seeker can apply. This is the page I'm trying to save listing links from. It would be ideal if I could save an address for the job seeker to click on to see the original listing and then apply. What is this website doing to not feature a URL for these pages? Is it possible to

Scraping an AngularJS application

Question: I'm scraping some HTML pages with Rails, using Nokogiri. I had some problems when I tried to scrape an AngularJS page because the gem is opening the HTML before it has been fully rendered. Is there some way to scrape this type of page? How can I have the page fully rendered before scraping it?

Answer 1: If you're trying to scrape AngularJS pages in a fully generic fashion, then you're likely going to need something like what @tadman mentioned in the comments (PhantomJS) -- some type of headless

Parsing Large XML with Nokogiri

Question: I'm attempting to parse a 400k+ line XML file using Nokogiri. The XML file has this basic format:

    <?xml version="1.0" encoding="windows-1252"?>
    <JDBOR date="2013-09-01 04:12:31" version="1.0.20 [2012-12-14]" copyright="Orphanet (c) 2013">
      <DisorderList count="6760">
        *** Repeated Many Times ***
        <Disorder id="17601">
          <OrphaNumber>166024</OrphaNumber>
          <Name lang="en">Multiple epiphyseal dysplasia, Al-Gazali type</Name>
          <DisorderSignList count="18">
            <DisorderSign>
              <ClinicalSign id="2040">

nokogiri: why is this an invalid xpath?

Question: //br/preceding-sibling::normalize-space(text()) I am getting "invalid XPath expression" with Nokogiri.

Answer 1: normalize-space is a function. You can't use it there; an axis must be followed by a node test that selects a node-set. Maybe you mean //br/preceding-sibling::* or you could use normalize-space in a predicate, inside square brackets. Think of the predicate as a filter or selector on the node-set. So you can do this: //br/preceding-sibling::*[normalize-space()='Fred'] In English that translates to "all elements preceding <br> in the

Ruby 2.1 and Nokogiri install error?

Question: I know this is an issue that has been "solved" many times here, but I have tried all of the solutions and still can't get it to work. Here is my error:

    22-01-14 17:57:56> gem install nokogiri
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.
    /Users/josh/.rvm/rubies/ruby-2.1.0/bin/ruby extconf.rb
    Extracting libxml2-2.8.0.tar.gz into tmp//ports/libxml2/2.8.0... OK
    Running 'configure' for libxml2 2.8.0... ERROR,

RVM 1.9.1 & nokogiri

Question: I'm having trouble installing the nokogiri gem under RVM Ruby 1.9.1. When I run gem install nokogiri, I get:

    ... /usr/include/libxml2... no
    libxml2 is missing. try 'port install libxml2' or 'yum install libxml2-devel'
    *** extconf.rb failed ***

But I checked with sudo apt-get install libxml2 and got:

    Reading state information... Done
    libxml2 is already the newest version.

Is this a root thing, perhaps? RVM runs everything in userspace.

Answer 1: You might want to confirm that the version of libxml installed by