nokogiri

Modifying text inside html nodes - nokogiri

ぃ、小莉子 submitted on 2019-12-09 12:36:07
Question: Let's say I have the following HTML:

    <ul><li>Bullet 1.</li> <li>Bullet 2.</li> <li>Bullet 3.</li> <li>Bullet 4.</li> <li>Bullet 5.</li></ul>

What I want to do is replace every period, question mark, or exclamation mark inside an HTML node with itself followed by an asterisk, then convert the result back to HTML. So the result would be:

    <ul><li>Bullet 1.*</li> <li>Bullet 2.*</li> <li>Bullet 3.*</li> <li>Bullet 4.*</li> <li>Bullet 5.*</li></ul>

I've been messing around with this a bit in

How to get Meta Keywords using Nokogiri?

ぃ、小莉子 submitted on 2019-12-09 10:40:36
Question: I'm using Nokogiri for an assignment and I'm struggling to figure this out. It's hurting my brain. Any steps, hints, or examples leading to the solution would be lovely.

Answer 1: Here is a simple example:

    require 'rubygems'
    require 'nokogiri'

    doc = Nokogiri::HTML("<html><head><meta name=\"Keywords\" content=\"one, two, three\"></head><body></body></html>")

    # Select the content attribute of every <meta name="Keywords"> element
    doc.xpath("//meta[@name='Keywords']/@content").each do |attr|
      puts attr.value
    end

Source: https://stackoverflow.com/questions/9554053/how-to-get

An error occurred while installing nokogiri (1.5.2)

北慕城南 submitted on 2019-12-09 10:24:36
Question: When I try to run a Ruby on Rails project, I get an error:

    An error occurred while installing nokogiri (1.5.2), and bundle cannot continue. Make sure that 'gem install nokogiri -v 1.5.2 succeed before building.

I'm working on Ubuntu 10.10. My co-worker works on Windows/RVM and does not have this problem.

Edit: gem_make.out

    /opt/bitnami/ruby/bin/ruby extconf.rb
    extconf.rb:10: Use RbConfig instead of obsolete and deprecated Config.
    checking for libxml/parser.h... yes
    checking for libxslt/xslt.h...

How to parse a HTML table with Nokogiri?

时光总嘲笑我的痴心妄想 submitted on 2019-12-09 09:23:30
Question: I'm trying to parse a table, but I don't know how to save the data from it. I want the data from each row to be saved like this:

    ['Raw name 1', 2,094, 0,017, 0,098, 0,113, 0,452]

The sample table is:

    html = <<EOT
    <table class="open">
      <tr>
        <th>Table name</th>
        <th>Column name 1</th>
        <th>Column name 2</th>
        <th>Column name 3</th>
        <th>Column name 4</th>
        <th>Column name 5</th>
      </tr>
      <tr>
        <th>Raw name 1</th>
        <td>2,094</td>
        <td>0,017</td>
        <td>0,098</td>
        <td>0,113</td>
        <td>0,452</td>
      </tr>
      . . .
      <tr>

Scraping pages that do not seem to have URLs

走远了吗. submitted on 2019-12-09 03:47:27
Question: I'm trying to scrape these job listings to give them more exposure on a site that belongs to a client of mine. The issue is that I need to be able to link to the specific job listing so that the job seeker can apply. This is the page I'm trying to save listing links from. Ideally, I could save an address that the job seeker can click to see the original listing and then apply. What is this website doing to not feature a URL for these pages? Is it possible to

Scraping an AngularJS application

人盡茶涼 submitted on 2019-12-09 01:54:38
Question: I'm scraping some HTML pages with Rails, using Nokogiri. I had some problems when I tried to scrape an AngularJS page, because the gem opens the HTML before it has been fully rendered. Is there some way to scrape this type of page? How can I have the page fully rendered before scraping it?

Answer 1: If you're trying to scrape AngularJS pages in a fully generic fashion, then you're likely going to need something like what @tadman mentioned in the comments (PhantomJS) -- some type of headless

Parsing Large XML with Nokogiri

老子叫甜甜 submitted on 2019-12-09 01:50:48
Question: I'm attempting to parse a 400k+ line XML file using Nokogiri. The XML file has this basic format:

    <?xml version="1.0" encoding="windows-1252"?>
    <JDBOR date="2013-09-01 04:12:31" version="1.0.20 [2012-12-14]" copyright="Orphanet (c) 2013">
      <DisorderList count="6760">
        *** Repeated Many Times ***
        <Disorder id="17601">
          <OrphaNumber>166024</OrphaNumber>
          <Name lang="en">Multiple epiphyseal dysplasia, Al-Gazali type</Name>
          <DisorderSignList count="18">
            <DisorderSign>
              <ClinicalSign id="2040">

nokogiri: why is this an invalid xpath?

余生长醉 submitted on 2019-12-08 19:46:25
Question:

    //br/preceding-sibling::normalize-space(text())

I am getting "invalid xpath expression" with Nokogiri.

Answer 1: normalize-space is a function, so you can't use it there. After an axis such as preceding-sibling::, XPath expects a node test that yields a node-set. Maybe you mean

    //br/preceding-sibling::*

or you could use normalize-space in a predicate, inside square brackets. Think of the predicate as a filter or selector on the node-set. So you can do this:

    //br/preceding-sibling::*[normalize-space()='Fred']

In English that translates to "all elements preceding <br> in the

Ruby 2.1 and Nokogiri install error?

烈酒焚心 submitted on 2019-12-08 18:03:44
Question: I know this is an issue that has been "solved" many times here, but I have tried all of the solutions and still can't get it to work. Here is my error:

    22-01-14 17:57:56> gem install nokogiri
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.
    /Users/josh/.rvm/rubies/ruby-2.1.0/bin/ruby extconf.rb
    Extracting libxml2-2.8.0.tar.gz into tmp//ports/libxml2/2.8.0... OK
    Running 'configure' for libxml2 2.8.0... ERROR,

RVM 1.9.1 & nokogiri

依然范特西╮ submitted on 2019-12-08 17:34:07
Question: I'm having trouble installing the nokogiri gem under RVM Ruby 1.9.1.

    gem install nokogiri

I'm getting:

    ... /usr/include/libxml2... no
    libxml2 is missing. try 'port install libxml2' or 'yum install libxml2-devel'
    *** extconf.rb failed ***

But I checked:

    sudo apt-get install libxml2

and I got:

    Reading state information... Done
    libxml2 is already the newest version.

Is this a root thing, perhaps? RVM runs everything in userspace.

Answer 1: You might want to confirm that the version of libxml installed by