nokogiri | 易学教程 (Easy Learning Tutorial)

Getting all links of a webpage using Ruby

I'm trying to retrieve every external link of a webpage using Ruby. I'm using String#scan with this regex:

/href="https?:[^"]*|href='https?:[^']*/i

Then I use gsub to remove the href part:

str.gsub(/href=['"]/, '')

This works fine, but I'm not sure it's efficient in terms of performance. Is this OK to use, or should I work with a proper parser (Nokogiri, for example)? Which way is better? Thanks!

Why don't you use groups in your pattern? e.g. /http[s]?:\/\/(.+)/i, so the first group will already be the link you are looking for.

Using regular expressions is fine for a quick and dirty
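
Following the suggestion to use groups: when the pattern has capture groups, String#scan returns the captured text instead of the whole match, so the gsub step is no longer needed. This is a minimal sketch; the sample HTML and the external_links helper are hypothetical, and the backreference \1 is used so that the closing quote must match the opening one.

```ruby
# Hypothetical sample HTML, used only for illustration.
html = <<~HTML
  <a href="https://example.com/a">A</a>
  <a href='http://example.org/b'>B</a>
  <a href="/relative">C</a>
HTML

# Group 1 captures the opening quote, group 2 the URL; \1 requires the same closing quote.
LINK_RE = /href\s*=\s*(["'])(https?:.*?)\1/i

# Hypothetical helper: scan yields [quote, url] pairs, so keep only the URL.
def external_links(html)
  html.scan(LINK_RE).map(&:last)
end

puts external_links(html)
```

If you move to a parser, as the question considers, a comparable Nokogiri version is Nokogiri::HTML(html).css('a[href]').map { |a| a['href'] }, filtered with the same https? test. A parser also copes with markup that this regex misses, such as unquoted href values.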

unable to install nokogiri in ubuntu 12.04

I am trying to set up a Rails environment on my new Ubuntu machine, but I am having trouble installing the nokogiri gem. I have installed the libxslt and libxml2 libraries with the rvm pkg command and also with apt-get. I think it is showing a "libxslt is missing" error.

Building native extensions. This could take a while...
ERROR: Error installing nokogiri:
ERROR: Failed to build gem native extension.

/home/hacker5/.rvm/rubies/ruby-1.9.3-p194/bin/ruby extconf.rb --with-xslt-include=/usr/include/libxslt
checking for libxml/parser.h... yes
checking for libxslt/xslt.h... yes
checking for libexslt/exslt.h... yes

How to Get the Page Source with Mechanize/Nokogiri

I'm logged into a webpage/servlet using Mechanize. I have a page object:

jobShortListPg = agent.get(addressOfPage)

When I use the following:

puts jobShortListPg

I get the "mechanized" version of the page, which I don't want, e.g.

#<Mechanize::Page::Link "Home" "blahICScriptProgramName=WEBLIB_MENU.ISCRIPT3.FieldFormula.IScript_DrillDown&target=main0&Level=0&RL=&navc=3171">

How do I get the HTML source of the page instead?

Use .body:

puts jobShortListPg.body

Use the content method of the page object:

jobShortListPg.content

Source: https://stackoverflow.com/questions/6487101/how-to-get-the-page-source

How to add a new node to XML

I have a simple XML file, items.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<items>
  <item>
    <name>mouse</name>
    <manufacturer>Logicteh</manufacturer>
  </item>
  <item>
    <name>keyboard</name>
    <manufacturer>Logitech - Inc.</manufacturer>
  </item>
  <item>
    <name>webcam</name>
    <manufacturer>Logistech</manufacturer>
  </item>
</items>

I am trying to insert a new node with the following code:

require 'rubygems'
require 'nokogiri'

f = File.open('items.xml')
@items = Nokogiri::XML(f)
f.close

price = Nokogiri::XML::Node.new "price", @items
price.content = "10"

@items.xpath('//items/item/manufacturer').each do

gem install nokogiri -v '1.6.8.1' fails

Question: I'm building a new Rails app and getting a problem with nokogiri. The output said to try gem install nokogiri -v '1.6.8.1', which fails with the output below. I tried deleting Gemfile.lock and using the Gemfile from another app that has no problem, but bundle install still fails. The original failure is in bundle install, which continues to work in other apps. From the console:

gem install nokogiri -v '1.6.8.1'
Building native extensions. This could take a while...
ERROR: Error installing nokogiri:
ERROR: Failed to build

What are some examples of using Nokogiri?

I am trying to understand Nokogiri. Does anyone have a link to a basic example of a Nokogiri parse/scrape that shows the resulting tree? I think it would really help my understanding.

Answer by the Tin Man:

Using IRB and Ruby 1.9.2:

Load Nokogiri:

1.9.2-p290 :001 > require 'nokogiri'
true

Parse a document:

1.9.2-p290 :002 > doc = Nokogiri::HTML('<html><body><p>foobar</p></body></html>')
#<Nokogiri::HTML::Document:0x1012821a0 @node_cache = [], attr_accessor :errors = [], attr_reader :decorators = nil

Nokogiri likes well-formed docs. Note that it added the DOCTYPE because I parsed as a document. It's possible to

I can't remove whitespaces from a string parsed by Nokogiri

I can't remove whitespace from a string. My HTML is:

<p class='your-price'>
  Cena pro Vás: <strong>139 <small>Kč</small></strong>
</p>

My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  => "139 "
val.strip               => "139 "
val.gsub(" ", "")       => "139 "

gsub, strip, etc. don't work. Why, and how do I fix this?

val.class     => String
val.dump      => "\"139\\u{a0}\"" !
val.encoding  => #<Encoding:UTF-8>
__ENCODING__  =>
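
The val.dump output gives the cause away: the trailing character is \u{a0}, a NO-BREAK SPACE (what &nbsp; decodes to), not an ordinary space. String#strip only removes ASCII whitespace, and gsub(" ", "") targets U+0020, so neither matches it. This is a minimal sketch that simulates the scraped value instead of fetching the page. It assumes the string is UTF-8, as val.encoding shows.

```ruby
# Simulated scraped text: "139" followed by U+00A0 NO-BREAK SPACE.
val = "139\u00A0"

# String#strip only removes ASCII whitespace, so the no-break space survives.
stripped = val.strip

# The POSIX class [[:space:]] is Unicode-aware for UTF-8 strings and matches U+00A0.
cleaned = val.gsub(/[[:space:]]/, '')

# Alternative: turn no-break spaces into ordinary spaces, then strip as usual.
normalized = val.tr("\u00A0", ' ').strip

p stripped, cleaned, normalized
```

The gsub version removes every whitespace character, including ones inside the string. The tr + strip version only trims the ends, which is safer if the price could contain an internal separator.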

Parsing simple XML with Nokogiri

Question: I have the following XML:

<links>
  <item>
    <title>Title 1</title>
    <url>http://www.example.com/url-1</url>
  </item>
  <item>
    <title>Title 2</title>
    <url>http://www.example.com/url-2</url>
  </item>
  <item>
    <title>Title 3</title>
    <url>http://www.example.com/url-3</url>
  </item>
</links>

And I would like to convert it to an HTML list:

<ul>
  <li><a href="http://www.example.com/url-1">Title 1</a></li>
  <li><a href="http://www.example.com/url-2">Title 2</a></li>
  <li><a href="http://www.example.com/url-3"

set tag attribute and add plain text content to the tag using nokogiri builder (ruby)

I am trying to build XML using Nokogiri with some tags that have both attributes and plain text inside the tag. So I am trying to get to this:

<?xml version="1.0"?>
<Transaction requestName="OrderRequest">
  <Option b="hive">hello</Option>
</Transaction>

Using the builder, I have this:

builder = Nokogiri::XML::Builder.new { |xml|
  xml.Transaction("requestName" => "OrderRequest") do
    xml.Option("b" => "hive").text("hello")
  end
}

which renders to:

<Transaction requestName="OrderRequest">
  <Option b="hive" class="text">hello</Option>
</Transaction>

So it produces <Option b="hive" class="text">hello</Option

Getting attribute's value in Nokogiri to extract link URLs

Question: I have a document that looks like this:

<div id="block">
  <a href="http://google.com">link</a>
</div>

I can't get Nokogiri to give me the value of the href attribute. I'd like to store the address in a Ruby variable as a string.

Answer 1:

html = <<HTML
<div id="block">
  <a href="http://google.com">link</a>
</div>
HTML

doc = Nokogiri::HTML(html)
doc.xpath('//div/a/@href')
#=> [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]

Or if you want to be more specific about the div:

>> doc