nokogiri

Getting all links of a webpage using Ruby

廉价感情. posted on 2019-12-03 11:36:44
I'm trying to retrieve every external link of a webpage using Ruby. I'm using String.scan with this regex:
    /href="https?:[^"]*|href='https?:[^']*/i
Then I can use gsub to remove the href part:
    str.gsub(/href=['"]/, '')
This works fine, but I'm not sure whether it's efficient in terms of performance. Is this OK to use, or should I work with a more specific parser (Nokogiri, for example)? Which way is better? Thanks!
Why don't you use groups in your pattern? For example, /http[s]?:\/\/(.+)/i, so the first group will already be the link you searched for.
Using regular expressions is fine for a quick and dirty
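
The capture-group suggestion above can be sketched as follows. This is only an illustration in plain Ruby: the pattern, the constant HREF_PATTERN and the helper name external_links are made up for this example and do not come from the original posts. It avoids the second gsub pass by capturing the URL directly.

```ruby
# Hypothetical pattern (an assumption for this sketch): one capture group per
# quote style, so the URL comes back without the leading href= part.
HREF_PATTERN = /href\s*=\s*(?:"(https?:[^"]*)"|'(https?:[^']*)')/i

# Hypothetical helper: returns every absolute http/https URL found in href attributes.
def external_links(html)
  # With groups, scan yields one [double_quoted, single_quoted] pair per match;
  # only one slot is filled, so compact.first picks the captured URL.
  html.scan(HREF_PATTERN).map { |pair| pair.compact.first }
end

sample = %q(<a href="https://example.com/a">A</a> <a href='http://example.org/b'>B</a> <a href="/local">C</a>)
puts external_links(sample)
```

A regex like this is still a shortcut: it does not handle unquoted attributes, so a real parser such as Nokogiri remains the safer choice for arbitrary HTML.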

Unable to install Nokogiri on Ubuntu 12.04

假装没事ソ posted on 2019-12-03 10:34:30
I am trying to set up a Rails environment on my new Ubuntu machine, but I am having trouble installing the nokogiri gem. I have installed the libxslt and libxml2 libraries through the rvm pkg command as well as with apt-get. I think the error is telling me that libxslt is missing.
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.
    /home/hacker5/.rvm/rubies/ruby-1.9.3-p194/bin/ruby extconf.rb --with-xslt-include=/usr/include/libxslt
    checking for libxml/parser.h... yes
    checking for libxslt/xslt.h... yes
    checking for libexslt/exslt.h... yes

How to Get the Page Source with Mechanize/Nokogiri

爱⌒轻易说出口 posted on 2019-12-03 09:39:48
I'm logged into a webpage/servlet using Mechanize. I have a page object:
    jobShortListPg = agent.get(addressOfPage)
When I use the following:
    puts jobShortListPg
I get the "mechanized" version of the page, which I don't want, e.g.:
    #<Mechanize::Page::Link "Home" "blahICScriptProgramName=WEBLIB_MENU.ISCRIPT3.FieldFormula.IScript_DrillDown&target=main0&Level=0&RL=&navc=3171">
How do I get the HTML source of the page instead?
Use .body:
    puts jobShortListPg.body
Use the content method of the page object:
    jobShortListPg.content
Source: https://stackoverflow.com/questions/6487101/how-to-get-the-page-source

How to add a new node to XML

核能气质少年 posted on 2019-12-03 08:11:20
I have a simple XML file, items.xml:
    <?xml version="1.0" encoding="UTF-8" ?>
    <items>
      <item>
        <name>mouse</name>
        <manufacturer>Logicteh</manufacturer>
      </item>
      <item>
        <name>keyboard</name>
        <manufacturer>Logitech - Inc.</manufacturer>
      </item>
      <item>
        <name>webcam</name>
        <manufacturer>Logistech</manufacturer>
      </item>
    </items>
I am trying to insert a new node with the following code:
    require 'rubygems'
    require 'nokogiri'
    f = File.open('items.xml')
    @items = Nokogiri::XML(f)
    f.close
    price = Nokogiri::XML::Node.new "price", @items
    price.content = "10"
    @items.xpath('//items/item/manufacturer').each do

gem install nokogiri -v '1.6.8.1' fails

时光毁灭记忆、已成空白 posted on 2019-12-03 07:46:13
Question: I'm building a new Rails app and running into a problem with nokogiri. I was told to try gem install nokogiri -v '1.6.8.1', which fails with the output below. I tried deleting Gemfile.lock and using the Gemfile from another app that has no problem; bundle install still fails. The original failure is in bundle install, which continues to work in other apps.
From the console:
    gem install nokogiri -v '1.6.8.1'
    Building native extensions. This could take a while...
    ERROR: Error installing nokogiri:
    ERROR: Failed to build

What are some examples of using Nokogiri?

佐手、 posted on 2019-12-03 07:22:43
I am trying to understand Nokogiri. Does anyone have a link to a basic example of a Nokogiri parse/scrape showing the resultant tree? I think it would really help my understanding.
the Tin Man:
Using IRB and Ruby 1.9.2:
Load Nokogiri:
    1.9.2-p290 :001 > require 'nokogiri'
    true
Parse a document:
    1.9.2-p290 :002 > doc = Nokogiri::HTML('<html><body><p>foobar</p></body></html>')
    #<Nokogiri::HTML::Document:0x1012821a0 @node_cache = [], attr_accessor :errors = [], attr_reader :decorators = nil
Nokogiri likes well-formed docs. Note that it added the DOCTYPE because I parsed as a document. It's possible to

I can't remove whitespaces from a string parsed by Nokogiri

若如初见. posted on 2019-12-03 07:22:39
I can't remove whitespace from a string. My HTML is:
    <p class='your-price'> Cena pro Vás: <strong>139 <small>Kč</small></strong> </p>
My code is:
    #encoding: utf-8
    require 'rubygems'
    require 'mechanize'
    agent = Mechanize.new
    site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
    price = site.search("//p[@class='your-price']/strong/text()")
    val = price.first.text => "139 "
    val.strip => "139 "
    val.gsub(" ", "") => "139 "
gsub, strip, etc. don't work. Why, and how do I fix this?
    val.class => String
    val.dump => "\"139\\u{a0}\"" !
    val.encoding => #<Encoding:UTF-8>
    __ENCODING__ =>
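
The val.dump output points at \u{a0}, the no-break space, which strip and a gsub on an ordinary space do not touch. A minimal sketch of one way around it, in plain Ruby with no Nokogiri or Mechanize needed; the constant NBSP and the helper name strip_unicode_space are made up for this example.

```ruby
# U+00A0 (no-break space), the character val.dump shows as \u{a0}.
NBSP = "\u00A0"

# Hypothetical helper: trims Unicode whitespace from both ends of a string.
def strip_unicode_space(text)
  # [[:space:]] matches Unicode whitespace, including U+00A0, in a UTF-8 string;
  # \A and \z anchor the two alternatives to the start and end of the string.
  text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

val = "139#{NBSP}"
p val.strip                 # the no-break space is still there
p strip_unicode_space(val)  # trimmed
p val.delete(NBSP)          # alternative: remove that one character everywhere
```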

Parsing simple XML with Nokogiri

霸气de小男生 posted on 2019-12-03 07:06:09
Question: I have the following XML:
    <links>
      <item>
        <title>Title 1</title>
        <url>http://www.example.com/url-1</url>
      </item>
      <item>
        <title>Title 2</title>
        <url>http://www.example.com/url-2</url>
      </item>
      <item>
        <title>Title 3</title>
        <url>http://www.example.com/url-3</url>
      </item>
    </links>
And I would like to convert it to an HTML list:
    <ul>
      <li><a href="http://www.example.com/url-1">Title 1</a></li>
      <li><a href="http://www.example.com/url-2">Title 2</a></li>
      <li><a href="http://www.example.com/url-3"

Set tag attribute and add plain text content to the tag using Nokogiri builder (Ruby)

别说谁变了你拦得住时间么 posted on 2019-12-03 05:46:43
I am trying to build XML using Nokogiri with some tags that have both attributes and plain text inside the tag. So I am trying to get to this:
    <?xml version="1.0"?>
    <Transaction requestName="OrderRequest">
      <Option b="hive">hello</Option>
    </Transaction>
Using the builder I have this:
    builder = Nokogiri::XML::Builder.new { |xml|
      xml.Transaction("requestName" => "OrderRequest") do
        xml.Option("b" => "hive").text("hello")
      end
    }
which renders to:
    <Transaction requestName="OrderRequest">
      <Option b="hive" class="text">hello</Option>
    </Transaction>
So it produces <Option b="hive" class="text">hello</Option

Getting attribute's value in Nokogiri to extract link URLs

↘锁芯ラ posted on 2019-12-03 05:36:18
Question: I have a document which looks like this:
    <div id="block">
      <a href="http://google.com">link</a>
    </div>
I can't get Nokogiri to give me the value of the href attribute. I'd like to store the address in a Ruby variable as a string.
Answer 1:
    html = <<HTML
    <div id="block">
      <a href="http://google.com">link</a>
    </div>
    HTML
    doc = Nokogiri::HTML(html)
    doc.xpath('//div/a/@href')
    #=> [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]
Or if you wanna be more specific about the div:
    >> doc