nokogiri

How to fix Nokogiri on Ubuntu?

江枫思渺然 posted on 2021-02-18 16:44:33
Question: I run Ubuntu 13.04 on my workstation with Ruby 2.0.0, which was installed via RVM.

    $ aptitude show libxml2
    Package: libxml2
    State: installed
    Automatically installed: no
    Multi-Arch: same
    Version: 2.9.0+dfsg1-4ubuntu4.1

    $ aptitude show libxml2-dev
    Package: libxml2-dev
    State: installed
    Automatically installed: no
    Multi-Arch: same
    Version: 2.9.0+dfsg1-4ubuntu4.1

    $ aptitude show libxslt-dev
    Package: libxslt1-dev
    State: installed
    Automatically installed: no
    Version: 1.1.27-1ubuntu2
    Priority:

How to delete an XML element according to value of one of its children?

╄→гoц情女王★ posted on 2021-02-10 07:20:06
Question: I have an XML element that looks something like this:

    <Description>
      <ID>1234</ID>
      <SubDescription>
        <subID>4501</subID>
      </SubDescription>
      <SubDescription>
        <subID>4502</subID>
      </SubDescription>
    </Description>

How can I delete the entire "Description" element according to the value of its "ID" child?

Answer 1: You can use the following XPath to select Description nodes that contain an ID node with the value 1234:

    //Description[./ID[text()='1234']]

So to remove the node, you can do: doc.xpath("//Description[

How to collect the first of several elements of a node in Nokogiri

↘锁芯ラ posted on 2021-01-23 08:56:26
Question: I have data that looks like:

    <release>
      <artists>
        <artist> <name>Johnny Mnemonic</name> </artist>
        <artist> <name>Constantine</name> </artist>
      </artists>
    </release>
    <release>
      <artists>
        <artist> <name>Speed</name> </artist>
        <artist> <name>The Matrix</name> </artist>
      </artists>
    </release>

...and so on. For each release I want only the data from the first <artist> tag. I tried the following code, but it pulls all the text from the artists:

    page = Nokogiri::XML(open("37.xml"))
    page.xpath("//artists[1]")

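[Ruby] Scraping and processing web pages with Ruby

霸气de小男生 posted on 2020-08-12 02:09:49
I am not a professional at writing web crawlers. I got started by helping someone inflate votes on a rather poorly built website, and picked up web-scraping tools gradually from there. At first I used Python's urllib2 and treated the fetched page as plain text; only later did I learn from a forum about a tool as powerful as BeautifulSoup. Python makes this kind of work very convenient, but then I met Ruby and Rails and switched allegiance, and after reading Metaprogramming Ruby I liked Ruby even more.

Work has been quiet lately, so I decided to scrape some stock market data. The first problem was where to get the codes of all stocks listed in Shanghai and Shenzhen. Ready-made lists exist online, but I wanted to fetch and process the data with a program, so I found this page: http://quote.eastmoney.com/stocklist.html

After looking at it, I felt the code would not be complicated. It was simply my first time scraping pages with Ruby, so I did not know which tools to use or how to use them. After some searching, the answer was open-uri and Nokogiri.

First, open-uri, which is built into Ruby. To use it, just add require 'open-uri' to your code. It is simple to use:

    open("http://www.ruby-lang.org/en") {|f|
      f.each_line {|line| p line}
      p f.base_uri # <URI::HTTP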

xpath parent attribute of selection

五迷三道 posted on 2020-08-01 05:24:48
Question: Syntax of the XML document:

    <x name="GET-THIS">
      <y>
        <z>Z</z>
        <z>Z__2</z>
        <z>Z__3</z>
      </y>
    </x>

I'm able to get all z elements using: xpath("//z"). But after that I got stuck, and I'm not sure what to do next. I don't really understand the syntax of the .. parent method. So, how do I get the attribute of the parent of the parent of the element?

Answer 1: Instead of traversing back to the parent, just find the right parent to begin with: //x will select all x elements. //x[//z] will select all x elements

How to install Nokogiri on Ruby 2.7.0

て烟熏妆下的殇ゞ posted on 2020-05-29 06:53:42
Question: I recently upgraded to Ruby v2.7.0. When I tried to install Nokogiri I got the following error:

    ERROR: Error installing nokogiri:
    The last version of nokogiri (>= 0) to support your Ruby & RubyGems was 1.10.9. Try installing it with `gem install nokogiri -v 1.10.9`
    nokogiri requires Ruby version >= 2.3, < 2.7.dev. The current ruby version is 2.7.0.0.

I tried to install this gem with gem install nokogiri -v 1.10.9 but I got the same error. How can I install Nokogiri now that I am using Ruby

How to ensure files are closed when reference is held by a different object

房东的猫 posted on 2020-05-13 05:28:27
Question: In Ruby, when the reference to an open file is handed to another object, as in the following code, do I need to wrap the other object reference in a "begin / ensure" block to ensure the unmanaged resources get closed, or is there another way?

    @doc = Nokogiri::XML(File.open("shows.xml"))
    @doc.xpath("//character")
    # => ["<character>Al Bundy</character>",
    #     "<character>Bud Bundy</character>",
    #     "<character>Marcy Darcy</character>",
    #     "<character>Larry Appleton</character>",
    #     "<character

Nokogiri usage in detail

自闭症网瘾萝莉.ら posted on 2020-03-24 10:01:58
1. Installation

    gem install nokogiri

2. Class hierarchy

    Nokogiri::HTML::Document < Nokogiri::XML::Document < Nokogiri::XML::Node < Object
    Nokogiri::XML::Element < Nokogiri::XML::Node < Object
    Nokogiri::XML::NodeSet < Object (includes Enumerable)

3. Parsing HTML

3.1 Getting the HTML document object

    doc = Nokogiri::HTML(open('http://×××.com'))

3.2 Basic processing patterns

    tds = doc.xpath("//td")  # => search for td tags (returns a NodeSet)
    tds.size                 # => number of td tags
    tds[0]                   # => the first td tag (an Element)
    tds[0]["class"]          # => class name of the first td (a String)
    tds[0].xpath(".//a")     # => search for a tags inside the first td (returns a NodeSet)

4. Node lookup methods

4.1 Search methods

    at(".css")  # search by CSS selector and return the first matching node under this node (an Element)
    css("

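Uninstalling old versions of a Ruby gem

邮差的信 posted on 2020-02-26 15:40:47
I have several versions of a Ruby gem:

    $ gem list
    rjb (1.3.4, 1.3.3, 1.1.9)

How can I remove the old versions but keep the most recent one?

Answer #1: To remove old versions of all installed gems, use the following two commands:

    gem cleanup --dryrun

The command above previews which gems will be removed.

    gem cleanup

The command above actually removes them.

Answer #2: To clean up old versions of any gem:

    sudo gem cleanup

If you only want to see a list of what would be removed, you can use:

    sudo gem cleanup -d

You can also clean up a specific gem by giving its name:

    sudo gem cleanup gemname

To remove only a specific version, such as 1.1.9:

    gem uninstall gemname --version 1.1.9

If you still hit an exception when installing a gem, such as:

    invalid gem: package is corrupt, exception while verifying: undefined method `size' for nil:NilClass (NoMethodError) in /home/rails/.rvm/gems/ruby-2.1.1@project/cache/nokogiri-1.6.6.2.gem

you can delete it from the cache:

    rm /home/rails/.rvm/gems/ruby-2.1.1@project/cache/nokogiri-1.6.6.2.gem

For details: http://blog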