What are some examples of using Nokogiri?

Posted by 佐手、 on 2019-12-03 07:22:43
the Tin Man

Using IRB and Ruby 1.9.2:

Load Nokogiri:

1.9.2-p290 :001 > require 'nokogiri'
true

Parse a document:

1.9.2-p290 :002 > doc = Nokogiri::HTML('<html><body><p>foobar</p></body></html>')
#<Nokogiri::HTML::Document:0x1012821a0
    @node_cache = [],
    attr_accessor :errors = [],
    attr_reader :decorators = nil

Nokogiri likes well-formed documents. Note that it adds the DOCTYPE to the output because I parsed the HTML as a full document. It's possible to parse as a document fragment too, but that is pretty specialized.

1.9.2-p290 :003 > doc.to_html
"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>foobar</p></body></html>\n"

Search the document to find the first <p> node using CSS and grab its content:

1.9.2-p290 :004 > doc.at('p').text
"foobar"

Use a different method name to do the same thing:

1.9.2-p290 :005 > doc.at('p').content
"foobar"

Search the document for all <p> nodes inside the <body> tag, and grab the content of the first one. search returns a NodeSet, which behaves like an array of nodes:

1.9.2-p290 :006 > doc.search('body p').first.text
"foobar"

Change the content of the node:

1.9.2-p290 :007 > doc.at('p').content = 'bar'
"bar"

Emit a parsed document as HTML:

1.9.2-p290 :008 > doc.to_html
"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><p>bar</p></body></html>\n"

Remove a node:

1.9.2-p290 :009 > doc.at('p').remove
#<Nokogiri::XML::Element:0x80939178 name="p" children=[#<Nokogiri::XML::Text:0x8091a624 "bar">]>
1.9.2-p290 :010 > doc.to_html
"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body></body></html>\n"

As for scraping, there are a lot of questions on Stack Overflow about using Nokogiri to tear apart HTML from websites. Searching Stack Overflow for "nokogiri and open-uri" should help.
