nokogiri

Find and replace entire HTML nodes with Nokogiri

巧了我就是萌 posted on 2019-11-30 12:37:37
I have an HTML document that needs to be transformed by replacing some tags with other tags. I don't know the replacement tags in advance, because they come from the database, so the set_attribute or name methods of Nokogiri are not suitable for me. I need to do it in a way like this pseudo-code:

    def preprocess_content
      doc = Nokogiri::HTML( self.content )
      doc.css("div.to-replace").each do |div|
        # "get_html_text" will obtain HTML from the db. It can be anything: other tags, tag groups, etc.
        div.replace self.get_html_text
      end
      self.content = doc.css("body").first.inner_html
    end

I found Nokogiri::XML::Node::replace

Get link and href text from html doc with Nokogiri & Ruby?

[亡魂溺海] posted on 2019-11-30 12:21:12
Question: I'm trying to use the nokogiri gem to extract all the URLs on a page as well as their link text, and store the link text and URL in a hash.

    <html>
      <body>
        <a href=#foo>Foo</a>
        <a href=#bar>Bar </a>
      </body>
    </html>

I would like to return {"Foo" => "#foo", "Bar" => "#bar"}

Answer 1: Here's a one-liner:

    Hash[doc.xpath('//a[@href]').map {|link| [link.text.strip, link["href"]]}]
    #=> {"Foo"=>"#foo", "Bar"=>"#bar"}

Split up a bit to be arguably more readable:

    h = {}
    doc.xpath('//a[@href]').each do |link|
      h

Parsing Javascript using Ruby code

瘦欲@ posted on 2019-11-30 08:26:14
Question: I'm writing test code in Ruby and trying to parse the HTML source of a website. It contains a JavaScript variable whose value I want to compare against other values. For example:

    <script type="text/javascript" language="JavaScript">
    function GetParam(name) {
      var req_var = { a: 'xyz', b: 'yy.com', c: 'en', d:0, e: 'y' };
    }
    </script>

Here I want to extract the variable req_var from this function. Is it possible to do that? If so, can anyone please help me with that?

Answer 1: javascript parser in

Get text directly inside a tag in Nokogiri

痴心易碎 posted on 2019-11-30 08:15:25
I have some HTML that looks like:

    <dt> <a href="#">Hello</a> (2009) </dt>

I already have all my HTML loaded into a variable called record. I need to parse out the year, i.e. 2009, if it exists. How can I get the text inside the dt tag but not the text inside the a tag? I've used record.search("dt").inner_text, and this gives me everything. It's a trivial question, but I haven't managed to figure it out.

To get all the direct children with text, but not any further sub-children, you can use XPath like so:

    doc.xpath('//dt/text()')

Or, if you wish to use search:

    doc.search('dt').xpath('text()')

How to parse consecutive tags with Nokogiri?

纵饮孤独 posted on 2019-11-30 07:47:53
I have HTML code like this:

    <div id="first">
      <dt>Label1</dt> <dd>Value1</dd>
      <dt>Label2</dt> <dd>Value2</dd>
      ...
    </div>

My code does not work:

    doc.css("first").each do |item|
      label = item.css("dt")
      value = item.css("dd")
    end

It shows all the <dt> tags first and then the <dd> tags, but I need "label: value" pairs.

First of all, your HTML should have the <dt> and <dd> elements inside a <dl>:

    <div id="first">
      <dl>
        <dt>Label1</dt> <dd>Value1</dd>
        <dt>Label2</dt> <dd>Value2</dd>
        ...
      </dl>
    </div>

but that won't change how you parse it. You want to find the <dt>s and iterate over them, then at each <dt> you

What is a robust installation process for Nokogiri (on Ubuntu)?

|▌冷眼眸甩不掉的悲伤 posted on 2019-11-30 07:22:18
I tried to install Nokogiri on my Ubuntu 12.04 system and got an error that said "libxslt is missing", even though libxslt-dev and libxml2-dev are installed. Is there a robust installation process? How can I check the links to the dependent libraries? I used RVM, and the RVM pkg is installed too.

    ERROR: Error installing nokogiri:
    ERROR: Failed to build gem native extension.
    /home/victor/.rvm/rubies/ruby-1.9.3-p125/bin/ruby extconf.rb
    checking for libxml/parser.h... yes
    checking for libxslt/xslt.h... yes
    checking for libexslt/exslt.h... yes
    checking for iconv_open() in iconv.h... yes
    checking for

How do I use Nokogiri::XML::Reader to parse large XML files?

生来就可爱ヽ(ⅴ<●) posted on 2019-11-30 07:06:06
I'm trying to use Ruby's Nokogiri to parse large (1 GB or more) XML files. I'm testing code on a smaller file containing only 4 records, available here. I'm using Nokogiri version 1.5.0 and Ruby 1.8.7 on Ubuntu 10.10. Since I don't understand SAX very well, I'm trying Nokogiri::XML::Reader to start. My first attempt, to retrieve the content of the PMID tag, looks like this:

    #!/usr/bin/ruby
    require "rubygems"
    require "nokogiri"

    file = ARGV[0]
    reader = Nokogiri::XML::Reader(File.open(file))
    p = []
    reader.each do |node|
      if node.name == "PMID"
        p << node.inner_xml
      end
    end
    puts p.inspect

Here's what I

Creating an XML document with a namespaced root element with Nokogiri builder

僤鯓⒐⒋嵵緔 posted on 2019-11-30 07:00:13
I'm implementing an exporter for an XML data format that requires namespaces. I'm using the Nokogiri XML Builder (version 1.4.0) to do this. However, I can't get Nokogiri to create a root node with a namespace. This works:

    Nokogiri::XML::Builder.new { |xml| xml.root('xmlns:foobar' => 'my-ns-url') }.to_xml

    <?xml version="1.0"?>
    <root xmlns:foobar="my-ns-url"/>

As does this:

    Nokogiri::XML::Builder.new do |xml|
      xml.root('xmlns:foobar' => 'my-ns-url') { xml['foobar'].child }
    end.to_xml

    <?xml version="1.0"?>
    <root xmlns:foobar="my-ns-url">
      <foobar:child/>
    </root>

However, I need something like <foo

Nokogiri to_xml without carriage returns

一曲冷凌霜 posted on 2019-11-30 06:21:17
I'm currently using the Nokogiri::XML::Builder class to construct an XML document, then calling .to_xml on it. The resulting string always contains a bunch of spaces, linefeeds and carriage returns in between the nodes, and I can't for the life of me figure out how to get rid of them. Here's an example:

    b = Nokogiri::XML::Builder.new do |xml|
      xml.root do
        xml.text("Value")
      end
    end
    b.to_xml

This results in the following:

    <?xml version="1.0"?>
    <root>Value</root>

What I want is this (notice the missing newline):

    <?xml version="1.0"?><root>Value</root>

How can this be done? Thanks in advance!

How to add child nodes in NodeSet using Nokogiri

旧时模样 posted on 2019-11-30 05:20:49
I am trying to add child nodes under a root node. I tried the code below, but it doesn't work. I am new to Ruby and Nokogiri.

    builder = Nokogiri::XML::Builder.with(@doc) do |xml|
      nodes = Nokogiri::XML::NodeSet.new(@doc, [])
      [].each {|nodes_one_by_one| << nodes_one_by_one.Book << nodes_one_by_one.Pen }
    end

I need to add nodes below a root node, like this:

    <Catalog>
      <Book>abc</Book>
      <Book_Author>Benjamin</Book_author>

That works well for me, but what I actually need is to add these nodes after a specific position in the doc.

    <Catalog>
      <!-- <Book>abc</Book> <Book_Author>Benjamin</Book