Nokogiri recursively get all children

北城以北 submitted on 2019-12-03 16:59:35

Question


The Problem

I am running some statistics against various URLs. I want to find the top-level element that contains the largest share of the page's elements. My plan is to identify all top-level elements and then, for each one, determine what percentage of all the elements on the page are its descendants.

Goal

  • Recursively get all children of a given element.

Inputs: a Nokogiri Element

Outputs: an array of Nokogiri elements, or the total count of children at every level

Setup

  • Ruby 1.9.2
  • Nokogiri gem

What I ended up coming up with (this works, but isn't as pretty as my chosen answer below)

def getChildCount(elem)
    children = elem.children
    return 0 unless children and children.count > 0
    child_count = children.count
    children.each do |child|
        child_count += getChildCount(child)
    end
    child_count
end

Answer 1:


The traverse method yields the current node and all of its descendants to a block, recursively.

# if you would like it to be returned as an array, rather than each node being yielded to a block, you can do this
result = []
doc.traverse {|node| result << node }
result

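# or, using an enumerator (the require is only needed on Ruby 1.8;
# to_a is used because map without a block returns an Enumerator on 1.9+)
require 'enumerator'
result = doc.enum_for(:traverse).to_a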



Answer 2:


# Non-recursive
class Nokogiri::XML::Node
  def descendant_elements
    xpath('.//*')
  end
end

# Recursive 1
class Nokogiri::XML::Node
  def descendant_elements
    element_children.map{ |kid|
      [kid, kid.descendant_elements]
    }.flatten
  end
end

# Recursive 2
class Nokogiri::XML::Node
  def descendant_elements
    kids = element_children.to_a
    kids.concat(kids.map(&:descendant_elements)).flatten
  end
end


Source: https://stackoverflow.com/questions/10076190/nokogiri-recursively-get-all-children
