Question
The Problem
I am running some statistics against various URLs. I want to find the top-level element whose subtree holds the largest share of the page's elements. The approach I would like to follow is to identify all top-level elements and then determine what percentage of all the elements on the page belong to each one.
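The approach described above can be sketched as follows. This is only an illustration under stated assumptions: `FakeNode`, `descendant_count`, and `subtree_shares` are hypothetical names, and `FakeNode` is a stand-in struct so the sketch runs without the gem. A real Nokogiri element also responds to `name` and `element_children`, so the two methods should accept one unchanged.

```ruby
# Hypothetical stand-in for a parsed element; it responds to the same two
# methods the sketch relies on (#name and #element_children).
FakeNode = Struct.new(:name, :element_children)

# Number of element descendants below +node+ (the node itself is not counted).
def descendant_count(node)
  node.element_children.inject(0) do |sum, child|
    sum + 1 + descendant_count(child)
  end
end

# For each top-level child of +root+, returns [name, share], where share is
# the fraction of all elements under +root+ that sit in that child's subtree
# (the child itself included). Pairs are used because names can repeat.
def subtree_shares(root)
  total = descendant_count(root)
  return [] if total.zero?
  root.element_children.map do |child|
    [child.name, (1 + descendant_count(child)).fdiv(total)]
  end
end

body = FakeNode.new('body', [
  FakeNode.new('header', [FakeNode.new('h1', [])]),
  FakeNode.new('main', [
    FakeNode.new('p', []),
    FakeNode.new('p', []),
    FakeNode.new('ul', [FakeNode.new('li', []), FakeNode.new('li', [])])
  ]),
  FakeNode.new('footer', [])
])

shares = subtree_shares(body)
shares.each { |name, share| puts format('%-6s %.1f%%', name, share * 100) }

# The top-level element with the most concentrated subtree.
densest = shares.max_by { |_name, share| share }
puts "densest: #{densest.first}"
```

Counting each top-level element together with its descendants makes the shares add up to 1, which keeps the percentages comparable across pages.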
Goal
- Recursively get all children of a given element.
Inputs: a Nokogiri Element
Outputs: an array of Nokogiri Elements OR the total number of descendants
Setup
- Ruby 1.9.2
- Nokogiri gem
What I ended up with (this works, but isn't as pretty as the answer I accepted below)
# Recursively counts every descendant node of elem.
def getChildCount(elem)
  children = elem.children
  return 0 unless children and children.count > 0
  child_count = children.count
  children.each do |child|
    child_count += getChildCount(child)
  end
  child_count
end
Answer 1:
The traverse method recursively yields the current node and all of its descendants to a block.
# If you would like the nodes returned as an array, rather than yielded to a block one at a time, you can do this:
result = []
doc.traverse {|node| result << node }
result
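# or, more compactly (Enumerator is built in as of Ruby 1.9, so no require is needed;
# calling map without a block would return an Enumerator, so use to_a):
result = doc.enum_for(:traverse).to_a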
Answer 2:
# Non-recursive
class Nokogiri::XML::Node
  def descendant_elements
    xpath('.//*')
  end
end

# Recursive 1
class Nokogiri::XML::Node
  def descendant_elements
    element_children.map{ |kid|
      [kid, kid.descendant_elements]
    }.flatten
  end
end

# Recursive 2
class Nokogiri::XML::Node
  def descendant_elements
    kids = element_children.to_a
    kids.concat(kids.map(&:descendant_elements)).flatten
  end
end
Source: https://stackoverflow.com/questions/10076190/nokogiri-recursively-get-all-children