What is the absolutely cheapest way to select a child node in Nokogiri?

淺唱寂寞╮ submitted on 2019-12-08 13:10:37

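Node#child is the fastest way to get the first child node. It returns the first child of any type, so in indented markup it is often a whitespace text node rather than an element.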

However, if the node you're looking for is NOT the first (e.g., the 99th), then there is no faster way to select that node than to call #children and index into it.

You are correct in stating that it's expensive to build a NodeSet for all children if you only want the first one.

One limiting factor is that libxml2 (the XML library underlying Nokogiri) stores a node's children as a linked list, so selecting a particular child means traversing the list, which is O(n).
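To make the linked-list cost concrete, here is a toy model in plain Ruby. It is only an illustration, not libxml2 code: it mirrors the fact that libxml2's xmlNode keeps a pointer to its first child (`children`) and each child keeps a pointer to its next sibling (`next`). The names `ToyNode`, `build_toy_parent` and `toy_nth_child` are hypothetical.

```ruby
# Toy model (not libxml2 code): a parent points at its first child and
# every child points at its next sibling, as in libxml2's xmlNode.
ToyNode = Struct.new(:name, :first_child, :next_sibling)

# Build a parent with +count+ children chained through next_sibling.
def build_toy_parent(count)
  parent = ToyNode.new("parent", nil, nil)
  previous = nil
  count.times do |i|
    child = ToyNode.new("child#{i}", nil, nil)
    if previous
      previous.next_sibling = child
    else
      parent.first_child = child
    end
    previous = child
  end
  parent
end

# Reaching the n-th (0-based) child means following n sibling links, so
# the cost grows linearly with n. Returns [node_or_nil, links_followed].
def toy_nth_child(parent, n)
  node = parent.first_child
  links = 0
  while node && links < n
    node = node.next_sibling
    links += 1
  end
  [node, links]
end

node, links = toy_nth_child(build_toy_parent(100), 98)
puts node.name  # prints "child98"
puts links      # prints "98"
```

There is no index to jump to: the only way to the 99th child is through the 98 before it.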

It would be feasible to write a method that simply returns the nth child, without instantiating a NodeSet or even Ruby objects for all the other children. My advice would be to open a feature request at http://github.com/tenderlove/nokogiri/issues or send an email to the nokogiri mailing list.

You can try it yourself and benchmark the result.

I created a quick benchmark: http://gist.github.com/283825

$ ruby test.rb 
Rehearsal ---------------------------------------------------
xpath/first()     3.290000   0.030000   3.320000 (  3.321197)
xpath.first       3.360000   0.010000   3.370000 (  3.381171)
at                4.540000   0.020000   4.560000 (  4.564249)
at_xpath          3.420000   0.010000   3.430000 (  3.430933)
children.second   0.220000   0.010000   0.230000 (  0.233090)
----------------------------------------- total: 14.910000sec

                      user     system      total        real
xpath/first()     3.280000   0.000000   3.280000 (  3.288647)
xpath.first       3.350000   0.020000   3.370000 (  3.374778)
at                4.530000   0.040000   4.570000 (  4.580512)
at_xpath          3.410000   0.010000   3.420000 (  3.421551)
children.second   0.220000   0.010000   0.230000 (  0.226846)

From my tests, indexing into children is by far the fastest method: more than ten times faster than any of the XPath-based approaches.

An approach that neither uses XPath nor materializes every child of the parent is to combine Node#element? with Node#child and Node#next_sibling.

Something like this...

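# Return the first child of +node+ that is an element, skipping text,
# comment, and other non-element nodes. Returns nil if there is none.
def first(node)
  element = node.child
  while element
    return element if element.element?
    # Move on to the following sibling, whatever its type.
    element = element.next_sibling
  end
  nil
end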