What is the absolutely cheapest way to select a child node in Nokogiri?

Submitted by 自古美人都是妖i on 2019-12-23 01:37:28

Question


I know that there are dozens of ways to select the first child element in Nokogiri, but which is the cheapest? I can't get around using Node#children, which sounds awfully expensive. Say that there are 10000 child nodes, and I don't want to touch the 9999 others...


Answer 1:


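Node#child is the fastest way to get the first child. Note that it returns the first child node of any type, so if the markup contains whitespace between tags, the result may be a text node rather than an element.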

However, if the node you're looking for is NOT the first (e.g., the 99th), then there is no faster way to select that node than to call #children and index into it.

You are correct in stating that it's expensive to build a NodeSet for all children if you only want the first one.

One limiting factor is that libxml2 (the XML library underlying Nokogiri) stores a node's children as a linked list. So you'll need to traverse the list (O(n)) to select the desired child node.

It would be feasible to write a method that simply returns the nth child without instantiating a NodeSet, or even Ruby objects, for all the other children. My advice would be to open a feature request at http://github.com/tenderlove/nokogiri/issues or send an email to the Nokogiri mailing list.




Answer 2:


You can try it yourself and benchmark the result.

I created a quick benchmark: http://gist.github.com/283825

$ ruby test.rb 
Rehearsal ---------------------------------------------------
xpath/first()     3.290000   0.030000   3.320000 (  3.321197)
xpath.first       3.360000   0.010000   3.370000 (  3.381171)
at                4.540000   0.020000   4.560000 (  4.564249)
at_xpath          3.420000   0.010000   3.430000 (  3.430933)
children.second   0.220000   0.010000   0.230000 (  0.233090)
----------------------------------------- total: 14.910000sec

                      user     system      total        real
xpath/first()     3.280000   0.000000   3.280000 (  3.288647)
xpath.first       3.350000   0.020000   3.370000 (  3.374778)
at                4.530000   0.040000   4.570000 (  4.580512)
at_xpath          3.410000   0.010000   3.420000 (  3.421551)
children.second   0.220000   0.010000   0.230000 (  0.226846)

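In my tests, indexing into children was by far the fastest of these methods (about 0.23 seconds, versus 3.3 to 4.6 seconds for the XPath-based and at variants).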




Answer 3:


An approach that neither uses XPath nor walks through all of the parent's children is to combine Node#child, Node#next_sibling (also available as Node#next), and Node#element?.

Something like this...

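# Return the first child of node that is an element, skipping text,
# comment, and other non-element nodes. Returns nil if there is none.
def first(node)
  element = node.child
  while element
    return element if element.element?
    element = element.next # Node#next is an alias for Node#next_sibling
  end
  nil
end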


Source: https://stackoverflow.com/questions/2116629/what-is-the-absolutely-cheapest-way-to-select-a-child-node-in-nokogiri
