Parse table using Nokogiri

一向 2021-01-06 13:16

I would like to parse a table using Nokogiri. I'm doing it this way:

def parse_table_nokogiri(html)

    doc = Nokogiri::HTML(html)

    doc.search('table &


        
3 Answers
  •  谎友^ (OP)
    2021-01-06 14:14

    Use:

    td//text()[normalize-space()]
    

    This selects every text node that is not whitespace-only and is a descendant of any td child of the current node (the tr already selected in your code).

    Or, if you want to select all text-node descendants, regardless of whether they are whitespace-only:

    td//text()
    

    UPDATE:

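    The OP has signaled in a comment that he is getting an unwanted td whose content is just a '&nbsp;' (a non-breaking space).

    To also exclude tds whose content consists only of (one or more) nbsp characters, use the following, where &#xA0; stands for the non-breaking space character U+00A0 (XPath itself does not expand character references, so in a Ruby string write the character as "\u00A0"):

    td//text()[translate(normalize-space(), '&#xA0;', '')]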
    
