How do I pretty-print HTML with Nokogiri?

梦如初夏 2020-12-01 10:58

I wrote a web crawler in Ruby and I'm using Nokogiri::HTML to parse the page. I need to print the page out and while messing around in IRB I noticed a pr

7 answers
  •  野趣味 (original poster)
    2020-12-01 11:21

    The answer by @mislav is somewhat wrong. Nokogiri does support pretty-printing if you:

    • Parse the document as XML
    • Instruct Nokogiri to ignore whitespace-only nodes ("blanks") during parsing
    • Use to_xhtml or to_xml to specify pretty-printing parameters

    In action:

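    html = '<section>
    <h1>Main Section 1</h1><p>Intro</p>
    <section>
    <h2>Subhead 1.1</h2><p>Meat</p><p>MOAR MEAT</p>
    </section><section>
    <h2>Subhead 1.2</h2><p>Meat</p>
    </section></section>'

    require 'nokogiri'
    doc = Nokogiri::XML(html,&:noblanks)
    puts doc
    #=> <?xml version="1.0"?>
    #=> <section>
    #=>   <h1>Main Section 1</h1>
    #=>   <p>Intro</p>
    #=>   <section>
    #=>     <h2>Subhead 1.1</h2>
    #=>     <p>Meat</p>
    #=>     <p>MOAR MEAT</p>
    #=>   </section>
    #=>   <section>
    #=>     <h2>Subhead 1.2</h2>
    #=>     <p>Meat</p>
    #=>   </section>
    #=> </section>

    puts doc.to_xhtml( indent:3, indent_text:"." )
    #=> <section>
    #=> ...<h1>Main Section 1</h1>
    #=> ...<p>Intro</p>
    #=> ...<section>
    #=> ......<h2>Subhead 1.1</h2>
    #=> ......<p>Meat</p>
    #=> ......<p>MOAR MEAT</p>
    #=> ...</section>
    #=> ...<section>
    #=> ......<h2>Subhead 1.2</h2>
    #=> ......<p>Meat</p>
    #=> ...</section>
    #=> </section>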
