How to prevent Nokogiri from adding tags?

-上瘾入骨i 2020-12-05 17:14

I noticed something strange using Nokogiri recently. All of the HTML I had been parsing had been given start and end <html> and <body> tags.

2 Answers
  • 2020-12-05 17:51

    The to_s method on a Nokogiri::HTML::Document outputs a valid HTML page, complete with its required elements. This is not necessarily what was passed in to the parser.

    If you want to output less than a complete document, you use methods such as inner_html, inner_text, etc., on a node.
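
    As a minimal sketch of that approach (the snippet '<p>foobar</p>' and the variable names are only illustrative), you can parse the input as a full document and then serialize just the node you care about:

    ```ruby
    require 'nokogiri'

    # Parsing with Nokogiri::HTML wraps the snippet in <html> and <body>.
    doc = Nokogiri::HTML('<p>foobar</p>')

    # Serialize only the children of <body>, not the whole document.
    body_html = doc.at('body').inner_html
    puts body_html

    # inner_text returns the text content with all markup dropped.
    body_text = doc.at('p').inner_text
    puts body_text
    ```

    This keeps the parser's repaired document structure available while letting you print only the part that was in your input.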

    Edit: if you are not expecting to parse a complete, well-formed XML document as input, then theTinMan's answer is best.

  • 2020-12-05 18:05

    The problem occurs because you're using the wrong method in Nokogiri to parse your content.

    require 'nokogiri'
    
    doc = Nokogiri::HTML('<p>foobar</p>')
    puts doc.to_html
    # >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    # >> <html><body><p>foobar</p></body></html>
    

    Rather than using Nokogiri::HTML, which always builds a complete document, use Nokogiri::HTML.fragment, which tells Nokogiri to parse the input as a fragment and not add the wrapping elements:

    doc = Nokogiri::HTML.fragment('<p>foobar</p>')
    puts doc.to_html
    # >> <p>foobar</p>
    