How to prevent Nokogiri from adding <DOCTYPE> tags?

爱⌒轻易说出口 submitted on 2019-11-27 21:52:55

The problem occurs because you're using the wrong method in Nokogiri to parse your content.

require 'nokogiri'

doc = Nokogiri::HTML('<p>foobar</p>')
puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>foobar</p></body></html>

Rather than using HTML, which always builds a complete document, use HTML.fragment, which tells Nokogiri to parse the input as a fragment only:

doc = Nokogiri::HTML.fragment('<p>foobar</p>')
puts doc.to_html
# >> <p>foobar</p>

The to_s method on a Nokogiri::HTML::Document outputs a valid HTML page, complete with its required elements. This is not necessarily what was passed in to the parser.

If you want to output less than a complete document, call a method such as inner_html or inner_text on the node you care about rather than on the document itself.

Edit: if you are not expecting to parse a complete, well-formed XML document as input, then theTinMan's answer is best.
