How can I detect errors in an HTML document fragment with Ruby?

Posted by 痴心易碎 on 2019-12-11 01:36:14

Question


I'm collecting some HTML-formatted content from a web form. Before saving this HTML content, I'd like to do a quick sanity check on it to make sure it looks well-formed (no unclosed tags, no invalid markup).

Using Ruby and/or any popular gems, can I check an HTML fragment string like:

<p>foo</p><h1>Unclosed H1<p>bar</p>

and discover things like the unclosed h1 tag?

I thought Nokogiri would come to my rescue here, but no:

>> Nokogiri::HTML::DocumentFragment.parse("<p>foo</p><h1>Unclosed H1<p>bar</p>").errors
=> []

Answer 1:


Have you tried w3c_validators?

[1] pry(main)> require 'w3c_validators'
=> true
[2] pry(main)> include W3CValidators
=> Object
[3] pry(main)> p MarkupValidator.new.validate_text('<!DOCTYPE html><html><body><p>foo</p><h1>Unclosed H1<p>bar</p></body></html>');

This gives you a very detailed validation result.

Or, if you only want to check for unclosed tags, you could try Nokogiri::XML::Document.parse().errors instead. That probably won't work well unless the doctype is XHTML, though, since a few HTML elements in other doctypes don't even require a closing tag. w3c_validators does a better job.
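
To illustrate the "parse it as XML" idea and its caveat, here is a minimal sketch. It swaps in REXML (shipped with Ruby) for Nokogiri so that it runs without extra gems; the strategy is the same, but this is not the Nokogiri call mentioned above. The helper name `xml_wellformedness_error` and the synthetic `<root>` wrapper are assumptions made for this example only.

```ruby
require 'rexml/document'

# Hypothetical helper: returns nil when the fragment is well-formed XML,
# otherwise the parser's error message.
def xml_wellformedness_error(fragment)
  # Wrap the fragment in a synthetic root element, because an XML document
  # may only have a single top-level element.
  REXML::Document.new("<root>#{fragment}</root>")
  nil
rescue REXML::ParseException => e
  e.message
end

unclosed = '<p>foo</p><h1>Unclosed H1<p>bar</p>'
closed   = '<p>foo</p><h1>Closed H1</h1><p>bar</p>'
void_tag = '<p>line one<br>line two</p>' # fine as HTML, not well-formed XML

[unclosed, closed, void_tag].each do |fragment|
  error = xml_wellformedness_error(fragment)
  puts "#{fragment.inspect}: #{error ? 'not well-formed' : 'ok'}"
end
```

Note that the third fragment is flagged even though `<br>` needs no closing tag in HTML. That is the limitation described above: a strict XML check is only a rough sanity check for non-XHTML input, so a real validator remains the safer choice.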



Source: https://stackoverflow.com/questions/11661517/how-can-i-detect-errors-in-an-html-document-fragment-with-ruby
