Question
I'm collecting some HTML-formatted content from a web form. Before saving it, I'd like to run a quick sanity check to make sure it looks well-formed (no unclosed tags, no invalid markup).
Using Ruby, with or without any popular gems, can I check an HTML fragment string like:
<p>foo</p><h1>Unclosed H1<p>bar</p>
and discover things like the unclosed h1 tag?
I thought Nokogiri would come to my rescue here, but no:
>> Nokogiri::HTML::DocumentFragment.parse("<p>foo</p><h1>Unclosed H1<p>bar</p>").errors
=> []
Answer 1:
Have you tried w3c_validators?
[1] pry(main)> require 'w3c_validators'
=> true
[2] pry(main)> include W3CValidators
=> Object
[3] pry(main)> p MarkupValidator.new.validate_text('<!DOCTYPE html><html><body><p>foo</p><h1>Unclosed H1<p>bar</p></body></html>');
This gives you a very detailed validation result.
Or, if you only want to check for closing tags, you could try Nokogiri::XML::Document.parse().errors instead. That probably won't work unless the doctype is XHTML, though, since a few HTML elements in other doctypes don't even require a closing tag. w3c_validators does a better job.
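To illustrate the closing-tag point, here is a rough pure-Ruby sketch of a tag-balance check. It is not Nokogiri or w3c_validators, just a naive stack-based scan swapped in for illustration; the helper name tag_balance_errors and the void-element list are assumptions of this example. It ignores comments, script contents, and ">" inside attribute values, and it is stricter than HTML itself, because elements whose end tag is optional (p, li, and so on) get reported as unclosed.

```ruby
# Hypothetical helper (not part of any gem): returns an array of
# human-readable problems found by a naive open/close tag scan.
def tag_balance_errors(html)
  # Elements that never take a closing tag in HTML (assumed list).
  void_elements = %w[area base br col embed hr img input link meta source track wbr]
  errors = []
  stack = []

  # Capture: "/" if it is a closing tag, the tag name, "/" if self-closing.
  html.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, name, self_closing|
    name = name.downcase
    next if void_elements.include?(name) || self_closing == "/"

    if closing.empty?
      stack.push(name)
    elsif stack.last == name
      stack.pop
    elsif stack.include?(name)
      # Everything opened after the matching tag was left unclosed.
      errors << "unclosed <#{stack.pop}>" until stack.last == name
      stack.pop
    else
      errors << "unexpected </#{name}>"
    end
  end

  # Whatever is still open at the end was never closed.
  stack.reverse_each { |n| errors << "unclosed <#{n}>" }
  errors
end

p tag_balance_errors("<p>foo</p><h1>Unclosed H1<p>bar</p>")
```

For the fragment from the question this reports the unclosed h1, while a fragment such as "<p>a<br>b</p>" comes back clean because br is treated as a void element. For anything beyond a quick sanity check, a real validator is still the safer choice.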
Source: https://stackoverflow.com/questions/11661517/how-can-i-detect-errors-in-an-html-document-fragment-with-ruby