Update: html5lib (bottom of question) seems to get close, I just need to improve my understanding of how it's used.
I am attempting to
I had the same problem, and apparently you can hack your way through this by loading the document as XML and saving it as HTML :)
$d = new DOMDocument; $d->loadXML(''); echo $d->saveHTML();
But of course the markup must be error-free for loadXML to work.
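To make that caveat concrete, here is a rough sketch of the same trick with the failure case handled. The function name `xmlToHtml` and the sample markup are my own placeholders, not something from the snippet above, and I'm assuming you would rather get an exception than a warning when the input isn't well-formed.

```php
<?php
// Hypothetical helper: parse a string as XML, then serialize it with HTML rules.
function xmlToHtml(string $xml): string
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $d = new DOMDocument;
    $ok = $d->loadXML($xml); // returns false if the markup is not well-formed XML

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new InvalidArgumentException('Input is not well-formed XML.');
    }

    $html = $d->saveHTML(); // HTML serialization of the parsed tree
    if ($html === false) {
        throw new RuntimeException('Could not serialize the document as HTML.');
    }
    return $html;
}

// Sample (made-up) input: well-formed, so it parses as XML.
echo xmlToHtml('<div><p>Hello<br/>world</p></div>');
```

Something like `xmlToHtml('<p>unclosed')` should end up in the exception branch, which is exactly the limitation mentioned above: this only helps when the markup is already valid XML.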