Update: html5lib (bottom of question) seems to get close; I just need to improve my understanding of how it's used.
I am attempting to
Re: html5lib
You click on the download tab and download the PHP version of the parser.
You untar the archive in a local folder
tar -zxvf html5lib-php-0.1.tar.gz
x html5lib-php-0.1/
x html5lib-php-0.1/VERSION
x html5lib-php-0.1/docs/
... etc
You change directories and create a file named hello.php
cd html5lib-php-0.1
touch hello.php
You place the following PHP code in hello.php
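<?php
// Assumes hello.php sits in the html5lib-php-0.1 root, next to library/.
require_once 'library/HTML5/Parser.php';
// Put the markup to parse between the quotes.
$html = '
';
$dom = HTML5_Parser::parse($html); // returns a DOMDocument
var_dump($dom->saveXml());
echo "\nDone\n";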
You run hello.php from the command line
php hello.php
The parser reads the markup, builds the document tree, and returns a DOMDocument object, which you can query and modify like any other DOMDocument.
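To give an idea of what you can do with the result, here is a minimal sketch of reading values out of a DOMDocument. So that it runs without html5lib installed, it builds the document with PHP's bundled DOM extension as a stand-in for the object returned by HTML5_Parser::parse(); the sample markup and the variable names ($title, $links) are made up for illustration.

```php
<?php
// Stand-in for the DOMDocument returned by HTML5_Parser::parse($html).
// The markup below is a hypothetical sample, not taken from the original post.
$dom = new DOMDocument();
$dom->loadXML(
    '<html><head><title>Hello</title></head>' .
    '<body><p>Hi</p><a href="/a">A</a><a href="/b">B</a></body></html>'
);

// Read the text of the first <title> element.
$title = $dom->getElementsByTagName('title')->item(0)->textContent;

// Collect the href attribute of every <a> element.
$links = array();
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

echo $title, "\n";               // prints "Hello"
echo implode(', ', $links), "\n"; // prints "/a, /b"
```

In hello.php you would drop the first two statements and call the same methods on the $dom that the parser returned.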