I am trying to learn how to use DOMDocument to parse HTML.
I am just doing some simple work. I already liked Gordon's answer on scraping data using regex and simpl
You shouldn't bother with the raw DOMDocument interface. Instead, use one of the jQuery-style classes for extraction; see "How to parse HTML with PHP?"
QueryPath seems to work fine if you use more specific selectors:
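include "qp.phar";
// Fetch and parse the page with QueryPath's HTML parser.
$qp = htmlqp("http://www.nu.nl/internet/1106541/taalunie-keurt-open-sourcewoordenlijst-goed.html");
// Print the headline text.
print $qp->find(".header h1")->text();
// Go back to the document root, then print the article body as XHTML.
print $qp->top()->find(".article .content")->xhtml();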
You might need to strip the intermingled JavaScript first, though, for example with ->find("script")->remove().
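If you still want to see what the same steps look like with the built-in DOM extension, here is a minimal sketch using DOMDocument and DOMXPath instead of QueryPath. The sample markup is made up for illustration (it is not the real page), and the XPath expressions are only rough equivalents of the CSS selectors above.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
<div class="header"><h1>Sample headline</h1></div>
<div class="article">
  <div class="content"><script>var tracker = 1;</script><p>Body text.</p></div>
</div>
</body></html>
HTML;

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Strip every <script> element before extracting anything.
// The node list is copied into an array so nodes can be removed while looping.
foreach (iterator_to_array($xpath->query('//script')) as $script) {
    $script->parentNode->removeChild($script);
}

// Rough XPath equivalent of the CSS selector ".header h1".
$headline = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " header ")]//h1'
)->item(0);
$title = $headline ? trim($headline->textContent) : '';

// Rough XPath equivalent of ".article .content", serialised back to markup.
$content = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " article ")]'
    . '//*[contains(concat(" ", normalize-space(@class), " "), " content ")]'
)->item(0);
$markup = $content ? $doc->saveHTML($content) : '';

print $title . "\n";
print $markup . "\n";
```

The class-matching XPath is verbose, which is much of the reason the jQuery-style wrappers are more pleasant for this kind of extraction.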