DOMDocument for parsing HTML (instead of regex)

无人共我 2020-12-12 01:58

I am trying to learn how to use DOMDocument to parse HTML.

I am just doing some simple work. I already liked Gordon's answer on scrap data using regex and simpl

2 answers
  •  执念已碎
    2020-12-12 02:51

    You shouldn't bother with the raw DOMDocument interface. Instead, use one of the jQuery-style classes for extraction; see "How to parse HTML with PHP?".

    QueryPath seems to work fine if you use more specific selectors:

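    // Load the QueryPath library (packaged as a phar archive).
    include "qp.phar";
    // htmlqp() loads the page and parses it with QueryPath's more tolerant HTML settings.
    $qp = htmlqp("http://www.nu.nl/internet/1106541/taalunie-keurt-open-sourcewoordenlijst-goed.html");

    // Print the headline text.
    print $qp->find(".header h1")->text();
    // Return to the document root, then print the article body as XHTML.
    print $qp->top()->find(".article .content")->xhtml();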

    However, you might need to strip the intermingled JavaScript first (->find("script")->remove()).
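
    For comparison, here is a rough sketch of the same kind of extraction using only PHP's built-in DOMDocument and DOMXPath. The sample markup and its class names are made up for illustration and are not the real page; the XPath expressions are only approximate equivalents of the CSS selectors used above.

```php
<?php
// Hypothetical sample markup; the class names mirror the selectors used above.
$html = <<<'HTML'
<html><body>
<div class="header"><h1>Example title</h1></div>
<div class="article"><div class="content"><p>Body text.</p><script>var x = 1;</script></div></div>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Remove <script> elements first. Copy the node list to an array so that
// removing nodes cannot disturb the iteration.
foreach (iterator_to_array($xpath->query('//script')) as $script) {
    $script->parentNode->removeChild($script);
}

// Helper (hypothetical name) that builds an XPath test for "has this CSS class".
function hasClass(string $name): string
{
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $name . ' ")';
}

// Rough XPath equivalent of the CSS selector ".header h1".
$title = $xpath->query('//*[' . hasClass('header') . ']//h1')->item(0);
$titleText = $title ? trim($title->textContent) : '';

// Rough equivalent of ".article .content", serialized back to markup.
$content = $xpath->query('//*[' . hasClass('article') . ']//*[' . hasClass('content') . ']')->item(0);
$contentHtml = $content ? $doc->saveHTML($content) : '';

print $titleText . "\n";
print $contentHtml . "\n";
```

    The class test is longer than a CSS selector because XPath 1.0 has no notion of class lists, which is one reason the jQuery-style wrappers are more pleasant to use.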
