DOMDocument remove script tags from HTML source

后悔当初 2021-01-11 11:03

I used @Alex's approach here to remove script tags from an HTML document using the built-in DOMDocument. The problem is if I have a script tag with JavaScript content and th…

2 Answers
  •  陌清茗 (OP)
    2021-01-11 11:21

    The cause of your error is actually simple. The DOMNodeList returned by getElementsByTagName() (and similar node collections) is live: it is updated automatically whenever the document changes, most notably when elements are removed. The PHP documentation mentions this in only a couple of lines, so it is easy to miss.

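    If you loop with an index from 0 up to $list->length (where $list is a DOMNodeList) and remove elements inside the loop, you'll notice that the length property actually changes as you go. Each removal shifts the remaining items down by one index, so the loop skips the element that moves into the slot you just emptied. I had to write my own library to work around this and a few other quirks.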
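A minimal sketch of that pitfall, assuming PHP with the bundled DOM extension; the sample HTML is hypothetical. A forward index loop over the live list should leave one of the three `<script>` elements behind, because `length` shrinks and the items shift after every removal.

```php
<?php
// Hypothetical sample document with three <script> elements.
$html = '<html><body><script>a()</script><p>x</p>'
      . '<script>b()</script><script>c()</script></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Live DOMNodeList: it reflects the document as it is *now*.
$scripts = $dom->getElementsByTagName('script');
$before = $scripts->length; // 3

// Naive forward loop: $scripts->length is re-read on every iteration and
// shrinks after each removal, while the remaining items shift down one
// index, so every other <script> is skipped.
for ($i = 0; $i < $scripts->length; $i++) {
    $node = $scripts->item($i);
    $node->parentNode->removeChild($node);
}

// Count what is left in the document.
$after = $dom->getElementsByTagName('script')->length;
echo "before: $before, after: $after\n";
```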

    The solution:

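    // $result holds the HTML from the question.
    $dom = new DOMDocument();
    if ($dom->loadHTML($result))
    {
        // getElementsByTagName() is re-run on every pass, and item(0) is always
        // the first <script> still in the document: remove it until none remain.
        while (($r = $dom->getElementsByTagName("script")) && $r->length) {
            $r->item(0)->parentNode->removeChild($r->item(0));
        }
        echo $dom->saveHTML();
    }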
    

    I'm not iterating over the list by index; I just remove the first element, one at a time, until none are left. The result: http://sebrenauld.co.uk/domremovescript.php
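If re-querying the document on every pass is undesirable, two common alternatives can be sketched under the same assumptions (PHP's bundled DOM extension; the function names and sample HTML are hypothetical). These are alternatives to the answer's approach, not the author's method: either snapshot the live list into a plain array with iterator_to_array(), or walk it backwards so that removals never shift the items still to be visited.

```php
<?php
// Hypothetical helpers: two ways to remove every <script> without
// tripping over the live DOMNodeList.

// 1) Snapshot: copy the live list into a plain PHP array first. The array
//    does not change when nodes are removed from the document.
function removeScriptsSnapshot(DOMDocument $dom): void
{
    $scripts = iterator_to_array($dom->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }
}

// 2) Reverse loop: removing the last item never shifts the indices of the
//    items that are still to be visited.
function removeScriptsReverse(DOMDocument $dom): void
{
    $scripts = $dom->getElementsByTagName('script');
    for ($i = $scripts->length - 1; $i >= 0; $i--) {
        $script = $scripts->item($i);
        $script->parentNode->removeChild($script);
    }
}

// Hypothetical input; in the question the HTML lives in $result.
$html = '<html><head><script src="x.js"></script></head>'
      . '<body><p>Hi</p><script>alert(1)</script><script>b()</script></body></html>';

$a = new DOMDocument();
$a->loadHTML($html);
removeScriptsSnapshot($a);

$b = new DOMDocument();
$b->loadHTML($html);
removeScriptsReverse($b);

echo $a->saveHTML();
```

The snapshot version is usually the easiest to read; the reverse loop avoids building an extra array.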
