PHP DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity

和自甴很熟 提交于 2019-12-22 03:21:47

问题


I trying to get the "link" elements from certain webpages. I can't figure out what i'm doing wrong though. I'm getting the following error:

Severity: Warning

Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 536

Filename: controllers/test.php

Line Number: 34

Line 34 is the following in the code:

      $dom->loadHTML($html);

my code:

            $url = "http://www.amazon.com/";

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    if($html = curl_exec($ch)){

        // parse the html into a DOMDocument
        $dom = new DOMDocument();

        $dom->recover = true;
        $dom->strictErrorChecking = false;

        $dom->loadHTML($html);

        $hrefs = $dom->getElementsByTagName('a');

        echo "<pre>";
        print_r($hrefs);
        echo "</pre>";

        curl_close($ch);


    }else{
        echo "The website could not be reached.";
    }

回答1:


It means some of the HTML code is invalid. THis is just a warning, not an error. Your script will still process it. To suppress the warnings set

 libxml_use_internal_errors(true);

Or you could just completely suppress the warning by doing

@$dom->loadHTML($html);



回答2:


This may be caused by a rogue & symbol that is immediately succeeded by a proper tag. As otherwise you would receive a missing ; error. See: Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,.

The solution is to - replace the & symbol with &amp;
or if you must have that & as it is then, may be you could enclose it in: <![CDATA[ - ]]>




回答3:


The HTML is poorly formed. If formed poorly enough loading the HTML into the DOM Document might even fail. If loadHTML is not working then suppressing the errors is pointless. I suggest using a tool like HTML Tidy to "clean up" the poorly formed HTML if you are unable to load the HTML into the DOM.

HTML Tidy can be found here http://www.htacg.org/tidy-html5/



来源:https://stackoverflow.com/questions/12328322/php-domdocumentloadhtml-domdocument-loadhtml-htmlparseentityref-no-name

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!