PHP SimpleXML: How can I load an HTML file?

Submitted by 十年热恋 on 2019-12-22 11:16:07

Question


When I try to load an HTML file as XML using simplexml_load_string, I get many errors and warnings about the HTML and loading fails. Is there a way to properly load an HTML file using SimpleXML?

The HTML file may contain unneeded whitespace and possibly other errors that I would like SimpleXML to ignore.
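A minimal way to reproduce the failure, assuming a small hypothetical HTML string stands in for the real file: SimpleXML uses libxml's strict XML parser, so unclosed tags such as `<br>` and unquoted attribute values make `simplexml_load_string` return `false`. The snippet buffers the parser errors with `libxml_use_internal_errors()` so they can be printed instead of surfacing as PHP warnings.

```php
<?php
// Buffer libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical "real world" HTML: <br> and <img> are never closed,
// and the src attribute is unquoted, so this is not well-formed XML.
$html = '<html><body><p>Hello<br>world</p><img src=logo.png></body></html>';

$xml = simplexml_load_string($html);

var_dump($xml === false); // bool(true): the XML parser rejects the markup

// Print whatever the parser complained about.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();
```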


Answer 1:


I would suggest using PHP Simple HTML DOM. I've used it myself for everything from page scraping to manipulating HTML template files; it's very simple, quite powerful, and should suit your needs just fine.

Here are a few examples from its documentation that show the kind of things you can do:

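    // Load the library (it ships as a single file, assumed to sit next to this script)
    include_once 'simple_html_dom.php';

    // Create DOM from URL or file
    $html = file_get_html('http://www.google.com/');

    // Find all images
    foreach ($html->find('img') as $element) {
        echo $element->src . '<br>';
    }

    // Find all links
    foreach ($html->find('a') as $element) {
        echo $element->href . '<br>';
    }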
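For comparison, the same two queries (all image sources, all link targets) can be done without a third-party library using PHP's bundled DOM extension, which is a different tool from the one recommended here. This is a sketch assuming a hypothetical inline HTML string; for a file path or URL, `DOMDocument::loadHTMLFile()` would replace `loadHTML()`.

```php
<?php
// Hypothetical markup standing in for a downloaded page.
$html = '<html><body><a href="/one">One</a><img src="a.png"><a href="/two">Two</a></body></html>';

libxml_use_internal_errors(true); // buffer parser warnings about sloppy markup
$doc = new DOMDocument();
$doc->loadHTML($html);            // libxml's forgiving HTML parser
libxml_clear_errors();

// Find all images
$srcs = [];
foreach ($doc->getElementsByTagName('img') as $element) {
    $srcs[] = $element->getAttribute('src');
}

// Find all links
$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $element) {
    $hrefs[] = $element->getAttribute('href');
}

foreach (array_merge($srcs, $hrefs) as $value) {
    echo $value . '<br>';
}
```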



Answer 2:


Use DOMDocument::loadHTMLFile together with simplexml_import_dom to load non-well-formed HTML pages into SimpleXML.
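A minimal sketch of this approach, assuming the HTML is already in a string (the hypothetical `$html` below), so `DOMDocument::loadHTML` is used in place of `loadHTMLFile`: libxml's HTML parser repairs the markup, and `simplexml_import_dom` then exposes the repaired tree through the SimpleXML API, including `xpath()`.

```php
<?php
// Hypothetical input that is not well-formed XML (unclosed <li> and <br>).
$html = '<ul><li>First<li>Second<br></ul>';

libxml_use_internal_errors(true);  // collect parser complaints instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);             // the HTML parser closes the open tags for us
libxml_clear_errors();

$xml = simplexml_import_dom($dom); // same tree, SimpleXML interface

$items = [];
foreach ($xml->xpath('//li') as $li) {
    $items[] = trim((string) $li); // direct text content of each <li>
}

echo implode(PHP_EOL, $items), PHP_EOL;
```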




Answer 3:


Check this manual page; one of the options listed there (LIBXML_NOERROR, for example) might help. Keep in mind, though, that HTML is not necessarily valid XML, so parsing it as XML might not work.
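To illustrate the caveat, here is a small sketch assuming two hypothetical inputs, one XHTML-style and one ordinary HTML. `LIBXML_NOERROR` and `LIBXML_NOWARNING` stop libxml from reporting problems, but they do not make non-well-formed markup parse, so `simplexml_load_string` still returns `false` for the second string.

```php
<?php
// Well-formed XHTML-style markup: parses as XML.
$xhtml = '<html><body><p>Hello<br/>world</p></body></html>';
// Ordinary HTML: <br> is never closed, so it is not well-formed XML.
$html  = '<html><body><p>Hello<br>world</p></body></html>';

// The flags only silence error reporting; they do not repair the input.
$flags = LIBXML_NOERROR | LIBXML_NOWARNING;

$ok  = simplexml_load_string($xhtml, 'SimpleXMLElement', $flags);
$bad = simplexml_load_string($html, 'SimpleXMLElement', $flags);

var_dump($ok instanceof SimpleXMLElement); // bool(true)
var_dump($bad);                            // bool(false): the parse still fails
```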




Answer 4:


Here's some quick code to load an external HTML page, then parse it with SimpleXML.

    // Buffer libxml errors instead of emitting warnings for poorly formed HTML
    libxml_use_internal_errors(true);

    // Create the DOM document
    $html = new DOMDocument();

    // Load the external HTML file (loadHTMLFile returns false on failure)
    if (!$html->loadHTMLFile('http://blahwhatever.com/')) {
        exit('Could not load the page');
    }

    // Discard the buffered parser errors
    libxml_clear_errors();

    // Import the DOM document into SimpleXML
    $shtml = simplexml_import_dom($html);

    // Print the result
    echo "<pre>";
    print_r($shtml);
    echo "</pre>";
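`libxml_use_internal_errors(true)` buffers the parser errors rather than discarding them. A sketch of a slightly more careful variant, assuming a hypothetical inline HTML string instead of a URL: it lists any buffered errors, clears the buffer and restores the previous error-handling setting before using the SimpleXML object. Whether the non-standard `<foo>` tag produces a message depends on the libxml2 version.

```php
<?php
// Remember the caller's setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Hypothetical input containing a non-standard <foo> tag.
$dom->loadHTML('<html><body><p>ok</p><foo>unknown tag</foo></body></html>');

// The errors were buffered, not discarded: inspect them if needed.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                 // free the error buffer
libxml_use_internal_errors($previous); // restore the caller's setting

// The root element is <html>, so children are reached by tag name.
$shtml = simplexml_import_dom($dom);
echo $shtml->body->p, PHP_EOL;
```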


Source: https://stackoverflow.com/questions/3178350/php-simplexml-how-can-i-load-an-html-file
