PHP: parsing invalid HTML

情深已故 2020-12-06 18:16

I'm trying to parse some HTML that is not on my server:

    $dom = new DOMDocument();
    $dom->loadHTMLFile("http://www.some-site.org/page.aspx");


        
3 Answers
  • 2020-12-06 18:20

    Have a look at: libxml_use_internal_errors()

    http://php.net/libxml_use_internal_errors
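
    As a minimal sketch of how that function is typically used: the markup string below is a made-up stand-in for the remote page (an assumption, not the asker's actual HTML), and the variable names are illustrative. The idea is to let libxml collect parse errors internally rather than raise PHP warnings, then inspect them after loading.

```php
<?php
// Hypothetical sample of malformed markup; a real page would be fetched instead.
$html = '<html><body><div><p>Unclosed paragraph<span>text</div></bogus></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Retrieve whatever the parser complained about, then clear the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line, and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// The document is still usable even though the input was not valid.
$paragraphs = $dom->getElementsByTagName('p');
echo $paragraphs->length, " <p> element(s) parsed\n";
```

    Which errors get reported depends on the installed libxml2 version, so the loop may print different messages (or none) on different systems.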

  • 2020-12-06 18:27

    Reading the docs, I see a $dom->strictErrorChecking that defaults to TRUE. What happens if you set $dom->strictErrorChecking = false?

  • 2020-12-06 18:38

    You should run HTML Tidy on it to clean it up before parsing it.

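    // Fetch the raw markup (file_get_contents() returns false on failure).
    $html = file_get_contents('http://www.some-site.org/page.aspx');
    $config = array(
      'clean' => 'yes',
      'output-html' => 'yes',
    );
    // Parse with Tidy, then repair the markup before handing it to DOM.
    $tidy = tidy_parse_string($html, $config, 'utf8');
    $tidy->cleanRepair();
    $dom = new DOMDocument;
    // Cast explicitly: loadHTML() expects a string, not a tidy object.
    $dom->loadHTML((string) $tidy);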
    

    See the list of Tidy configuration options.
