How to parse actual HTML from a page using cURL?

无人共我 2020-12-15 12:52

I am "attempting" to scrape a web page that has the following structure within the page:

stuff here

3 answers
  •  清酒与你
    2020-12-15 13:49

    According to comments on the PHP manual on DOM, you should use the following inside your loop:

        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($sections->item($i), true));
        $innerHTML = trim($tmp_dom->saveHTML()); 
    

    This will set $innerHTML to the HTML markup of the node, including the node's own tag.
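The temporary-document snippet above serializes the imported node together with its own tag. If only the markup inside the node is wanted, one alternative (not the approach from the manual comments, just a minimal sketch) is to serialize each child node with `saveHTML()`, which accepts a node argument. The helper name `innerHtml` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: returns the markup of a node's children only,
// i.e. without the enclosing tag of the node itself.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Passing a node to saveHTML() serializes just that subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
// Sample markup, invented for this example.
$dom->loadHTML('<html><body><p>See <a href="/one">first</a> and <a href="/two">second</a>.</p></body></html>');

$p = $dom->getElementsByTagName('p')->item(0);
echo innerHtml($p), "\n";
```

Because the helper works on the node's own document, no temporary DOMDocument or importNode() call is needed.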

    But I think what you really want is to get the 'a' nodes under the 'p' node, so do this:

        $sections = $newDom->getElementsByTagName('p');
        $nodeNo = $sections->length;
        for ($i = 0; $i < $nodeNo; $i++) {
            $sec = $sections->item($i);
            // Only the <a> elements nested inside this <p>
            $links = $sec->getElementsByTagName('a');
            $linkNo = $links->length;
            for ($j = 0; $j < $linkNo; $j++) {
                $printString = $links->item($j)->nodeValue;
                echo $printString . "\n";
            }
        }

    This will just print the text content of each link.
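For completeness, here is a rough end-to-end sketch that ties the cURL fetch to the nested loop above. The function names `fetchHtml` and `linkTextsInParagraphs`, the sample markup, and the `https://example.com/` URL are all assumptions for illustration, not part of the original question. The demo runs on an inline string so it works offline.

```php
<?php
// Hypothetical helper: download a page and return its body as a string.
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $html;
}

// Hypothetical helper: collect the text of every <a> nested inside a <p>.
function linkTextsInParagraphs(string $html): array
{
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($dom->getElementsByTagName('p') as $p) {
        foreach ($p->getElementsByTagName('a') as $a) {
            $texts[] = trim($a->nodeValue);
        }
    }
    return $texts;
}

// Offline demo with made-up markup.
// For a live page, pass fetchHtml('https://example.com/') instead of $sample.
$sample = '<html><body><p><a href="/a">Alpha</a> and <a href="/b">Beta</a></p>'
        . '<div><a href="/c">Gamma</a></div></body></html>';
print_r(linkTextsInParagraphs($sample));
```

Keeping the download and the parsing in separate functions makes the parsing step easy to try against saved HTML before pointing it at a live site.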
