How to parse actual HTML from a page using cURL?

无人共我 2020-12-15 12:52

I am "attempting" to scrape a web page that contains the following structure:

stuff here

3 Answers
  • 2020-12-15 13:32

    You might want to take a look at phpQuery for server-side HTML parsing; its documentation includes a basic example.

  • 2020-12-15 13:43

    You can pass a node to DOMDocument::saveXML(). Try this:

    $printString = $newDom->saveXML($sections->item($i));
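    For context, a minimal self-contained sketch of how this fits together. In practice `$html` would be the page body returned by cURL; the markup below is an illustrative stand-in:

    ```php
    <?php
    // Stand-in for the HTML fetched with cURL.
    $html = '<div><p>Intro <a href="/a">first</a> and <a href="/b">second</a></p></div>';

    $newDom = new DOMDocument();
    // @ suppresses warnings that real-world markup often triggers.
    @$newDom->loadHTML($html);

    $sections = $newDom->getElementsByTagName('p');
    for ($i = 0; $i < $sections->length; $i++) {
        // Passing a node to saveXML() serializes just that node.
        echo $newDom->saveXML($sections->item($i)), "\n";
    }
    ```

    This prints each `<p>` element and its children as markup, rather than the whole document.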

  • 2020-12-15 13:49

    According to comments on the PHP manual's DOM documentation, you should use the following inside your loop:

        // Copy the node into a throwaway document and serialize it
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($sections->item($i), true));
        $innerHTML = trim($tmp_dom->saveHTML());

    This will set $innerHTML to the markup of the node itself, including its own tags (strictly its outer HTML rather than its inner HTML).
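    Spelled out as a self-contained sketch (the markup and the `x` id here are illustrative, not from the question):

    ```php
    <?php
    // Build a document from sample markup; loadHTML registers id attributes.
    $doc = new DOMDocument();
    @$doc->loadHTML('<p id="x">Hello <a href="/y">there</a></p>');
    $node = $doc->getElementById('x');

    // Import the node into a fresh document and serialize that document.
    $tmp_dom = new DOMDocument();
    $tmp_dom->appendChild($tmp_dom->importNode($node, true));
    $innerHTML = trim($tmp_dom->saveHTML());
    // $innerHTML now holds the <p> element's markup, tags included.
    ```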

    But I think what you really want is to get the 'a' nodes under each 'p' node, so do this:

    // Collect every <p>, then walk the <a> elements inside each one
    $sections = $newDom->getElementsByTagName('p');
    $nodeNo = $sections->length;
    for ($i = 0; $i < $nodeNo; $i++) {
        $sec = $sections->item($i);
        $links = $sec->getElementsByTagName('a');
        $linkNo = $links->length;
        for ($j = 0; $j < $linkNo; $j++) {
            // nodeValue is the link's text content, not its href
            $printString = $links->item($j)->nodeValue;
            echo $printString . "<br>";
        }
    }

    This will print just the text content of each link, one per line.
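    Putting the whole pipeline together as a runnable sketch. In a real run `$html` would come from cURL (shown in comments); a hardcoded snippet stands in here so the example is self-contained, and the URL and markup are illustrative:

    ```php
    <?php
    // In practice $html would be fetched with cURL, e.g.:
    //   $ch = curl_init('https://example.com/');
    //   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    //   $html = curl_exec($ch);
    //   curl_close($ch);
    $html = '<p>See <a href="/one">One</a> and <a href="/two">Two</a></p>';

    $newDom = new DOMDocument();
    @$newDom->loadHTML($html); // @ silences warnings from sloppy markup

    $linkTexts = [];
    foreach ($newDom->getElementsByTagName('p') as $sec) {
        foreach ($sec->getElementsByTagName('a') as $link) {
            $linkTexts[] = $link->nodeValue; // text content of each link
        }
    }
    // $linkTexts is ['One', 'Two']
    ```

    DOMNodeList implements Traversable, so foreach reads more naturally here than the index-based loops above; both walk the same nodes.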
