Extracting node values using XPath

不知归路 2021-01-24 18:42

There is a section of amazon.com from which I want to extract the data (node value only, not the link) for each item.

The value I'm looking for is inside and <

3 answers
  •  渐次进展
    2021-01-24 19:12

    If you need to grab the category names:

    // Suppress invalid markup warnings
    libxml_use_internal_errors(true);
    
    // Create SimpleXML object
    $doc = new DOMDocument();
    $doc->strictErrorChecking = false;
    $doc->loadHTML($html); // $html is the page source as a string, e.g. fetched with cURL
    $xml = simplexml_import_dom($doc);
    
    // Find the category nodes
    $categories = $xml->xpath("//span[@class='refinementLink']");
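
The snippet above stops once the nodes are selected. As a minimal sketch of the remaining step, the text of each match can be read by casting the `SimpleXMLElement` to a string. The `$html` sample here is hypothetical markup invented for illustration (the real page source is not shown in the question), and `$names` is just an illustrative variable name.

```php
<?php
// Hypothetical sample markup; the real Amazon HTML will differ.
$html = '<ul>'
      . '<li><a href="/a"><span class="refinementLink">Arts &amp; Photography</span></a></li>'
      . '<li><a href="/b"><span class="refinementLink">Crafts, Hobbies &amp; Home</span></a></li>'
      . '</ul>';

// Suppress invalid markup warnings
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

$names = [];
foreach ($xml->xpath("//span[@class='refinementLink']") as $span) {
    // Casting a SimpleXMLElement to string yields its own text, without tags or attributes.
    $names[] = trim((string) $span);
}

print_r($names);
```

Because only the text of the `span` is read, the `href` of the surrounding link never enters the result.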
    


    EDIT: Using DOMDocument

    $doc = new DOMDocument();
    $doc->strictErrorChecking = false;
    $doc->loadHTML($html);
    
    $xpath = new DOMXPath($doc);
    
    // Select the parent node
    $categories = $xpath->query("//span[@class='refinementLink']/..");
    
    foreach ($categories as $category) {
        echo PHP_EOL;
        // Child 1 holds the category name, child 2 the item count
        echo $category->childNodes->item(1)->firstChild->nodeValue;
        echo $category->childNodes->item(2)->firstChild->nodeValue;
        echo PHP_EOL; // Crafts, Hobbies & Home (19)
    }
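
Indexing into `childNodes` depends on the exact whitespace and nesting of the page, so it can break when the markup shifts. One possible alternative, shown here only as a sketch, is to read `textContent` on the parent node, which concatenates the text of all descendants and leaves out tags and attributes such as the link's `href`. The sample `$html` and the `narrowValue` class are assumptions made up for this example, not taken from the real page.

```php
<?php
// Hypothetical sample markup; class "narrowValue" is an assumption for illustration.
$html = '<ul><li><a href="/b">'
      . '<span class="refinementLink">Crafts, Hobbies &amp; Home</span>'
      . '<span class="narrowValue"> (19)</span>'
      . '</a></li></ul>';

// Suppress invalid markup warnings
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$labels = [];
// Select the parent of each category span, as in the answer above
foreach ($xpath->query("//span[@class='refinementLink']/..") as $parent) {
    // textContent returns the combined text of the node and its descendants
    $labels[] = trim($parent->textContent);
}

print_r($labels);
```

With the sample markup this collects the name and the count as one string per category, regardless of how many whitespace nodes sit between the spans.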
