DOMDocument Parse html

倖福魔咒の submitted on 2019-12-13 08:02:45

Question


I have an HTML page that contains a number of <tr><td> elements like this:

<tr>
<td class="notextElementLabel width100">address:</td>
<td style="width: 100%" colspan="1" class="formFieldelement"><b>12284,CA</b></td>
</tr>

Let's say the <tr> above is in the 4th position, meaning there are 3 more <tr> elements before it.

Now I want to get the value of the address, so I am doing:

$doc = new DOMDocument();
@$doc->loadHTML($this->siteHtmlData);
$tdElements = $doc->getElementsByTagName("td");
$i = 0;
foreach ($tdElements as $node) {
    if (trim($node->nodeValue) == 'address:') {
        echo "\n\ngot it\n\n";
    } else {
        echo "\n\n---no ---\n\n";
    }
}

How can I get the value "12284,CA"? Please guide.

Thanks


Answer 1:


In your case, the logic behind your query is simple enough that it can be expressed entirely in XPath syntax:

//td[text()="address:"]/following-sibling::td/b/text()

This finds any <td> node whose text equals "address:", takes the <td> siblings that follow it, steps into the <b> inside and returns the text found there.

That means you can do:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
echo $xpath->evaluate('string(//td[text()="address:"]/following-sibling::td/b)');

It will immediately output the result you are looking for.
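
The query above requires the label cell's text to be exactly "address:". Since the code in the question calls trim(), the real markup may have whitespace around the label. Below is a minimal sketch of a more tolerant variant, assuming that normalize-space() is used to trim the label and that following-sibling::td[1] is used to take only the nearest following cell. The helper name findValueByLabel and the sample markup are hypothetical, made up for illustration, and the label is assumed not to contain a double quote.

```php
<?php
// Hypothetical helper: returns the text of the cell that follows a label cell.
function findValueByLabel(string $html, string $label): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // normalize-space() trims the label text; td[1] keeps only the nearest following cell.
    // Assumption: $label contains no double quote, which would break the query string.
    $query = sprintf('string(//td[normalize-space() = "%s"]/following-sibling::td[1])', $label);

    return trim($xpath->evaluate($query));
}

// Sample markup (made up for illustration), with extra spaces around the label.
$html = '<table><tr><td>name:</td><td>Foo</td></tr>'
      . '<tr><td class="notextElementLabel width100"> address: </td>'
      . '<td class="formFieldelement"><b>12284,CA</b></td></tr></table>';

echo findValueByLabel($html, 'address:'); // prints 12284,CA
```

When no cell matches the label, string() is applied to an empty node set, so the helper returns an empty string.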




Answer 2:


You have to get the <tr> elements, then walk through the children of each one, similar to:

$trElements = $doc->getElementsByTagName("tr");
foreach ($trElements as $node) {
    $children = $node->childNodes;
    foreach ($children as $child) {
        echo $child->textContent; // or $child->nodeValue
    }
}

For the row shown in the question, this outputs (ignoring the whitespace between the cells): address: 12284,CA

Now, if there are more <tr> elements that are not the address, you will need to parse the $children list of nodes to make sure you find "address:", and then once you do, you know the value of next child is the value you're looking for.
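
That search can be sketched as follows. This is only one possible way to do it, under the assumption that the label and its value are direct <td> children of the same <tr>. The helper name findAddressInRows and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: scans each row for a label cell and returns the next cell's text.
function findAddressInRows(DOMDocument $doc, string $label = 'address:'): ?string
{
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $foundLabel = false;
        foreach ($row->childNodes as $child) {
            // childNodes also contains whitespace text nodes; keep only <td> elements.
            if (!($child instanceof DOMElement) || $child->nodeName !== 'td') {
                continue;
            }
            if ($foundLabel) {
                // This is the first <td> after the label cell.
                return trim($child->textContent);
            }
            if (trim($child->textContent) === $label) {
                $foundLabel = true;
            }
        }
    }
    return null; // label not found, or no cell follows it
}

// Sample markup (made up for illustration).
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td>name:</td><td>Foo</td></tr>'
             . '<tr><td>address:</td><td><b>12284,CA</b></td></tr></table>');

echo findAddressInRows($doc); // prints 12284,CA
```

Filtering on DOMElement matters here because the whitespace between cells shows up in childNodes as text nodes, so "the next child" after the label is not necessarily the value cell.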




Answer 3:


I worked out the answer myself; it is similar to nickb's answer:

$tdElements = $doc->getElementsByTagName("td");
$tdCnt = $tdElements->length;

for ($idx = 0; $idx < $tdCnt; $idx++) {
    if (trim($tdElements->item($idx)->nodeValue) == 'address:') {
        // The value is in the <td> that follows the label, if there is one.
        $next = $tdElements->item($idx + 1);
        if ($next !== null) {
            echo $next->nodeValue;
        }
    }
}

Hope this helps.



Source: https://stackoverflow.com/questions/11138158/domdocument-parse-html
