DOMXPath - Get href attribute and text value of an a element

Submitted by 三世轮回 on 2019-12-17 07:31:15


So I have an HTML string like this:

<td class="name">
   <a href="/blah/somename23123">Some Name</a>
</td>
<td class="name">
   <a href="/blah/somename28787">Some Name2</a>
</td>

Using XPath I'm able to get the value of the href attribute with this XPath query:

 $domXpath = new \DOMXPath($this->domPage);
 $hrefs = $domXpath->query("//td[@class='name']/a/@href");
 foreach($hrefs as $href) {...}

And it's even easier to get the text value, like this:

 // XPath automatically strips any HTML tags, so we are
 // left with the clean text value of the a element
 $domXpath = new \DOMXPath($this->domPage);
 $names = $domXpath->query("//td[@class='name']/a");
 foreach($names as $name) {...}

Now I'm curious to know: how can I combine those two queries to get both values with only one query (if something like that is even possible)?




Fetch the a elements themselves with

 //td[@class='name']/a

and then pluck the text with nodeValue and the attribute with getAttribute('href').

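Apart from that, you can combine XPath queries with the union operator |, so you can join the two queries from the question, for example

 //td[@class='name']/a/@href | //td[@class='name']/a

as well.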
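
A minimal sketch of the union approach, assuming markup shaped like the question's. The $html string and the variable names are illustrative, not from the original post. A union returns element and attribute nodes mixed in one DOMNodeList, so the loop has to tell them apart by node class:

```php
<?php
// Illustrative sample markup, wrapped in a table so the parser accepts the cells.
$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// One query, two node types: <a> elements and their href attributes.
$nodes = $xpath->query("//td[@class='name']/a | //td[@class='name']/a/@href");

$texts = [];
$hrefs = [];
foreach ($nodes as $node) {
    if ($node instanceof DOMAttr) {
        $hrefs[] = $node->value;        // attribute node: the href value
    } elseif ($node instanceof DOMElement) {
        $texts[] = $node->textContent;  // element node: the link text
    }
}
```

Pairing $texts[$i] with $hrefs[$i] only works if every a element actually has an href, which is one reason fetching the a elements and calling getAttribute('href') tends to be simpler.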


To reduce the code to a single loop, try:

$anchors = $domXpath->query("//td[@class='name']/a");
foreach($anchors as $a)
    print $a->nodeValue." - ".$a->getAttribute("href")."<br/>";

As per above :) Too slow ..


Simplest way, evaluate is for this task!

The simplest way to obtain a single value is the evaluate() method:

$xp = new DOMXPath($dom);
$v = $xp->evaluate("string(/etc[1]/@stringValue)");

Note: it is important to limit the XPath result to one item (the first a in this case) and to cast the value with string(), round(), etc.
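
To make the note concrete, here is a small sketch of what evaluate() hands back for different XPath expressions. The sample markup and variable names are assumptions for illustration; the point is that the XPath function used decides the PHP type of the result:

```php
<?php
// Illustrative sample markup modelled on the question.
$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// string(): limited to the first matching <a>, returned as a PHP string.
$firstHref = $xp->evaluate("string((//td[@class='name']/a)[1]/@href)");

// count(): XPath numbers come back as PHP floats.
$count = $xp->evaluate("count(//td[@class='name']/a)");

// boolean(): true if at least one node matches.
$hasLinks = $xp->evaluate("boolean(//td[@class='name']/a[@href])");

// No cast at all: a node-set expression still yields a DOMNodeList.
$list = $xp->evaluate("//td[@class='name']/a");
```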

So, in a set of multiple items, using your foreach code,

 $names = $domXpath->query("//td[@class='name']");
 foreach($names as $contextNode) {
    // Both expressions are evaluated relative to the current td
    $text = $domXpath->evaluate("string(./a[1])",$contextNode);
    $href = $domXpath->evaluate("string(./a[1]/@href)",$contextNode);
 }

PS: this example only illustrates evaluate(). When the information is already available on the node, use whatever performs best: methods such as getAttribute() and saveXML(), or properties such as $nodeValue and $textContent, supplied by the DOM classes (DOMNode, DOMElement, DOMDocument).
See @Gordon's answer for this particular problem.
An XPath subquery (evaluated at a context node) is good for complex cases, or to simplify your code by avoiding hasChildNodes() checks and loops over $childNodes, which bring no significant gain in performance.
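
As a rough side-by-side of the two styles mentioned above, the sketch below looks up the first a inside each td twice: once by walking $childNodes by hand, once with relative XPath expressions. The sample markup and the $viaDom / $viaXPath names are illustrative assumptions:

```php
<?php
// Illustrative sample markup modelled on the question.
$html = '<table><tr>'
      . '<td class="name"><a href="/blah/somename23123">Some Name</a></td>'
      . '<td class="name"><a href="/blah/somename28787">Some Name2</a></td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$viaDom = [];
$viaXPath = [];
foreach ($xp->query("//td[@class='name']") as $td) {
    // Manual traversal: scan the child nodes for the first <a> element.
    foreach ($td->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'a') {
            $viaDom[] = [$child->textContent, $child->getAttribute('href')];
            break;
        }
    }

    // Relative XPath: the same lookup as two context-bound expressions.
    $viaXPath[] = [
        $xp->evaluate("string(./a[1])", $td),
        $xp->evaluate("string(./a[1]/@href)", $td),
    ];
}
```

Both arrays end up holding the same text/href pairs; the XPath version simply trades the inner loop and type checks for two short expressions.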

