I am aware that regex is not ideal for use with HTML strings and I have looked at the PHP Simple HTML DOM Parser but still believe this is the way to go. All the HTML tags w
Don't do this. You cannot reliably do this with Regex, no matter how consistent your HTML is.
A DOM-based approach like this should work, however:
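<?php
$dom = new DOMDocument;
$dom->load('test.xml');
$x = new DOMXPath($dom);
// Text nodes that contain "Amazon" and are not already inside a link
$nodes = $x->query("//text()[contains(., 'Amazon')][not(ancestor::a)]");
foreach ($nodes as $node) {
    // strpos() is byte-based; use mb_strpos() if the text may contain multibyte characters
    while (false !== strpos($node->nodeValue, 'Amazon')) {
        // Split the text node so that $word holds exactly "Amazon" (6 characters)
        $word = $node->splitText(strpos($node->nodeValue, 'Amazon'));
        $after = $word->splitText(6);
        // Wrap the isolated word in a new <a> element
        $link = $dom->createElement('a');
        $link->setAttribute('href', 'http://www.amazon.com');
        $word->parentNode->replaceChild($link, $word);
        $link->appendChild($word);
        // Keep scanning the remainder of the original text node
        $node = $after;
    }
}
$html = $dom->saveHTML();
echo $html;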
It's verbose, but it will actually work.
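If you need this for more than one word, the same idea can be wrapped in a function. The following is only a sketch under a few assumptions: the name `linkifyWord` and its parameters are hypothetical, the input is an HTML fragment in a string rather than a file, and the text is plain ASCII (`strpos()` and `strlen()` count bytes, while `splitText()` counts characters). The word is matched in PHP rather than in the XPath expression so that a quote in `$word` cannot break the query.

```php
<?php
// Hypothetical helper, not part of the answer above.
// Assumes ASCII input, because strpos()/strlen() count bytes.
function linkifyWord(string $html, string $word, string $url): string
{
    if ($word === '') {
        return $html; // nothing to search for
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // Wrap the fragment in a <div> so its contents can be found again afterwards.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Every text node that is not inside an existing link.
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        while (false !== ($pos = strpos($node->nodeValue, $word))) {
            $match = $node->splitText($pos);            // $match starts at the word
            $after = $match->splitText(strlen($word));  // $after is the remainder
            $link = $dom->createElement('a');
            $link->setAttribute('href', $url);
            $match->parentNode->replaceChild($link, $match);
            $link->appendChild($match);
            $node = $after;                             // continue after the new link
        }
    }

    // Serialize only the children of the wrapper <div>.
    $root = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo linkifyWord(
    '<p>Buy it on Amazon or via <a href="#">Amazon Prime</a>.</p>',
    'Amazon',
    'http://www.amazon.com'
);
```

In this usage example, the first "Amazon" should end up wrapped in a new link, while the one inside the existing `<a>` element is left alone because of the `not(ancestor::a)` filter.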