Regex to match words or phrases in a string but NOT match if part of a URL or inside tags. (php)

终归单人心 2020-12-06 23:20

I am aware that regex is not ideal for use with HTML strings and I have looked at the PHP Simple HTML DOM Parser but still believe this is the way to go. All the HTML tags w

7 Answers

佛祖请我去吃肉 (OP)

2020-12-06 23:46

Don't do this. You cannot reliably do this with Regex, no matter how consistent your HTML is.

Something like this should work, however:

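<?php
$dom = new DOMDocument();
$dom->load('test.xml');
$x = new DOMXPath($dom);

$needle = 'Amazon';

// Text nodes containing the word that are not already inside a link.
// Attribute values (href, src, alt, ...) are not text() nodes, so URLs inside tags are never matched.
$nodes = $x->query("//text()[contains(., '$needle')][not(ancestor::a)]");

foreach ($nodes as $node) {
    // DOMText::splitText() takes a character offset, so use mb_strpos() instead of the byte-based strpos().
    while (false !== ($pos = mb_strpos($node->nodeValue, $needle, 0, 'UTF-8'))) {
        // $node keeps the text before the match; $word starts at the match.
        $word = $node->splitText($pos);
        // $after gets the text after the match; $word now holds exactly the needle.
        $after = $word->splitText(mb_strlen($needle, 'UTF-8'));

        $link = $dom->createElement('a');
        $link->setAttribute('href', 'http://www.amazon.com');

        // Put the <a> where the matched text was, then move that text inside it.
        $word->parentNode->replaceChild($link, $word);
        $link->appendChild($word);

        // Keep scanning the remainder for further occurrences.
        $node = $after;
    }
}

$html = $dom->saveHTML();
echo $html;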

It's verbose, but it will actually work.
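Here is a sketch of the same technique wrapped in a reusable function, built on a few assumptions. It works on an HTML string rather than a file. It wraps the input in a temporary `<div>` so that `loadHTML()` with `LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD` does not add `<html>`, `<body>`, or a doctype. It also skips text inside `<script>` and `<style>`. The function name `link_word` is hypothetical, and the `mbstring` extension is assumed to be available. Because the fragment carries no charset declaration, `loadHTML()` reads it as ISO-8859-1, so non-ASCII input would need extra encoding handling that this sketch leaves out.

```php
<?php
// Hypothetical helper: wraps each occurrence of $word in <a href="$href">,
// except inside existing links and <script>/<style> elements.
// Attribute values (href, src, alt, ...) are never touched because they are not text nodes.
function link_word(string $html, string $word, string $href): string
{
    if ($word === '') {
        return $html; // an empty needle would match everywhere and never advance
    }

    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // silence warnings about unknown or sloppy tags
    // The wrapper <div> gives the fragment a single root element.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($dom);
    $query = '//text()[not(ancestor::a or ancestor::script or ancestor::style)]';
    $len = mb_strlen($word, 'UTF-8');

    // Snapshot the matching text nodes before the loop starts modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        // splitText() takes a character offset, hence the mb_* functions.
        while (($pos = mb_strpos($node->nodeValue, $word, 0, 'UTF-8')) !== false) {
            $match = $node->splitText($pos);   // $node keeps the text before the match
            $after = $match->splitText($len);  // $match is now exactly $word

            $link = $dom->createElement('a');
            $link->setAttribute('href', $href);
            $match->parentNode->replaceChild($link, $match);
            $link->appendChild($match);

            $node = $after;                    // keep scanning what follows the match
        }
    }

    // Serialize the children of the wrapper <div>, i.e. the original fragment.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = '<p>Amazon and Amazon, but not <a href="/x">Amazon</a> or <img alt="Amazon"></p>';
echo link_word($html, 'Amazon', 'http://www.amazon.com'), "\n";
```

This version selects the text nodes with XPath and then searches them in PHP instead of using XPath `contains()`. That avoids having to escape quote characters in the search word inside the XPath expression. Both occurrences in the first text node get linked. The existing `<a href="/x">` and the `alt` attribute are left alone.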
