Regex to match words or phrases in string but NOT match if part of a URL or inside tags. (php)

终归单人心 2020-12-06 23:20

I am aware that regex is not ideal for use with HTML strings and I have looked at the PHP Simple HTML DOM Parser but still believe this is the way to go. All the HTML tags w

7 answers
  •  佛祖请我去吃肉
    2020-12-06 23:46

    Don't do this. You cannot reliably do this with Regex, no matter how consistent your HTML is.

    Something like this should work, however:

    $dom = new DOMDocument();
    $dom->load('test.xml');
    $x = new DOMXPath($dom);
    
    $nodes = $x->query("//text()[contains(., 'Amazon')][not(ancestor::a)]");
    
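    foreach ($nodes as $node) {
        // A single text node may contain the word more than once.
        while (false !== strpos($node->nodeValue, 'Amazon')) {
            // Split so that $word starts exactly at the match...
            $word = $node->splitText(strpos($node->nodeValue, 'Amazon'));
            // ...and ends right after it (6 === strlen('Amazon')).
            $after = $word->splitText(6);
    
            $link = $dom->createElement('a');
            $link->setAttribute('href', 'http://www.amazon.com');
    
            // Put the link where the word was, then move the word inside it.
            $word->parentNode->replaceChild($link, $word);
            $link->appendChild($word);
    
            // Keep searching in the remainder of the text.
            $node = $after;
        }
    }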
    
    $html = $dom->saveHTML();
    echo $html;
    

    It's verbose, but it will actually work.
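
The same approach can be wrapped in a reusable function for an HTML string rather than a file. What follows is only a sketch under a few assumptions: the name `linkifyWord()`, its parameters, and the wrapper `<div>` are hypothetical and not part of the answer above; the input is assumed to be an ASCII HTML fragment, so that byte offsets from `strpos()` line up with the character offsets `splitText()` expects; and the word is matched in PHP instead of being interpolated into the XPath expression, which avoids quoting issues.

```php
<?php
// Hypothetical helper: link every occurrence of $word that is not already
// inside an <a> element. Assumes an ASCII HTML fragment as input.
function linkifyWord(string $html, string $word, string $href): string
{
    if ($word === '') {
        return $html; // nothing to search for
    }

    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $dom = new DOMDocument();
    // Wrap the fragment in one <div> so the parser sees a single root and
    // does not add <html>/<body> or a doctype around it.
    $dom->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // All text nodes that have no <a> ancestor; the result is a snapshot,
    // so modifying the tree inside the loop is safe.
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        while (($pos = strpos((string) $node->nodeValue, $word)) !== false) {
            $match = $node->splitText($pos);             // starts at the word
            $after = $match->splitText(strlen($word));   // text after the word

            $link = $dom->createElement('a');
            $link->setAttribute('href', $href);
            $match->parentNode->replaceChild($link, $match);
            $link->appendChild($match);

            $node = $after; // continue in the remaining text
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Usage: only the first "Amazon" should gain a link; the second one is
// already inside an <a> and is left alone.
echo linkifyWord(
    '<p>Buy at Amazon or <a href="http://example.com/">Amazon UK</a>.</p>',
    'Amazon',
    'http://www.amazon.com'
);
```

For non-ASCII content you would probably need to tell `loadHTML()` the input encoding and switch to `mb_strpos()`/`mb_strlen()`; that is left out here to keep the sketch short.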
