How can I replace strings NOT within a link tag?

我在风中等你 2020-12-06 20:43

I am working on this PHP function. The idea is to wrap certain words occurring in a string in certain tags (both the words and the tags are given in an array). It works OK, but when

3 Answers
  • 2020-12-06 21:03

    Definitely use a DOM parser to isolate the qualifying text nodes before attempting to replace with a regex pattern that respects word boundaries, case-insensitivity, and Unicode characters. If you plan to target words that themselves contain Unicode characters, you will need to switch some of the string functions to their mb_ counterparts.

    After leveraging the following insights, I tailored a solution for your scenario.

    • https://stackoverflow.com/a/64077957/2943403
    • https://stackoverflow.com/a/20675396/2943403

    Code:

    $html = <<<HTML
    foo <a href='http://test.com'>fóo</a> lórem
    bár ipsum bar food foo bark. <a>bar</a> not á test
    HTML;
    
    $lookup = [
        'foo' => 'h3',
        'bar' => 'h2'
    ];
    
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    
    $xpath = new DOMXPath($dom);
    
    $regexNeedles = [];
    foreach ($lookup as $word => $tagName) {
        $regexNeedles[] = preg_quote($word, '~');
    }
    $pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~iu' ;
    
    foreach($xpath->query('//*[not(self::a)]/text()') as $textNode) {
        $newNodes = [];
        $hasReplacement = false;
        foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
            $fragmentLower = strtolower($fragment);
            if (isset($lookup[$fragmentLower])) {
                $hasReplacement = true;
                $a = $dom->createElement($lookup[$fragmentLower]);
                $a->nodeValue = $fragment;
                $newNodes[] = $a;
            } else {
                $newNodes[] = $dom->createTextNode($fragment);
            }
        }
        if ($hasReplacement) {
            $newFragment = $dom->createDocumentFragment();
            foreach ($newNodes as $newNode) {
                $newFragment->appendChild($newNode);
            }
            $textNode->parentNode->replaceChild($newFragment, $textNode);
        }
    }
    echo substr(trim(utf8_decode($dom->saveHTML($dom->documentElement))), 3, -4);
    

    Output:

    <h3>foo</h3> <a href="http://test.com">fóo</a> lórem
    bár ipsum <h2>bar</h2> food <h3>foo</h3> bark. <a>bar</a> not á test
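
    The remark about mb_ functions can be sketched as follows. This is a minimal illustration, not part of the original answer: the non-ASCII key 'fóo' and the helper name tagFor() are hypothetical, and it assumes UTF-8 input and that the mbstring extension is loaded. The point is that the lowercased fragment only matches a lookup key containing accented letters if the lowercasing is multibyte-aware.

```php
<?php
// Hypothetical lookup with a non-ASCII key (not from the original answer).
$lookup = ['fóo' => 'h3', 'bar' => 'h2'];

/**
 * Return the tag name for a matched fragment, or null if the fragment
 * is not a lookup key. Assumes UTF-8 input and the mbstring extension.
 */
function tagFor(string $fragment, array $lookup): ?string
{
    // mb_strtolower() lowercases accented letters as well as ASCII ones.
    $key = mb_strtolower($fragment, 'UTF-8');
    return $lookup[$key] ?? null;
}

// Same pattern construction as in the answer: quoted needles, word
// boundaries, case-insensitive and Unicode-aware matching.
$needles = array_map(fn($w) => preg_quote($w, '~'), array_keys($lookup));
$pattern = '~\b(' . implode('|', $needles) . ')\b~iu';

$parts = preg_split($pattern, 'a FÓO b Bar', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
foreach ($parts as $part) {
    $tag = tagFor($part, $lookup);
    echo $tag === null ? $part : "<$tag>$part</$tag>";
}
```

    With plain ASCII keys such as 'foo' and 'bar', strtolower() as used in the answer is sufficient.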
    
  • 2020-12-06 21:08

    Regarding the answer you pointed to: it is written for JS, but the approach is basically the same in PHP. You just have to write the pattern as a string.

    // PCRE in PHP has no "g" modifier; preg_replace() already replaces every match.
    $regexp = '/(<pre>(?:[^<](?!\/pre))*<\/pre>)|(:-\))/i';
    

    Also note that, unless the pattern uses the i modifier, you may need another preg_replace call to replace the word 'empresarios' when it is capitalized (Empresarios) or written in mixed case (EmPreSAriOS).

    Also take care with your HTML. <h2> elements are block elements, so the text may be interpreted this way:

    string where the word empresarios should be replaced;

    And replaced

    string where the word

    empresarios

    should be replaced;

    Maybe what you need is an inline element such as a <big> tag.
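
    The "match the protected block first, then the target" idea behind the pattern above can be sketched for links as follows. This is only an illustration under stated assumptions: the function name wrapOutsideLinks() and the choice of <strong> as an inline wrapper are hypothetical, and a regex like this assumes simple, non-nested <a>...</a> markup. For arbitrary HTML a DOM parser is the safer route.

```php
<?php
/**
 * Wrap $word in <$tag> everywhere except inside <a>...</a> blocks.
 * Sketch only: assumes well-formed, non-nested links.
 */
function wrapOutsideLinks(string $html, string $word, string $tag): string
{
    // Alternative 1 consumes a whole link; alternative 2 is the target word.
    $pattern = '~(<a\b[^>]*>.*?</a>)|\b(' . preg_quote($word, '~') . ')\b~isu';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($tag) {
            // Group 2 is present only when the word matched outside a link.
            if (isset($m[2]) && $m[2] !== '') {
                return "<$tag>" . $m[2] . "</$tag>";
            }
            // Otherwise a whole <a>...</a> block matched: return it unchanged.
            return $m[0];
        },
        $html
    );
}

echo wrapOutsideLinks('Los Empresarios y <a href="#">empresarios</a>', 'empresarios', 'strong');
// prints: Los <strong>Empresarios</strong> y <a href="#">empresarios</a>
```

    Because of the i modifier, a single call covers "Empresarios" and "EmPreSAriOS" as well, and the matched text keeps its original capitalization.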

  • 2020-12-06 21:11

    Use the DOM and only modify text nodes:

    $s = "foo <a href='http://test.com'>foo</a> lorem bar ipsum foo. <a>bar</a> not a test";
    echo htmlentities($s) . '<hr>';
    
    $d = new DOMDocument;
    $d->loadHTML($s);
    
    $x = new DOMXPath($d);
    $t = $x->evaluate("//text()");
    
    $wrap = array(
        'foo' => 'h1',
        'bar' => 'h2'
    );
    
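    // Quote each word so regex metacharacters in a key cannot break the pattern.
    $preg_find = '/\b(' . implode('|', array_map(fn($w) => preg_quote($w, '/'), array_keys($wrap))) . ')\b/';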
    
    foreach($t as $textNode) {
        if( $textNode->parentNode->tagName == "a" ) {
            continue;
        }
    
        // -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1.
        $sections = preg_split( $preg_find, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    
        $parentNode = $textNode->parentNode;
    
        foreach($sections as $section) {  
            if( !isset($wrap[$section]) ) {
                $parentNode->insertBefore( $d->createTextNode($section), $textNode );
                continue;
            }
    
            $tagName = $wrap[$section];
            $parentNode->insertBefore( $d->createElement( $tagName, $section ), $textNode );
        }
    
        $parentNode->removeChild( $textNode );
    }
    
    echo htmlentities($d->saveHTML());
    

    Edited to replace DOMText with DOMText and DOMElement as necessary.
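
    Since loadHTML() is called without any flags here, saveHTML() returns a complete document, wrapper elements included. If only the processed fragment is wanted, one possible approach is to serialize the children of <body> individually. The helper name bodyInnerHtml() is hypothetical and this is a sketch, not part of the original answer.

```php
<?php
/**
 * Return the markup inside <body>, without the document wrappers.
 * Sketch only; returns an empty string if the document has no <body>.
 */
function bodyInnerHtml(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() with a node argument serializes just that subtree.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$d = new DOMDocument;
$d->loadHTML('foo <a href="http://test.com">foo</a> bar');
echo bodyInnerHtml($d);
```

    Any element the parser adds around bare text inside <body> is still part of this result.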
