How can I replace strings NOT within a link tag?

我在风中等你 2020-12-06 20:43

I am working on this PHP function. The idea is to wrap certain words occurring in a string in certain tags (both the words and the tags are given in an array). It works OK, but when

3 answers
  •  旧时难觅i
    2020-12-06 21:03

    Definitely use a DOM parser to isolate the qualifying text nodes before attempting any replacement, and then use a regex pattern that respects word boundaries, case-insensitivity, and Unicode characters. If you plan to target words that themselves contain non-ASCII characters, you will also need the mb_ variants of some string functions (for example, mb_strtolower() instead of strtolower()).
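To make the regex side of that advice concrete, here is a minimal sketch of how such a pattern behaves on its own, before any DOM handling is involved. The lookup table and the sample string are made up for illustration, the `bár` key is only there to show a non-ASCII word, and the arrow function assumes PHP 7.4 or later with the mbstring extension loaded.

```php
<?php
// Hypothetical lookup table; 'bár' is included to show a non-ASCII key.
$lookup = ['foo' => 'h3', 'bár' => 'h2'];

// Escape each word so regex metacharacters are matched literally.
$needles = array_map(fn ($word) => preg_quote($word, '~'), array_keys($lookup));

// \b: whole words only, i: case-insensitive, u: UTF-8 aware matching.
$pattern = '~\b(' . implode('|', $needles) . ')\b~iu';

// Keep the matched words (the delimiters) in the result and drop empty pieces.
$fragments = preg_split(
    $pattern,
    'Foo food BÁR',
    0,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

// Map each fragment to its tag name, or null when it is plain text.
$tags = [];
foreach ($fragments as $fragment) {
    // mb_strtolower() lowercases multibyte characters such as "Á" as well.
    $key = mb_strtolower($fragment, 'UTF-8');
    $tags[] = $lookup[$key] ?? null;
}

var_dump($fragments, $tags);
```

Here "Foo" and "BÁR" are split out as their own fragments despite the different casing, while "food" stays inside a plain-text fragment because the trailing `\b` does not match between "foo" and "d".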

    After leveraging the following insights, I tailored a solution for your scenario.

    • https://stackoverflow.com/a/64077957/2943403
    • https://stackoverflow.com/a/20675396/2943403

    Code: (Demo)

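    // Note: the sample markup was stripped from this copy of the answer; the tags
    // below are reconstructed from the output and are an assumption.
    $html = <<<HTML
    foo <a href="#">fóo</a> lórem
    bár ipsum bar food foo bark. <a href="#">bar</a> not á test
    HTML;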
    
    $lookup = [
        'foo' => 'h3',
        'bar' => 'h2'
    ];
    
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    
    $xpath = new DOMXPath($dom);
    
    $regexNeedles = [];
    foreach ($lookup as $word => $tagName) {
        $regexNeedles[] = preg_quote($word, '~');
    }
    $pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~iu';
    
    foreach($xpath->query('//*[not(self::a)]/text()') as $textNode) {
        $newNodes = [];
        $hasReplacement = false;
        foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
            $fragmentLower = strtolower($fragment);
            if (isset($lookup[$fragmentLower])) {
                $hasReplacement = true;
                $a = $dom->createElement($lookup[$fragmentLower]);
                $a->nodeValue = $fragment;
                $newNodes[] = $a;
            } else {
                $newNodes[] = $dom->createTextNode($fragment);
            }
        }
        if ($hasReplacement) {
            $newFragment = $dom->createDocumentFragment();
            foreach ($newNodes as $newNode) {
                $newFragment->appendChild($newNode);
            }
            $textNode->parentNode->replaceChild($newFragment, $textNode);
        }
    }
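    // loadHTML() wraps the leading text in <p>...</p>; substr() trims that wrapper off again.
    // utf8_decode() undoes the parser's Latin-1 assumption (deprecated as of PHP 8.2).
    echo substr(trim(utf8_decode($dom->saveHTML($dom->documentElement))), 3, -4);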
    

    Output (as rendered by a browser, so the tags themselves are not shown):

    foo

    fóo lórem bár ipsum

    bar

    food

    foo

    bark. bar not á test
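One detail of the XPath expression is worth checking against your own markup: `//*[not(self::a)]/text()` only excludes text nodes whose direct parent is an `<a>`. Below is a small sketch of the difference, using a made-up snippet in which a `<b>` is nested inside a link; the `textValues()` helper is hypothetical and exists only for this comparison.

```php
<?php
// Hypothetical markup: "qux" sits inside <b>, which sits inside <a>.
$html = '<div>foo <a href="#">bar <b>qux</b></a> baz</div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Hypothetical helper: collect the trimmed text of every node a query returns.
function textValues(DOMXPath $xpath, string $query): array
{
    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = trim($node->nodeValue);
    }
    return $values;
}

// Excludes only text whose direct parent is <a>: foo, qux, baz.
$parentOnly = textValues($xpath, '//*[not(self::a)]/text()');

// Excludes text anywhere inside an <a>: foo, baz.
$anyAncestor = textValues($xpath, '//text()[not(ancestor::a)]');

print_r($parentOnly);
print_r($anyAncestor);
```

If your links can contain nested elements such as `<b>` or `<span>`, the `ancestor::a` form is probably the safer choice. For flat links like the ones in the question, both queries select the same text nodes.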
