How can I replace strings NOT within a link tag?

后端未结

关注

 3  1803

I am working on this PHP function. The idea is to wrap certain words occuring in a string into certain tags (both, words and tags, given in an array). It works OK!, but when

相关标签:

3条回答

旧时难觅i

2020-12-06 21:03

Definitely use a dom parser to isolate the qualifying text nodes before attempting to replace with a regex pattern that respects: word boundries, case-insensitivity, and unicode characters. If you are planning to specifically target words with unicode characters, then you will need to add mb_ to some of the string functions.

After leveraging the following insights, I tailored a solution for your scenario.

https://stackoverflow.com/a/64077957/2943403
https://stackoverflow.com/a/20675396/2943403

Code: (Demo)

$html = <<<HTML
foo <a href='http://test.com'>fóo</a> lórem
bár ipsum bar food foo bark. <a>bar</a> not á test
HTML;

$lookup = [
    'foo' => 'h3',
    'bar' => 'h2'
];

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

$regexNeedles = [];
foreach ($lookup as $word => $tagName) {
    $regexNeedles[] = preg_quote($word, '~');
}
$pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~iu' ;

foreach($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $newNodes = [];
    $hasReplacement = false;
    foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
        $fragmentLower = strtolower($fragment);
        if (isset($lookup[$fragmentLower])) {
            $hasReplacement = true;
            $a = $dom->createElement($lookup[$fragmentLower]);
            $a->nodeValue = $fragment;
            $newNodes[] = $a;
        } else {
            $newNodes[] = $dom->createTextNode($fragment);
        }
    }
    if ($hasReplacement) {
        $newFragment = $dom->createDocumentFragment();
        foreach ($newNodes as $newNode) {
            $newFragment->appendChild($newNode);
        }
        $textNode->parentNode->replaceChild($newFragment, $textNode);
    }
}
echo substr(trim(utf8_decode($dom->saveHTML($dom->documentElement))), 3, -4);

Output:

<h3>foo</h3> <a href="http://test.com">fóo</a> lórem
bár ipsum <h2>bar</h2> food <h3>foo</h3> bark. <a>bar</a> not á test

0 讨论(0)

傲寒

2020-12-06 21:08
To the answer you pointed, in JS, it's basically the same. You just have to specify it's a string.
```
$regexp = "/(<pre>(?:[^<](?!\/pre))*<\/pre>)|(\:\-\))/gi";
```
Also note that you may be need another preg_replace function to replace the word 'empresarios' in case it's capitalized (Empresarios) or like weird stuff (EmPreSAriOS).

Also take care of your HTML. <h2> are block elements and may be interpretated this way:

string where the word empresarios should be replaced;

And replaced

string where the word

empresarios

should be replaced;

Maybe what you'll need to use is a <big> tag.
0 讨论(0)
发布评论:

提交评论
- 加载中...

粉色の甜心

2020-12-06 21:11

Use the DOM and only modify text nodes:

$s = "foo <a href='http://test.com'>foo</a> lorem bar ipsum foo. <a>bar</a> not a test";
echo htmlentities($s) . '<hr>';

$d = new DOMDocument;
$d->loadHTML($s);

$x = new DOMXPath($d);
$t = $x->evaluate("//text()");

$wrap = array(
    'foo' => 'h1',
    'bar' => 'h2'
);

$preg_find = '/\b(' . implode('|', array_keys($wrap)) . ')\b/';

foreach($t as $textNode) {
    if( $textNode->parentNode->tagName == "a" ) {
        continue;
    }

    $sections = preg_split( $preg_find, $textNode->nodeValue, null, PREG_SPLIT_DELIM_CAPTURE);

    $parentNode = $textNode->parentNode;

    foreach($sections as $section) {  
        if( !isset($wrap[$section]) ) {
            $parentNode->insertBefore( $d->createTextNode($section), $textNode );
            continue;
        }

        $tagName = $wrap[$section];
        $parentNode->insertBefore( $d->createElement( $tagName, $section ), $textNode );
    }

    $parentNode->removeChild( $textNode );
}

echo htmlentities($d->saveHTML());

Edited to replace DOMText with DOMText and DOMElement as necessary.

0 讨论(0)

How can I replace strings NOT within a link tag?

empresarios