Highlight keywords in a paragraph

梦如初夏 2020-11-29 13:40

I need to highlight a keyword in a paragraph, as Google does in its search results. Let's assume that I have a MySQL DB with blog posts. When a user searches for a certain

7 answers
  •  萌比男神i
    2020-11-29 14:29

    If it contains HTML (note that this is a pretty robust solution):

    $string = '<p>foobar</p>';
    $keyword = 'foo';
    $dom = new DomDocument();
    $dom->loadHtml($string);
    $xpath = new DomXpath($dom);
    // Every element whose text content contains the keyword (case-sensitive).
    $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
    foreach ($elements as $element) {
        // Copy the child list first: replacing nodes while iterating the live list is unsafe.
        foreach (iterator_to_array($element->childNodes) as $child) {
            if (!$child instanceof DomText) continue;
            $fragment = $dom->createDocumentFragment();
            $text = $child->textContent;
            while (($pos = stripos($text, $keyword)) !== false) {
                // Text before the match, then the match itself wrapped in a span.
                $fragment->appendChild(new DomText(substr($text, 0, $pos)));
                $word = substr($text, $pos, strlen($keyword));
                $highlight = $dom->createElement('span');
                $highlight->appendChild(new DomText($word));
                $highlight->setAttribute('class', 'highlight');
                $fragment->appendChild($highlight);
                $text = substr($text, $pos + strlen($keyword));
            }
            // Remaining text after the last match (a trailing "0" is kept, unlike with empty()).
            if ($text !== '') $fragment->appendChild(new DomText($text));
            $element->replaceChild($fragment, $child);
        }
    }
    $string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);

    Results in:

    <p><span class="highlight">foo</span>bar</p>

    And with:

    $string = '<p>foobarbazbar</p>';
    $keyword = 'bar';

    You get (broken onto multiple lines for readability):

    <p>foo
        <span class="highlight">bar</span>
        baz
        <span class="highlight">bar</span>
    </p>
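
    As a variation on the DOM approach above, here is a minimal sketch that walks the text nodes directly instead of interpolating the keyword into the XPath expression. The function name `highlightKeyword`, its parameters, and the wrapping `<div>` are assumptions made for illustration, and the sketch assumes the input is an HTML fragment rather than a full document.

```php
<?php
// Hypothetical helper (not from the answer above): wraps every
// case-insensitive occurrence of $keyword found in text nodes in a span.
function highlightKeyword($html, $keyword, $class = 'highlight') {
    $dom = new DOMDocument();
    // Wrap the fragment so it has a single known root; @ hides libxml warnings.
    @$dom->loadHTML('<div>' . $html . '</div>');
    $xpath = new DOMXPath($dom);
    // Snapshot the text nodes so replacing them does not disturb iteration.
    $textNodes = iterator_to_array($xpath->query('//body//text()'));
    foreach ($textNodes as $node) {
        $text = $node->nodeValue;
        if (stripos($text, $keyword) === false) continue;
        $fragment = $dom->createDocumentFragment();
        while (($pos = stripos($text, $keyword)) !== false) {
            if ($pos > 0) {
                $fragment->appendChild($dom->createTextNode(substr($text, 0, $pos)));
            }
            $span = $dom->createElement('span');
            $span->setAttribute('class', $class);
            // Keep the original casing of the matched text.
            $span->appendChild($dom->createTextNode(substr($text, $pos, strlen($keyword))));
            $fragment->appendChild($span);
            $text = substr($text, $pos + strlen($keyword));
        }
        if ($text !== '') {
            $fragment->appendChild($dom->createTextNode($text));
        }
        $node->parentNode->replaceChild($fragment, $node);
    }
    // Serialize only the children of the wrapper div.
    $out = '';
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo highlightKeyword('<p>Foo bar</p>', 'foo'), "\n";
```

    Because the keyword never becomes part of the XPath string, a keyword containing a double quote cannot break the query, and matching is case-insensitive throughout.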

    Beware of non-DOM solutions (like regex or str_replace): highlighting a keyword such as "div" tends to completely destroy your HTML. The DOM approach only ever highlights strings in the body text, never inside a tag.
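
    To make that warning concrete, here is a small sketch of the naive approach. The helper name `naiveHighlight` and the input snippet are hypothetical, chosen only to show the failure.

```php
<?php
// Hypothetical helper for illustration: highlights by plain string
// replacement, with no awareness of HTML structure.
function naiveHighlight($html, $keyword) {
    $replace = '<span class="highlight">' . $keyword . '</span>';
    return str_ireplace($keyword, $replace, $html);
}

$html = '<div>a div example</div>';
$broken = naiveHighlight($html, 'div');

// The tag names themselves get wrapped, so the markup is no longer valid:
// <<span class="highlight">div</span>>a <span class="highlight">div</span> example</<span class="highlight">div</span>>
echo $broken, "\n";
```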


    Edit: Since you want Google-style results, here's one way of doing it:

    function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
        $dom = new DomDocument();
        $dom->loadHtml($string);
        $xpath = new DomXpath($dom);
        $results = array();
        // Number of word groups to keep on each side of the keyword.
        $maxStubHalf = (int) ceil($maxStubSize / 2);
        foreach ($keywords as $keyword) {
            $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
            $replace = '<span class="highlight">'.$keyword.'</span>';
            foreach ($elements as $element) {
                $stub = $element->textContent;
                // Capture the keyword plus up to $maxStubHalf word groups on each side.
                $regex = '#^.*?((\w*\W*){'.
                     $maxStubHalf.'})('.
                     preg_quote($keyword, '#').
                     ')((\w*\W*){'.
                     $maxStubHalf.'}).*?$#ims';
                // Keep only the captured context and drop the rest of the text.
                $stub = preg_replace($regex, '\\1\\3\\4', $stub);
                $stub = str_ireplace($keyword, $replace, $stub);
                $results[] = $stub;
            }
        }
        $results = array_unique($results);
        return $results;
    }
    

    So this returns an array of matches, each with up to $maxStubSize words around the keyword (up to half that number before it and half after).
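
    To see the trimming on its own, here is a sketch that applies the same regex to plain text, without the DOM or the highlighting step. The helper name `stubAround` and the sample sentence are assumptions for illustration.

```php
<?php
// Hypothetical helper: the same regex as in getKeywordStubs(), applied to
// plain text. Returns the text unchanged if the keyword is not found.
function stubAround($text, $keyword, $maxStubSize = 10) {
    $half = (int) ceil($maxStubSize / 2);
    $regex = '#^.*?((\w*\W*){' . $half . '})('
        . preg_quote($keyword, '#')
        . ')((\w*\W*){' . $half . '}).*?$#ims';
    // \1 = context before, \3 = the keyword, \4 = context after.
    return preg_replace($regex, '\\1\\3\\4', $text);
}

$text = 'one two three four five six seven eight nine ten';
// With $maxStubSize = 4, two word groups are kept on each side.
echo '[' . stubAround($text, 'five', 4) . "]\n"; // [three four five six ]
```

    Note that the context after the keyword comes out one word shorter than the context before it: the first `(\w*\W*)` group after the keyword matches only the space that follows it.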

    So, given a string:

    a whole bunch of text here for us to foo bar baz replace out from this string bar

    Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:

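    array(4) {
      [0]=>
      string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
      [3]=>
      string(34) "<span class="highlight">bar</span>"
      [4]=>
      string(62) "a whole <span class="highlight">bunch</span> of text here for "
      [7]=>
      string(39) "<span class="highlight">bunch</span> of"
    }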
    

    You could then build your result blurb by sorting the list by strlen and picking the two longest matches (assuming PHP 5.3+, which the closure requires):

    usort($results, function($str1, $str2) { 
        return strlen($str2) - strlen($str1);
    });
    $description = implode('...', array_slice($results, 0, 2));
    

    Which results in:

    here for us to foo <span class="highlight">bar</span> baz replace out from ...a whole <span class="highlight">bunch</span> of text here for 
    

    I hope that helps. (I do feel this is a bit bloated, and I'm sure there are better ways to do this, but here's one way.)
