I need to highlight a keyword in a paragraph, as Google does in its search results. Let's assume that I have a MySQL db with blog posts. When a user searches for a certain
Maybe you could do something like this when you're connected to the database:
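$keyword = $_REQUEST["keyword"]; // fetch the keyword from the request
// note: the mysql_* functions were removed in PHP 7; use mysqli or PDO there
$result = mysql_query("SELECT * FROM `posts` WHERE `content` LIKE '%".
    mysql_real_escape_string($keyword)."%'"); // ask the database for the post texts
while ($row = mysql_fetch_array($result)) { // do the following for each result:
    $text = $row["content"]; // we're only interested in the content at the moment
    $pos = stripos($text, $keyword); // first match, case-insensitive as LIKE usually is
    $text = substr($text, max(0, $pos - 150), 300); // cut out about 300 characters around it
    $text = htmlentities($text); // escape the excerpt before adding any markup
    $text = preg_replace('/'.preg_quote(htmlentities($keyword), '/').'/i',
        '<strong>$0</strong>', $text); // highlight, keeping the original case
    echo $text; // print it
    echo "<hr>"; // draw a line under it
}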
If you wish to cut out the relevant paragraphs, then after running the str_replace() call mentioned above you can use stripos() to find the position of these strong sections, and use an offset from that location with substr() to cut out a section of the paragraph, such as:
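// $searchterms is the array of search terms, $paragraph the text to search
foreach ($searchterms as $search) {
    $paragraph = str_replace($search, "<strong>$search</strong>", $paragraph);
}
$section = array();
$pos = 0;
for ($i = 0; $i < 4; $i++) {
    $pos = stripos($paragraph, "<strong>", $pos);
    if ($pos === false) {
        break; // no more highlighted terms
    }
    $section[$i] = substr($paragraph, max(0, $pos - 100), 200);
    $pos += strlen("<strong>"); // continue searching after this match
}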
which will give you an array of short excerpts (up to 200 characters each) to use how you wish. It may also be beneficial to search for the nearest space from the cutting locations, and cut from there to prevent half-words. Oh, and you also need to check for errors, but I'll leave that bit up to you.
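The "nearest space" idea could be sketched roughly like this. The function name excerptAtWordBoundaries and its parameters are made up for illustration, and it counts bytes rather than characters, so multibyte text would need the mb_* functions instead:

```php
<?php
// Hypothetical helper (not from the answer above): cut about $length bytes
// around byte offset $pos, moving both cut points to the nearest space.
function excerptAtWordBoundaries($text, $pos, $length = 200)
{
    $start = max(0, $pos - (int) ($length / 2));
    $end   = min(strlen($text), $start + $length);

    // If the cut starts inside a word, move forward to the next word.
    if ($start > 0 && $text[$start - 1] !== ' ') {
        $space = strpos($text, ' ', $start);
        if ($space !== false && $space < $pos) {
            $start = $space + 1;
        }
    }

    // If the cut ends inside a word, move back to the previous space.
    if ($end < strlen($text) && $text[$end] !== ' ') {
        $space = strrpos(substr($text, 0, $end), ' ');
        if ($space !== false && $space > $pos) {
            $end = $space;
        }
    }

    return substr($text, $start, $end - $start);
}

$text = 'The quick brown fox jumps over the lazy dog near the river bank';
$pos  = stripos($text, 'jumps');
echo excerptAtWordBoundaries($text, $pos, 24); // brown fox jumps over
```

The excerpt only ever shrinks towards the match, so it never grows past the requested length.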
You could try exploding your database search result set into an array using explode and then use array_search() on each search result. Set the $distance variable in the example below to how many words you'd like to appear on either side of the first match of the $keyword (the comparison in the loop is strict, so a $distance of 10 yields nine words on each side).
In the example, I've included lorem ipsum text as an example database result paragraph and set the $keyword to 'scelerisque'. You'd obviously replace these in your code.
//example paragraph text
$lorum = 'Nunc nec magna at nibh imperdiet dignissim quis eu velit.
vel mattis odio rutrum nec. Etiam sit amet tortor nibh, molestie
vestibulum tortor. Integer condimentum magna dictum purus vehicula
et scelerisque mauris viverra. Nullam in lorem erat. Ut dolor libero,
tristique et pellentesque sed, mattis eget dui. Cum sociis natoque
penatibus et magnis dis parturient montes, nascetur ridiculus mus.
.';
//turn paragraph into array, splitting on any whitespace so line breaks also separate words
$ipsum = preg_split('/\s+/',$lorum);
//set keyword
$keyword = 'scelerisque';
//set excerpt distance
$distance = 10;
//look for keyword in paragraph array, return array key of first match
$match_key = array_search($keyword,$ipsum);
$excerpt = array();
//array_search returns false when nothing matches; key 0 is a valid match
if($match_key !== false){
    foreach($ipsum as $key=>$value){
        //if paragraph array key inside excerpt distance
        if($key > $match_key-$distance and $key < $match_key+$distance){
            //if array key matches keyword key, bold the word
            if($key == $match_key){
                $word = '<b>'.$value.'</b>';
            }
            else{
                $word = $value;
            }
            //create excerpt array to hold words within distance
            $excerpt[] = $word;
        }
    }
}
//turn excerpt array into a string (empty if the keyword was not found)
$excerpt = implode(' ',$excerpt);
//print the string
echo $excerpt;
$excerpt returns:
"vestibulum tortor. Integer condimentum magna dictum purus vehicula et <b>scelerisque</b> mauris viverra. Nullam in lorem erat. Ut dolor libero,"
Here’s a solution for plain text:
$str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
$keywords = array('co');
$wordspan = 5;
$keywordsPattern = implode('|', array_map(function($val) { return preg_quote($val, '/'); }, $keywords));
$matches = preg_split("/($keywordsPattern)/ui", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
for ($i = 0, $n = count($matches); $i < $n; ++$i) {
    if ($i % 2 == 0) {
        $words = preg_split('/(\s+)/u', $matches[$i], -1, PREG_SPLIT_DELIM_CAPTURE);
        $slice = ($wordspan+1)*2; // words plus the whitespace tokens between them
        $ends = ($i > 0 && $i < $n-1) ? 2 : 1; // pieces between two matches keep a head and a tail
        // only shorten when the kept head and tail would not overlap
        if (count($words) > $slice*$ends) {
            $matches[$i] = '…';
            if ($i > 0) {
                $matches[$i] = implode('', array_slice($words, 0, $slice)) . $matches[$i];
            }
            if ($i < $n-1) {
                $matches[$i] .= implode('', array_slice($words, -$slice));
            }
        }
    } else {
        $matches[$i] = '<b>'.$matches[$i].'</b>';
    }
}
echo implode('', $matches);
With the current pattern "/($keywordsPattern)/ui", subwords are matched and highlighted. But you can change that if you want to:
If you want to match only whole words and not just subwords, use word boundaries \b:
"/\b($keywordsPattern)\b/ui"
If you want to match subwords but highlight the whole word, put optional word characters \w in front of and after the keywords:
"/(\w*?(?:$keywordsPattern)\w*)/ui"
If it contains html (note that this is a pretty robust solution):
$string = '<p>foo<b>bar</b></p>';
$keyword = 'foo';
$dom = new DomDocument();
$dom->loadHtml($string);
$xpath = new DomXpath($dom);
$elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
foreach ($elements as $element) {
    // copy the live node list first, because nodes are replaced while looping
    foreach (iterator_to_array($element->childNodes) as $child) {
        if (!$child instanceof DomText) continue;
        $text = $child->textContent;
        if (stripos($text, $keyword) === false) continue; // nothing to highlight here
        $fragment = $dom->createDocumentFragment();
        while (($pos = stripos($text, $keyword)) !== false) {
            $fragment->appendChild(new DomText(substr($text, 0, $pos)));
            $word = substr($text, $pos, strlen($keyword));
            $highlight = $dom->createElement('span');
            $highlight->appendChild(new DomText($word));
            $highlight->setAttribute('class', 'highlight');
            $fragment->appendChild($highlight);
            $text = substr($text, $pos + strlen($keyword));
        }
        if (strlen($text) > 0) $fragment->appendChild(new DomText($text));
        $element->replaceChild($fragment, $child);
    }
}
$string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
Results in:
<p><span class="highlight">foo</span><b>bar</b></p>
And with:
$string = '<body><p>foobarbaz<b>bar</b></p></body>';
$keyword = 'bar';
You get (broken onto multiple lines for readability):
<p>foo
<span class="highlight">bar</span>
baz
<b>
<span class="highlight">bar</span>
</b>
</p>
Beware of non-DOM solutions (like regex or str_replace), since highlighting something like "div" has a tendency of completely destroying your HTML... This will only ever "highlight" strings in the body, never inside of a tag...
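To make the "div" problem concrete, here is a small sketch with a made-up HTML string that shows what a plain str_replace does to the markup:

```php
<?php
$html    = '<div class="post">Use a div for layout.</div>';
$keyword = 'div';

// Naive approach: the replacement also hits the tag names themselves.
$naive = str_replace($keyword, '<b>' . $keyword . '</b>', $html);

echo $naive;
// <<b>div</b> class="post">Use a <b>div</b> for layout.</<b>div</b>>
```

The opening and closing tags are no longer valid, which is why the DOM approach only touches text nodes.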
Edit: Since you want Google-style results, here's one way of doing it:
function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
    $dom = new DomDocument();
    $dom->loadHtml($string);
    $xpath = new DomXpath($dom);
    $results = array();
    $maxStubHalf = ceil($maxStubSize / 2);
    foreach ($keywords as $keyword) {
        $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
        $replace = '<span class="highlight">'.$keyword.'</span>';
        foreach ($elements as $element) {
            $stub = $element->textContent;
            // keep up to $maxStubHalf word units on each side of the keyword
            $regex = '#^.*?((\w*\W*){'.
                $maxStubHalf.'})('.
                preg_quote($keyword, '#').
                ')((\w*\W*){'.
                $maxStubHalf.'}).*?$#ims';
            $stub = preg_replace($regex, '\\1\\3\\4', $stub);
            $stub = str_ireplace($keyword, $replace, $stub);
            $results[] = $stub;
        }
    }
    $results = array_unique($results);
    return $results;
}
Ok, so what that does is return an array of matches with $maxStubSize words around it (namely up to half that number before, and half after)...
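To see the windowing regex on its own, here is a rough sketch that applies the same pattern to a plain string. The wordWindow name and the sample sentence are invented for illustration. Note that the whitespace right after the keyword is consumed as one "word unit" of its own, so the window after the match ends up one word shorter than the window before it:

```php
<?php
// Hypothetical helper: keep up to $half word units on each side of the
// first place where $keyword can be matched, using the pattern from above.
function wordWindow($text, $keyword, $half = 5)
{
    $regex = '#^.*?((\w*\W*){' . $half . '})('
        . preg_quote($keyword, '#')
        . ')((\w*\W*){' . $half . '}).*?$#ims';

    // \1 is the text before the keyword, \3 the keyword, \4 the text after it.
    return preg_replace($regex, '\\1\\3\\4', $text);
}

echo wordWindow('one two three four five six seven eight nine', 'five', 2);
// three four five six
```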
So, given a string:
<p>a whole
<b>bunch of</b> text
<a>here for</a>
us to foo bar baz replace out from this string
<b>bar</b>
</p>
Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:
array(4) {
[0]=>
string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
[3]=>
string(34) "<span class="highlight">bar</span>"
[4]=>
string(62) "a whole <span class="highlight">bunch</span> of text here for "
[7]=>
string(39) "<span class="highlight">bunch</span> of"
}
So, then you could build your result blurb by sorting the list by strlen and then picking the two longest matches... (assuming PHP 5.3+):
usort($results, function($str1, $str2) {
    return strlen($str2) - strlen($str1);
});
$description = implode('...', array_slice($results, 0, 2));
Which results in:
here for us to foo <span class="highlight">bar</span> baz replace out...a whole <span class="highlight">bunch</span> of text here for
I hope that helps... (I do feel this is a bit... bloated... I'm sure there are better ways to do this, but here's one way)...
If you're a beginner, this will not be as easy as one might think...
I think you should do the following steps:
In the third step you can use a regular expression to replace the keywords the user searched for with a bolded equivalent. str_replace could work too...
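A minimal sketch of that regular-expression step could look like this, assuming plain-text input that still has to be HTML-escaped. The boldKeywords name is made up for illustration. The text is split on the keywords first and escaped piece by piece, so a keyword can never match inside an entity such as &amp;amp;:

```php
<?php
// Hypothetical helper: bold every occurrence of the given keywords
// (case-insensitively) and HTML-escape everything else.
function boldKeywords($text, array $keywords)
{
    $quoted = array();
    foreach ($keywords as $keyword) {
        if ($keyword !== '') {
            $quoted[] = preg_quote($keyword, '/');
        }
    }
    if (!$quoted) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }

    // With PREG_SPLIT_DELIM_CAPTURE the odd-numbered pieces are the matches.
    $pattern = '/(' . implode('|', $quoted) . ')/iu';
    $pieces  = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($pieces as $i => $piece) {
        $safe = htmlspecialchars($piece, ENT_QUOTES, 'UTF-8');
        $out .= ($i % 2 === 1) ? '<b>' . $safe . '</b>' : $safe;
    }
    return $out;
}

echo boldKeywords('PHP & MySQL make php blogs easy', array('php', 'blog'));
// <b>PHP</b> &amp; MySQL make <b>php</b> <b>blog</b>s easy
```

Unlike a plain str_replace, this keeps the original capitalisation of each match.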
I hope this helps... If you could provide your database structure maybe I can give you some more precise hints...