DOM Parser to highlight keywords not working

Posted by 吃可爱长大的小学妹 on 2019-12-07 10:27:52

Question


This question is related to one I asked before, but because that topic is now closed and I need to ask something further, I am starting a new question. I hope that is fine.

In my previous question I oversimplified the problem, which led to simple but not fully working solutions. I only realized this recently, while implementing my code.

The problem with the solutions in the previous post is that the replacement functions break the HTML tags. I have read in many posts on this site that I need to use a DOM parser. I am very unfamiliar with this, so I tried the code suggested by the user “ircmaxell” in this post, but it does not work for me.
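The earlier solutions are not reproduced here, so the following is only a minimal sketch of the kind of breakage described, assuming they relied on plain string replacement such as `str_ireplace`. The input string is a made-up example in which the keyword also occurs inside a tag attribute.

```php
<?php
// Hypothetical input: the keyword "class" appears both in the markup and in the text.
$html = '<p class="intro">A class of its own.</p>';

// Plain string replacement cannot tell markup from text,
// so the attribute name inside the <p> tag is wrapped as well.
$broken = str_ireplace('class', '<span class="ht">class</span>', $html);

echo $broken, PHP_EOL;
// prints: <p <span class="ht">class</span>="intro">A <span class="ht">class</span> of its own.</p>
```

A DOM-based approach avoids this because it only ever rewrites text nodes, never tag or attribute names.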

Here is a sample of what I did:

echo '<style type="text/css">
       .ht{
         background-color: yellow;
       }
     </style>'; 


/* taken from user ircmaxell at https://stackoverflow.com/questions/4081372/highlight-keywords-in-a-paragraph

I just modified line $highlight->setAttribute('class', 'highlight') to $highlight->setAttribute('class', 'ht') and commented the first 2 lines   */

function highlight_paragraph($string, $keyword) {
  //$string = '<p>foo<b>bar</b></p>';
  //$keyword = 'foo';
  $dom = new DomDocument();
  $dom->loadHtml($string);
  $xpath = new DomXpath($dom);
  $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
  foreach ($elements as $element) {
   foreach ($element->childNodes as $child) {
     if (!$child instanceof DomText) continue;
     $fragment = $dom->createDocumentFragment();
     $text = $child->textContent;
     $stubs = array();
     while (($pos = stripos($text, $keyword)) !== false) {
       $fragment->appendChild(new DomText(substr($text, 0, $pos)));
       $word = substr($text, $pos, strlen($keyword));
       $highlight = $dom->createElement('span');
       $highlight->appendChild(new DomText($word));
       $highlight->setAttribute('class', 'ht');
       $fragment->appendChild($highlight);
       $text = substr($text, $pos + strlen($keyword));
     }
     if (!empty($text)) $fragment->appendChild(new DomText($text));
     $element->replaceChild($fragment, $child);
   }
 }
 $string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
 return $string;
}


$string = '<p>This book has been written against a background of both reckless optimism and reckless despair.</p>
<p>It holds that Progress and Doom are two sides of the same medal; that both are articles of superstition, not of faith. It was written out of the conviction that it should be possible to discover the hidden mechanics by which all traditional elements of our political and spiritual world were dissolved into a conglomeration where everything seems to have lost specific value, and has become unrecognizable for human comprehension, unusable for human purpose.</p>
<p> Hannah Arendt, The Origins of Totalitarianism (New York: Harcourt Brace Jovanovich, Inc., 1973 ed.), p.vii, Preface to the First Edition.</p>';

$keywords = array('This', 'book', 'has', 'been', 'written', 'background', 'reckless', 'optimism', 'despair.', 'holds', 'Progress', 'Doom ', 'two', 'sides', 'medal;', 'articles', 'superstition,', 'faith.', 'lost', 'Arendt,', 'Totalitarianism');

foreach ($keywords as $kw) {
  $string = highlight_paragraph($string, $kw);
}

echo $string;

echo $string only returns:

This book has been written against a background of both reckless optimism and reckless despair.

And only the first two words, 'This' and 'book', are highlighted.

Normally it should have output the entire initial string with the keywords highlighted.

I have searched Stack Overflow and Google extensively and did not find easy-to-use code for this, even though many people have asked the same thing before.

I really need some help here. Thanks in advance!


Answer 1:


You are lucky that I was very bored when I saw this question. ;)

The code you received as an answer doesn't seem to have been tested; I don't know how it could possibly have worked correctly. Anyway, I fixed all the problems, and here is a working version, tested on my locally installed Apache server with PHP 5.3:

function highlight_paragraph($string, $keyword) {
  $dom = new DOMDocument();
  $dom->loadHTML($string);

  // Select every text node whose parent element contains the keyword
  // (contains() is case-sensitive, unlike the stripos() search below)
  $xpath = new DOMXPath($dom);
  $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

  foreach ($textNodes as $textNode) {
    $fragment = $dom->createDocumentFragment();
    $text = $textNode->nodeValue;

    // Split the text at each occurrence and wrap the match in a <span>
    while (($pos = stripos($text, $keyword)) !== false) {
      $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
      $word = substr($text, $pos, strlen($keyword));

      $highlight = $dom->createElement('span');
      $highlight->appendChild(new DOMText($word));
      $highlight->setAttribute('class', 'ht');
      $fragment->appendChild($highlight);

      $text = substr($text, $pos + strlen($keyword));
    }

    // Keep whatever follows the last match; compare against '' because
    // empty() would also discard a trailing "0"
    if ($text !== '')
      $fragment->appendChild(new DOMText($text));

    // Swap the original text node for the highlighted fragment
    $textNode->parentNode->replaceChild($fragment, $textNode);
  }

  // Returns the whole document, including the doctype and <html>/<body> wrappers
  return $dom->saveHTML();
}
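One visible difference between the two listings is the return value: the question's version serializes only the first child of `<body>`, which is consistent with only the first paragraph coming back, while the version above returns the whole document. As a possible follow-up, the sketch below highlights all keywords in a single parse instead of re-parsing the HTML once per keyword. It is not part of the original answer: `highlight_keywords` is a hypothetical name, PHP 7+ is assumed, and matching keywords with one regular expression is a swapped-in technique in place of the `stripos` loop.

```php
<?php
// Hypothetical helper, not part of the original answer (assumes PHP 7+).
function highlight_keywords(string $html, array $keywords): string
{
    // Drop empty keywords, then try longer ones first so "bookshelf" wins over "book".
    $keywords = array_values(array_filter($keywords, function ($kw) { return $kw !== ''; }));
    if ($keywords === [] || trim($html) === '') {
        return $html;
    }
    usort($keywords, function ($a, $b) { return strlen($b) - strlen($a); });

    // One case-insensitive alternation; preg_quote escapes regex metacharacters.
    $quoted  = array_map(function ($kw) { return preg_quote($kw, '/'); }, $keywords);
    $pattern = '/(' . implode('|', $quoted) . ')/i';

    $previous = libxml_use_internal_errors(true); // collect parser warnings silently
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Snapshot the text nodes before mutating the tree; skip script and style content.
    $xpath = new DOMXPath($dom);
    $textNodes = iterator_to_array(
        $xpath->query('//body//text()[not(ancestor::script) and not(ancestor::style)]')
    );

    foreach ($textNodes as $textNode) {
        // With a single capture group, odd indexes hold the matched keywords.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $dom->createElement('span');
                $span->setAttribute('class', 'ht');
                $span->appendChild($dom->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize only the children of <body>, without the doctype/html wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo highlight_keywords(
    '<p class="book">This book has <b>been</b> written.</p>',
    ['This', 'book', 'been']
);
```

Because only text nodes are rewritten, the `class="book"` attribute in the usage example stays untouched even though `book` is one of the keywords. Parsing once also avoids feeding already-highlighted markup back through the parser for every keyword.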


Source: https://stackoverflow.com/questions/9335689/dom-parser-to-highlight-keywords-not-working
