How to split each word in <p> (html) considering other elements inside

Question

I'm new here, so let me try to explain my problem. I'm developing a Chrome extension that manipulates the DOM. I need to split up every single word inside a <p> element so that I can later apply CSS to each word, while preserving any other elements (<a>, <em>, <strong>, etc.) that may be inside the <p>.

Example of possible text in a web page:

<p> 
   Sed ut <a> perspiciatis unde omnis </a> 
   iste natus <em> error sit </em> 
   voluptatem <strong> accusantium </strong> 
   doloremque laudantium 
</p>

Using jQuery, I thought of putting a <span> tag around each word so I can give it a class attribute to target with CSS. I found this code, which splits the words belonging to the <p> correctly but doesn't take into account other elements that may be inside the <p>.

Code used (that doesn't do what I need):


 $("p").each(function() {
    var originalText = $(this).text().split(' ');
    var spannedText = [];

    for (var i = 0; i < originalText.length; i += 1) {
        if(originalText[i] != ""){
           spannedText[i] = ('<span class="...">' + originalText.slice(i,i+1).join(' ') + '</span>');
         }
     }

     $(this).html(spannedText.join(' '));
 });

For the example shown above, this code generates the following output, removing the other elements:

<p> 
    <span>Sed</span> 
    <span>ut</span> 
    <span>perspiciatis</span> 
    <span>unde</span> 
    <span>omnis</span> 
    <span>iste</span> 
    <span>natus</span> 
    <span>error</span> 
    <span>sit</span> 
    <span>voluptatem</span> 
    <span>accusantium</span>
    <span>doloremque</span> 
    <span>laudantium</span> 
</p>

It is close to the solution I need, but in this case all the tags present in the example (<a>, <em>, <strong>) are removed and replaced with <span> tags.

Instead, I want to keep the HTML structure of the <p> and only insert <span>...</span> around each word.

This is the output I would like to achieve:

<p> 
    <span>Sed</span> 
    <span>ut</span> 
    <a> <span>perspiciatis</span> <span>unde</span> <span>omnis</span> </a>
    <span>iste</span> 
    <span>natus</span> 
    <em> <span>error</span> <span>sit</span> </em>
    <span>voluptatem</span> 
    <strong> <span>accusantium</span> </strong>
    <span>doloremque</span> 
    <span>laudantium</span> 
</p>

Can you help me?

Answer 1:

Never replace HTML via innerHTML or jQuery's html()

Replacing the HTML destroys all event listeners that JavaScript has added to the child elements and makes the browser re-parse the entire content, which is a CPU-intensive operation and can be slow on slower devices. Don't do this.

Process only the text nodes recursively:

const span = document.createElement('span');
span.className = 'foo';
span.appendChild(document.createTextNode(''));

// these will display <span> as a literal text per HTML specification
const skipTags = ['textarea', 'rp'];

for (const p of document.getElementsByTagName('p')) {
  const walker = document.createTreeWalker(p, NodeFilter.SHOW_TEXT);
  // collect the nodes first because we can't insert new span nodes while walking
  const textNodes = [];
  for (let n; (n = walker.nextNode());) {
    if (n.nodeValue.trim() && !skipTags.includes(n.parentNode.localName)) {
      textNodes.push(n);
    }
  }
  for (const n of textNodes) {
    const fragment = document.createDocumentFragment();
    for (const s of n.nodeValue.split(/(\s+)/)) {
      if (s.trim()) {
        span.firstChild.nodeValue = s;
        fragment.appendChild(span.cloneNode(true));
      } else {
        fragment.appendChild(document.createTextNode(s));
      }
    }
    n.parentNode.replaceChild(fragment, n);
  }
}

Since we may be replacing thousands of nodes, this code tries to be fast: it uses the TreeWalker API and DOM cloning, skips potentially very long sequences of spaces and line breaks with a simple regular expression (\s+), and uses a DocumentFragment to insert the new nodes in a single mutation operation. And, of course, it doesn't use jQuery.
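The role of the capturing group in split(/(\s+)/) may be easier to see in isolation. The following is a minimal sketch, not part of the answer's code: tokenize is a hypothetical helper that classifies the pieces the split produces, and it runs without a DOM.

```javascript
// Hypothetical helper (an illustration only, not part of the answer above):
// classify the pieces produced by splitting on a capturing whitespace group.
function tokenize(text) {
  return text
    .split(/(\s+)/)            // the capturing group keeps the whitespace runs in the result
    .filter((s) => s !== '')   // drop the empty strings produced by leading/trailing whitespace
    .map((s) => ({ type: s.trim() ? 'word' : 'space', value: s }));
}

const tokens = tokenize('  Sed ut\n   perspiciatis ');

// Print each piece with its classification; JSON.stringify makes the whitespace visible.
for (const t of tokens) {
  console.log(t.type, JSON.stringify(t.value));
}
```

Because the whitespace runs are kept as pieces of their own, concatenating the values reproduces the input exactly. That is why the loop in the answer can copy them back as plain text nodes and preserve the original spacing, while only the non-blank pieces get a span.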

P.S. There are advanced libraries for much more complex matching and processing like mark.js.

Answer 2:

Welcome to Stack Overflow. I would not use .text(), as it reads only the text of the element, whereas .html() reads both the markup and the text. Basically, you will want to iterate over the content of the <p>, find each word (by looking for spaces " "), and wrap each one in <span> tags. You also need to be aware of HTML elements: they should be retained, but their content should be wrapped in the same manner.

$(function() {
  function wrapWord(w) {
    return "<span>" + w + "</span>";
  }

  function markupText(str) {
    var html = $.parseHTML(str);
    var nodes = [];
    $.each(html, function(i, el) {
      nodes.push({
        name: el.nodeName,
        type: el.nodeType,
        content: el.nodeValue,
        element: el
      });
    });
    console.log(html, nodes);
    var parts = [];
    var t;
    $.each(nodes, function(k, item) {
      if (item.type == 3) {
        // Text node: wrap each whitespace-separated word, skipping empty pieces
        t = item.content.trim().split(/\s+/);
        $.each(t, function(j, s) {
          if (s) parts.push(wrapWord(s));
        });
      } else {
        // Element node: rebuild its content from its text, then keep its outer markup
        t = $(item.element).text().trim().split(/\s+/);
        $(item.element).html("");
        $.each(t, function(j, s) {
          if (s) $(item.element).append(wrapWord(s) + " ");
        });
        parts.push($(item.element).prop("outerHTML"));
      }
    });
    console.log(parts);
    return parts.join(" ");
  }

  $("p.highlight").html(markupText($("p.highlight").prop("innerHTML")));
});

p.highlight span {
  background-color: #FF0;
}

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut tristique ex vitae nisi sodales accumsan. Aenean eget maximus purus. Donec ornare bibendum purus, et tincidunt nibh accumsan sed. Sed mi purus, aliquam et varius eu, congue vel neque. Vivamus bibendum velit ut posuere semper.</p>
<p class="highlight">Sed ut <a>perspiciatis unde omnis</a> iste natus <em>error sit</em> voluptatem <strong>accusantium</strong> doloremque laudantium</p>

From the example above, we get an array of parts:

[
  "<span>Sed</span>",
  "<span>ut</span>",
  "<a><span>perspiciatis</span> <span>unde</span> <span>omnis</span> </a>",
  "<span>iste</span>",
  "<span>natus</span>",
  "<em><span>error</span> <span>sit</span> </em>",
  "<span>voluptatem</span>",
  "<strong><span>accusantium</span> </strong>",
  "<span>doloremque</span>",
  "<span>laudantium</span>"
]

We can then simply .join() them with a " " and feed the result back into the <p> as new HTML content.
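One caveat when building HTML strings this way: a text node's nodeValue is already entity-decoded, so a word such as < or & would be re-parsed as markup when the string is fed back as HTML. The following is a minimal sketch of one way to guard against that, assuming two hypothetical helpers (escapeHtml and wrapWords) that are not part of the answer's code.

```javascript
// Hypothetical helpers (assumptions for illustration, not part of the answer above).

// Escape the characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')   // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wrap every whitespace-separated word in a span, escaping it first.
function wrapWords(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)          // drop empty pieces from leading/trailing whitespace
    .map((w) => '<span>' + escapeHtml(w) + '</span>')
    .join(' ');
}

const html = wrapWords('  a < b &  c ');
console.log(html);
```

Working on text nodes directly, as in Answer 1, avoids the issue altogether because no HTML string is ever re-parsed.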

Hope that helps.

Source: https://stackoverflow.com/questions/57913199/how-to-split-each-word-in-p-html-considering-other-elements-inside

Tags

javascript