Whole word regex matching and hyperlinking in Javascript

Submitted by 孤街浪徒 on 2021-02-04 08:10:35

Question


I need a little help with Regular Expressions.

I'm using JavaScript and jQuery to hyperlink terms within an HTML document, using the code below. I need to do this for a number of terms in a very large document.

var searchterm = "Water";

jQuery('#content p').each(function() {

  var content = jQuery(this),
      txt = content.html(),
      found = content.find(searchterm).length,
      regex = new RegExp('(' + searchterm + ')(?![^(<a.*?>).]*?<\/a>)','gi');

  if (found != -1) {
    //hyperlink the search term
    txt = txt.replace(regex, '<a href="/somelink">$1</a>');
    content.html(txt);
  }
});

There are, however, a number of instances I do not want to match. Due to time constraints and brain melt, I'm reaching out for some assistance.


EDIT: I've updated the codepen below based on the excellent example provided by @ggorlen, thank you!

Example https://codepen.io/julian-young/pen/KKwyZMr


Answer 1:


Dumping the entire DOM to raw text and parsing it with regex circumvents the primary purpose of jQuery (and JS, by extension), which is to traverse and manipulate the DOM as an abstract tree of nodes.

Text nodes have a nodeType Node.TEXT_NODE which we can use in a traversal to identify the non-link nodes you're interested in.

After obtaining a text node, regex can be applied appropriately (parsing text, not HTML). I used <mark> for demonstration purposes, but you can make this an anchor tag or whatever you need.
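To see what the pattern does on plain text before any DOM work, here is a minimal sketch that runs the same regex over a few of the sample sentences. The helper name `markWater` and the `samples` array are made up for illustration.

```javascript
// Same pattern as in the answer: "water" or "Water", an optional plural "s",
// not followed by a hyphen, with a word boundary on both sides.
const pattern = /(\b[Ww]aters?(?!-)\b)/g;

// markWater is a hypothetical helper name used only in this sketch.
function markWater(text) {
  return text.replace(pattern, "<mark>$1</mark>");
}

const samples = [
  "We all love water.",
  "The beautiful waters of the world",
  "and all other water-related subjects",
  "and this watery topic of",
  "of WaterStewardship looks at how best",
];

// Only the first two sentences come back with a <mark> wrapper.
for (const sample of samples) {
  console.log(markWater(sample));
}
```

Note that the lookahead only guards the right-hand side, so a word such as "fresh-water" would still match; a lookbehind would be needed to exclude it.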

jQuery provides a replaceWith method that replaces the node itself (here, the text node) with new content. The string produced by the regex substitution is parsed as HTML and inserted in the text node's place.

$('#content li').contents().each(function () {
  if (this.nodeType === Node.TEXT_NODE) {    
    var pattern = /(\b[Ww]aters?(?!-)\b)/g;
    var replacement = '<mark>$1</mark>';
    $(this).replaceWith(this.nodeValue.replace(pattern, replacement));
  }
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Example Content</h1>
<div id="content">
  <ul>
    <li>Water is a fascinating subject. - <strong>match</strong></li>
    <li>We all love water. - <strong>match</strong></li>
    <li>ice; water; steam - <strong>match</strong></li>
    <li>The beautiful waters of the world - <strong>match</strong> (including the s)</li>
    <li>and all other water-related subjects - <strong>no match</strong></li>
    <li>and this watery topic of - <strong>no match</strong></li>
    <li>of WaterStewardship looks at how best - <strong>no match</strong></li>
    <li>On the topic of <a href="/governance">water governance</a> - <strong>no match</strong></li>
    <li>and other <a href="/water">water</a> related things - <strong>no match</strong></li>
    <li>the best of <a href="/allthingswater">all things water</a> - <strong>no match</strong></li>
  </ul>
</div>
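The question builds its regex from a `searchterm` variable, while the pattern above is hardcoded for "water". One way to generalize it is sketched below. The names `escapeRegExp` and `buildTermPattern` are hypothetical, and the `i` flag is an assumption carried over from the question's code, so unlike `[Ww]` it also matches "WATER".

```javascript
// Escape characters that are special in a regular expression so the
// search term is matched literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// buildTermPattern is a hypothetical helper: whole word, optional plural "s",
// not followed by a hyphen, case-insensitive (assumption, see lead-in).
function buildTermPattern(term) {
  return new RegExp("\\b(" + escapeRegExp(term) + "s?)(?!-)\\b", "gi");
}

const waterPattern = buildTermPattern("Water");
console.log("We love water and Waters".replace(waterPattern, "[$1]"));
console.log("water-related watery".replace(waterPattern, "[$1]"));
```

Because the pattern relies on `\b`, it suits terms that start and end with a letter, digit, or underscore. Terms ending in punctuation would need a different boundary check.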

You can also do this without jQuery and apply it to every element in the document that is not itself a link:

for (const parent of document.querySelectorAll("body *:not(a)")) {
  for (const child of parent.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      const pattern = /(\b[Ww]aters?(?!-)\b)/g;
      const replacement = "<mark>$1</mark>";
      const subNode = document.createElement("span");
      subNode.innerHTML = child.textContent.replace(pattern, replacement);
      parent.insertBefore(subNode, child);
      parent.removeChild(child);
    }    
  }
}
<div>
  hello water
  <div>
    <div>
      I love Water.
      <a href="">more water</a>
    </div>
    watership down
    <h4>watery water</h4>
    <p>
      waters
    </p>
    foobar <a href="">water</a> water
  </div>
</div>
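Both versions above hand a string built from text content back to an HTML parser (`replaceWith` with a string in jQuery, `innerHTML` here), so text that contains a literal `<` can be re-read as markup. A possible alternative, sketched below, builds the replacement from text nodes and `<mark>` elements instead. The helper names `splitByPattern`, `markTextNode` and `markAll` are hypothetical, and the last two assume a browser environment.

```javascript
// Split text into alternating plain and matched segments.
// The pattern must carry the "g" flag, which matchAll requires.
function splitByPattern(text, pattern) {
  const segments = [];
  let last = 0;
  for (const match of text.matchAll(pattern)) {
    if (match.index > last) {
      segments.push({ text: text.slice(last, match.index), matched: false });
    }
    segments.push({ text: match[0], matched: true });
    last = match.index + match[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.slice(last), matched: false });
  }
  return segments;
}

// Browser-only: swap one text node for text nodes plus <mark> elements.
function markTextNode(textNode, pattern) {
  const segments = splitByPattern(textNode.nodeValue, pattern);
  if (!segments.some((segment) => segment.matched)) return; // nothing to mark
  const fragment = document.createDocumentFragment();
  for (const segment of segments) {
    if (segment.matched) {
      const mark = document.createElement("mark");
      mark.textContent = segment.text; // inserted as text, never parsed as HTML
      fragment.appendChild(mark);
    } else {
      fragment.appendChild(document.createTextNode(segment.text));
    }
  }
  textNode.replaceWith(fragment);
}

// Browser-only: collect text nodes first, then mutate, skipping link text.
function markAll(root, pattern) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  while (walker.nextNode()) {
    if (!walker.currentNode.parentElement?.closest("a")) {
      textNodes.push(walker.currentNode);
    }
  }
  textNodes.forEach((node) => markTextNode(node, pattern));
}
```

In a browser this could be called as `markAll(document.body, /\b[Ww]aters?(?!-)\b/g)`. The `closest("a")` check also skips text nested deeper inside a link, which the `body *:not(a)` selector does not. Text inside `<script>` or `<style>` elements is still visited, so a narrower root may be preferable.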


Source: https://stackoverflow.com/questions/59581570/whole-word-regex-matching-and-hyperlinking-in-javascript
