Using a TreeWalker to retrieve non-Javascript text nodes

Submitted by 无人久伴 on 2019-12-08 06:35:59

Question


This question shows how to get all the text nodes inside a document, but that approach also returns the JavaScript source text. What is the best way to filter out the nodes that contain JavaScript code?


Answer 1:


Text nodes inside <script> tags have one thing in common: their parent is a <script> element. So you can check the parent while iterating:

if (node.parentNode.nodeName !== 'SCRIPT')
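As a minimal sketch of how that check might be packaged, the helpers below wrap it in a predicate and apply it to a list of text nodes that has already been collected. The names `isScriptText` and `withoutScriptText` are hypothetical, and the comparison assumes an HTML document, where `nodeName` is reported in upper case.

```javascript
// Hypothetical helper: true when a text node sits directly inside <script>.
// Assumes an HTML document, where element nodeName values are upper case.
function isScriptText(node) {
  var parent = node.parentNode;
  // A detached node has no parent, so it cannot be script source.
  return !!parent && parent.nodeName === 'SCRIPT';
}

// Hypothetical helper: keep only the nodes that are not script source.
// Works on arrays and on array-like collections such as a NodeList.
function withoutScriptText(nodes) {
  return Array.prototype.filter.call(nodes, function (node) {
    return !isScriptText(node);
  });
}
```

This keeps the traversal code unchanged and moves the decision into one place, which is convenient when the list of text nodes comes from code you do not control.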

Another approach is to pass a filter to the TreeWalker, so that script text is never returned in the first place:

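var rejectScriptTextFilter = {
  acceptNode: function(node) {
    // Accept text whose parent is not a <script> element.
    if (node.parentNode.nodeName !== 'SCRIPT') {
      return NodeFilter.FILTER_ACCEPT;
    }
    // Return an explicit value for script text instead of undefined.
    return NodeFilter.FILTER_REJECT;
  }
};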

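// The legacy fourth argument (entity reference expansion) is ignored
// by current browsers, so it is omitted here.
var walker = document.createTreeWalker(
  document.body,
  NodeFilter.SHOW_TEXT,
  rejectScriptTextFilter
);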

var node;
var textNodes = [];

while ((node = walker.nextNode())) {
  textNodes.push(node.nodeValue);
}

console.log(textNodes);

HTML used by the snippet:

<script> var str = "script here"; </script>
<p> text here </p>



Answer 2:


You could clone the original document, remove the <script> elements from the clone, and then iterate over the clone's remaining nodes.
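One way this could look is sketched below. The function name `textWithoutScripts` is hypothetical, and the sketch walks the clone with plain recursion over `childNodes` rather than a TreeWalker, which is only one of several ways to iterate the remaining nodes.

```javascript
// Hypothetical helper: collect text from a deep copy of `root`
// after stripping every <script> element from that copy.
function textWithoutScripts(root) {
  // Deep-clone so the live page is left untouched.
  var clone = root.cloneNode(true);

  // Remove each <script> element from the copy.
  var scripts = clone.querySelectorAll('script');
  for (var i = 0; i < scripts.length; i++) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }

  // Gather the values of the remaining text nodes in document order.
  var texts = [];
  (function collect(node) {
    for (var j = 0; j < node.childNodes.length; j++) {
      var child = node.childNodes[j];
      if (child.nodeType === 3) { // 3 is Node.TEXT_NODE
        texts.push(child.nodeValue);
      } else {
        collect(child);
      }
    }
  })(clone);

  return texts;
}
```

In a browser this would be called as `textWithoutScripts(document.body)`. Compared with filtering, cloning costs extra memory on large pages, but the traversal itself needs no special cases.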



Source: https://stackoverflow.com/questions/37178091/using-a-treewalker-to-retrieve-non-javascript-text-nodes
