How do I find the string index of a tag (an element) without counting expanded entities?

Question

I've got a large piece of text in which I want to be able to select a part, storing the selection by its startindex and endindex. (For example, selecting "or" in "word" would give me startindex 1 and endindex 2.)

This all works properly, but I've got a problem with HTML entities such as &amp; (the ampersand).

I've created a small test case that reproduces the issue. You can see in the fiddle below that the startIndex inflates if you select anything beyond the &, because the & is not counted as a single character but as 5 characters: &amp;.

Is there a way to count special characters like the ampersand properly, without screwing up the index?

http://jsfiddle.net/Eqct4/

JavaScript

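$(document).ready(function() {
    $('#textBlock').mouseup(function() {
        var selectionRange = window.getSelection();
        if (!selectionRange.isCollapsed) {
            // Declared with var so it does not leak into the global scope.
            var selectedText = selectionRange.getRangeAt(0).toString();
        }

        // Temporarily make the block editable so execCommand can wrap the selection in <strike>.
        document.getElementById('textBlock').setAttribute('contenteditable', true);
        document.execCommand('strikethrough', false);
        // indexOf on html() counts every entity (e.g. &amp;) at its full escaped length.
        var startIndex = $('#textBlock').html().indexOf('<strike>');
        $('#startindex').html('the startindex is: ' + startIndex);
        done();
    });
});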

function done() {
    document.getElementById('textBlock').setAttribute('contenteditable', false);
    document.getSelection().removeAllRanges();
    removeStrikeFromElement($('#textBlock'));
}

function removeStrikeFromElement (element) {
    element.find('strike').each(function() {
        jQuery(this).replaceWith(removeStrikeFromElement(jQuery(this)));
    });
    return element.html();
}

I think/know it has to do with $('#textBlock').html() being used for the indexOf instead of text(). The best way I found to get a startindex and endindex was to <strike> through the selected text, since execCommand lets me do that and it's an HTML tag never used in the application.

Answer 1:

If you really want to keep your code and only modify it a little, you could replace all special characters with their visible equivalents while keeping the HTML tags. Change your declaration of startIndex to this:

// Decode &amp; last, so that text such as "&amp;quot;" is not decoded twice.
var startIndex = $('#textBlock').html().replace(/&quot;/g, "\"").replace(/&amp;/g, "&").indexOf('<strike>');

You can chain further replace() calls for any other special characters you want to count as normal characters rather than as their HTML-escaped form. In my example I replaced the & and the " characters.

More optimizations are possible in your code; this is just a simple way to fix your problem.

Hope this helps a bit. See the forked fiddle here: http://jsfiddle.net/vQNyv/
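The chained replace() calls can also be written as a single pass over a lookup table, which makes it easier to add entities and avoids decoding the same text twice. A minimal sketch; the names ENTITY_MAP, decodeEntities and indexOfTagDecoded are made up for illustration and are not part of the fiddle:

```javascript
// Hypothetical lookup table; extend it with whatever entities the text contains.
var ENTITY_MAP = {
    '&amp;': '&',
    '&quot;': '"',
    '&#39;': "'",
    '&nbsp;': '\u00a0'
};

// Decode every listed entity in one pass, so "&amp;quot;" becomes "&quot;"
// (the text that was actually typed) and not '"'.
function decodeEntities(html) {
    return html.replace(/&(?:amp|quot|#39|nbsp);/g, function (entity) {
        return ENTITY_MAP[entity];
    });
}

// Index of the first opening tag, counting each listed entity as one character.
function indexOfTagDecoded(html, tagName) {
    return decodeEntities(html).indexOf('<' + tagName + '>');
}

var sampleHtml = 'This is a cool test &amp; <strike>stuff like</strike> that';
console.log(sampleHtml.indexOf('<strike>'));           // 26
console.log(indexOfTagDecoded(sampleHtml, 'strike'));  // 22
```

&lt; and &gt; are left out of the table on purpose: decoding them could turn escaped text into something that looks like a tag, so text containing them would still be counted at its escaped length.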

Answer 2:

The Problem

Using html() returns:

This is a cool test &amp; <strike>stuff like</strike> that

Using text(), however, would return:

This is a cool test & stuff like that

So html() is necessary in order to see the string <strike>, but then of course all special characters are escaped as entities, which they should be. There are ways to hack around this problem, but imagine what would happen if, say, the text was describing HTML itself:

Use the <strike></strike> tags to strike out text.

In this case, you want the escaped interpretation, so that the literal text is not mistaken for a real tag:

Use the &lt;strike&gt;&lt;/strike&gt; tags to strike out text.

That's why the only correct way to approach this would be to iterate through DOM nodes.
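To make the failure concrete: once &lt; and &gt; are decoded, escaped text and real markup become indistinguishable to indexOf. A small sketch with a made-up sample string, roughly what html() might return for text that mentions the tag and also contains a real one:

```javascript
// Made-up sample: the text mentions <strike> literally (escaped) and also
// contains one real <strike> element further on.
var serialized = 'Use &lt;strike&gt; tags to <strike>strike out</strike> text.';

// Naive decoding of the angle-bracket entities.
var decoded = serialized.replace(/&lt;/g, '<').replace(/&gt;/g, '>');

// Finds the literal text, not the element.
console.log(decoded.indexOf('<strike>'));     // 4

// Finds the real element, but the entities before it inflate the index.
console.log(serialized.indexOf('<strike>'));  // 27
```

Neither number is the offset of the real element in the visible text, which is why string replacement alone cannot be made fully reliable.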

The jQuery/DOM Solution

Here's a jsFiddle of my solution, and here's the code:

jQuery.fn.indexOfTag = function (tag) {
    var nodes = this[0].childNodes;
    var chars = 0;
    for (var i = 0; nodes && i < nodes.length; i++) {
        var node = nodes[i];
        var type = node.nodeType;
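        if (type == 3 || type == 4) {
            // Text or CDATA node. Entity reference nodes (type 5) have a null
            // nodeValue, so they cannot be measured with .length.
            // alert('advancing ' + node.nodeValue.length + ' chars');
            chars += node.nodeValue.length;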
        } else if (type == 1) {
            if (node.tagName == tag.toUpperCase()) {
                // alert('found <' + node.tagName + '> at ' + chars + ', returning');
                return chars;
            } else {
                // alert('found <' + node.tagName + '>, recursing');
                var subIndexOfTag = $(node).indexOfTag(tag);
                if (subIndexOfTag == -1) {
                    // alert('did not find <' + tag.toUpperCase() + '> in <' + node.tagName + '>');
                    chars += $(node).text().length;
                } else {
                    // alert('found <' + tag.toUpperCase() + '> in <' + node.tagName + '>');
                    chars += subIndexOfTag;
                    return chars;
                }
            }
        }
    }
    return -1;
};

Uncomment the alert()s to gain insight into what's going on. The nodeType values that matter here are 1 (element), 3 (text) and 4 (CDATA section).
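The same walk can be written without jQuery. The sketch below is not the code from the fiddle; textOffsetOfTag is a hypothetical name, and the text() and el() helpers are stand-ins for DOM nodes so that the example also runs outside a browser:

```javascript
// Returns the number of text characters that precede the first <tagName>
// element inside root, or -1 if there is none. Works on DOM nodes and on any
// object exposing nodeType, nodeValue, tagName and childNodes.
function textOffsetOfTag(root, tagName) {
    var wanted = tagName.toUpperCase();
    var chars = 0;
    var found = false;

    function walk(node) {
        var children = node.childNodes || [];
        for (var i = 0; i < children.length && !found; i++) {
            var child = children[i];
            if (child.nodeType === 3 || child.nodeType === 4) {
                chars += child.nodeValue.length;   // text or CDATA
            } else if (child.nodeType === 1) {
                if (child.tagName.toUpperCase() === wanted) {
                    found = true;                  // stop counting here
                } else {
                    walk(child);                   // descend in document order
                }
            }
        }
    }

    walk(root);
    return found ? chars : -1;
}

// Stand-in node factories (assumptions for this demo only).
function text(value) {
    return { nodeType: 3, nodeValue: value, childNodes: [] };
}
function el(tag, children) {
    return { nodeType: 1, tagName: tag.toUpperCase(), childNodes: children };
}

var root = el('div', [
    text('This is a cool test & '),
    el('b', [text('bold ')]),
    el('strike', [text('stuff like')]),
    text(' that')
]);
console.log(textOffsetOfTag(root, 'strike'));  // 27
```

In a browser the call would look like textOffsetOfTag(document.getElementById('textBlock'), 'strike'). Because text nodes hold the decoded characters, the & counts as one character without any entity handling.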

The jQuery/DOM Solution counting outerHTML

Based on your comments, I think you're saying you do want to count HTML tags (character-wise), but just not the HTML entities. Here's a new jsFiddle of the function itself, and here's a new jsFiddle of it applied to your problem.
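The linked fiddles are not reproduced here. One way to get that behaviour is to find the tag in the raw markup and then count every entity before it as one character. This sketch assumes the input is the string returned by html(), where a literal & in text is always serialized as &amp;. The function name is hypothetical:

```javascript
// Index of needle in the serialized HTML, where tags count at their full
// length but every entity before the match counts as a single character.
function indexCountingEntitiesOnce(html, needle) {
    var rawIndex = html.indexOf(needle);
    if (rawIndex === -1) {
        return -1;
    }
    // Collapse each named or numeric entity to one placeholder character.
    var prefix = html.slice(0, rawIndex);
    return prefix.replace(/&(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/gi, '_').length;
}

var markup = 'A <b>bold</b> test &amp; <strike>stuff</strike>';
console.log(markup.indexOf('<strike>'));                     // 25
console.log(indexCountingEntitiesOnce(markup, '<strike>'));  // 21
```

Because the real markup is never decoded, escaped text such as &lt;strike&gt; cannot be mistaken for the tag. The assumption about & does not hold for the contents of script or style elements, which are serialized unescaped.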

Source: https://stackoverflow.com/questions/16359314/how-do-i-find-the-string-index-of-a-tag-an-element-without-counting-expanded-e

Tags

javascript

jquery

dom

innerhtml

html-entities