Regular Expression for accurate word-count using JavaScript

陌清茗 2020-12-01 07:15

I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.

One solution I had found is as fo

7 Answers
  • 2020-12-01 07:16

    This should do what you're after:

    value.match(/\S+/g).length;
    

    Rather than splitting the string, you're matching on any sequence of non-whitespace characters.

    There's the added bonus of being easily able to extract each word if needed ;)
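
    One caveat: match() returns null when nothing matches (an empty or whitespace-only textarea), so reading .length directly would throw. A minimal sketch with a guard, where the helper name countWords is only an illustrative choice and not something from the question:

```javascript
// Hypothetical helper; `countWords` is an illustrative name.
function countWords(value) {
  // match() returns null when there is no match, so fall back to an empty array.
  const tokens = value.match(/\S+/g) || [];
  return tokens.length;
}

console.log(countWords("Lorem ipsum dolor sit amet")); // 5
console.log(countWords("   "));                        // 0
```

    The tokens array is also what you would keep if you need the individual words.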

  • 2020-12-01 07:17

    The correct regexp would be /\s+/ in order to discard non-words:

    'Lorem ipsum dolor , sit amet'.split(/\S+/g).length
    7
    'Lorem ipsum dolor , sit amet'.split(/\s+/g).length
    6
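
    Splitting on /\s+/ is sensitive to leading or trailing whitespace, which produces empty strings at the ends of the array. A small sketch of one way to handle that, assuming a hypothetical helper named countBySplit:

```javascript
// Hypothetical helper; `countBySplit` is an illustrative name.
function countBySplit(value) {
  // trim() removes the outer whitespace; filter(Boolean) drops the single
  // empty string that split() returns for an empty input.
  return value.trim().split(/\s+/).filter(Boolean).length;
}

console.log("  Lorem ipsum  ".split(/\s+/).length); // 4 (two empty strings at the ends)
console.log(countBySplit("  Lorem ipsum  "));       // 2
console.log(countBySplit(""));                      // 0
```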
    
  • 2020-12-01 07:21

    Try to count anything that is not whitespace and with a word boundary:

    (value.match(/\b\S+\b/g) || []).length
    

    You could also try to use unicode ranges, but I am not sure if the following one is complete:

    (value.match(/[\u0080-\uFFFF\w]+/g) || []).length
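
    The \u0080-\uFFFF range also covers non-letters such as the non-breaking space (U+00A0) and curly quotes. As an alternative sketch, assuming an ES2018+ engine that supports Unicode property escapes, letters and digits can be named directly; the helper name countUnicodeWords is hypothetical:

```javascript
// Assumes an ES2018+ engine: \p{...} requires the `u` flag.
// `countUnicodeWords` is an illustrative name.
function countUnicodeWords(value) {
  // \p{L} = any letter, \p{M} = combining marks, \p{N} = any number.
  const words = value.match(/[\p{L}\p{M}\p{N}_]+/gu) || [];
  return words.length;
}

console.log(countUnicodeWords("naïve café, déjà vu")); // 4
console.log(countUnicodeWords("Привет мир"));          // 2
```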
    
  • 2020-12-01 07:26

    Try

        value.match(/\w+/g).length;
    

    This will match a string of characters that can be in a word. Whereas something like:

        value.match(/\S+/g).length;
    

    will result in an incorrect count if the user adds commas or other punctuation that is not followed by a space - or adds a comma with a space either side of it.
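
    To make the difference concrete, here is a small comparison of the two patterns on a few sample strings. The count helper and the samples are illustrative only. Note that \w+ has its own edge case: an apostrophe splits a contraction in two.

```javascript
// Illustrative helper: number of matches of `re` in `value` (0 if none).
const count = (value, re) => (value.match(re) || []).length;

const samples = ["one,two", "one , two", "don't stop"];
for (const s of samples) {
  // Prints the sample, then the \S+ count, then the \w+ count.
  console.log(JSON.stringify(s), count(s, /\S+/g), count(s, /\w+/g));
}
// "one,two" 1 2
// "one , two" 3 2
// "don't stop" 2 3
```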

  • 2020-12-01 07:36

    For me this gave the best results:

    value.split(/\b\W+\b/).length
    

    with

    var words = value.split(/\b\W+\b/)
    

    you get all words.

    Explanation:

    • \b is a word boundary
    • \W is a non-word character; a capital letter usually means the negation of the lowercase class
    • + means one or more of the preceding character or character class
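
    A short sketch of how this pattern behaves, using a hypothetical splitWords wrapper. Because \b only occurs next to a word character, punctuation at the very start or end of the text is not split off, and an empty string still yields one (empty) element:

```javascript
// Illustrative wrapper around the split shown above.
const splitWords = (value) => value.split(/\b\W+\b/);

const words = splitWords("Hello, world! How are you?");
console.log(words);        // five elements: Hello, world, How, are, you?
console.log(words.length); // 5

// Edge case: split() on an empty string returns [""], so the length is 1.
console.log(splitWords("").length); // 1
```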

    I recommend learning regular expressions. It's a great skill to have because they are so powerful. ;-)

  • 2020-12-01 07:36

    You could extend or change your methods like this, if you want to match things like email addresses as well:

    document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b(.*?)\b/).length - 1;

    and

    document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.trim().split(/\s+/g).filter(Boolean).length;

    Also try using \s, since it matches Unicode whitespace, whereas \w only matches ASCII word characters.

    Source: http://www.regular-expressions.info/charclass.html
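
    A quick sketch of that difference in JavaScript, using a made-up sample string: \w stops at accented letters, while \s recognizes the non-breaking space as whitespace.

```javascript
// Sample text is illustrative; U+00A0 is a non-breaking space.
const text = "naïve\u00A0café";

// \w is ASCII-only ([A-Za-z0-9_]), so accented letters break words apart.
const byWordChars = text.match(/\w+/g);
console.log(byWordChars); // three fragments: na, ve, caf

// \s covers Unicode whitespace, so the non-breaking space separates the words.
const byWhitespace = text.split(/\s+/);
console.log(byWhitespace); // two words: naïve, café
```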
