Regular Expression for accurate word-count using JavaScript

后端未结


I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.

One solution I had found is as fo

7 Answers

渐次进展

2020-12-01 07:16
This should do what you're after:
```
value.match(/\S+/g).length;
```
Rather than splitting the string, you're matching on any sequence of non-whitespace characters.

There's the added bonus of being easily able to extract each word if needed ;)
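
One caveat worth sketching: `match` with the `g` flag returns `null` when nothing matches, so reading `.length` directly throws on an empty or whitespace-only textarea. A minimal guard, assuming a helper name (`countNonWhitespaceRuns`) that is purely illustrative, might look like this:

```javascript
// Hypothetical helper (not from the answer): counts runs of non-whitespace.
function countNonWhitespaceRuns(value) {
  // match() returns null when there is no match, so fall back to [].
  const matches = value.match(/\S+/g) || [];
  return matches.length;
}

console.log(countNonWhitespaceRuns("Lorem ipsum dolor sit amet")); // 5
console.log(countNonWhitespaceRuns("   "));                        // 0
```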
忘掉有多难

2020-12-01 07:17
The correct regexp would be /\s+/ in order to discard non-words:
```
'Lorem ipsum dolor , sit amet'.split(/\S+/g).length
7
'Lorem ipsum dolor , sit amet'.split(/\s+/g).length
6
```
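
To see where the 7 and the 6 come from, it may help to print the arrays themselves. This is only an illustration using the same sample string; the variable names are made up for the example:

```javascript
const text = 'Lorem ipsum dolor , sit amet';

// Splitting on the non-whitespace runs leaves only the gaps between them,
// plus an empty string at each end of the array.
const gaps = text.split(/\S+/);

// Splitting on whitespace leaves the tokens. The lone comma is one of them.
const tokens = text.split(/\s+/);

console.log(gaps.length, gaps);     // 7 pieces: '', five single spaces, ''
console.log(tokens.length, tokens); // 6 pieces: 'Lorem', 'ipsum', 'dolor', ',', 'sit', 'amet'
```

Note that the whitespace split still counts the stray comma as a token.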
鱼传尺愫

2020-12-01 07:21
Try counting runs of non-whitespace characters that are delimited by word boundaries:
```
(value.match(/\b\S+\b/g) || []).length
```
You could also try to use Unicode ranges, but I am not sure if the following one is complete:
```
(value.match(/[\u0080-\uFFFF\w]+/g) || []).length
```
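
As an alternative to a hand-written range, engines that support ES2018 Unicode property escapes can match letters and digits by category. The following is a sketch under that assumption, with an illustrative helper name; it is not the answerer's method:

```javascript
// Hypothetical helper: counts runs of Unicode letters, digits and underscores.
function countUnicodeWords(value) {
  // \p{L} is any letter and \p{N} any number; the u flag enables \p{...}.
  const matches = value.match(/[\p{L}\p{N}_]+/gu) || [];
  return matches.length;
}

// "naïve café, straße" written with escapes to avoid encoding surprises.
const accented = "na\u00EFve caf\u00E9, stra\u00DFe";
console.log(countUnicodeWords(accented));         // 3
console.log((accented.match(/\w+/g) || []).length); // 5, since \w is ASCII-only
```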
一生所求

2020-12-01 07:26
Try
```
    value.match(/\w+/g).length;
```
This will match a string of characters that can be in a word. Whereas something like:
```
    value.match(/\S+/g).length;
```
will give an incorrect count if the user types commas or other punctuation without a following space, or types a comma with a space on either side of it.
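
A quick side-by-side comparison makes the difference concrete. The sample strings and the `count` helper below are assumptions chosen for illustration; note that `\w+` has its own quirk with apostrophes:

```javascript
// Hypothetical helper: number of matches of re in value (0 when none).
const count = (value, re) => (value.match(re) || []).length;

const samples = ["Hello,world", "dolor , sit", "don't stop"];
for (const s of samples) {
  console.log(s, "| \\S+:", count(s, /\S+/g), "| \\w+:", count(s, /\w+/g));
}
// Hello,world | \S+: 1 | \w+: 2
// dolor , sit | \S+: 3 | \w+: 2
// don't stop  | \S+: 2 | \w+: 3
```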
死守一世寂寞

2020-12-01 07:36
For me this gave the best results:
```
value.split(/\b\W+\b/).length
```
with
```
var words = value.split(/\b\W+\b/)
```
you get all words.

Explanation:
- \b is a word boundary
- \W is a NON-word character, capital usually means the negation
- '+' means one or more of the preceding character or character class
I recommend learning regular expressions. It's a great skill to have because they are so powerful. ;-)
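
For illustration, here is what that split produces on a sample sentence with a stray comma. The sample text is an assumption; the empty-string case is included because `split` always returns at least one element:

```javascript
const sample = "Lorem ipsum dolor , sit amet";

// Each separator is a run of non-word characters sitting between two words.
const words = sample.split(/\b\W+\b/);

console.log(words);        // [ 'Lorem', 'ipsum', 'dolor', 'sit', 'amet' ]
console.log(words.length); // 5

// An empty input still yields one (empty) element, so the count is 1.
console.log("".split(/\b\W+\b/).length); // 1

// Trailing punctuation is not followed by a word, so it stays attached.
console.log("Hello world.".split(/\b\W+\b/)); // [ 'Hello', 'world.' ]
```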
余生分开走

2020-12-01 07:36

You could extend or change your method like this if you want to match things like email addresses as well:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b(.*?)\b/).length - 1;

and

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.trim().split(/\s+/g).length -1;

Also try using \s: in JavaScript, \w matches only ASCII letters, digits, and the underscore, whereas \s also matches Unicode whitespace.

Source: http://www.regular-expressions.info/charclass.html
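
The trim-then-split idea can be sketched as a small function. The name `countByWhitespace` is hypothetical, and the empty-input check is an addition: `"".split(/\s+/)` returns `[""]`, while for non-empty trimmed text the split already yields exactly one element per token, so no further adjustment is needed there.

```javascript
// Hypothetical helper: counts whitespace-separated tokens.
function countByWhitespace(value) {
  const trimmed = value.trim();
  // "".split(/\s+/) yields [""], so handle empty input separately.
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// \u00A0 (no-break space) is matched by \s in JavaScript.
console.log(countByWhitespace("  Lorem ipsum\u00A0dolor \n")); // 3
console.log(countByWhitespace(""));                            // 0
```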
