JavaScript - regex - word boundary (\b) issue

悲哀的现实 2020-12-01 16:00

I am having difficulty using \b with Greek characters in a regex.

In this example, [a-zA-ZΆΈ-ώἀ-ῼ]* succeeds in matching all the words I want (both

3 answers

鱼传尺愫 (original poster)

2020-12-01 16:53
You can use \S

Rather than writing a match for "word characters plus these characters", it may be appropriate to use a regex that matches any non-whitespace character:
```
\S
```
It's broader in scope, but simpler to write/use.

If that's too broad - use an exclusive list rather than an inclusive list:
```
[^\s\.]
```
That is - any character that is not whitespace and not a dot. In this way it's also easy to add to the exceptions.
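As a rough sketch of how the two classes differ, the snippet below runs both against a short mixed Greek/English sample. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample text mixing Greek and ASCII words.
const sample = "γεια σου. Hello world.";

// \S+ grabs every run of non-whitespace, so trailing dots stay attached.
const nonWhitespace = sample.match(/\S+/g);

// [^\s.]+ also excludes the dot, so the punctuation is left out of each match.
const nonWhitespaceOrDot = sample.match(/[^\s.]+/g);

console.log(nonWhitespace);      // [ 'γεια', 'σου.', 'Hello', 'world.' ]
console.log(nonWhitespaceOrDot); // [ 'γεια', 'σου', 'Hello', 'world' ]
```

Adding another excluded character (a comma, say) only means extending the negated class, e.g. `[^\s.,]`.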

Don't try to use \b

Word boundaries don't work with non-ASCII characters, which is easy to demonstrate:
```
> "yay".match(/\b.*\b/)
["yay"]
> "γaγ".match(/\b.*\b/)
["a"]
```
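Therefore \b cannot be used to detect words containing Greek characters: JavaScript treats every Greek letter as a non-word character, so a boundary is reported wherever one sits next to an ASCII letter, digit, or underscore, and never at the edges of a purely Greek word.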
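One possible workaround, which is not part of the original answer, is to emulate a word boundary with Unicode property escapes and lookarounds. This sketch assumes an engine that supports the `u` flag, `\p{L}`, and lookbehind (ES2018 or later); the sample text and variable names are illustrative only.

```javascript
// Hypothetical sample: one Greek word and one ASCII word.
const text = "αβγ abc";

// \b and \w only know ASCII word characters, so the Greek word is skipped.
const asciiOnly = text.match(/\b\w+\b/g);

// Emulated boundaries: a run of letters neither preceded nor followed by a letter.
const unicodeWords = text.match(/(?<!\p{L})\p{L}+(?!\p{L})/gu);

console.log(asciiOnly);    // [ 'abc' ]
console.log(unicodeWords); // [ 'αβγ', 'abc' ]
```

Note that `\p{L}` covers letters only; digits, underscores, and combining marks would need to be added to the class if they should count as part of a word.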

Match 2 character words

The following pattern can be used to match two-character words:
```
const pattern = /(^|[\s\.,])(\S{2})(?=$|[\s\.,])/g;
```
(More accurately: to match sequences of exactly two non-whitespace characters.)

That is:
```
(^|[\s\.,])   - start of string or whitespace/punctuation (capture group 1)
(\S{2})       - two non-whitespace characters (capture group 2)
(?=$|[\s\.,]) - end of string or whitespace/punctuation (positive lookahead, not consumed)
```
That pattern can be used like so to remove matching words while keeping the delimiter captured in group 1:
```
"input string".replace(pattern, "$1");
```
Here's a jsfiddle demonstrating the pattern's use on the texts in the question.
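A minimal end-to-end sketch of the removal, assuming the replacement should keep the leading delimiter (capture group 1) and drop the two-character word. The sample sentence is made up for illustration and is not one of the texts from the question.

```javascript
// Pattern from the answer: delimiter, two non-whitespace chars, lookahead.
const pattern = /(^|[\s\.,])(\S{2})(?=$|[\s\.,])/g;

// Hypothetical input mixing Greek and ASCII two-character words.
const input = "να δω και γεια σου, is it ok";

// Keep the leading delimiter ($1) and drop the two-character word.
const stripped = input.replace(pattern, "$1");

// Collapse the spaces left behind by the removed words.
const tidy = stripped.replace(/\s+/g, " ").trim();

console.log(tidy); // και γεια σου,
```

Because the trailing delimiter sits in a lookahead, it is not consumed and can serve as the leading delimiter of the next match, which is what lets adjacent two-character words ("is it ok") all be removed in one pass.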