JavaScript regular expression for word boundaries, tolerating in-word hyphens and apostrophes

会有一股神秘感。 Submitted on 2019-12-13 07:26:31

Question


I'm looking for a Regular Expression for JavaScript that will identify word boundaries in English, while accepting hyphens and apostrophes that appear inside words, but excluding those that appear alone or at the beginning or end of a word.

For example, for the sentence ...
  She said - 'That'll be all, Two-Fry.'
... I want the characters shown in square brackets below to be detected:
  She[ ]said[ - ']That'll[ ]be[ ]all[, ]Two-Fry[.']

If I use the regex /[^A-Za-z'-]/g, then "loose" hyphens and apostrophes are not detected (detected characters are in square brackets):
  She[ ]said[ ]-[ ]'That'll[ ]be[ ]all[, ]Two-Fry[.]'
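To make the problem concrete, here is a minimal sketch that runs the regex above over the sample sentence and lists what it reports. The variable names are only illustrative.

```javascript
// The sample sentence and the regex from the question.
const sentence = "She said - 'That'll be all, Two-Fry.'";
const original = /[^A-Za-z'-]/g;

// Collect the index of every character the regex reports.
const detectedIndexes = [...sentence.matchAll(original)].map(m => m.index);

console.log(detectedIndexes); // [3, 8, 10, 19, 22, 26, 27, 35]

// The characters that should also be detected but are not:
console.log(sentence[9], sentence[11], sentence[36]); // - ' '
```

The hyphen at index 9 and the apostrophes at indexes 11 and 36 are missing from the result, even though none of them has a letter on both sides.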

How can I alter my regex so that it detects apostrophes and hyphens that don't have a word character on both sides?

You can test my regex here: https://regex101.com/r/bR8sV1/2

Note: the text I will be working on may contain other writing systems, like русский and ไทย, so it will not be feasible to simply list all the characters that are not part of any English word.


Answer 1:


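You can organize your word-boundary characters into two groups.

  1. Characters that cannot be alone: they count as a boundary only when they are next to another boundary character (for example ' and -).
  2. Characters that can be alone: they are always a boundary (for example whitespace and .).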

A regex that works with your example would be:

[\s.,'-]{2,}|[\s.]


Now all that's left is to keep adding non-word characters to those two groups until the regex fits your needs. So you might start adding symbols and more punctuation to those character classes.
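As a rough check, the regex from this answer can be run against the sample sentence from the question. This is only a sketch: the variable names are made up for illustration, and the regex is used exactly as given above.

```javascript
const sentence = "She said - 'That'll be all, Two-Fry.'";

// First alternative: runs of two or more of whitespace . , ' -
// Second alternative: a single whitespace character or a single .
const boundary = /[\s.,'-]{2,}|[\s.]/g;

// Every boundary the regex finds, in order.
const boundaries = sentence.match(boundary);
console.log(boundaries); // [" ", " - '", " ", " ", ", ", ".'"]

// Splitting on the boundaries leaves the words; filter drops the
// empty string produced by the boundary at the end of the sentence.
const words = sentence.split(boundary).filter(Boolean);
console.log(words); // ["She", "said", "That'll", "be", "all", "Two-Fry"]
```

The apostrophe in That'll and the hyphen in Two-Fry survive because each stands alone between letters, so neither alternative matches it. The same reasoning means a comma with no space after it is not treated as a boundary until the comma is added to the second character class.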




Answer 2:


You could write something like this:

(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*\s(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*

Or the compact version:

(\s|[!-/:-@\[-`\{-~])*\s(\s|[!-/:-@\[-`\{-~])*

The RegExp requires at least one \s (whitespace character) and selects all whitespace and non-alphanumeric characters directly before and after it.

https://regex101.com/r/bR8sV1/4

  • \s matches any whitespace character
  • !-/ every char from ! to /
  • :-@ every char from : to @
  • \[-` every char from [ to `
  • \{-~ every char from { to ~
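A small sketch of the compact version applied to the sample sentence from the question. Two changes are assumptions made for this example, not part of the answer: the groups are non-capturing, so that split() does not insert captured text into the result, and the / inside the character class is escaped. The variable names are illustrative.

```javascript
const sentence = "She said - 'That'll be all, Two-Fry.'";

// Compact regex from the answer, with (?:...) groups and an escaped slash.
// Each match must contain at least one whitespace character.
const boundary = /(?:\s|[!-\/:-@\[-`\{-~])*\s(?:\s|[!-\/:-@\[-`\{-~])*/g;

// Every boundary the regex finds, in order.
const boundaries = sentence.match(boundary);
console.log(boundaries); // [" ", " - '", " ", " ", ", "]

// Splitting on the boundaries leaves the words.
const words = sentence.split(boundary);
console.log(words); // ["She", "said", "That'll", "be", "all", "Two-Fry.'"]
```

Because a match needs a whitespace character, the punctuation at the very end of the sentence (.') is not matched and stays attached to the last word. Trimming punctuation from the start and end of the input would need a separate step.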


Source: https://stackoverflow.com/questions/38935627/javascript-regular-expression-for-word-boundaries-tolerating-in-word-hyphens-an
