How to get possibly overlapping matches in a string

I'm looking for a way, either in Ruby or JavaScript, that will give me all matches, possibly overlapping, within a string against a regexp.

Let's say I have

8 Answers

终归单人心

2020-12-03 17:53
This JavaScript approach has an advantage over Wiktor's answer: it iterates the substrings of the input lazily with a generator function. A for...of loop can then consume one match at a time, even for very large input strings, instead of building the whole array of matches up front. That matters because the number of substrings grows quadratically with the string's length, so collecting them all at once can run out of memory:
```
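// Lazily yield every substring of str, shortest first.
function * substrings (str) {
  for (let length = 1; length <= str.length; length++) {
    for (let i = 0; i <= str.length - length; i++) {
      yield str.slice(i, i + length);
    }
  }
}

// Lazily yield every substring that the pattern matches in full.
function * matchSubstrings (str, re) {
  // Drop the g/y flags: they would make test() depend on lastIndex.
  const flags = re.flags.replace(/[gy]/g, '');
  // Group the source so the anchors also apply to alternations like a|b.
  const subre = new RegExp(`^(?:${re.source})$`, flags);

  for (const substr of substrings(str)) {
    if (subre.test(substr)) yield substr;
  }
}

for (const match of matchSubstrings('abcabc', /a.*c/)) {
  console.log(match);
}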
```
小蘑菇

2020-12-03 17:55
```
▶ str = "abcadc"
▶ from = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'a' }.compact
▶ to   = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'c' }.compact
▶ from.product(to).select { |f,t| f < t }.map { |f,t| str[f..t] }
#⇒ [
#  [0] "abc",
#  [1] "abcadc",
#  [2] "adc"
# ]
```
I believe there is a fancy way to find all indices of a character in a string, but I was unable to find it :( Any ideas?

Splitting on a “Unicode character boundary” makes it work with strings like 'ábĉ' or 'Üve Østergaard'.

For a more generic solution that accepts any “from” and “to” sequences, only a small modification is needed: find all indices of “from” and “to” in the string.
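
A rough sketch of that generalization, in case it helps. The helper names `indices_of` and `spans_between` are made up for this example, and it assumes that a “to” occurrence only counts when it starts at or after the end of the “from” occurrence. The indices are collected by calling `String#index` repeatedly with a start offset, which may or may not be the fancy way asked about above:

```ruby
# Hypothetical helper: start offsets of every occurrence of `needle`
# in `str`, including overlapping ones.
def indices_of(str, needle)
  result = []
  pos = str.index(needle)
  while pos
    result << pos
    # Resume one character later so overlapping hits are found too.
    pos = str.index(needle, pos + 1)
  end
  result
end

# Hypothetical helper: every substring that starts with `from`
# and ends with a later occurrence of `to`.
def spans_between(str, from, to)
  starts = indices_of(str, from)
  stops  = indices_of(str, to)
  starts.product(stops)
        # Assumption: `to` must begin at or after the end of `from`.
        .select { |f, t| t >= f + from.length }
        .map { |f, t| str[f...(t + to.length)] }
end

p spans_between("abcadc", "a", "c")
```

For single characters this gives the same three substrings as the session above. `String#index` works on character offsets rather than bytes, so the split on letter boundaries should not be needed here.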