I'm looking for a way, either in Ruby or JavaScript, that will give me all matches, possibly overlapping, within a string against a regexp.
Let's say I have
This JavaScript approach has an advantage over Wiktor's answer: a generator function lazily iterates the substrings of the given string, so a for...of loop can consume one match at a time, even for very large input strings. Building the whole array of matches at once could run out of memory, since the number of substrings grows quadratically with the string's length:
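function * substrings (str) {
  // Yield every substring, shortest first, one at a time.
  for (let length = 1; length <= str.length; length++) {
    for (let i = 0; i <= str.length - length; i++) {
      yield str.slice(i, i + length);
    }
  }
}

function * matchSubstrings (str, re) {
  // Anchor the whole pattern; the group keeps alternations such as a|b intact.
  // Dropping the g and y flags keeps test() stateless (no lastIndex carry-over).
  const subre = new RegExp(`^(?:${re.source})$`, re.flags.replace(/[gy]/g, ''));
  for (const substr of substrings(str)) {
    if (subre.test(substr)) yield substr;
  }
}

for (const match of matchSubstrings('abcabc', /a.*c/)) {
  console.log(match);
}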
▶ str = "abcadc"
▶ from = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'a' }.compact
▶ to = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'c' }.compact
▶ from.product(to).select { |f,t| f < t }.map { |f,t| str[f..t] }
#⇒ [
# [0] "abc",
# [1] "abcadc",
# [2] "adc"
# ]
I believe there is a fancier way to find all indices of a character in a string, but I was unable to find one :( Any ideas?
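Splitting on a “Unicode letter boundary” makes it work with strings like 'ábĉ'. One caveat: any non-letter, such as the space in 'Üve Østergaard', stays attached to the preceding chunk, so after it the chunk indices no longer line up with the character indices that str[f..t] uses.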
For a more generic solution that accepts any “from” and “to” sequences, only a small modification is needed: find all indices of “from” and “to” in the string.
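A minimal sketch of that modification, assuming “from” and “to” are plain strings. The helper names indices_of and spans_between are hypothetical and not part of the answer above; String#index with an offset stands in for the split trick, so the results are ordinary character indices and overlapping occurrences are kept.

```ruby
# Hypothetical helper: character indices of every occurrence of `needle`
# in `str`, including overlapping ones.
def indices_of(str, needle)
  result = []
  pos = 0
  # String#index returns nil once nothing is found at or after `pos`.
  while (i = str.index(needle, pos))
    result << i
    pos = i + 1 # advance by one character so overlaps are not skipped
  end
  result
end

# Hypothetical helper: every substring that starts with `from` and ends
# with a later, non-overlapping `to`.
def spans_between(str, from, to)
  starts = indices_of(str, from)
  stops  = indices_of(str, to)
  starts.product(stops)
        .select { |f, t| f + from.length <= t } # `to` begins after `from` ends
        .map { |f, t| str[f...(t + to.length)] }
end

p spans_between("abcadc", "a", "c")
```

For single-character arguments the condition f + from.length <= t reduces to the f < t check used above, so the call on "abcadc" selects the same three substrings.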