Given sentences such as:
Boy has a dog and a cat.
Boy microwaves a gerbil.
Sally owns a cat.
For each sentence I want a list of animals (de
You may use a positive lookbehind:
(?<=^Boy.*?)(?:dog|cat|gerbil)
Or, a variation with word boundaries, so that Boy and the animal names are matched only as whole words:
(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b
See the regex demo
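Here is a small sketch of what the word boundaries change. It assumes C# 9+ top-level statements, and the inputs "Boy has a hotdog and a cat." and "Boyd has a cat." are illustrative strings of my own, not from the question. Without \b, "Boyd" still satisfies ^Boy and "hotdog" still contains dog:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern without word boundaries: "Boyd" counts as "Boy", "hotdog" contains "dog".
const string Plain = @"(?<=^Boy.*?)(?:dog|cat|gerbil)";
// Pattern with word boundaries: Boy and each animal must be whole words.
const string Bounded = @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b";

// Returns every match of `pattern` in `input` as a list of strings.
static List<string> Animals(string input, string pattern) =>
    Regex.Matches(input, pattern)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToList();

foreach (var s in new[] { "Boy has a hotdog and a cat.", "Boyd has a cat." })
{
    Console.WriteLine(s);
    Console.WriteLine($"  without \\b: [{string.Join(", ", Animals(s, Plain))}]");
    Console.WriteLine($"  with \\b:    [{string.Join(", ", Animals(s, Bounded))}]");
}
```

For the first string the pattern without \b returns [dog, cat] (the dog coming from "hotdog"), while the bounded one returns only [cat]; for "Boyd has a cat." only the pattern without \b finds anything.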
The (?<=^Boy.*?) positive lookbehind requires Boy at the start of the string before the consuming part of the pattern is allowed to match. This works because .NET supports variable-length lookbehind.
If your input contains LF (newline) chars, pass the RegexOptions.Singleline option so that . matches newlines, too.
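A minimal sketch of the Singleline effect, assuming C# 9+ top-level statements; the multi-line input "Boy has a dog\nand a cat." and the helper name Find are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Runs the word-boundary lookbehind pattern with the given options.
static List<string> Find(string input, RegexOptions options) =>
    Regex.Matches(input, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b", options)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToList();

// Illustrative input: the second animal sits after a line feed.
var text = "Boy has a dog\nand a cat.";

// Without Singleline, .*? in the lookbehind cannot cross "\n", so "cat" is missed.
Console.WriteLine(string.Join(", ", Find(text, RegexOptions.None)));
// With Singleline, . also matches "\n", so both animals are found.
Console.WriteLine(string.Join(", ", Find(text, RegexOptions.Singleline)));
```

The first call should print only dog, the second dog, cat.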
C# usage:
var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
C# demo:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var strs = new List<string>() { "Boy has a dog and a cat.",
                                "Boy something a gerbil.",
                                "Sally owns a cat." };
foreach (var s in strs)
{
    var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
    if (results.Count > 0)
    {
        Console.WriteLine("{0}:\n[{1}]\n------", s, string.Join(", ", results));
    }
    else
    {
        Console.WriteLine("{0}:\nNO MATCH!\n------", s);
    }
}
Output:
Boy has a dog and a cat.:
[dog, cat]
------
Boy something a gerbil.:
[gerbil]
------
Sally owns a cat.:
NO MATCH!
------
There is an alternative: match only in strings that start with Boy, and after each successful match, continue matching only from where that match ended:
(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b
See this regex demo.
You would then just need to grab the Group 1 contents:
var results = Regex.Matches(s, @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
See this C# demo.
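A self-contained sketch, assuming C# 9+ top-level statements, that runs both patterns over the sample strings used above and reports whether they return the same animals. The helper names ViaLookbehind and ViaContiguousMatches are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Lookbehind version: the whole match is the animal name.
static List<string> ViaLookbehind(string input) =>
    Regex.Matches(input, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToList();

// \G version: each match also spans the text before the animal,
// so the animal name is read from capture group 1.
static List<string> ViaContiguousMatches(string input) =>
    Regex.Matches(input, @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b")
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToList();

var strs = new List<string> { "Boy has a dog and a cat.", "Boy something a gerbil.", "Sally owns a cat." };

foreach (var s in strs)
{
    var viaLookbehind = ViaLookbehind(s);
    var viaG = ViaContiguousMatches(s);
    Console.WriteLine($"{s}: [{string.Join(", ", viaG)}], same as lookbehind: {viaLookbehind.SequenceEqual(viaG)}");
}
```

For these three inputs both helpers should yield [dog, cat], [gerbil] and an empty list, so every line should end with True.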
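Here,
- (?:\G(?!\A)|^Boy\b) - either the end of the previous match (\G(?!\A); the (?!\A) lookahead keeps \G from matching at the very start of the string, where there is no previous match) or the start of the string followed by the whole word Boy
- .*? - any 0+ chars other than a newline (if RegexOptions.Singleline is not passed to the Regex constructor), as few as possible
- \b(dog|cat|gerbil)\b - a whole word dog, cat or gerbil, captured into Group 1

Basically, these regexes are similar, although the \G-based regex might turn out a bit faster, likely because the lookbehind version has to re-check ^Boy\b.*? behind every candidate position, while the \G version only resumes from where the previous match ended.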
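To see the \G chaining in action, here is a short sketch (assuming C# 9+ top-level statements) that prints the position, length and text of each match on one of the sample strings:

```csharp
using System;
using System.Text.RegularExpressions;

var rx = new Regex(@"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b");

foreach (Match m in rx.Matches("Boy has a dog and a cat."))
{
    // The whole match includes the text leading up to the animal;
    // Group 1 holds just the animal name.
    Console.WriteLine($"Index={m.Index} Length={m.Length} Value=\"{m.Value}\" Group1={m.Groups[1].Value}");
}
```

This should print two matches, "Boy has a dog" at index 0 (length 13) and " and a cat" at index 13 (length 10): the second match starts exactly where the first one ended, which is what \G enforces.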