How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?

让人想犯罪 __ 提交于 2020-02-02 02:55:56

问题


How can I use lookbehind in a C# Regex in order to skip matches of repeated prefix patterns?

Example - I'm trying to have the expression match all the b characters following any number of a characters:

Regex expression = new Regex("(?<=a).*");

foreach (Match result in expression.Matches("aaabbbb"))
  MessageBox.Show(result.Value);

returns aabbbb, the lookbehind matching only an a. How can I make it so that it would match all the as in the beginning?

I've tried

Regex expression = new Regex("(?<=a+).*");

and

Regex expression = new Regex("(?<=a)+.*");

with no results...

What I'm expecting is bbbb.


回答1:


Are you looking for a repeated capturing group?

(.)\1*

This will return two matches.

Given:

aaabbbb

This will result in:

aaa
bbbb

This:

(?<=(.))(?!\1).*

Uses the above principal, first checking that the finding the previous character, capturing it into a back reference, and then asserting that that character is not the next character.

That matches:

bbbb



回答2:


I figured it out eventually:

Regex expression = new Regex("(?<=a+)[^a]+");

foreach (Match result in expression.Matches(@"aaabbbb"))
   MessageBox.Show(result.Value);

I must not allow the as to me matched by the non-lookbehind group. This way, the expression will only match those b repetitions that follow a repetitions.

Matching aaabbbb yields bbbb and matching aaabbbbcccbbbbaaaaaabbzzabbb results in bbbbcccbbbb, bbzz and bbb.




回答3:


The reason the look-behind is skipping the "a" is because it is consuming the first "a" (but no capturing it), then it captures the rest.

Would this pattern work for you instead? New pattern: \ba+(.+)\b It uses a word boundary \b to anchor either ends of the word. It matches at least one "a" followed by the rest of the characters till the word boundary ends. The remaining characters are captured in a group so you can reference them easily.

string pattern = @"\ba+(.+)\b";

foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
    Console.WriteLine("Match: " + m.Value);
    Console.WriteLine("Group capture: " + m.Groups[1].Value);
}

UPDATE: If you want to skip the first occurrence of any duplicated letters, then match the rest of the string, you could do this:

string pattern = @"\b(.)(\1)*(?<Content>.+)\b";

foreach (Match m in Regex.Matches("aaabbbb", pattern))
{
    Console.WriteLine("Match: " + m.Value);
    Console.WriteLine("Group capture: " + m.Groups["Content"].Value);
}


来源:https://stackoverflow.com/questions/3839702/how-can-i-use-lookbehind-in-a-c-sharp-regex-in-order-to-skip-matches-of-repeated

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!