Capture outer paren groups while ignoring inner paren groups

Submitted by 本小妞迷上赌 on 2020-08-10 19:34:09

Question


I'm using C# and regex, trying to capture outer paren groups while ignoring inner paren groups. I have legacy-generated text files containing thousands of string constructions like the following:

([txtData] of COMPOSITE
(dirty FALSE)
(composite [txtModel])
(view [star3])
(creationIndex 0)
(creationProps )
(instanceNameSpecified FALSE)
(containsObject nil)
(sName txtData)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)

([txtUI] of COMPOSITE
(dirty FALSE)
(composite [txtModel])
(view [star2])
(creationIndex 0)
(creationProps )
(instanceNameSpecified FALSE)
(containsObject nil)
(sName ApplicationWindow)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)

([star38] of COMPOSITE
(dirty FALSE)
(composite [txtUI])
(view [star39])
(creationIndex 26)
(creationProps composite [txtUI] sName Bestellblatt)
(instanceNameSpecified TRUE)
(containsObject COMPOSITE)
(sName Bestellblatt)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)

I am looking for a regex that will capture the 3 groupings in the example above, and here is what I have tried so far:

Regex regex = new Regex(@"\((.*?)\)");
return regex.Matches(str);

The problem with the regex above is that it matches the inner paren groupings, such as (dirty FALSE) and (composite [txtModel]). What I want it to match is each of the outer groupings, such as the 3 shown above. The definition of an outer grouping is simple:

  1. Opening paren is either the first character in the file, or it follows a line feed and/or carriage return.
  2. Closing paren is either the last character in the file, or it is followed by a line feed or carriage return.

I want the regex pattern to ignore all paren-groupings that don't obey numbers 1 and 2 above. By "ignore" I mean that they shouldn't be seen as a match - but they should be returned as part of the outer grouping match.

So, for my objective to be met, when my C# regex runs against the example above, I should get back a regex MatchCollection with exactly 3 matches, just as shown above.

How is it done? (Thanks in advance.)


Answer 1:


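You can do this with .NET balancing groups. Each nested opening paren pushes an empty capture onto a stack named c via (?<c>), each nested closing paren pops it via (?<-c>), and (?(c)(?!)) rejects the match if any opening paren is still unclosed. The match therefore ends only at the closing paren that balances the outermost opening paren.

Here is a demo that matches the outer parentheses.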

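using System;
using System.Text.RegularExpressions;

string sentence = @"([txtData] of COM ..."; // your text

// \(            opening paren of an outer group
// (?>...)*      nested "(" pushes c, nested ")" pops c, [^()]+ consumes everything else
// (?(c)(?!))\)  fail if c is non-empty, then match the closing outer paren
string pattern = @"\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)";
Regex rgx = new Regex(pattern);

// Each match is one complete outer group, inner groups included.
foreach (Match match in rgx.Matches(sentence))
{
    Console.WriteLine(match.Value);
    Console.WriteLine("--------");
}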
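
For anyone who wants to try this without the original files, here is a minimal, self-contained sketch of the same pattern. It is only an illustration under some assumptions: the class name OuterGroups, the method Find, and the shortened sample text are hypothetical and not part of the question. The pattern is identical to the one above; it is just spread over several lines with comments using RegexOptions.IgnorePatternWhitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class OuterGroups
{
    // Same balancing-group pattern as above, written in free-spacing mode.
    private static readonly Regex Outer = new Regex(@"
        \(                  # opening paren of an outer group
        (?>                 # atomic group: a consumed piece is never re-split
            \( (?<c>)       #   nested opening paren: push an empty capture onto c
          | [^()]+          #   run of non-paren characters, newlines included
          | \) (?<-c>)      #   nested closing paren: pop c, fails when c is empty
        )*
        (?(c)(?!))          # c still non-empty means an unclosed paren: fail
        \)                  # closing paren of the outer group
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the text of every outermost parenthesized group, in order.
    public static List<string> Find(string text)
    {
        var result = new List<string>();
        foreach (Match m in Outer.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Shortened, made-up sample in the same shape as the question's data.
        string sample =
            "([a] of COMPOSITE\n(dirty FALSE)\n(view [star3])\n)\n\n" +
            "([b] of COMPOSITE\n(creationProps )\n)";

        List<string> groups = Find(sample);
        Console.WriteLine(groups.Count); // prints 2: one match per outer group
        foreach (string g in groups)
        {
            Console.WriteLine(g);
            Console.WriteLine("--------");
        }
    }
}
```

Note that the pattern relies only on paren balance, not on line position, so it does not check the question's two line-boundary rules. If the input contains an outer group whose parentheses are unbalanced, the regex may fall back to matching a balanced inner group instead, so it is worth validating the match count against what you expect.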


Source: https://stackoverflow.com/questions/63024714/capture-outer-paren-groups-while-ignoring-inner-paren-groups
