How to parse a comma delimited string when comma and parenthesis exists in field

别等时光非礼了梦想. 提交于 2019-12-20 02:58:23

问题


I have this string in C#

adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO

I want to use a RegEx to parse it to get the following:

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO

In addition to the above example, I tested with the following, but am still unable to parse it correctly.

"%exc.uns: 8 hours let  @ = ABC, DEF", "exc_it = 1 day"  , " summ=graffe ", " a,b,(c,d)" 

The new text will be in one string

string mystr = @"""%exc.uns: 8 hours let  @ = ABC, DEF"", ""exc_it = 1 day""  , "" summ=graffe "", "" a,b,(c,d)"""; 

回答1:


string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int? firstIndex = null;
int scopeLevel = 0;
for (int i = 0; i < str.Length; i++)
{
    if (str[i] == ',' && scopeLevel == 0)
    {
        resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault(), i - firstIndex.GetValueOrDefault()));
        firstIndex = i + 1;
    }
    else if (str[i] == '(') scopeLevel++;
    else if (str[i] == ')') scopeLevel--;
}
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault()));



回答2:


Event faster:

([^,]*\x28[^\x29]*\x29|[^,]+)

That should do the trick. Basically, look for either a "function thumbprint" or anything without a comma.

adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
                  ^                   ^  ^      ^                  ^

The Carets symbolize where the grouping stops.




回答3:


Just this regex:

[^,()]+(\([^()]*\))?

A test example:

var s= "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(@"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
    .Cast<Match>()
    .Select(m => m.Value);

returns

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
 NG/CL
 5 value of CL(JK)
 HO



回答4:


If you simply must use Regex, then you can split the string on the following:

,                # match a comma
(?=              # that is followed by
  (?:            # either
    [^\(\)]*     #  no parens at all
    |            # or
    (?:          #  
      [^\(\)]*   #  ...
      \(         #  (
      [^\(\)]*   #     stuff in parens
      \)         #  )
      [^\(\)]*   #  ...
    )+           #  any number of times
  )$             # until the end of the string
)

It breaks your input into the following:

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO

You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.




回答5:


Another way to implement what Snowbear was doing:

    public static string[] SplitNest(this string s, char src, string nest, string trg)
    {
        int scope = 0;
        if (trg == null || nest == null) return null;
        if (trg.Length == 0 || nest.Length < 2) return null;
        if (trg.IndexOf(src) >= 0) return null;
        if (nest.IndexOf(src) >= 0) return null;

        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == src && scope == 0)
            {
                s = s.Remove(i, 1).Insert(i, trg);
            }
            else if (s[i] == nest[0]) scope++;
            else if (s[i] == nest[1]) scope--;
        }

        return s.Split(trg);
    }

The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use

string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");

The function would first turn your string into

adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO

then split on the ~, ignoring the nested commas.




回答6:


Assuming non nested, matching parentheses, you can easily match the tokens you want instead of splitting the string:

MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");



回答7:


var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";  
var result = string.Join(@"\n",Regex.Split(s, @"(?<=\)),|,\s"));  

The pattern matches for ) and excludes it from the match then matches , or matches , followed by a space.

result =

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO




回答8:


The TextFieldParser (msdn) class seems to have the functionality built-in:

TextFieldParser Class: - Provides methods and properties for parsing structured text files.

Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.

The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.

See the article which helped me find that




回答9:


Here's a stronger option, which parses the whole text, including nested parentheses:

string pattern = @"
\A
(?>
    (?<Token>
        (?:
            [^,()]              # Regular character
            |
            (?<Paren> \( )      # Opening paren - push to stack
            |
            (?<-Paren> \) )     # Closing paren - pop
            |
            (?(Paren),)         # If inside parentheses, match comma.
        )*?
    )
    (?(Paren)(?!))    # If we are not inside parentheses,
    (?:,|\Z)          # match a comma or the end
)*? # lazy just to avoid an extra empty match at the end,
    #  though it removes a last empty token.
\Z
";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);

You can get all matches by iterating over match.Groups["Token"].Captures.



来源:https://stackoverflow.com/questions/5477767/how-to-parse-a-comma-delimited-string-when-comma-and-parenthesis-exists-in-field

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!