How to extract Key-Value pairs from a string, when the Key itself is the separator?

此生再无相见时 submitted on 2019-12-25 02:30:13

Question


Say there is a string in the loose "format",

string str = "V1,B=V1,C=V1,V2,V3,D=V1,V2,A=V1,=V2,V3";

and a known set of Keys

List<string> lst = new List<string>() { "A", "B", "C", "D" };

How can the Key-Value pairs shown below be extracted? (Any text before the first Key should be treated as the Value for the null Key. Also the Values shown below have any trailing comma removed.)

Key     Value
(null)  V1
A       V1,=V2,V3     (The = here is, unfortunately, part of the value)
B       V1
C       V1,V2,V3 
D       V1,V2

This problem is difficult because it is not possible to split immediately on either the = or ,.


Answer 1:


Ignoring the known set of keys, and assuming each key appears only once:

string str = "V1,B=V1,C=V1,V2,V3,D=V1,V2,A=V1,=V2,V3";

var splitByEqual = new[] {'='};

var values = Regex.Split(str, @",(?=\w+=)")
    .Select(token => token.Split(splitByEqual, 2))
    .ToDictionary(pair => pair.Length == 1 ? "" : pair.First(),
                  pair => pair.Last());
  • The regex is pretty simple: split on commas that are followed by a key (one or more word characters) and an equals sign. (If a value may itself look like a key, as V2=V3 does in A=V1,V2=V3, this wouldn't work.)
  • Now we have the tokens "V1", "B=V1", "C=V1,V2,V3", "D=V1,V2" and "A=V1,=V2,V3". We split each one on =, but into no more than two parts.
  • Next we create a dictionary. This line is a little ugly, but it isn't too important, since we already have the data we need. I'm also using an empty string instead of null for the missing key.

If we do want to use the known list of keys, we can change the pattern to:

// keys is the known list of keys (lst in the question)
var splitPattern = @",(?=(?:" + String.Join("|", keys.Select(Regex.Escape)) + ")=)";

and use Regex.Split(str, splitPattern).
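
Putting the two pieces together, a minimal sketch might look like the following. The class and method names (KnownKeySplitter, Parse) are made up for illustration, and it assumes a non-empty list of distinct keys. Unlike the snippet above, it checks the text before the first = against the key list, so a leading token such as "V1,=V2" is not mistaken for a key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KnownKeySplitter
{
    // Hypothetical helper, not part of the original answer.
    // Assumes keys is non-empty and contains no duplicates.
    public static Dictionary<string, string> Parse(string input, IEnumerable<string> keys)
    {
        var keySet = new HashSet<string>(keys);

        // Split only on commas that are immediately followed by "<known key>=".
        var splitPattern = @",(?=(?:"
            + string.Join("|", keySet.Select(k => Regex.Escape(k)))
            + ")=)";

        var result = new Dictionary<string, string>();
        foreach (var token in Regex.Split(input, splitPattern))
        {
            int eq = token.IndexOf('=');
            if (eq > 0 && keySet.Contains(token.Substring(0, eq)))
            {
                // "KEY=value": everything after the first '=' is the value.
                result[token.Substring(0, eq)] = token.Substring(eq + 1);
            }
            else
            {
                // No known key in front: file the text under the empty key.
                result[""] = token;
            }
        }
        return result;
    }
}
```

For the string in the question, KnownKeySplitter.Parse(str, lst) should map "" to "V1", B to "V1", C to "V1,V2,V3", D to "V1,V2" and A to "V1,=V2,V3".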




Answer 2:


Assuming the keys do not also occur in the values:

  • For each key,
    • search for the regexp "(?:^|,)" + KEY + "=" (the group is needed; without it the alternation would match any comma)
  • split the string at the found locations
  • then process each split string individually: anything before the first = is the key, anything after it is the value
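
The steps above can be sketched roughly as follows. This is only one way to do it, under the same assumption that keys never occur inside values; the names KeySearchSplitter and Parse are invented for the example, and a capture group is used to read the key back out of each match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeySearchSplitter
{
    // Hypothetical helper, not part of the original answer.
    public static List<KeyValuePair<string, string>> Parse(string input, IEnumerable<string> keys)
    {
        // For each key, find every place where "<key>=" starts a field.
        var hits = new List<Match>();
        foreach (var key in keys)
        {
            var pattern = "(?:^|,)(" + Regex.Escape(key) + ")=";
            hits.AddRange(Regex.Matches(input, pattern).Cast<Match>());
        }
        hits.Sort((a, b) => a.Index.CompareTo(b.Index));

        var result = new List<KeyValuePair<string, string>>();

        // Text before the first hit belongs to the null key.
        int firstStart = hits.Count > 0 ? hits[0].Index : input.Length;
        if (firstStart > 0)
        {
            result.Add(new KeyValuePair<string, string>(null, input.Substring(0, firstStart)));
        }

        // Each value runs from the end of its hit to the start of the next one.
        for (int i = 0; i < hits.Count; i++)
        {
            int valueStart = hits[i].Index + hits[i].Length;
            int valueEnd = i + 1 < hits.Count ? hits[i + 1].Index : input.Length;
            result.Add(new KeyValuePair<string, string>(
                hits[i].Groups[1].Value,
                input.Substring(valueStart, valueEnd - valueStart)));
        }
        return result;
    }
}
```

Because the separating comma is part of each match, the values come out without a trailing comma, and the pairs are returned in the order they appear in the string.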



Answer 3:


Can't you remove the leading = before you split? Here's an approach using String.Split and LINQ:

var pairs = str.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(x => new { KeyVals = x.TrimStart('=').Split('=') })
    .Select(x => new
    {
        Key = x.KeyVals.Length == 1 ? null : x.KeyVals[0].Trim(),
        Value = x.KeyVals.Last().Trim()
    })
    .GroupBy(x => x.Key)
    .Select(g => new { g.Key, Values=g.Select(x => x.Value) });

Output:

foreach (var keyVal in pairs)
    Console.WriteLine("Key:{0} Values:{1}", keyVal.Key, string.Join(",", keyVal.Values)); 

Key: Values:V1,V2,V3,V2,V2,V3
Key:B Values:V1
Key:C Values:V1
Key:D Values:V1
Key:A Values:V1

The result differs from what you wanted, so maybe I'm on the wrong track. It's also not clear why you need the "known set of Keys". If you want to filter by them, add a Where before the GroupBy.
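
For what it's worth, a sketch of that Where filter could look like this. FilteredGrouping and Parse are made-up names, and the empty string stands in for the null key here only because a Dictionary cannot hold a null key; fields whose key is not in the known set are simply dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilteredGrouping
{
    // Hypothetical helper, not part of the original answer.
    public static Dictionary<string, List<string>> Parse(string str, ICollection<string> keys)
    {
        return str.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(x => x.TrimStart('=').Split('='))
            .Select(kv => new
            {
                // Assumption: "" replaces null for fields without a key.
                Key = kv.Length == 1 ? "" : kv[0].Trim(),
                Value = kv.Last().Trim()
            })
            // Keep keyless fields and fields whose key is in the known set.
            .Where(x => x.Key == "" || keys.Contains(x.Key))
            .GroupBy(x => x.Key)
            .ToDictionary(g => g.Key, g => g.Select(x => x.Value).ToList());
    }
}
```

With keys A and B, the input "V1,X=V9,B=V1,V2" should give B the value V1 and the empty key the values V1 and V2, while the X=V9 field is discarded.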




Answer 4:


I hate myself for going all old-school, but try replacing the leading = with another character before the split then put it back afterwards:

    private static List<KeyValuePair<string, string>> ExtractData(string dataString, List<string> keys)
    {
        // Convert any leading "=" to another character to avoid losing it :)
        // (this assumes the values never contain a literal '+')
        dataString = dataString.Replace(",=", ",+");

        List<KeyValuePair<string, string>> result = new List<KeyValuePair<string, string>>();

        // Split on equals and comma
        var entries = dataString.Split(new char[] { '=', ',' }, StringSplitOptions.RemoveEmptyEntries);

        // Start with null key
        string key = null;

        // Start with blank value for each key
        string value = "";
        foreach (string entry in entries)
        {
            // Put back any removed '='
            string text = entry.Replace('+', '=');
            if (keys.Contains(entry))
            {
                // Save previous key value
                if (!string.IsNullOrEmpty(value))
                {
                    result.Add(new KeyValuePair<string, string>(key, value.TrimEnd(new char[] { ',' })));
                }
                key = entry;
                value = "";
            }
            else
            {
                value += text + ",";
            }
        }
        // save last result
        result.Add(new KeyValuePair<string,string>(key, value.TrimEnd(new char[]{','})));
        return result;
    }

I know this can be shortened with LINQ etc, but no time to make it pretty :)



Source: https://stackoverflow.com/questions/24136021/how-to-extract-key-value-pairs-from-a-string-when-the-key-itself-is-the-separat
