Split String in C#

萌比男神i 2020-12-28 09:10

I thought this would be trivial, but I can't get it to work.

Assume a line in a CSV file: "Barack Obama", 48, "President", "1600 Penn Ave, Washington DC"

9 Answers
  •  醉话见心
    2020-12-28 09:39

    You might have to write your own split function.

    • Iterate through each char in the string
    • When you hit a " character, toggle a boolean that tracks whether you are inside quotes
    • When you hit a comma: if the boolean is true, treat the comma as ordinary text; otherwise, you have reached the end of a token

    Here's an example:

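    using System.Collections.Generic;
    using System.Text;

    public static class StringExtensions
    {
        public static string[] SplitQuoted(this string input, char separator, char quotechar)
        {
            List<string> tokens = new List<string>();
    
            StringBuilder sb = new StringBuilder();
            bool escaped = false; // true while we are inside a quoted field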
            foreach (char c in input)
            {
                if (c.Equals(separator) && !escaped)
                {
                    // we have a token
                    tokens.Add(sb.ToString().Trim());
                    sb.Clear();
                }
                else if (c.Equals(separator) && escaped)
                {
                    // ignore but add to string
                    sb.Append(c);
                }
                else if (c.Equals(quotechar))
                {
                    // entering or leaving a quoted field; the quote itself is kept in the token
                    escaped = !escaped;
                    sb.Append(c);
                }
                else
                {
                    sb.Append(c);
                }
            }
            tokens.Add(sb.ToString().Trim());
    
            return tokens.ToArray();
        }
    }
    

    Then just call:

    string[] tokens = line.SplitQuoted(',','\"');
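
    Note that the method as written appends the quote characters to the token, so the quoted fields come back with their quotes still attached. The sketch below shows this for the sample line from the question, along with one possible way to strip them. It repeats the same state machine in condensed form so that it compiles on its own; QuotedSplitDemo and SplitAndUnquote are hypothetical names introduced for illustration, not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedSplitDemo
{
    // Same logic as SplitQuoted above, condensed so this sketch is self-contained.
    public static string[] SplitQuoted(this string input, char separator, char quotechar)
    {
        var tokens = new List<string>();
        var sb = new StringBuilder();
        bool insideQuotes = false;
        foreach (char c in input)
        {
            if (c == quotechar)
                insideQuotes = !insideQuotes; // toggle; the quote is still appended below

            if (c == separator && !insideQuotes)
            {
                // separator outside quotes: the current token is complete
                tokens.Add(sb.ToString().Trim());
                sb.Clear();
            }
            else
            {
                sb.Append(c);
            }
        }
        tokens.Add(sb.ToString().Trim());
        return tokens.ToArray();
    }

    // Hypothetical helper: split, then strip the surrounding quote characters.
    public static string[] SplitAndUnquote(string line)
    {
        string[] tokens = line.SplitQuoted(',', '"');
        for (int i = 0; i < tokens.Length; i++)
            tokens[i] = tokens[i].Trim('"');
        return tokens;
    }

    public static void Main()
    {
        string line = "\"Barack Obama\", 48, \"President\", \"1600 Penn Ave, Washington DC\"";

        foreach (string token in line.SplitQuoted(',', '"'))
            Console.WriteLine(token); // quoted fields still carry their quotes

        foreach (string token in SplitAndUnquote(line))
            Console.WriteLine(token); // quotes removed
    }
}
```

    Both loops yield four tokens, and the comma inside the address stays part of the last one. Trim('"') is only a rough fix: it removes every leading and trailing quote, and neither version handles doubled quotes ("") inside a field.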
    

    Benchmarks

    The results of benchmarking my code against Dan Tao's code are below. I'm happy to benchmark any other solutions if people want.

    Code:

    string input = "\"Barak Obama\", 48, \"President\", \"1600 Penn Ave, Washington DC\""; // Console.ReadLine()
    string[] tokens = null;
    
    // run tests
    DateTime start = DateTime.Now;
    for (int i = 0; i < 1000000; i++)
        tokens = input.SplitWithQualifier(',', '\"', false);
    Console.WriteLine("1,000,000 x SplitWithQualifier = {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);
    
    start = DateTime.Now;
    for (int i = 0; i < 1000000; i++)
        tokens = input.SplitQuoted(',', '\"');
    Console.WriteLine("1,000,000 x SplitQuoted =        {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);
    

    Output:

    1,000,000 x SplitWithQualifier = 8156.25ms
    1,000,000 x SplitQuoted =        2406.25ms
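
    Both figures above are exact multiples of 15.625 ms, which suggests the measurement is limited by the granularity of DateTime.Now. For finer-grained timing, the same loop could be measured with System.Diagnostics.Stopwatch instead. The following is a minimal sketch, not the code that produced the numbers above; TimingSketch and TimeIt are hypothetical names, and string.Split is only a stand-in workload.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs the action `iterations` times and
    // returns the elapsed wall-clock time in milliseconds.
    public static double TimeIt(Action action, int iterations)
    {
        action(); // one warm-up call so JIT compilation is not part of the measurement

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        string input = "\"Barak Obama\", 48, \"President\", \"1600 Penn Ave, Washington DC\"";
        int count = 0;

        // Stand-in workload; substitute input.SplitQuoted(',', '"') or any other candidate.
        double ms = TimeIt(() => count = input.Split(',').Length, 1000000);

        Console.WriteLine("1,000,000 x Split = {0}ms ({1} tokens per call)", ms, count);
    }
}
```

    Storing the result in count keeps the call from being trivially discarded. Absolute numbers will still vary with the machine and the runtime, so only the relative difference between candidates is meaningful.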
    
