Split String in C#

萌比男神i 2020-12-28 09:10

I thought this would be trivial, but I can't get it to work.

Assume a line in a CSV file: "Barack Obama", 48, "President", "1600 Penn Ave, Washington DC"

9 Answers
  • That would be the expected behavior, since quotes are just another character in a C# string. It looks like what you are after is the quoted tokens and the numeric tokens.

    I think you will need a Regex to split the string, unless someone else knows a better way.

    Or you could loop through the string one character at a time, building up each token as you go. It's old school, but it may be the most reliable way in your case.

  • 2020-12-28 09:39

    You might have to write your own split function.

    • Iterate through each char in the string
    • When you hit a " character, toggle a boolean
    • When you hit a comma, if the bool is true, ignore it, else, you have your token

    Here's an example:

    using System.Collections.Generic;
    using System.Text;
    
    public static class StringExtensions
    {
        public static string[] SplitQuoted(this string input, char separator, char quotechar)
        {
            List<string> tokens = new List<string>();
    
            StringBuilder sb = new StringBuilder();
            bool escaped = false;
            foreach (char c in input)
            {
                if (c.Equals(separator) && !escaped)
                {
                    // we have a token
                    tokens.Add(sb.ToString().Trim());
                    sb.Clear();
                }
                else if (c.Equals(separator) && escaped)
                {
                    // ignore but add to string
                    sb.Append(c);
                }
                else if (c.Equals(quotechar))
                {
                    escaped = !escaped;
                    sb.Append(c);
                }
                else
                {
                    sb.Append(c);
                }
            }
            tokens.Add(sb.ToString().Trim());
    
            return tokens.ToArray();
        }
    }
    

    Then just call:

    string[] tokens = line.SplitQuoted(',','\"');
    

    Benchmarks

    The results of benchmarking my code against Dan Tao's code are below. I'm happy to benchmark any other solutions if people want.

    Code:

    string input = "\"Barack Obama\", 48, \"President\", \"1600 Penn Ave, Washington DC\""; // Console.ReadLine()
    string[] tokens = null;
    
    // run tests
    DateTime start = DateTime.Now;
    for (int i = 0; i < 1000000; i++)
        tokens = input.SplitWithQualifier(',', '\"', false);
    Console.WriteLine("1,000,000 x SplitWithQualifier = {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);
    
    start = DateTime.Now;
    for (int i = 0; i<1000000;i++)
        tokens = input.SplitQuoted(',', '\"');
    Console.WriteLine("1,000,000 x SplitQuoted =        {0}ms", DateTime.Now.Subtract(start).TotalMilliseconds);
    

    Output:

    1,000,000 x SplitWithQualifier = 8156.25ms
    1,000,000 x SplitQuoted =        2406.25ms
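
    A note on the timing itself: DateTime.Now can have a coarse resolution on some systems (both figures above are exact multiples of 15.625 ms, which is consistent with that), so System.Diagnostics.Stopwatch is usually a better fit for loops like this. A minimal sketch of a timing helper follows; BenchmarkSketch and TimeMilliseconds are hypothetical names for illustration, not part of either answer, and the numbers above were not produced with it.

    ```csharp
    using System;
    using System.Diagnostics;

    public static class BenchmarkSketch
    {
        // Runs the action once as a warm-up (so JIT compilation is not timed),
        // then times the requested number of iterations with a Stopwatch.
        public static double TimeMilliseconds(Action action, int iterations)
        {
            action(); // warm-up call, not included in the measurement

            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                action();
            stopwatch.Stop();

            return stopwatch.Elapsed.TotalMilliseconds;
        }
    }
    ```

    It could be called as BenchmarkSketch.TimeMilliseconds(() => input.SplitQuoted(',', '"'), 1000000), assuming the SplitQuoted extension method above is in scope.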
    
  • 2020-12-28 09:43

    I have a SplitWithQualifier extension method that I use here and there, which utilizes Regex.

    I make no claim as to the robustness of this code, but it has worked all right for me for a while.

    // mangled code horribly to fit without scrolling
    using System.Linq;
    using System.Text.RegularExpressions;
    
    public static class CsvSplitter
    {
        public static string[] SplitWithQualifier(this string text,
                                                  char delimiter,
                                                  char qualifier,
                                                  bool stripQualifierFromResult)
        {
            string pattern = string.Format(
                @"{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
                Regex.Escape(delimiter.ToString()),
                Regex.Escape(qualifier.ToString())
            );
    
            string[] split = Regex.Split(text, pattern);
    
            if (stripQualifierFromResult)
                return split.Select(s => s.Trim().Trim(qualifier)).ToArray();
            else
                return split;
        }
    }
    

    Usage:

    string csv = "\"Barack Obama\", 48, \"President\", \"1600 Penn Ave, Washington DC\"";
    string[] values = csv.SplitWithQualifier(',', '\"', true);
    
    foreach (string value in values)
        Console.WriteLine(value);
    

    Output:

    Barack Obama
    48
    President
    1600 Penn Ave, Washington DC
    
  • 2020-12-28 09:43

    I would recommend using a regular expression instead. It will allow you to extract more complicated substrings in a much more versatile manner (precisely as you want).

    http://www.c-sharpcorner.com/uploadfile/prasad_1/regexppsd12062005021717am/regexppsd.aspx

    http://oreilly.com/windows/archive/csharp-regular-expressions.html
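
    To make the suggestion concrete, here is a minimal sketch that matches the fields directly instead of splitting on the separator. The names RegexCsvSketch and ExtractFields are hypothetical, and the pattern is an assumption of mine rather than something taken from the linked articles. It does not handle doubled quotes ("") inside a field, and it silently drops empty unquoted fields such as the middle one in a,,b.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class RegexCsvSketch
    {
        // Either a quoted field (text between two double quotes, commas allowed)
        // or a run of characters that contains neither a comma nor a quote.
        private static readonly Regex FieldPattern =
            new Regex("\"(?<quoted>[^\"]*)\"|(?<plain>[^,\"]+)");

        public static List<string> ExtractFields(string line)
        {
            var fields = new List<string>();
            foreach (Match match in FieldPattern.Matches(line))
            {
                if (match.Groups["quoted"].Success)
                {
                    // Keep quoted text as-is, without the surrounding quotes.
                    fields.Add(match.Groups["quoted"].Value);
                }
                else
                {
                    // Unquoted text: trim it and skip the whitespace-only runs
                    // that sit between a comma and an opening quote.
                    string value = match.Groups["plain"].Value.Trim();
                    if (value.Length > 0)
                        fields.Add(value);
                }
            }
            return fields;
        }
    }
    ```

    For the line in the question this should yield four fields, with the comma inside the address preserved.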

  • 2020-12-28 09:44

    Looking at the bigger picture, you are actually trying to parse CSV input. So instead of advising on how to split the string properly, I would recommend that you use a CSV parser for this kind of thing.

    A Fast CSV Reader

    One that I would recommend is the library (source code available) that you can get from this CodeProject page: http://www.codeproject.com/KB/database/CsvReader.aspx

    I use it myself and like it. It's native .NET code and a lot faster than using OLEDB (which can also do the CSV parsing for you, but believe me, it's slow).
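
    I won't guess at that library's API here. Purely to illustrate the "use a parser" approach, here is a sketch with a different parser, the TextFieldParser class that ships with .NET in the Microsoft.VisualBasic.FileIO namespace (on .NET Framework this needs a reference to the Microsoft.VisualBasic assembly). TextFieldParserSketch and ParseLine are hypothetical names.

    ```csharp
    using System.IO;
    using Microsoft.VisualBasic.FileIO;

    public static class TextFieldParserSketch
    {
        // Parses a single CSV line into its fields, honoring quoted fields
        // so that commas inside quotes do not split the field.
        public static string[] ParseLine(string line)
        {
            using (var reader = new StringReader(line))
            using (var parser = new TextFieldParser(reader))
            {
                parser.TextFieldType = FieldType.Delimited;
                parser.SetDelimiters(",");
                parser.HasFieldsEnclosedInQuotes = true;
                parser.TrimWhiteSpace = true;

                // ReadFields returns null when there is no more data to read.
                return parser.ReadFields() ?? new string[0];
            }
        }
    }
    ```

    For a whole file you would construct the parser over the file path or a stream instead and call ReadFields in a loop while EndOfData is false.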

  • 2020-12-28 09:44

    You can't parse a CSV line with a simple Split on commas, because some cell contents will contain commas that aren't meant to delineate data but are actually part of the cell contents themselves.

    Here is a link to a simple regex-based C# method that will convert your CSV into a handy DataTable:

    http://www.hotblue.com/article0000.aspx?a=0006

    Working with DataTables is very easy - let me know if you need a code sample for that.
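
    As a rough idea of what that looks like, here is a small sketch that loads already-split rows into a DataTable. It is not the method from the linked article; DataTableSketch, ToTable and the column names in the usage note are all made up for illustration.

    ```csharp
    using System.Collections.Generic;
    using System.Data;

    public static class DataTableSketch
    {
        // Builds a DataTable with one string column per name and one row per
        // string array. Each row must not have more values than there are columns.
        public static DataTable ToTable(string[] columnNames, IEnumerable<string[]> rows)
        {
            var table = new DataTable();

            foreach (string name in columnNames)
                table.Columns.Add(name); // columns default to type string

            foreach (string[] row in rows)
                table.Rows.Add((object[])row); // one value per column, in order

            return table;
        }
    }
    ```

    With columns named, say, Name, Age, Title and Address, a cell can then be read back as table.Rows[0]["Address"].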
