Split using delimiter except when delimiter is escaped

我寻月下人不归 2020-12-11 05:49

I'm reading clipboard data coming from Excel using

var stream = (System.IO.Stream) ( Forms.Clipboard.GetDataObject() ).GetData( Forms.DataFormats.CommaSepara

5 answers
  • 2020-12-11 06:16

    There are a lot of ways to do this. One inelegant way that would work is:

    1. Convert \",\" to a tab or some other delimiter (I assume you left out a few \" in your example, because otherwise the string is not consistent).
    2. Strip all remaining commas
    3. Strip all remaining \"
    4. Convert your delimiter (e.g. tab) back into a comma

    Now you have what you wanted in the first place.
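
    The four steps above can be sketched as a chain of Replace calls. This is only a sketch under the same assumption as step 1, namely that every cell arrives wrapped in plain double quotes and that the data never contains a tab; the class and method names are hypothetical.

```csharp
using System;

static class ReplaceSketch
{
    // Hypothetical helper illustrating the four steps.
    // Assumes every cell is quoted and the data contains no tab character.
    public static string StripThousands(string row)
    {
        const string marker = "\t";        // temporary delimiter
        return row
            .Replace("\",\"", marker)      // 1. "," between cells -> marker
            .Replace(",", "")              // 2. remaining commas are thousands separators
            .Replace("\"", "")             // 3. drop the remaining quotes
            .Replace(marker, ",");         // 4. marker -> comma
    }
}
```

    Whitespace around each value is left untouched, so the result may still need a Trim per cell.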

  • 2020-12-11 06:20

    You could try to use a bit of LINQ:

    string excelData = "\\\" 1,234,123.00 \\\",\\\" 2,345.00 \\\", 342.00 ,\\\" 12,345.00 \\\"";
    
    IEnumerable<string> cells = from x in excelData.Split(new string[] { "\\\"" }, StringSplitOptions.RemoveEmptyEntries)
                                let y = x.Trim(',').Trim()
                                where !string.IsNullOrWhiteSpace(y)
                                select y;
    

    Alternatively, if you don't like this suggestion, you could implement a similar pattern with Regex.
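
    One possible Regex counterpart, as a sketch only: it assumes the same input shape as the LINQ example (cells optionally wrapped in a literal backslash-quote pair), and the class name, method name and pattern are illustrative rather than taken from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexCells
{
    // First alternative: a cell wrapped in \" ... \" (may contain commas).
    // Second alternative: an unwrapped cell, i.e. a run without comma, backslash or quote.
    private const string Pattern = @"\\""(?<cell>.*?)\\""|(?<cell>[^,\\""]+)";

    public static string[] Extract(string excelData) =>
        Regex.Matches(excelData, Pattern)
             .Cast<Match>()
             .Select(m => m.Groups["cell"].Value.Trim())
             .Where(cell => cell.Length > 0)
             .ToArray();
}
```

    For the excelData string above, this should yield the same four cells as the LINQ query, with the thousands separators still in place.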

  • 2020-12-11 06:22

    I agree with Kyle regarding your string probably not being consistent.

    Instead of Kyle's first step, you could use:

    string[] vals = Regex.Split(value, @"\s*\"",\s*");
    
  • 2020-12-11 06:23

    First off, I've dealt with data from Excel before. What you typically see is comma-separated values: if a value is considered a string, it is wrapped in double quotes (and can contain commas and double quotes); if it is considered numeric, there are no double quotes. Additionally, a double quote inside the data is escaped by doubling it, like "". Assuming all of that, here's how I've dealt with this in the past:

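    // Extension method: declare it inside a static class (needs System.Collections.Generic).
    public static IEnumerable<string> SplitExcelRow(this string value)
    {
        // Hide escaped quotes ("") behind a placeholder so they do not toggle the quoted state.
        value = value.Replace("\"\"", "&quot;");
        bool quoted = false;
        int currStartIndex = 0;
        for (int i = 0; i < value.Length; i++)
        {
            char currChar = value[i];
            if (currChar == '"')
            {
                // Entering or leaving a quoted cell.
                quoted = !quoted;
            }
            else if (currChar == ',' && !quoted)
            {
                // An unquoted comma ends the current cell.
                yield return value.Substring(currStartIndex, i - currStartIndex)
                    .Trim()
                    .Replace("\"", "")          // drop the wrapping quotes
                    .Replace("&quot;", "\"");   // restore escaped quotes
                currStartIndex = i + 1;
            }
        }
        // Whatever follows the last unquoted comma is the final cell.
        yield return value.Substring(currStartIndex, value.Length - currStartIndex)
            .Trim()
            .Replace("\"", "")
            .Replace("&quot;", "\"");
    }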
    

    Of course, this assumes the incoming data is valid, so if you have something like "fo,o"b,ar","bar""foo", this will not work. Additionally, if your data contains &quot;, it will be turned into a ", which may or may not be desirable.
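
    The &quot; caveat comes from the placeholder replacement, which also turns an empty quoted cell ("") into a single quote. A possible variant, sketched here with hypothetical names, handles doubled quotes inside the loop instead. It still assumes well-formed input and, like the method above, trims whitespace at the edges of each cell.

```csharp
using System.Collections.Generic;
using System.Text;

static class CsvRowSketch
{
    // Hypothetical variant: no placeholder, so literal "&quot;" text survives unchanged.
    public static List<string> SplitRow(string row)
    {
        var cells = new List<string>();
        var current = new StringBuilder();
        bool quoted = false;
        for (int i = 0; i < row.Length; i++)
        {
            char c = row[i];
            if (c == '"')
            {
                if (quoted && i + 1 < row.Length && row[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote inside a quoted cell = literal quote
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    quoted = !quoted;    // opening or closing quote, not part of the value
                }
            }
            else if (c == ',' && !quoted)
            {
                cells.Add(current.ToString().Trim()); // unquoted comma ends the cell
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        cells.Add(current.ToString().Trim()); // final cell
        return cells;
    }
}
```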

  • 2020-12-11 06:36

    From your input example, we can see that there are three "unwanted" sequences of characters:

    \"
    \",
    ,\"
    

    So, pass all of these sequences as the separator array to the Split method:

    string[] result = clipData.Split(new[] { @",\""", @"\"",", @"\""" }, 
        StringSplitOptions.None);
    

    This will give you an array containing a few empty elements. If that is a problem, use StringSplitOptions.RemoveEmptyEntries instead of StringSplitOptions.None:

    string[] result = clipData.Split(new[] { @",\""", @"\"",", @"\""" }, 
        StringSplitOptions.RemoveEmptyEntries);
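
    As a follow-up sketch, the split can be combined with trimming and removal of the thousands separators. In the verbatim strings, each "" stands for one quote character, so the three separators are ,\" then \", then \". The class and method names are hypothetical, and the sketch assumes every remaining comma is a thousands separator.

```csharp
using System;
using System.Linq;

static class SplitDemo
{
    // Hypothetical helper: split on the three separator sequences, then clean each cell.
    public static string[] Cells(string clipData) =>
        clipData
            .Split(new[] { @",\""", @"\"",", @"\""" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(cell => cell.Trim().Replace(",", "")) // drop padding and thousands separators
            .Where(cell => cell.Length > 0)               // skip cells that were only whitespace
            .ToArray();
}
```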
    