Split a string ignoring quoted sections

别跟我提以往 2020-12-06 00:15

Given a string like this:

a,"string, with",various,"values, and some",quoted

What is a good algorithm to split this based on commas while ignoring the commas inside the quoted sections?

  • 2020-12-06 00:20

    If my language of choice didn't offer a built-in way to do this, I would initially consider two options as the easy way out:

    1. Pre-parse the input and replace the commas inside the quoted sections with a control character, split on the remaining commas, then post-parse the array to turn the control character back into commas.

    2. Alternatively, split on every comma, then post-parse the resulting array into another array, checking each entry for a leading quote and concatenating entries until a terminating quote is reached.

    These are hacks, however, and if this is a purely 'mental' exercise I suspect they will prove unhelpful. If this is a real-world problem, it would help to know the language so that we could offer some specific advice.
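
    Both options can be sketched in Python. This is only an illustration: the function names and the choice of `\x00` as the control character are made up for the example, and it assumes quoted fields have no surrounding whitespace and contain no escaped quotes.

```python
PLACEHOLDER = "\x00"  # assumption: this character never occurs in the input


def split_with_placeholder(text, sep=",", quote='"'):
    """Option 1: mask separators inside quotes, split, then unmask."""
    masked = []
    inside_quote = False
    for ch in text:
        if ch == quote:
            inside_quote = not inside_quote
            continue  # drop the quote characters themselves
        masked.append(PLACEHOLDER if (ch == sep and inside_quote) else ch)
    parts = "".join(masked).split(sep)
    return [part.replace(PLACEHOLDER, sep) for part in parts]


def split_then_merge(text, sep=",", quote='"'):
    """Option 2: split on every separator, then re-join quoted pieces."""
    fields = []
    pending = None  # pieces of a quoted field still missing its closing quote
    for piece in text.split(sep):
        if pending is not None:
            pending.append(piece)
            if piece.endswith(quote):
                # closing quote found: glue the pieces back and strip the quotes
                fields.append(sep.join(pending)[1:-1])
                pending = None
        elif piece.startswith(quote):
            if len(piece) > 1 and piece.endswith(quote):
                fields.append(piece[1:-1])  # whole quoted field in one piece
            else:
                pending = [piece]
        else:
            fields.append(piece)
    if pending is not None:
        raise ValueError("unterminated quoted field")
    return fields


sample = 'a,"string, with",various,"values, and some",quoted'
print(split_with_placeholder(sample))
print(split_then_merge(sample))
```

    On the sample string from the question, both functions return the five fields `a`, `string, with`, `various`, `values, and some` and `quoted`.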

  • 2020-12-06 00:21

    I use this to extract the text between two delimiters. I'm not sure it helps here, but perhaps it would with some minor modifications?

    function getstringbetween($string, $start, $end) {
        // Find the start marker; strpos() returns false when it is absent.
        $ini = strpos($string, $start);
        if ($ini === false) return "";
        $ini += strlen($start);
        // Find the end marker, searching only after the start marker.
        $stop = strpos($string, $end, $ini);
        if ($stop === false) return "";
        return substr($string, $ini, $stop - $ini);
    }

    $fullstring = "this is my [tag]dog[/tag]";
    $parsed = getstringbetween($fullstring, "[tag]", "[/tag]");

    echo $parsed; // (result = dog)
    

  • 2020-12-06 00:26

    The author here dropped in a blob of C# code that handles the scenario you're having a problem with:

    CSV File Imports in .Net

    Shouldn't be too difficult to translate.

  • 2020-12-06 00:26

    Here's one in pseudocode (a.k.a. Python) in one pass :-P

    def parsecsv(instr):
        i = 0
        j = 0
    
        outstrs = []
    
        # i is fixed until a match occurs, then it advances
        # up to j. j inches forward each time through:
    
        while i < len(instr):
    
            if j < len(instr) and instr[j] == '"':
                # skip the opening quote...
                j += 1
                # then iterate until we find a closing quote.
                while j < len(instr) and instr[j] != '"':
                    j += 1
                if j == len(instr):
                    raise Exception("Unmatched double quote at end of input.")
    
            if j == len(instr) or instr[j] == ',':
                s = instr[i:j]  # get the substring we've found
                s = s.strip()    # remove extra whitespace
    
                # remove surrounding quotes if they're there
                if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
                    s = s[1:-1]
    
                # add it to the result
                outstrs.append(s)
    
                # skip over the comma, move i up (to where
                # j will be at the end of the iteration)
                i = j+1
    
            j = j+1
    
        return outstrs
    
    def testcase(instr, expected):
        outstr = parsecsv(instr)
        print(outstr)
        assert expected == outstr
    
    # Doesn't handle things like '1, 2, "a, b, c" d, 2' or
    # escaped quotes, but those can be added pretty easily.
    
    testcase('a, b, "1, 2, 3", c', ['a', 'b', '1, 2, 3', 'c'])
    testcase('a,b,"1, 2, 3" , c', ['a', 'b', '1, 2, 3', 'c'])
    
    # odd number of quotes gives an "unmatched quote" exception
    #testcase('a,b,"1, 2, 3" , "c', ['a', 'b', '1, 2, 3', 'c'])
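
    The two cases the comments above leave out (text after a closing quote, and escaped quotes) could be handled roughly as follows. This is a sketch, not part of the answer's code: `parse_csv_line` is a hypothetical name, it assumes a quote is escaped by doubling it (`""`), and unlike `parsecsv` it does not strip whitespace around fields.

```python
def parse_csv_line(line, sep=",", quote='"'):
    """One-pass split that honours quoted sections and doubled quotes."""
    fields = []
    current = []          # characters of the field being built
    inside_quote = False
    i = 0
    while i < len(line):
        ch = line[i]
        if inside_quote:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    current.append(quote)  # doubled quote -> literal quote
                    i += 1                 # skip the second quote
                else:
                    inside_quote = False   # closing quote
            else:
                current.append(ch)         # separators are literal here
        elif ch == quote:
            inside_quote = True            # opening quote
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if inside_quote:
        raise ValueError("Unmatched double quote at end of input.")
    fields.append("".join(current))
    return fields


print(parse_csv_line('a,"say ""hi"", ok",b'))
```

    Because quotes only toggle a state instead of delimiting a whole field, text that follows a closing quote is simply appended to the same field.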
    
  • 2020-12-06 00:28

    Here's a simple Python implementation based on Pat's pseudocode:

    def splitIgnoringSingleQuote(string, split_char, remove_quotes=False):
        string_split = []
        current_word = ""
        inside_quote = False
        for letter in string:
            if letter == "'":
                # Keep the quote character unless asked to strip it,
                # then flip the inside/outside-quote state.
                if not remove_quotes:
                    current_word += letter
                inside_quote = not inside_quote
            elif letter == split_char and not inside_quote:
                # An unquoted separator ends the current field.
                string_split.append(current_word)
                current_word = ""
            else:
                current_word += letter
        # Flush the final field.
        string_split.append(current_word)
        return string_split
    
  • 2020-12-06 00:34

    Of course, using a CSV parser is better, but just for the fun of it you could:

    Loop over the string letter by letter.
        If current_letter == quote:
            toggle the inside_quote variable.
        Else if (current_letter == comma and not inside_quote):
            push current_word into the array and clear current_word.
        Else:
            append current_letter to current_word.
    When the loop is done, push current_word into the array.
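
    For comparison, the CSV-parser route mentioned at the top of this answer might look like this with Python's standard `csv` module. The helper name `split_with_csv_module` is made up for the example, and the sketch assumes the input is a single line.

```python
import csv


def split_with_csv_module(line):
    # csv.reader accepts any iterable of lines, so wrap the single line in a list.
    # skipinitialspace=True ignores spaces right after each comma, so a quoted
    # field that follows ", " is still recognised as quoted.
    return next(csv.reader([line], skipinitialspace=True))


print(split_with_csv_module('a,"string, with",various,"values, and some",quoted'))
# prints ['a', 'string, with', 'various', 'values, and some', 'quoted']
```

    The parser also understands doubled quotes inside quoted fields, which the hand-rolled loop above does not attempt.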
    