Regular expression to match comma separated list of key=value where value can contain commas

一生所求 2020-12-20 15:54

I have a naive "parser" that simply does something like:
[x.split('=') for x in mystring.split(',')]

However mystring can be something like<

5 Answers
  • 2020-12-20 16:27

    daramarak's answer either very nearly works, or works as-is; it's hard to tell from the way the sample output is formatted and the vague descriptions of the steps. But if it's the very-nearly-works version, it's easy to fix.

    Putting it into code:

    >>> bits=[x.rsplit(',', 1) for x in s.split('=')]
    >>> kv = [(bits[i][-1], bits[i+1][0]) for i in range(len(bits)-1)]
    

    The first line is (I believe) daramarak's answer. By itself, the first line gives you pairs of (value_i, key_i+1) instead of (key_i, value_i). The second line is the most obvious fix for that. With more intermediate steps, and a bit of output, to see how it works:

    >>> s = 'foo=bar,breakfast=spam,eggs,blt=bacon,lettuce,tomato,spam=spam'
    >>> bits0 = s.split('=')
    >>> bits0
    ['foo', 'bar,breakfast', 'spam,eggs,blt', 'bacon,lettuce,tomato,spam', 'spam']
    >>> bits = [x.rsplit(',', 1) for x in bits0]
    >>> bits
    [['foo'], ['bar', 'breakfast'], ['spam,eggs', 'blt'], ['bacon,lettuce,tomato', 'spam'], ['spam']]
    >>> kv = [(bits[i][-1], bits[i+1][0]) for i in range(len(bits)-1)]
    >>> kv
    [('foo', 'bar'), ('breakfast', 'spam,eggs'), ('blt', 'bacon,lettuce,tomato'), ('spam', 'spam')]
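
    One caveat with the snippet above: because every chunk is rsplit, a final value that itself contains a comma (for example a string ending in spam=spam,eggs) would lose everything after its last comma. Here is a minimal sketch that leaves the last chunk alone, assuming keys never contain , or = and the input is well formed. parse_pairs is a hypothetical helper name, not something from either answer:

```python
def parse_pairs(s):
    """Parse 'k=v,k=v,...' where values may contain commas (keys may not)."""
    chunks = s.split('=')
    if len(chunks) < 2:
        return []  # no '=' at all, so there are no pairs to extract
    # Every chunk between the first and the last has the shape
    # "<previous value>,<next key>". Assumption: each of them contains a comma.
    middle = [chunk.rsplit(',', 1) for chunk in chunks[1:-1]]
    keys = [chunks[0]] + [m[1] for m in middle]
    # The last chunk is a value only, so it is kept whole.
    values = [m[0] for m in middle] + [chunks[-1]]
    return list(zip(keys, values))
```

    With this version, parse_pairs('a=1,2') keeps the whole value '1,2' instead of cutting it at the comma.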
    
  • 2020-12-20 16:28

    Try this; it worked for me:

    mystring = "foo=bar,breakfast=spam,eggs,e=a"
    n = []
    
    for x in mystring.split(','):
        if '=' not in x:
            # No '=' here, so this piece belongs to the previous value.
            n[-1] = "{0},{1}".format(n[-1], x)
        else:
            # A new key=value item starts here.
            n.append(x)
    print(n)
    

    You get a result like:

      ['foo=bar', 'breakfast=spam,eggs', 'e=a']
    

    Then you can simply iterate over the list and split each item at its first = to get the key and the value.
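
    That last step could look like the sketch below, which wraps the merge loop and the final split in one function. merge_and_split is a hypothetical name, and the sketch assumes that keys never contain a comma and that values never contain =:

```python
def merge_and_split(mystring):
    """Merge comma-separated pieces back into key=value items, then split them."""
    items = []
    for piece in mystring.split(','):
        if '=' in piece or not items:
            # A piece containing '=' starts a new item. A leading piece without
            # '=' is kept as-is rather than raising an IndexError.
            items.append(piece)
        else:
            # Otherwise the piece is the continuation of the previous value.
            items[-1] += ',' + piece
    # Split only at the first '=' of each item.
    return [tuple(item.split('=', 1)) for item in items]
```

    Using items[-1] instead of a separate counter keeps the loop a little shorter; the behaviour on well-formed input is the same.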

  • 2020-12-20 16:28

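    Assuming that the name of a key never contains a comma, you can split at each comma that is followed by a run of characters containing neither , nor = and then by an =. In other words, split only at commas that are immediately followed by "key=".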

    re.split(r',(?=[^,=]+=)', inputString)
    

    (This is the same as my original solution. I expect re.split to be used, rather than re.findall or str.split).

    The full solution can be written as a one-liner:

    [re.findall('(.*?)=(.*)', token)[0] for token in re.split(r',(?=[^,=]+=)', inputString)]
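
    For reference, here is a self-contained sketch of the same idea with the import and a sample input spelled out. The sample string is borrowed from another answer in this thread, and split_pairs is a hypothetical helper name:

```python
import re

def split_pairs(input_string):
    """Split at commas that are followed by 'key=', then split each token once."""
    # The lookahead (?=[^,=]+=) checks for a key and '=' without consuming them.
    tokens = re.split(r',(?=[^,=]+=)', input_string)
    # Splitting at the first '=' only lets a value contain further '=' signs.
    return [tuple(token.split('=', 1)) for token in tokens]

sample = 'foo=bar,breakfast=spam,eggs,blt=bacon,lettuce,tomato,spam=spam'
pairs = split_pairs(sample)
```

    Here pairs comes out as [('foo', 'bar'), ('breakfast', 'spam,eggs'), ('blt', 'bacon,lettuce,tomato'), ('spam', 'spam')]. The sketch uses str.split('=', 1) for the second step instead of re.findall, which is a substitution on my part, not the answer's own code.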
    
  • Just for comparison purposes, here's a regex that seems to solve the problem as well:

    ([^=]+)    # key
    =          # equals is how we tokenise the original string
    ([^=]+)    # value
    (?:,|$)    # value terminator, either comma or end of string
    

    The trick here is to restrict what you're capturing in your second group. .+ swallows the = sign, which is the character we can use to distinguish keys from values. The full regex doesn't need lookahead or backreferences (so it should be compatible with something like re2, if that's desirable) and works on abarnert's examples.

    Usage as follows:

    re.findall(r'([^=]+)=([^=]+)(?:,|$)', 'foo=bar,breakfast=spam,eggs,blt=bacon,lettuce,tomato,spam=spam')
    

    Which returns:

    [('foo', 'bar'), ('breakfast', 'spam,eggs'), ('blt', 'bacon,lettuce,tomato'), ('spam', 'spam')]
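
    The commented form of the pattern shown earlier only works as written if it is compiled in verbose mode, where whitespace and # comments inside the pattern are ignored. A small sketch of that, with PAIR_RE and parse_to_dict as hypothetical names:

```python
import re

# Same pattern as above, compiled with re.VERBOSE so the comments are allowed.
PAIR_RE = re.compile(r"""
    ([^=]+)    # key
    =          # separator between key and value
    ([^=]+)    # value: may contain commas, but no equals sign
    (?:,|$)    # value terminator, either a comma or end of string
""", re.VERBOSE)

def parse_to_dict(s):
    """Collect all (key, value) matches into a dict."""
    return dict(PAIR_RE.findall(s))

result = parse_to_dict('foo=bar,breakfast=spam,eggs,blt=bacon,lettuce,tomato,spam=spam')
```

    Note that this pattern assumes values never contain an = sign, since that character is what marks the boundary between a key and its value.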
    
  • 2020-12-20 16:37

    I suggest using split operations as before, but split at the equals signs first, then split each piece at its rightmost comma, to make a single flat list of alternating keys and values.

    input = "bob=whatever,king=kong,banana=herb,good,yellow,thorn=hurts"
    

    After the first split, this becomes:

    first_split = input.split("=")
    # first_split == ['bob', 'whatever,king', 'kong,banana', 'herb,good,yellow,thorn', 'hurts']
    

    then splitting at rightmost comma gives you:

    second_split = [item for sublist in first_split for item in sublist.rsplit(",", 1)]
    # second_split == ['bob', 'whatever', 'king', 'kong', 'banana', 'herb,good,yellow', 'thorn', 'hurts']
    

    then you just gather the pairs like this:

    pairs = dict(zip(second_split[::2],second_split[1::2]))
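
    Put together, the three steps can be run end to end as in the sketch below. The name pairs_from_splits is hypothetical, and the parameter is called text to avoid shadowing Python's built-in input:

```python
def pairs_from_splits(text):
    """Split at '=', then at each rightmost comma, then pair up neighbours."""
    first_split = text.split("=")
    # Flatten the per-chunk results into one list: key, value, key, value, ...
    second_split = [item for chunk in first_split for item in chunk.rsplit(",", 1)]
    # Even positions are keys, odd positions are values.
    return dict(zip(second_split[::2], second_split[1::2]))

pairs = pairs_from_splits("bob=whatever,king=kong,banana=herb,good,yellow,thorn=hurts")
```

    This relies on the final value containing no comma. If it did, the last rsplit would produce an extra item and the last value would be cut short.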
    