How to split comma-separated key-value pairs with quoted commas


终归单人心 2021-01-02 13:03

I know there are a lot of other posts about parsing comma-separated values, but I couldn't find one that splits key-value pairs and handles quoted commas.

I have st

5 answers

长发绾君心 (original poster)

2021-01-02 13:54

You could abuse Python's tokenizer to parse the key-value list:

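#!/usr/bin/env python
import io
from tokenize import generate_tokens, NAME, NUMBER, OP, STRING, ENDMARKER

def parse_key_value_list(text):
    """Yield (key, value) pairs from text such as 'a=1,b="x,y"'."""
    # How to convert each kind of value token. Note that int() rejects
    # floats such as 1.5, and x[1:-1] does not process escape sequences.
    convert = {
        NAME: lambda x: x,
        NUMBER: int,
        STRING: lambda x: x[1:-1],  # strip the surrounding quotes
    }
    key = value = None
    for tok_type, string, _, _, _ in generate_tokens(io.StringIO(text).readline):
        if tok_type == NAME and key is None:
            key = string  # the first name in a pair is the key
        elif tok_type in convert:
            value = convert[tok_type](string)
        elif ((tok_type == OP and string == ',') or
              (tok_type == ENDMARKER and key is not None)):
            # A comma or the end of input finishes the current pair.
            yield key, value
            key = value = None

text = '''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"'''
print(dict(parse_key_value_list(text)))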

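Output (on Python 3.7+, where dicts keep insertion order; older versions may print the keys in a different order)

{'age': 12, 'name': 'bob', 'hobbies': 'games,reading', 'phrase': "I'm cool!"}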
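To see why this works, it can help to look at the token stream itself. The following is only a small sketch; `show_tokens` is a name made up for this illustration and is not part of the answer's code.

```python
import io
from tokenize import generate_tokens, tok_name

def show_tokens(text):
    """Return (token type name, token string) pairs for one line of text."""
    readline = io.StringIO(text).readline
    return [(tok_name[tok.type], tok.string) for tok in generate_tokens(readline)]

# Print every token the tokenizer produces for a short sample.
for name, string in show_tokens('hobbies="games,reading",age=12'):
    print(name, repr(string))
```

The quoted value arrives as a single STRING token, so the comma inside it is never seen as a separator. Only the comma between the two pairs shows up as an OP token.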

You could use a finite-state machine (FSM) to implement a stricter parser. The parser uses only the current state and the next token to parse input:

#!/usr/bin/env python
import io
from tokenize import (generate_tokens, NAME, NUMBER, OP, STRING,
                      NEWLINE, NL, ENDMARKER)

def parse_key_value_list(text):
    def check(condition):
        # Reject any token that the current state does not allow.
        if not condition:
            raise ValueError((state, token))

    KEY, EQ, VALUE, SEP = range(4)
    state = KEY
    for token in generate_tokens(io.StringIO(text).readline):
        tok_type, string = token[:2]
        if tok_type in {NEWLINE, NL}:
            # Python 3 emits a NEWLINE token before ENDMARKER even when
            # the input has no trailing newline; it carries no data here.
            continue
        if state == KEY:
            check(tok_type == NAME)
            key = string
            state = EQ
        elif state == EQ:
            check(tok_type == OP and string == '=')
            state = VALUE
        elif state == VALUE:
            check(tok_type in {NAME, NUMBER, STRING})
            value = {
                NAME: lambda x: x,
                NUMBER: int,
                STRING: lambda x: x[1:-1]  # strip the surrounding quotes
            }[tok_type](string)
            state = SEP
        elif state == SEP:
            # A pair must be followed by a comma or by the end of input.
            check(tok_type == OP and string == ',' or tok_type == ENDMARKER)
            yield key, value
            state = KEY

text = '''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"'''
print(dict(parse_key_value_list(text)))
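To illustrate what "stricter" means in practice, here is a compact restatement of the same state machine together with a check for malformed input. This is a sketch only: `parse_strict` and `is_well_formed` are names invented for the example, values are left as raw token strings to keep it short, and NEWLINE tokens are skipped because Python 3 emits one before ENDMARKER.

```python
import io
from tokenize import (generate_tokens, NAME, NUMBER, OP, STRING,
                      NEWLINE, NL, ENDMARKER)

def parse_strict(text):
    """Return a list of (key, raw value) pairs or raise ValueError."""
    KEY, EQ, VALUE, SEP = range(4)
    state = KEY
    pairs = []
    for tok in generate_tokens(io.StringIO(text).readline):
        if tok.type in (NEWLINE, NL):
            continue  # line-end tokens carry no data for this format
        if state == KEY and tok.type == NAME:
            key, state = tok.string, EQ
        elif state == EQ and tok.type == OP and tok.string == '=':
            state = VALUE
        elif state == VALUE and tok.type in (NAME, NUMBER, STRING):
            value, state = tok.string, SEP
        elif state == SEP and (tok.type == ENDMARKER or
                               (tok.type == OP and tok.string == ',')):
            pairs.append((key, value))
            state = KEY
        else:
            # The token is not allowed in the current state.
            raise ValueError((state, tok))
    return pairs

def is_well_formed(text):
    """Report whether text follows the key=value,key=value grammar."""
    try:
        parse_strict(text)
    except ValueError:
        return False
    return True

print(is_well_formed('a=1,b="x,y"'))  # a valid list
print(is_well_formed('a=1 b=2'))      # missing comma between pairs
print(is_well_formed('a=,b=2'))       # missing value
```

The lenient version quietly accepts inputs like the last two, while the state machine rejects them as soon as an unexpected token appears.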
