How to split comma-separated key-value pairs with quoted commas

终归单人心 2021-01-02 13:03

I know there are a lot of other posts about parsing comma-separated values, but I couldn't find one that splits key-value pairs and handles quoted commas.

I have st

5 Answers
  •  长发绾君心
    2021-01-02 13:54

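    You could abuse Python's tokenizer to parse the key-value list. It already treats a quoted string as a single token, so commas inside quotes are never seen as separators: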

    #!/usr/bin/env python
    from tokenize import generate_tokens, NAME, NUMBER, OP, STRING, ENDMARKER
    
    def parse_key_value_list(text):
        key = value = None
        for type, string, _,_,_ in generate_tokens(lambda it=iter([text]): next(it)):
            if type == NAME and key is None:
                key = string
            elif type in {NAME, NUMBER, STRING}:
                value = {
                    NAME: lambda x: x,
                    NUMBER: int,
                    STRING: lambda x: x[1:-1]
                }[type](string)
            elif ((type == OP and string == ',') or
                  (type == ENDMARKER and key is not None)):
                yield key, value
                key = value = None
    
    text = '''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"'''
    print(dict(parse_key_value_list(text)))
    

    Output

    {'age': 12, 'name': 'bob', 'hobbies': 'games,reading', 'phrase': "I'm cool!"}
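
    To see why the quoted comma survives, it can help to print the raw token stream. The following is a minimal sketch, assuming Python 3; `token_stream` is a hypothetical helper name, and `io.StringIO(...).readline` is swapped in for the lambda above purely for readability:

```python
import io
from tokenize import generate_tokens, tok_name

def token_stream(text):
    """Return (token type name, token string) pairs for one line of text."""
    readline = io.StringIO(text).readline
    return [(tok_name[tok.type], tok.string) for tok in generate_tokens(readline)]

# The whole quoted value arrives as one STRING token, comma included,
# while the separator comma arrives as its own OP token.
for name, string in token_stream('hobbies="games,reading",age=12'):
    print(name, repr(string))
```

    Only the `OP` comma tokens act as separators in the parser above, which is why `"games,reading"` stays in one piece.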
    

    You could use a finite-state machine (FSM) to implement a stricter parser. The parser uses only the current state and the next token to parse input:

    #!/usr/bin/env python
    from tokenize import generate_tokens, NAME, NUMBER, OP, STRING, NEWLINE, NL, ENDMARKER
    
    def parse_key_value_list(text):
        def check(condition):
            if not condition:
                raise ValueError((state, token))
    
        KEY, EQ, VALUE, SEP = range(4)
        state = KEY
        for token in generate_tokens(lambda it=iter([text]): next(it)):
            type, string = token[:2]
            if type in {NEWLINE, NL}:
                # Python 3 emits an implicit NEWLINE before ENDMARKER; skip it
                continue
            if state == KEY:
                check(type == NAME)
                key = string
                state = EQ
            elif state == EQ:
                check(type == OP and string == '=')
                state = VALUE
            elif state == VALUE:
                check(type in {NAME, NUMBER, STRING})
                value = {
                    NAME: lambda x: x,
                    NUMBER: int,
                    STRING: lambda x: x[1:-1]
                }[type](string)
                state = SEP
            elif state == SEP:
                check(type == OP and string == ',' or type == ENDMARKER)
                yield key, value
                state = KEY
    
    text = '''age=12,name=bob,hobbies="games,reading",phrase="I'm cool!"'''
    print(dict(parse_key_value_list(text)))
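
    To illustrate what "stricter" means in practice, here is a compact variant of the same state machine that shows malformed input being rejected. It is a sketch under assumptions, not the answer's exact code: it targets Python 3, the names `accepts` and `strict_parse` are hypothetical, and `ast.literal_eval` is swapped in for the hand-written value conversion (so floats and escape sequences are also decoded).

```python
import io
from ast import literal_eval
from tokenize import (generate_tokens, NAME, NUMBER, OP, STRING,
                      NEWLINE, NL, ENDMARKER)

KEY, EQ, VALUE, SEP = range(4)

def accepts(state, tok_type, string):
    """Return True if this token is legal in the given state."""
    if state == KEY:
        return tok_type == NAME
    if state == EQ:
        return tok_type == OP and string == '='
    if state == VALUE:
        return tok_type in {NAME, NUMBER, STRING}
    # state == SEP: a comma or the end of input closes the pair
    return (tok_type == OP and string == ',') or tok_type == ENDMARKER

def strict_parse(text):
    """Yield (key, value) pairs; raise ValueError on malformed input."""
    state, key, value = KEY, None, None
    for tok in generate_tokens(io.StringIO(text).readline):
        if tok.type in {NEWLINE, NL}:
            continue  # Python 3 adds an implicit NEWLINE before ENDMARKER
        if not accepts(state, tok.type, tok.string):
            raise ValueError((state, tok.type, tok.string))
        if state == KEY:
            key = tok.string
        elif state == VALUE:
            # Bare names stay strings; numbers and quoted strings are decoded
            value = tok.string if tok.type == NAME else literal_eval(tok.string)
        elif state == SEP:
            yield key, value
        state = (state + 1) % 4  # KEY -> EQ -> VALUE -> SEP -> KEY

print(dict(strict_parse('age=12,hobbies="games,reading"')))

try:
    dict(strict_parse('age=12,,name=bob'))  # doubled comma
except ValueError as exc:
    print('rejected:', exc)
```

    Because the function is a generator, the error surfaces only when the result is consumed, for example by `dict(...)`.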
    
