Python, how to parse strings to look like sys.argv

南旧 2020-12-13 06:14

I would like to parse a string like this:

-o 1  --long "Some long string"

into this:

["-o", "1", "--long", 'Some

2 answers
  • 2020-12-13 06:49

    Before I was aware of shlex.split, I made the following:

    import sys
    
    _WORD_DIVIDERS = set((' ', '\t', '\r', '\n'))
    
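    # Maps the character following a backslash to the character it stands for.
    # Any escape not listed here is dropped from the output.
    _QUOTE_CHARS_DICT = {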
        '\\':   '\\',
        ' ':    ' ',
        '"':    '"',
        'r':    '\r',
        'n':    '\n',
        't':    '\t',
    }
    
    def _raise_type_error():
        raise TypeError("Bytes must be decoded to Unicode first")
    
    def parse_to_argv_gen(instring):
        is_in_quotes = False
        instring_iter = iter(instring)
        # Empty string of the same type as the input, used to join the characters
        join_string = instring[0:0]
    
        c_list = []
        c = ' '
        while True:
            # Skip whitespace
            try:
                while True:
                    if not isinstance(c, str) and sys.version_info[0] >= 3:
                        _raise_type_error()
                    if c not in _WORD_DIVIDERS:
                        break
                    c = next(instring_iter)
            except StopIteration:
                break
            # Read word
            try:
                while True:
                    if not isinstance(c, str) and sys.version_info[0] >= 3:
                        _raise_type_error()
                    if not is_in_quotes and c in _WORD_DIVIDERS:
                        break
                    if c == '"':
                        is_in_quotes = not is_in_quotes
                        c = None
                    elif c == '\\':
                        c = next(instring_iter)
                        c = _QUOTE_CHARS_DICT.get(c)
                    if c is not None:
                        c_list.append(c)
                    c = next(instring_iter)
                yield join_string.join(c_list)
                c_list = []
            except StopIteration:
                yield join_string.join(c_list)
                break
    
    def parse_to_argv(instring):
        return list(parse_to_argv_gen(instring))
    

    This works with Python 2.x and 3.x. On Python 2.x, it works directly with byte strings and Unicode strings. On Python 3.x, it only accepts [Unicode] strings, not bytes objects.

    This doesn't behave exactly like shell argv splitting: it also accepts \r, \n and \t as escapes and converts them to real CR, LF and TAB characters, which shlex.split doesn't do. That made writing my own function useful for my needs. shlex.split is probably the better choice if you just want plain shell-style argv splitting. I'm sharing this code in case it's useful as a baseline for doing something slightly different.
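
To make the difference concrete, here is a small sketch of what shlex.split does with a \t escape inside double quotes. The sample input string is an assumption chosen for illustration. In its default POSIX mode, shlex.split keeps the backslash and the letter as two literal characters, whereas parse_to_argv above would turn the same escape into a real TAB.

```python
import shlex

# Hypothetical sample input: a backslash followed by "t" inside double quotes.
line = r'-o 1 --long "tab\there"'

args = shlex.split(line)

# shlex leaves the escape alone: the last argument still contains a
# backslash and the letter "t", not a TAB character.
print(args)
print("\t" in args[-1])
```

If real control characters are needed, the escape handling has to happen in your own code, which is what the function above does.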

  • 2020-12-13 06:54

    I believe you want the shlex module.

    >>> import shlex
    >>> shlex.split('-o 1 --long "Some long string"')
    ['-o', '1', '--long', 'Some long string']
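
Since the result is an ordinary list of strings, it can be handed to anything that expects sys.argv[1:]. The sketch below assumes the goal is to feed the string to argparse; the parser and its two options are illustrative and simply mirror the flags in the question.

```python
import argparse
import shlex

# Illustrative parser: the option names mirror the flags in the question.
parser = argparse.ArgumentParser()
parser.add_argument("-o")
parser.add_argument("--long")

# Split the command string the way a POSIX shell would.
argv = shlex.split('-o 1 --long "Some long string"')

# parse_args accepts an explicit list in place of sys.argv[1:].
ns = parser.parse_args(argv)
print(ns.o, ns.long)
```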
    