Python split string without splitting escaped character

谎友^ 2020-12-08 20:56

Is there a way to split a string without splitting on escaped characters? For example, I have a string and want to split it by ':' but not by '\:'

http\://ww


        
10 answers
  • 2020-12-08 21:20

    There is a much easier way using a regex with a negative lookbehind assertion:

    import re
    re.split(r'(?<!\\):', s)  # s is the string to split
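    A quick check of the lookbehind version. The input strings are made up for illustration. The second call shows the limit of a one-character lookbehind: when the backslash in front of ':' is itself escaped (\\:), that ':' should be a real delimiter, but it is still not split on.

```python
import re

# Split on ':' only where it is not immediately preceded by a backslash.
PATTERN = r'(?<!\\):'

s = r"a\:b:c"  # illustrative input: an escaped ':' followed by a real one
print(re.split(PATTERN, s))  # ['a\\:b', 'c']

# Limitation: in r"a\\:b" the backslash is itself escaped, so the ':' should
# be a real delimiter, but the one-character lookbehind still blocks the split.
print(re.split(PATTERN, r"a\\:b"))  # ['a\\\\:b']
```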
    
  • 2020-12-08 21:21

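    There is no built-in function for that. Here's an efficient, general, and tested function that even supports delimiters of any length. It also strips the escape characters in front of a delimiter, turning each double escape there into a single backslash: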

    def escape_split(s, delim):
        i, res, buf = 0, [], ''
        while True:
            j, e = s.find(delim, i), 0
            if j < 0:  # end reached
                return res + [buf + s[i:]]  # add remainder
            while j - e and s[j - e - 1] == '\\':
                e += 1  # number of escapes
            d = e // 2  # number of double escapes
            if e != d * 2:  # odd number of escapes
                buf += s[i:j - d - 1] + s[j]  # add the escaped char
                i = j + 1  # and skip it
                continue  # add more to buf
            res.append(buf + s[i:j - d])
            i, buf = j + len(delim), ''  # start after delim
    
  • 2020-12-08 21:22

    Here is an efficient solution that handles double escapes correctly, i.e. a delimiter that follows a double escape is not treated as escaped. It ignores (drops) a stray single escape at the very end of the string.

    It is very efficient because it iterates over the input string exactly once, manipulating indices instead of copying strings around. Instead of constructing a list, it returns a generator.

    def split_esc(string, delimiter):
        if len(delimiter) != 1:
            raise ValueError('Invalid delimiter: ' + delimiter)
        ln = len(string)
        i = 0
        j = 0
        while j < ln:
            if string[j] == '\\':
                if j + 1 >= ln:
                    yield string[i:j]
                    return
                j += 1
            elif string[j] == delimiter:
                yield string[i:j]
                i = j + 1
            j += 1
        yield string[i:j]
    

    To allow for delimiters longer than a single character, drop the length check, test string.startswith(delimiter, j) in the "elif" case, and advance i and j by the length of the delimiter there. This assumes that a single escape character escapes the entire delimiter, rather than a single character.

    Tested with Python 3.5.1.
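    A sketch of that multi-character extension. The name split_esc_multi and the escape parameter are hypothetical, and it assumes one reading of "escapes the entire delimiter": an escape directly in front of a delimiter skips the whole delimiter, while anywhere else it skips one character. Like split_esc, it keeps the escape characters in its output and drops a lone escape at the end of the string.

```python
def split_esc_multi(string, delimiter, escape='\\'):
    """Yield the parts of `string` between unescaped `delimiter`s.

    Hypothetical extension of split_esc to delimiters of any length.
    Escape characters are kept in the output, as in split_esc.
    """
    if not delimiter:
        raise ValueError('Empty delimiter')
    ln, dl = len(string), len(delimiter)
    i = j = 0
    while j < ln:
        if string[j] == escape:
            if j + 1 >= ln:
                # Drop a lone escape at the very end of the string.
                yield string[i:j]
                return
            # An escape in front of a delimiter skips the whole delimiter;
            # anywhere else it skips just the next character.
            j += dl if string.startswith(delimiter, j + 1) else 1
        elif string.startswith(delimiter, j):
            yield string[i:j]
            i = j + dl
            j += dl - 1  # the "j += 1" below completes the jump past it
        j += 1
    yield string[i:j]


print(list(split_esc_multi(r"a\::b::c", "::")))  # ['a\\::b', 'c']
print(list(split_esc_multi(r"a\\::b", "::")))    # ['a\\\\', 'b']
```

    With a one-character delimiter it behaves exactly like split_esc, apart from the check on the delimiter.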

  • 2020-12-08 21:26

    An edited version of Henry's answer, with Python 3 compatibility, doctests, and fixes for some issues:

    def split_unescape(s, delim, escape='\\', unescape=True):
        """
        >>> split_unescape('foo,bar', ',')
        ['foo', 'bar']
        >>> split_unescape('foo$,bar', ',', '$')
        ['foo,bar']
        >>> split_unescape('foo$$,bar', ',', '$', unescape=True)
        ['foo$', 'bar']
        >>> split_unescape('foo$$,bar', ',', '$', unescape=False)
        ['foo$$', 'bar']
        >>> split_unescape('foo$', ',', '$', unescape=True)
        ['foo$']
        """
        ret = []
        current = []
        itr = iter(s)
        for ch in itr:
            if ch == escape:
                try:
                    # take the next character literally; it has been escaped
                    if not unescape:
                        current.append(escape)
                    current.append(next(itr))
                except StopIteration:
                    if unescape:
                        current.append(escape)
            elif ch == delim:
                # split! (add current to the list and reset it)
                ret.append(''.join(current))
                current = []
            else:
                current.append(ch)
        ret.append(''.join(current))
        return ret
    
  • 2020-12-08 21:28

    Note that : doesn't appear to be a character that needs escaping.

    The simplest way that I can think of to accomplish this is to split on the character, and then add it back in when it is escaped.

    Sample code (in much need of some neatening):

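    def splitNoEscapes(string, char):
        sections = string.split(char)
        last = len(sections) - 1
        # Put the delimiter back on every section (except the last) that ends
        # with a backslash: that delimiter was escaped, not a real split point.
        sections = [s + (char if k < last and s.endswith("\\") else "")
                    for k, s in enumerate(sections)]
        result = ["" for _ in sections]
        j = 0
        for s in sections:
            result[j] += s
            # Start a new item unless this section ends in an escaped delimiter.
            j += (0 if s.endswith("\\" + char) else 1)
        # Empty items are dropped, so "a::b" gives ['a', 'b'].
        # Limitation: an escaped backslash right before the delimiter still counts as an escape.
        return [i for i in result if i != ""]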
    
  • 2020-12-08 21:30

    Building on @user629923's suggestion, but much simpler than the other answers:

    import re
    DBL_ESC = "!double escape!"
    
    s = r"Hello:World\:Goodbye\\:Cruel\\\:World"
    
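    # map() is lazy in Python 3, so build the list explicitly.
    # Assumes the DBL_ESC placeholder never occurs in the input itself.
    result = [x.replace(DBL_ESC, r'\\')
              for x in re.split(r'(?<!\\):', s.replace(r'\\', DBL_ESC))]
    # ['Hello', 'World\\:Goodbye\\\\', 'Cruel\\\\\\:World']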
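    A variant that handles double escapes without a placeholder string, sketched under the assumption that every backslash escapes the character after it. The lookbehind forces each match to start at the beginning of a run of backslashes, the capture group then only accepts that run if it has an even length (pairs of escaped backslashes), and re.split returns the captured run so it can be glued back onto the field it belongs to. SPLIT_RE and split_escaped are names made up for this example.

```python
import re

# ':' preceded by an even number (possibly zero) of backslashes. The
# lookbehind stops the match from starting in the middle of a backslash run.
SPLIT_RE = re.compile(r'(?<!\\)((?:\\\\)*):')


def split_escaped(s):
    """Split s on ':' unless it is escaped by an odd number of backslashes.

    Escapes are kept as-is, as in the DBL_ESC version.
    """
    # With one capture group, re.split returns
    # [field0, run0, field1, run1, ..., fieldN], where run_k is the (even)
    # backslash run that preceded the k-th delimiter and belongs to field k.
    parts = SPLIT_RE.split(s)
    fields = [parts[k] + parts[k + 1] for k in range(0, len(parts) - 1, 2)]
    fields.append(parts[-1])
    return fields


print(split_escaped(r"Hello:World\:Goodbye\\:Cruel\\\:World"))
# ['Hello', 'World\\:Goodbye\\\\', 'Cruel\\\\\\:World']
```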
    