Python split string without splitting escaped character

谎友^ 2020-12-08 20:56

Is there a way to split a string without splitting on an escaped character? For example, I have a string and want to split on ':' but not on '\:'

http\://ww


        
10 answers
  •  小蘑菇 (OP)
    2020-12-08 21:38

    As Ignacio says, yes, but not trivially in one go. The issue is that you need to look behind each delimiter to determine whether it is escaped, and the basic str.split doesn't provide that functionality.
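    To make the look-behind idea concrete, here is a minimal sketch using a negative lookbehind from the standard re module. The helper name lookbehind_split is made up for illustration, and the sketch assumes a backslash directly before the delimiter always means "escaped", so it does not understand escaped backslashes.

```python
import re

def lookbehind_split(s, delim):
    # Hypothetical helper name, not part of any library.
    # Split on delim unless it is directly preceded by a backslash.
    pattern = r'(?<!\\)' + re.escape(delim)
    return re.split(pattern, s)

s = r'http\://www.example.url:ftp\://www.example.url'
print(lookbehind_split(s, ':'))
# ['http\\://www.example.url', 'ftp\\://www.example.url']

# Caveat: an escaped backslash before the delimiter is not recognised,
# so the ':' below counts as escaped and no split happens.
print(lookbehind_split(r'a\\:b', ':'))
# ['a\\\\:b']
```

    This is compact, but the single-character lookbehind is exactly why it cannot tell an escaped backslash from an escaping one.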

    If this isn't inside a tight loop so performance isn't a significant issue, you can do it by first splitting on the escaped delimiters, then performing the split, and then merging. Ugly demo code follows:

    # Bear in mind this is not rigorously tested!
    def escaped_split(s, delim):
        # Split on the escaped delimiter first, then on the plain one
        escaped_delim = '\\' + delim
        sections = [p.split(delim) for p in s.split(escaped_delim)]
        ret = []
        prev = None
        for parts in sections:  # each entry is a list of "real" splits
            if prev is not None:
                # Rejoin the piece carried over from the previous section,
                # restoring the escaped delimiter that separated them
                parts[0] = escaped_delim.join([prev, parts[0]])
            # Everything but the last piece is complete
            ret.extend(parts[:-1])
            # The last piece may continue into the next section
            prev = parts[-1]
        ret.append(prev)  # the final piece has nothing left to join to
        return ret

    s = r'http\://www.example.url:ftp\://www.example.url'
    print(escaped_split(s, ':'))
    # >>> ['http\\://www.example.url', 'ftp\\://www.example.url']
    

    Alternatively, the logic may be easier to follow if you just split the string by hand.

    def escaped_split(s, delim):
        ret = []
        current = []
        itr = iter(s)
        for ch in itr:
            if ch == '\\':
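                try:
                    # Keep the backslash and the character it escapes, so an
                    # escaped delimiter never reaches the split branch
                    current.append('\\')
                    current.append(next(itr))
                except StopIteration:
                    # Trailing backslash with nothing after it: keep it as-is
                    pass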
            elif ch == delim:
                # split! (add current to the list and reset it)
                ret.append(''.join(current))
                current = []
            else:
                current.append(ch)
        ret.append(''.join(current))
        return ret
    

    Note that this second version behaves differently when a doubled escape character is followed by a delimiter. It allows escaped escape characters, so escaped_split(r'a\\:b', ':') returns ['a\\\\', 'b']: the first \ escapes the second one, leaving the : to be interpreted as a real delimiter. The first version does not split that input at all. So that's something to watch out for.
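    Both versions leave the backslashes in the pieces they return. If you want them removed afterwards, one possible follow-up step is sketched below. The helper name unescape is hypothetical, and it assumes every backslash escapes exactly the one character after it.

```python
import re

def unescape(field):
    # Hypothetical helper: drop each escaping backslash and keep the
    # character it escapes. re.DOTALL lets '.' match a newline too.
    return re.sub(r'\\(.)', r'\1', field, flags=re.DOTALL)

print(unescape(r'http\://www.example.url'))
# http://www.example.url

# 'a' followed by two backslashes becomes 'a' followed by one backslash
print(unescape('a\\\\') == 'a\\')
# True
```

    A lone trailing backslash has nothing to escape, so this sketch leaves it in place.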
