Most Pythonic Way to Split an Array by Repeating Elements

星月不相逢 2021-02-13 09:51

I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. F

11 answers
  •  刺人心 (thread starter)
    2021-02-13 10:10

    Use a generator function to keep track of where you are in the list and of how many consecutive separators you have seen so far:

    l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
    
    def splitOn(ll, x, n):
        cur = []          # items collected for the current chunk
        splitcount = 0    # consecutive separators seen so far
        for c in ll:
            if c == x:
                splitcount += 1
                if splitcount == n:
                    # n separators in a row: emit the chunk and start a new one
                    yield cur
                    cur = []
                    splitcount = 0
            else:
                cur.append(c)
                splitcount = 0  # a non-separator breaks the run
        yield cur  # emit whatever is left after the last split
    
    print(list(splitOn(l, 'X', 2)))
    print(list(splitOn(l, 'X', 1)))
    print(list(splitOn(l, 'X', 3)))
    
    l += ['X','X']
    print(list(splitOn(l, 'X', 2)))
    print(list(splitOn(l, 'X', 1)))
    print(list(splitOn(l, 'X', 3)))
    

    prints:

    [['a', 'b'], ['c', 'd'], ['f', 'g']]
    [['a', 'b'], [], ['c', 'd'], [], ['f'], ['g']]
    [['a', 'b', 'c', 'd', 'f', 'g']]
    [['a', 'b'], ['c', 'd'], ['f', 'g'], []]
    [['a', 'b'], [], ['c', 'd'], [], ['f'], ['g'], [], []]
    [['a', 'b', 'c', 'd', 'f', 'g']]
    

    EDIT: I'm also a big fan of groupby, so here's my go at it:

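    from itertools import groupby

    def splitOn(ll, x, n):
        cur = []
        # groupby yields alternating runs of separators and non-separators
        for isdelim, grp in groupby(ll, key=lambda c: c == x):
            if isdelim:
                nn = sum(1 for c in grp)  # length of this separator run
                # each full block of n separators closes one chunk
                while nn >= n:
                    yield cur
                    cur = []
                    nn -= n
            else:
                cur.extend(grp)
        yield cur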
    

    This is not too different from my earlier answer; it just lets groupby take care of iterating over the input list, creating runs of items that match the delimiter and runs that don't. The non-matching items just get added onto the current chunk, while the runs of matching items do the work of starting new chunks. For long lists, this is probably a bit more efficient, as groupby does its iteration in C (the key function is still a Python lambda), and it still iterates over the list only once.
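One way to convince yourself that the two versions behave the same is to run them side by side. The sketch below is only an illustration: the names split_on_counter and split_on_groupby are hypothetical (the answer calls both functions splitOn), and the last two sample inputs are made up to cover a leading run of separators and an empty list.

```python
from itertools import groupby


def split_on_counter(items, sep, n):
    """Counter-based version: split after n consecutive separators."""
    cur, run = [], 0
    for item in items:
        if item == sep:
            run += 1
            if run == n:
                yield cur
                cur, run = [], 0
        else:
            cur.append(item)
            run = 0  # a non-separator breaks the run
    yield cur


def split_on_groupby(items, sep, n):
    """groupby-based version: each full block of n separators closes a chunk."""
    cur = []
    for is_sep, grp in groupby(items, key=lambda c: c == sep):
        if is_sep:
            run = sum(1 for _ in grp)  # length of this separator run
            while run >= n:
                yield cur
                cur = []
                run -= n
        else:
            cur.extend(grp)
    yield cur


# The first two samples come from the answer; the last two are assumed edge cases.
samples = [
    ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'],
    ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g', 'X', 'X'],
    ['X', 'X', 'X', 'a'],
    [],
]

for sample in samples:
    for n in (1, 2, 3):
        expected = list(split_on_counter(sample, 'X', n))
        assert list(split_on_groupby(sample, 'X', n)) == expected
```

If every assertion passes, the two generators agree on these inputs. That is a spot check, not a proof of equivalence.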
