Python: split a list based on a condition?

误落风尘 2020-11-22 06:56

What's the best way, both aesthetically and from a performance perspective, to split a list of items into multiple lists based on a conditional? The equivalent of:

30 answers
  •  谎友^ (original poster)
    2020-11-22 07:22

    My take on it: I propose a lazy, single-pass partition function that preserves relative order in the output subsequences.

    1. Requirements

    I assume that the requirements are:

    • maintain elements' relative order (hence, no sets and dictionaries)
    • evaluate condition only once for every element (hence not using (i)filter or groupby)
    • allow for lazy consumption of either sequence (if we can afford to precompute them, then the naïve implementation is likely to be acceptable too)
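To make the second requirement concrete, here is a minimal sketch (not part of the original answer) comparing a two-pass split built on filter with a naive eager single-pass loop. The helper names counted, split_with_filters and split_single_pass are hypothetical and exist only for this illustration.

```python
def counted(condition):
    """Wrap a condition so that every call is recorded in a list."""
    calls = []
    def wrapper(item):
        calls.append(item)
        return condition(item)
    return wrapper, calls

def split_with_filters(condition, items):
    # Two passes over the data: the condition runs twice per element.
    goods = list(filter(condition, items))
    bads = list(filter(lambda x: not condition(x), items))
    return goods, bads

def split_single_pass(condition, items):
    # Naive eager version: one pass, one condition call per element.
    goods, bads = [], []
    for item in items:
        (goods if condition(item) else bads).append(item)
    return goods, bads

data = [1, 2, 3, 4, 5]

is_even_a, calls_two_pass = counted(lambda n: n % 2 == 0)
two_pass = split_with_filters(is_even_a, data)

is_even_b, calls_one_pass = counted(lambda n: n % 2 == 0)
one_pass = split_single_pass(is_even_b, data)

print(two_pass, len(calls_two_pass))
print(one_pass, len(calls_one_pass))
```

Both variants keep relative order, but only the single-pass loop evaluates the condition once per element. It is eager, though, so it does not meet the third requirement.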

    2. split library

    My partition function (introduced below) and other similar functions have made it into a small library:

    • python-split

    It's installable normally via PyPI:

    pip install --user split
    

    To split a list based on a condition, use the partition function:

    >>> from split import partition
    >>> files = [('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
    >>> image_types = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')
    >>> images, other = partition(lambda f: f[-1] in image_types, files)
    >>> list(images)
    [('file1.jpg', 33, '.jpg')]
    >>> list(other)
    [('file2.avi', 999, '.avi')]
    

    3. partition function explained

    Internally we need to build two subsequences at once, so consuming only one output sequence will force the other one to be computed too. And we need to keep state between user requests (store processed but not yet requested elements). To keep state, I use two double-ended queues (deques):

    from collections import deque
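
As a small illustration of why a deque fits here (a sketch, with the variable name pending chosen only for this example): appending on the right and popping from the left gives first-in, first-out order, so buffered elements come back in their original relative order, and an empty deque is falsy, which makes a plain truth test enough to check for buffered items.

```python
from collections import deque

pending = deque()
pending.append('a')   # elements are buffered in arrival order
pending.append('b')
pending.append('c')

first = pending.popleft()   # the oldest element comes out first
remaining = list(pending)
is_empty = not deque()      # an empty deque is falsy

print(first, remaining, is_empty)
```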
    

    The SplitSeq class takes care of the housekeeping:

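    class SplitSeq:
        def __init__(self, condition, sequence):
            self.cond = condition
            self.goods = deque()       # buffered elements that satisfy the condition
            self.bads = deque()        # buffered elements that do not
            self.seq = iter(sequence)  # shared source, consumed exactly once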
    

    The magic happens in its .getNext() method. It works much like an iterator's next method, but lets the caller specify which kind of element is wanted this time. Behind the scenes it doesn't discard the rejected elements; it stores them in the queue for the other kind:

        def getNext(self, getGood=True):
            if getGood:
                these, those, cond = self.goods, self.bads, self.cond
            else:
                these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
            if these:
                # serve an element that was buffered by an earlier request
                return these.popleft()
            else:
                while True:  # exits when next() raises StopIteration
                    n = next(self.seq)
                    if cond(n):
                        return n
                    else:
                        those.append(n)  # keep the rejected element for the other output
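
The buffering step can be seen in isolation in the following sketch. It restates the same logic as a standalone function rather than a method. The name take_next and its parameters are hypothetical and are not part of the split library.

```python
from collections import deque

def take_next(source, cond, mine, theirs):
    """Return the next element satisfying cond; park the others in theirs."""
    if mine:
        return mine.popleft()   # already buffered by an earlier call
    while True:
        item = next(source)     # StopIteration propagates when source is exhausted
        if cond(item):
            return item
        theirs.append(item)

source = iter([1, 3, 4, 5, 6])
evens, odds = deque(), deque()

def is_even(n):
    return n % 2 == 0

def is_odd(n):
    return n % 2 == 1

first_even = take_next(source, is_even, evens, odds)  # skips 1 and 3, parking them
parked = list(odds)
first_odd = take_next(source, is_odd, odds, evens)    # served from the buffer

print(first_even, parked, first_odd)
```

Asking for an even number first forces 1 and 3 to be read and parked, so the later request for an odd number is answered from the buffer without touching the source.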
    

    The end user is supposed to call the partition function. It takes a condition function and a sequence (just like map or filter) and returns two generators. The first generator yields the elements for which the condition holds; the second yields the complementary subsequence. Because both are lazy, long or even infinite sequences can be split.

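    def partition(condition, sequence):
        cond = condition if condition else bool  # evaluate as bool if condition is None
        ss = SplitSeq(cond, sequence)
        def goods():
            while True:
                try:
                    item = ss.getNext(getGood=True)
                except StopIteration:
                    return  # source exhausted; PEP 479 forbids leaking StopIteration
                yield item
        def bads():
            while True:
                try:
                    item = ss.getNext(getGood=False)
                except StopIteration:
                    return
                yield item
        return goods(), bads()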
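
To illustrate the claim about infinite sequences, here is a compact, self-contained Python 3 sketch of the same idea using closures instead of a class. The name lazy_partition is hypothetical; this is a restatement under that assumption, not the library's code.

```python
from collections import deque
from itertools import count, islice

def lazy_partition(condition, iterable):
    """Lazily split iterable into (matching, non_matching) generators."""
    source = iter(iterable)
    goods, bads = deque(), deque()

    def stream(mine, theirs, want):
        while True:
            if mine:
                yield mine.popleft()   # serve buffered elements first
                continue
            try:
                item = next(source)
            except StopIteration:
                return                 # source exhausted, end this generator
            if bool(condition(item)) == want:
                yield item
            else:
                theirs.append(item)    # park it for the other generator

    return stream(goods, bads, True), stream(bads, goods, False)

# count() never ends, so only a lazy split can work here.
evens, odds = lazy_partition(lambda n: n % 2 == 0, count())
first_evens = list(islice(evens, 3))
first_odds = list(islice(odds, 4))

print(first_evens, first_odds)
```

Note that consuming one output while ignoring the other lets the ignored buffer grow without bound, which is the cost of preserving every element in a single pass.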
    

    I chose the test function to be the first argument to facilitate partial application in the future (similar to how map and filter have the test function as the first argument).
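
A sketch of what such partial application could look like with functools.partial. The partition defined here is a simple eager stand-in with the same argument order, included only so the example runs on its own; is_image and split_images are hypothetical names.

```python
from functools import partial

def partition(condition, sequence):
    # Eager stand-in with the same (condition, sequence) argument order.
    goods, bads = [], []
    for item in sequence:
        (goods if condition(item) else bads).append(item)
    return goods, bads

def is_image(name):
    return name.endswith(('.jpg', '.png'))

# Fix the condition once, then reuse the resulting splitter on any sequence.
split_images = partial(partition, is_image)

images, other = split_images(['a.jpg', 'b.avi', 'c.png'])
print(images, other)
```

Because the condition comes first, partial can bind it positionally and leave the sequence as the only remaining argument.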
