How to split a sequence according to a predicate?

既然无缘 2020-12-20 10:52

I very often run into the need to split a sequence into the two subsequences of elements that satisfy and don't satisfy a given predicate (preserving the original relative order).

6 Answers
  • 2020-12-20 11:29

    The third-party module more_itertools (not part of the standard library; install it with pip install more-itertools) has a function called partition that does exactly what the original poster asked for.

    from more_itertools import partition
    
    numbers = [1, 2, 3, 4, 5, 6, 7]
    predicate_false, predicate_true = partition(lambda x: x % 2 == 0, numbers)
    
    print(list(predicate_false), list(predicate_true))
    

    The result is [1, 3, 5, 7] [2, 4, 6].

  • 2020-12-20 11:35

    partition is one of the recipes in the itertools documentation, and it does just that. It uses tee() to create two independent iterators over the input, so the original iterable is consumed only once, then the built-in filter() function to grab the items that satisfy the predicate and filterfalse() to get the items that don't. This is as close as you're going to get to a standard/built-in method.

    from itertools import filterfalse, tee

    def partition(pred, iterable):
        'Use a predicate to partition entries into false entries and true entries'
        # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
        t1, t2 = tee(iterable)
        return filterfalse(pred, t1), filter(pred, t2)
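    One way to see both the shared single pass and its cost is to instrument the predicate. The sketch below repeats the recipe (with its imports) and uses a hypothetical counting predicate, is_even, on a one-shot generator: tee() lets the generator be consumed only once, but every element is tested twice (once by each filter), and tee() buffers items for the second iterator while the first one is being drained.

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

calls = []  # records every value the predicate is called with

def is_even(x):  # hypothetical predicate used only for this demonstration
    calls.append(x)
    return x % 2 == 0

numbers = (n for n in range(6))  # a generator can be iterated only once
odds, evens = partition(is_even, numbers)

# Draining odds first makes tee() buffer all six items for evens.
odd_list, even_list = list(odds), list(evens)
print(odd_list, even_list)  # [1, 3, 5] [0, 2, 4]
print(len(calls))           # 12: each of the 6 items was tested twice
```

    If the predicate is expensive or has side effects, a single loop that appends to two lists avoids the double evaluation.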
    
  • 2020-12-20 11:36

    A slight variation of one of the OP's implementations and another commenter's implementation above using groupby:

    from collections import defaultdict
    from itertools import groupby

    groups = defaultdict(list, {k: list(ks) for k, ks in groupby(items, f)})

    # groups[True]  -> the matching items, or [] if none returned True
    # groups[False] -> the non-matching items, or [] if none returned False
    

    Sadly, as you point out, groupby requires that the items be sorted by the predicate first, so if that's not guaranteed, you need this:

    groups = defaultdict(list, { k : list(ks) for k, ks in groupby(sorted(items, key=f), f) })
    

    Quite a mouthful, but it is a single expression that partitions a list by a predicate using only the standard library. Because sorted() is stable, each group keeps its items in their original relative order.

    You can't just use sorted() without the key parameter, because groupby starts a new group every time the value returned by the key function changes. Plain sorted() only works if the items happen to sort naturally by the predicate.
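    The difference between sorting and not sorting is easy to see on a small list. In this sketch, items and f are illustrative stand-ins for the names used above: without sorting, groupby emits alternating groups and the dict comprehension keeps only the last group for each key, silently dropping items.

```python
from collections import defaultdict
from itertools import groupby

items = [1, 2, 3, 4, 5, 6, 7]

def f(x):
    """Illustrative predicate: is x even?"""
    return x % 2 == 0

# Unsorted: groupby yields False:[1], True:[2], False:[3], ... and
# each later group overwrites the earlier one with the same key.
unsorted_groups = {k: list(ks) for k, ks in groupby(items, f)}
print(unsorted_groups)  # {False: [7], True: [6]}

# Sorted by the predicate first: exactly one group per key.
groups = defaultdict(list, {k: list(ks) for k, ks in groupby(sorted(items, key=f), f)})
print(groups[True], groups[False])  # [2, 4, 6] [1, 3, 5, 7]
```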

  • 2020-12-20 11:36

    As a slightly more general solution to partitioning, consider grouping. The following function is inspired by Clojure's group-by function.

    You give it a collection of items to group, and a function that will be used to group them. Here's the code:

    def group_by(seq, f):
    
        groupings = {}
    
        for item in seq:
            res = f(item)
            if res in groupings:
                groupings[res].append(item)
            else:
                groupings[res] = [item]
    
        return groupings
    

    For the OP's original case:

    y = group_by(range(14), lambda i: i % 3 == 2)
    # y == {False: [0, 1, 3, 4, 6, 7, 9, 10, 12, 13], True: [2, 5, 8, 11]}
    

    A more general case of grouping elements in a sequence by string length:

    x = group_by(["x", "xx", "yy", "zzz", "z", "7654321"], len)
    # x == {1: ['x', 'z'], 2: ['xx', 'yy'], 3: ['zzz'], 7: ['7654321']}
    

    This can be extended to many cases, and grouping is a core operation in functional languages. It fits well with dynamically typed Python, since the keys in the resulting dict can be any hashable value. Enjoy!
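    To get back to the original two-way split, read the True and False keys out of the result, using .get() with a default in case no element produced one of them. This sketch also swaps the explicit membership test for collections.defaultdict; that is an alternative way to write the same function, not the answer's original code.

```python
from collections import defaultdict

def group_by(seq, f):
    """Map each key f(item) to the list of items that produced it, in input order."""
    groupings = defaultdict(list)
    for item in seq:
        groupings[f(item)].append(item)  # a missing key starts as []
    return dict(groupings)  # plain dict, so looking up an absent key doesn't add it

groups = group_by(range(14), lambda i: i % 3 == 2)
matching = groups.get(True, [])
non_matching = groups.get(False, [])
print(matching)      # [2, 5, 8, 11]
print(non_matching)  # [0, 1, 3, 4, 6, 7, 9, 10, 12, 13]
```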

  • 2020-12-20 11:48

    I know you said you didn't want to write your own function, but I can't see why. Your solutions involve writing your own code anyway; you just aren't packaging it into functions.

    This does exactly what you want, is understandable, and only evaluates the predicate once per element:

    def splitter(data, pred):
        yes, no = [], []
        for d in data:
            if pred(d):
                yes.append(d)
            else:
                no.append(d)
        return [yes, no]
    

    If you want it to be more compact (for some reason):

    def splitter(data, pred):
        yes, no = [], []
        for d in data:
            (yes if pred(d) else no).append(d)
        return [yes, no]
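    A short usage check of the compact version: it accepts any iterable, including a one-shot iterator, keeps each half in the original relative order, and calls the predicate exactly once per element. The counting predicate is_small is a hypothetical helper added only to demonstrate that.

```python
def splitter(data, pred):
    yes, no = [], []
    for d in data:
        # pred is evaluated once, and d is appended to exactly one list.
        (yes if pred(d) else no).append(d)
    return [yes, no]

seen = []  # records every value the predicate is called with

def is_small(x):  # hypothetical predicate that logs its own calls
    seen.append(x)
    return x < 3

yes, no = splitter(iter([5, 1, 4, 2, 3, 0]), is_small)
print(yes, no)    # [1, 2, 0] [5, 4, 3]
print(len(seen))  # 6
```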
    
  • 2020-12-20 11:52

    If you don't care about efficiency, groupby (or any function that puts data into n bins) maps nicely onto this problem:

    import itertools

    by_bins_iter = itertools.groupby(sorted(data, key=pred), key=pred)
    by_bins = dict((k, tuple(v)) for k, v in by_bins_iter)
    

    You can then get the two subsequences with:

    matching, non_matching = by_bins.get(True, ()), by_bins.get(False, ())
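    Wrapped into a function, the same approach might look like the sketch below; split_by_groupby is a hypothetical name. Because sorted() is stable, each bin keeps the original relative order. The costs are an O(n log n) sort and two predicate calls per element, one from sorted() and one from groupby(). It also assumes pred returns real True/False values: a predicate returning other truthy values such as 'yes' or 2 would create bins that .get(True, ()) never finds.

```python
import itertools

def split_by_groupby(data, pred):
    """Return (matching, non_matching) as tuples using sort + groupby.

    Assumes pred returns real bools, since the bins are looked up by
    True and False.
    """
    by_bins_iter = itertools.groupby(sorted(data, key=pred), key=pred)
    by_bins = {k: tuple(v) for k, v in by_bins_iter}
    # .get() supplies an empty tuple when no element landed in a bin.
    return by_bins.get(True, ()), by_bins.get(False, ())

matching, non_matching = split_by_groupby([3, 8, 1, 6, 5], lambda x: x > 4)
print(matching, non_matching)                     # (8, 6, 5) (3, 1)
print(split_by_groupby([1, 2], lambda x: x > 4))  # ((), (1, 2))
```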
    