zip_longest without fillvalue

偶尔善良 提交于 2019-11-26 22:01:41

问题


I am searching for a middle ground between Python's zip and zip_longest functions (from the itertools module), that exhausts all given iterators, but does not fill in anything. So, for example, it should transpose tuples like so:

(11, 12, 13    ),        (11, 21, 31, 41),
(21, 22, 23, 24),  -->   (12, 22, 32, 42),
(31, 32        ),        (13, 23,     43),
(41, 42, 43, 44),        (    24,     44)

(Spaces added for nicer graphical alignment.)

I managed to compose a crude a solution by cleaning out the fillvalues after zip_longest.

def zip_discard(*iterables, sentinel = object()):
    return map(
            partial(filter, partial(is_not, sentinel)), 
            zip_longest(*iterables, fillvalue=sentinel))

Is there a way to do this without introducing the sentinels to begin with? Can this be improved using yield? Which approach seems most efficient?


回答1:


Both zip and zip_longest were designed to always generate tuples of equal length, you can define your own generator that doesn't care about the len with something like this:

def _one_pass(iters):
    for it in iters:
        try:
            yield next(it)
        except StopIteration:
            pass #of some of them are already exhausted then ignore it.

def zip_varlen(*iterables):
    iters = [iter(it) for it in iterables]
    while True: #broken when an empty tuple is given by _one_pass
        val = tuple(_one_pass(iters))
        if val:
            yield val
        else:
            break

If the data being zipped together is fairly large then skipping the exhausted iterators every time can be expensive, it may be more efficient to remove the finished iterators from iters in the _one_pass function like this:

def _one_pass(iters):
    i = 0
    while i<len(iters):
        try:
            yield next(iters[i])
        except StopIteration:
            del iters[i]
        else:
            i+=1

both of these versions would remove the need to create intermediate results or use temporary filler values.




回答2:


You approach is good. I think using a sentinel is elegant. I might be considered a bit more pythonic to use a nested generator expression:

def zip_discard_gen(*iterables, sentinel=object()):
    return ((entry for entry in iterable if entry is not sentinel)
            for iterable in zip_longest(*iterables, fillvalue=sentinel))

This needs fewer imports because there is no need for partial() or ne().

It is a bit faster too:

data = [(11, 12, 13    ),
        (21, 22, 23, 24),
        (31, 32        ),
        (41, 42, 43, 44)]

%timeit [list(x) for x in zip_discard(*data)]  
10000 loops, best of 3: 17.5 µs per loop

%timeit [list(x) for x in zip_discard_gen(*data)]
100000 loops, best of 3: 14.2 µs per loop

EDIT

A list comprehension version is a bit faster:

def zip_discard_compr(*iterables, sentinel=object()):
    return [[entry for entry in iterable if entry is not sentinel]
            for iterable in zip_longest(*iterables, fillvalue=sentinel)]

Timing:

%timeit zip_discard_compr(*data)
100000 loops, best of 3: 6.73 µs per loop

A Python 2 version:

from itertools import izip_longest

SENTINEL = object()

def zip_discard_compr(*iterables):
    sentinel = SENTINEL
    return [[entry for entry in iterable if entry is not sentinel]
            for iterable in izip_longest(*iterables, fillvalue=sentinel)]

Timings

This version returns the same data structure as zip_varlen by Tadhg McDonald-Jensen:

def zip_discard_gen(*iterables, sentinel=object()):
    return (tuple([entry for entry in iterable if entry is not sentinel])
            for iterable in zip_longest(*iterables, fillvalue=sentinel))

It is about twice as fast:

%timeit list(zip_discard_gen(*data))
100000 loops, best of 3: 9.37 µs per loop

%timeit list(zip_varlen(*data))
10000 loops, best of 3: 18 µs per loop


来源:https://stackoverflow.com/questions/38054593/zip-longest-without-fillvalue

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!