Question
Here are two functions that split the items of an iterable into sub-lists. I believe this kind of task gets programmed over and over. I use them to parse log files that consist of repr lines like ('result', 'case', 123, 4.56) and ('dump', ..) and so on.
I would like to change these so that they yield iterators rather than lists, because a list may grow pretty large, and I may be able to decide whether to take it or skip it based on its first few items. Also, if an iterator version were available I would like to nest the calls; with the list versions that would waste memory by duplicating parts.
But deriving multiple generators from one iterable source wasn't easy for me, so I am asking for help. If possible, I wish to avoid introducing new classes.
Also, if you know a better title for this question, please tell me.
Thank you!
def cleave_by_mark (stream, key_fn, end_with_mark=False):
    '''[f f t][t][f f] (true) [f f][t][t f f](false)'''
    buf = []
    for item in stream:
        if key_fn(item):
            if end_with_mark: buf.append(item)
            if buf: yield buf
            buf = []
            if end_with_mark: continue
        buf.append(item)
    if buf: yield buf

def cleave_by_change (stream, key_fn):
    '''[1 1 1][2 2][3][2 2 2 2]'''
    prev = None
    buf = []
    for item in stream:
        iden = key_fn(item)
        if prev is None: prev = iden
        if prev != iden:
            yield buf
            buf = []
            prev = iden
        buf.append(item)
    if buf: yield buf
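
To make the log-parsing use case concrete, here is a minimal sketch of how the list-based cleave_by_mark might be applied to repr lines. It is written for Python 3, the sample log contents (including the 'dump' payloads) are hypothetical, and using ast.literal_eval to read each line is an assumption about how the lines are parsed.

```python
import ast

def cleave_by_mark(stream, key_fn, end_with_mark=False):
    """List-based version from the question, restated for Python 3."""
    buf = []
    for item in stream:
        if key_fn(item):
            if end_with_mark:
                buf.append(item)
            if buf:
                yield buf
            buf = []
            if end_with_mark:
                continue
        buf.append(item)
    if buf:
        yield buf

# Hypothetical log contents: one repr'd tuple per line.
log_lines = [
    "('result', 'case', 123, 4.56)",
    "('dump', 'a')",
    "('dump', 'b')",
    "('result', 'case', 124, 7.89)",
    "('dump', 'c')",
]

# Parse each line back into a tuple, then start a new block at every 'result'.
records = (ast.literal_eval(line) for line in log_lines)
blocks = list(cleave_by_mark(records, lambda rec: rec[0] == 'result'))

for block in blocks:
    print(block[0], len(block))
```

Each block here is a complete list, which is exactly the memory cost the question wants to avoid for large blocks.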
Edit: my own answer
Thanks to everyone's answers, I could write what I asked for! Of course, for the cleave_by_change function I could also use itertools.groupby.
def cleave_by_mark (stream, key_fn, end_with_mark=False):
    hand = []
    def gen ():
        key = key_fn(hand[0])
        yield hand.pop(0)
        while 1:
            if end_with_mark and key: break
            hand.append(stream.next())
            key = key_fn(hand[0])
            if (not end_with_mark) and key: break
            yield hand.pop(0)
    while 1:
        # allow StopIteration in the main loop
        if not hand: hand.append(stream.next())
        yield gen()

for cl in cleave_by_mark (iter((1,0,0,1,1,0)), lambda x:x):
    print list(cl),  # start with 1
# -> [1, 0, 0] [1] [1, 0]
for cl in cleave_by_mark (iter((0,1,0,0,1,1,0)), lambda x:x):
    print list(cl),
# -> [0] [1, 0, 0] [1] [1, 0]
for cl in cleave_by_mark (iter((1,0,0,1,1,0)), lambda x:x, True):
    print list(cl),  # end with 1
# -> [1] [0, 0, 1] [1] [0]
for cl in cleave_by_mark (iter((0,1,0,0,1,1,0)), lambda x:x, True):
    print list(cl),
# -> [0, 1] [0, 0, 1] [1] [0]
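
The version above relies on Python 2 behaviour: stream.next() and a StopIteration that is allowed to escape from a generator body. Under Python 3.7 and later (PEP 479), a StopIteration leaking out of a generator is turned into a RuntimeError, so a port has to catch it explicitly. The following is a sketch of such a port, not a drop-in replacement, and it keeps the same requirement that each sub-generator is consumed before the next one is requested.

```python
def cleave_by_mark(stream, key_fn, end_with_mark=False):
    stream = iter(stream)
    hand = []  # holds at most one look-ahead item

    def gen():
        key = key_fn(hand[0])
        yield hand.pop(0)
        while True:
            if end_with_mark and key:
                return  # the mark closed this block
            try:
                hand.append(next(stream))
            except StopIteration:
                return  # source exhausted: end the block quietly
            key = key_fn(hand[0])
            if not end_with_mark and key:
                return  # the mark stays in `hand` and opens the next block
            yield hand.pop(0)

    while True:
        if not hand:
            try:
                hand.append(next(stream))
            except StopIteration:
                return
        yield gen()

start_with_mark = [list(cl) for cl in cleave_by_mark((1, 0, 0, 1, 1, 0), lambda x: x)]
end_with_mark = [list(cl) for cl in cleave_by_mark((0, 1, 0, 0, 1, 1, 0), lambda x: x, True)]
print(start_with_mark)
print(end_with_mark)
```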
def cleave_by_change (stream, key_fn):
    '''[1 1 1][2 2][3][2 2 2 2]'''
    hand = []
    def gen ():
        headkey = key_fn(hand[0])
        yield hand.pop(0)
        while 1:
            hand.append(stream.next())
            key = key_fn(hand[0])
            if key != headkey: break
            yield hand.pop(0)
    while 1:
        # allow StopIteration in the main loop
        if not hand: hand.append(stream.next())
        yield gen()

for cl in cleave_by_change (iter((1,1,1,2,2,2,3,2)), lambda x:x):
    print list(cl),
# -> [1, 1, 1] [2, 2, 2] [3] [2]
CAUTION: If you use these, be sure to exhaust the generators at every level, as Andrew pointed out. Otherwise the outer generator-yielding loop resumes right where the inner generator stopped, rather than at the start of the next "block".
stream = itertools.product('abc','1234', 'ABCD')
for a in iters.cleave_by_change(stream, lambda x:x[0]):
    for b in iters.cleave_by_change(a, lambda x:x[1]):
        print b.next()
        for sink in b: pass
    for sink in a: pass

('a', '1', 'A')
('b', '1', 'A')
('c', '1', 'A')
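
One way to remove the need for manual draining is to let the outer generator finish off each block itself before it hands out the next one. The sketch below is a Python 3 variant under that assumption; the draining step via collections.deque(..., maxlen=0) is my addition and not part of the code above. As with itertools.groupby, a block stops being usable once the outer loop has moved on.

```python
from collections import deque
from itertools import product

def cleave_by_change(stream, key_fn):
    stream = iter(stream)
    hand = []  # holds the first item of the upcoming block

    def gen():
        headkey = key_fn(hand[0])
        yield hand.pop(0)
        for item in stream:
            if key_fn(item) != headkey:
                hand.append(item)  # look-ahead: first item of the next block
                return
            yield item

    while True:
        if not hand:
            try:
                hand.append(next(stream))
            except StopIteration:
                return
        block = gen()
        yield block
        # Discard whatever the caller left unread so the next block
        # starts at the right place.
        deque(block, maxlen=0)

stream = product('abc', '1234', 'ABCD')
firsts = []
for a in cleave_by_change(stream, lambda x: x[0]):
    for b in cleave_by_change(a, lambda x: x[1]):
        firsts.append(next(b))  # peek at one item, leave the rest unread

print(len(firsts), firsts[0], firsts[-1])
```

The trade-off is that skipped items are still pulled from the source and passed through key_fn; they are just never stored.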
Answer 1:
Adam's answer is good. This is just in case you're curious how to do it by hand:
def cleave_by_change(stream):
    def generator():
        head = stream[0]
        while stream and stream[0] == head:
            yield stream.pop(0)
    while stream:
        yield generator()

for g in cleave_by_change([1,1,1,2,2,3,2,2,2,2]):
    print list(g)
which gives:
[1, 1, 1]
[2, 2]
[3]
[2, 2, 2, 2]
(A previous version required a hack or, in Python 3, nonlocal, because I assigned to stream inside generator(), which made (a second variable also called) stream local to generator() by default. Credit to gnibbler in the comments.)
Note that this approach is dangerous: if you don't consume the generators that are returned, you will keep getting more and more of them, because stream is not getting any smaller.
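
The danger can be seen in a small experiment. This sketch restates the function for Python 3 and uses itertools.islice to cap how many generators are requested, since the outer loop would otherwise never finish while the list stays non-empty. The input data is made up for illustration.

```python
from itertools import islice

def cleave_by_change(stream):
    def generator():
        head = stream[0]
        while stream and stream[0] == head:
            yield stream.pop(0)
    while stream:
        yield generator()

data = [1, 1, 2]  # only two groups exist

# Ask for five generators without consuming any of them: all five are
# handed out, because nothing has been popped from `data` yet.
gens = list(islice(cleave_by_change(data), 5))
print(len(gens))

# The groups are only carved out when the generators are finally consumed.
first = list(gens[0])
second = list(gens[1])
print(first, second, data)
```

The remaining generators in gens have nothing left to read; starting one would fail on stream[0] because the list is now empty.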
Answer 2:
For your second function, you can use itertools.groupby to accomplish this fairly easily.
Here's an alternative implementation that yields generators instead of lists:
from itertools import groupby

def cleave_by_change2(stream, key_fn):
    return (group for key, group in groupby(stream, key_fn))
Here it is in action (with liberal printing along the way, so you can see what's going on):
main_gen = cleave_by_change2([1,1,1,2,2,3,2,2,2,2], lambda x: x)
print main_gen
for sub_gen in main_gen:
    print sub_gen
    print list(sub_gen)
Which yields:
<generator object <genexpr> at 0x7f17c7727e60>
<itertools._grouper object at 0x7f17c77247d0>
[1, 1, 1]
<itertools._grouper object at 0x7f17c7724850>
[2, 2]
<itertools._grouper object at 0x7f17c77247d0>
[3]
<itertools._grouper object at 0x7f17c7724850>
[2, 2, 2, 2]
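
Building on groupby, the "take it or skip it based on the first few items" requirement from the question could look roughly like this in Python 3. The helper name take_or_skip and its want parameter are hypothetical, and only the first item is inspected here. Because each group shares the underlying iterator with groupby, every yielded group must be consumed before the next one is requested.

```python
from itertools import chain, groupby

def take_or_skip(stream, key_fn, want):
    """Yield lazy groups whose first item satisfies `want` (hypothetical helper)."""
    for _key, group in groupby(stream, key_fn):
        first = next(group)  # groupby never produces an empty group
        if want(first):
            # Put the inspected item back in front of the rest of the group.
            yield chain([first], group)
        # Otherwise do nothing: groupby skips the unread remainder itself.

kept = [list(g) for g in take_or_skip([1, 1, 1, 2, 2, 3, 2, 2, 2, 2],
                                      lambda x: x,
                                      lambda item: item != 2)]
print(kept)
```

Skipped groups are never materialised as lists, although their items are still read from the source as groupby advances.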
Answer 3:
I implemented what I described:
If you want to reject a list before it is returned, or even before it is built, you can do so by passing a filter argument to the functions. When the filter rejects a list prefix, the function discards the current output list and stops appending to it until the next group starts.
def cleave_by_change (stream, key_fn, filter=None):
    '''[1 1 1][2 2][3][2 2 2 2]'''
    S = object()  # sentinel, so that None remains a valid key
    skip = False
    prev = S
    buf = []
    for item in stream:
        iden = key_fn(item)
        if prev is S:
            prev = iden
        if prev != iden:
            if not skip:
                yield buf
            buf = []
            prev = iden
            skip = False
        if not skip and filter is not None:
            skip = not filter(item)
        if not skip:
            buf.append(item)
    # do not emit a final group that was rejected part-way through
    if buf and not skip: yield buf

print list(cleave_by_change([1, 1, 1, 2, 2, 3, 2, 2, 2, 2], lambda a: a, lambda i: i != 2))
# => [[1, 1, 1], [3]]
print list(cleave_by_change([1, 1, 1, 2, 2, 3, 2, 2, 2, 2], lambda a: a, lambda i: i == 2))
# => [[2, 2], [2, 2, 2, 2]]
Source: https://stackoverflow.com/questions/10748331/can-yield-produce-multiple-consecutive-generators