I am trying to learn how to use itertools.groupby in Python and I wanted to find the size of each group of characters. At first I tried to see if I could find the length of
The reason your first approach doesn't work is that the groups get "consumed" when you create that list with:
list(groupby("cccccaaaaatttttsssssss"))
To quote from the groupby docs:
The returned group is itself an iterator that shares the underlying iterable with
groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.
Let's break it down into stages.
from itertools import groupby
a = list(groupby("cccccaaaaatttttsssssss"))
print(a)
b = a[0][1]
print(b)
print('So far, so good')
print(list(b))
print('What?!')
output
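[('c', <itertools._grouper object at 0xb715104c>), ('a', <itertools._grouper object at 0x...>), ('t', <itertools._grouper object at 0x...>), ('s', <itertools._grouper object at 0x...>)]
<itertools._grouper object at 0xb715104c>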
So far, so good
[]
What?!
Our itertools._grouper object at 0xb715104c is empty because it shares its contents with the "parent" iterator returned by groupby, and those items are now gone because that first list call iterated over the parent.
It's really no different to what happens if you try to iterate twice over any iterator, e.g. a simple generator expression.
g = (c for c in 'python')
print(list(g))
print(list(g))
output
['p', 'y', 't', 'h', 'o', 'n']
[]
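One way around this is to turn each group into a list inside the loop, while it is still the current group. Here's a minimal sketch of that idea; the variable names (text, groups, sizes) are just my own choices for illustration.

```python
from itertools import groupby

text = "cccccaaaaatttttsssssss"

# Materialize each group as a list inside the loop, while it is still
# the current group; once groupby moves on, that group is exhausted.
groups = [(key, list(group)) for key, group in groupby(text)]

# The lists are independent of the groupby iterator, so len() works
# and they can be read as many times as needed.
sizes = [(key, len(items)) for key, items in groups]
print(sizes)
```

That should print [('c', 5), ('a', 5), ('t', 5), ('s', 7)]. The trade-off is that every group's contents are held in memory at once.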
BTW, here's another way to get the length of a groupby group if you don't actually need its contents; it's a little cheaper (and uses less RAM) than building a list just to find its length.
from itertools import groupby
for k, g in groupby("cccccaaaaatttttsssssss"):
    print(k, sum(1 for _ in g))
output
c 5
a 5
t 5
s 7
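If you need those counts in more than one place, the same counting trick can be wrapped in a small generator. This is only a sketch: run_lengths is a name I made up, not something provided by itertools.

```python
from itertools import groupby

def run_lengths(iterable):
    """Yield (key, count) for each run of equal consecutive items.

    run_lengths is a hypothetical helper, not part of itertools.
    """
    for key, group in groupby(iterable):
        # Count the items by consuming the group right away,
        # without storing them in a list.
        yield key, sum(1 for _ in group)

print(list(run_lengths("cccccaaaaatttttsssssss")))
```

That should print [('c', 5), ('a', 5), ('t', 5), ('s', 7)]. Because each group is consumed before groupby advances to the next key, the sharing problem described above doesn't come up.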