I have a class with both an __iter__ and a __len__ method. The latter uses the former to count all elements.
It works like the following:
    class A:
        def __iter__(self):
            print("iter")
            for _ in range(5):
                yield "something"

        def __len__(self):
            print("len")
            n = 0
            for _ in self:
                n += 1
            return n

Now if we take e.g. the length of an instance, it prints len and iter, as expected:

    >>> len(A())
    len
    iter
    5

But if we call list(), it calls both __iter__ and __len__:

    >>> list(A())
    len
    iter
    iter
    ['something', 'something', 'something', 'something', 'something']

It works as expected if we make a generator expression:

    >>> list(x for x in A())
    iter
    ['something', 'something', 'something', 'something', 'something']

I would expect list(A()) and list(x for x in A()) to behave the same, but they don’t.
Note that it appears to first call __iter__, then __len__, then loop over the iterator:

    class B:
        def __iter__(self):
            print("iter")

            def gen():
                print("gen")
                yield "something"

            return gen()

        def __len__(self):
            print("len")
            return 1

    print(list(B()))

Output:

    iter
    len
    gen
    ['something']

How can I get list() not to call __len__, so that my instance’s iterator is not consumed twice? I could define, e.g., a length or size method, and one would then call A().size(), but that’s less Pythonic.
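For reference, the size() alternative might look roughly like the sketch below. The class name A2 is made up for illustration, and the sketch assumes __len__ is dropped entirely, so len() on an instance would raise TypeError.

```python
class A2:
    """Hypothetical variant of A that exposes size() instead of __len__."""

    def __iter__(self):
        print("iter")
        for _ in range(5):
            yield "something"

    def size(self):
        # Counts the elements with one full pass over the iterator.
        print("size")
        return sum(1 for _ in self)


a = A2()
items = list(a)  # no __len__ to call, so only one pass happens here
n = a.size()     # a second pass, but only when explicitly requested
```

With this, list() has no __len__ to call, at the cost of instances no longer working with len().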
I tried to compute the length in __iter__ and cache it so that subsequent calls to __len__ don’t need to iterate again, but list() calls __len__ before it starts iterating, so it doesn’t work.
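A minimal sketch of that caching attempt, to make the problem concrete. The class name C and the attribute _length are illustrative, not the real code. The cache is only filled once a generator runs to completion, which is too late for the __len__ call that list() makes up front.

```python
class C:
    """Hypothetical variant of A that caches its length while iterating."""

    def __init__(self):
        self._length = None  # unknown until one full pass has completed

    def __iter__(self):
        print("iter")
        n = 0
        for _ in range(5):
            n += 1
            yield "something"
        # Only reached once the generator has been exhausted.
        self._length = n

    def __len__(self):
        print("len")
        if self._length is None:
            # No completed pass yet, so fall back to a full iteration.
            self._length = sum(1 for _ in self)
        return self._length


c = C()
items = list(c)  # on CPython this still prints: len, iter, iter
```

A later len(c) returns the cached value without another pass, but the first list(c) has already iterated twice.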
Note that in my case I work on very large data collections so caching all items is not an option.