I have a long Python generator that I want to "thin out" by randomly selecting a subset of values. Unfortunately, random.sample() will not work with arbitrary
One possible method is to build a generator around the iterator to select random elements:
import random

def random_wrap(iterator, threshold):
    # Yield each item independently with probability `threshold` (0.0 to 1.0).
    for item in iterator:
        if random.random() < threshold:
            yield item
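This method is useful when you don't know the length of the iterator and it may be too large to hold in memory. Note that the size of the result cannot be guaranteed: each item is kept independently with probability threshold, so for n input items the output has about n * threshold items on average, but the actual count varies from run to run.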
Some sample runs:
>>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
['f', 'h', 'i', 'r', 'w', 'x']
>>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
['j', 'r', 's', 'u', 'x']
>>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
['c', 'e', 'h', 'n', 'o', 'r', 'z']
>>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
['b', 'c', 'e', 'h', 'j', 'p', 'r', 's', 'u', 'v', 'x']
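If you need an exact number of items rather than an approximate fraction, a different technique, reservoir sampling, may fit better. The following is a minimal sketch of that approach (the classic "Algorithm R"), not part of the method above; the names reservoir_sample, k, and rng are made up for illustration. It makes a single pass and keeps only k items in memory.

```python
import random

def reservoir_sample(iterable, k, rng=random):
    """Return up to k items chosen uniformly at random from iterable.

    Hypothetical helper: a sketch of reservoir sampling (Algorithm R).
    If the iterable has fewer than k items, all of them are returned.
    """
    reservoir = []
    for index, item in enumerate(iterable):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1),
            # overwriting a uniformly chosen slot in the reservoir.
            j = rng.randrange(index + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(iter('abcdefghijklmnopqrstuvwxyz'), 5)
print(sample)  # five letters; which ones varies from run to run
```

Note that the returned items are not necessarily in their original order, and that, unlike random_wrap, this returns a list only after the whole input has been consumed instead of yielding items lazily.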