Random sample from a very long iterable, in Python

南笙 2020-12-11 07:58

I have a long Python generator that I want to "thin out" by randomly selecting a subset of values. Unfortunately, random.sample() will not work with arbitrary

5 answers
  •  半阙折子戏
    2020-12-11 08:21

    One possible method is to build a generator around the iterator to select random elements:

    import random

    def random_wrap(iterator, threshold):
        # Yield each item independently with probability `threshold` (0.0 to 1.0).
        for item in iterator:
            if random.random() < threshold:
                yield item
    

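    This method is useful when you don't know the length of the iterator and it may be too large to hold in memory, since it handles one item at a time. Note that the size of the final list cannot be guaranteed: each item is kept independently with probability threshold, so the number of selected items varies from run to run and averages threshold times the number of items.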

    Some sample runs:

    >>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
    ['f', 'h', 'i', 'r', 'w', 'x']
    
    >>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
    ['j', 'r', 's', 'u', 'x']
    
    >>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
    ['c', 'e', 'h', 'n', 'o', 'r', 'z']
    
    >>> list(random_wrap(iter('abcdefghijklmnopqrstuvwxyz'), 0.25))
    ['b', 'c', 'e', 'h', 'j', 'p', 'r', 's', 'u', 'v', 'x']
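
If a fixed sample size is required, one commonly used alternative is reservoir sampling. This is a different technique from random_wrap above, not part of the original answer. The sketch below is a minimal version under stated assumptions: the name reservoir_sample and the parameters k and rng are made up for illustration. It keeps at most k items in memory and makes a single pass over the iterable.

```python
import random

def reservoir_sample(iterable, k, rng=random):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration; `rng` only needs a randint(a, b) method.
    """
    reservoir = []
    for index, item in enumerate(iterable):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, index]; the new item replaces an old one
            # only if the slot falls inside the reservoir (probability k / (index + 1)).
            slot = rng.randint(0, index)  # randint is inclusive on both ends
            if slot < k:
                reservoir[slot] = item
    return reservoir

sample = reservoir_sample(iter('abcdefghijklmnopqrstuvwxyz'), 5)
print(sample)  # always 5 letters; which ones varies from run to run
```

Unlike random_wrap, this returns a list rather than a generator, because the final sample is only known once the whole iterable has been consumed. If the iterable has fewer than k items, all of them are returned.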
    
