How to incrementally sample without replacement?

别跟我提以往 2020-12-05 00:38

In Python, my_sample = random.sample(range(100), 10) randomly samples 10 values without replacement from [0, 100).

Suppose I have sampled n

13 answers
  •  鱼传尺愫
    2020-12-05 01:22

    This is a rewritten version of @necromancer's cool solution. It wraps the logic in a class, which makes it much easier to use correctly, and it leans on more dict methods to cut the number of lines of code.

    from random import randrange
    
    class Sampler:
        def __init__(self, n):
            self.n = n # number remaining from original range(n)
            # i is a key iff i < n and i already returned;
            # in that case, state[i] is a value to return
            # instead of i.
            self.state = dict()
    
        def get(self):
            n = self.n
            if n <= 0:
                raise ValueError("range exhausted")
            result = i = randrange(n)
            state = self.state
            # Most of the fiddling here is just to get
            # rid of state[n-1] (if it exists).  It's a
            # space optimization.
            if i == n - 1:
                if i in state:
                    result = state.pop(i)
            elif i in state:
                result = state[i]
                if n - 1 in state:
                    state[i] = state.pop(n - 1)
                else:
                    state[i] = n - 1
            elif n - 1 in state:
                state[i] = state.pop(n - 1)
            else:
                state[i] = n - 1
            self.n = n-1
            return result
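
One way to read the class above: the dict acts as a sparse stand-in for a list that starts out as list(range(n)), where a missing key i means "slot i still holds i". Each get() picks a random slot, returns its value, and moves the value from the last slot into the hole. The sketch below is only an illustration of that reading, not part of the original answer; DenseSampler and SparseSampler are hypothetical names, and SparseSampler is a compact restatement that should behave the same as Sampler.

```python
import random


class DenseSampler:
    """Same idea with an explicit list: O(n) memory up front."""

    def __init__(self, n):
        self.pool = list(range(n))

    def get(self):
        if not self.pool:
            raise ValueError("range exhausted")
        i = random.randrange(len(self.pool))
        result = self.pool[i]
        last = self.pool.pop()       # remove the last slot
        if i < len(self.pool):       # unless slot i *was* the last slot,
            self.pool[i] = last      # fill the hole with the old last value
        return result


class SparseSampler:
    """Dict-backed version: only slots that differ from identity are stored."""

    def __init__(self, n):
        self.n = n
        self.state = {}

    def get(self):
        n = self.n
        if n <= 0:
            raise ValueError("range exhausted")
        i = random.randrange(n)
        # Value held by virtual slot n-1; the key is dropped to save space.
        last = self.state.pop(n - 1, n - 1)
        if i == n - 1:
            result = last
        else:
            result = self.state.get(i, i)  # value held by virtual slot i
            self.state[i] = last
        self.n = n - 1
        return result


# With the same seed, both versions make the same randrange() calls,
# so they should produce the same sequence of draws.
random.seed(2020)
dense = DenseSampler(20)
dense_draws = [dense.get() for _ in range(20)]

random.seed(2020)
sparse = SparseSampler(20)
sparse_draws = [sparse.get() for _ in range(20)]

print(dense_draws == sparse_draws)  # True
```

The dense version is easier to follow but allocates the whole range immediately. The dict version only stores one entry per slot that has been overwritten, which is the point of the space optimization when n is large and few values are drawn.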
    

    Here's a basic driver:

    s = Sampler(100)
    allx = [s.get() for _ in range(100)]
    assert sorted(allx) == list(range(100))
    
    from collections import Counter
    c = Counter()
    for i in range(6000):
        s = Sampler(3)
        one = tuple(s.get() for _ in range(3))
        c[one] += 1
    for k, v in sorted(c.items()):
        print(k, v)
    

    and sample output:

    (0, 1, 2) 1001
    (0, 2, 1) 991
    (1, 0, 2) 995
    (1, 2, 0) 1044
    (2, 0, 1) 950
    (2, 1, 0) 1019
    

    By eyeball, that distribution is fine (run a chi-squared test if you're skeptical). Some of the solutions here don't give each permutation with equal probability (even though they return each k-subset of n with equal probability), so they differ from random.sample() in that respect.
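
For anyone who does want the chi-squared check, here is a minimal sketch using only the standard library. It takes the six counts from the sample output above and compares them against a uniform expectation of 1000 per permutation. The helper name chi_squared_uniform is made up for this example, and the 5% critical value for 5 degrees of freedom (11.070) is the usual textbook table value rather than something computed here.

```python
# Observed counts copied from the sample output above:
# six permutations, 6000 trials in total.
observed = {
    (0, 1, 2): 1001,
    (0, 2, 1): 991,
    (1, 0, 2): 995,
    (1, 2, 0): 1044,
    (2, 0, 1): 950,
    (2, 1, 0): 1019,
}


def chi_squared_uniform(counts):
    """Pearson chi-squared statistic against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


stat = chi_squared_uniform(list(observed.values()))

# Table value: 5 degrees of freedom (6 categories - 1), 5% significance.
CRITICAL_5DF_05 = 11.070

print(f"chi-squared = {stat:.3f}")  # chi-squared = 4.904
print("consistent with uniform" if stat < CRITICAL_5DF_05 else "suspicious")
```

A statistic of about 4.9 is well under the critical value, so these counts give no reason to doubt that all six permutations are equally likely.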
