How to incrementally sample without replacement?

别跟我提以往 2020-12-05 00:38

Python has my_sample = random.sample(range(100), 10) to randomly sample without replacement from [0, 100).

Suppose I have sampled n

13 Answers
  •  时光说笑
    2020-12-05 01:02

    Note to readers from OP: please read the originally accepted answer first to understand the logic, and then read this one.

    Aaaaaand for completeness' sake: this is the concept from necromancer's answer, adapted so that it takes a list of forbidden numbers as input. It is the same code as in my previous answer, except that we first build the state dict from forbid before generating any numbers.
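Without forbid, the underlying idea is a lazy Fisher-Yates shuffle: the array range(n) is never materialized, and a dict only records the positions whose value has been swapped away from its index. The sketch below assumes that this base version is simply the second loop of the code in this answer with nothing forbidden; lazy_sample_gen is a hypothetical name, not taken from the earlier answer.

```python
import random


def lazy_sample_gen(n):
    """Yield the numbers 0..n-1 in uniformly random order, one per next().

    Lazy Fisher-Yates shuffle over a virtual array range(n): `state` maps
    position -> value only for positions whose value differs from the index.
    """
    state = {}
    for remaining in range(n, 0, -1):
        i = random.randrange(remaining)
        # Value currently at position i (the identity if never touched).
        yield state.get(i, i)
        # Move the last live value into slot i, then drop the last slot.
        state[i] = state.get(remaining - 1, remaining - 1)
        state.pop(remaining - 1, None)
```

For example, gen = lazy_sample_gen(100) followed by [next(gen) for _ in range(10)] draws 10 distinct values from [0, 100), and the next call to next(gen) draws one more without repeating any of them. Each draw adds at most one dict entry, so memory grows with the number of draws rather than with n.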

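    • This takes O(f+k) time and O(f+k) memory, where f = len(forbid) and k is the number of values drawn. Since every element of forbid has to be read at least once, this is the fastest possible approach when nothing is required of forbid's format (sorted or set). I think this makes it a winner in some way ^^.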
    • If forbid is a set, the repeated guessing method is faster, at O(k⋅n/(n-(f+k))), which is very close to O(k) unless f+k is close to n.
    • If forbid is sorted, my ridiculous algorithm is faster, at O(k⋅(log(f+k)+log²(n/(n-(f+k))))).

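The repeated guessing method from the second bullet is not spelled out in this answer. A minimal sketch, assuming it means plain rejection sampling (guess a uniform number in [0, n) and retry if it is forbidden or already drawn) and that forbid is a set of distinct ints in [0, n); guess_sample_gen is a hypothetical name. The answer's own generator, which needs no set, follows after it.

```python
import random


def guess_sample_gen(n, forbid):
    """Yield the numbers in range(n) not in `forbid`, in random order,
    by repeated guessing. Assumes `forbid` is a set of ints in [0, n)."""
    used = set()  # numbers already yielded
    # Stop once every allowed number has been yielded.
    while len(forbid) + len(used) < n:
        x = random.randrange(n)
        if x in forbid or x in used:
            continue  # rejected guess, try again
        used.add(x)
        yield x
```

With j numbers already drawn, a guess is accepted with probability (n-f-j)/n, so the expected number of guesses per draw is n/(n-f-j); summed over k draws this is at most k⋅n/(n-(f+k)), matching the bound above. It degrades badly as f+k approaches n.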
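    import random


    def sample_gen(n, forbid):
        """Yield the numbers in range(n) that are not in forbid, in random order.

        forbid must contain distinct integers in range(n).
        """
        state = dict()  # position -> value, only where they differ
        track = dict()  # value -> position, the inverse of state
        # Phase 1: swap each forbidden value into the current last slot
        # of the virtual array range(n), then drop that slot.
        for (i, o) in enumerate(forbid):
            x = track.get(o, o)             # position currently holding o
            t = state.get(n-i-1, n-i-1)     # value in the current last slot
            state[x] = t
            track[t] = x
            state.pop(n-i-1, None)
            track.pop(o, None)
        del track
        # Phase 2: lazy Fisher-Yates over the n - len(forbid) remaining values.
        for remaining in range(n - len(forbid), 0, -1):
            i = random.randrange(remaining)
            yield state.get(i, i)
            state[i] = state.get(remaining - 1, remaining - 1)
            state.pop(remaining - 1, None)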
    

    Usage:

    gen = sample_gen(10, [1, 2, 4, 8])
    print(next(gen))
    print(next(gen))
    print(next(gen))
    print(next(gen))
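To see what the first loop of sample_gen actually builds for the usage example, the sketch below pulls it out into a hypothetical helper, build_forbid_state, assuming as before that forbid holds distinct ints in [0, n). It prints the state dict and the virtual array of live values that the second loop then samples from.

```python
def build_forbid_state(n, forbid):
    """First phase of sample_gen: remove each value in `forbid` from the
    virtual array range(n) by moving the current last slot's value into
    its position. Assumes `forbid` holds distinct ints in [0, n)."""
    state = {}  # position -> value, only where value != position
    track = {}  # value -> position, the inverse of `state`
    for i, o in enumerate(forbid):
        last = n - i - 1
        x = track.get(o, o)        # position currently holding o
        t = state.get(last, last)  # value currently in the last slot
        state[x] = t               # move that value into o's slot
        track[t] = x
        state.pop(last, None)      # drop the last slot
        track.pop(o, None)         # o is gone for good
    return state


n, forbid = 10, [1, 2, 4, 8]
state = build_forbid_state(n, forbid)
live = [state.get(p, p) for p in range(n - len(forbid))]
print(state)  # {1: 9, 2: 6, 4: 7}
print(live)   # [0, 9, 6, 3, 7, 5]
```

Note how 8 is handled: by the time it is forbidden, 8 has already been moved to position 2, which is exactly what track records. The live array holds precisely the six allowed values {0, 3, 5, 6, 7, 9}, so the lazy Fisher-Yates phase draws uniformly from them.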
    
