Algorithm for sampling without replacement?

情歌与酒 2020-12-02 13:56

I am trying to test the likelihood that a particular clustering of data has occurred by chance. A robust way to do this is Monte Carlo simulation, in which the associations

6 answers
  •  夕颜 (OP)
    2020-12-02 14:38

    Inspired by @John D. Cook's answer, I wrote an implementation in Nim. At first I had difficulty understanding how it works, so I commented it extensively and included a worked example. Maybe it helps others understand the idea. I have also changed the variable names slightly.

    import std/random # rand, randomize
    
    iterator uniqueRandomValuesBelow*(N, M: int): int =
      ## Yields a total of M unique random values i with 0 <= i < N, in increasing order.
      ## These indices can be used, e.g., to construct a random sample without replacement.
      assert(M <= N)
    
      var t = 0 # total input records dealt with
      var m = 0 # number of items selected so far
    
      while m < M:
        let u = rand(1.0) # uniform(0,1) random number from std/random
    
        # meaning of the following terms:
        # (N - t) is the total number of draws remaining (initially just N)
        # (M - m) is how many of these remaining draws must be positive (initially just M)
        # => probability for the next draw = (M-m) / (N-t)
        #    i.e.: (required positive draws left) / (total draws left)
        #
        # This is implemented by the inequality expression below:
        # - the larger (M-m), the larger the probability of a positive draw
        # - for (N-t) == (M-m), the term on the left is always smaller => we draw with 100% probability
        # - for (N-t) >> (M-m), we need a very small u
        #
        # example: (N-t) = 7, (M-m) = 5
        # => we draw the next with prob 5/7
        #    let's assume the draw fails
        # => t += 1 => (N-t) = 6
        # => we draw the next with prob 5/6
        #    let's assume the draw succeeds
        # => t += 1, m += 1 => (N-t) = 5, (M-m) = 4
        # => we draw the next with prob 4/5
        #    let's assume the draw fails
        # => t += 1 => (N-t) = 4
        # => we draw the next with prob 4/4, i.e.,
        #    we will draw with certainty from now on
        #    (in the next steps we get prob 3/3, 2/2, ...)
        # Nim does not mix int and float in arithmetic, so both sides are converted to float.
        if float(N - t) * u >= float(M - m): # this is essentially a draw with P = (M-m) / (N-t)
          # no draw -- happens mainly for (N-t) >> (M-m) and/or high u
          t += 1
        else:
          # draw t -- happens when (M-m) gets large and/or low u
          yield t # this is where we output an index, can be used to sample
          t += 1
          m += 1
    
    # example use
    randomize() # seed the default RNG so that different runs give different samples
    for i in uniqueRandomValuesBelow(100, 5):
      echo i
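For Monte Carlo work it is often handy to have reproducible runs, which is easier with an explicitly seeded generator than with the global one. Below is a minimal sketch of a hypothetical `selectionSample` proc (a name introduced here, not part of the code above) that uses the same acceptance rule, accepting index t with probability (k - m) / (n - t), a scheme commonly known as selection sampling (Knuth's Algorithm S). It draws from a `Rand` state created with `initRand` from `std/random`. The `when isMainModule` part is an empirical sanity check under arbitrarily chosen sizes and seed: over many trials, each index should be picked in roughly k/n of the samples.

```nim
import std/random

proc selectionSample*(n, k: int; rng: var Rand): seq[int] =
  ## Hypothetical helper (illustration only): returns k distinct indices
  ## from 0 ..< n in increasing order, using an explicitly seeded `Rand`.
  doAssert k >= 0 and k <= n
  var m = 0                        # number of indices selected so far
  for t in 0 ..< n:
    if m == k: break               # sample is complete
    let u = rng.rand(1.0)          # uniform float from the seeded generator
    # accept index t with probability (k - m) / (n - t)
    if float(n - t) * u < float(k - m):
      result.add t
      inc m

when isMainModule:
  const n = 10                     # population size (arbitrary)
  const k = 3                      # sample size (arbitrary)
  const trials = 100_000
  var rng = initRand(42)           # fixed seed => reproducible run
  var counts: array[n, int]
  for trial in 1 .. trials:
    let s = selectionSample(n, k, rng)
    doAssert s.len == k            # always exactly k indices
    for i in 1 ..< s.len:
      doAssert s[i - 1] < s[i]     # strictly increasing => all distinct
    for i in s:
      inc counts[i]
  # relative frequency of each index; each should be close to k/n = 0.3
  for i in 0 ..< n:
    echo i, ": ", counts[i] / trials
```

Passing `rng` as `var Rand` keeps the sampler free of global state, so two runs with the same seed produce the same sequence of samples; the seed 42 and the sizes n = 10, k = 3 are illustrative choices only.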
    
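Why the acceptance rule produces a uniform sample can be checked with a short counting argument. It uses the N, M, t, m from the comments in the code, plus m_t (a symbol introduced here) for the number of indices already selected when index t is considered, with t = 0, ..., N-1. The probability of ending up with one particular set S of M indices is the product of the per-step probabilities:

```latex
P(S) \;=\; \prod_{t \in S} \frac{M - m_t}{N - t}
\;\prod_{t \notin S} \frac{(N - t) - (M - m_t)}{N - t}
\;=\; \frac{M!\,(N - M)!}{N!}
\;=\; \binom{N}{M}^{-1}
```

The denominators run over N, N-1, ..., 1. The numerators of the selected steps are M, M-1, ..., 1, and the numerators of the skipped steps, which count the non-selected slots still to come, run N-M, ..., 1. So every M-element subset is equally likely, and each single index is selected with probability \binom{N-1}{M-1} / \binom{N}{M} = M/N. Once m = M, every remaining step contributes a factor of (N-t)/(N-t) = 1, which is why the loop can stop as soon as M indices have been yielded.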
