Sample with a max

日久生厌 2021-01-19 01:57

If I want to sample numbers to create a vector I do:

set.seed(123)
x <- sample(1:100,200, replace = TRUE)
sum(x)
# [1] 10228

What if I

5 Answers
  •  余生分开走
    2021-01-19 02:34

    Here's another attempt. It uses runif rather than sample. The optional showSum argument prints the sum of the result as a message, and the Tolerance argument specifies how far the sum is allowed to deviate from the target.

    SampleToSum <- function(Target = 100, VecLen = 10,
                            InRange = 1:100, Tolerance = 2,
                            showSum = TRUE) {
      # Rejection sampling: redraw until a candidate meets every condition.
      repeat {
        # Cut [0, 1] at VecLen - 1 random points, scale the gaps to Target,
        # and round to whole numbers (rounding can shift the sum slightly).
        Res <- round(diff(c(0, sort(runif(VecLen - 1)), 1)) * Target)
        if ( all(Res > 0) &&
             all(Res >= min(InRange)) &&
             all(Res <= max(InRange)) &&
             abs(sum(Res) - Target) <= Tolerance ) { break }
      }
      if (isTRUE(showSum)) cat("Total = ", sum(Res), "\n")
      Res
    }
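
    To see why the Tolerance argument is needed, it may help to look at the central line of the function in isolation. The following is only an illustrative sketch (the variable names cuts, gaps, pieces and rounded are not part of the original function): it cuts the interval from 0 to 1 at random points, scales the gaps to the target, and shows that rounding can move the total away from the target.

    ```r
    # Illustrative sketch, not part of SampleToSum itself.
    set.seed(42)
    VecLen <- 5
    Target <- 100

    cuts    <- sort(runif(VecLen - 1))   # VecLen - 1 random cut points in (0, 1)
    gaps    <- diff(c(0, cuts, 1))       # VecLen pieces that sum to 1
    pieces  <- gaps * Target             # same pieces, scaled to sum to Target
    rounded <- round(pieces)             # whole numbers

    sum(pieces)    # equals Target, up to floating-point error
    sum(rounded)   # can be off by a few units because of rounding
    ```

    Each rounded value is off by at most 0.5, so the rounded total can differ from the target by up to VecLen / 2. Candidates whose total lands outside the tolerance are simply thrown away and redrawn.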
    

    Here are some examples.

    Notice the difference between the default setting (Tolerance = 2) and setting Tolerance = 0:

    set.seed(1)
    SampleToSum()
    # Total =  101 
    #  [1] 20  6 11 20  6  3 24  1  4  6
    SampleToSum(Tolerance=0)
    # Total =  100 
    #  [1] 19 15  4 10  1 11  7 16  4 13
    

    You can verify this behavior by using replicate. Here's the result of setting Tolerance = 0 and running the function 5 times.

    system.time(output <- replicate(5, SampleToSum(
      Target = 1376,
      VecLen = 13,
      InRange = 10:200,
      Tolerance = 0)))
    # Total =  1376 
    # Total =  1376 
    # Total =  1376 
    # Total =  1376 
    # Total =  1376 
    #    user  system elapsed 
    #   0.144   0.000   0.145
    output
    #       [,1] [,2] [,3] [,4] [,5]
    #  [1,]   29   46   11   43  171
    #  [2,]  103  161  113  195  197
    #  [3,]  145  134   91  131  147
    #  [4,]  154  173  138   19   17
    #  [5,]  197   62  173   11   87
    #  [6,]  101  142   87  173   99
    #  [7,]  168   61   97   40  121
    #  [8,]  140  121   99  135  117
    #  [9,]   46   78   31  200   79
    # [10,]  140  168  146   17   56
    # [11,]   21  146  117  182   85
    # [12,]   63   30  180  179   78
    # [13,]   69   54   93   51  122
    

    And the same for setting Tolerance = 5 and running the function 5 times.

    system.time(output <- replicate(5, SampleToSum(
      Target = 1376,
      VecLen = 13,
      InRange = 10:200,
      Tolerance = 5)))
    # Total =  1375 
    # Total =  1376 
    # Total =  1374 
    # Total =  1374 
    # Total =  1376 
    #    user  system elapsed 
    #   0.060   0.000   0.058 
    output
    #       [,1] [,2] [,3] [,4] [,5]
    #  [1,]   65  190  103   15   47
    #  [2,]  160   95   98  196  183
    #  [3,]  178  169  134   15   26
    #  [4,]   49   53  186   48   41
    #  [5,]  104   81  161  171  180
    #  [6,]   54  126   67  130  182
    #  [7,]   34  131   49  113   76
    #  [8,]   17   21  107   62   95
    #  [9,]  151  136  132  195  169
    # [10,]  194  187   91  163   22
    # [11,]   23   69   54   97   30
    # [12,]  190   14  134   43  150
    # [13,]  156  104   58  126  175
    

    Not surprisingly, setting the tolerance to 0 makes the function slower: about 0.145 seconds elapsed in the runs above, versus about 0.058 seconds with a tolerance of 5.


    Speed (Or lack thereof)

    Because this is a random process, it's hard to predict how long it will take to find a combination of numbers that works. For example, using set.seed(123), I ran the following test three times in a row:

    system.time(SampleToSum(Target = 1163,
                            VecLen = 15,
                            InRange = 50:150))
    

    The first run took just over 9 seconds. The second took just over 7.5 seconds. The third took... just under 381 seconds! That's a lot of variation!

    Out of curiosity, I added a counter into the function, and the first run took 55026 attempts to arrive at a vector that satisfied all of our conditions! (I didn't bother trying for the second and third attempts.)
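
    The counter itself isn't shown above, so here is one way it might look. This is a minimal sketch under assumptions: the name SampleToSumCounted and the list it returns are made up for illustration, and the acceptance rule is copied from SampleToSum.

    ```r
    # Hypothetical variant of SampleToSum that also reports how many
    # candidate vectors were drawn before one was accepted.
    SampleToSumCounted <- function(Target = 100, VecLen = 10,
                                   InRange = 1:100, Tolerance = 2) {
      Attempts <- 0
      repeat {
        Attempts <- Attempts + 1
        Res <- round(diff(c(0, sort(runif(VecLen - 1)), 1)) * Target)
        if (all(Res > 0) &&
            all(Res >= min(InRange)) &&
            all(Res <= max(InRange)) &&
            abs(sum(Res) - Target) <= Tolerance) break
      }
      list(Result = Res, Attempts = Attempts)
    }

    set.seed(1)
    out <- SampleToSumCounted()
    out$Attempts      # number of draws needed for this seed
    sum(out$Result)   # within Tolerance of Target
    ```

    Running this over several seeds for a fixed set of arguments gives a rough idea of how widely the number of attempts varies.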

    It might be good to add some error or sanity checking to the function to make sure the inputs are reasonable. For example, one should not be able to enter SampleToSum(Target = 100, VecLen = 10, InRange = 15:50): ten values of at least 15 already sum to 150, so there's no way to reach 100 with 10 values in the vector, and the loop would never finish.
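
    A sanity check along these lines could compare the smallest and largest totals the range allows with the target. The helper below is a sketch only: the name CheckSampleToSumInputs is made up, and it tests a necessary condition, so passing the check does not mean the sampler will finish quickly.

    ```r
    # Hypothetical helper: stop early when no vector can meet the constraints.
    CheckSampleToSumInputs <- function(Target, VecLen, InRange, Tolerance = 2) {
      MinTotal <- VecLen * min(InRange)   # smallest achievable sum
      MaxTotal <- VecLen * max(InRange)   # largest achievable sum
      if (MinTotal > Target + Tolerance || MaxTotal < Target - Tolerance) {
        stop("No vector of length ", VecLen, " with values in [",
             min(InRange), ", ", max(InRange), "] can sum to ", Target,
             " (achievable totals: ", MinTotal, " to ", MaxTotal, ")")
      }
      invisible(TRUE)
    }

    # The infeasible call from the text is rejected before any sampling starts.
    res <- tryCatch(
      CheckSampleToSumInputs(Target = 100, VecLen = 10, InRange = 15:50),
      error = function(e) conditionMessage(e)
    )
    res
    ```

    Calling such a check at the top of SampleToSum would turn an endless loop into an immediate error message.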
