Correctly setting random seeds for repeatability

让人想犯罪 __ 提交于 2019-12-12 13:23:10

问题


The method for setting random seeds using the Fortran 90 subroutine random_seed is quite straightforward.

call random_seed( put=seed )

But I can't find any information about guidelines for setting the seed (which is absolutely necessary when you want repeatability). Folklore I've heard in the past suggested that scalar seeds should be large. E.g. 123456789 is a better seed than 123. The only support for this I can find on the web is that it is suggested for the ifort extension function ran() that one use a "large, odd integer value"

I understand this might be implementation specific and am using gfortran 4.8.5 but am also interested in ifort and (if possible) general guidelines that are independent of implementation. Here's some example code:

# for compactness, assume seed size of 4, but it will depend on 
# the implementation (e.g. for my version of gfortran 4.8.5 it is 12)

seed1(1:4) = [ 123456789, 987654321, 456789123, 7891234567 ]
seed2(1:4) =   123456789
seed3(1:4) = [         1,         2,         3,          4 ]

I'd guess that seed1 is fine, but pretty verbose if you're setting it manually (as I am) as seed length can be 12 or 33 or whatever. And I'm not even sure that it's fine because I haven't been able to find any guidelines at all about setting these seeds. I.e. these seeds should be negative for all I know, or 3 digit even numbers, etc. although I guess you'd hope the implementation would warn you about that (?).

seed2 and seed3 are obviously more convenient to set and for all I know are just as good. @Ross suggests that seed2 is in fact fine in his answer here: Random number generator (RNG/PRNG) that returns updated value of seed

So my question in summary is just: How can I correctly set the seed? Are any or all of seed1 to seed3 acceptable?


回答1:


Guidelines for setting the seed depend on the PRNG algorithm that RANDOM_NUMBER uses, but in general the more "entropy" you provide the better.

If you have a single scalar value, you can use some simple PRNG to expand that to the full seed array required by RANDOM_SEED. See e.g. the function lcg in the example code at https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gfortran/RANDOM_005fSEED.html

Current versions of GFortran have some protections against bad seeds, and it should be relatively immune towards "dumb" seeding (e.g. all values of seed(:) identical, or all values small or even zero), but for portability to other compilers following something like I suggested above might still be a good idea.




回答2:


What you provide to random_seed( put=... ) is used to determine the starting state of the generator, which (as janneb states) should have as much entropy as reasonably possible. You could construct some relatively sophisticated method of generating this entropy - grabbing from the system somehow is a good choice for this. The code janneb links is a good example.

However, I typically like to be able to reproduce a single run from a given seed if necessary. This is useful for debugging and regression testing. Then, for production runs, the code can pull a single seed 'randomly' somehow. Therefore, I want to get good RNG from a single 'seed'. In my experience, this is easily achieved by providing this single seed then letting the generator add entropy by generating numbers. Consider the following example:

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer, parameter :: n_discard = 100

   integer :: state_size, i
   integer, allocatable, dimension(:) :: state
   real(wp) :: ran, oldran

   call random_seed( size=state_size )
   write(*,*) '-- state size is: ', state_size

   allocate(state(state_size))

   ! -- Simple method of initializing seed from single scalar
   state = 20180815
   call random_seed( put=state )

   ! -- 'Prime' the generator by pulling the first few numbers
   ! -- In reality, these would be discarded but I will print them for demonstration
   ran = 0.5_wp
   do i=1,n_discard
      oldran = ran
      call random_number(ran)
      write(*,'(a,i3,2es26.18)') 'iter, ran, diff: ', i, ran, ran-oldran
   enddo

   ! Now the RNG is 'ready'
end program main

Here, I give a single seed, and then generate a random number 100 times. Typically, I would discard these initial, potentially corrupted, numbers. In this example, I'm printing them to see whether they look non-random. Running with PGI 15.10:

enet-mach5% pgfortran --version

pgfortran 15.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2015, NVIDIA CORPORATION.  All rights reserved.
enet-mach5% pgfortran main.f90 && ./a.out
 -- state size is:            34
iter, ran, diff:   1  8.114813341476008191E-01  3.114813341476008191E-01
iter, ran, diff:   2  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   3  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   4  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   5  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   6  2.172220012214012286E-01 -5.942593329261995905E-01
iter, ran, diff:   7  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   8  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   9  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  10  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  11  6.229626682952016381E-01  4.057406670738004095E-01
iter, ran, diff:  12  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  13  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  14  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  15  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  16  2.870333536900204763E-02 -5.942593329261995905E-01
iter, ran, diff:  17  2.870333536900204763E-02  0.000000000000000000E+00
iter, ran, diff:  18  4.344440024428024572E-01  4.057406670738004095E-01
iter, ran, diff:  19  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  20  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  21  8.401846695166028667E-01  4.057406670738004095E-01
iter, ran, diff:  22  8.401846695166028667E-01  0.000000000000000000E+00
iter, ran, diff:  23  6.516660036642036857E-01 -1.885186658523991809E-01
iter, ran, diff:  24  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  25  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  26  5.740667073800409526E-02 -5.942593329261995905E-01
iter, ran, diff:  27  5.740667073800409526E-02  0.000000000000000000E+00
iter, ran, diff:  28  2.746286719594053238E-01  2.172220012214012286E-01
iter, ran, diff:  29  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  30  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  31  6.803693390332057334E-01  4.057406670738004095E-01
iter, ran, diff:  32  6.803693390332057334E-01  0.000000000000000000E+00
iter, ran, diff:  33  3.033320073284073715E-01 -3.770373317047983619E-01
iter, ran, diff:  34  3.033320073284073715E-01  0.000000000000000000E+00
iter, ran, diff:  35  7.090726744022077810E-01  4.057406670738004095E-01
iter, ran, diff:  36  1.148133414760081905E-01 -5.942593329261995905E-01
iter, ran, diff:  37  1.148133414760081905E-01  0.000000000000000000E+00
iter, ran, diff:  38  1.435166768450102381E-01  2.870333536900204763E-02
iter, ran, diff:  39  1.435166768450102381E-01  0.000000000000000000E+00
iter, ran, diff:  40  3.607386780664114667E-01  2.172220012214012286E-01
iter, ran, diff:  41  7.664793451402118762E-01  4.057406670738004095E-01
iter, ran, diff:  42  7.664793451402118762E-01  0.000000000000000000E+00
iter, ran, diff:  43  2.009233475830143334E-01 -5.655559975571975428E-01
iter, ran, diff:  44  2.009233475830143334E-01  0.000000000000000000E+00
iter, ran, diff:  45  6.353673500258167905E-01  4.344440024428024572E-01
iter, ran, diff:  46  4.110801709961720007E-02 -5.942593329261995905E-01
iter, ran, diff:  47  4.110801709961720007E-02  0.000000000000000000E+00
iter, ran, diff:  48  8.812926866162200668E-01  8.401846695166028667E-01
iter, ran, diff:  49  8.812926866162200668E-01  0.000000000000000000E+00
iter, ran, diff:  50  9.386993573542241620E-01  5.740667073800409526E-02
iter, ran, diff:  51  3.444400244280245715E-01 -5.942593329261995905E-01
iter, ran, diff:  52  7.501806915018249811E-01  4.057406670738004095E-01
iter, ran, diff:  53  9.961060280922282573E-01  2.459253365904032762E-01
iter, ran, diff:  54  9.961060280922282573E-01  0.000000000000000000E+00
iter, ran, diff:  55  8.221603419923440015E-02 -9.138899938929938571E-01
iter, ran, diff:  56  4.879567012730348097E-01  4.057406670738004095E-01
iter, ran, diff:  57  1.109193695682364478E-01 -3.770373317047983619E-01
iter, ran, diff:  58  7.625853732324401335E-01  6.516660036642036857E-01
iter, ran, diff:  59  7.625853732324401335E-01  0.000000000000000000E+00
iter, ran, diff:  60  2.831393817822487335E-01 -4.794459914501914000E-01
iter, ran, diff:  61  6.888800488560491431E-01  4.057406670738004095E-01
iter, ran, diff:  62  7.462867195940532383E-01  5.740667073800409526E-02
iter, ran, diff:  63  8.036933903320573336E-01  5.740667073800409526E-02
iter, ran, diff:  64  8.036933903320573336E-01  0.000000000000000000E+00
iter, ran, diff:  65  1.644320683984688003E-01 -6.392613219335885333E-01
iter, ran, diff:  66  5.701727354722692098E-01  4.057406670738004095E-01
iter, ran, diff:  67  6.849860769482774003E-01  1.148133414760081905E-01
iter, ran, diff:  68  1.481334147600819051E-01 -5.368526621881954952E-01
iter, ran, diff:  69  5.538740818338823146E-01  4.057406670738004095E-01
iter, ran, diff:  70  1.605380964906970576E-01 -3.933359853431852571E-01
iter, ran, diff:  71  5.662787635644974671E-01  4.057406670738004095E-01
iter, ran, diff:  72  7.672021111475118005E-01  2.009233475830143334E-01
iter, ran, diff:  73  6.360901160331167148E-01 -1.311119951143950857E-01
iter, ran, diff:  74  6.647934514021187624E-01  2.870333536900204763E-02
iter, ran, diff:  75  9.231234697231371911E-01  2.583300183210184287E-01
iter, ran, diff:  76  3.288641367969376006E-01 -5.942593329261995905E-01
iter, ran, diff:  77  5.034149292976053403E-02 -2.785226438671770666E-01
iter, ran, diff:  78  3.249701648891658579E-01  2.746286719594053238E-01
iter, ran, diff:  79  4.110801709961720007E-01  8.611000610700614288E-02
iter, ran, diff:  80  7.268168600551945246E-01  3.157366890590225239E-01
iter, ran, diff:  81  1.325575271289949342E-01 -5.942593329261995905E-01
iter, ran, diff:  82  2.147735613282293343E-01  8.221603419923440015E-02
iter, ran, diff:  83  8.951429003614350677E-01  6.803693390332057334E-01
iter, ran, diff:  84  9.606624794444940107E-02 -7.990766524169856666E-01
iter, ran, diff:  85  8.749502748152764298E-01  7.788840268708270287E-01
iter, ran, diff:  86  6.864316089628772488E-01 -1.885186658523991809E-01
iter, ran, diff:  87  3.753116578189263919E-01 -3.111199511439508569E-01
iter, ran, diff:  88  4.614216639259325348E-01  8.611000610700614288E-02
iter, ran, diff:  89  8.632683590919612016E-01  4.018466951660286668E-01
iter, ran, diff:  90  5.110403908483931446E-01 -3.522279682435680570E-01
iter, ran, diff:  91  3.512250603649960112E-01 -1.598153304833971333E-01
iter, ran, diff:  92  2.984351275420635830E-01 -5.278993282293242828E-02
iter, ran, diff:  93  7.902858007228701354E-01  4.918506731808065524E-01
iter, ran, diff:  94  9.136098520217217356E-01  1.233240512988516002E-01
iter, ran, diff:  95  8.360105557375590024E-01 -7.759929628416273317E-02
iter, ran, diff:  96  7.623052313611680120E-01 -7.370532437639099044E-02
iter, ran, diff:  97  2.525198759725810760E-02 -7.370532437639099044E-01
iter, ran, diff:  98  9.228433278518650695E-01  8.975913402546069619E-01
iter, ran, diff:  99  1.283834133499510699E-01 -7.944599145019139996E-01
iter, ran, diff: 100  7.311534560989940701E-01  6.027700427490430002E-01

8 of the first 10 numbers generated are the repeated! This is a good illustration of why some generators require a high-entropy state in the first place. However, after 'some' time, the numbers start to look reasonable.

For my applications, 100 or so random numbers is a very small cost, so whenever I seed a generator, I prime them in this manner. I didn't observe this obviously bad behavior on ifort 16.0, gfortran 4.8, or gfortran 8.1. Non-repeating numbers is a pretty low bar, though. So I would prime for all compilers, not just ones I have observed bad behavior for.

From the comments, some compilers attempt to eliminate bad behavior by processing the input state in some way to yield the actual internal state. Gfortran uses an "xor cipher". The operation is reversed on a get.



来源:https://stackoverflow.com/questions/51893720/correctly-setting-random-seeds-for-repeatability

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!