Is there an elegant and efficient way to implement weighted random choices in golang? Details on current implementation and issues inside


Question


tl;dr: I'm looking for methods to implement a weighted random choice based on the relative magnitude of values (or functions of values) in an array in golang. Are there standard algorithms or recommendable packages for this? If so, how do they scale?

Goals

I'm trying to write 2D and 3D Markov process programs in golang. A simple 2D example: imagine a lattice where each site, labeled by index (i,j), holds n(i,j) particles. At each time step, the program chooses a site and moves one particle from that site to a random adjacent site. The probability that a site is chosen is proportional to its population n(i,j) at that time.

Current Implementation

My current algorithm, e.g. for the 2D case on an L x L lattice, is the following:

  • Convert the starting array into a slice of length L^2 by concatenating rows in order, e.g. cdfpop[i*L+j] = initialpopulation[i][j].
  • Convert the 1D slice into a cdf by running a for loop computing cdfpop[i] += cdfpop[i-1].
  • Generate two random numbers: Rsite, ranging from 1 to the largest value in the cdf (which is just the last value, cdfpop[L^2-1]), and Rhop, ranging from 1 to 4. The first chooses a weighted random site, the second a random direction to hop in.
  • Use a binary search to find the leftmost index indexhop of cdfpop whose value is at least Rsite. The index being hopped to is indexhop ± 1 for x-direction hops or indexhop ± L for y-direction hops.
  • Finally, directly change the values of cdfpop to reflect the hop. This means subtracting one from (or adding one to) all values in cdfpop between the index being hopped from (to) and the index being hopped to (from), depending on order.
  • Rinse and repeat in a for loop. At the end, invert the cumulative sums (take differences of adjacent entries) to recover the final population.

Edit: the requested pseudocode, written out as a runnable Go sketch, looks like:

package main

import (
    "math/rand"
    "sort"
)

const (
    L         = 1000    // lattice side length
    timesteps = 1000000 // number of hop events to simulate
)

func main() {
    // import the LxL population array (stubbed out below)
    population := importPopulation()

    // flatten the array into a slice, row by row: cdf[i*L+j] = population[i][j]
    cdf := make([]int, L*L)
    for i := 0; i < L; i++ {
        for j := 0; j < L; j++ {
            cdf[i*L+j] = population[i][j]
        }
    }
    // turn the flattened populations into a cumulative sum
    for i := 1; i < len(cdf); i++ {
        cdf[i] += cdf[i-1]
    }

    for t := 0; t < timesteps; t++ {
        site := randomHopSite(cdf)
        doHop(cdf, site)
    }

    save(convertCdfToArray(cdf))
}

// randomHopSite picks a site with probability proportional to its population
func randomHopSite(cdf []int) int {
    // choose a random number in [1, Max(cdf)]; the max is just the last entry
    r := rand.Intn(cdf[len(cdf)-1]) + 1

    // binary search for the leftmost index i such that cdf[i] >= r
    return sort.SearchInts(cdf, r)
}

// doHop moves one particle from site to a random neighbor, updating the cdf
func doHop(cdf []int, site int) {
    // choose a random hop direction and compute the destination index
    var finalSite int
    switch rand.Intn(4) {
    case 0:
        finalSite = site + 1
    case 1:
        finalSite = site - 1
    case 2:
        finalSite = site + L
    case 3:
        finalSite = site - L
    }
    // boundary handling was elided in the pseudocode; reject hops that
    // would leave the lattice
    if finalSite < 0 || finalSite >= len(cdf) {
        return
    }

    // update the cumulative distribution in place: moving one particle from
    // site to finalSite shifts every partial sum strictly between them by one
    if finalSite > site {
        for i := site; i < finalSite; i++ {
            cdf[i]--
        }
    } else { // finalSite < site; the two are never equal
        for i := finalSite; i < site; i++ {
            cdf[i]++
        }
    }
}

// convertCdfToArray inverts the cumulative sums back into an LxL population
func convertCdfToArray(cdf []int) [][]int {
    pop := make([][]int, L)
    prev := 0
    for i := range pop {
        pop[i] = make([]int, L)
        for j := range pop[i] {
            pop[i][j] = cdf[i*L+j] - prev
            prev = cdf[i*L+j]
        }
    }
    return pop
}

// importPopulation stands in for loading the starting LxL array
func importPopulation() [][]int {
    pop := make([][]int, L)
    for i := range pop {
        pop[i] = make([]int, L)
        for j := range pop[i] {
            pop[i][j] = 10 // uniform starting population, for illustration
        }
    }
    return pop
}

// save stands in for writing the final population out (e.g. to a gif frame)
func save(pop [][]int) {}
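
For reference, the leftmost-index binary search in the sketch above doesn't need to be hand-rolled: sort.SearchInts in Go's standard library returns exactly the leftmost index whose value is at least the target. A tiny standalone check, with made-up weights, of how random values map onto sites:

package main

import (
    "fmt"
    "sort"
)

func main() {
    // cumulative sums of the made-up weights 3, 1, 4: site 0 should own
    // r in [1,3], site 1 should own r = 4, site 2 should own r in [5,8]
    cdf := []int{3, 4, 8}
    for r := 1; r <= 8; r++ {
        site := sort.SearchInts(cdf, r) // leftmost i with cdf[i] >= r
        fmt.Printf("r = %d -> site %d\n", r, site)
    }
}

Each site is hit by a number of r values equal to its weight, which is exactly the property the weighted choice relies on.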

This process works really well for simple problems. For this particular problem, I can run about 1 trillion steps on a 1000x1000 lattice in about 2 minutes on average with my current setup, and I can write population data out to gifs every 10000 or so steps by spinning up a goroutine without a huge slowdown.

Where efficiency breaks down

The trouble comes when I want to add different processes, with real-valued coefficients, whose rates are not proportional to site population. Say I now have a hopping rate of k_hop * n(i,j) and a death rate (where I simply remove a particle) of k_death * n(i,j)^2. There are two slowdowns in this case:

  • My cdf will be double the size (not that big a deal). It will be real-valued, created by cdfpop[i*L+j] = 4*k_hop*pop[i][j] for i*L+j < L*L and cdfpop[i*L+j] = k_death*math.Pow(pop[i][j], 2) for L*L <= i*L+j < 2*L*L, followed by the same cumulative pass cdfpop[i] += cdfpop[i-1]. I would then select a random real in the range of the cdf (a sketch of this construction follows the list).
  • Because of the squared n, I will have to dynamically recalculate the part of the cdf associated with the death-process weights at each step. This is a MAJOR slowdown, as expected: about 3 microseconds per step, compared with under a nanosecond per step for the original algorithm.
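
For concreteness, here is a minimal sketch of that combined cdf construction, extending the Go program above (it reuses the L constant and additionally needs the "math" import; kHop and kDeath are assumed rate constants, not names from any real package):

// buildCDF lays out the 4*kHop*n hop weights in the first L*L entries and
// the kDeath*n^2 death weights in the second L*L entries, then takes the
// running sum
func buildCDF(pop [][]float64, kHop, kDeath float64) []float64 {
    cdf := make([]float64, 2*L*L)
    for i := 0; i < L; i++ {
        for j := 0; j < L; j++ {
            cdf[i*L+j] = 4 * kHop * pop[i][j]
            cdf[L*L+i*L+j] = kDeath * math.Pow(pop[i][j], 2)
        }
    }
    for i := 1; i < len(cdf); i++ {
        cdf[i] += cdf[i-1]
    }
    return cdf
}

// picking an event: a random real in [0, total), then the same
// leftmost-index search as in the integer case
func randomEvent(cdf []float64) int {
    r := rand.Float64() * cdf[len(cdf)-1]
    return sort.SearchFloat64s(cdf, r)
}

An index below L*L is then a hop event at that site; an index at or above L*L is a death event at site index - L*L.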

This problem only gets worse if I have rates calculated as a function of populations on neighboring sites, e.g. spontaneous particle creation that depends on the product of the populations of neighboring sites. While I hope to work out a way to just modify the cdf without recalculating it, as I simulate problems of increasing complexity I can't help but wonder if there is a reasonably efficient, universal solution I'm missing that doesn't require specialized code for each random process. (A sketch of why even in-place modification stays expensive follows below.)
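
To make that cost concrete, here is a hedged sketch (the function name is illustrative) of what any in-place update of a flat cdf involves once a weight changes: the weight's delta must be propagated to every cumulative entry after it, so even modifying rather than rebuilding the cdf stays linear in the tail length, O(L^2) in the worst case.

// applyWeightDelta adds delta to every cumulative entry from idx onward;
// with a flat cdf this costs O(len(cdf) - idx) per changed weight.
// Example: one particle dying at a site with n particles changes that
// site's death weight by kDeath*((n-1)^2 - n^2) = kDeath*(1 - 2*n).
func applyWeightDelta(cdf []float64, idx int, delta float64) {
    for i := idx; i < len(cdf); i++ {
        cdf[i] += delta
    }
}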

Thanks for reading!

Source: https://stackoverflow.com/questions/52442838/is-there-an-elegant-and-efficient-way-to-implement-weighted-random-choices-in-go
