How to share numpy random state of a parent process with child processes?

旧巷少年郎 2020-11-27 08:20

I set the NumPy random seed at the beginning of my program. During program execution I run a function multiple times using multiprocessing.Process. The function

3 Answers
  • 2020-11-27 08:22

    You need to update the state of the Manager each time you get a random number:

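    import numpy as np
    from multiprocessing import Manager, Pool

    def get_random(args):
        state, lock = args
        with lock:
            # Load the shared state, draw one number, store the advanced state back.
            # Recent NumPy versions require a real tuple here, not a list proxy.
            np.random.set_state(tuple(state[:]))
            result = np.random.uniform()
            state[:] = np.random.get_state()
        return result

    if __name__ == "__main__":
        np.random.seed(1)
        with Manager() as mng:
            # Capture the state after seeding so the children continue from seed 1
            state = mng.list(np.random.get_state())
            lock = mng.Lock()  # a manager lock can be passed to pool workers
            with Pool(10) as pool:
                result1 = pool.map(get_random, [(state, lock)] * 10)

        # Compare with non-parallel version
        np.random.seed(1)
        result2 = [np.random.uniform() for _ in range(10)]

        # Which task receives which draw depends on scheduling, so compare sorted
        assert sorted(result1) == sorted(result2)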
    
  • 2020-11-27 08:22

    Fortunately, according to the documentation, you can access the complete state of the numpy random number generator using get_state and set it again using set_state. The generator itself uses the Mersenne Twister algorithm (see the RandomState part of the documentation).
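    A minimal sketch of that save-and-restore round trip, assuming the legacy global generator behind np.random.seed (the one the question uses). The variable names are illustrative only:

```python
import numpy as np

np.random.seed(1)
saved = np.random.get_state()       # tuple; first entry names the algorithm

first = np.random.uniform(size=3)   # advances the global generator

np.random.set_state(saved)          # rewind to the saved point
second = np.random.uniform(size=3)  # same three numbers again

assert np.array_equal(first, second)
```

    The saved tuple is plain data (a string, an integer array and a few scalars), which is why it can be pickled and handed to another process.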

    This means you can do anything you want, though whether it will be good and efficient is a different question entirely. As abarnert points out, no matter how you share the parent's state—this could use Alex Hall's method, which looks correct—your sequencing within each child will depend on the order in which each child draws random numbers from the MT state machine.

    It would perhaps be better to build a large pool of pseudo-random numbers for each child, saving the start state of the entire generator once at the start. Then each child can draw a PRNG value until its particular pool runs out, after which you have the child coordinate with the parent for the next pool. The parent enumerates which children got which "pool'th" number. The code would look something like this (note that it would make sense to turn this into an infinite generator with a next method):

    from collections import deque

    class PrngPool(object):
        def __init__(self, child_id, shared_state):
            self._child_id = child_id
            self._shared_state = shared_state
            self._numbers = deque()

        def next_number(self):
            if not self._numbers:
                self._refill()
            return self._numbers.popleft()  # O(1), unlike list.pop(0)

        def _refill(self):
            # ... something like Alex Hall's lock/gen/unlock,
            # but fill up self._numbers with the next 1000 (or
            # however many) numbers after adding our ID and
            # the index "n" of which n-through-n+999 numbers
            # we took here.  Any other child also doing a
            # _refill will wait for the lock and get an updated
            # index n -- eg, if we got numbers 3000 to 3999,
            # the next child will get numbers 4000 to 4999.
            raise NotImplementedError("outline only; see the comment above")
    

    This way there is not nearly as much communication through Manager items (MT state and our ID-and-index added to the "used" list). At the end of the process, it's possible to see which children used which PRNG values, and to re-generate those PRNG values if needed (remember to record the full MT internal start state!).
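    One way the refill step could be filled in is sketched below. Everything beyond the PrngPool outline is an assumption made for illustration: the extra lock argument, the "state" / "index" / "used" keys of the shared mapping, the POOL_SIZE constant, and the child and run_demo helpers are hypothetical names, not part of the original answer.

```python
import numpy as np
from collections import deque
from multiprocessing import Manager, Pool

POOL_SIZE = 1000  # numbers taken per refill (arbitrary choice)


class PrngPool:
    def __init__(self, child_id, shared, lock):
        self._child_id = child_id
        self._shared = shared   # mapping with "state", "index", "used" (assumed layout)
        self._lock = lock       # assumed: a lock shared by all children
        self._numbers = deque()

    def next_number(self):
        if not self._numbers:
            self._refill()
        return self._numbers.popleft()

    def _refill(self):
        with self._lock:
            # Continue the shared sequence from where the last refill stopped.
            rng = np.random.RandomState()
            rng.set_state(self._shared["state"])
            self._numbers.extend(rng.uniform(size=POOL_SIZE).tolist())
            start = self._shared["index"]
            # Publish the advanced state and record who took which block.
            # Values are reassigned, not mutated, so a manager dict sees them.
            self._shared["state"] = rng.get_state()
            self._shared["index"] = start + POOL_SIZE
            self._shared["used"] = self._shared["used"] + [(self._child_id, start)]


def child(args):
    child_id, shared, lock = args
    pool = PrngPool(child_id, shared, lock)
    return child_id, [pool.next_number() for _ in range(3)]


def run_demo(seed, n_children):
    with Manager() as mng:
        start_state = np.random.RandomState(seed).get_state()
        shared = mng.dict({"state": start_state, "index": 0, "used": []})
        lock = mng.Lock()
        with Pool(n_children) as workers:
            results = workers.map(child, [(i, shared, lock) for i in range(n_children)])
        # Which child got which block can differ between runs; "used" records it.
        return results, list(shared["used"])


if __name__ == "__main__":
    results, used = run_demo(seed=1, n_children=4)
    print(used)
```

    The numbers each child receives still depend on which child reaches the lock first, but the "used" log together with the recorded start state is enough to reconstruct them afterwards.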

    Edit to add: The way to think about this is like this: the MT is not actually random. It is periodic with a very long period. When you use any such RNG, your seed is simply a starting point within the period. To get repeatability you must use non-random numbers, such as a set from a book. There is a (virtual) book with every number that comes out of the MT generator. We're going to write down which page(s) of this book we used for each group of computations, so that we can re-open the book to those pages later and re-do the same computations.
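    The "book" picture can be sketched as follows, assuming a page is simply a fixed-size block of consecutive uniform draws. PAGE and read_page are hypothetical names chosen for this illustration:

```python
import numpy as np

PAGE = 1000  # numbers per "page" (arbitrary choice)


def read_page(start_state, page_index):
    """Re-generate one page of the sequence from the recorded start state."""
    rng = np.random.RandomState()
    rng.set_state(start_state)
    rng.uniform(size=page_index * PAGE)  # skip the earlier pages
    return rng.uniform(size=PAGE)


rng = np.random.RandomState(1)
start_state = rng.get_state()       # record this before drawing anything
book = rng.uniform(size=5 * PAGE)   # the original run: pages 0 to 4

replayed = read_page(start_state, 3)  # numbers 3000 to 3999
assert np.array_equal(replayed, book[3 * PAGE:4 * PAGE])
```

    Skipping by drawing and discarding is the simplest approach; saving the state at the start of every page would avoid the skip at the cost of more storage.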

  • 2020-11-27 08:41

    Even if you manage to get this working, I don’t think it will do what you want. As soon as you have multiple processes pulling from the same random state in parallel, it’s no longer deterministic which order they each get to the state, meaning your runs won’t actually be repeatable. There are probably ways around that, but it seems like a nontrivial problem.

    Meanwhile, there is an approach that should solve both the problem you asked about and the nondeterminism problem:

    Before spawning a child process, ask the RNG for a random number, and pass it to the child. The child can then seed with that number. Each child will then have a different random sequence from other children, but the same random sequence that the same child got if you rerun the entire app with a fixed seed.

    If your main process does any other RNG work that could depend non-deterministically on the execution of the children, you'll need to pre-generate the seeds for all of your child processes, in order, before pulling any other random numbers.
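    A rough sketch of this seed-passing approach with multiprocessing.Process, assuming each child builds its own RandomState from the seed it receives. The helper names (child, make_seeds, run) and the seed range are choices made for this example only:

```python
import numpy as np
from multiprocessing import Process, Queue


def child(child_id, seed, out):
    rng = np.random.RandomState(seed)  # private generator for this child
    out.put((child_id, rng.uniform(size=3).tolist()))


def make_seeds(master_seed, n_children):
    # Draw all child seeds up front, in order, before any other random work.
    parent_rng = np.random.RandomState(master_seed)
    return parent_rng.randint(0, 2**31 - 1, size=n_children).tolist()


def run(master_seed, n_children):
    seeds = make_seeds(master_seed, n_children)
    out = Queue()
    procs = [Process(target=child, args=(i, s, out)) for i, s in enumerate(seeds)]
    for p in procs:
        p.start()
    # Results arrive in arbitrary order, so key them by child id.
    results = dict(out.get() for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # Same master seed, same per-child sequences, regardless of scheduling.
    assert run(1, 4) == run(1, 4)
```

    Because each child owns its generator, no lock or shared state is needed, and the order in which children run no longer affects the numbers they see.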


    As senderle pointed out in a comment: If you don't need multiple distinct runs, but just one fixed run, you don't even really need to pull a seed from your seeded RNG; just use a counter starting at 1 and increment it for each new process, and use that as a seed. I don't know if that's acceptable, but if it is, it's hard to get simpler than that.

    As Amir pointed out in a comment: a better way is to draw a random integer each time you spawn a new process and pass it to that process, which then uses it as its NumPy random seed. The integer can indeed come from np.random.randint (which needs an explicit upper bound).
