Are these models equivalent?

無奈伤痛 2020-12-09 23:39

Main question: I define the same model in two different ways. Why do I get different results? They seem to be the same model.

1 Answer
  • 2020-12-10 00:10

    The problem is rooted in the expected vs. actual behavior of model definition with respect to randomness. To see what's going on, we need to understand how an RNG works:

    • A "random number generator" (RNG) is actually a function that produces numbers which follow a given probability distribution 'in the long run'
    • When the RNG function, e.g. RNG(), is called, it returns a "random" value and increments its internal counter by 1. Call this counter n - then: random_value = RNG(n)
    • When you set a SEED, you set n according to the value of that seed (but not to the seed itself); we can represent this difference via + c in the counter
    • c is a constant produced by a non-linear but deterministic function of the seed: c = f(seed)
    import numpy as np
    
    np.random.seed(4)         # internal counter = 0 + c
    print(np.random.random()) # internal counter = 1 + c
    print(np.random.random()) # internal counter = 2 + c
    print(np.random.random()) # internal counter = 3 + c
    
    np.random.seed(4)         # internal counter = 0 + c
    print(np.random.random()) # internal counter = 1 + c
    print(np.random.random()) # internal counter = 2 + c
    print(np.random.random()) # internal counter = 3 + c
    
    0.9670298390136767
    0.5472322491757223
    0.9726843599648843
    
    0.9670298390136767
    0.5472322491757223
    0.9726843599648843
    

    Suppose model1 has 100 weights, and you set a seed (n = 0 + c). After model1 is built, the counter is at 100 + c. If you don't reset the seed, then even if you build model2 with the exact same code, the models will differ - because model2's weights are initialized with counter values from 100 + c to 200 + c.
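
    A minimal numpy sketch of the fix (the array sizes are just illustrative): reseeding between the two "models" restarts the counter at 0 + c, so both draw identical values.

    import numpy as np

    np.random.seed(4)                       # counter = 0 + c
    model1_weights = np.random.random(100)  # counter runs up to 100 + c

    np.random.seed(4)                       # counter reset to 0 + c
    model2_weights = np.random.random(100)  # draws the exact same 100 values

    print(np.array_equal(model1_weights, model2_weights))  # True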


    Additional info:

    There are three seeds to set for better reproducibility:

    import numpy as np
    np.random.seed(1)         # for Numpy ops
    import random 
    random.seed(2)            # for Python ops
    import tensorflow as tf
    tf.set_random_seed(3)     # for tensorflow ops - e.g. Dropout masks
    

    This'll give pretty good reproducibility, but not perfect if you're using a GPU - due to parallelism of operations; this video explains it well. For even better reproducibility, set your PYTHONHASHSEED - that and other info is in the official Keras FAQ.
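
    Note that PYTHONHASHSEED must be set before the Python process starts - assigning os.environ["PYTHONHASHSEED"] inside a running script does not affect hash randomization. A small sketch (test_hash.py is a hypothetical script name):

    # test_hash.py
    # Run as:  PYTHONHASHSEED=0 python test_hash.py   -> same output every run
    # vs.:     python test_hash.py                    -> output varies per run (Python 3.3+)
    print(hash("keras"))                         # string hashes are randomized per process
    print(list({"weights", "biases", "masks"}))  # set iteration order depends on those hashes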

    "Perfect" reproducibility is rather redundant, as your results should agree within .1% majority of the time - but if you really need it, likely the only way currently is to switch to CPU and stop using CUDA - but that'll slow down training tremendously (by x10+).


    Sources of randomness:

    • Weight initializations (every default Keras initializer uses randomness)
    • Noise layers (Dropout, GaussianNoise, etc.; see the sketch after this list)
    • Hashing for hash-based operations, e.g. item order in a set
    • GPU parallelism (see linked video)
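
    To see the noise-layer source in isolation, here's a minimal sketch using the same TF1-style graph API that tf.set_random_seed above belongs to - with the graph-level seed fixed, the dropout mask is reproducible across fresh runs:

    import tensorflow as tf

    tf.set_random_seed(3)                      # graph-level seed
    x = tf.ones([1, 8])
    dropped = tf.nn.dropout(x, keep_prob=0.5)  # zeroes entries at random, scales the rest by 1/keep_prob

    with tf.Session() as sess:
        print(sess.run(dropped))  # identical mask on every fresh run; without the seed, it changes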

    Model randomness demo:

    import numpy as np
    np.random.seed(4)
    
    model1_init_weights = [np.random.random(), np.random.random(), np.random.random()]
    model2_init_weights = [np.random.random(), np.random.random(), np.random.random()]
    print("model1_init_weights:", model1_init_weights)
    print("model2_init_weights:", model2_init_weights)
    
    model1_init_weights: [0.9670298390136767, 0.5472322491757223, 0.9726843599648843]
    model2_init_weights: [0.7148159936743647, 0.6977288245972708, 0.21608949558037638]
    

    Restart kernel. Now run this:

    import numpy as np
    np.random.seed(4)
    
    model2_init_weights = [np.random.random(), np.random.random(), np.random.random()]
    model1_init_weights = [np.random.random(), np.random.random(), np.random.random()]
    print("model1_init_weights:", model1_init_weights)
    print("model2_init_weights:", model2_init_weights)
    
    model1_init_weights: [0.7148159936743647, 0.6977288245972708, 0.21608949558037638]
    model2_init_weights: [0.9670298390136767, 0.5472322491757223, 0.9726843599648843]
    

    Thus, flipping the order of model1 and model2 in your code also flips the losses. This is because the seed does not reset itself between the two models' definitions, so your weight initializations are totally different.

    If you wish them to be the same, reset the seed before defining EACH MODEL, and before FITTING each model - using a handy function like the one below. But your best bet is to restart the kernel and work in separate .py files.

    import numpy as np
    import random
    import tensorflow as tf

    def reset_seeds():
        np.random.seed(1)
        random.seed(2)
        tf.set_random_seed(3)
        print("RANDOM SEEDS RESET")
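
    For example (make_model(), x, and y are placeholders for your own model-building code and data):

    reset_seeds()
    model1 = make_model()   # both models now start from identical weights
    reset_seeds()
    model2 = make_model()

    reset_seeds()
    model1.fit(x, y)        # reseed before fitting too, so Dropout masks etc. match
    reset_seeds()
    model2.fit(x, y)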
    