TensorFlow: Confusion regarding the Adam optimizer

栀梦 2020-12-24 08:26

I'm confused as to how the Adam optimizer actually works in TensorFlow.

The way I read the docs, it says that the learning rate is changed every gradient descent iteration.

2 Answers
攒了一身酷 2020-12-24 08:51

    RMSProp and Adam both have adaptive learning rates.

    The basic RMSProp update:

    cache = decay_rate * cache + (1 - decay_rate) * dx**2
    x += - learning_rate * dx / (np.sqrt(cache) + eps)
    

    You can see that originally this has two hyperparameters: decay_rate and eps.
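
    To make this concrete, here is a minimal, self-contained NumPy sketch of that update on a toy quadratic loss (the loss, the hyperparameter values and the step count are illustrative assumptions, not TensorFlow's implementation):

    import numpy as np

    # Toy problem: minimize f(x) = x**2, whose gradient is dx = 2*x.
    learning_rate, decay_rate, eps = 0.01, 0.9, 1e-8

    x, cache = 5.0, 0.0
    for step in range(1000):
        dx = 2 * x                                           # gradient of the toy loss
        cache = decay_rate * cache + (1 - decay_rate) * dx**2
        x += -learning_rate * dx / (np.sqrt(cache) + eps)    # per-parameter scaled step

    print(x)  # ends near the minimum at 0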

    Then we can add momentum to make the gradient more stable, and the update becomes:

    cache = decay_rate * cache + (1 - decay_rate) * dx**2
    m = beta1*m + (1-beta1)*dx                # beta1 = the momentum parameter in the docs
    x += - learning_rate * m / (np.sqrt(cache) + eps)
    

    Now you can see that if we set beta1 = 0, this reduces to plain RMSProp without momentum.
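
    A tiny check of that claim (the previous momentum and gradient values are arbitrary, just for illustration):

    beta1 = 0.0
    m = 0.7          # whatever the previous momentum was; beta1 = 0 multiplies it away
    dx = 3.7         # some arbitrary gradient value
    m = beta1 * m + (1 - beta1) * dx
    print(m == dx)   # True: with beta1 = 0, m is just the raw gradient dx,
                     # so the update line above falls back to plain RMSProp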

    Then, the basics of Adam.

    In CS231n, Andrej Karpathy initially describes Adam like this:

    Adam is a recently proposed update that looks a bit like RMSProp with momentum

    So yes! Then what makes it different from RMSProp with momentum?

    m = beta1*m + (1-beta1)*dx
    v = beta2*v + (1-beta2)*(dx**2)
    x += - learning_rate * m / (np.sqrt(v) + eps)
    

    He also points out that in this update equation m and v are smoother estimates of the gradient and the squared gradient.

    So the difference from RMSProp is that the update is less noisy.
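
    The m and v above correspond to the first- and second-moment state that TensorFlow's Adam keeps per variable, and the knobs in the equations map onto the constructor arguments of tf.keras.optimizers.Adam (the values shown are the Keras defaults; the toy variable and loss below are only for illustration):

    import tensorflow as tf

    # beta_1, beta_2 and epsilon play the roles of beta1, beta2 and eps above.
    opt = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                                   beta_2=0.999, epsilon=1e-7)

    # One gradient step on a single variable w for the toy loss w**2.
    w = tf.Variable(5.0)
    with tf.GradientTape() as tape:
        loss = w ** 2
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))
    print(w.numpy())  # slightly below 5.0 after one Adam step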

    What causes this noise?

    Well, in the initialization procedure we initialize m and v to zero:

    m=v=0
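
    To see why that matters, look at the very first update with a hypothetical gradient value (chosen only for illustration):

    beta1, beta2 = 0.9, 0.999
    dx = 4.0                               # hypothetical first gradient
    m = beta1 * 0.0 + (1 - beta1) * dx
    v = beta2 * 0.0 + (1 - beta2) * dx**2
    print(m)   # ≈ 0.4   -> only 10% of the actual gradient
    print(v)   # ≈ 0.016 -> only 0.1% of the actual squared gradient (16.0)

    Both estimates start out biased towards zero, and by different amounts, so the first few steps are erratic.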

    In order to reduce this initialization effect, it's always good to have some warm-up (bias correction). The equations then become:

    m = beta1*m + (1-beta1)*dx               # beta1 = 0.9, beta2 = 0.999 (typical values)
    mt = m / (1-beta1**t)                    # bias-corrected first moment
    v = beta2*v + (1-beta2)*(dx**2)
    vt = v / (1-beta2**t)                    # bias-corrected second moment
    x += - learning_rate * mt / (np.sqrt(vt) + eps)
    

    Now run this for a few iterations and pay attention to the bias-correction lines (mt and vt): as t (the iteration number) increases, the following happens to mt:

    mt = m

    because beta1 < 1, so beta1**t goes to 0 and the correction factor (1 - beta1**t) goes to 1. In other words, the bias correction only matters during the first few iterations (the warm-up); after that mt is essentially just m, and likewise vt is essentially v.
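
    Putting the whole thing together as one runnable NumPy sketch (the toy quadratic loss and the larger-than-default learning rate are assumptions chosen only so the example converges quickly; this illustrates the update above, not TensorFlow's internal code). It also prints the mt correction factor so you can watch the warm-up fade out:

    import numpy as np

    learning_rate, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

    x, m, v = 5.0, 0.0, 0.0
    for t in range(1, 2001):                           # t starts at 1
        dx = 2 * x                                     # gradient of the toy loss f(x) = x**2
        m = beta1 * m + (1 - beta1) * dx
        v = beta2 * v + (1 - beta2) * (dx ** 2)
        mt = m / (1 - beta1 ** t)                      # bias-corrected first moment
        vt = v / (1 - beta2 ** t)                      # bias-corrected second moment
        x += -learning_rate * mt / (np.sqrt(vt) + eps)
        if t in (1, 10, 100, 1000):
            print(t, 1 - beta1 ** t)                   # ≈ 0.10, 0.65, 1.00, 1.00

    print(x)  # ends near the minimum at 0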
