TimeDistributed(Dense) vs Dense in Keras - Same number of parameters

傲寒 2020-12-02 14:41

I'm building a model that converts a string to another string using recurrent layers (GRUs). I have tried both a Dense and a TimeDistributed(Dense) layer as the last-but-one…

2 Answers
  • 2020-12-02 15:09

    Here is a piece of code that verifies TimeDistributed(Dense(X)) is identical to Dense(X):

    import numpy as np 
    from keras.layers import Dense, TimeDistributed
    import tensorflow as tf
    
    X = np.array([ [[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9],
                    [10, 11, 12]
                   ],
                   [[3, 1, 7],
                    [8, 2, 5],
                    [11, 10, 4],
                    [9, 6, 12]
                   ]
                  ]).astype(np.float32)
    print(X.shape)
    

    (2, 4, 3)

    dense_weights = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                              [0.2, 0.7, 0.9, 0.1, 0.2],
                              [0.1, 0.8, 0.6, 0.2, 0.4]])
    bias = np.array([0.1, 0.3, 0.7, 0.8, 0.4])
    print(dense_weights.shape)
    

    (3, 5)

    # Build a Dense layer with fixed weights, then apply it both directly
    # and wrapped in TimeDistributed to the same 3D input
    dense = Dense(input_dim=3, units=5, weights=[dense_weights, bias])
    input_tensor = tf.Variable(X, name='inputX')
    output_tensor1 = dense(input_tensor)
    output_tensor2 = TimeDistributed(dense)(input_tensor)
    print(output_tensor1.shape)
    print(output_tensor2.shape)
    

    (2, 4, 5)

    (2, ?, 5)

    # TF 1.x: evaluate both output tensors in a session
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        output1 = sess.run(output_tensor1)
        output2 = sess.run(output_tensor2)
    
    print(output1 - output2)
    

    And the difference is:

    [[[0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]]
    
     [[0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]
      [0. 0. 0. 0. 0.]]]
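
    In TensorFlow 2.x, where Sessions are gone and eager execution is the default, the same equality can be checked more directly. A minimal sketch, assuming tf.keras (the shapes mirror the code above):

    import numpy as np
    import tensorflow as tf

    # Same (2, 4, 3) batch: 2 sequences, 4 timesteps, 3 features
    X = np.arange(1, 25, dtype=np.float32).reshape(2, 4, 3)

    dense = tf.keras.layers.Dense(5)
    out1 = dense(X)                                   # Dense applied to the last axis of the 3D input
    out2 = tf.keras.layers.TimeDistributed(dense)(X)  # the same layer wrapped in TimeDistributed

    # Expected maximum absolute difference: 0.0
    print(np.abs(out1.numpy() - out2.numpy()).max())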
    
  • 2020-12-02 15:19

    TimeDistributed(Dense) applies the same Dense layer to every time step during GRU/LSTM cell unrolling, so the error function is computed between the predicted label sequence and the actual label sequence (which is normally the requirement for sequence-to-sequence labeling problems).

    However, with return_sequences=False, the Dense layer is applied only once, at the last cell. This is normally the case when RNNs are used for classification problems. If return_sequences=True, then the Dense layer is applied to every timestep, just like TimeDistributed(Dense).

    So as per your models, both are the same; but if you change your second model to return_sequences=False, the Dense layer will be applied only at the last cell. Try changing it and the model will throw an error, because the output will then have shape [batch_size, OutputSize] while Y is still a label sequence of shape [batch_size, MaxLen, OutputSize]; it is no longer a sequence-to-sequence problem but a full-sequence-to-label problem.

    from keras.models import Sequential
    from keras.layers import Dense, Activation, TimeDistributed, GRU
    import numpy as np
    
    InputSize = 15
    MaxLen = 64
    HiddenSize = 16
    
    OutputSize = 8
    n_samples = 1000
    
    # model1: sequence-to-sequence, TimeDistributed(Dense) on every timestep
    model1 = Sequential()
    model1.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
    model1.add(TimeDistributed(Dense(OutputSize)))
    model1.add(Activation('softmax'))
    model1.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    
    
    # model2: sequence-to-sequence, plain Dense applied to the 3D GRU output
    model2 = Sequential()
    model2.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
    model2.add(Dense(OutputSize))
    model2.add(Activation('softmax'))
    model2.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    
    # model3: full-sequence-to-label, Dense applied only to the last GRU state
    model3 = Sequential()
    model3.add(GRU(HiddenSize, return_sequences=False, input_shape=(MaxLen, InputSize)))
    model3.add(Dense(OutputSize))
    model3.add(Activation('softmax'))
    model3.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    
    X = np.random.random([n_samples, MaxLen, InputSize])
    Y1 = np.random.random([n_samples, MaxLen, OutputSize])  # per-timestep targets
    Y2 = np.random.random([n_samples, OutputSize])           # one target per sequence
    
    model1.fit(X, Y1, batch_size=128, epochs=1)
    model2.fit(X, Y1, batch_size=128, epochs=1)
    model3.fit(X, Y2, batch_size=128, epochs=1)
    
    print(model1.summary())
    print(model2.summary())
    print(model3.summary())
    

    In the above example, the architectures of model1 and model2 are the same (sequence-to-sequence models), while model3 is a full-sequence-to-label model.
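
    To also confirm the point in the question title, a quick check (a sketch assuming the models defined above) shows that the final layers of model1 and model2 hold the same number of parameters:

    # Each final layer has a (HiddenSize x OutputSize) kernel plus an OutputSize bias:
    # 16*8 + 8 = 136 parameters, whether or not it is wrapped in TimeDistributed.
    print(model1.layers[1].count_params())  # 136
    print(model2.layers[1].count_params())  # 136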
