Question
I wrote this little model using the Keras Functional API to find the similarity of a dialogue between two individuals. I am using Gensim's Doc2Vec embeddings to transform the text data into vectors (vocab size: 4117). My data is split into 56 positive cases and 64 negative cases. (Yes, I know the dataset is small, but that's all I have for the time being.)
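Roughly, the Doc2Vec step looks like this (a minimal sketch with a toy corpus; the per-utterance setup and the parameter choices here are illustrative, not my exact pipeline):
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy stand-in for the real corpus: one token list per utterance.
dialogues = [["hello", "how", "are", "you"],
             ["i", "am", "fine", "thanks"]]

tagged = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(dialogues)]
d2v = Doc2Vec(tagged, vector_size=200, window=5, min_count=1, epochs=40)

# Each utterance becomes a 200-d vector; stacking/padding them per conversation
# gives the (38, 200) inputs used below.
vectors = [d2v.infer_vector(tokens) for tokens in dialogues]
The model itself is: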
from tensorflow.keras.layers import (Input, Embedding, Conv2D, TimeDistributed, LSTM,
                                     Activation, Subtract, Multiply, Lambda, Concatenate,
                                     Bidirectional, GlobalMaxPooling1D, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

vocab_size = 4117  # size of the Doc2Vec vocabulary mentioned above

def euclidean_distance(vects):
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))
ch_inp = Input(shape=(38, 200))
csr_inp = Input(shape=(38, 200))
inp = Input(shape=(38, 200))
net = Embedding(int(vocab_size), 16)(inp)
net = Conv2D(16, 1, activation='relu')(net)
net = TimeDistributed(LSTM(8, return_sequences=True))(net)
out = Activation('relu')(net)
sia = Model(inp, out)
x = sia(csr_inp)
y = sia(ch_inp)
sub = Subtract()([x, y])
mul = Multiply()([sub, sub])
mul_x = Multiply()([x, x])
mul_y = Multiply()([y, y])
sub_xy = Subtract()([x, y])
euc = Lambda(euclidean_distance)([x, y])
z = Concatenate(axis=-1)([euc, sub_xy, mul])
z = TimeDistributed(Bidirectional(LSTM(4)))(z)
z = Activation('relu')(z)
z = GlobalMaxPooling1D()(z)
z = Dense(2, activation='relu')(z)
out = Dense(1, activation = 'sigmoid')(z)
model = Model([ch_inp, csr_inp], out)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
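Training is then just a standard fit call (the arrays below are placeholders with the expected shapes, not my real data):
import numpy as np

# Placeholder data only to show the expected shapes: 120 samples of 38 x 200 each.
X_ch  = np.random.rand(120, 38, 200)
X_csr = np.random.rand(120, 38, 200)
y     = np.random.randint(0, 2, size=(120,))

model.fit([X_ch, X_csr], y, epochs=10, batch_size=8, validation_split=0.2)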
The problem is that my accuracy won't improve past 60.87%: I ran 10 epochs and the accuracy stays constant. Is there something in my code that's causing this, or is it perhaps an issue with my data?
I also ran K-Fold validation with some scikit-learn models and got these results on the same dataset (attached as a screenshot). An overview of my dataset is attached as well.
I'm definitely struggling with this one - so literally any help here would be appreciated. Thanks!
UPDATE: I increased my data size to 1875 training samples. The accuracy improved to 70.28%, but it is still constant across all iterations.
Answer 1:
I see two things that may be important there.
- You're using 'relu' after the LSTM. An LSTM in Keras already has 'tanh' as its default activation. So, although you're not locking your model, you're making it harder for it to learn, with an activation that constrains the results to a small range plus one that cuts off the negative values.
- You're using 'relu' with very few units! Relu with few units, bad initialization, big learning rates, and bad luck will get stuck in the zero region without any gradients.
If your loss completely freezes, it's most probably due to the second point above. And even if it doesn't freeze, it may be using just one unit from the 2 Dense units, for instance, making the layer very poor.
You should do one of the following:
- Your model is small, so quit using 'relu' and use 'tanh' instead. This will give your model the expected power it should have.
- Otherwise, you should definitely increase the number of units, both for the LSTM and for the Dense, so 'relu' doesn't get easily stuck.
- You can add a BatchNormalization layer after Dense and before 'relu'; this way you guarantee that a good amount of the units will always be above zero (see the sketch right after this list).
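A minimal, self-contained sketch of that third option, with the Dense -> BatchNormalization -> 'relu' ordering (the input shape and the 16-unit width here are illustrative choices, not taken from the question):
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation
from tensorflow.keras.models import Model

inp = Input(shape=(8,))
z = Dense(16)(inp)            # no activation here
z = BatchNormalization()(z)   # keeps a good share of the units away from zero
z = Activation('relu')(z)
out = Dense(1, activation='sigmoid')(z)
demo = Model(inp, out)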
In any case, don't use 'relu' after the LSTM.
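To illustrate that point in isolation (a sketch that keeps only the LSTM from the question's shared branch and omits the Embedding/Conv2D front-end):
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inp = Input(shape=(38, 200))
net = LSTM(8, return_sequences=True)(inp)  # default activation is 'tanh'
sia = Model(inp, net)                      # no Activation('relu') after the LSTM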
The other approach would be making the model more powerful.
For instance:
z = TimeDistributed(Bidirectional(LSTM(4)))(z)
z = Conv1D(10, 3, activation = 'tanh')(z) #or 'relu' maybe
z = MaxPooling1D()(z)
z = Conv1D(15, 3, activation = 'tanh')(z) #or 'relu' maybe
z = Flatten()(z) #unless the length is variable, then GlobalAveragePooling1D()(z)
z = Dense(10, activation='relu')(z)
out = Dense(1, activation = 'sigmoid')(z)
Source: https://stackoverflow.com/questions/60100361/improve-accuracy-for-a-siamese-network