Question
I'm testing simple networks in Keras with the TensorFlow backend, and I ran into an issue with the sigmoid activation function.
The network isn't learning for the first 5-10 epochs, and then everything is fine.
I tried using initializers and regularizers, but that only made it worse.
I use the network like this:
import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the dataset (train / validation / test splits)
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()
# reshape to (50000, 28, 28, 1): 28x28 images with a trailing channel axis
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=[-1, 0, 1],
    height_shift_range=[-1, 0, 1],
    rotation_range=10)
epochs = 20
batch_size = 50
num_classes = 10
model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=x_train.shape[1:],
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
                             activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
                             activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) / batch_size, epochs=epochs,
                    verbose=2, shuffle=True)
With the code above I receive results like these:
Epoch 1/20
- 55s - loss: 2.3098 - accuracy: 0.1036
Epoch 2/20
- 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20
- 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20
- 56s - loss: 2.3060 - accuracy: 0.1079
...
This continues for about 7 epochs (the number differs every time), and then the loss drops rapidly and I reach 0.9623 accuracy within 20 epochs.
But if I change the activation from sigmoid to relu, it works great and gives me 0.5356 accuracy in the first epoch.
This issue makes sigmoid almost unusable for me, and I'd like to know whether I can do something about it. Is this a bug, or am I doing something wrong?
Answer 1:
Activation function suggestion:
In practice, the sigmoid non-linearity has fallen out of favor and is rarely used in hidden layers. ReLU is the most common choice; if a large fraction of the units in the network are “dead”, try Leaky ReLU or tanh. Avoid sigmoid for hidden layers.
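To make the idea of “dead” units concrete, here is a minimal NumPy sketch, not taken from the question's code. The helper names (`relu`, `leaky_relu`, `dead_fraction`), the `alpha` value, and the toy pre-activation matrix are all assumptions chosen for illustration. A unit is counted as dead here if its ReLU output is zero for every sample in the batch, which is only a rough proxy for a unit that never activates during training.

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values, zeroes out the rest
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: keeps a small slope (alpha, an assumed value) for negative inputs
    return np.where(x > 0, x, alpha * x)

def dead_fraction(pre_activations):
    # rows = samples, columns = units; a unit is "dead" on this batch
    # if its ReLU output is zero for every sample
    active = relu(pre_activations) > 0
    return 1.0 - active.any(axis=0).mean()

# toy pre-activations for 3 samples and 3 units (made-up numbers)
pre = np.array([[ 1.0, -2.0, -0.5],
                [ 0.3, -1.0,  2.0],
                [-0.7, -3.0,  1.5]])

frac = dead_fraction(pre)          # the middle unit never fires on this batch
leaky_out = leaky_relu(pre)        # the same unit still outputs small negative values
print(frac)
print(leaky_out[:, 1])
```

With ReLU the middle unit outputs zero for every sample, so it passes no gradient on this batch. With Leaky ReLU it still produces small non-zero outputs, which is why it is the usual thing to try when many units go dead.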
Reasons for not using the sigmoid:
A very undesirable property of the sigmoid neuron is that when its activation saturates at either tail (near 0 or 1), the gradient in these regions is almost zero, so very little signal flows back through the neuron to its weights. In addition, sigmoid outputs are not zero-centered.
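The saturation argument can be checked numerically. The following is a small NumPy sketch with hypothetical helper names (`sigmoid`, `sigmoid_grad`) and arbitrarily chosen sample inputs. It uses the identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), whose maximum is 0.25 at x = 0. The network in the question stacks three sigmoid layers, so the activation derivatives alone contribute a factor of at most 0.25^3 to the gradient reaching the first layer. The weights also scale that gradient, so this is only part of the picture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

# sample inputs from the saturated tails and from the center (arbitrary values)
xs = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])

outputs = sigmoid(xs)        # always in (0, 1), so never zero-centered
grads = sigmoid_grad(xs)     # tiny in both tails, largest at x = 0

max_grad = sigmoid_grad(0.0)          # the largest value the derivative can take
bound_three_layers = max_grad ** 3    # upper bound on the product over three sigmoid layers

print(outputs)
print(grads)
print(max_grad, bound_three_layers)
```

The gradients at x = -10 and x = 10 are several orders of magnitude smaller than at x = 0, and every output is positive. These are the two properties the answer points to.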
Source: https://stackoverflow.com/questions/58608113/neural-network-isnt-learning-for-a-first-few-epochs-on-keras