Neural network isn't learning for the first few epochs on Keras

Question


I'm testing simple networks in Keras with the TensorFlow backend, and I ran into an issue with the sigmoid activation function.

The network isn't learning for the first 5-10 epochs, and then everything is fine. I tried using initializers and regularizers, but that only made it worse.
I build and train the network like this:

import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
import netowork2_ker  # the asker's own data-loading module (not shown)


# load the dataset via the asker's own module
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()

# reshape the flat training images to (50000, 28, 28, 1) so Conv2D accepts them
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=[-1, 0, 1],
    height_shift_range=[-1, 0, 1],
    rotation_range=10)

epochs = 20
batch_size = 50
num_classes = 10

model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                 input_shape=x_train.shape[1:],
                 activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
                             activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
                             activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) // batch_size, epochs=epochs,
                    verbose=2, shuffle=True)

With the code above I get results like these:

Epoch 1/20
- 55s - loss: 2.3098 - accuracy: 0.1036
Epoch 2/20
- 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20
- 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20
- 56s - loss: 2.3060 - accuracy: 0.1079
...
This goes on for about 7 epochs (a different number every time), and then the loss rapidly drops and I reach 0.9623 accuracy by epoch 20. But if I change the activation from sigmoid to relu it works great and gives me 0.5356 accuracy in the first epoch.

This issue makes sigmoid almost unusable for me, and I'd like to know whether I can do something about it. Is this a bug or am I doing something wrong?


Answer 1:


Activation function suggestion:

In practice, the sigmoid non-linearity has fallen out of favor and is rarely used. ReLU is the most common choice; if a large fraction of the units in your network are "dead", try Leaky ReLU or tanh. Never use sigmoid.
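
For example, swapping the hidden activations is a one-line change per layer. Here is a sketch of the asker's model with ReLU, plus a LeakyReLU variant on the second conv layer in case of dead units (the slope of 0.1 and the choice of which layer to vary are arbitrary assumptions; x_train and num_classes come from the question above):

import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, LeakyReLU

model = keras.Sequential()
model.add(Conv2D(64, (3, 3), padding='same',
                 input_shape=x_train.shape[1:],
                 activation='relu'))                 # ReLU instead of sigmoid
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(100, (3, 3)))                       # no activation here ...
model.add(LeakyReLU(alpha=0.1))                      # ... Leaky ReLU as its own layer
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))  # keep softmax on the output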

Reasons for not using the sigmoid:

A very undesirable property of the sigmoid neuron is that when its activation saturates at either tail, near 0 or 1, the gradient in those regions is almost zero, so almost no learning signal flows backward through the neuron. This is consistent with the log above: a loss stuck around 2.30 ≈ ln(10) means the softmax is still guessing uniformly over the 10 classes. In addition, sigmoid outputs are not zero-centered.
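
A minimal sketch of that saturation (plain numpy, independent of Keras): the sigmoid's derivative is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 at x = 0 and collapses toward zero just a few units away from the origin:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.6f}")

# x =   0.0   sigmoid'(x) = 0.250000
# x =   2.0   sigmoid'(x) = 0.104994
# x =   5.0   sigmoid'(x) = 0.006648
# x =  10.0   sigmoid'(x) = 0.000045

With three stacked sigmoid layers, the backpropagated signal is scaled by at most 0.25^3 ≈ 0.016 even in the best case, and by far less once neurons saturate, which fits the long initial plateau the asker observed.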



Source: https://stackoverflow.com/questions/58608113/neural-network-isnt-learning-for-a-first-few-epochs-on-keras
