In simple multi-layer FFNN only ReLU activation function doesn't converge

Submitted by ぐ巨炮叔叔 on 2020-01-11 13:43:10

Question


I'm learning TensorFlow and deep learning, and experimenting with various kinds of activation functions.

I created a multi-layer FFNN for the MNIST problem. It is mostly based on the tutorial from the official TensorFlow website, except that 3 hidden layers were added.

The activation functions I have experimented with are tf.sigmoid, tf.nn.tanh, tf.nn.softsign, tf.nn.softmax, and tf.nn.relu. Only tf.nn.relu doesn't converge; the network outputs random noise (test accuracy is about 10%). The following is my source code:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# input: flattened 28x28 MNIST images
x = tf.placeholder(tf.float32, [None, 784])

# three hidden layers of 200 ReLU units each; all weights and biases
# are initialised from a standard normal distribution
W0 = tf.Variable(tf.random_normal([784, 200]))
b0 = tf.Variable(tf.random_normal([200]))
hidden0 = tf.nn.relu(tf.matmul(x, W0) + b0)

W1 = tf.Variable(tf.random_normal([200, 200]))
b1 = tf.Variable(tf.random_normal([200]))
hidden1 = tf.nn.relu(tf.matmul(hidden0, W1) + b1)

W2 = tf.Variable(tf.random_normal([200, 200]))
b2 = tf.Variable(tf.random_normal([200]))
hidden2 = tf.nn.relu(tf.matmul(hidden1, W2) + b2)

# output layer: 200 -> 10; y holds raw logits (softmax is applied inside the loss)
W3 = tf.Variable(tf.random_normal([200, 10]))
b3 = tf.Variable(tf.random_normal([10]))
y = tf.matmul(hidden2, W3) + b3

y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
# plain gradient descent with learning rate 0.1
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)
# accuracy ops, defined once outside the session and reused for every evaluation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for step in range(10000):
        batch_xs, batch_ys = mnist.train.next_batch(128)
        session.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
        if step % 1000 == 0:
            # evaluate on the full test set every 1000 training steps
            print(step, session.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

    print('final:', session.run(accuracy, feed_dict={x: mnist.test.images,
            y_: mnist.test.labels}))

The code outputs something like this:

0 0.098
1000 0.098
2000 0.098
3000 0.098
4000 0.098
5000 0.098
6000 0.098
7000 0.098
8000 0.098
9000 0.098
final: 0.098

If tf.nn.relu is replaced with any of the other activation functions, the network accuracy improves gradually (though with different final accuracies), which is expected.
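
For reference, testing a different activation only changes the wrapper around each hidden layer; for example, the tanh version of the first layer (the other two hidden layers are changed the same way):

hidden0 = tf.nn.tanh(tf.matmul(x, W0) + b0)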

I have read in many textbooks and tutorials that ReLU should be the first candidate for an activation function.

My question is: why doesn't ReLU work in my network? Or is my program simply wrong?


Answer 1:


You are using the ReLU activation function, which computes the activation as:

max(features, 0)
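
A quick, self-contained illustration (separate from your script) of what that means:

import tensorflow as tf

# ReLU is an element-wise max with zero: negatives become 0,
# positives pass through unchanged and are unbounded above.
with tf.Session() as s:
    print(s.run(tf.nn.relu([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]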

Because ReLU passes large values through unchanged (it is unbounded above), it can sometimes cause exploding gradients.

The gradient descent optimizer updates each weight as follows:

Δw_ij = −η · ∂E_i/∂w_ij

where η is the learning rate and ∂E_i/∂w_ij is the partial derivative of the loss with respect to the weight w_ij. When the activations get larger and larger, the partial derivatives get larger as well, and the gradients explode. Therefore, as you can see from the equation, you need to tune the learning rate (η) to compensate.
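
To make that concrete with purely illustrative numbers (not measured from your run): suppose the gradient for some weight reaches 50 because the ReLU activations have grown large.

grad = 50.0                   # hypothetical large ∂E/∂w
print(-0.1 * grad)            # -5.0  -> a huge weight update; training blows up
print(-0.001 * grad)          # -0.05 -> a small, stable update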

A common rule is to reduce the learning rate, usually by a factor of 10 each time.

In your case, setting the learning rate to 0.001 should improve the accuracy.
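
Concretely, that is a one-line change in your script (everything else stays the same):

train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)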

Hope this helps.



Source: https://stackoverflow.com/questions/47235290/in-simple-multi-layer-ffnn-only-relu-activation-function-doesnt-converge
