I am playing with an ANN which is part of the Udacity Deep Learning course.
I have an assignment which involves introducing L2 regularization to a network with one hidden ReLU layer.
In practice, we usually do not regularize the bias terms (intercepts), so I penalize only the weight matrices:
# Cross-entropy plus an L2 penalty on the weights only;
# hidden_biases and out_biases are deliberately left out
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            logits=out_layer, labels=tf_train_labels))
        + 0.01 * tf.nn.l2_loss(hidden_weights)
        + 0.01 * tf.nn.l2_loss(out_weights))
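For context, here is a minimal sketch of how the surrounding graph might be defined. Only `hidden_weights`, `out_weights`, `out_layer`, and `tf_train_labels` appear in the snippet above; the layer sizes and the `hidden_biases`/`out_biases` names are my assumptions, not the course starter code:

import tensorflow as tf  # TF1-style graph API, as used in the course notebooks

num_features, num_hidden, num_labels = 784, 1024, 10  # assumed sizes

tf_train_dataset = tf.placeholder(tf.float32, shape=(None, num_features))
tf_train_labels = tf.placeholder(tf.float32, shape=(None, num_labels))

# Hidden ReLU layer: its weight matrix is what l2_loss(hidden_weights) penalizes
hidden_weights = tf.Variable(tf.truncated_normal([num_features, num_hidden]))
hidden_biases = tf.Variable(tf.zeros([num_hidden]))
hidden = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)

# Output layer: out_weights is penalized, out_biases is not
out_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels]))
out_biases = tf.Variable(tf.zeros([num_labels]))
out_layer = tf.matmul(hidden, out_weights) + out_biases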
The reason for leaving the intercepts out: a bias term only adds a constant offset to a layer's outputs and does not multiply any input feature, so shrinking it does nothing to limit overfitting. Penalizing it would merely shift the predicted y values by a constant while costing some extra computation, so it is omitted from the penalty.
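To see this concretely, here is a toy ridge-regression sketch (pure NumPy, my own illustration rather than anything from the assignment). Penalizing only the intercept drags the constant offset of the predictions toward zero, while the slope, the part that actually controls over- or under-fitting, is essentially unaffected:

import numpy as np

# Toy data: y = 2x + 5 + noise
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 2 * x[:, 0] + 5 + 0.1 * rng.normal(size=200)

X = np.hstack([x, np.ones((200, 1))])  # columns: [slope, intercept]
lam = 200.0

# Compare no penalty vs. a penalty on the intercept alone
for P in (np.diag([0.0, 0.0]), np.diag([0.0, lam])):
    slope, intercept = np.linalg.solve(X.T @ X + P, X.T @ y)
    print("slope=%.3f  intercept=%.3f" % (slope, intercept))
# The slope stays near 2 in both cases; only the constant offset
# of the predictions is shrunk toward zero by the intercept penalty.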