Question
I built the exact same network in native TensorFlow and in Keras with the TensorFlow backend, but after many hours of testing with a number of different parameters, I still can't figure out why Keras outperforms native TensorFlow and produces better (slightly, but consistently better) results.
Does Keras implement a different weight-initialization method, or does it apply a different decay approach than tf.train.inverse_time_decay?
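For reference, tf.train.inverse_time_decay computes the decayed learning rate as learning_rate / (1 + decay_rate * global_step / decay_steps). A minimal Python sketch of that formula (a hypothetical stand-alone helper, not the TF implementation itself):

```python
def inverse_time_decay(learning_rate, global_step, decay_steps, decay_rate):
    # Decayed LR as computed by tf.train.inverse_time_decay (non-staircase mode):
    # lr / (1 + decay_rate * step / decay_steps)
    return learning_rate / (1.0 + decay_rate * global_step / decay_steps)

# At step 0 the learning rate is unchanged; it shrinks as training progresses.
print(inverse_time_decay(0.1, 0, 100, 0.5))    # 0.1
print(inverse_time_decay(0.1, 100, 100, 0.5))  # 0.1 / 1.5
```

If Keras is trained with a fixed learning rate while the native TF script decays it this way, the two runs are not strictly comparable, which is one place a small score gap can creep in.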
P.S. The score difference is always in this range:
Keras with TensorFlow backend: ~0.9850 - 0.9885, ~45 sec. average training time per epoch
Native TensorFlow: ~0.9780 - 0.9830, ~23 sec.
My environment is:
Python 3.5.2 -Anaconda / Windows 10
CUDA: 8.0 with cuDNN 5.1
Keras 1.2.1
Tensorflow 0.12.1
Nvidia Geforce GTX 860M
and my keras.json file:
{
"image_dim_ordering": "tf",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
You can also copy and execute the following files:
https://github.com/emrahyigit/deep/blob/master/keras_cnn_mnist.py
https://github.com/emrahyigit/deep/blob/master/tf_cnn_mnist.py
https://github.com/emrahyigit/deep/blob/master/mnist.py
Answer 1:
The problem was due to incorrect use of the keep_prob parameter of the dropout layer: I should have fed this placeholder different values for training and testing (keep_prob < 1.0 while training, keep_prob = 1.0 at evaluation time).
Source: https://stackoverflow.com/questions/41777466/native-tf-vs-keras-tf-performance-comparison