The official TensorFlow API documentation states that the parameter kernel_initializer defaults to None for tf.layers.conv2d and tf.layers.dense. So what initializer is actually used when it is left at None?
Great question! It takes a bit of digging through the source to find out.
Both layers build their kernel through variable_scope.get_variable. In the code:
self.kernel = vs.get_variable('kernel',
                              shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              regularizer=self.kernel_regularizer,
                              trainable=True,
                              dtype=self.dtype)
Next step: what does the variable scope do when the initializer is None?
Here it says:
If initializer is None (the default), the default initializer passed in the constructor is used. If that one is None too, we use a new glorot_uniform_initializer.
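In other words, the decision chain is roughly the following (a paraphrase of the documented behavior, not the actual TensorFlow source; the function name is made up for illustration and assumes the TF 1.x API):

import tensorflow as tf

def pick_initializer(initializer, scope_default_initializer):
    # An initializer passed explicitly to the layer always wins.
    if initializer is not None:
        return initializer
    # Otherwise, fall back to the default set on the variable scope.
    if scope_default_initializer is not None:
        return scope_default_initializer
    # Otherwise, a fresh Glorot/Xavier uniform initializer is created.
    return tf.glorot_uniform_initializer()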
So the answer is: it uses the glorot_uniform_initializer
For completeness, here is the definition of this initializer:
The Glorot uniform initializer, also called the Xavier uniform initializer. It draws samples from a uniform distribution within [-limit, limit], where limit is sqrt(6 / (fan_in + fan_out)), fan_in is the number of input units in the weight tensor, and fan_out is the number of output units in the weight tensor. Reference: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
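As a quick worked example of the formula (the layer size is arbitrary, just for illustration): for a Dense kernel of shape (256, 128), fan_in = 256 and fan_out = 128, so every weight is drawn uniformly from [-0.125, 0.125]:

import numpy as np

fan_in, fan_out = 256, 128          # illustrative Dense kernel shape
limit = np.sqrt(6.0 / (fan_in + fan_out))
print(limit)                        # 0.125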
Edit: this is what I found in the code and documentation. Perhaps you could verify that the initialization looks like this by running eval on the weights!
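A minimal sketch of such a check, assuming the TF 1.x API (the layer name 'probe_conv', the input shape and the filter count are made up for illustration):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 3])
y = tf.layers.conv2d(x, filters=16, kernel_size=3, name='probe_conv')

# The layer's kernel variable is created under the layer's name scope.
kernel = tf.get_default_graph().get_tensor_by_name('probe_conv/kernel:0')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w = sess.run(kernel)

# For a 3x3 kernel with 3 input and 16 output channels:
# fan_in = 3*3*3 = 27, fan_out = 3*3*16 = 144
limit = np.sqrt(6.0 / (27 + 144))
print(w.min(), w.max(), 'should lie within +/-', limit)

If the printed min and max stay within that bound (and the values look uniformly spread rather than bell-shaped), the kernel was indeed drawn from the Glorot uniform range.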
TensorFlow 2.0 compatible answer: in TensorFlow 2.0 as well, the default kernel initializer in tf.keras.layers.Conv2D and tf.keras.layers.Dense is glorot_uniform.
This is specified on the tensorflow.org website.
Link for Conv2D: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D?version=nightly#init
Link for Dense: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense?version=nightly#init
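A quick way to confirm this from a TF 2.x session (a minimal sketch; the layer sizes are arbitrary):

import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=16, kernel_size=3)
dense = tf.keras.layers.Dense(units=8)

# Both layers store the resolved initializer object even before being built.
print(type(conv.kernel_initializer).__name__)   # GlorotUniform
print(type(dense.kernel_initializer).__name__)  # GlorotUniform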
According to this course by Andrew Ng and the Xavier documentation, if you are using ReLU as the activation function, it is better to change the default weight initializer (which is Xavier uniform) to Xavier normal:

y = tf.layers.conv2d(x, filters=32, kernel_size=3,
                     kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False))

(filters and kernel_size are only filled in here so that the call is complete; the relevant part is the kernel_initializer argument.)
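If you are on the Keras API instead (where tf.contrib is not available), the equivalent switch would, as far as I can tell, look like this (input shape and filter sizes again arbitrary):

import tensorflow as tf

x = tf.keras.Input(shape=(28, 28, 3))
y = tf.keras.layers.Conv2D(
    filters=32, kernel_size=3,
    kernel_initializer=tf.keras.initializers.glorot_normal())(x)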
In a CNN, kernel values are initialized randomly, and the values are then readjusted during backpropagation to yield better edge-detection (and other feature-detection) kernels. See this