There is plenty of theory online about ReLU, LReLU (Leaky ReLU), and friends, but most of it stays theoretical; material on how to actually apply them is scarce.
In the Convolutional Neural Network (CNN) tutorial https://tensorflow.google.cn/tutorials/images/cnn?hl=en the activation function used is relu.
While studying it, I came across blog posts suggesting that when ReLU does not work well you should try LeakyReLU instead, but I could not find a detailed explanation of how to use it anywhere, so I went digging through the official docs.
1 The standard way to use relu
First, the standard relu, used directly.
We take the example straight from the official tutorial section "Create the convolutional base":
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Clearly, relu can be used directly here, by name.
Let's find the official documentation for activation='relu'.
On the official activations page
(Module: tf.compat.v2.keras.activations) https://tensorflow.google.cn/api_docs/python/tf/compat/v2/keras/activations?hl=en
we can see the following:
Built-in activation functions.
Functions
elu(...): Exponential linear unit.
exponential(...): Exponential activation function.
hard_sigmoid(...): Hard sigmoid activation function.
linear(...): Linear activation function.
relu(...): Rectified Linear Unit.
selu(...): Scaled Exponential Linear Unit (SELU).
sigmoid(...): Sigmoid.
softmax(...): The softmax activation function transforms the outputs so that all values are in the range (0, 1) and sum to 1.
softplus(...): Softplus activation function.
softsign(...): Softsign activation function.
tanh(...): Hyperbolic Tangent (tanh) activation function.
Clearly, the built-in functions include relu, but there is no LeakyReLU. So using activation='lrelu' directly raises an error!
The error message: ValueError: Unknown activation function:lrelu
Note: activation='relu' is equivalent to activation=tf.keras.activations.relu (the function object itself, not a call to it).
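A quick way to convince yourself of this equivalence (a minimal sketch; the `is` check relies on Keras resolving the string name to the same function object, which should hold in TF 2.x):

import tensorflow as tf
from tensorflow.keras import layers

# The string name and the function object should resolve to the same callable.
layer_a = layers.Conv2D(32, (3, 3), activation='relu')
layer_b = layers.Conv2D(32, (3, 3), activation=tf.keras.activations.relu)
print(layer_a.activation is layer_b.activation)  # expected: True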
2 Tracing the relu function
Official docs for tf.keras.activations.relu: https://tensorflow.google.cn/api_docs/python/tf/keras/activations/relu
tf.keras.activations.relu(
x,
alpha=0.0,
max_value=None,
threshold=0
)
Arguments:
- x: a tensor or variable.
- alpha: scalar slope for the negative section (default 0.).
- max_value: float; the saturation threshold.
- threshold: float; threshold value for thresholded activation.
With the default values, the function returns element-wise max(x, 0).
Otherwise, it follows these rules:
f(x) = max_value                for x >= max_value,
f(x) = x                        for threshold <= x < max_value,
f(x) = alpha * (x - threshold)  otherwise.
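A small numerical sketch of those three rules (the input values and the parameters alpha=0.1, max_value=6.0, threshold=1.0 are made up to exercise each branch):

import tensorflow as tf

x = tf.constant([-10.0, -1.0, 0.5, 2.0, 10.0])
# alpha=0.1, max_value=6.0, threshold=1.0 hits all three branches:
#   x >= 6.0        -> 6.0
#   1.0 <= x < 6.0  -> x
#   x < 1.0         -> 0.1 * (x - 1.0)
y = tf.keras.activations.relu(x, alpha=0.1, max_value=6.0, threshold=1.0)
print(y.numpy())  # expected: [-1.1  -0.2  -0.05  2.    6.  ]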
This makes me wonder: if the negative-section slope alpha ≠ 0, isn't this essentially the LeakyReLU function already?
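A quick sanity check supports this (a sketch; alpha=0.2 is chosen arbitrarily):

import tensorflow as tf

x = tf.constant([-3.0, -1.0, 0.0, 2.0])

# relu with a nonzero negative-section slope...
a = tf.keras.activations.relu(x, alpha=0.2)
# ...should match LeakyReLU with the same alpha.
b = tf.keras.layers.LeakyReLU(alpha=0.2)(x)

print(a.numpy())  # expected: [-0.6 -0.2  0.   2. ]
print(b.numpy())  # expected: [-0.6 -0.2  0.   2. ]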
Next, let's compare it with the LeakyReLU function.
3 Tracing LeakyReLU
3.1 The basics of LeakyReLU
Official docs for tf.keras.layers.LeakyReLU: https://tensorflow.google.cn/api_docs/python/tf/keras/layers/LeakyReLU?hl=en
class tf.keras.layers.LeakyReLU
The first thing to be clear about is that LeakyReLU is a class, not a function!
The class inherits from Layer (when I first realized it was a class, I assumed it inherited from layers; the source code is attached at the end).
Arguments:
alpha: Float >= 0. Negative slope coefficient.
The __init__ method:
__init__(
alpha=0.3,
**kwargs
)
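Note that the default alpha is 0.3, not the 0.01 often seen in papers and blog posts, so it is worth setting it explicitly. A small sketch:

import tensorflow as tf

x = tf.constant([-1.0, 2.0])

print(tf.keras.layers.LeakyReLU()(x).numpy())            # default alpha=0.3 -> [-0.3  2. ]
print(tf.keras.layers.LeakyReLU(alpha=0.01)(x).numpy())  # [-0.01  2.  ]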
3.2 LeakyReLU in practice
In the official Deep Convolutional Generative Adversarial Network (DCGAN) tutorial,
it is applied like this:
def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: the batch size is unconstrained

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model
In other words, the layer is instantiated and added directly.
Here is the usage in another official tutorial,
Pix2Pix https://tensorflow.google.cn/tutorials/generative/pix2pix
where it is applied as follows:
def downsample(filters, size, apply_batchnorm=True):
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer, use_bias=False))

    if apply_batchnorm:
        result.add(tf.keras.layers.BatchNormalization())

    result.add(tf.keras.layers.LeakyReLU())

    return result
The result.add(tf.keras.layers.LeakyReLU()) line shows that here, too, the layer is instantiated and added directly.
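To sanity-check the block, you can call it on a dummy batch (a sketch; the 256x256x3 input shape is my assumption for illustration, not taken from the tutorial text quoted above):

import tensorflow as tf

down = downsample(64, 4)  # the function defined above
print(down(tf.zeros([1, 256, 256, 3])).shape)  # expected: (1, 128, 128, 64)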
Of course, you can also instantiate it and assign it directly, for example as the activation argument of another layer:
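A minimal sketch of that style (the layer names and alpha value are my own choices):

from tensorflow.keras import layers

# Pass an instantiated LeakyReLU directly as the activation argument.
conv = layers.Conv2D(32, (3, 3), activation=layers.LeakyReLU(alpha=0.2))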
3.3 Using LeakyReLU in place of relu (the code from Section 1)
We modify the code following the official approach.
First, the imports:
from tensorflow.keras import layers, models
from tensorflow.keras.layers import LeakyReLU
Build the model:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), input_shape=(28, 28, 3)))
model.add(LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3)))
model.add(LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3)))
model.add(LeakyReLU(alpha=0.01))
model.add(layers.MaxPooling2D((2, 2)))
model.summary()
This runs fine:
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 26, 26, 32) 896 _________________________________________________________________ leaky_re_lu (LeakyReLU) (None, 26, 26, 32) 0 _________________________________________________________________ max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0 _________________________________________________________________ conv2d_1 (Conv2D) (None, 11, 11, 64) 18496 _________________________________________________________________ leaky_re_lu_1 (LeakyReLU) (None, 11, 11, 64) 0 _________________________________________________________________ max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0 _________________________________________________________________ conv2d_2 (Conv2D) (None, 3, 3, 64) 36928 _________________________________________________________________ leaky_re_lu_2 (LeakyReLU) (None, 3, 3, 64) 0 _________________________________________________________________ max_pooling2d_2 (MaxPooling2 (None, 1, 1, 64) 0 ================================================================= Total params: 56,320 Trainable params: 56,320 Non-trainable params: 0 _________________________________________________________________
Now let's compress the code into fewer lines and see whether that works too.
Build the model:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation=LeakyReLU(alpha=0.01), input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation=LeakyReLU(alpha=0.01)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation=LeakyReLU(alpha=0.01)))
model.add(layers.MaxPooling2D((2, 2)))
model.summary()
Run it:
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 30, 30, 32) 896 _________________________________________________________________ max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0 _________________________________________________________________ conv2d_1 (Conv2D) (None, 13, 13, 64) 18496 _________________________________________________________________ max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64) 0 _________________________________________________________________ conv2d_2 (Conv2D) (None, 4, 4, 64) 36928 _________________________________________________________________ max_pooling2d_2 (MaxPooling2 (None, 2, 2, 64) 0 ================================================================= Total params: 56,320 Trainable params: 56,320 Non-trainable params: 0 _________________________________________________________________
Clearly this runs as well. The only difference is in the summary: when LeakyReLU is passed as the activation argument, it no longer appears as a separate layer.
4 The source code behind LeakyReLU
To show why the underlying source code matters, we give it a section of its own.
tensorflow/tensorflow/python/keras/layers/advanced_activations.py
In this file we find the following code, which confirms that class LeakyReLU does indeed inherit from class Layer:
"""Layers that act as activation functions.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.python.keras import backend as K
from tensorflow.python.keras import constraints
from tensorflow.python.keras import initializers
from tensorflow.python.keras import regularizers
from tensorflow.python.keras.engine.base_layer import Layer
from tensorflow.python.keras.engine.input_spec import InputSpec
from tensorflow.python.keras.utils import tf_utils
from tensorflow.python.ops import math_ops
from tensorflow.python.util.tf_export import keras_export
@keras_export('keras.layers.LeakyReLU')
class LeakyReLU(Layer):
  """Leaky version of a Rectified Linear Unit.

  It allows a small gradient when the unit is not active:
  `f(x) = alpha * x for x < 0`,
  `f(x) = x for x >= 0`.

  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.

  Output shape:
    Same shape as the input.

  Arguments:
    alpha: Float >= 0. Negative slope coefficient.
  """

  def __init__(self, alpha=0.3, **kwargs):
    super(LeakyReLU, self).__init__(**kwargs)
    self.supports_masking = True
    self.alpha = K.cast_to_floatx(alpha)

  def call(self, inputs):
    return K.relu(inputs, alpha=self.alpha)

  def get_config(self):
    config = {'alpha': float(self.alpha)}
    base_config = super(LeakyReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.PReLU')
class PReLU(Layer):
  """Parametric Rectified Linear Unit.

  It follows:
  `f(x) = alpha * x for x < 0`,
  `f(x) = x for x >= 0`,
  where `alpha` is a learned array with the same shape as x.

  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.

  Output shape:
    Same shape as the input.

  Arguments:
    alpha_initializer: Initializer function for the weights.
    alpha_regularizer: Regularizer for the weights.
    alpha_constraint: Constraint for the weights.
    shared_axes: The axes along which to share learnable
      parameters for the activation function.
      For example, if the incoming feature maps
      are from a 2D convolution
      with output shape `(batch, height, width, channels)`,
      and you wish to share parameters across space
      so that each filter only has one set of parameters,
      set `shared_axes=[1, 2]`.
  """

  def __init__(self,
               alpha_initializer='zeros',
               alpha_regularizer=None,
               alpha_constraint=None,
               shared_axes=None,
               **kwargs):
    super(PReLU, self).__init__(**kwargs)
    self.supports_masking = True
    self.alpha_initializer = initializers.get(alpha_initializer)
    self.alpha_regularizer = regularizers.get(alpha_regularizer)
    self.alpha_constraint = constraints.get(alpha_constraint)
    if shared_axes is None:
      self.shared_axes = None
    elif not isinstance(shared_axes, (list, tuple)):
      self.shared_axes = [shared_axes]
    else:
      self.shared_axes = list(shared_axes)

  @tf_utils.shape_type_conversion
  def build(self, input_shape):
    param_shape = list(input_shape[1:])
    if self.shared_axes is not None:
      for i in self.shared_axes:
        param_shape[i - 1] = 1
    self.alpha = self.add_weight(
        shape=param_shape,
        name='alpha',
        initializer=self.alpha_initializer,
        regularizer=self.alpha_regularizer,
        constraint=self.alpha_constraint)
    # Set input spec
    axes = {}
    if self.shared_axes:
      for i in range(1, len(input_shape)):
        if i not in self.shared_axes:
          axes[i] = input_shape[i]
    self.input_spec = InputSpec(ndim=len(input_shape), axes=axes)
    self.built = True

  def call(self, inputs):
    pos = K.relu(inputs)
    neg = -self.alpha * K.relu(-inputs)
    return pos + neg

  def get_config(self):
    config = {
        'alpha_initializer': initializers.serialize(self.alpha_initializer),
        'alpha_regularizer': regularizers.serialize(self.alpha_regularizer),
        'alpha_constraint': constraints.serialize(self.alpha_constraint),
        'shared_axes': self.shared_axes
    }
    base_config = super(PReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.ELU')
class ELU(Layer):
  """Exponential Linear Unit.

  It follows:
  `f(x) = alpha * (exp(x) - 1.) for x < 0`,
  `f(x) = x for x >= 0`.

  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.

  Output shape:
    Same shape as the input.

  Arguments:
    alpha: Scale for the negative factor.
  """

  def __init__(self, alpha=1.0, **kwargs):
    super(ELU, self).__init__(**kwargs)
    self.supports_masking = True
    self.alpha = K.cast_to_floatx(alpha)

  def call(self, inputs):
    return K.elu(inputs, self.alpha)

  def get_config(self):
    config = {'alpha': float(self.alpha)}
    base_config = super(ELU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.ThresholdedReLU')
class ThresholdedReLU(Layer):
  """Thresholded Rectified Linear Unit.

  It follows:
  `f(x) = x for x > theta`,
  `f(x) = 0 otherwise`.

  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.

  Output shape:
    Same shape as the input.

  Arguments:
    theta: Float >= 0. Threshold location of activation.
  """

  def __init__(self, theta=1.0, **kwargs):
    super(ThresholdedReLU, self).__init__(**kwargs)
    self.supports_masking = True
    self.theta = K.cast_to_floatx(theta)

  def call(self, inputs):
    return inputs * math_ops.cast(
        math_ops.greater(inputs, self.theta), K.floatx())

  def get_config(self):
    config = {'theta': float(self.theta)}
    base_config = super(ThresholdedReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.Softmax')
class Softmax(Layer):
  """Softmax activation function.

  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.

  Output shape:
    Same shape as the input.

  Arguments:
    axis: Integer, axis along which the softmax normalization is applied.
  """

  def __init__(self, axis=-1, **kwargs):
    super(Softmax, self).__init__(**kwargs)
    self.supports_masking = True
    self.axis = axis

  def call(self, inputs):
    return K.softmax(inputs, axis=self.axis)

  def get_config(self):
    config = {'axis': self.axis}
    base_config = super(Softmax, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape


@keras_export('keras.layers.ReLU')
class ReLU(Layer):
  """Rectified Linear Unit activation function.

  With default values, it returns element-wise `max(x, 0)`.

  Otherwise, it follows:
  `f(x) = max_value` for `x >= max_value`,
  `f(x) = x` for `threshold <= x < max_value`,
  `f(x) = negative_slope * (x - threshold)` otherwise.

  Input shape:
    Arbitrary. Use the keyword argument `input_shape`
    (tuple of integers, does not include the samples axis)
    when using this layer as the first layer in a model.

  Output shape:
    Same shape as the input.

  Arguments:
    max_value: Float >= 0. Maximum activation value.
    negative_slope: Float >= 0. Negative slope coefficient.
    threshold: Float. Threshold value for thresholded activation.
  """

  def __init__(self, max_value=None, negative_slope=0, threshold=0, **kwargs):
    super(ReLU, self).__init__(**kwargs)
    if max_value is not None and max_value < 0.:
      raise ValueError('max_value of Relu layer '
                       'cannot be negative value: ' + str(max_value))
    if negative_slope < 0.:
      raise ValueError('negative_slope of Relu layer '
                       'cannot be negative value: ' + str(negative_slope))

    self.support_masking = True
    if max_value is not None:
      max_value = K.cast_to_floatx(max_value)
    self.max_value = max_value
    self.negative_slope = K.cast_to_floatx(negative_slope)
    self.threshold = K.cast_to_floatx(threshold)

  def call(self, inputs):
    # alpha is used for leaky relu slope in activations instead of
    # negative_slope.
    return K.relu(inputs,
                  alpha=self.negative_slope,
                  max_value=self.max_value,
                  threshold=self.threshold)

  def get_config(self):
    config = {
        'max_value': self.max_value,
        'negative_slope': self.negative_slope,
        'threshold': self.threshold
    }
    base_config = super(ReLU, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))

  @tf_utils.shape_type_conversion
  def compute_output_shape(self, input_shape):
    return input_shape
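Knowing that LeakyReLU is just a thin Layer subclass also means you can write your own variant in the same style. A minimal sketch (MyLeakyReLU is a hypothetical name of my own; it uses the public tf.nn.leaky_relu rather than the internal backend K):

import tensorflow as tf

class MyLeakyReLU(tf.keras.layers.Layer):
    """A hypothetical LeakyReLU clone, following the same pattern as above."""

    def __init__(self, alpha=0.3, **kwargs):
        super(MyLeakyReLU, self).__init__(**kwargs)
        self.alpha = alpha

    def call(self, inputs):
        # tf.nn.leaky_relu computes max(x, alpha * x) element-wise.
        return tf.nn.leaky_relu(inputs, alpha=self.alpha)

    def get_config(self):
        # Same serialization pattern as the official LeakyReLU.
        config = {'alpha': self.alpha}
        base_config = super(MyLeakyReLU, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

print(MyLeakyReLU(alpha=0.1)(tf.constant([-1.0, 2.0])).numpy())  # expected: [-0.1  2. ]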