Vanishing gradients (vanishing gradient) and exploding gradients (exploding gradient) are two of the most common failure modes when training deep neural networks, and they affect both convolutional networks (CNN) and recurrent networks (RNN). A major cause is saturating activation functions: once a sigmoid or tanh unit saturates, its local derivative is close to zero, and backpropagation multiplies these local derivatives layer by layer, so the error signal shrinks rapidly on the way back. Two concrete bounds make this clear:

1. The sigmoid derivative z*(1-z) never exceeds 0.25, so each sigmoid layer multiplies every component of the backward signal by at most 0.25 (before the weight matrix is applied).

2. The tanh derivative 1 - tanh(x)^2 never exceeds 1 and is close to zero whenever |x| is large, so tanh saturates as well, though less aggressively than the sigmoid.

The mechanics are easy to see in a single sigmoid layer:
import numpy as np

x = np.random.randn(5)               # example input vector
W = np.random.randn(3, 5)            # example weight matrix
z = 1/(1 + np.exp(-np.dot(W, x)))    # forward pass: z = sigmoid(W x)
dx = np.dot(W.T, z*(1-z))            # backward pass: gradient of sum(z) w.r.t. x
dW = np.outer(z*(1-z), x)            # backward pass: gradient of sum(z) w.r.t. W
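The shrinking effect compounds with depth. A minimal sketch (the sizes, weight scale, and depth here are illustrative, not from the original article) that pushes an all-ones gradient back through ten stacked sigmoid layers:

```python
import numpy as np

# Push an all-ones gradient back through ten stacked sigmoid layers.
# Each layer multiplies the signal by W.T and by the sigmoid derivative
# z*(1-z), which never exceeds 0.25, so the norm collapses with depth.
rng = np.random.default_rng(0)
n, depth = 10, 10
x = rng.standard_normal(n)
grad = np.ones(n)                         # upstream gradient entering the stack
initial_norm = np.linalg.norm(grad)
for _ in range(depth):
    W = rng.standard_normal((n, n)) * 0.5     # modest weight scale
    z = 1 / (1 + np.exp(-np.dot(W, x)))       # forward: sigmoid layer
    grad = np.dot(W.T, grad * z * (1 - z))    # backward: chain rule
    x = z
vanished_norm = np.linalg.norm(grad)
print(initial_norm, vanished_norm)   # the backward signal ends up far smaller
```

After only ten layers the gradient norm is orders of magnitude below its starting value, which is the vanishing-gradient problem in miniature.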
In recurrent networks the same multiplication happens across time steps rather than layers, which is why vanishing gradients make it so hard for an RNN to learn long term dependencies: the contribution of an input from many steps ago is scaled down again and again before it reaches the loss. Conversely, when the recurrent weights amplify the signal, the repeated multiplication makes the gradient explode.
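This can be made precise with a hypothetical sketch (the dimensions, step count, and the choice of a scaled orthogonal recurrent matrix are all illustrative assumptions): in a linear RNN, the gradient reaching an input T steps in the past has been multiplied by the transposed recurrent matrix T times, so its norm is governed by that matrix's singular values.

```python
import numpy as np

# A scaled orthogonal recurrent matrix makes the effect exact: every
# backward step changes the gradient norm by precisely `scale`, so the
# norm after T steps is scale**T times the starting norm.
rng = np.random.default_rng(1)
n, T = 8, 50
norms = {}
for scale in (0.9, 1.1):                   # spectral norm below vs. above 1
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    W_h = scale * Q                        # all singular values equal `scale`
    grad = np.ones(n)
    for _ in range(T):
        grad = W_h.T @ grad                # one step of backprop through time
    norms[scale] = np.linalg.norm(grad)
print(norms)  # scale 0.9 vanishes toward 0; scale 1.1 explodes
```

With scale 0.9 the norm decays like 0.9**50, and with scale 1.1 it grows like 1.1**50: the same mechanism produces vanishing or exploding gradients depending on which side of 1 the recurrent matrix sits.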
Source: 梯度消失与梯度爆炸总结 (Summary of Vanishing and Exploding Gradients)