Implementation of a softmax activation function for neural networks

失恋的感觉 2020-12-23 15:13

I am using a softmax activation function in the last layer of a neural network, but I am having problems with a numerically safe implementation of this function.

A naive implementation would be:
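For example, a minimal sketch in Python/NumPy, assuming the usual direct computation of exp(z_j) / sum_i exp(z_i) (illustrative only; function name and language are my own choices):

    import numpy as np

    def softmax_naive(z):
        # o_j = exp(z_j) / sum_i exp(z_i), computed directly
        z = np.asarray(z, dtype=float)
        e = np.exp(z)          # overflows to inf for large z_j (roughly z_j > 709 in float64)
        return e / np.sum(e)   # becomes nan once any exp(z_j) overflows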

2 Answers
  •  天命终不由人
    2020-12-23 15:24

    I know it's already been answered, but I'll post a step-by-step derivation here anyway.

    Take the log:

    z_j = w_j . x + b_j
    o_j = exp(z_j) / sum_i{ exp(z_i) }
    log o_j = z_j - log sum_i{ exp(z_i) }
    

    Let m = max_i { z_i } and use the log-sum-exp trick:

    log o_j = z_j - log { sum_i { exp(z_i + m - m) } }
            = z_j - log { sum_i { exp(m) exp(z_i - m) } }
            = z_j - log { exp(m) sum_i { exp(z_i - m) } }
            = z_j - m - log { sum_i { exp(z_i - m) } }
    

    The term exp(z_i - m) can underflow when m is much greater than z_i, but that's fine: it just means that z_i contributes negligibly to the softmax output after normalization. The final result is:

    o_j = exp(z_j - m - log{ sum_i { exp(z_i - m) } })
    
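    A minimal sketch of that final formula in Python/NumPy (my own illustration of the derivation above; the name softmax_stable is an assumption, not from the question):

        import numpy as np

        def softmax_stable(z):
            # o_j = exp(z_j - m - log sum_i exp(z_i - m)), with m = max_i { z_i }
            z = np.asarray(z, dtype=float)
            m = np.max(z)                            # m = max_i { z_i }
            log_sum = np.log(np.sum(np.exp(z - m)))  # exponents are <= 0, so no overflow
            return np.exp(z - m - log_sum)

    Scores that would overflow exp() directly are handled fine: for instance, softmax_stable([1000.0, 1001.0, 1002.0]) returns roughly [0.090, 0.245, 0.665], whereas the naive version produces nan.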
