Fast sigmoid algorithm

庸人自扰 2020-12-12 15:55

The sigmoid function is defined as

    f(x) = 1 / (1 + exp(-x))

I found that using the C standard library function exp() to calculate the value of f(x) is slow. Is there any faster algorithm to calculate the value of f(x)?

  • 2020-12-12 16:26

    To make a neural network more flexible, a slope parameter alpha is usually used to change the steepness of the curve around 0.

    The sigmoid function looks like:

    f(x) = 1 / ( 1+exp(-x*alpha))
    

    A nearly equivalent (but much faster) function is:

    f(x) = 0.5 * (x * alpha / (1 + abs(x*alpha))) + 0.5
    

    You can check the graphs here

    When I switched to the abs-based function, my network became more than 100 times faster.

  • 2020-12-12 16:27

    Using Eureqa to search for approximations to the sigmoid, I found that 1/(1 + 0.3678749025^x) approximates it. It's pretty close (0.3678749025 is nearly 1/e, about 0.3678794, so the power is essentially exp(-x)), and it just gets rid of one operation, the negation of x.

    Some of the other functions shown here are interesting, but is the power operation really that slow? I tested it, and it actually ran faster than addition, but that could just be a fluke. If so, it should be as fast as or faster than all the others.

    EDIT: 0.5 + 0.5*tanh(0.5*x) also works; it is in fact exactly equal to the sigmoid, since 1/(1+exp(-x)) = 0.5 + 0.5*tanh(x/2). The less accurate 0.5 + 0.5*tanh(x) works too. You could also drop the constants if you don't need the output to lie in the range [0, 1] like the sigmoid's. But all of this assumes that tanh is faster.

  • 2020-12-12 16:35

    You might also use a rough version of the sigmoid (it differs from the original by no more than 0.2%):

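        #include <cmath>
        #include <cstddef>

        // Approximates exp(|value|) with 1 + x + 0.555*x^2 + 0.143*x^4.
        inline float RoughSigmoid(float value)
        {
            float x = std::abs(value);   // float overload from <cmath>, not int ::abs
            float x2 = x*x;
            float e = 1.0f + x + x2*0.555f + x2*x2*0.143f;
            // exp(-value) is 1/e for positive value and e otherwise
            return 1.0f / (1.0f + (value > 0 ? 1.0f / e : e));
        }

        // Computes dst[i] = RoughSigmoid(src[i] * slope[0]) for every element.
        void RoughSigmoid(const float * src, size_t size, const float * slope, float * dst)
        {
            float s = slope[0];
            for (size_t i = 0; i < size; ++i)
                dst[i] = RoughSigmoid(src[i] * s);
        }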
    

    Optimization of the RoughSigmoid function using SSE:

        #include <xmmintrin.h>

        // Uses the scalar RoughSigmoid(float) above for the tail elements.
        void RoughSigmoid(const float * src, size_t size, const float * slope, float * dst)
        {
            size_t alignedSize = size/4*4;
            __m128 _slope = _mm_set1_ps(*slope);
            __m128 _sign = _mm_set1_ps(-0.0f);   // only the sign bit set
            __m128 _0 = _mm_setzero_ps();
            __m128 _1 = _mm_set1_ps(1.0f);
            __m128 _0555 = _mm_set1_ps(0.555f);
            __m128 _0143 = _mm_set1_ps(0.143f);
            size_t i = 0;
            for (; i < alignedSize; i += 4)
            {
                __m128 value = _mm_mul_ps(_mm_loadu_ps(src + i), _slope);
                __m128 x = _mm_andnot_ps(_sign, value);   // |value|
                __m128 x2 = _mm_mul_ps(x, x);
                __m128 x4 = _mm_mul_ps(x2, x2);
                __m128 series = _mm_add_ps(_mm_add_ps(_1, x), _mm_add_ps(_mm_mul_ps(x2, _0555), _mm_mul_ps(x4, _0143)));
                // Compare the scaled value (not src) so a negative slope is handled
                // like the scalar version: exp(-value) is 1/series where value > 0.
                __m128 mask = _mm_cmpgt_ps(value, _0);
                __m128 e = _mm_or_ps(_mm_and_ps(_mm_rcp_ps(series), mask), _mm_andnot_ps(mask, series));
                // _mm_rcp_ps is an approximate (about 12-bit) reciprocal
                __m128 sigmoid = _mm_rcp_ps(_mm_add_ps(_1, e));
                _mm_storeu_ps(dst + i, sigmoid);
            }
            for (; i < size; ++i)
                dst[i] = RoughSigmoid(src[i] * slope[0]);
        }
    

    Optimization of the RoughSigmoid function using AVX:

        #include <immintrin.h>

        // Uses the scalar RoughSigmoid(float) above for the tail elements.
        void RoughSigmoid(const float * src, size_t size, const float * slope, float * dst)
        {
            size_t alignedSize = size/8*8;
            __m256 _slope = _mm256_set1_ps(*slope);
            __m256 _sign = _mm256_set1_ps(-0.0f);   // only the sign bit set
            __m256 _0 = _mm256_setzero_ps();
            __m256 _1 = _mm256_set1_ps(1.0f);
            __m256 _0555 = _mm256_set1_ps(0.555f);
            __m256 _0143 = _mm256_set1_ps(0.143f);
            size_t i = 0;
            for (; i < alignedSize; i += 8)
            {
                __m256 value = _mm256_mul_ps(_mm256_loadu_ps(src + i), _slope);
                __m256 x = _mm256_andnot_ps(_sign, value);   // |value|
                __m256 x2 = _mm256_mul_ps(x, x);
                __m256 x4 = _mm256_mul_ps(x2, x2);
                __m256 series = _mm256_add_ps(_mm256_add_ps(_1, x), _mm256_add_ps(_mm256_mul_ps(x2, _0555), _mm256_mul_ps(x4, _0143)));
                // Compare the scaled value (not src) so a negative slope is handled
                // like the scalar version: exp(-value) is 1/series where value > 0.
                __m256 mask = _mm256_cmp_ps(value, _0, _CMP_GT_OS);
                __m256 e = _mm256_or_ps(_mm256_and_ps(_mm256_rcp_ps(series), mask), _mm256_andnot_ps(mask, series));
                // _mm256_rcp_ps is an approximate (about 12-bit) reciprocal
                __m256 sigmoid = _mm256_rcp_ps(_mm256_add_ps(_1, e));
                _mm256_storeu_ps(dst + i, sigmoid);
            }
            for (; i < size; ++i)
                dst[i] = RoughSigmoid(src[i] * slope[0]);
        }
    
  • 2020-12-12 16:35

    You can use a simple but effective method based on two formulas:

    if x < 0 then f(x) = 0.5 / (1 + x^2)
    if x >= 0 then f(x) = 1 - 0.5 / (1 + x^2)
    

    The result looks like this:

    [Figure: the two halves of the sigmoid approximation. Blue: 0.5/(1+x^2); yellow: -0.5/(1+x^2) + 1]
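
    A minimal C sketch of this piecewise approximation; the name sigmoid_piecewise is hypothetical. It needs one division and no exp call, but it is fairly rough: it gives 0.75 at x = 1, where the true sigmoid is about 0.731.

```c
/* Piecewise sigmoid approximation:
   x <  0: 0.5 / (1 + x^2)       (rises from 0 toward 0.5)
   x >= 0: 1 - 0.5 / (1 + x^2)   (rises from 0.5 toward 1)
   Both pieces give 0.5 at x = 0, so the curve is continuous. */
float sigmoid_piecewise(float x)
{
    float half = 0.5f / (1.0f + x * x);
    return x < 0.0f ? half : 1.0f - half;
}
```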

  • 2020-12-12 16:36

    It's best to measure on your own hardware first. A quick benchmark script shows that on my machine x/(1+|x|) is the fastest, with tanh(x) a close second. The error function erf is pretty fast too.

    % gcc -Wall -O2 -lm -o sigmoid-bench{,.c} -std=c99 && ./sigmoid-bench
    atan(pi*x/2)*2/pi   24.1 ns
    atan(x)             23.0 ns
    1/(1+exp(-x))       20.4 ns
    1/sqrt(1+x^2)       13.4 ns
    erf(sqrt(pi)*x/2)    6.7 ns
    tanh(x)              5.5 ns
    x/(1+|x|)            5.5 ns
    

    I expect that the results will vary with the architecture and compiler used, but erf(x) (available since C99), tanh(x) and x/(1.0+fabs(x)) are likely to be the fastest.
