Fast sigmoid algorithm

后端未结

关注

 11  1884

庸人自扰

The sigmoid function is defined as

I found that using the C built-in function exp() to calculate the value of f(x) is slow. Is th

相关标签:

11条回答

执笔经年

2020-12-12 16:26
To do the NN more flexible usually used some alpha rate to change the angle of graph around 0.

The sigmoid function looks like:
```
f(x) = 1 / ( 1+exp(-x*alpha))
```
The nearly equivalent, (but more faster function) is:
```
f(x) = 0.5 * (x * alpha / (1 + abs(x*alpha))) + 0.5
```
You can check the graphs here

When I using abs function the network become faster 100+ times.
0 讨论(0)
发布评论:

提交评论
- 加载中...
谎友^

2020-12-12 16:27

Using Eureqa to search for approximations to sigmoid I found 1/(1 + 0.3678749025^x) approximates it. It's pretty close, just gets rid of one operation with the negation of x.

Some of the other functions shown here are interesting, but is the power operation really that slow? I tested it and it actually did faster than addition, but that could just be a fluke. If so it should be just as fast or faster as all the others.

EDIT:0.5 + 0.5*tanh(0.5*x) and less accurate, 0.5 + 0.5*tanh(n) also works. And you could just get rid of the constants if you don't care about getting it between the range [0,1] like sigmoid. But it assumes that tanh is faster.

0 讨论(0)
发布评论:

提交评论
- 加载中...

深忆病人

2020-12-12 16:35

Also you might use rough version of sigmoid (it differences not greater than 0.2% from original):

    inline float RoughSigmoid(float value)
    {
        float x = ::abs(value);
        float x2 = x*x;
        float e = 1.0f + x + x2*0.555f + x2*x2*0.143f;
        return 1.0f / (1.0f + (value > 0 ? 1.0f / e : e));
    }

    void RoughSigmoid(const float * src, size_t size, const float * slope, float * dst)
    {
        float s = slope[0];
        for (size_t i = 0; i < size; ++i)
            dst[i] = RoughSigmoid(src[i] * s);
    }

Optimization of RoughSigmoid function with using SSE:

    #include <xmmintrin.h>

    void RoughSigmoid(const float * src, size_t size, const float * slope, float * dst)
    {
        size_t alignedSize =  size/4*4;
        __m128 _slope = _mm_set1_ps(*slope);
        __m128 _0 = _mm_set1_ps(-0.0f);
        __m128 _1 = _mm_set1_ps(1.0f);
        __m128 _0555 = _mm_set1_ps(0.555f);
        __m128 _0143 = _mm_set1_ps(0.143f);
        size_t i = 0;
        for (; i < alignedSize; i += 4)
        {
            __m128 _src = _mm_loadu_ps(src + i);
            __m128 x = _mm_andnot_ps(_0, _mm_mul_ps(_src, _slope));
            __m128 x2 = _mm_mul_ps(x, x);
            __m128 x4 = _mm_mul_ps(x2, x2);
            __m128 series = _mm_add_ps(_mm_add_ps(_1, x), _mm_add_ps(_mm_mul_ps(x2, _0555), _mm_mul_ps(x4, _0143)));
            __m128 mask = _mm_cmpgt_ps(_src, _0);
            __m128 exp = _mm_or_ps(_mm_and_ps(_mm_rcp_ps(series), mask), _mm_andnot_ps(mask, series));
            __m128 sigmoid = _mm_rcp_ps(_mm_add_ps(_1, exp));
            _mm_storeu_ps(dst + i, sigmoid);
        }
        for (; i < size; ++i)
            dst[i] = RoughSigmoid(src[i] * slope[0]);
    }

Optimization of RoughSigmoid function with using AVX:

    #include <immintrin.h>

    void RoughSigmoid(const float * src, size_t size, const float * slope, float * dst)
    {
        size_t alignedSize = size/8*8;
        __m256 _slope = _mm256_set1_ps(*slope);
        __m256 _0 = _mm256_set1_ps(-0.0f);
        __m256 _1 = _mm256_set1_ps(1.0f);
        __m256 _0555 = _mm256_set1_ps(0.555f);
        __m256 _0143 = _mm256_set1_ps(0.143f);
        size_t i = 0;
        for (; i < alignedSize; i += 8)
        {
            __m256 _src = _mm256_loadu_ps(src + i);
            __m256 x = _mm256_andnot_ps(_0, _mm256_mul_ps(_src, _slope));
            __m256 x2 = _mm256_mul_ps(x, x);
            __m256 x4 = _mm256_mul_ps(x2, x2);
            __m256 series = _mm256_add_ps(_mm256_add_ps(_1, x), _mm256_add_ps(_mm256_mul_ps(x2, _0555), _mm256_mul_ps(x4, _0143)));
            __m256 mask = _mm256_cmp_ps(_src, _0, _CMP_GT_OS);
            __m256 exp = _mm256_or_ps(_mm256_and_ps(_mm256_rcp_ps(series), mask), _mm256_andnot_ps(mask, series));
            __m256 sigmoid = _mm256_rcp_ps(_mm256_add_ps(_1, exp));
            _mm256_storeu_ps(dst + i, sigmoid);
        }
        for (; i < size; ++i)
            dst[i] = RoughSigmoid(src[i] * slope[0]);
    }

0 讨论(0)

予麋鹿

2020-12-12 16:35
You can use a simple but effective method by using two formulas:
```
if x < 0 then f(x) = 1 / (0.5/(1+(x^2)))
if x > 0 then f(x) = 1 / (-0.5/(1+(x^2)))+1
```
This will look like this:

Two graphs for a sigmoid {Blue: (0.5/(1+(x^2))), Yellow: (-0.5/(1+(x^2)))+1}
0 讨论(0)
发布评论:

提交评论
- 加载中...
南方客

2020-12-12 16:36
It's best to measure on your hardware first. Just a quick benchmark script shows, that on my machine 1/(1+|x|) is the fastest, and tanh(x) is the close second. Error function erf is pretty fast too.
```
% gcc -Wall -O2 -lm -o sigmoid-bench{,.c} -std=c99 && ./sigmoid-bench
atan(pi*x/2)*2/pi   24.1 ns
atan(x)             23.0 ns
1/(1+exp(-x))       20.4 ns
1/sqrt(1+x^2)       13.4 ns
erf(sqrt(pi)*x/2)    6.7 ns
tanh(x)              5.5 ns
x/(1+|x|)            5.5 ns
```
I expect that the results may vary depending on architecture and the compiler used, but erf(x) (since C99), tanh(x) and x/(1.0+fabs(x)) are likely to be the fast performers.
0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2