The sigmoid function is defined as

f(x) = 1 / (1 + exp(-x))

I found that using the C built-in function exp() to calculate the value of f(x) is slow. Is there a faster alternative?
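For reference, what I'm doing is essentially just the definition written out (a minimal sketch):

#include <math.h>

/* Straightforward sigmoid using the standard library exp(). */
double sigmoid(double x) {
    return 1.0 / (1.0 + exp(-x));
}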
This answer probably isn't relevant for most cases, but I just wanted to throw out there that for CUDA computing I've found x/sqrt(1+x^2) to be the fastest function by far.
For example, done with single precision float intrinsics:
__device__ void fooCudaKernel(/* some arguments */) {
    float foo, sigmoid;
    // some code defining foo
    // foo / sqrt(1 + foo*foo), built from fused multiply-add, reciprocal square root, and multiply intrinsics
    sigmoid = __fmul_rz(rsqrtf(__fmaf_rz(foo, foo, 1.0f)), foo);
}
You don't have to use the actual, exact sigmoid function in a neural network algorithm; you can replace it with an approximated version that has similar properties but is faster to compute.

For example, you can use the "fast sigmoid" function

f(x) = x / (1 + abs(x))
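A minimal C sketch of this approximation (the function name is just for illustration):

#include <math.h>

/* "Fast sigmoid": no exp() call, just an add, an abs and a divide.
   Note that it ranges over (-1, 1) rather than (0, 1). */
float fast_sigmoid(float x) {
    return x / (1.0f + fabsf(x));
}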
Using the first terms of the series expansion of exp(x) won't help much if the arguments to f(x) are not near zero, and you have the same problem with a series expansion of the sigmoid function if the arguments are "large".
An alternative is to use table lookup. That is, you precalculate the values of the sigmoid function for a given number of data points, and then do fast (linear) interpolation between them if you want.
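A sketch of the lookup-table idea in C (the table size, input range, and clamping behaviour are arbitrary choices here, not anything prescribed):

#include <math.h>

#define TABLE_SIZE 2048
#define X_MIN (-8.0f)
#define X_MAX (8.0f)

static float table[TABLE_SIZE];

/* Precompute sigmoid values on a uniform grid over [X_MIN, X_MAX]. */
void sigmoid_table_init(void) {
    for (int i = 0; i < TABLE_SIZE; ++i) {
        float x = X_MIN + (X_MAX - X_MIN) * i / (TABLE_SIZE - 1);
        table[i] = 1.0f / (1.0f + expf(-x));
    }
}

/* Linear interpolation between the two nearest table entries;
   inputs outside the table range are clamped to 0 or 1. */
float sigmoid_lut(float x) {
    if (x <= X_MIN) return 0.0f;
    if (x >= X_MAX) return 1.0f;
    float t = (x - X_MIN) / (X_MAX - X_MIN) * (TABLE_SIZE - 1);
    int i = (int)t;
    float frac = t - i;
    return table[i] + frac * (table[i + 1] - table[i]);
}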
People here are mostly concerned with how fast one function is relative to another, and create micro-benchmarks to see whether f1(x) runs 0.0001 ms faster than f2(x). The big problem is that this is mostly irrelevant, because what matters is how fast your network learns with your activation function while trying to minimize your cost function.

According to current theory, the rectifier function and softplus, compared to the sigmoid function or similar activation functions, allow for faster and more effective training of deep neural architectures on large and complex datasets.

So I suggest throwing away the micro-optimization and taking a look at which function allows faster learning (also looking at various other cost functions).
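For completeness, both alternatives are cheap to write down (a minimal sketch using the standard definitions, not anything specific to one library):

#include <math.h>

/* Rectifier (ReLU): max(0, x). */
float relu(float x) {
    return x > 0.0f ? x : 0.0f;
}

/* Softplus: log(1 + exp(x)), a smooth approximation of the rectifier. */
float softplus(float x) {
    return log1pf(expf(x));
}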
You can also use this:

y = x / (2 * (x < 0.0 ? -x : x) + 2) + 0.5;
y' = y * (1 - y);

This is the fast sigmoid rescaled into the range (0, 1), and it acts like a sigmoid in that you can keep using y(1 - y) as y'; that curve is, let's say, rounder than the exact derivative 1/(2 * (1 + abs(x))^2), which behaves more like the fast sigmoid itself.
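A minimal C sketch of that pair (function names are just for illustration):

#include <math.h>

/* Fast sigmoid rescaled into (0, 1): x / (2*|x| + 2) + 0.5 */
float fast_sigmoid01(float x) {
    return x / (2.0f * fabsf(x) + 2.0f) + 0.5f;
}

/* Derivative approximation reusing the true sigmoid's form y' = y * (1 - y). */
float fast_sigmoid01_deriv(float y) {
    return y * (1.0f - y);
}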
I don't think you can do better than the built-in exp(), but if you want another approach, you can use a series expansion. WolframAlpha can compute it for you.
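For example, the first few terms of the Taylor series of the sigmoid around 0 give a small polynomial, though (as noted in another answer) it only pays off when |x| is small; a minimal sketch:

/* Taylor expansion of 1/(1 + exp(-x)) around 0:
   1/2 + x/4 - x^3/48 + x^5/480 - ...
   Only reasonable for small |x|. */
float sigmoid_series(float x) {
    float x2 = x * x;
    return 0.5f + x * (0.25f - x2 * (1.0f / 48.0f - x2 * (1.0f / 480.0f)));
}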
The tanh function may be optimized in some languages, making it faster than a custom-defined x/(1+abs(x)); such is the case in Julia.
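Since sigmoid(x) = 0.5 * (tanh(x/2) + 1), an optimized tanh can be reused directly; in C that would look like the sketch below (whether it actually beats exp() depends on your libm):

#include <math.h>

/* Sigmoid expressed through tanh: 1/(1 + exp(-x)) == 0.5 * (tanh(x/2) + 1). */
float sigmoid_via_tanh(float x) {
    return 0.5f * (tanhf(0.5f * x) + 1.0f);
}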