Fastest implementation of log2(int) and log2(float)

天命终不由人 2020-12-08 04:57

The question is:

Are there any other (and/or faster) implementations of a basic log2?

Applications

The log2(int) and

9 answers
  • 2020-12-08 05:10

    There have been quite a few answers giving fast approximate approaches to log2(int), but few for log2(float), so here are two (in Java) that combine a lookup table with mantissa/bit hacking:

    Fast accurate log2(float):

    /**
     * Calculate the logarithm to base 2, handling special cases.
     */
    public static float log2(float x) {
    
        final int bits = Float.floatToRawIntBits(x);
        final int e = (bits >> 23) & 0xff;
        final int m = (bits & 0x7fffff);
    
        if (e == 255) {
            if (m != 0) {
                return Float.NaN;
            }
            return ((bits >> 31) != 0) ? Float.NaN : Float.POSITIVE_INFINITY;
        }
    
        if ((bits >> 31) != 0) {
            return (e == 0 && m == 0) ? Float.NEGATIVE_INFINITY : Float.NaN;
        }
    
        return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
    }
    

    Note:

    • If the argument is NaN or less than zero, then the result is NaN.
    • If the argument is positive infinity, then the result is positive infinity.
    • If the argument is positive zero or negative zero, then the result is negative infinity.
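    Both Java methods rest on the IEEE 754 single-precision layout: for a normal float with biased exponent e and mantissa field m, x = 2^(e-127) * (1 + m/2^23), so log2(x) = (e - 127) + log2(1 + m/2^23), and the table only has to supply the second term. A minimal C++ sketch of that identity, with std::log2 standing in for the table (log2_by_fields is a hypothetical name, and 32-bit IEEE 754 floats are assumed):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: log2 of a positive, normal, finite float computed from its
// IEEE 754 fields (assumes float is 32-bit IEEE 754 binary32).
float log2_by_fields(float x) {
    assert(std::isnormal(x) && x > 0.0f);  // the identity below only holds for normal x
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);             // well-defined type punning
    int e = static_cast<int>((bits >> 23) & 0xFF);   // biased exponent field
    std::uint32_t m = bits & 0x7FFFFF;               // 23-bit mantissa field
    // x = 2^(e-127) * (1 + m/2^23)  =>  log2(x) = (e - 127) + log2(1 + m/2^23)
    double mantissa = 1.0 + static_cast<double>(m) / 8388608.0;  // 8388608 = 2^23
    return static_cast<float>((e - 127) + std::log2(mantissa));
}
```

    The Java code trades the std::log2 call for a single array read, which is where its speed comes from.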

    Fast accurate log2(float) (slightly faster, no checking):

    /**
     * Calculate the logarithm using base 2. Requires the argument be finite and
     * positive.
     */
    public static float fastLog2(float x) {
        final int bits = Float.floatToRawIntBits(x);
        final int e = (bits >> 23) & 0xff;
        final int m = (bits & 0x7fffff);
        return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
    }
    

    This second method forgoes the checking present in the other method and therefore has the following special cases:

    • If the argument is NaN, then the result is incorrect.
    • If the argument is negative, then the result is incorrect.
    • If the argument is positive infinity, then the result is incorrect.
    • If the argument is positive zero or negative zero, then the result is negative infinity.
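    To see where those special cases come from, here is a C++ sketch that performs the same unchecked field arithmetic as fastLog2, with an exact std::log2 in place of the table (unchecked_log2 is a hypothetical name; 32-bit IEEE 754 floats are assumed). Because the exponent mask discards the sign bit, a negative finite x yields log2(|x|); +inf (e = 255, m = 0) comes out as exactly 128; and a NaN gives a finite value between 128 and 129.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical unchecked variant mirroring fastLog2's arithmetic. Nothing here looks at
// the sign bit or at e == 255, so negative, infinite and NaN inputs are silently misread.
float unchecked_log2(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    int e = static_cast<int>((bits >> 23) & 0xFF);  // the sign bit is masked off here
    std::uint32_t m = bits & 0x7FFFFF;
    if (e == 0) {
        // subnormal or zero: x = m * 2^-149, and log2(0) gives -inf
        return static_cast<float>(std::log2(static_cast<double>(m)) - 149.0);
    }
    // normal: x = 2^(e-150) * (m | 2^23)
    double significand = static_cast<double>(m | 0x800000u);
    return static_cast<float>(e - 150 + std::log2(significand));
}
```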

    Both methods rely on a lookup table, data, and on the variables q and qm1. These are populated by the following method, where n sets the accuracy/space tradeoff.

    static int q, qm1;
    static float[] data;
    
    /**
     * Compute lookup table for a given base table size.
     * 
     * @param n The number of bits to keep from the mantissa. Table storage =
     *          2^(n+1) * 4 bytes, e.g. 64 KB for n=13. Must be in the range
     *          0<=n<=22 (with n=23, qm1 would be -1 and the subnormal lookup breaks)
     */
    public static void populateLUT(int n) {
    
        final int size = 1 << (n + 1);
    
        q = 23 - n;
        qm1 = q - 1;
        data = new float[size];
    
        for (int i = 0; i < size; i++) {
            data[i] = (float) (Math.log(i << q) / Math.log(2)) - 150;
        }
    }
    

    populateLUT(12);
    log2(6666); // = 12.702606
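    For readers working in C or C++, the same table scheme ports directly. A sketch under stated assumptions: 32-bit IEEE 754 floats, no special-case checks (like fastLog2, pass positive finite x only), memcpy instead of Float.floatToRawIntBits, and a hypothetical Log2Table class holding q and data instead of statics.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// C++ sketch of the populateLUT/fastLog2 scheme (assumes 32-bit IEEE 754 floats).
class Log2Table {
public:
    explicit Log2Table(int n) : q_(23 - n) {
        assert(n >= 0 && n <= 22);  // n = 23 would make the subnormal shift q - 1 negative
        data_.resize(std::size_t{1} << (n + 1));
        for (std::size_t i = 0; i < data_.size(); ++i) {
            // data[i] = log2(i << q) - 150; i * 2^q is exact in a double
            data_[i] = static_cast<float>(
                std::log2(std::ldexp(static_cast<double>(i), q_)) - 150.0);
        }
    }

    float log2(float x) const {
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);
        const std::uint32_t e = (bits >> 23) & 0xFF;  // biased exponent
        const std::uint32_t m = bits & 0x7FFFFF;      // mantissa field
        if (e == 0) return data_[m >> (q_ - 1)];      // subnormal (or zero): x = m * 2^-149
        return static_cast<float>(e) + data_[(m | 0x800000u) >> q_];  // normal
    }

private:
    int q_;                    // number of mantissa bits dropped (23 - n)
    std::vector<float> data_;  // 2^(n+1) entries
};
```

    Usage mirrors the Java example: build the table once with Log2Table t(12); and call t.log2(x). With n = 12 the table has 2^13 entries (32 KB).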
    
  • 2020-12-08 05:12

    (I haven't done any measurements, so this may not measure up, but I thought user9337139's idea was neat and wanted to try the same in C#; his version is in C++.)

    Here's a C# int Magnitude(byte) function based on converting the byte value to float and extracting the exponent from the IEEE float representation.

        using System;
        using System.Runtime.InteropServices;
    
        [StructLayout(LayoutKind.Explicit)]
        struct UnionWorker
        {
            [FieldOffset(0)]
            public int i;
            [FieldOffset(0)]
            public float f;
        }
    
        static int Magnitude(byte b)
        {
            UnionWorker u;
            u.i = 0; // just to please the compiler
            u.f = b;
            return Math.Max((u.i >> 23) & 0xFF, 126) - 126;
        }
    

    It returns 0 for 0 and 8 for 0xFF; in general it returns the number of significant bits, i.e. floor(log2(b)) + 1 for b > 0.

    Zero is a special case, so I needed the Math.Max clamp for that. I suspect user9337139's solution might have a similar problem.

    Note: this has not been tested for endianness issues, so caveat emptor.
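    The same exponent trick carries over to C++, with one change: reading a union member other than the one last written is undefined behaviour in C++ (C allows it), so std::memcpy is the usual portable replacement for the explicit-layout union. A sketch assuming 32-bit IEEE 754 floats (magnitude is a hypothetical name). Since the float and the integer are reinterpreted on the same machine, the exponent field lands in bits 23..30 as long as float and integer byte order agree, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Number of significant bits in b (0 for 0, 8 for 0xFF), read from the float exponent.
int magnitude(std::uint8_t b) {
    float f = b;                                    // exact: every byte value fits in 24 bits
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);            // replaces the C# explicit-layout union
    int e = static_cast<int>((bits >> 23) & 0xFF);  // biased exponent; 0 only when b == 0
    int result = e < 126 ? 0 : e - 126;             // same clamp as Math.Max(e, 126) - 126
    assert(result >= 0 && result <= 8);
    return result;
}
```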

  • 2020-12-08 05:13

    For more algorithms, look here: http://www.asmcommunity.net/forums/topic/?id=15010

    I also did some testing in C++, and my BSR implementation turned out slower than a lookup table:

    • I am using BDS2006, so the asm directive probably adds a slowdown from pushing/popping state
    • your lookup is fine, but I am using an 11-bit table instead of an 8-bit one
    • so the 32 bits are split into 3 branches instead of 4
    • and the table is still small enough to write out directly, without an init function

    code:

    //---------------------------------------------------------------------------
    DWORD log2_slow(const DWORD &x)
        {
        DWORD m,i;
        if (!x) return 0;
        if (x>=0x80000000) return 31;
        for (m=1,i=0;m<x;m<<=1,i++);
         if (m!=x) i--;
        return i;
        }
    //---------------------------------------------------------------------------
    DWORD log2_asm(const DWORD &x)
        {
        DWORD xx=x;
        asm {
            mov eax,xx
            bsr eax,eax;
            mov xx,eax;
            }
        return xx;
        }
    //---------------------------------------------------------------------------
    BYTE _log2[2048]=
        {
         0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
         7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
         8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
         8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
         9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
         9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
         9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
         9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
        };
    DWORD log2(const DWORD &x)
        {
             if (x>=0x00400000) return _log2[x>>22]+22;
        else if (x>=0x00000800) return _log2[x>>11]+11;
        else                    return _log2[x];
        }
    //---------------------------------------------------------------------------
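    If writing out 2048 entries by hand is inconvenient, the same 11-bit table can be generated from the recurrence floor(log2(i)) = floor(log2(i/2)) + 1. This is a sketch of that alternative (make_log2_table, log2_table and log2_lut11 are hypothetical names), and note that it reintroduces the one-time initialization step the answer avoided; the 3-branch lookup is unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build the same 11-bit table as the literal _log2[2048] (entries 0 and 1 are 0).
std::array<std::uint8_t, 2048> make_log2_table() {
    std::array<std::uint8_t, 2048> t{};  // zero-initialized
    for (std::uint32_t i = 2; i < t.size(); ++i) {
        t[i] = static_cast<std::uint8_t>(t[i / 2] + 1);  // floor(log2(i)) = floor(log2(i/2)) + 1
    }
    return t;
}

const std::array<std::uint8_t, 2048> log2_table = make_log2_table();

// Same 3-branch split as log2() above: bits 22..31, 11..21, 0..10.
std::uint32_t log2_lut11(std::uint32_t x) {
    if (x >= 0x00400000u) return log2_table[x >> 22] + 22u;
    if (x >= 0x00000800u) return log2_table[x >> 11] + 11u;
    return log2_table[x];  // log2(0) -> 0, as in the original
}
```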
    

    test code:

    DWORD x,j,i,n=256;
    tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2     (j<<i); tend(); mm_log->Lines->Add(tstr(1));
    tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_asm (j<<i); tend(); mm_log->Lines->Add(tstr(1));
    tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_slow(j<<i); tend(); mm_log->Lines->Add(tstr(1));
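    tbeg(), tend(), tstr() and mm_log are not defined in the answer (they look like the author's own timer and VCL memo helpers). A portable sketch of the same sweep using std::chrono, assuming a hypothetical time_sweep_ms wrapper and a shift-loop log2_loop in place of log2_slow:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Shift-loop floor(log2(x)), the same idea as log2_slow (returns 0 for x == 0).
std::uint32_t log2_loop(std::uint32_t x) {
    std::uint32_t i = 0;
    while (x >>= 1) ++i;
    return i;
}

// Hypothetical stand-in for tbeg()/tend(): times the answer's 32 x 256 sweep over j << i
// and returns milliseconds. Accumulating into checksum keeps the calls observable, so an
// optimizing compiler cannot discard them (plain x = ... stores may be dropped).
template <typename F>
double time_sweep_ms(F f, std::uint32_t& checksum) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < 32; ++i)
        for (std::uint32_t j = 0; j < 256; ++j)
            checksum += f(j << i);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    For example, std::uint32_t sum = 0; double ms = time_sweep_ms(log2_loop, sum);. A single sweep is only 8192 calls, so repeating it and taking the minimum or average gives steadier numbers.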
    

    My results on an AMD A8-5500 at 3.2 GHz:

    [   0.040 ms] log2     (x) - 11bit lookup table
    [   0.060 ms] log2_asm (x) - BSR
    [   0.415 ms] log2_slow(x) - shift loop
    

    Note:

    • log2(0) returns 0 because an unsigned DWORD result cannot represent -inf, which is the mathematically correct value
    • all other values are correct for all functions
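    On the zero case: Intel documents BSR's destination as undefined when the source operand is 0, so log2_asm(0) returning 0 depends on the CPU leaving eax unchanged. A sketch that makes the zero case explicit, assuming GCC or Clang (__builtin_clz is a compiler builtin whose result is likewise undefined for 0) and a 32-bit unsigned int; returning -1 as a stand-in for -inf is a choice made for this sketch:

```cpp
#include <cassert>
#include <cstdint>

// floor(log2(x)) via count-leading-zeros; returns -1 for x == 0 (stand-in for -inf).
int floor_log2_clz(std::uint32_t x) {
    if (x == 0) return -1;          // __builtin_clz(0) is undefined, so handle it first
    return 31 - __builtin_clz(x);   // assumes unsigned int is 32 bits wide
}
```

    In C++20, std::bit_width(x) - 1 from <bit> gives the same result portably (std::bit_width(0) is 0, so the expression yields -1 for 0 when computed in a signed type).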
  • 2020-12-08 05:13
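    #include <cstdint>
    #include <cstring>
    inline int fast_log2(double x)  // floor(log2(x)) for positive, normal, finite x
    {
        std::uint64_t bits;
        std::memcpy(&bits, &x, sizeof bits);  // avoids reinterpret_cast's strict-aliasing UB
        return static_cast<int>((bits >> 52) & 0x7FF) - 1023;  // mask drops the sign bit
    }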
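    fast_log2 reads only the exponent field, so for a subnormal double (exponent field 0) it returns -1023 regardless of the value. A sketch that extends the trick to subnormals by first scaling them into the normal range (floor_log2_any is a hypothetical name; IEEE 754 binary64 doubles are assumed). Multiplying by 2^54 is exact and sufficient because the smallest subnormal is 2^-1074.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// floor(log2(x)) for any positive finite double, including subnormals.
int floor_log2_any(double x) {
    assert(x > 0.0 && std::isfinite(x));
    int adjust = 0;
    if (x < std::numeric_limits<double>::min()) {  // below the smallest normal double
        x = std::ldexp(x, 54);                     // exact scaling into the normal range
        adjust = 54;
    }
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<int>((bits >> 52) & 0x7FF) - 1023 - adjust;
}
```

    For positive finite input this agrees with std::ilogb, which also treats subnormals as if normalized.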
    
  • 2020-12-08 05:14

    There are some integer algorithms here.

    In C#:

    public static uint FloorLog2(uint x)
    {
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
    
        return (uint)(NumBitsSet(x) - 1);
    }
    
    public static uint CeilingLog2(uint x)
    {
        int y = (int)(x & (x - 1));
    
        y |= -y;
        y >>= (WORDBITS - 1);
        x |= (x >> 1);
        x |= (x >> 2);
        x |= (x >> 4);
        x |= (x >> 8);
        x |= (x >> 16);
    
        return (uint)(NumBitsSet(x) - 1 - y);
    }
    
    public static int NumBitsSet(uint x)
    {
        x -= ((x >> 1) & 0x55555555);
        x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
        x = (((x >> 4) + x) & 0x0f0f0f0f);
        x += (x >> 8);
        x += (x >> 16);
    
        return (int)(x & 0x0000003f);
    }
    
    private const int WORDBITS = 32;
    

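    You should look at the original code on the site I linked for context, particularly what happens with Log2(0): here FloorLog2(0) computes NumBitsSet(0) - 1 = -1, which the (uint) cast turns into 0xFFFFFFFF unless the code is compiled in a checked context.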
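    The C# code works by smearing the highest set bit into every lower position and then counting bits. A C++ port of the same idea (function names are hypothetical); as choices made for this sketch, it returns a signed -1 for 0 rather than wrapping, and it replaces the sign-smearing trick on y with an explicit power-of-two test:

```cpp
#include <cassert>
#include <cstdint>

// Parallel bit count, same constants as NumBitsSet.
int num_bits_set(std::uint32_t x) {
    x -= (x >> 1) & 0x55555555u;
    x = ((x >> 2) & 0x33333333u) + (x & 0x33333333u);
    x = ((x >> 4) + x) & 0x0F0F0F0Fu;
    x += x >> 8;
    x += x >> 16;
    return static_cast<int>(x & 0x3Fu);
}

// Copy the highest set bit into every lower position.
std::uint32_t smear_right(std::uint32_t x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x;
}

int floor_log2_smear(std::uint32_t x) { return num_bits_set(smear_right(x)) - 1; }

// Round up unless x is a power of two (x & (x - 1) == 0).
int ceil_log2_smear(std::uint32_t x) {
    int not_pow2 = (x & (x - 1)) != 0 ? 1 : 0;
    return floor_log2_smear(x) + not_pow2;
}
```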

  • 2020-12-08 05:14
        static byte FloorLog2(UInt16 value)
        {
            for (byte i = 0; i < 15; ++i)
            {
                if ((value >>= 1) < 1)
                {
                    return i;
                }
            }
            return 15;
        }
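    The loop above runs once per bit, up to 15 iterations. A different technique, swapped in here rather than taken from the answer, is a branching binary search that needs at most 4 comparisons for a 16-bit value. A sketch (floor_log2_u16 is a hypothetical name) that keeps the loop's convention of returning 0 for 0:

```cpp
#include <cassert>
#include <cstdint>

// floor(log2(value)) by halving the search range: 8, 4, 2, 1 bits.
int floor_log2_u16(std::uint16_t value) {
    unsigned v = value;
    int r = 0;
    if (v >= 1u << 8) { v >>= 8; r += 8; }
    if (v >= 1u << 4) { v >>= 4; r += 4; }
    if (v >= 1u << 2) { v >>= 2; r += 2; }
    if (v >= 1u << 1) { r += 1; }
    return r;
}
```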
    