How can I bit-reflect a byte in Delphi?

问题

Is there an easy way to bit-reflect a byte variable in Delphi so that the most significant bit (MSB) gets the least significant bit (LSB) and vice versa?

回答1:

In code you can do it like this:

function ReverseBits(b: Byte): Byte;
var 
  i: Integer;
begin
  Result := 0;
  for i := 1 to 8 do
  begin
    Result := (Result shl 1) or (b and 1);
    b := b shr 1;
  end;
end;

But a lookup table would be much more efficient, and only consume 256 bytes of memory.

function ReverseBits(b: Byte): Byte; inline;
const
  Table: array [Byte] of Byte = (
    0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240,
    8,136,72,200,40,168,104,232,24,152,88,216,56,184,120,248,
    4,132,68,196,36,164,100,228,20,148,84,212,52,180,116,244,
    12,140,76,204,44,172,108,236,28,156,92,220,60,188,124,252,
    2,130,66,194,34,162,98,226,18,146,82,210,50,178,114,242,
    10,138,74,202,42,170,106,234,26,154,90,218,58,186,122,250,
    6,134,70,198,38,166,102,230,22,150,86,214,54,182,118,246,
    14,142,78,206,46,174,110,238,30,158,94,222,62,190,126,254,
    1,129,65,193,33,161,97,225,17,145,81,209,49,177,113,241,
    9,137,73,201,41,169,105,233,25,153,89,217,57,185,121,249,
    5,133,69,197,37,165,101,229,21,149,85,213,53,181,117,245,
    13,141,77,205,45,173,109,237,29,157,93,221,61,189,125,253,
    3,131,67,195,35,163,99,227,19,147,83,211,51,179,115,243,
    11,139,75,203,43,171,107,235,27,155,91,219,59,187,123,251,
    7,135,71,199,39,167,103,231,23,151,87,215,55,183,119,247,
    15,143,79,207,47,175,111,239,31,159,95,223,63,191,127,255
  );
begin
  Result := Table[b];
end;

This is more than 10 times faster than the version of the code that operates on individual bits.

Finally, I don't normally like to comment too negatively on accepted answers when I have a competing answer. In this case there are very serious problems with the answer that you accepted that I would like to state clearly for you and also for any future readers.

You accepted @Arioch's answer at the time when it contained the same Pascal code as can be seen in this answer, together with two assembler versions. It turns out that those assembler versions are much slower than the Pascal version. They are twice as slow as the Pascal code.

It is a common fallacy that converting high level code to assembler results in faster code. If you do it badly then you can easily produce code that runs more slowly than the code emitted by the compiler. There are times when it is worth writing code in assembler but you must not ever do so without proper benchmarking.

What is particularly egregious about the use of assembler here is that it is so obvious that the table based solution will be exceedingly fast. It's hard to imagine how that could be significantly improved upon.

回答2:

function BitFlip(B: Byte): Byte;
const
  N: array[0..15] of Byte = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
begin
  Result := N[B div 16] or N[B mod 16] shl 4;
end;

回答3:

function ByteReverseLoop(b: byte): byte;
var i: integer;
begin
  Result := 0; // actually not needed, just to make compiler happy

  for i := 1 to 8 do
  begin
    Result := Result shl 1; 
    if Odd(b) then Result := Result or 1;
    b := b shr 1;
  end;
end;

If speed is important, then you can use lookup table. You feel it once on program start and then you just take a value from table. Since you're only needing to map byte to byte, that would take 256x1=256 bytes of memory. And given recent Delphi versions support inline functions, that would provide for both speed, readability and reliability (incapsulating array lookup in the function you may be sure you would not change the values due to some typo)

Var ByteReverseLUT: array[byte] of byte;

function ByteReverse(b: byte): byte; inline;
begin Result := ByteReverseLUT[b] end;

{Unit/program initialization}
var b: byte;
    for b := Low(ByteReverseLUT) to High(ByteReverseLUT) 
        do ByteReverseLUT[b] := ByteReverseLoop(b);

Speed comparison of several implementations that were mentioned on this forum.
AMD Phenom2 x710 / Win7 x64 / Delphi XE2 32-bit {$O+}

Pascal AND original:       12494
Pascal AND reversed:       33459
Pascal IF original:        46829
Pascal IF reversed:        45585

       Asm SHIFT 1:        15802
       Asm SHIFT 2:        15490
       Asm SHIFT 3:        16212

         Asm AND 1:        19408
         Asm AND 2:        19601
         Asm AND 3:        19802

Pascal AND unrolled: 10052 Asm Shift unrolled: 4573 LUT, called: 3192 Pascal math, called: 4614

http://pastebin.ca/2304708

Note: LUT (lookup table) timings are probably rather optimistic here. Due to running in tight loop the whole table was sucked into L1 CPU cache. In real computations this function most probably would be called much less frequently and L1 cache would not keep the table entirely.

Pascal inlined function calls result are bogus - Delphi did not called them, detecting they had no side-effects. But funny - the timings were different.

      Asm Shift unrolled:         4040
             LUT, called:         3011
            LUT, inlined:          977
         Pascal unrolled:        10052
  Pas. unrolled, inlined:          849
     Pascal math, called:         4614
    Pascal math, inlined:         6517

And below the explanation:

Project1.dpr.427: d := BitFlipLUT(i)
0044AC45 8BC3             mov eax,ebx
0044AC47 E89CCAFFFF       call BitFlipLUT

Project1.dpr.435: d := BitFlipLUTi(i)

Project1.dpr.444: d := MirrorByte(i);
0044ACF8 8BC3             mov eax,ebx
0044ACFA E881C8FFFF       call MirrorByte

Project1.dpr.453: d := MirrorByteI(i);
0044AD55 8BC3             mov eax,ebx

Project1.dpr.460: d := MirrorByte7Op(i);
0044ADA3 8BC3             mov eax,ebx
0044ADA5 E8AEC7FFFF       call MirrorByte7Op

Project1.dpr.462: d := MirrorByte7OpI(i);
0044ADF1 0FB6C3           movzx eax,bl

All calls to inlined functions were eliminated. Yet about passing the parameters Delphi made three different decisions:

For the 1st call it eliminated parameter passing together with function call
For the 2nd call it kept parameter passing, despite function was not called
For the 3rd call it kept changed parameter passing, which proved longer then function call itself! Weird! :-)

回答4:

Using brute force can be simple and effective.

This routine is NOT on par with David's LUT solution.

Update

Added array of byte as input and result assigned to array of byte as well. This shows better performance for the LUT solution.

function MirrorByte(b : Byte) : Byte;  inline;
begin
  Result :=
    ((b and $01) shl 7) or
    ((b and $02) shl 5) or
    ((b and $04) shl 3) or
    ((b and $08) shl 1) or
    ((b and $10) shr 1) or
    ((b and $20) shr 3) or
    ((b and $40) shr 5) or
    ((b and $80) shr 7);
end;

Update 2

Googling a little, found BitReverseObvious.

function MirrorByte7Op(b : Byte) : Byte;  inline;
begin
  Result :=
    {$IFDEF WIN64}  // This is slightly better in x64 than the code in x32
    (((b * UInt64($80200802)) and UInt64($0884422110)) * UInt64($0101010101)) shr 32;
    {$ENDIF}
    {$IFDEF WIN32}
    ((b * $0802 and $22110) or (b * $8020 and $88440)) * $10101 shr 16;
    {$ENDIF}
end;

This one is closer to the LUT solution, even faster in one test.

To sum up, MirrorByte7Op() is 5-30% slower than LUT in 3 of the tests, 5% faster in one test.

Code to benchmark:

uses
  System.Diagnostics;

const 
  cBit : Byte = $AA;
  cLoopMax = 1000000000;
var
  sw : TStopWatch;
  arrB : array of byte;
  i : Integer;

begin
  SetLength(arrB,cLoopMax);
  for i := 0 TO Length(arrB) - 1 do
    arrB[i]:= System.Random(256);

  sw := TStopWatch.StartNew;
  for i := 0 to Pred(cLoopMax) do
  begin
    b := b;
  end;
  sw.Stop;
  WriteLn('Loop             ',b:3,' ',sw.ElapsedMilliSeconds);

  sw := TStopWatch.StartNew;
  for i := 0 to Pred(cLoopMax) do
  begin
    b := ReflectBits(arrB[i]); 
  end;
  sw.Stop;
  WriteLn('RB array in:     ',b:3,' ',sw.ElapsedMilliSeconds);

  sw := TStopWatch.StartNew;
  for i := 0 to Pred(cLoopMax) do
  begin
    b := MirrorByte(arrB[i]);
  end;
  sw.Stop;
  WriteLn('MB array in:     ',b:3,' ',sw.ElapsedMilliSeconds);

  sw := TStopWatch.StartNew;
  for i := 0 to Pred(cLoopMax) do
  begin
    b := MirrorByte7Op(arrB[i]);
  end;
  sw.Stop;
  WriteLn('MB7Op array in : ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);

  sw := TStopWatch.StartNew;
  for i := 0 to Pred(cLoopMax) do
  begin
    arrB[i] := ReflectBits(arrB[i]);
  end;
  sw.Stop;
  WriteLn('RB array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);

  sw := TStopWatch.StartNew;
  for i := 0 to Pred(cLoopMax) do
  begin
    arrB[i]:= MirrorByte(arrB[i]);
  end;
  sw.Stop;
  WriteLn('MB array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);

  sw := TStopWatch.StartNew;
  for i := 0 to Pred(cLoopMax) do
  begin
    arrB[i]:= MirrorByte7Op(arrB[i]);
  end;
  sw.Stop;
  WriteLn('MB7Op array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);

  ReadLn;

end.

Result of benchmark (XE3, i7 CPU 870):

                                32 bit     64 bit
--------------------------------------------------
Byte assignment (= empty loop)   599 ms    2117 ms
MirrorByte    to byte, array in 6991 ms    8746 ms
MirrorByte7Op to byte, array in 1384 ms    2510 ms
ReverseBits   to byte, array in  945 ms    2119 ms
--------------------------------------------------
ReverseBits   array in/out      1944 ms    3721 ms
MirrorByte7Op array in/out      1790 ms    3856 ms
BitFlipNibble array in/out      1995 ms    6730 ms
MirrorByte    array in/out      7157 ms    8894 ms
ByteReverse   array in/out     38246 ms   42303 ms

I added some of the other proposals in the last part of the table (all inlined). It is probably most fair to test in a loop with an array in and an array as result. ReverseBits (LUT) and MirrorByte7Op are comparable in speed followed by BitFlipNibble (LUT) which underperforms a bit in x64.

Note: I added a new algorithm for the x64 bit part of MirrorByte7Op. It makes better use of the 64 bit registers and has fewer instructions.

来源：https://stackoverflow.com/questions/14400845/how-can-i-bit-reflect-a-byte-in-delphi

标签

Delphi

byte

bit-manipulation