问题
Is there an easy way to bit-reflect a byte variable in Delphi so that the most significant bit (MSB) gets the least significant bit (LSB) and vice versa?
回答1:
In code you can do it like this:
function ReverseBits(b: Byte): Byte;
var
i: Integer;
begin
Result := 0;
for i := 1 to 8 do
begin
Result := (Result shl 1) or (b and 1);
b := b shr 1;
end;
end;
But a lookup table would be much more efficient, and only consume 256 bytes of memory.
function ReverseBits(b: Byte): Byte; inline;
const
Table: array [Byte] of Byte = (
0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240,
8,136,72,200,40,168,104,232,24,152,88,216,56,184,120,248,
4,132,68,196,36,164,100,228,20,148,84,212,52,180,116,244,
12,140,76,204,44,172,108,236,28,156,92,220,60,188,124,252,
2,130,66,194,34,162,98,226,18,146,82,210,50,178,114,242,
10,138,74,202,42,170,106,234,26,154,90,218,58,186,122,250,
6,134,70,198,38,166,102,230,22,150,86,214,54,182,118,246,
14,142,78,206,46,174,110,238,30,158,94,222,62,190,126,254,
1,129,65,193,33,161,97,225,17,145,81,209,49,177,113,241,
9,137,73,201,41,169,105,233,25,153,89,217,57,185,121,249,
5,133,69,197,37,165,101,229,21,149,85,213,53,181,117,245,
13,141,77,205,45,173,109,237,29,157,93,221,61,189,125,253,
3,131,67,195,35,163,99,227,19,147,83,211,51,179,115,243,
11,139,75,203,43,171,107,235,27,155,91,219,59,187,123,251,
7,135,71,199,39,167,103,231,23,151,87,215,55,183,119,247,
15,143,79,207,47,175,111,239,31,159,95,223,63,191,127,255
);
begin
Result := Table[b];
end;
This is more than 10 times faster than the version of the code that operates on individual bits.
Finally, I don't normally like to comment too negatively on accepted answers when I have a competing answer. In this case there are very serious problems with the answer that you accepted that I would like to state clearly for you and also for any future readers.
You accepted @Arioch's answer at the time when it contained the same Pascal code as can be seen in this answer, together with two assembler versions. It turns out that those assembler versions are much slower than the Pascal version. They are twice as slow as the Pascal code.
It is a common fallacy that converting high level code to assembler results in faster code. If you do it badly then you can easily produce code that runs more slowly than the code emitted by the compiler. There are times when it is worth writing code in assembler but you must not ever do so without proper benchmarking.
What is particularly egregious about the use of assembler here is that it is so obvious that the table based solution will be exceedingly fast. It's hard to imagine how that could be significantly improved upon.
回答2:
function BitFlip(B: Byte): Byte;
const
N: array[0..15] of Byte = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
begin
Result := N[B div 16] or N[B mod 16] shl 4;
end;
回答3:
function ByteReverseLoop(b: byte): byte;
var i: integer;
begin
Result := 0; // actually not needed, just to make compiler happy
for i := 1 to 8 do
begin
Result := Result shl 1;
if Odd(b) then Result := Result or 1;
b := b shr 1;
end;
end;
If speed is important, then you can use lookup table. You feel it once on program start and then you just take a value from table. Since you're only needing to map byte to byte, that would take 256x1=256 bytes of memory. And given recent Delphi versions support inline functions, that would provide for both speed, readability and reliability (incapsulating array lookup in the function you may be sure you would not change the values due to some typo)
Var ByteReverseLUT: array[byte] of byte;
function ByteReverse(b: byte): byte; inline;
begin Result := ByteReverseLUT[b] end;
{Unit/program initialization}
var b: byte;
for b := Low(ByteReverseLUT) to High(ByteReverseLUT)
do ByteReverseLUT[b] := ByteReverseLoop(b);
Speed comparison of several implementations that were mentioned on this forum.
AMD Phenom2 x710 / Win7 x64 / Delphi XE2 32-bit {$O+}
Pascal AND original: 12494
Pascal AND reversed: 33459
Pascal IF original: 46829
Pascal IF reversed: 45585
Asm SHIFT 1: 15802
Asm SHIFT 2: 15490
Asm SHIFT 3: 16212
Asm AND 1: 19408
Asm AND 2: 19601
Asm AND 3: 19802
Pascal AND unrolled: 10052 Asm Shift unrolled: 4573 LUT, called: 3192 Pascal math, called: 4614
http://pastebin.ca/2304708
Note: LUT (lookup table) timings are probably rather optimistic here. Due to running in tight loop the whole table was sucked into L1 CPU cache. In real computations this function most probably would be called much less frequently and L1 cache would not keep the table entirely.
Pascal inlined function calls result are bogus - Delphi did not called them, detecting they had no side-effects. But funny - the timings were different.
Asm Shift unrolled: 4040
LUT, called: 3011
LUT, inlined: 977
Pascal unrolled: 10052
Pas. unrolled, inlined: 849
Pascal math, called: 4614
Pascal math, inlined: 6517
And below the explanation:
Project1.dpr.427: d := BitFlipLUT(i)
0044AC45 8BC3 mov eax,ebx
0044AC47 E89CCAFFFF call BitFlipLUT
Project1.dpr.435: d := BitFlipLUTi(i)
Project1.dpr.444: d := MirrorByte(i);
0044ACF8 8BC3 mov eax,ebx
0044ACFA E881C8FFFF call MirrorByte
Project1.dpr.453: d := MirrorByteI(i);
0044AD55 8BC3 mov eax,ebx
Project1.dpr.460: d := MirrorByte7Op(i);
0044ADA3 8BC3 mov eax,ebx
0044ADA5 E8AEC7FFFF call MirrorByte7Op
Project1.dpr.462: d := MirrorByte7OpI(i);
0044ADF1 0FB6C3 movzx eax,bl
All calls to inlined functions were eliminated. Yet about passing the parameters Delphi made three different decisions:
- For the 1st call it eliminated parameter passing together with function call
- For the 2nd call it kept parameter passing, despite function was not called
- For the 3rd call it kept changed parameter passing, which proved longer then function call itself! Weird! :-)
回答4:
Using brute force can be simple and effective.
This routine is NOT on par with David's LUT solution.
Update
Added array of byte as input and result assigned to array of byte as well. This shows better performance for the LUT solution.
function MirrorByte(b : Byte) : Byte; inline;
begin
Result :=
((b and $01) shl 7) or
((b and $02) shl 5) or
((b and $04) shl 3) or
((b and $08) shl 1) or
((b and $10) shr 1) or
((b and $20) shr 3) or
((b and $40) shr 5) or
((b and $80) shr 7);
end;
Update 2
Googling a little, found BitReverseObvious.
function MirrorByte7Op(b : Byte) : Byte; inline;
begin
Result :=
{$IFDEF WIN64} // This is slightly better in x64 than the code in x32
(((b * UInt64($80200802)) and UInt64($0884422110)) * UInt64($0101010101)) shr 32;
{$ENDIF}
{$IFDEF WIN32}
((b * $0802 and $22110) or (b * $8020 and $88440)) * $10101 shr 16;
{$ENDIF}
end;
This one is closer to the LUT solution, even faster in one test.
To sum up, MirrorByte7Op() is 5-30% slower than LUT in 3 of the tests, 5% faster in one test.
Code to benchmark:
uses
System.Diagnostics;
const
cBit : Byte = $AA;
cLoopMax = 1000000000;
var
sw : TStopWatch;
arrB : array of byte;
i : Integer;
begin
SetLength(arrB,cLoopMax);
for i := 0 TO Length(arrB) - 1 do
arrB[i]:= System.Random(256);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := b;
end;
sw.Stop;
WriteLn('Loop ',b:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := ReflectBits(arrB[i]);
end;
sw.Stop;
WriteLn('RB array in: ',b:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := MirrorByte(arrB[i]);
end;
sw.Stop;
WriteLn('MB array in: ',b:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
b := MirrorByte7Op(arrB[i]);
end;
sw.Stop;
WriteLn('MB7Op array in : ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
arrB[i] := ReflectBits(arrB[i]);
end;
sw.Stop;
WriteLn('RB array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
arrB[i]:= MirrorByte(arrB[i]);
end;
sw.Stop;
WriteLn('MB array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);
sw := TStopWatch.StartNew;
for i := 0 to Pred(cLoopMax) do
begin
arrB[i]:= MirrorByte7Op(arrB[i]);
end;
sw.Stop;
WriteLn('MB7Op array in/out: ',arrB[0]:3,' ',sw.ElapsedMilliSeconds);
ReadLn;
end.
Result of benchmark (XE3, i7 CPU 870):
32 bit 64 bit
--------------------------------------------------
Byte assignment (= empty loop) 599 ms 2117 ms
MirrorByte to byte, array in 6991 ms 8746 ms
MirrorByte7Op to byte, array in 1384 ms 2510 ms
ReverseBits to byte, array in 945 ms 2119 ms
--------------------------------------------------
ReverseBits array in/out 1944 ms 3721 ms
MirrorByte7Op array in/out 1790 ms 3856 ms
BitFlipNibble array in/out 1995 ms 6730 ms
MirrorByte array in/out 7157 ms 8894 ms
ByteReverse array in/out 38246 ms 42303 ms
I added some of the other proposals in the last part of the table (all inlined). It is probably most fair to test in a loop with an array in and an array as result. ReverseBits
(LUT) and MirrorByte7Op
are comparable in speed followed by BitFlipNibble
(LUT) which underperforms a bit in x64.
Note: I added a new algorithm for the x64 bit part of MirrorByte7Op
. It makes better use of the 64 bit registers and has fewer instructions.
来源:https://stackoverflow.com/questions/14400845/how-can-i-bit-reflect-a-byte-in-delphi