How do I implement an efficient 32-bit DivMod in 64-bit code?

Submitted by 前提是你 on 2019-11-28 11:10:13
J...

For the special case of always dividing by 10 (per the comments), you can do something like this:

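procedure DivMod10(num: Cardinal; var q, r: Cardinal); inline;
var
  rl: UInt64;
begin
  // 3435973837 = $CCCCCCCD = ceil(2^35 / 10), so (num * 3435973837) shr 35 = num div 10
  rl := UInt64(3435973837) * num;
  q := rl shr 35;      // quotient
  r := num - q * 10;   // remainder recovered from the quotient, no second division
end;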

The magic constant and shift depend on the divisor. The code for deriving them for other divisors can be found in libdivide. This version has been tested to be exact for all unsigned 32-bit integers. It is about 3 times faster than using div, and it also provides the remainder.
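Where a constant like 3435973837 comes from can be sketched with the simple "round-up" method. It is written in C here only so the arithmetic is easy to check, and it is a simplified illustration, not libdivide's actual code.

The method has three steps:

1. Pick a shift p >= 32.
2. Take m = ceil(2^p / d).
3. Accept the pair when the rounding error e = m*d - 2^p is at most 2^(p-32).

That condition gives n*e < 2^p for every 32-bit n, which is enough for (n*m) shr p to equal n div d. The helper name find_magic_u32 is hypothetical. It only handles divisors whose multiplier fits in 32 bits. For example, 7 needs a 33-bit multiplier, which libdivide handles with a separate code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (simplified round-up method, not libdivide's code).
   Finds m and p such that (uint32_t)(((uint64_t)n * m) >> p) == n / d
   for every uint32_t n, with m = ceil(2^p / d) restricted to 32 bits.
   Returns false when no such 32-bit m exists (e.g. d == 7). */
static bool find_magic_u32(uint32_t d, uint32_t *magic, unsigned *shift)
{
    assert(d >= 2);  /* d == 1 needs no magic; d == 0 is undefined */
    for (unsigned p = 32; p < 64; ++p) {
        uint64_t pow2 = (uint64_t)1 << p;
        uint64_t m = pow2 / d + (pow2 % d != 0);  /* ceil(2^p / d) */
        if (m > UINT32_MAX)
            return false;                         /* m only grows with p */
        uint64_t e = m * d - pow2;                /* rounding error, 0 <= e < d */
        /* e <= 2^(p-32) implies n * e < 2^p for all n < 2^32,
           so the error can never push the result past the true quotient. */
        if (e <= ((uint64_t)1 << (p - 32))) {
            *magic = (uint32_t)m;
            *shift = p;
            return true;
        }
    }
    return false;
}
```

For d = 10 the search ends at p = 35 with m = 3435973837, which is the same constant and shift that DivMod10 uses.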

Benchmark (optimizations on):

  t0 := GetTickCount;
  for I := 1 to 999999999 do begin
    DivMod10(I, q, r);      // quotient and remainder
  end;
  ShowMessage(IntToStr(GetTickCount - t0));  // 1809 ms

  t0 := GetTickCount;
  for I := 1 to 999999999 do begin
    q := I div 10;          // quotient only
  end;
  ShowMessage(IntToStr(GetTickCount - t0));  // 5336 ms

Test (I, q and r are Cardinal):

for I := 0 to High(Cardinal) do begin
  DivMod10(I, q, r);
  if (q <> I div 10) or (r <> I mod 10) then
    WriteLn(I);
  // no mismatch found
end;
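If you want to repeat that exhaustive test outside Delphi, here is a C transliteration of DivMod10 plus a range checker. The names divmod10_u32 and check_divmod10 are hypothetical. The checker uses a 64-bit counter so that a range ending at UINT32_MAX terminates instead of wrapping around to 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* C transliteration of DivMod10: 0xCCCCCCCD == 3435973837 == ceil(2^35 / 10). */
static inline void divmod10_u32(uint32_t num, uint32_t *q, uint32_t *r)
{
    uint64_t prod = (uint64_t)num * 0xCCCCCCCDu;  /* 64-bit product, no overflow */
    *q = (uint32_t)(prod >> 35);                  /* quotient */
    *r = num - *q * 10u;                          /* remainder from the quotient */
}

/* Compares divmod10_u32 against the built-in / and % for every n in [lo, hi].
   The 64-bit counter lets hi == UINT32_MAX end the loop. */
static bool check_divmod10(uint32_t lo, uint32_t hi)
{
    for (uint64_t n = lo; n <= hi; ++n) {
        uint32_t q, r;
        divmod10_u32((uint32_t)n, &q, &r);
        if (q != (uint32_t)n / 10u || r != (uint32_t)n % 10u)
            return false;
    }
    return true;
}
```

Calling check_divmod10(0, UINT32_MAX) covers all 2^32 inputs.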

I dug a bit deeper. I think it would be perfectly reasonable to implement this on top of the UInt64 version. That would look like this:

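procedure DivMod(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal); overload;
var
  Quotient64, Remainder64: UInt64;
begin
  // Delegate to the UInt64 overload (which must also be declared with overload)
  DivMod(Dividend, Divisor, Quotient64, Remainder64);
  // Lossless narrowing: Quotient64 <= Dividend and Remainder64 < Divisor
  Quotient := Quotient64;
  Remainder := Remainder64;
end;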
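Here is the same layering as a C sketch. divmod_u64 stands in for the question's UInt64 routine, and both names are hypothetical. Narrowing back to 32 bits cannot lose information: for unsigned operands, the quotient is at most the dividend and the remainder is less than the divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an existing 64-bit DivMod (hypothetical name). */
static void divmod_u64(uint64_t dividend, uint64_t divisor,
                       uint64_t *quotient, uint64_t *remainder)
{
    assert(divisor != 0);
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
}

/* 32-bit DivMod layered on the 64-bit one. The casts are lossless:
   quotient <= dividend and remainder < divisor, so both fit in 32 bits. */
static void divmod_u32(uint32_t dividend, uint32_t divisor,
                       uint32_t *quotient, uint32_t *remainder)
{
    uint64_t q64, r64;
    divmod_u64(dividend, divisor, &q64, &r64);  /* operands widen implicitly */
    *quotient = (uint32_t)q64;
    *remainder = (uint32_t)r64;
}
```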

I don't think performance would suffer significantly compared with the most optimal asm version.

However, I believe the x64 asm code in the question is correct. The MOV instructions are all fine with 32-bit operands, and the DIV behaves as described in the comment in the asm code. The Intel documentation for DIV r/m32 says:

Unsigned divide EDX:EAX by r/m32, with result stored in EAX ← Quotient, EDX ← Remainder.
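To make those semantics concrete, here is a small C model of the instruction. x86_div32 is a hypothetical name, not a real API. It shows why a 32-bit DIV is normally preceded by xor edx,edx: the dividend is the 64-bit pair EDX:EAX, so a stale EDX changes the result or makes the quotient too big for EAX. The CPU signals that case, like a zero divisor, with a divide error (#DE).

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of x86 "DIV r/m32": the 64-bit dividend is EDX:EAX (EDX = high half).
   Returns false where the CPU would raise #DE: a zero divisor, or a quotient
   too large to fit in EAX. */
static bool x86_div32(uint32_t *eax, uint32_t *edx, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    if (divisor == 0)
        return false;
    uint64_t quotient = dividend / divisor;
    if (quotient > UINT32_MAX)
        return false;
    *eax = (uint32_t)quotient;                /* EAX <- quotient  */
    *edx = (uint32_t)(dividend % divisor);    /* EDX <- remainder */
    return true;
}
```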

Now let's look at what the Delphi compiler does with this code:

var
  a, b, c, d: Cardinal;
....
a := 666;
b := 42;
c := a div b;
d := a mod b;

The code that is produced is:

    
Project39.dpr.14: a := 666;
0000000000423A68 C7450C9A020000   mov [rbp+$0c],$0000029a
Project39.dpr.15: b := 42;
0000000000423A6F C745082A000000   mov [rbp+$08],$0000002a
Project39.dpr.16: c := a div b;
0000000000423A76 8B450C           mov eax,[rbp+$0c]
0000000000423A79 33D2             xor edx,edx
0000000000423A7B F77508           div dword ptr [rbp+$08]
0000000000423A7E 894504           mov [rbp+$04],eax
Project39.dpr.17: d := a mod b;
0000000000423A81 8B450C           mov eax,[rbp+$0c]
0000000000423A84 33D2             xor edx,edx
0000000000423A86 F77508           div dword ptr [rbp+$08]
0000000000423A89 895500           mov [rbp+$00],edx

I don't expect the 32-bit divide to be more efficient than a 64-bit divide, but that doesn't really matter. It seems more natural to perform a 32-bit operation with 32-bit operands.
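In the listing, c := a div b and d := a mod b each execute their own div instruction. Below is a C sketch of a DivMod that keeps 32-bit operands but divides only once. It recovers the remainder by multiply-and-subtract, the same way DivMod10 does. divmod_u32_onediv is a hypothetical name. An optimizing C compiler can often merge a / b and a % b into one DIV by itself, but that depends on the compiler and settings.

```c
#include <stdint.h>

/* 32-bit DivMod with 32-bit operands and a single division.
   The caller must pass a nonzero divisor. */
static void divmod_u32_onediv(uint32_t dividend, uint32_t divisor,
                              uint32_t *quotient, uint32_t *remainder)
{
    uint32_t q = dividend / divisor;
    *quotient = q;
    *remainder = dividend - q * divisor;  /* q * divisor <= dividend, no overflow */
}
```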
