问题
My processor, a small 16-bit microcontroller with no FPU and integer math only has 16/16 division and 32/16 division which both take 18 cycles. At the moment I'm using a very slow software routine (~7,500 cycles) to do 64/32 division. Is there any way to use these division engines to calculate a 64/32 division? Similar to how I'm already using the 16x16 multiplier and adder to calculate 32x32 multiplies? I'm using C but can work with any general explanation on how it can be done... I'm hoping to target <200 cycles (if it's at all possible.)
回答1:
See "Hacker's Delight", multiword division (pages 140-145).
The basic concept (going back to Knuth) is to think of your problem in base-65536 terms. Then you have a 4 digit by 2 digit division problem, with 2/1 digit division as a primitive.
The C code is here: https://github.com/hcs0/Hackers-Delight/blob/master/divmnu.c.txt
回答2:
My copy of Knuth (The Art of Computer Programming) is at work, so I can't check it until Monday, but that would be my first source. It has a whole section on arithmetic.
edit: your post about "16/16 division and 32/16 division which both take 18 cycles." -- dsPICs have a conditional subtract operation in assembly. Consider using this as your computational primitive.
Also note that if X = XH * 232 + XL and D = DH * 216 + DL, then if you are looking for
(Q,R) = X/D where X = Q * D + R
where Q = QH * 216 + QL, R = RH * 216 + RL, then
XH * 232 + XL = DH * QH * 232 + (DL * QH + DH * QL) * 216 + (DL * QL) + RH * 216 + RL
This suggests (by looking at terms that are the high 32 bits) to use the following procedure, akin to long division:
- (QH, R0) = XH / (DH+1) -> XH = QH * (DH+1) + R0 [32/16 divide]
- R1 = X - (QH * 216) * D [requires a 16*32 multiply, a shift-left by 16, and a 64-bit subtract]
- calculate R1' = R1 - D * 216
- while R1' >= 0, adjust QH upwards by 1, set R1 = R1', and goto step 3
- (QL, R2) = (R1 >> 16) / (DH+1) -> R1 = QL * (DH+1) + R2 [32/16 divide]
- R3 = R1 - (QL * D) [requires a 16*32 multiply and a 48-bit subtract]
- calculate R3' = R3 - D
- while R3' >= 0, adjust QL upwards by 1, set R3 = R3', and goto step 7
Your 32-bit quotient is the pair (QH,QL), and 32-bit remainder is R3.
(This assumes that the quotient is not larger than 32-bit, which you need to know ahead of time, and can easily check ahead of time.)
回答3:
Starting point would be: D. Knuth, The Art of Computer Programming Vol.2, Section 4.3.1, Algorithm D
But I suppose you may need to optimize the algorithm.
回答4:
You may want to look at Booth's Algorithm
(http://www.scribd.com/doc/3132888/Booths-Algorithm-Multiplication-Division).
The part you want is about 1/2 way down the page.
I haven't looked at this since my VLSI class, but, this may be your best bet, if possible you may want to do this in assembly, to optimize it as much as possible, if you will be calling this often.
Basically involves shifting and adding or subtracting.
来源:https://stackoverflow.com/questions/4771823/64-32-bit-division-on-a-processor-with-32-16-bit-division