Multiplying two 32-bit numbers on the 8086 microprocessor

没有蜡笔的小新 2021-01-23 06:37

I have a code example for multiplying two 16-bit numbers on the 8086, and I'm trying to update it to multiply two 32-bit numbers.

start:
 MOV AX,0002h ; 16 bit multipli         


        
  •  萌比男神i
    2021-01-23 07:28

    For the record, the 8086 has a mul instruction that makes this much easier (and more efficient on later CPUs with a fast multiplier). On the original 8086 mul was really slow, but running a multi-precision RCL shift loop 32 times is a lot worse on every CPU! The mul version also has smaller static code size, which is nice.
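    For comparison, here is a minimal Python sketch of what a 32-iteration shift-and-add multiply computes, one multiplier bit per iteration. On the 8086 each step is roughly a 32-bit shift done as SHL/RCL (or SHR/RCR) on two 16-bit halves plus a conditional ADD/ADC pair. The name `mul32_shift_add` is hypothetical, and this models the general technique, not the question's (truncated) code.

```python
MASK32 = 0xFFFFFFFF  # results wrap at 32 bits, like a DX:AX register pair


def mul32_shift_add(x, y):
    """Shift-and-add multiply: 32 iterations, one bit of y per iteration.

    Hypothetical model of an RCL-style multi-precision loop, not the
    question's code. Returns x * y modulo 2**32.
    """
    result = 0
    for _ in range(32):
        if y & 1:                             # low bit of the multiplier set?
            result = (result + x) & MASK32    # ADD low word / ADC high word
        x = (x << 1) & MASK32                 # 32-bit shift left: SHL lo, RCL hi
        y >>= 1                               # 32-bit shift right: SHR hi, RCR lo
    return result


print(mul32_shift_add(100000, 3000))  # 300000000
```

    Every iteration costs several instructions regardless of the operands, which is why three mul instructions win even on the original 8086.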

    You only need three mul instructions to get the low * low, low * high, and high * low products (and if you wanted the full 64-bit result, a fourth one for the high * high product).
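    The decomposition behind that count can be checked with a small Python sketch. Each input is split into 16-bit halves, and each 16x16 partial product fits in 32 bits, just like one DX:AX result of mul. The helper names `split16`, `mul32x32_to_64` and `mul32x32_to_32` are hypothetical, chosen for illustration.

```python
def split16(v):
    """Split a 32-bit value into its (low, high) 16-bit halves."""
    return v & 0xFFFF, (v >> 16) & 0xFFFF


def mul32x32_to_64(x, y):
    """Full 32x32 => 64-bit product from four 16x16 => 32-bit partial products."""
    x_lo, x_hi = split16(x)
    y_lo, y_hi = split16(y)
    lo_lo = x_lo * y_lo   # each partial product fits in 32 bits (one DX:AX)
    lo_hi = x_lo * y_hi
    hi_lo = x_hi * y_lo
    hi_hi = x_hi * y_hi   # only affects bits 32..63
    return lo_lo + ((lo_hi + hi_lo) << 16) + (hi_hi << 32)


def mul32x32_to_32(x, y):
    """Low 32 bits only: three partial products, high * high is never needed."""
    x_lo, x_hi = split16(x)
    y_lo, y_hi = split16(y)
    return (x_lo * y_lo + ((x_lo * y_hi + x_hi * y_lo) << 16)) & 0xFFFFFFFF


print(hex(mul32x32_to_64(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe00000001
```

    Dropping `hi_hi` changes nothing below bit 32, which is why the 32-bit version gets away with three multiplies.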

    The 8086 is missing the efficient two-operand imul reg, reg/mem form (added with the 386), which doesn't use DX:AX as an implicit output and simply discards the high half. (The low half of a product is the same for signed and unsigned inputs, so compilers use imul for unsigned math too.) So unfortunately we need more register shuffling than a compiler would for a 64x64 => 64-bit multiply in 32-bit mode, but otherwise this is exactly the same problem. (See https://godbolt.org/z/ozSkt_)

    x_lo, x_hi, y_lo, and y_hi can be memory operands relative to BP (locals or function args) or static labels. Or some of them could be in registers that this code doesn't use, if you change the syntax so they're not addressing modes.

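    ;; untested
    ;; inputs: uint32_t x, y in memory (x_hi:x_lo, y_hi:y_lo)
    ;; clobbers: CX, SI, DI (DX:AX receives the result)
    
        mov     ax, [y_lo]
        mov     cx, ax            ; keep a copy of y_lo in CX
        mul     word ptr [x_hi]   ; DX:AX = y_lo * x_hi
        mov     si, ax            ; save low 16 bits of y_lo * x_hi
    
        mov     ax, [x_lo]
        mov     di, ax            ; keep a copy of x_lo in DI
        mul     word ptr [y_hi]   ; DX:AX = x_lo * y_hi
        add     si, ax            ; SI = sum of the low halves of the cross products
    
        mov     ax, di
        mul     cx                ; DX:AX = y_lo * x_lo
        add     dx, si            ; add the cross products into the high half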
    ;; Result: uint32_t DX:AX = X * Y
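
    Since the asm is marked untested, one way to sanity-check the register sequence on a PC is a step-by-step Python model. It is a minimal sketch, assuming unsigned inputs. `mul16` is a hypothetical helper that mimics `mul r/m16` (DX:AX = AX * src), and each line mirrors one instruction, including the 16-bit wraparound of `add`.

```python
def mul16(a, b):
    """Model of 8086 `mul src16`: unsigned DX:AX = AX * src. Returns (dx, ax)."""
    p = (a & 0xFFFF) * (b & 0xFFFF)
    return p >> 16, p & 0xFFFF


def mul32_lo(x, y):
    """Instruction-by-instruction model of the asm above; returns DX:AX as one int."""
    x_lo, x_hi = x & 0xFFFF, (x >> 16) & 0xFFFF
    y_lo, y_hi = y & 0xFFFF, (y >> 16) & 0xFFFF

    ax = y_lo                      # mov ax, [y_lo]
    cx = ax                        # mov cx, ax
    dx, ax = mul16(ax, x_hi)       # mul word ptr [x_hi]
    si = ax                        # mov si, ax   (high half in DX is ignored)

    ax = x_lo                      # mov ax, [x_lo]
    di = ax                        # mov di, ax
    dx, ax = mul16(ax, y_hi)       # mul word ptr [y_hi]
    si = (si + ax) & 0xFFFF        # add si, ax   (carry-out discarded)

    ax = di                        # mov ax, di
    dx, ax = mul16(ax, cx)         # mul cx
    dx = (dx + si) & 0xFFFF        # add dx, si   (carry-out discarded)
    return (dx << 16) | ax


print(mul32_lo(100000, 3000))  # 300000000
```

    Comparing `mul32_lo(x, y)` against `(x * y) & 0xFFFFFFFF` for edge cases such as 0xFFFF, 0x10000 and 0xFFFFFFFF exercises the carries the asm relies on discarding.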
    

    To use fewer temporary registers, you could reload x_lo and y_lo from memory the second time each one is needed, instead of keeping copies in DI and CX.

    Note that we don't save the high-half DX results of either lo * hi product, because we only want a 32-bit result, not a full 32x32 => 64-bit result. The low 16 bits of those products add into the top half of our final 32-bit product. (And we don't need the carry-out from them into the top-most 16-bit word of a 64-bit result, so we can add them together before the last mul.)
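    The same argument written out as equations, with each 32-bit input split into 16-bit halves. Modulo $2^{32}$, the $x_{hi} y_{hi}$ term vanishes, and so does everything at or above bit 16 of the summed cross products, because it gets shifted to bit 32 or higher. That is why only the low halves of the cross products are kept, and why a single 16-bit add into DX is enough:

```latex
x = 2^{16} x_{hi} + x_{lo}, \qquad y = 2^{16} y_{hi} + y_{lo}

x y = 2^{32} x_{hi} y_{hi} + 2^{16} \left( x_{lo} y_{hi} + x_{hi} y_{lo} \right) + x_{lo} y_{lo}

x y \bmod 2^{32} = \left( x_{lo} y_{lo} + 2^{16} \left[ \left( x_{lo} y_{hi} + x_{hi} y_{lo} \right) \bmod 2^{16} \right] \right) \bmod 2^{32}
```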

    A 16 * 32 => 32-bit multiply would be even easier: just two mul and one add (plus a bunch of mov to get data into the right places). See, for example, a factorial loop that does this: "multiply two consecutive times in assembly language program". That answer also shows how extended-precision multiply math works: you add the partial-product terms the same way as in the paper-and-pencil algorithm for multiplying numbers with multiple decimal digits.
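    A minimal Python sketch of that 16 * 32 => 32 case, used in a factorial loop. The function names `mul16x32` and `factorial32` are hypothetical and this is not the linked answer's code. The first mul keeps its full 32-bit DX:AX result, the second contributes only its low half (AX), and one add folds it into the high word.

```python
MASK32 = 0xFFFFFFFF


def mul16x32(a16, x):
    """16-bit a16 times 32-bit x => low 32 bits, using two 16x16 => 32 products."""
    a16 &= 0xFFFF
    x_lo, x_hi = x & 0xFFFF, (x >> 16) & 0xFFFF
    lo = a16 * x_lo                       # first mul: full DX:AX result
    hi = (a16 * x_hi) & 0xFFFF            # second mul: only AX is needed
    return (lo + (hi << 16)) & MASK32     # one add of that AX into the high word (DX)


def factorial32(n):
    """n! modulo 2**32; each loop counter i is assumed to fit in 16 bits."""
    result = 1
    for i in range(2, n + 1):
        result = mul16x32(i, result)
    return result


print(factorial32(12))  # 479001600
```

    12! is the largest factorial that fits in 32 bits; `factorial32(13)` already wraps, since 13! = 6227020800 needs 33 bits.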
