How does one do integer (signed or unsigned) division on ARM?

长发绾君心 2020-12-06 00:33

I'm working on Cortex-A8 and Cortex-A9 in particular. I know that some architectures don't come with integer division, but what is the best way to do it other than convert

5 Answers
  • 2020-12-06 01:12

    I wrote the following functions for the ARM GNU assembler. If you don't have a CPU with udiv/sdiv machine support, just cut out the first few lines up to the "0:" label in either function.

    .arm
    .cpu    cortex-a7
    .syntax unified
    
    .type   udiv,%function
    .globl  udiv
    udiv:   tst     r1,r1
            bne     0f
            udiv    r3,r0,r2
            mls     r1,r2,r3,r0
            mov     r0,r3
            bx      lr
    0:      cmp     r1,r2
            movhs   r1,r2
            bxhs    lr
            mvn     r3,0
    1:      adds    r0,r0
            adcs    r1,r1
            cmpcc   r1,r2
            subcs   r1,r2
            orrcs   r0,1
            lsls    r3,1
            bne     1b
            bx      lr
    .size   udiv,.-udiv
    
    .type   sdiv,%function
    .globl  sdiv
    sdiv:   teq     r1,r0,ASR 31
            bne     0f
            sdiv    r3,r0,r2
            mls     r1,r2,r3,r0
            mov     r0,r3
            bx      lr
    0:      mov     r3,2
            adds    r0,r0
            and     r3,r3,r1,LSR 30
            adcs    r1,r1
            orr     r3,r3,r2,LSR 31
            movvs   r1,r2
            ldrvc   pc,[pc,r3,LSL 2]
            bx      lr
            .int    1f
            .int    3f
            .int    5f
            .int    11f
    1:      cmp     r1,r2
            movge   r1,r2
            bxge    lr
            mvn     r3,1
    2:      adds    r0,r0
            adcs    r1,r1
            cmpvc   r1,r2
            subge   r1,r2
            orrge   r0,1
            lsls    r3,1
            bne     2b
            bx      lr
    3:      cmn     r1,r2
            movge   r1,r2
            bxge    lr
            mvn     r3,1
    4:      adds    r0,r0
            adcs    r1,r1
            cmnvc   r1,r2
            addge   r1,r2
            orrge   r0,1
            lsls    r3,1
            bne     4b
            rsb     r0,0
            bx      lr
    5:      cmn     r1,r2
            blt     6f
            tsteq   r0,r0
            bne     7f
    6:      mov     r1,r2
            bx      lr
    7:      mvn     r3,1
    8:      adds    r0,r0
            adcs    r1,r1
            cmnvc   r1,r2
            blt     9f
            tsteq   r0,r3
            bne     10f
    9:      add     r1,r2
            orr     r0,1
    10:     lsls    r3,1
            bne     8b
            rsb     r0,0
            bx      lr
    11:     cmp     r1,r2
            blt     12f
            tsteq   r0,r0
            bne     13f
    12:     mov     r1,r2
            bx      lr
    13:     mvn     r3,1
    14:     adds    r0,r0
            adcs    r1,r1
            cmpvc   r1,r2
            blt     15f
            tsteq   r0,r3
            bne     16f
    15:     sub     r1,r2
            orr     r0,1
    16:     lsls    r3,1
            bne     14b
            bx      lr
    

    There are two functions: udiv for unsigned integer division and sdiv for signed integer division. Both expect a 64-bit dividend (unsigned or signed, respectively) in r1 (high word) and r0 (low word), and a 32-bit divisor in r2. They return the quotient in r0 and the remainder in r1, so you can declare them in a C header as extern functions returning a 64-bit integer and mask out the quotient and remainder afterwards.

    An error (division by 0 or overflow) is indicated by a remainder whose absolute value is greater than or equal to the absolute value of the divisor. The signed division algorithm distinguishes cases by the signs of the dividend and the divisor; it does not convert to positive integers first, since that wouldn't detect all overflow conditions properly.
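
    As a rough illustration of that calling contract, here is a portable C model of the unsigned case. It is not the assembly routine itself: `udiv_model`, `udiv_quot`, `udiv_rem` and `udiv_failed` are names made up for this sketch, and the packing (quotient in the low word, remainder in the high word) assumes the usual little-endian AAPCS return of a 64-bit value in r0/r1.

```c
#include <stdint.h>

/* Hypothetical C model of the udiv contract described above (not the asm). */
static uint64_t udiv_model(uint64_t dividend, uint32_t divisor)
{
    uint32_t hi = (uint32_t)(dividend >> 32);
    uint32_t lo = (uint32_t)dividend;

    /* The quotient would not fit in 32 bits, or the divisor is 0: report the
       error by returning a remainder that is not smaller than the divisor. */
    if (hi >= divisor)
        return ((uint64_t)divisor << 32) | lo;

    uint32_t quot = (uint32_t)(dividend / divisor);
    uint32_t rem  = (uint32_t)(dividend % divisor);
    return ((uint64_t)rem << 32) | quot;
}

/* Split the packed 64-bit return value into its two halves. */
static uint32_t udiv_quot(uint64_t packed) { return (uint32_t)packed; }
static uint32_t udiv_rem(uint64_t packed)  { return (uint32_t)(packed >> 32); }

/* Error test from the text: remainder >= divisor means "do not trust this". */
static int udiv_failed(uint64_t packed, uint32_t divisor)
{
    return udiv_rem(packed) >= divisor;
}
```

    With the real routine you would presumably declare something like `extern uint64_t udiv(uint64_t dividend, uint32_t divisor);` (an assumed prototype) and apply the same three helpers to its return value.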

  • 2020-12-06 01:15

    Division by a constant value is done quickly with a 64-bit multiply followed by a right shift, for example:

    LDR     R3, =0xA151C331
    UMULL   R3, R2, R1, R3
    MOV     R0, R2,LSR#10
    

    Here R1 is divided by 1625. The 64-bit register pair R2:R3 receives R1*0xA151C331, and the result is the upper 32 bits (R2) shifted right by 10:

    R1*0xA151C331/2^(32+10) = R1*0.00061538461545751488 = R1/1624.99999980
    

    You can calculate your own constants from this formula:

    x / N ==  (x*A)/2^(32+n)   -->       A = 2^(32+n)/N
    

    Select the largest n for which A < 2^32.
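
    The constant selection and the multiply-and-shift step can be sketched in C as follows. `magic_t`, `magic_for` and `div_by_magic` are hypothetical names for this example, and rounding A up to the next integer is an assumption of the sketch (it is what reproduces the 0xA151C331 constant used above).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: multiplier A and extra shift n for one divisor. */
typedef struct {
    uint32_t mul;    /* A */
    unsigned shift;  /* n */
} magic_t;

/* Pick the largest n for which A = ceil(2^(32+n) / d) still fits in 32 bits.
   Rounding up is an assumption of this sketch. */
static magic_t magic_for(uint32_t d)
{
    assert(d >= 2);  /* d == 0 is undefined, d == 1 has no 32-bit A */
    magic_t m = { 0, 0 };
    for (unsigned n = 0; n < 32; n++) {
        uint64_t a = ((1ULL << (32 + n)) + d - 1) / d;
        if (a >= (1ULL << 32))
            break;
        m.mul = (uint32_t)a;
        m.shift = n;
    }
    return m;
}

/* Same steps as the UMULL / MOV ..., LSR pair: keep the upper 32 bits of the
   64-bit product, then shift right by n. */
static uint32_t div_by_magic(uint32_t x, magic_t m)
{
    uint32_t upper = (uint32_t)(((uint64_t)x * m.mul) >> 32);
    return upper >> m.shift;
}
```

    For d = 1625 this yields A = 0xA151C331 and n = 10. A 32-bit multiplier is not exact for every divisor and every 32-bit input, so it is worth checking the result against the plain `/` operator over the range you care about.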

  • 2020-12-06 01:17

    The compiler normally includes a divide routine in its library (gcclib, for example). I have extracted the routines from gcc and use them directly:

    https://github.com/dwelch67/stm32vld/ then stm32f4d/adventure/gcclib

    Going to float and back is probably not the best solution. You can try it and see how fast it is. This example is a multiply, but it could just as easily be made a divide:

    https://github.com/dwelch67/stm32vld/ then stm32f4d/float01/vectors.s

    I didn't time it to see how fast or slow it is, though. Granted, I am using a Cortex-M above and you are talking about a Cortex-A, which are different ends of the spectrum, but the float instructions are similar and the gcc library code is similar. For the Cortex-M I have to build for Thumb, but you can just as easily build for ARM. Actually, with gcc it should all just work automagically; you should not need to do it the way I did it in the adventure game above. The same goes for other compilers.
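
    To make the "it should just work" point concrete, here is a minimal sketch in plain C. `udivmod32` and `udivmod32_t` are names invented for this example. On ARM EABI targets without a hardware divide instruction, GCC typically lowers the `/` and `%` below to calls into its runtime library (helpers such as `__aeabi_uidiv` and `__aeabi_uidivmod`), so no hand-written assembly is needed.

```c
#include <stdint.h>

typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod32_t;

/* Plain C division; the compiler chooses a hardware instruction or a
   library call depending on the target. The caller must ensure b != 0. */
static udivmod32_t udivmod32(uint32_t a, uint32_t b)
{
    udivmod32_t r = { a / b, a % b };
    return r;
}
```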

  • 2020-12-06 01:22

    Some copy-pasta from elsewhere for an integer divide: Basically, 3 instructions per bit. From this website, though I've seen it many other places as well. This site also has a nice version which may be faster in general.

    
    @ Entry  r0: numerator (lo) must be signed positive
    @        r2: denominator (den) must be non-zero and signed negative
    idiv:
            lo .req r0; hi .req r1; den .req r2
            mov hi, #0 @ hi = 0
            adds lo, lo, lo
            .rept 32 @ repeat 32 times
              adcs hi, den, hi, lsl #1
              subcc hi, hi, den
              adcs lo, lo, lo
            .endr
            mov pc, lr @ return
    @ Exit   r0: quotient (lo)
    @        r1: remainder (hi)
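
    For readers who find the flag juggling hard to follow, the loop can be modelled in C roughly like this. `idiv_model` and `idiv_result_t` are hypothetical names, the carry flag is simulated with a 64-bit sum, and the asserts restate the entry conditions from the comments (the caller of the assembly passes the negated denominator in r2; the model negates it internally).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t quot;
    uint32_t rem;
} idiv_result_t;

/* C model of the 32-step loop above. Preconditions mirror the asm comments:
   num is a non-negative int32 value, den is in 1..2^31 (so -den is negative). */
static idiv_result_t idiv_model(uint32_t num, uint32_t den)
{
    assert(num <= INT32_MAX);
    assert(den >= 1 && den <= 0x80000000u);

    uint32_t neg_den = 0u - den;  /* the value the asm expects in r2 */
    uint32_t hi = 0;              /* running remainder */
    uint32_t lo = num;            /* numerator bits out, quotient bits in */

    for (int i = 0; i < 32; i++) {
        /* adcs hi, den, hi, lsl #1: shift in the next numerator bit and
           try the subtraction by adding the negative denominator. */
        uint64_t sum = (uint64_t)((hi << 1) | (lo >> 31)) + neg_den;
        uint32_t carry = (uint32_t)(sum >> 32);  /* 1 if the subtraction fit */
        hi = (uint32_t)sum;

        /* subcc hi, hi, den: undo the subtraction when it did not fit. */
        if (!carry)
            hi -= neg_den;

        /* adcs lo, lo, lo: the carry becomes the next quotient bit. */
        lo = (lo << 1) | carry;
    }

    idiv_result_t r = { lo, hi };
    return r;
}
```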
    
  • 2020-12-06 01:34

    I wrote my own routine to perform an unsigned division, as I could not find an unsigned version on the web. I needed to divide a 64-bit value by a 32-bit value to get a 32-bit result.

    The inner loop is not as efficient as the signed solution provided above, but this does support unsigned arithmetic. This routine performs a 32 bit division if the high part of the numerator (hi) is smaller than the denominator (den), otherwise a full 64 bit division is performed (hi:lo/den). The result is in lo.

      cmp     hi, den                   // if hi < den (unsigned) do 32 bits, else 64 bits
      bhs     do64bits                  // unsigned >= (bpl would test the sign bit instead)
      REPT    32
        adds    lo, lo, lo              // shift numerator through carry
        adcs    hi, hi, hi
        subscc  work, hi, den           // if carry not set, compare        
        subcs   hi, hi, den             // if carry set, subtract
        addcs   lo, lo, #1              // if carry set, add 1 to quotient
      ENDR
    
      mov     r0, lo                    // move result into R0
      mov     pc, lr                    // return
    
    do64bits:
      mov     top, #0
      REPT    64
        adds    lo, lo, lo              // shift numerator through carry
        adcs    hi, hi, hi
        adcs    top, top, top
        subscc  work, top, den          // if carry not set, compare        
        subcs   top, top, den           // if carry set, subtract
        addcs   lo, lo, #1              // if carry set, add 1 to quotient
      ENDR
      mov     r0, lo                    // move result into R0
      mov     pc, lr                    // return
    

    Extra checking for boundary conditions and power of 2 can be added. Full details can be found at http://www.idwiz.co.za/Tips%20and%20Tricks/Divide.htm
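
    A C model can be handy for cross-checking a routine like this on a host machine. The sketch below follows the do64bits path; `udiv64_by_32` and `udiv64_result_t` are hypothetical names, and unlike the assembly it keeps all 64 quotient bits. The `carry` variable stands in for the bit that the shift pushes out of `top`, which is the case where the assembly subtracts without comparing first.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t quot;
    uint32_t rem;
} udiv64_result_t;

/* Bit-by-bit model of the do64bits path; `top` plays the same role as the
   `top` register in the assembly. */
static udiv64_result_t udiv64_by_32(uint64_t num, uint32_t den)
{
    assert(den != 0);

    uint32_t top = 0;
    uint64_t quot = 0;

    for (int i = 0; i < 64; i++) {
        uint32_t carry = top >> 31;                /* bit shifted out of top */
        top = (top << 1) | (uint32_t)(num >> 63);  /* next numerator bit in  */
        num <<= 1;
        quot <<= 1;

        /* Subtract when the 33-bit value carry:top is at least den. */
        if (carry || top >= den) {
            top -= den;  /* wraps modulo 2^32, which gives the right result */
            quot |= 1;
        }
    }

    udiv64_result_t r = { quot, top };
    return r;
}
```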
