Creating an x86 assembler program that converts an integer to a 16-bit binary string of 0's and 1's

谎友^ 2020-12-12 04:15

As the question suggests, I have to write a MASM program to convert an integer to binary. I have tried many different approaches, but none of them helped me at all. The fina

3 answers

时光取名叫无心 (original poster)

2020-12-12 04:48
For the decimal-string -> integer part, see NASM Assembly convert input to integer?
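
For readers who only want the idea behind that first step, here is a minimal C sketch of decimal-string -> integer. The name `parse_u16` and its error convention are made up for illustration and are not taken from the linked answer; it just applies `total = total*10 + digit` once per character.

```c
#include <stdint.h>

/* Hypothetical helper (not from the linked answer): parse an unsigned
   decimal string into a 16-bit value. Parsing stops at the first
   non-digit. Returns 0 on success, -1 if there are no digits or the
   value does not fit in 16 bits. */
int parse_u16(const char *s, uint16_t *out)
{
    uint32_t total = 0;
    int ndigits = 0;

    while (*s >= '0' && *s <= '9') {
        total = total * 10 + (uint32_t)(*s - '0');  /* shift in one decimal digit */
        if (total > 0xFFFF)
            return -1;                              /* larger than 65535 */
        s++;
        ndigits++;
    }
    if (ndigits == 0)
        return -1;                                  /* empty or non-numeric input */

    *out = (uint16_t)total;
    return 0;
}
```

The 32-bit accumulator cannot overflow here, because it is checked against 65535 after every digit.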

You can do the whole thing without any loops, using SIMD to do all the bits in parallel.

(Or for only 8 bits -> 8 bytes in 32-bit mode, just a mul into EDX:EAX, 2x and, 2x shr: see How to create a byte out of 8 bool values (and vice versa)?.
Then convert to ASCII with 2x add reg, '0000' and 2x bswap to get MSB-first printing order. That's a lot of instructions in total, so SSE2 may be better in 32-bit mode. If you don't have SSE2, this is certainly better than looping over every bit. 64-bit mode can handle all 8 bytes in one register, and SSE2 is guaranteed to be available.)
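
A rough C sketch of the one-register 64-bit variant of that multiply trick, for illustration only. The function name `byte_to_bin` is hypothetical, and the code assumes a little-endian host (true for x86). With this particular multiplier, the bytes already come out MSB-first in memory, so this sketch needs no byte swap, unlike the 32-bit EDX:EAX sequence described above.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: 8 bits -> 8 ASCII digits with one multiply, no per-bit loop.
   Assumes a little-endian host; buf must have room for 9 bytes. */
char *byte_to_bin(uint8_t b, char buf[9])
{
    const uint64_t MAGIC = 0x8040201008040201ULL; /* places copies of b at bit offsets 0, 9, 18, ..., 63 */
    const uint64_t MASK  = 0x8080808080808080ULL; /* top bit of every byte */

    /* After the multiply, the top bit of byte j holds bit (7 - j) of b. */
    uint64_t bits = ((MAGIC * b) & MASK) >> 7;    /* every byte is now 0 or 1 */
    uint64_t text = bits + 0x3030303030303030ULL; /* add '0' to every byte */

    memcpy(buf, &text, 8);  /* little-endian: byte 0 (the MSB of b) lands first */
    buf[8] = '\0';
    return buf;
}
```

The copies of `b` are spaced 9 bits apart, so they never overlap and no carries cross between bytes.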

Also related:
- int -> hex string including SIMD.
- int -> decimal string (or other non-power-of-2 bases)
- How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD: a neat SIMD version that's efficient when you want to stop with 0/1 integers instead of ASCII digits. It's specifically optimized for 8 bits -> 8 bytes.

The integer -> base 2 string part is much simpler, and can be done efficiently using only SSE2.

This uses the same technique as Evgeny Kluev's answer on a question about doing the inverse of PMOVMSKB, to turn a bit-pattern into a vector of 0 / -1 elements: broadcast-shuffle the input bytes so every vector element contains the bit you want (plus neighbours). AND that with a mask so each element holds either zero or its one selected bit, then compare against an all-zero vector.
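
To make those steps concrete, here is a scalar C model of what each byte lane computes. This is only a sketch for reasoning about the technique, not the vector code itself. The name `number_to_bin_model` is made up, and each loop iteration stands in for one of the 16 lanes that the SIMD version handles at once.

```c
#include <stdint.h>

/* Scalar model of the vector steps: one loop iteration per byte lane. */
char *number_to_bin_model(uint16_t num, char buf[17])
{
    uint8_t hi = (uint8_t)(num >> 8);
    uint8_t lo = (uint8_t)(num & 0xFF);

    for (int lane = 0; lane < 16; lane++) {
        uint8_t src  = (lane < 8) ? hi : lo;           /* broadcast: high byte to lanes 0-7, low byte to 8-15 */
        uint8_t mask = (uint8_t)(0x80u >> (lane & 7)); /* 1<<7 down to 1<<0, repeated twice */
        uint8_t bit  = src & mask;                     /* AND: zero, or a single set bit */
        uint8_t cmp  = (bit == 0) ? 0xFF : 0x00;       /* compare with zero: -1 where the bit was clear */
        buf[lane] = (char)(uint8_t)('1' + cmp);        /* '1' + (-1 or 0), wrapping like a byte add */
    }
    buf[16] = '\0';
    return buf;
}
```

Note that the compare inverts the sense (a clear bit gives -1), which is why the final add starts from '1' rather than '0'.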

This version only requires SSE2, so it works on every CPU that can run a 64-bit OS, and some 32-bit-only CPUs (like early Pentium4 and Pentium-M). It can go faster with SSSE3 (one PSHUFB instead of three shuffles to get the low and high bytes where we want them). You could do 8 bits -> 8 bytes at a time with MMX.

I'm not going to try to convert it from NASM to MASM syntax. I have actually tested this, and it works. The x86 32-bit System V calling convention doesn't differ from the 32-bit Windows cdecl calling convention in any way that affects this code, AFAIK.
```
;;; Tested and works

;; nasm syntax, 32-bit System V (or Windows cdecl) calling convention:
;;;; char *numberToBin(uint16_t num, char buf[17]);
;; returns buf.

ALIGN 16
global numberToBin
numberToBin:
        movd    xmm0, [esp+4]       ; 32-bit load even though we only care about the low 16 bits.
        mov     eax, [esp+8]        ; buffer pointer

        ; to print left-to-right, we need the high bit to go in the first (low) byte
        punpcklbw xmm0, xmm0              ; llhh      (from low to high byte elements)
        pshuflw   xmm0, xmm0, 00000101b   ; hhhhllll
        punpckldq xmm0, xmm0              ; hhhhhhhhllllllll

        ; or with SSSE3:
        ; pshufb  xmm0, [shuf_broadcast_hi_lo]  ; SSSE3

        pand    xmm0, [bitmask]     ; each input bit is now isolated within the corresponding output byte
        ; compare it against zero
        pxor    xmm1,xmm1
        pcmpeqb xmm0, xmm1          ; -1 in elements that are 0,   0 in elements with any non-zero bit.

        paddb   xmm0, [ascii_ones]  ; '1' +  (-1 or 0) = '0' or '1'

        mov     byte [eax+16], 0    ; terminating zero
        movups  [eax], xmm0
        ret


section .rodata
ALIGN 16

;; only used for SSSE3
shuf_broadcast_hi_lo:
        db 1,1,1,1, 1,1,1,1     ; broadcast the second 8 bits to the first 8 bytes
        db 0,0,0,0, 0,0,0,0     ; broadcast the first 8 bits to the second 8 bytes

bitmask:  ; select the relevant bit within each byte, from high to low for printing
        db 1<<7,  1<<6, 1<<5, 1<<4
        db 1<<3,  1<<2, 1<<1, 1<<0
        db 1<<7,  1<<6, 1<<5, 1<<4
        db 1<<3,  1<<2, 1<<1, 1<<0

ascii_ones:
        times 16 db '1'
```
Using PSHUFLW to do the reversal in the second shuffle step is faster on old CPUs (first-gen Core2 and older) that have slow 128-bit shuffles, because shuffling only the low 64 bits is fast (compared to using PUNPCKLWD / PSHUFD). See Agner Fog's Optimizing Assembly guide to learn more about writing efficient asm, and other links in the x86 tag wiki.

(Thanks to clang for spotting the possibility).

If you were using this in a loop, you'd load the vector constants into vector registers instead of re-loading them every time.
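
As an illustration of that, here is a C sketch using SSE2 intrinsics that mirrors the asm shuffle sequence and sets up the constants once, before the loop. The batch function `numbers_to_bin` and its signature are hypothetical. The scalar fallback is only there so the sketch still builds on targets without SSE2.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical batch helper: out[i] receives 16 digits plus a terminating
   zero for nums[i]. The vector constants live outside the loop. */
void numbers_to_bin(const uint16_t *nums, size_t n, char (*out)[17])
{
    const __m128i bitmask = _mm_setr_epi8((char)0x80, 64, 32, 16, 8, 4, 2, 1,
                                          (char)0x80, 64, 32, 16, 8, 4, 2, 1);
    const __m128i ones = _mm_set1_epi8('1');
    const __m128i zero = _mm_setzero_si128();

    for (size_t i = 0; i < n; i++) {
        __m128i v = _mm_cvtsi32_si128(nums[i]);  /* like the movd load */
        v = _mm_unpacklo_epi8(v, v);             /* punpcklbw: llhh */
        v = _mm_shufflelo_epi16(v, 0x05);        /* pshuflw:   hhhhllll */
        v = _mm_unpacklo_epi32(v, v);            /* punpckldq: hhhhhhhhllllllll */
        v = _mm_and_si128(v, bitmask);           /* isolate one bit per byte */
        v = _mm_cmpeq_epi8(v, zero);             /* -1 where the bit was clear */
        v = _mm_add_epi8(v, ones);               /* '1' + (-1 or 0) */
        _mm_storeu_si128((__m128i *)out[i], v);  /* unaligned 16-byte store */
        out[i][16] = '\0';
    }
}
#else
/* Portable fallback so the sketch still builds without SSE2. */
void numbers_to_bin(const uint16_t *nums, size_t n, char (*out)[17])
{
    for (size_t i = 0; i < n; i++) {
        for (int bit = 0; bit < 16; bit++)
            out[i][bit] = (char)('0' + ((nums[i] >> (15 - bit)) & 1));
        out[i][16] = '\0';
    }
}
#endif
```

In hand-written asm the equivalent would be loading `bitmask` and `ascii_ones` into spare XMM registers before the loop and using register operands for PAND / PADDB inside it.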

From asm, you can call it like this:
```
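    sub     esp, 32       ; reserve stack space for buf[] (17 bytes are needed)

    push    esp           ; arg 2: buf[] on the stack (pushes the value ESP had before this push)
    push    0xfba9        ; arg 1: use a constant num for example
    call    numberToBin
    add     esp, 8        ; pop the two args (the caller cleans up in cdecl)
    ;; esp is now pointing at the string (add esp, 32 when you're done with it)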
```
Or call it from C or C++ with the prototype from comments in the asm.
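
If you do call it from C, a plain-loop reference with the same signature can be handy for cross-checking the asm output. The sketch below is not part of the tested asm above. The name `numberToBin_ref` is invented so that it cannot clash with the real `numberToBin` symbol at link time.

```c
#include <stdint.h>

/* Hypothetical reference version with the same signature as the asm
   routine's documented prototype; a simple loop, meant only for
   comparing results, not for speed. */
char *numberToBin_ref(uint16_t num, char buf[17])
{
    for (int i = 0; i < 16; i++)
        buf[i] = (num & (0x8000u >> i)) ? '1' : '0';  /* most significant bit first */
    buf[16] = '\0';
    return buf;
}
```

With the assembled object linked in and `char *numberToBin(uint16_t num, char buf[17]);` declared, you could compare the two results with `strcmp` over all 65536 inputs. The asm as written reads its arguments from the stack, so the C side would need to be built as 32-bit code.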