As the question suggests, I have to write a MASM program to convert an integer to binary. I have tried many different approaches, but none of them helped me at all. The fina
For the decimal-string -> integer part, see NASM Assembly convert input to integer?
You can do the whole thing without any loops, using SIMD to do all the bits in parallel.
(Or for only 8 bits -> 8 bytes in 32-bit mode, just a mul into EDX:EAX, 2x and, 2x shr: see How to create a byte out of 8 bool values (and vice versa)? Then convert to ASCII with 2x add reg, '0000' and 2x bswap into MSB-first printing order. That's a lot of instructions in total, so SSE2 may be better in 32-bit mode, but if you don't have SSE2 this is certainly better than looping over every bit. 64-bit mode can handle all 8 bytes in one register, and SSE2 is guaranteed to be available there.)
Also related:
The integer -> base 2 string part is much simpler, and can be done efficiently using only SSE2.
This uses the same technique as Evgeny Kluev's answer on a question about doing the inverse of PMOVMSKB, to turn a bit-pattern into a vector of 0 / -1 elements: broadcast-shuffle the input bytes so every vector element contains the bit you want (plus neighbours). AND that with a mask to isolate one bit per element (leaving zero or non-zero), then compare against an all-zero vector.
This version only requires SSE2, so it works on every CPU that can run a 64-bit OS, and on some 32-bit-only CPUs (like early Pentium 4 and Pentium M). It can go faster with SSSE3 (one PSHUFB instead of three shuffles to get the low and high bytes where we want them). You could do 8 bits -> 8 bytes at a time with MMX.
I'm not going to try to convert it from NASM to MASM syntax. I have actually tested this, and it works. The x86 32-bit System V calling convention doesn't differ from the 32-bit Windows cdecl calling convention in any way that affects this code, AFAIK.
;;; Tested and works
;; nasm syntax, 32-bit System V (or Windows cdecl) calling convention:
;;;; char *numberToBin(uint16_t num, char buf[17]);
;; returns buf.
section .text
ALIGN 16
global numberToBin
numberToBin:
movd xmm0, [esp+4] ; 32-bit load even though we only care about the low 16 bits.
mov eax, [esp+8] ; buffer pointer
; to print left-to-right, we need the high bit to go in the first (low) byte
punpcklbw xmm0, xmm0 ; llhh (from low to high byte elements)
pshuflw xmm0, xmm0, 00000101b ; hhhhllll
punpckldq xmm0, xmm0 ; hhhhhhhhllllllll
; or, with SSSE3, replace the three shuffles above with:
; pshufb xmm0, [shuf_broadcast_hi_lo] ; SSSE3
pand xmm0, [bitmask] ; each input bit is now isolated within the corresponding output byte
; compare it against zero
pxor xmm1,xmm1
pcmpeqb xmm0, xmm1 ; -1 in elements that are 0, 0 in elements with any non-zero bit.
paddb xmm0, [ascii_ones] ; '1' + (-1 or 0) = '0' or '1'
mov byte [eax+16], 0 ; terminating zero
movups [eax], xmm0
ret
section .rodata
ALIGN 16
;; only used for SSSE3
shuf_broadcast_hi_lo:
db 1,1,1,1, 1,1,1,1 ; broadcast the second 8 bits to the first 8 bytes
db 0,0,0,0, 0,0,0,0 ; broadcast the first 8 bits to the second 8 bytes
bitmask: ; select the relevant bit within each byte, from high to low for printing
db 1<<7, 1<<6, 1<<5, 1<<4
db 1<<3, 1<<2, 1<<1, 1<<0
db 1<<7, 1<<6, 1<<5, 1<<4
db 1<<3, 1<<2, 1<<1, 1<<0
ascii_ones:
times 16 db '1'
Using PSHUFLW to do the reversal in the second shuffle step is faster on old CPUs (first-gen Core2 and older) that have slow 128-bit shuffles, because shuffling only the low 64 bits is fast, compared to using PUNPCKLWD / PSHUFD. (Thanks to clang for spotting the possibility.) See Agner Fog's Optimizing Assembly guide to learn more about writing efficient asm, and other links in the x86 tag wiki.
If you were using this in a loop, you'd load the vector constants into vector registers instead of re-loading them every time.
From asm, you can call it like this:
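sub esp, 32 ; reserve space for buf[17] (rounded up)
push esp ; buf[] on the stack (PUSH ESP stores the value ESP had before the push)
push 0xfba9 ; use a constant num for example
call numberToBin
add esp, 8 ; cdecl: the caller pops the two args
;; esp (and eax, the return value) now point at the string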
Or call it from C or C++ with the prototype from comments in the asm.