Question
As the question suggests, I have to write a MASM program to convert an integer to binary. I have tried many different approaches, but none of them helped me at all. The final code I'm working on is as follows. I get a memory access violation error when I debug my code in Visual Studio.
Any help on how to solve the error, and on whether I'm on the right track or not, will be greatly appreciated. The first code is my C++ code, which passes a char array to an .asm file to be converted to binary.
#include <iostream>
using namespace std;

extern "C"
{
    int intToBin(char*);
}

int main()
{
    char str[17] = { NULL };
    for (int i = 0; i < 16; i++)
    {
        str[i] = '0';
    }
    cout << "Please enter an integer number :";
    cin >> str;
    intToBin(str);
    cout << " the equivalent binary is: " << str << endl;
    return 0;
}
and the .asm file is the following:
.686
.model small
.code
_intToBin PROC           ; name of function
start:
    push ebp             ; save base pointer
    mov ebp, esp         ; establish stack frame
    mov eax, [ebp+8]     ; storing char value into eax
    mov ebx, [ebp+12]    ; address offset of char array
    mov edx, 32768       ; storing max 16-bit binary in edx
    mov ecx, 17          ; since it's 16 bits, we do the loop 17 times
nextBite:
    test eax, edx        ; testing if eax is equal to edx
    jz storeZero         ; if it is, 0 is to be moved into bl
    mov bl, '1'          ; if not, 1 is moved into bl
    jmp storeAscBit      ; then jump to store ascii bit
storeZero:
    mov bl, '0'          ; moving 0 into bl register
storeAscBit:
    mov [di], bl         ; moving bl (either 1 or 0) into [di]
    inc edx              ; increasing edx by 1 to point to the next bit
    shr edx, 1           ; shifting right 1 time so the 1 comes to second
    loop nextBite        ; do the whole step again
EndifReach:
    pop ebp
_intToBin ENDP
END
Answer 1:
Next is an example of using "atoi" to convert the string to a number, then using assembly to convert the number to binary:
#include "stdafx.h"
#include <iostream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{ char str[6]; // ◄■■ NUMBER IN STRING FORMAT.
int num; // ◄■■ NUMBER IN NUMERIC FORMAT.
char bin[33] = " "; // ◄■■ BUFFER FOR ONES AND ZEROES.
cout << "Enter a number: ";
cin >> str; // ◄■■ CAPTURE NUMBER AS STRING.
num = atoi(str); // ◄■■ CONVERT STRING TO NUMBER.
__asm {
mov eax, num // ◄■■ THE NUMBER.
lea edi, bin // ◄■■ POINT TO VARIABLE "BIN".
mov ecx, 32 // ◄■■ NUMBER IS 32 BITS.
conversion:
shl eax, 1 // ◄■■ GET LEFTMOST BIT.
jc bit1 // ◄■■ IF EXTRACTED BIT == 1
mov [edi], '0'
jmp skip
bit1:
mov [edi], '1'
skip :
inc edi // ◄■■ NEXT POSITION IN "BIN".
loop conversion
}
cout << bin;
return 0;
}
Answer 2:
This is a high-level answer to explain some terms.
Part 1 - about integer numbers and their encoding in computer
An integer value is an integer value; in math it's a purely abstract thing. The number "5" is not what you see on the monitor: that's the digit 5 (a graphical image, or "glyph") representing the value 5 in base-10 (decimal) format for humans (and some trained animals) who can recognize that glyph pattern; the value 5 itself is purely abstract.
When you use int in C++, it's not completely abstract; it's a lot more hard-wired into the metal: a 32-bit (on most platforms) integer value.
But still, that abstract description is much closer to the truth than imagining it as its human decimal format.
int a = 12345; // decimal number
Here a contains the value 12345, not the format. It's not aware it was entered as a decimal string in the source code.
int a = 0x3039; // hexadecimal number
will compile into exactly the same machine code; for the CPU it's the same thing, still (a == 12345). And finally:
int a = 0b0011000000111001; // binary number
is again the same thing. It's still the same 12345 value, just written with different formatting.
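To illustrate the "same value, different formatting" point, here is a tiny C++ check (binary literals need C++14 or newer; this snippet is only illustrative and is not part of the task):
#include <cassert>

int main()
{
    // All three literals denote the same abstract value; the compiler
    // produces identical machine code for each initialization.
    static_assert(12345 == 0x3039, "hex literal, same value");
    static_assert(12345 == 0b0011000000111001, "binary literal, same value");

    int a = 12345, b = 0x3039, c = 0b0011000000111001;
    assert(a == b && b == c);
    return 0;
}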
The last binary form is closest to what the CPU uses to store the value. It is stored in 32 bits (low/high voltage cells/wires), so if you measured the voltage on a particular cell/wire, you would see the "0" voltage level on the top 18 bits, then 2 bits at the "1" voltage level, and then the rest as in the binary format above... with the two least significant bits being "0" and "1".
Also, most of the CPU circuitry is not aware of the particular value of a particular bit; that's again an "interpretation" of that 0/1, done by the code. Many CPU instructions like add or sub work "from right to left" over all bits, not being aware that the currently processed bit represents, for example, the 2^13 value in the final integer (that's the 14th least significant bit).
It's when you take those bits and calculate a string with the decimal/hexadecimal/binary representation of those bit values that you give those "1"s their value. Only then does it become the text "12345".
If you treat those 32 bits in a different way, like a representation of ON/OFF LED lights for an LED display panel, then so it will be: once you send it from the CPU to the display, the LED panel will turn on the corresponding LED lights, not caring that those bits also form the value 12345 when treated as an int.
Only very few CPU instructions work in a way where they need to be aware of the particular value of a particular bit.
Part 2 - about input, output and arguments of C/C++ functions
You want to "convert decimal integer (input) to binary."
So let's reason about what the input is and what the output is. Input is taken from std::cin, so the user will enter a string.
Yet if you do:
int inputNum;
std::cin >> inputNum;
You will end up with an already converted integer value (32 bits, see above) (or an invalid std::cin state, when the user doesn't enter a correct number; probably not your task to handle this).
If you have the number in an int, the binary conversion was already done by the clib when it encoded the user input string as a 32-bit integer.
Now you can create an asm function with the C prototype:
void formatToBinary(uint16_t value, char result[17]);
That means you will give it a uint16_t (unsigned 16-bit) integer value and a pointer to 17 reserved bytes in memory, where you will write '0' and '1' ASCII characters and terminate them with another 0 value (for a rough description of this one, follow my first link in the comments under your question).
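To make it concrete, this is roughly what such a function has to produce, sketched in C++ (only an illustration of the prototype above; the actual task is to write it in asm):
#include <cstdint>

// Illustrative sketch of the prototype above: writes 16 ASCII '0'/'1'
// characters, most significant bit first, plus a terminating 0 byte.
void formatToBinary(uint16_t value, char result[17])
{
    for (int i = 0; i < 16; i++)
    {
        result[i] = ((value >> (15 - i)) & 1) ? '1' : '0'; // bit 15 is printed first
    }
    result[16] = 0; // null terminator
}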
If you must take the input as a string, i.e.
char str[17];
std::cin >> str;
Then you will have in str (after "12345" input) bytes with the values: '1' (49 in decimal), '2', '3', '4', '5', 0. (Note the last one is zero, NOT the ASCII digit '0' = value 48.)
You will first need to convert these ASCII bytes into an integer value (in C++, atoi may help, or one of a few other functions for conversion/formatting). In ASM, check SO for questions like "how to enter integer".
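The classic digits-to-value loop behind atoi, shown here in C++ only to make the idea concrete (each ASCII digit minus '0' gives its numeric value):
// Minimal sketch: convert a decimal ASCII string such as "12345" to an int.
int parseDecimal(const char *s)
{
    int value = 0;
    while (*s >= '0' && *s <= '9')
    {
        value = value * 10 + (*s - '0'); // shift previous digits up one decimal place, add the new digit
        s++;
    }
    return value;
}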
Once you convert it to an integer value, you can proceed the same way as described a bit above (at that moment it's already encoded in 16 or 32 bits, so outputting a string representation of it should be easy).
You may still run into some tricky parts, like if you don't want to output leading zeroes, etc., but all of that should be easy if you understand how this works.
In this case your ASM function prototype may be only void convertToBinary(char*); to reuse the string pointer both as input and output.
Your int intToBin(char*); looks weird, because it means the ASM will return an int... but why? That's an integer value, not bound to any particular formatting, so it's binary/octal/decimal/hexadecimal at the same time; it depends on how you display it. So you don't need it; you need only the string representing the value in binary form, and that's the char *. And you don't give it the number you entered (unless it's taking it from the string).
From the task description and your skill level, I think you are allowed to convert the input into an int right in C++ (i.e. std::cin >> int_variable;).
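Putting the pieces together on the C++ side, the calling code could look roughly like this (a sketch only, assuming the formatToBinary prototype above is the one implemented in the .asm file):
#include <cstdint>
#include <iostream>

extern "C" void formatToBinary(uint16_t value, char result[17]); // implemented in asm

int main()
{
    int num;
    std::cout << "Please enter an integer number: ";
    std::cin >> num;                 // the string -> int conversion happens here
    char buf[17];
    formatToBinary(static_cast<uint16_t>(num), buf);
    std::cout << "the equivalent binary is: " << buf << "\n";
    return 0;
}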
BTW, if you fully understand what is happening to values in the computer, and how CPU instructions work on them, you can often come up with many different ways to achieve some result. For example, Jose's conversion to binary is written in the simple way an Assembly newcomer would write it (he wrote it like that to make it easier for you to understand):
    mov eax, num       // ◄■■ THE NUMBER.
    lea edi, bin       // ◄■■ POINT TO VARIABLE "BIN".
    mov ecx, 32        // ◄■■ NUMBER IS 32 BITS.
conversion:
    shl eax, 1         // ◄■■ GET LEFTMOST BIT.
    jc  bit1           // ◄■■ IF EXTRACTED BIT == 1
    mov [edi], '0'
    jmp skip
bit1:
    mov [edi], '1'
skip:
    inc edi            // ◄■■ NEXT POSITION IN "BIN".
    loop conversion
It's still a bit fragile; for example, he initializes "bin" in such a way that it contains 32 spaces and the 33rd value is zero (the null terminator of a C string). Then the code modifies exactly 32 bytes, so the 33rd zero is still there and working. If you adjusted his code to skip leading zeroes, it would "break" by displaying the remaining part of the buffer, as he doesn't set the null terminator explicitly.
This is a common way to code in Assembly for performance: to be exactly aware of everything happening, and to not set values which are already set, etc. While you are learning, I would suggest you work in a "defensive" way, rather doing some wasteful things which will work as a safety net in case of some mistake, so I would add mov byte ptr [edi],0 after the loop to set the terminator explicitly again.
But it is actually not very fast, as it uses branching. The CPU doesn't like that; decoding new instructions is a costly task, and if it is not sure which instructions will be executed, it simply decodes ahead along one path, and in the case of a wrong guess it throws that work out and decodes the correct path, but that means a pause of several cycles in execution until the first instruction of the new path is fully decoded and ready for execution.
So when coding for performance, you want to avoid hard-to-predict branches (the final loop is easy for the CPU to predict, as it always loops, except for the final exit once ecx is 0). One of many possible ways in this case can be:
    mov edx, num
    lea edi, bin
    mov ah, '0'/2      // for fast init of al later
                       // '0' is 48 (even), '0'/2 will work (24)
    mov ecx, 32        // countdown counter
conversion:
    mov al, ah         // al = '0'/2
    shl edx, 1         // most significant bit into CF
    adc al, al         // al = '0'/2 + '0'/2 + CF = '0' or '1'
    stosb              // store the '0' or '1' to [edi++]
    dec ecx            // manually written "loop"
    jnz conversion     // (it is faster on modern CPUs)
    mov [edi], ch      // explicit set of null-terminator
                       // (ch == 0, because here ecx == 0)
As you can see, there is now no branching except the loop; CPU branch prediction will handle this much more smoothly, and the performance will be considerably better.
A dword variant for discussion with Cody (NASM syntax, 32b target):
; .data
binNumber times 36 db 0
; .text
numberToBin:
    mov edx, 0x12345678
    lea edi, [binNumber]
    mov ecx, 32/4      ; countdown counter
n2b_conversion:
    mov eax, 0b11000000110000001100000011000
    ; ^ will become '0'/'1' for each of four bits
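    ; (that constant is 0x18181818, i.e. '0'/2 in each byte, the same trick as in the byte variant above)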
    shl edx, 1
    rcr eax, 8
    shl edx, 1
    rcr eax, 8
    shl edx, 1
    rcr eax, 8
    shl edx, 1
    rcr eax, 8
    ; here was "or eax,'0000'" => no more needed.
    stosd
    dec ecx
    jnz n2b_conversion
    mov [edi], dl      ; null terminator
    ret
Didn't profile it, just verified it returns the correct result.
Answer 3:
You can do the whole thing without any loops, using SIMD to do all the bits in parallel.
The decimal-string -> integer part is complicated, but there is a complete atoi() implementation in asm which uses SSE4.2 PCMPISTRI, then some shuffles, then PMADDWD to multiply digits by their place-values. For Haswell, IACA says it has about 64 cycles latency, so it's probably not faster for short integers.
The integer -> base 2 string part is much simpler, and can be done efficiently using only SSE2.
This uses the same technique as Evgeny Kluev's answer on a question about doing the inverse of PMOVMSKB, to turn a bit-pattern into a vector of 0 / -1 elements: broadcast-shuffle the input bytes so every vector element contains the bit you want (plus neighbours). AND that with a bitmask to leave only the wanted bit in each byte, then compare against an all-zero vector.
This version only requires SSE2, so it works on every CPU that can run a 64-bit OS, and some 32-bit-only CPUs (like early Pentium4). It can go faster with SSSE3 (one PSHUFB instead of three shuffles to get the low and high bytes where we want them). You could do 8 bits -> 8 bytes at a time with MMX.
I'm not going to try to convert it from NASM to MASM syntax. I have actually tested this, and it works. I'd probably just introduce a syntax error trying to convert. The x86 32-bit System V calling convention doesn't differ from the 32-bit Windows cdecl calling convention in any ways that affect this code, AFAIK.
;;; Tested and works
;; nasm syntax, 32-bit System V (or Windows cdecl) calling convention:
;;;; char *numberToBin(uint16_t num, char buf[17]);
;; returns buf.
ALIGN 16
global numberToBin
numberToBin:
    movd xmm0, [esp+4]            ; 32-bit load even though we only care about the low 16 bits.
    mov eax, [esp+8]              ; buffer pointer
    ; to print left-to-right, we need the high bit to go in the first (low) byte
    punpcklbw xmm0, xmm0          ; llhh (from low to high byte elements)
    pshuflw xmm0, xmm0, 00000101b ; hhhhllll
    punpckldq xmm0, xmm0          ; hhhhhhhhllllllll
    ; or with SSSE3:
    ; pshufb xmm0, [shuf_broadcast_hi_lo] ; SSSE3
    pand xmm0, [bitmask]          ; each input bit is now isolated within the corresponding output byte
    ; compare it against zero
    pxor xmm1, xmm1
    pcmpeqb xmm0, xmm1            ; -1 in elements that are 0, 0 in elements with any non-zero bit.
    paddb xmm0, [ascii_ones]      ; '1' + (-1 or 0) = '0' or '1'
    mov byte [eax+16], 0          ; terminating zero
    movups [eax], xmm0
    ret
section .rodata
ALIGN 16
;; only used for SSSE3
shuf_broadcast_hi_lo:
db 1,1,1,1, 1,1,1,1 ; broadcast the second 8 bits to the first 8 bytes
db 0,0,0,0, 0,0,0,0 ; broadcast the first 8 bits to the second 8 bytes
bitmask: ; select the relevant bit within each byte, from high to low for printing
db 1<<7, 1<<6, 1<<5, 1<<4
db 1<<3, 1<<2, 1<<1, 1<<0
db 1<<7, 1<<6, 1<<5, 1<<4
db 1<<3, 1<<2, 1<<1, 1<<0
ascii_ones:
times 16 db '1'
Using PSHUFLW to do the reversal in the second shuffle step is faster on old CPUs (first-gen Core2 and older) that have slow 128b shuffles, because shuffling only the low 64 bits is fast. (Compared to using PUNPCKLWD / PSHUFD). See Agner Fog's Optimizing Assembly guide to learn more about writing efficient asm, and other links in the x86 tag wiki.
(Thanks to clang for spotting the possibility).
If you were using this in a loop, you'd load the vector constants into vector registers instead of re-loading them every time.
From asm, you can call it like
    sub  esp, 32
    push esp              ; buf[] on the stack
    push 0xfba9           ; use a constant num for example
    call numberToBin
    add  esp, 8
    ;; esp is now pointing at the string
Or call it from C or C++ with the prototype from comments in the asm.
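For comparison, here is a rough C++ sketch of the same SSE2 technique using intrinsics (not the code from this answer, just an illustration; the three shuffles are replaced by building the byte broadcast directly with _mm_set_epi8, and the function name numberToBinIntrin is made up for this sketch):
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics

// Sketch: same idea as the asm above, expressed with SSE2 intrinsics.
char *numberToBinIntrin(uint16_t num, char buf[17])
{
    const char hi = (char)(num >> 8), lo = (char)(num & 0xFF);
    // elements 0..7 hold the high byte (printed first), elements 8..15 the low byte
    __m128i v = _mm_set_epi8(lo, lo, lo, lo, lo, lo, lo, lo,
                             hi, hi, hi, hi, hi, hi, hi, hi);
    // one mask bit per byte, from bit 7 down to bit 0, repeated for both halves
    const __m128i bitmask = _mm_set_epi8(1, 2, 4, 8, 16, 32, 64, (char)128,
                                         1, 2, 4, 8, 16, 32, 64, (char)128);
    __m128i bits   = _mm_and_si128(v, bitmask);                  // isolate one bit per byte
    __m128i isZero = _mm_cmpeq_epi8(bits, _mm_setzero_si128());  // 0xFF where the bit was 0
    __m128i ascii  = _mm_add_epi8(_mm_set1_epi8('1'), isZero);   // '1' + (-1 or 0) = '0' or '1'
    buf[16] = 0;                                                 // terminating zero
    _mm_storeu_si128((__m128i *)buf, ascii);
    return buf;
}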
Source: https://stackoverflow.com/questions/40811218/creating-an-x86-assembler-program-that-converts-an-integer-to-a-16-bit-binary-st