Let's say I want to define an initialized string variable before running my assembly program (in section .data). The variable I chose to create is called
I want to clarify something:
example: db 'ABCDE'
This reserves 5 bytes in total, each containing a letter.
ex2: db 1
This reserves a byte that contains 1.
ex3: db "cool"
This reserves 4 bytes, and each byte contains a letter.
ex4: db "cool", 1, 3
Does this reserve 3 bytes?
Answer: ex4 is 6 bytes: 4 bytes for the characters of "cool", plus one byte each for the numbers 1 and 3.
Each character of the string "0123456789ABCDEF" takes exactly one byte when defined with db, so the string occupies 16 bytes in memory.
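To make these byte counts concrete, here is a small Python sketch that models how db lays out its comma-separated operands: each quoted string contributes one byte per ASCII character, and each number contributes one byte. The helper name `db` is hypothetical; it only imitates NASM's behaviour for these simple cases (values 0 to 255) and is not part of any assembler.

```python
def db(*items):
    """Hypothetical model of NASM `db`: strings give one byte per ASCII char, ints one byte each."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")   # one byte per character
        else:
            out.append(item)              # raises ValueError if item is outside 0..255
    return bytes(out)

print(len(db("ABCDE")))             # example: 5
print(len(db(1)))                   # ex2: 1
print(len(db("cool")))              # ex3: 4
print(len(db("cool", 1, 3)))        # ex4: 6
print(len(db("0123456789ABCDEF")))  # 16
```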
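In case of this declaration:
vark db 1
you can do this:
mov byte [vark], 128
but not this:
mov byte [vark], 1024
because 1024 does not fit in a single byte (a byte holds 0 to 255 unsigned, or -128 to 127 signed). With this declaration, however:
vark dw 1
you can, using mov word [vark], 1024, because a 16-bit word holds values up to 65535. (NASM does not remember what size a label was declared with, so the byte or word keyword is required when storing an immediate to memory.)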
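The range limit can be checked with Python's struct module, used here only as a stand-in for the assembler: format "<B" is an unsigned byte (like a db slot, unsigned range only) and "<H" is a little-endian unsigned 16-bit word (like a dw slot on x86). The helper `fits` is a hypothetical name for illustration.

```python
import struct

def fits(fmt, value):
    """Hypothetical helper: True if value can be packed into the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False

print(fits("<B", 128))                # True: 128 fits in one byte (0..255)
print(fits("<B", 1024))               # False: 1024 needs more than 8 bits
print(fits("<H", 1024))               # True: a 16-bit word holds up to 65535
print(struct.pack("<H", 1024).hex())  # 0004: low byte first, as on x86
```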
One of the answers on the linked question has a quote from the NASM manual's examples which does answer your question. As requested, I'll expand on it for all three cases (and correct the lower-case vs. upper-case ASCII encoding error!):
db 'ABCDE' ; 0x41 0x42 0x43 0x44 0x45 (5 bytes)
dw 'ABCDE' ; 0x41 0x42 0x43 0x44 0x45 0x00 (6 bytes, 3 words)
dd 'ABCDE' ; 0x41 0x42 0x43 0x44 0x45 0x00 0x00 0x00 (8 bytes, 2 doublewords)
dq 'ABCDE' ; 0x41 0x42 0x43 0x44 0x45 0x00 0x00 0x00 (8 bytes, 1 quadword)
So the difference is that NASM pads the string with zero bytes out to a multiple of the element size when you use dw, dd or dq instead of db.
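The padding rule can be modelled in a few lines of Python. This is a sketch of the behaviour described above (zero-pad a string to a multiple of the element size), not NASM's actual implementation; the function name `pad_string` is made up for the example.

```python
def pad_string(text, size):
    """Hypothetical model: zero-pad an ASCII string to a multiple of `size` bytes."""
    data = text.encode("ascii")
    padding = -len(data) % size      # bytes needed to reach the next multiple of size
    return data + b"\x00" * padding

for name, size in [("db", 1), ("dw", 2), ("dd", 4), ("dq", 8)]:
    data = pad_string("ABCDE", size)
    print(name, data.hex(" "), f"({len(data)} bytes)")
```

Running it prints the same byte sequences and sizes as the db/dw/dd/dq table.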
According to @Jose's comment, some assemblers may use a different byte order for dw or dd string constants. In NASM syntax, the string is always stored in memory in the same order it appears in the quoted constant.
You can assemble this with NASM (e.g. into the default flat binary output) and use hexdump -C or a similar tool to confirm the byte ordering and amount of padding.
Note that this padding to the element size applies to each comma-separated element separately. So the seemingly innocent dd '%lf', 10, 0 actually assembles like this:
;dd '%lf', 10, 0
db '%lf',0, 10,0,0,0, 0,0,0,0 ;; equivalent with db
Note the 0 before the newline: if you pass a pointer to this to printf, the C string is just "%lf", terminated by the first 0 byte.
(The write system call, or the fwrite function with an explicit length, would print the whole thing including the 0 bytes, because those functions work on binary data, not C implicit-length strings.)
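Extending the same idea to a comma-separated list shows where the extra 0 before the newline comes from, and why a C-string consumer stops early while an explicit-length write sees all 12 bytes. In this sketch, the hypothetical helper `dd` turns numbers into 4-byte little-endian values (as on x86) and pads each string individually; `c_string` is a hypothetical imitation of how printf finds the end of its format string.

```python
import struct

def dd(*items):
    """Hypothetical model of NASM `dd` for ASCII strings and ints in 0..2**32-1."""
    out = b""
    for item in items:
        if isinstance(item, str):
            data = item.encode("ascii")
            out += data + b"\x00" * (-len(data) % 4)   # pad each string element separately
        else:
            out += struct.pack("<I", item)             # 4-byte little-endian number
    return out

def c_string(buf):
    """Return what a C function like printf would see: bytes up to the first 0 byte."""
    return buf.split(b"\x00", 1)[0]

fmt = dd("%lf", 10, 0)
print(len(fmt), fmt)   # explicit length: all 12 bytes, newline included
print(c_string(fmt))   # implicit length: only b'%lf'
```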
Also note that in NASM you can write things like mov dword [rdi], "abc" to store "abc\0" to memory; that is, multi-character literals work as numeric literals in any context in NASM.
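One way to see why this keeps the characters in source order: NASM treats "abc" as the number whose little-endian byte representation is the string, so a 4-byte store of that number writes a, b, c, then a 0 byte. The Python below only reproduces that arithmetic; it does not run any x86 code.

```python
import struct

value = int.from_bytes(b"abc", "little")  # NASM's numeric value of "abc"
print(hex(value))                          # 0x636261: 'a' (0x61) is the low byte
stored = struct.pack("<I", value)          # bytes written by a dword store on x86
print(stored)                              # b'abc\x00'
```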
See When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order? for more. Even in a dd "abcd", MASM breaks your strings, reversing the byte order inside chunks compared to source order.
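For contrast, here is a sketch of the MASM behaviour described in the linked question, under the assumption that MASM reads a 4-character chunk as a big-endian number (first character most significant) and then stores that number little-endian like any other dd value. This models the reported result; it is not taken from MASM's documentation.

```python
import struct

masm_value = int.from_bytes(b"abcd", "big")     # assumed MASM reading: 0x61626364
nasm_value = int.from_bytes(b"abcd", "little")  # NASM reading: 0x64636261
print(struct.pack("<I", masm_value))  # b'dcba': reversed relative to source order
print(struct.pack("<I", nasm_value))  # b'abcd': same order as the source
```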