addressing in assembler

Question

There is something I can't digest. I'm learning some assembler and right now I'm at the chapter with addressing. I understand the concept of brackets for dereferencing, but somehow when I see the usage of it I just can't soak up the point of it. To be a little bit more exact here is where my confusion started:

mov al, [L1]

Here I suppose L1 as an example case is some kind of macro which is later substituted for a real address in the machine code, right?

So what this instruction does is: dereferencing al register (because you could hardly change physical address) and changing the value to the one stored at L1.

If till now everything is ok:

mov [L1], al

that would analogously mean there must have been an address stored (so there was some point in doing this) and you change it to some other place in memory, right?

If you could just tell me it's ok in case you don't see any mistakes please do it, that would make it possible for me to continue learning.

One last thing, NASM adds a bunch of 0xAA55 under my code (this sequence is supposed to end the program right?), why is it there so many times?

Answer 1:

L1 is typically/probably a label, associated with one particular address in memory. The programmer defines various labels for his/her convenience, and such labels are used to symbolically represent a particular location in memory (L1 is a lousy name; labels are typically indicative of the underlying purpose of the location: say, PingCounter, ErrorMessage, Login and the like).

A label for 1 byte of static storage is how a C compiler would implement char L1; at global scope.

In NASM syntax, mov edi, L1 will assemble to the mov r32, imm32 form of mov, i.e. the label address will become a 32-bit immediate in the machine code. (The assembler doesn't know the final numeric value, but the linker does.) Beware that in MASM syntax, this would be a load and you'd need mov edi, OFFSET L1 to get a label address as an immediate.

But mov al, [L1] will assemble to a different instruction, with the 32-bit address embedded in the machine code as an address to be dereferenced. This instruction loads 1 byte from the address L1, and places it in AL.

In assembly language, a memory access like this (here with a direct, absolute address) is signified by square brackets around the source or destination operand of an instruction. (But not both: x86 supports at most one explicit memory operand per instruction.)

mov al, [L1]

uses the address L1 itself (not a pointer stored there) to locate a place in memory, reads 1 byte (= 8 bits = the size of the AL register) at that location, and loads it into the AL register.

  mov [L1], al

Does this in reverse, i.e. it stores the contents of the AL register into the byte at the address L1.
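As a rough C analogy (building on the `char L1;` comparison above), the three forms can be sketched like this. The function names and the `al` parameter are illustrative stand-ins for registers and are not anything NASM produces; the snippet only models the semantics, assuming L1 names 1 byte of static storage.

```c
/* 1 byte of static storage, comparable to `L1: db 0` in a NASM data section. */
static char L1 = 0;

/* mov edi, L1   -> the address itself is the value; memory is not accessed. */
char *address_of_L1(void) { return &L1; }

/* mov al, [L1]  -> load 1 byte from the address L1. */
char load_from_L1(void) { return L1; }

/* mov [L1], al  -> store 1 byte to the address L1. */
void store_to_L1(char al) { L1 = al; }
```

Calling `store_to_L1(42)` and then `load_from_L1()` gives back 42, while `address_of_L1()` returns the same pointer both before and after the store: the brackets decide whether the label is used as a number or as a place to read or write.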

Provided that you understand the following information to be incomplete and somewhat outdated with regard to the newer processors in the x86 family, this primer on the 8086 architecture is probably very useful to get one started with assembly language for the x86 family.
The advantage of starting with this "antiquity of a CPU" (still in use, actually) is that the fundamental concepts are all there, unencumbered by the newer sets of registers, fancy addressing modes, modes of operation and other concepts. The bigger sizes, features and modes of the newer CPUs merely introduce a combinatorial explosion of options, all (most?) of them useful in their way, but essentially irrelevant for an initiation.

Answer 2:

It's hard to follow your question, but I'll try to help out.

In assembly, a symbol is just a name for an address. In your assembly source, L1 is a symbol defined elsewhere, which the assembler will resolve to an offset in memory.

When dereferencing (using the [] notation), you can dereference a register (as in "mov al, [esi]") or an address (as in "mov al, [L1]"). Both statements do the same thing: they load one byte from memory into AL. The only difference is where the address comes from.
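To make the "where the address comes from" point concrete, here is a small C sketch. The names `load_absolute`, `load_via_register` and the parameter `esi` are hypothetical; the pointer parameter simply plays the role of the register.

```c
#include <stdint.h>

/* Stand-in for the byte the label L1 refers to. */
static uint8_t L1 = 0x5A;

/* mov al, [L1]: the address is a constant fixed when the program is built. */
uint8_t load_absolute(void) { return L1; }

/* mov al, [esi]: the address is whatever the register holds at run time. */
uint8_t load_via_register(const uint8_t *esi) { return *esi; }
```

If `esi` happens to hold the address of L1, both functions return the same byte; pointing `esi` somewhere else changes what the second one reads without changing the instruction.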

I recommend downloading the Intel CPU documentation and skimming through the instruction reference. If you don't want to be overwhelmed, start with the documentation for an older x86 processor (say, 486 or older). That documentation isn't exactly friendly, but it is quite useful to have on hand.

I don't know the specifics of NASM, I learned assembly 15 years ago with Turbo Assembler, and that knowledge is still useful today :)

Also, might I suggest you try Googling for "x86 assembly tutorial", you'll find plenty of relevant documentation that may be useful for you.

Answer 3:

oh and one last thing, NASM adds a bunch of 0xAA55 under my code (this sequence is supposed to end the program right?), why is it there so many times? thank you very much for reading it to here..

I'm pretty sure that's only applicable if you're creating a bootloader. It is the "boot signature." Say you write this code to a floppy (is your produced machine code also exactly 512 bytes?). When you want to start the computer with this bootloader code, the BIOS will look at the floppy and determine whether it holds an actual bootloader. To do that, it looks at the last two bytes of the first sector of the floppy, which should be 0xAA55 (stored in memory order as the bytes 0x55, 0xAA) to indicate that it is bootable. (This works the same way if you're booting off a hard drive, a thumb drive, or whatever. It is slightly different for CDs because they have 2048-byte sectors.)

In your source code, is the last line something like $(times.. db 0xAA55 or something like that? If you're not intending to make a bootloader, you can simply remove that line.
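The check the BIOS performs, as described above, can be sketched in C roughly as follows. This is only an illustration of the layout of a 512-byte boot sector; the function names and the `SECTOR_SIZE` constant are made up for the example and do not come from any BIOS or NASM interface.

```c
#include <stdint.h>

#define SECTOR_SIZE 512

/* Returns 1 if the sector ends with the boot signature, 0 otherwise.
   The 16-bit value 0xAA55 is little-endian on disk:
   0x55 at offset 510, 0xAA at offset 511. */
int has_boot_signature(const uint8_t sector[SECTOR_SIZE]) {
    return sector[SECTOR_SIZE - 2] == 0x55 && sector[SECTOR_SIZE - 1] == 0xAA;
}

/* Writes the signature into the last two bytes of the sector. */
void write_boot_signature(uint8_t sector[SECTOR_SIZE]) {
    sector[SECTOR_SIZE - 2] = 0x55;
    sector[SECTOR_SIZE - 1] = 0xAA;
}
```

A zero-filled sector fails the check until `write_boot_signature` is applied. The signature is expected exactly once, at the very end of the sector, so seeing 0xAA55 repeated many times in the output suggests the padding line is emitting the signature itself rather than filler bytes.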

Source: https://stackoverflow.com/questions/2364162/addressing-in-assembler

Tags

assembly

nasm

addressing