I've got to learn assembly and I'm very confused as to what the different registers do/point to.
The sp register is the stack pointer, used for stack operations like push and pop.
The stack is known as a LIFO structure (last-in, first-out), meaning that the last thing pushed on is the first thing popped off. It's used, among other things, to implement the ability to call functions.
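To make the LIFO behaviour concrete, here is a minimal Python sketch of a downward-growing stack with 16-bit entries. The names `memory`, `push` and `pop` and the starting value of `sp` are assumptions made for illustration; this is a toy model, not how any particular assembler or CPU is implemented.

```python
# Toy model of a downward-growing stack with 16-bit entries.
# `memory`, `push` and `pop` are illustrative names, not real instructions.
memory = {}          # address -> 16-bit value
sp = 0x0100          # hypothetical initial stack pointer

def push(value):
    """Decrement sp by 2, then store the value at the new top of stack."""
    global sp
    sp = (sp - 2) & 0xFFFF
    memory[sp] = value & 0xFFFF

def pop():
    """Read the value at the top of stack, then increment sp by 2."""
    global sp
    value = memory[sp]
    sp = (sp + 2) & 0xFFFF
    return value

push(0x1111)
push(0x2222)
push(0x3333)
popped = [pop(), pop(), pop()]   # the last value pushed comes off first
```

After the three pops, `sp` is back at its starting value, which is why balanced push/pop pairs leave the stack as they found it.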
The bp register is the base pointer, and is commonly used for stack frame operations.
This means that it's a fixed reference to locate local variables, passed parameters and so forth on the stack, for a given level (while sp may change during the execution of a function, bp usually does not).
If you're looking at assembly language like:
mov eax, [bp+8]
you're seeing the code access a stack-level-specific variable.
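As a rough sketch of why a fixed base pointer is convenient, the Python model below builds a toy 32-bit stack frame. It assumes a cdecl-style prologue (push ebp / mov ebp, esp / sub esp, N), which the text above does not spell out, and every address and value in it is made up.

```python
# Toy 32-bit stack frame. The layout assumes a cdecl-style prologue
# (push ebp / mov ebp, esp / sub esp, N); all values are hypothetical.
memory = {}
esp = 0x1000
ebp = 0xDEAD0000          # caller's frame pointer (arbitrary)

def push(value):
    """Decrement esp by 4, then store the value there."""
    global esp
    esp -= 4
    memory[esp] = value

push(42)                  # caller pushes one argument
push(0x00401234)          # `call` pushes the return address (hypothetical)
push(ebp)                 # callee: push ebp
ebp = esp                 # callee: mov ebp, esp
esp -= 8                  # callee: sub esp, 8  (room for two locals)
memory[ebp - 4] = 7       # first local lives at [ebp-4]

first_arg = memory[ebp + 8]          # what `mov eax, [ebp+8]` would load
frame_base_before = ebp
push(99)                             # esp moves again...
first_arg_again = memory[ebp + 8]    # ...but the ebp-relative offset still works
```

In this layout the saved base pointer sits at [ebp], the return address at [ebp+4], and the first argument at [ebp+8], regardless of how far esp has moved since the prologue.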
The si register is the source index, typically used for mass copy operations (di is its equivalent destination index). Intel had these registers along with specific instructions for quick movement of bytes in memory.
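The copy instructions mentioned here can be pictured with a small Python model of a repeated byte move. The function name `rep_movsb` and its parameters are chosen for illustration only, and the sketch ignores segment registers entirely.

```python
def rep_movsb(memory, si, di, cx, df=0):
    """Copy cx bytes from memory[si] to memory[di], one byte per step.

    df=0 walks addresses upward, df=1 walks them downward.
    Returns the final (si, di, cx), mirroring how the registers end up.
    """
    step = -1 if df else 1
    while cx != 0:
        memory[di] = memory[si]
        si += step
        di += step
        cx -= 1
    return si, di, cx

memory = bytearray(32)
memory[0:5] = b"hello"
si, di, cx = rep_movsb(memory, si=0, di=16, cx=5)
```

Both index registers advance together while the count register runs down to zero, so a single instruction can move a whole block.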
The e- variants are just the 32-bit versions of these (originally) 16-bit registers. And, as if that weren't enough, we have 64-bit r- variants as well :-)
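One way to picture how the narrower names relate to the wider ones is as bit masks over the same storage. The Python sketch below uses an arbitrary 64-bit value, and `write_ax` is a hypothetical helper, not an instruction.

```python
rax = 0x1122334455667788      # arbitrary 64-bit value

eax = rax & 0xFFFFFFFF        # low 32 bits
ax  = rax & 0xFFFF            # low 16 bits
al  = rax & 0xFF              # low 8 bits of ax
ah  = (rax >> 8) & 0xFF       # high 8 bits of ax

def write_ax(reg, value):
    """Replace only the low 16 bits, leaving the upper bits untouched."""
    return (reg & ~0xFFFF) | (value & 0xFFFF)

rax_after = write_ax(rax, 0xABCD)
```

The same pattern applies to the other registers: bp is the low half of ebp, si the low half of esi, and so on.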
Perhaps the simplest place to start is here. It's specific to the 8086 but the concepts haven't changed that much. The simplicity of the 8086 compared to the current crop will be a good starting point for your education. Once you've learned the basics, it will be much easier to move up to the later members of the x86 family.
Transcribed here and edited quite a bit, to make the answer self-contained.
GENERAL PURPOSE REGISTERS
The 8086 CPU has 8 general purpose registers, each with its own name:
AX - the accumulator register (divided into AH/AL). Probably the most commonly used register for general purpose stuff.
BX - the base address register (divided into BH/BL).
CX - the count register (divided into CH/CL). Used by special purpose instructions for looping and shifting.
DX - the data register (divided into DH/DL). Used with AX for some MUL and DIV operations, and for specifying ports in some IN and OUT operations.
SI - source index register. Special purpose instructions use this as the source of mass memory transfers (DS:SI).
DI - destination index register. Special purpose instructions use this as the destination of mass memory transfers (ES:DI).
BP - base pointer, primarily used for accessing parameters and variables on the stack.
SP - stack pointer, used for the basic stack operations.
SEGMENT REGISTERS
CS - points at the segment containing the current instruction.
DS - generally points at the segment where variables are defined.
ES - extra segment register; it's up to the coder to define its usage.
SS - points at the segment containing the stack.
Although it is possible to store any data in the segment registers, this is never a good idea. The segment registers have a very special purpose - pointing at accessible blocks of memory.
Segment registers work together with general purpose registers to access any memory value. For example, if we would like to access memory at the physical address 12345h, we could set DS = 1230h and SI = 0045h. This way we can access much more memory than with a single register, which is limited to 16-bit values.
The CPU calculates the physical address by multiplying the segment register by 10h and adding the general purpose register to it (1230h * 10h + 45h = 12345h):
1230
 0045
=====
12345
The address formed from the two registers (written segment:offset, here 1230:0045) is called a logical address; the offset part on its own is the effective address.
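The calculation above is short enough to check directly. This is a minimal Python sketch; the function name `physical_address` is made up, and the 20-bit wrap reflects the 8086's 20 address lines.

```python
def physical_address(segment, offset):
    """8086 real-mode translation: segment * 10h + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

addr = physical_address(0x1230, 0x0045)    # the DS:SI pair from the text
alias = physical_address(0x1234, 0x0005)   # a different pair, same location
```

Because segments overlap every 16 bytes, many different segment:offset pairs name the same physical byte, as `alias` shows.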
This usage is for real mode only (which is the only mode the 8086 had). Later processors changed these registers from segments to selectors, which are used to look up addresses in a table rather than having a fixed calculation performed on them.
By default the BX, SI and DI registers work with the DS segment register, and BP and SP work with the SS segment register.
SPECIAL PURPOSE REGISTERS
IP - the instruction pointer. The IP register always works together with the CS segment register (CS:IP) and points to the next instruction to be executed.
FLAGS REGISTER
Determines the current state of the processor. These flags are modified automatically by the CPU after mathematical operations, which makes it possible to determine the type of the result and to decide whether to transfer control to other parts of the program.
Generally you cannot access these flags directly.
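As a rough illustration of how the arithmetic flags in this section get their values, the Python sketch below models an 8-bit addition. The function `add8_flags` is hypothetical, it assumes both inputs are in the range 0...255, and it only covers the six flags that an ADD affects (not TF, IF or DF).

```python
def add8_flags(a, b):
    """Add two bytes and return (result, flags) the way an 8-bit ADD would."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                      # unsigned overflow
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of one bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),       # carry out of the low nibble
        "ZF": int(result == 0),                       # result is zero
        "SF": result >> 7,                            # most significant bit
        # signed overflow: both operands share a sign that the result lacks
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

r1, f1 = add8_flags(255, 1)    # unsigned overflow: CF set, OF clear
r2, f2 = add8_flags(100, 50)   # signed overflow: OF set, CF clear
```

The two calls show that CF and OF are independent: the same bit pattern can overflow as an unsigned number, as a signed number, both, or neither.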
CF - this flag is set to 1 when there is an unsigned overflow. For example, when you add bytes 255 + 1 (result is not in range 0...255). When there is no overflow this flag is set to 0.
PF - this flag is set to 1 when there is an even number of one bits in the result, and to 0 when there is an odd number of one bits.
AF - set to 1 when there is an unsigned overflow for the low nibble (4 bits).
ZF - set to 1 when the result is zero. For a non-zero result this flag is set to 0.
SF - set to 1 when the result is negative. When the result is positive it is set to 0. (This flag takes the value of the most significant bit.)
TF - used for on-chip debugging.
IF - when this flag is set to 1 the CPU reacts to interrupts from external devices.
DF - this flag is used by some instructions to process data chains. When this flag is set to 0 the processing is done forward; when it is set to 1 the processing is done backward.
OF - set to 1 when there is a signed overflow. For example, when you add bytes 100 + 50 (result is not in range -128...127).
On some architectures, like MIPS, all registers are created equal, and there is really no difference beyond the name of the register (and software conventions). On x86 you can mostly use any registers for general-purpose computing, but some registers are implicitly bound to the instruction set.
Lots of information about special purposes for registers can be found here.
Examples:
eax, accumulator: many arithmetic instructions implicitly operate on eax. There are also special shorter EAX-specific encodings for many instructions: add eax, 123456 is 1 byte shorter than add ecx, 123456, for example. (add eax, imm32 vs. add r/m32, imm32)
ebx, base: few implicit uses, but xlat is one that matches the "Base" naming. Still relevant: cmpxchg8b. Because it's rarely required for anything specific, some 32-bit calling conventions / ABIs use it as a pointer to the "global offset table" in Position Independent Code (PIC).
edx, data: some arithmetic operations implicitly operate on the 64-bit value in edx:eax.
ecx, counter: used for shift counts, and for rep movs. Also, the mostly-obsolete loop instruction implicitly decrements ecx.
esi, source index: some string operations read a string from the memory pointed to by esi.
edi, destination index: some string operations write a string to the memory pointed to by edi. e.g. rep movsb copies ECX bytes from [esi] to [edi].
ebp, base pointer: normally used to point to local variables. Used implicitly by leave.
esp, stack pointer: points to the top of the stack, used implicitly by push, pop, call and ret.
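To illustrate the implicit edx:eax pairing from the list above, here is a small Python model of unsigned 32-bit multiply and divide. The helper names `mul32` and `div32` are assumptions for this sketch; it models only the register results, not the flags.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Unsigned `mul operand`: the 64-bit product is split across edx:eax."""
    product = eax * operand
    return (product >> 32) & MASK32, product & MASK32      # (edx, eax)

def div32(edx, eax, operand):
    """Unsigned `div operand`: divides edx:eax; quotient -> eax, remainder -> edx."""
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK32:
        # The CPU raises a divide error when the quotient does not fit in eax.
        raise OverflowError("quotient does not fit in eax")
    return remainder, quotient                              # (edx, eax)

edx, eax = mul32(0x80000000, 4)     # product is too wide for eax alone
edx2, eax2 = div32(edx, eax, 4)     # dividing edx:eax recovers the original value
```

Neither instruction names edx or eax in its operand list, which is what "implicitly bound to the instruction set" means in practice.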
The x86 instruction set is a complex beast, really. Many instructions have shorter forms that implicitly use one register or another. Some registers can be used to do certain addressing while others cannot.
The Intel 80386 Programmer's Reference Manual is an irreplaceable resource. It basically tells you everything there is to know about x86 assembly, except for newer extensions and performance on modern hardware.
The PC Assembly (e)book is a great resource for learning assembly.
Here's a simplified summary:
ESP is the current stack pointer, so you generally only update it to manipulate the stack. EBP is intended for stack manipulation too, for example saving the value of ESP before allocating stack space for local variables. But you can use EBP as a general purpose register too.
ESI is the Extended Source Index register. "String" instructions (different from C-strings, and I don't mean the type of C-string women wear either) such as MOVS use ESI and EDI.
Memory Addressing:
x86 CPUs have special registers called "segment registers", each of which can point to a different address. For example, DS (commonly called the data segment) may point to 0x1000000, and SS (commonly called the stack segment) may point to 0x2000000.
When you use EBP and ESP, the default segment register is SS; for ESI (and other general purpose registers), it's DS. For example, let's say DS=0x1000000, SS=0x2000000, EBP=0x10, ESI=0x10, so:
mov eax,[ebp] ; loading from address 0x2000000 + 0x10
mov eax,[esi] ; loading from address 0x1000000 + 0x10
You can also specify a segment register to use, overriding the default:
mov eax,ds:[ebp]
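The default-segment rule and the override can be sketched in a few lines of Python. This follows the simplified picture used in this answer, where each segment register is treated as a plain base address; the `resolve` helper and the dictionaries are hypothetical, and the values are the ones from the example above.

```python
# Simplified model: each segment register is treated as a plain base address.
# The values are the ones used in the example above.
segment_base = {"ds": 0x1000000, "ss": 0x2000000}
regs = {"ebp": 0x10, "esi": 0x10}

def resolve(base_reg, override=None):
    """Return the address for [base_reg], honouring an optional segment override."""
    default = "ss" if base_reg in ("ebp", "esp") else "ds"
    segment = override or default
    return segment_base[segment] + regs[base_reg]

a = resolve("ebp")          # mov eax,[ebp]     -> ss by default
b = resolve("esi")          # mov eax,[esi]     -> ds by default
c = resolve("ebp", "ds")    # mov eax,ds:[ebp]  -> explicit override
```

With the override, the ebp-based access lands at the same address as the esi-based one, since both registers hold 0x10.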
In terms of addition, subtraction, logical operations, etc., there's no real difference between them.