Embedded System: Memory Layout when using Assembly Language

Question

From my understanding, an embedded system runs machine code. There are multiple ways to generate this code. One is to write a program in a higher-level language like C and use a compiler to produce it. Another is to write instructions in the assembly language for that embedded system and use an assembler to translate them to machine code. Either way, we end up with machine code that is loaded onto the system and executed. The program code is stored in non-volatile memory.

Now, if the program code was produced by a C compiler, I know the following: the code contains multiple sections:

.text: The actual instructions
.bss: Variables that are declared but not initialized (zeroed at startup)
.data: Variables that are declared and initialized
.rodata: Read-only data, e.g. initialized "const" variables

Then, on startup, .bss and .data are (in most cases) loaded into RAM. A stack pointer is placed after the data section and a heap pointer at the end of RAM, so that during execution they grow toward each other.

The question now is: how do things behave if I write code in assembly language? From my understanding, there should be no sections like the above (neither in the program code nor in RAM), only the code (equivalent to .text). I can manually access memory addresses and write to and read from them, but there is no such thing as a stack or a heap. Is this portrayal correct?

Answer 1:

Your description is a textbook view of things and is not necessarily incorrect, but for a microcontroller that is not exactly how things look.

C and assembly language generally result in the same thing: an object containing machine code and data, plus some structure that tells the linker what is what, including information about which chunks of bytes are which, often called sections. The specific names .text, .data, etc. are not cast in stone; tool developers are free to choose whatever names they want. If they do not use those names, though, they add confusion for the general population who are used to those terms. So it is wise to conform somewhat, even if you are writing a new compiler because you do not like any of the existing ones.

A stack pointer is as useful as any other register/concept in a processor, independent of language. Most processors have a limited number of general purpose registers, so there will come a time when you need to save some of them temporarily to make room for more work. And the concept of subroutines/functions requires some sort of jump with a notion of return. All of this is independent of programming language, and that includes assembly language, which is a programming language.

Heap is a notion that comes from running on an operating system, or in an environment where you are not completely in control. What you are describing for microcontrollers is called baremetal programming, which generally means programming without an operating system, which in turn means you are in complete control. You do not have to ask for memory; you simply take it.

With microcontrollers in general (there are exceptions to almost all of these statements) there is some form of non-volatile memory (flash, EEPROM, etc., a ROM of some sort) and RAM (SRAM). The chip vendor chooses the address space for these logic components for a particular chip or family of chips. The processor core itself rarely cares; they are just addresses. The programmer is responsible for connecting all of the dots. So an MCU memory model will have a flash address space which, yes, basically holds the code and ideally the read-only items (you, the programmer, need to tell the tools to do this), and the SRAM will hold the read/write items.

But there is another problem. The so-called .data items need to be set to their values before the body of the code runs, or in the case of C, before the compiled C code starts to execute. Likewise, if .bss is assumed to be zeroed, that has to happen as well. This is done in what is sometimes called a bootstrap: some (ideally) assembly language code that bridges the gap between the entry point of the application and the entry point of the high level language (C).

With an operating system, first off, only a limited number of binary file formats are supported. Within those, the operating system authors decide whether they want to prepare memory for you beyond simply allocating room for your application. Because such a system is normally all RAM, you do not have the MCU problem I am about to describe: the OS can simply place .data where it was linked and zero .bss where it was linked.

With an MCU you are generally booting the processor: your code is the first code, and there is no operating system to prepare and manage things for you. This is IMO good, but it also means more work. Specifically, all you have at boot is the non-volatile storage. To get .data items into RAM, you need a copy of them in ROM, and you need to copy them before executing any compiled code that assumes they are in their final place. That is one of the jobs of the bootstrap; another is to set the stack pointer, as compilers assume there is a stack when they generate code.

unsigned int a;
unsigned int b = 5;
const unsigned int c = 7;
void fun ( void  )
{
    a = b + c;
}
Disassembly of section .text:

00000000 <fun>:
   0:   e59f3010    ldr r3, [pc, #16]   ; 18 <fun+0x18>
   4:   e5933000    ldr r3, [r3]
   8:   e59f200c    ldr r2, [pc, #12]   ; 1c <fun+0x1c>
   c:   e2833007    add r3, r3, #7
  10:   e5823000    str r3, [r2]
  14:   e12fff1e    bx  lr
    ...

Disassembly of section .data:

00000000 <b>:
   0:   00000005    andeq   r0, r0, r5

Disassembly of section .bss:

00000000 <a>:
   0:   00000000    andeq   r0, r0, r0

Disassembly of section .rodata:

00000000 <c>:
   0:   00000007    andeq   r0, r0, r7

You can see all of these elements in this example.

arm-none-eabi-ld -Ttext=0x1000 -Tdata=0x2000 -Tbss=0x3000 -Trodata=0x4000 so.o -o so.elf

Disassembly of section .text:

00001000 <fun>:
    1000:   e59f3010    ldr r3, [pc, #16]   ; 1018 <fun+0x18>
    1004:   e5933000    ldr r3, [r3]
    1008:   e59f200c    ldr r2, [pc, #12]   ; 101c <fun+0x1c>
    100c:   e2833007    add r3, r3, #7
    1010:   e5823000    str r3, [r2]
    1014:   e12fff1e    bx  lr
    1018:   00002000
    101c:   00003000

Disassembly of section .data:

00002000 <b>:
    2000:   00000005

Disassembly of section .bss:

00003000 <a>:
    3000:   00000000

Disassembly of section .rodata:

00001020 <c>:
    1020:   00000007

(naturally this is not a valid/executable binary, the tools do not know/care)

The tool ignored my -Trodata, but you can see otherwise we control where things go, and we normally do that through linking. We ultimately are responsible for making sure the build matches the target, that we link things to match the chip address space layout.
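
In ARM (not Thumb) state, an instruction that reads pc sees its own address plus 8. That is how the disassembler turns `ldr r3, [pc, #16]` at 0x1000 into a load from the literal word at 0x1018, where the linker stored the address of b (0x2000). A minimal sketch of that arithmetic; the function name is made up for illustration:

```c
#include <stdint.h>

/* Address of the literal word loaded by an ARM-state "ldr rX, [pc, #offset]".
   In ARM state, reading pc yields the instruction's own address + 8. */
uint32_t arm_pc_literal_address(uint32_t insn_address, int32_t offset)
{
    return insn_address + 8u + (uint32_t)offset;
}
```

Because the offsets are relative to pc, relinking at another address leaves the instruction words unchanged (e59f3010 in both listings); only the literal words the linker fills in change.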

With many compilers, and particularly GNU GCC, you can get assembly language output. GCC actually compiles to assembly language and then calls the assembler (a wise design choice, but not required).

arm-none-eabi-gcc -O2 -save-temps -c so.c -o so.o
cat so.s
    .cpu arm7tdmi
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 1
    .eabi_attribute 30, 2
    .eabi_attribute 34, 0
    .eabi_attribute 18, 4
    .file   "so.c"
    .text
    .align  2
    .global fun
    .arch armv4t
    .syntax unified
    .arm
    .fpu softvfp
    .type   fun, %function
fun:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r3, .L3
    ldr r3, [r3]
    ldr r2, .L3+4
    add r3, r3, #7
    str r3, [r2]
    bx  lr
.L4:
    .align  2
.L3:
    .word   .LANCHOR1
    .word   .LANCHOR0
    .size   fun, .-fun
    .global c
    .global b
    .global a
    .section    .rodata
    .align  2
    .type   c, %object
    .size   c, 4
c:
    .word   7
    .data
    .align  2
    .set    .LANCHOR1,. + 0
    .type   b, %object
    .size   b, 4
b:
    .word   5
    .bss
    .align  2
    .set    .LANCHOR0,. + 0
    .type   a, %object
    .size   a, 4
a:
    .space  4
    .ident  "GCC: (GNU) 10.2.0"

And therein lies the key. Understand that assembly language is specific to the assembler (the program), not the target (the CPU/chip): you can have many incompatible assembly languages for the same processor, and so long as they generate the right machine code, they are all useful. This is GNU assembler (gas) assembly language.

.text
nop
add r0,r0,r1
eor r1,r2
b .
.align
.bss
.word 0
.data
.word 0x12345678
.section .rodata
.word 0xAABBCCDD

Disassembly of section .text:

00000000 <.text>:
   0:   e1a00000    nop         ; (mov r0, r0)
   4:   e0800001    add r0, r0, r1
   8:   e0211002    eor r1, r1, r2
   c:   eafffffe    b   c <.text+0xc>

Disassembly of section .data:

00000000 <.data>:
   0:   12345678

Disassembly of section .bss:

00000000 <.bss>:
   0:   00000000

Disassembly of section .rodata:

00000000 <.rodata>:
   0:   aabbccdd

Linked the same way:

Disassembly of section .text:

00001000 <.text>:
    1000:   e1a00000    nop         ; (mov r0, r0)
    1004:   e0800001    add r0, r0, r1
    1008:   e0211002    eor r1, r1, r2
    100c:   eafffffe    b   100c <__data_start-0xff4>

Disassembly of section .data:

00002000 <__data_start>:
    2000:   12345678

Disassembly of section .bss:

00003000 <__bss_start+0xffc>:
    3000:   00000000

Disassembly of section .rodata:

00001010 <_stack-0x7eff0>:
    1010:   aabbccdd

For an MCU with the GNU linker (ld): note that linker scripts, i.e. how you tell the linker what you want, are specific to the linker. Do not assume they are portable in any way to linkers from other toolchains.

MEMORY
{
    rom : ORIGIN = 0x10000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > rom
    .rodata : { *(.rodata*) } > rom
    .data   : { *(.data*)   } > ram AT > rom
    .bss    : { *(.bss*)    } > ram AT > rom
}

First off, I am telling the linker that I want the read-only things in one place and the read/write things in another. Note that the words rom and ram are only names used to connect the dots (for the GNU linker); these work just as well:

MEMORY
{
    ted : ORIGIN = 0x10000000, LENGTH = 0x1000
    bob : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > ted
    .rodata : { *(.rodata*) } > ted
    .data   : { *(.data*)   } > bob AT > ted
    .bss    : { *(.bss*)    } > bob AT > ted
}

Now we get:

Disassembly of section .text:

10000000 <.text>:
10000000:   e1a00000    nop         ; (mov r0, r0)
10000004:   e0800001    add r0, r0, r1
10000008:   e0211002    eor r1, r1, r2
1000000c:   eafffffe    b   1000000c <.text+0xc>

Disassembly of section .rodata:

10000010 <.rodata>:
10000010:   aabbccdd

Disassembly of section .data:

20000000 <.data>:
20000000:   12345678

Disassembly of section .bss:

20000004 <.bss>:
20000004:   00000000

BUT! We have a chance at success with an MCU:

arm-none-eabi-objcopy -O binary so.elf so.bin
hexdump -C so.bin
00000000  00 00 a0 e1 01 00 80 e0  02 10 21 e0 fe ff ff ea  |..........!.....|
00000010  dd cc bb aa 78 56 34 12                           |....xV4.|
00000018
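
`objcopy -O binary` writes the section contents as one flat image starting at the lowest load address, 0x10000000 here, which is why the 12345678 of .data shows up in the file right after the .rodata word. Each word appears byte-reversed because this ARM target stores words little-endian. A small sketch of that byte ordering; the helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Serialize 32-bit words into bytes, least significant byte first
   (little-endian), the order in which they appear in so.bin. */
void put_words_le(uint8_t *out, const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        out[4 * i + 0] = (uint8_t)(words[i] & 0xFFu);
        out[4 * i + 1] = (uint8_t)((words[i] >> 8) & 0xFFu);
        out[4 * i + 2] = (uint8_t)((words[i] >> 16) & 0xFFu);
        out[4 * i + 3] = (uint8_t)((words[i] >> 24) & 0xFFu);
    }
}
```

Feeding it the four .text words, 0xAABBCCDD and 0x12345678 yields the same 0x18 bytes as the dump.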

arm-none-eabi-objcopy -O srec --srec-forceS3 so.elf so.srec
cat so.srec
S00A0000736F2E7372656338
S315100000000000A0E1010080E0021021E0FEFFFFEAFF
S30910000010DDCCBBAAC8
S3091000001478563412BE
S70510000000EA

You can see the AABBCCDD and 12345678:

S30910000010DDCCBBAAC8 AABBCCDD at address 0x10000010
S3091000001478563412BE 12345678 at address 0x10000014
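
Each S3 line consists of the record type S3, a byte count, a 4-byte big-endian address, the data bytes, and a checksum that is the ones' complement of the low byte of the sum of the count, address and data bytes. For the first line above: 0x09 + 0x10 + 0x00 + 0x00 + 0x10 + 0xDD + 0xCC + 0xBB + 0xAA = 0x337, and ~0x37 = 0xC8. A minimal decoder sketch for S3 data records; the struct and function names are illustrative, not from any library:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct s3_record {
    uint32_t address;    /* load address of the first data byte */
    uint8_t  data[250];  /* data bytes, in file order */
    size_t   length;     /* number of data bytes */
};

/* Convert one hex digit; returns -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Read two hex digits at s; returns -1 on a bad digit. */
static int hex_byte(const char *s)
{
    int hi = hex_digit(s[0]), lo = hex_digit(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Parse one S3 line; returns false on bad syntax or checksum mismatch. */
bool parse_s3(const char *line, struct s3_record *rec)
{
    size_t len = strlen(line);
    if (len < 4 || line[0] != 'S' || line[1] != '3') return false;

    int count = hex_byte(line + 2);           /* bytes after the count field */
    if (count < 5 || len < 4 + 2 * (size_t)count) return false;

    unsigned sum = (unsigned)count;
    uint8_t bytes[255];
    for (int i = 0; i < count; i++) {
        int b = hex_byte(line + 4 + 2 * i);
        if (b < 0) return false;
        bytes[i] = (uint8_t)b;
        if (i < count - 1) sum += (unsigned)b; /* checksum byte excluded */
    }
    if ((uint8_t)~sum != bytes[count - 1]) return false;

    rec->address = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
                   ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
    rec->length = (size_t)count - 5;          /* minus 4 address + 1 checksum */
    memcpy(rec->data, bytes + 4, rec->length);
    return true;
}
```

Reading the four data bytes of the first record as a little-endian word gives back 0xAABBCCDD.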

Both words are in flash. The next step, if your linker can help you (and it is no good if it cannot), is to have it tell you where things were placed:

MEMORY
{
    ted : ORIGIN = 0x10000000, LENGTH = 0x1000
    bob : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > ted
    .rodata : { *(.rodata*) } > ted
    __data_rom_start__ = .;
    .data   : 
        {
            __data_start__ = .;
            *(.data*)   
        } > bob AT > ted
    .bss    : 
        { 
            __bss_start__ = .;
            *(.bss*)    
        } > bob AT > ted
}

This essentially creates variables/labels that code in other languages, here assembly language, can see and use:

.text
nop
add r0,r0,r1
eor r1,r2
b .
.align
.word __data_rom_start__
.word __data_start__
.word __bss_start__
.bss
.word 0
.data
.word 0x12345678
.section .rodata
.word 0xAABBCCDD

Disassembly of section .text:

10000000 <.text>:
10000000:   e1a00000    nop         ; (mov r0, r0)
10000004:   e0800001    add r0, r0, r1
10000008:   e0211002    eor r1, r1, r2
1000000c:   eafffffe    b   1000000c <__data_rom_start__-0x14>
10000010:   10000020
10000014:   20000000
10000018:   20000004

Disassembly of section .rodata:

1000001c <__data_rom_start__-0x4>:
1000001c:   aabbccdd

Disassembly of section .data:

20000000 <__data_start__>:
20000000:   12345678

Disassembly of section .bss:

20000004 <__bss_start__>:
20000004:   00000000

S00A0000736F2E7372656338
S315100000000000A0E1010080E0021021E0FEFFFFEAFF
S311100000102000001000000020040000205A
S3091000001CDDCCBBAABC
S3091000002078563412B2
S70510000000EA

The tools placed the .data image at 0x10000020 in flash, as this record shows:

S3091000002078563412B2

The symbol values, stored in flash by the three .word directives, are:

10000010: 10000020 __data_rom_start__
10000014: 20000000 __data_start__
10000018: 20000004 __bss_start__

arm-none-eabi-nm so.elf 
20000004 B __bss_start__
10000020 R __data_rom_start__
20000000 D __data_start__

Add some more of these kinds of symbols (note that getting these things right in a GNU ld linker script is a PITA), and you can then write some assembly language code to copy the .data items to RAM, as you now know where in the binary and where in RAM the linker placed things, and where .bss is and how much memory to clear/zero.

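The copy-and-zero step can be sketched in C (it is often written in assembly instead, since it runs before the C environment is ready). In this sketch the pointers are parameters; on a real target they would come from linker-script symbols such as __data_rom_start__, __data_start__ and __bss_start__, plus end-of-section symbols you would still have to add to the script (for example, hypothetical __data_end__ and __bss_end__). The function name is also hypothetical, and the word-sized loops assume the sections are 4-byte aligned.

```c
#include <stdint.h>

/* Hypothetical bootstrap step: copy the initial values of .data from their
   load address in flash to their run address in RAM, then zero .bss. */
void init_data_and_bss(const uint32_t *data_load, uint32_t *data_start,
                       uint32_t *data_end, uint32_t *bss_start,
                       uint32_t *bss_end)
{
    while (data_start < data_end) {
        *data_start++ = *data_load++;   /* flash copy -> RAM */
    }
    while (bss_start < bss_end) {
        *bss_start++ = 0;               /* .bss starts out as zero */
    }
}
```

On the target this would run once, after the stack pointer is set and before main() or any other compiled code that reads globals.
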
Dynamic memory allocation is not desirable in baremetal code, often because baremetal these days means microcontroller work. It is not limited to that: an operating system itself is a baremetal program, booted by another baremetal program, a bootloader. But with an MCU your resources, in particular RAM, are quite limited. If you use, say, globals instead of locals, and statically declare things instead of allocating them dynamically, then most of your SRAM usage can be seen using the tools, and it can also be limited by the linker script.

arm-none-eabi-readelf -l so.elf

Elf file type is EXEC (Executable file)
Entry point 0x10000000
There are 2 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x010000 0x10000000 0x10000000 0x00020 0x00020 R E 0x10000
  LOAD           0x020000 0x20000000 0x10000020 0x00004 0x00008 RW  0x10000

 Section to Segment mapping:
  Segment Sections...
   00     .text .rodata 
   01     .data .bss

Normally you set the region sizes in the linker script to match the target hardware; here the RAM region is shrunk to exaggerate the effect for demonstration purposes:

bob : ORIGIN = 0x20000000, LENGTH = 0x4

arm-none-eabi-ld -T flash.ld so.o -o so.elf
arm-none-eabi-ld: so.elf section `.bss' will not fit in region `bob'
arm-none-eabi-ld: region `bob' overflowed by 4 bytes
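
The arithmetic behind that error: with bob shrunk to 4 bytes, .data (4 bytes) and .bss (4 bytes) together need 8 bytes, 4 more than the region holds. A simplified sketch of the same check; the helper is hypothetical, and the real linker also accounts for alignment padding and everything else assigned to the region:

```c
#include <stdint.h>

/* Bytes by which sections placed back to back overflow a memory region,
   or 0 if they fit. Alignment padding is ignored in this sketch. */
uint32_t region_overflow(uint32_t region_length,
                         const uint32_t *section_sizes, unsigned count)
{
    uint64_t used = 0;
    for (unsigned i = 0; i < count; i++) {
        used += section_sizes[i];
    }
    return used > region_length ? (uint32_t)(used - region_length) : 0u;
}
```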

If you use too much dynamic allocation, be it local variables or the malloc() family of calls, then you have to analyze consumption to see whether your stack overflows into data, or your data into the stack, which can be quite difficult at best.

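Rather than deriving stack consumption analytically, a common practical approach is stack painting: fill the stack region with a known marker early at boot, exercise the application, then scan for how far the marker was overwritten. A host-side sketch, assuming a stack that grows toward lower addresses (as on ARM) and an arbitrary marker value; the names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu  /* arbitrary marker value */

/* Fill the whole stack region with the marker (done early at boot). */
void stack_paint(uint32_t *stack_lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack_lowest[i] = STACK_FILL;
    }
}

/* The stack grows down from the top of the region, so untouched marker
   words remain at the low end; everything above them has been used. */
size_t stack_high_water_words(const uint32_t *stack_lowest, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack_lowest[untouched] == STACK_FILL) {
        untouched++;
    }
    return words - untouched;
}
```

The result only reflects the code paths that actually ran, and a pushed value that happens to equal the marker could make it read low, so treat it as an estimate rather than a proof.
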
Also understand that baremetal, meaning no operating system, greatly limits the C libraries you can use, as a large share of their functions rely on an operating system for something, specifically the alloc functions in general. So to have dynamic memory allocation at runtime at all, you need to implement the back end that the C library's allocation functions rely on (hint: use your linker script to find out the size/location of unused RAM). Dynamic memory allocation at runtime is therefore discouraged, but there are times you will want it and will need to implement it.

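As a sketch of such a back end, here is a bump allocator that hands out memory from a fixed arena and never frees it. The static array stands in for the unused RAM your linker script would reveal. With newlib, for example, malloc obtains memory by calling an _sbrk function that a baremetal application usually has to supply, and that function could work much like this. The names and the 1024-byte size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 1024u  /* hypothetical; on target: the unused RAM region */

static _Alignas(8) uint8_t arena[ARENA_SIZE];
static size_t arena_used;  /* bytes handed out so far */

/* Return size bytes starting on an 8-byte boundary, or NULL when the arena
   is exhausted. There is deliberately no matching free(). */
void *bump_alloc(size_t size)
{
    size_t start = (arena_used + 7u) & ~(size_t)7u;  /* round up to 8 */
    if (start > ARENA_SIZE || size > ARENA_SIZE - start) {
        return NULL;
    }
    arena_used = start + size;
    return arena + start;
}
```

Never freeing is often acceptable in MCU code when allocation only happens during initialization, and it makes worst-case RAM use easy to reason about.
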
Assembly language is obviously free to use a stack, as it is just another part of the architecture, and there are often stack-specific instructions that the assembly language supports. Heap functions, and any other C library call, can be made from assembly language, since assembly language by definition can call labels/addresses just as C can.

#include <stdlib.h>

unsigned char * fun ( unsigned int x )
{
    return malloc(x);
}

fun:
    push    {r4, lr}
    bl  malloc
    pop {r4, lr}
    bx  lr

.text, .rodata, .data, .bss, stack, and heap are all available to assembly language, at least with assemblers geared toward object files and linking. Some assemblers are meant to process a single file, without objects and linkers, so they have no need for sections and will instead have things like:

.org 0x1000
nop
add r0,r1,r2
.org 0x2000
.word 0x12345678

Here you declare the specific address of things in the assembly language itself. Some tools may let you mix these concepts, but it can get quite confusing for both you and the tools.

With heavily used modern tools like GNU binutils and clang/LLVM, the notion of sections is available for all of the supported languages, as are function/library calls from one object to another (you can have and use a C library regardless of the language used to call it).

Answer 2:

Generally it's up to you.

Your assembler will support sections, but if you want, you can just put everything in one section and then forget about sections entirely.

Most CPUs have a stack, which just means they have a stack pointer register, and specific instructions for pushing and popping. The top of the stack (the last pushed item) is wherever the stack pointer register says it is. And the CPU doesn't actually care where the bottom is. Usually, you should put an instruction at the beginning of your assembly program, which sets the stack pointer to a particular address, where you want the bottom of the stack to be.
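
A model of push and pop on a full-descending stack, the convention ARM uses: push decrements the stack pointer and then stores, pop loads and then increments. The array stands in for RAM and the index for the sp register; all names are illustrative.

```c
#include <stdint.h>

#define STACK_WORDS 8u

struct cpu_stack {
    uint32_t mem[STACK_WORDS];  /* the RAM reserved for the stack */
    unsigned sp;                /* index of the last pushed word */
};

/* Setting sp just past the top of the region: the stack is empty. */
void stack_init(struct cpu_stack *s) { s->sp = STACK_WORDS; }

/* Full descending: decrement first, then store. Returns 0 on overflow. */
int stack_push(struct cpu_stack *s, uint32_t value)
{
    if (s->sp == 0u) return 0;
    s->mem[--s->sp] = value;
    return 1;
}

/* Load the top word, then increment. Returns 0 if the stack is empty. */
int stack_pop(struct cpu_stack *s, uint32_t *value)
{
    if (s->sp == STACK_WORDS) return 0;
    *value = s->mem[s->sp++];
    return 1;
}
```

stack_init plays the role of that first instruction: you choose where the empty stack starts, usually the top of RAM. Many CPUs perform no limit check like the one in stack_push; a stack that grows too far simply overwrites whatever lies below it.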

The heap is something created by your program. The CPU doesn't know about it at all, and neither does the assembler. You might be able to link with the malloc library from C (assembly programs can still use libraries, even libraries that are written in C). Or you might not. You could also create your own malloc.

Source: https://stackoverflow.com/questions/64827373/embedded-system-memory-layout-when-using-assembly-language

Tags

assembly

embedded

microcontroller

bare-metal