How does the CPU know whether the data in memory is data or a command? (C programming)

Question

I expect that data begins at, for example, 0x100 (a memory location) and that everything before that is commands... But I am really not sure! Thanks for the help!

Okay, to detail my question: I see memory as a long array of one-byte cells, each filled with a hex number. Variables can occupy memory anywhere from, for example, 0x0000 to 0xffff. So how do you know whether, for example, 0x002f is a command (for example 'mov') or just a number stored as data?

Answer 1:

The CPU doesn't know. It's all about conventions.

When you start your computer or embedded device, it starts executing a bootloader from flash storage.

In turn, the bootloader loads startup code (typically the OS kernel) from persistent storage into memory and starts executing it at the address where it was loaded.

In turn, the kernel will load additional modules and init code at known memory locations and execute from there.

At some point, memory virtualization is enabled, executable files are loaded into memory, and each is associated with a process and its address space. The executable header and OS conventions define the locations of the code and data segments.

But the code segment may contain embedded data, and dynamically allocated memory may contain code, for instance in just-in-time compilers or malicious programs.

Ok to illustrate: imagine the address space of a running compiler...

[ code seg from .exe ] [ data seg from .exe ] [ dynamic alloc ]

The only difference between these memory regions is their permissions: [code] is Read+Execute, [data] is ReadOnly (for constants; writable globals are Read+Write), and [dynamic] is Read+Write, sometimes +Execute.

[code] contains
- mainly machine instructions
- but it may contain immediate data, such as integer constants and so on
[data] contains
- things like strings for your language keywords, error messages, etc.
- code because you have a dictionary of machine instructions so that you can generate code
[dynamic] contains
- runtime allocated data structures such as strings, trees, etc.
- runtime generated code, which will be written to the exe the compiler is building
- runtime generated code, which will be executed by a jit for computing complex expressions at compile-time

So you see, you can have data and code in any section. That's what makes computers powerful; see also Turing machines.
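To make the "dictionary of machine instructions" point concrete, here is a minimal sketch of how a compiler might treat machine code as plain data. The helper name emit_return_constant is made up for this illustration, and it assumes the x86 encoding of `mov eax, imm32` (0xB8 followed by a little-endian 32-bit value) and `ret` (0xC3). The bytes are only written into a buffer, never executed, so this compiles and runs on any architecture:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the answer above): writes the x86 encoding of
 *     mov eax, value
 *     ret
 * into buf. Returns the number of bytes written, or 0 if buf is too small.
 * To this function the machine code is just an array of numbers. */
size_t emit_return_constant(uint8_t *buf, size_t cap, uint32_t value)
{
    if (cap < 6)
        return 0;

    buf[0] = 0xB8;                             /* opcode: mov eax, imm32   */
    buf[1] = (uint8_t)(value & 0xFF);          /* immediate, little-endian */
    buf[2] = (uint8_t)((value >> 8) & 0xFF);
    buf[3] = (uint8_t)((value >> 16) & 0xFF);
    buf[4] = (uint8_t)((value >> 24) & 0xFF);
    buf[5] = 0xC3;                             /* opcode: ret              */
    return 6;
}
```

Calling emit_return_constant(buf, sizeof buf, 42) fills buf with B8 2A 00 00 00 C3. Whether those six bytes are "data" or "code" depends only on what happens next: written to a file they are data, and placed in an executable page on an x86 machine with the instruction pointer aimed at them they are code.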

Answer 2:

In general, the CPU doesn't know. It just executes instructions. However, the operating system, with the help of the memory management unit, may flag a memory page as executable or not, preventing the CPU from executing the contents of a non-executable page.
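As a rough sketch of how a program sees those page flags, the following assumes a POSIX system (mmap, mprotect and sysconf; this will not build on plain Windows). The function name demo_page_flags is invented for this example. It asks the OS for one page that is readable and writable but not executable, fills it, and then drops the write permission:

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if every step succeeded, -1 otherwise. */
int demo_page_flags(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    size_t len = (size_t)page_size;

    /* Readable and writable, but PROT_EXEC is not set: jumping into this
     * page would normally fault on hardware that enforces the flag. */
    unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memset(page, 0x2f, len);   /* just bytes; nothing marks them as code */

    /* Change the flags: the page is now read-only and still not executable.
     * A JIT would request PROT_READ | PROT_EXEC at this point instead; that
     * is left out here because some hardened systems refuse it. */
    int rc = mprotect(page, len, PROT_READ);
    int ok = (rc == 0 && page[0] == 0x2f);

    munmap(page, len);
    return ok ? 0 : -1;
}
```

The bytes in the page never change between the two calls; only the permissions attached to the page do. That is the sense in which the OS and MMU, not the bytes themselves, decide what may be executed.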

Answer 3:

How does the CPU know whether the data in memory is data or a command?

It only knows from the context - you can (leaving aside execution protection of modern CPUs) very well set your instruction pointer to a block of data and let the CPU execute it (most likely with useless results). The same is true for the other way round: when an instruction is accessing a memory location, it is taken as data to work with - but it could even be part of your code memory (reading the opcodes of CPU instructions).

So in essence, there is no distinction between data and code memory, at least not on CPUs built according to the von Neumann architecture.
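A toy model can make this "context" idea tangible. The sketch below is an invented three-instruction machine, not any real CPU: the opcodes, the run function and the demo_mem contents are all assumptions made for illustration. A byte is treated as an instruction only because the program counter points at it, and as data only because an instruction's operand points at it:

```c
#include <stddef.h>
#include <stdint.h>

/* Invented opcodes for a toy machine; each non-HALT instruction is
 * followed by a one-byte address operand. */
enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_ADD = 0x02 };

/* One flat memory holding both program and data:
 *   address 0: LOAD 5   -> acc = mem[5]
 *   address 2: ADD  0   -> acc += mem[0]  (mem[0] is the LOAD opcode byte!)
 *   address 4: HALT
 *   address 5: 0x2f     -> a plain number */
const uint8_t demo_mem[6] = { OP_LOAD, 0x05, OP_ADD, 0x00, OP_HALT, 0x2f };

/* Executes from pc until HALT, an unknown opcode, or the end of memory.
 * Returns the final accumulator value. */
int run(const uint8_t *mem, size_t size, size_t pc)
{
    int acc = 0;
    while (pc < size) {
        uint8_t opcode = mem[pc++];      /* fetched via pc: treated as code */
        if (opcode == OP_HALT || pc >= size)
            break;
        uint8_t addr = mem[pc++];        /* operand of the instruction */
        if (addr >= size)
            break;
        if (opcode == OP_LOAD)
            acc = mem[addr];             /* read via operand: treated as data */
        else if (opcode == OP_ADD)
            acc += mem[addr];
        else
            break;                       /* unknown opcode: stop */
    }
    return acc;
}
```

Running run(demo_mem, 6, 0) returns 48: the byte 0x2f at address 5 is loaded as data (47), and then the byte at address 0, which was just executed as the LOAD opcode, is added as the number 1. Starting at a different pc makes the machine interpret the very same bytes differently.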

On the other hand, there are also CPUs (e.g. the PIC microcontroller series) which are built according to the Harvard architecture. On these systems, it is generally not possible to execute data memory as code, or to read code with ordinary data accesses, since the access paths are physically separated.

I expect that data begins at, for example, 0x100

That depends on the CPU and the operating system. You are probably referring to the old .COM files from MS-DOS (where code began at 0x100) or to some other architecture. On modern operating systems, virtual memory management is usually involved.

Source: https://stackoverflow.com/questions/22222337/how-does-the-cpu-knows-that-the-data-in-memory-is-data-or-a-command-c-programm

Tags

memory