Are instruction set and assembly language the same thing?

Question

I was wondering if instruction set and assembly language are the same thing?

If not, how do they differ and what are their relations?

Thanks and regards!

Answer 1:

I think everyone is giving you the same answer. The instruction set is the set (as in math) of all instructions the processor can execute or understand. Assembly language is a programming language.

Let me try some examples based on some of the questions you are asking. And I am going to be jumping around from processor to processor with whatever code I have handy.

Instruction, opcode, binary, machine language: use whatever term you want for the bits/bytes that are loaded into the processor to be decoded and executed. An example:

0x5C0B

The assembly language would be

add r12,r11

For this particular processor. In this case that means r11 = r11 + r12. So I put that text, add r12,r11, in a text file and use an assembler (a program that assembles assembly language) to turn it into some form of binary. As with any programming language, sometimes you create object files and then link them together, and sometimes you can go straight to a binary. There are many binary file formats, some stored as ASCII text and some as raw binary, which is a whole other discussion.
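The step from the text add r12,r11 to the word 0x5C0B can be sketched as packing bit fields. The field layout below is an assumption, modeled on the MSP430 two-operand format (the answer deliberately does not name the processor), and the function and table names are made up for illustration.

```python
# Assumed field layout (modeled on the MSP430 two-operand format):
# bits 15-12 opcode | 11-8 source reg | 7 dest mode | 6 byte/word | 5-4 source mode | 3-0 dest reg
OPCODES = {"mov": 0x4, "add": 0x5}  # hypothetical table holding just two mnemonics


def encode_two_operand(mnemonic, src_reg, dst_reg, src_mode=0, dst_mode=0, byte_op=0):
    """Pack one two-operand instruction into a 16-bit word."""
    return (
        (OPCODES[mnemonic] << 12)
        | (src_reg << 8)
        | (dst_mode << 7)
        | (byte_op << 6)
        | (src_mode << 4)
        | dst_reg
    )


# add r12,r11 : source is r12, destination is r11, both in register mode
word = encode_two_operand("add", 12, 11)
print(hex(word))  # 0x5c0b
```

A real assembler does the same thing at a larger scale: it parses the text, looks up the mnemonic, and fills in the fields the instruction set defines.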

Now what can you do in assembler that is not part of the instruction set? How do they differ? Well for starters you can have macros:

.macro add3 arg1, arg2, arg3

    add \arg1,\arg3
    add \arg2,\arg3

.endm


.text

   add3 r10,r11,r12

Macros are like inline functions: they are not functions that are called, but generate code inline, no different from a C macro for example. So you might use them to save some typing, or to abstract something that you want to do over and over again so that you can change it in one place without touching every instance. The above example essentially generates this:

add r10,r12
add r11,r12
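What the assembler does with the macro is plain text substitution before any encoding happens. A minimal sketch, assuming the backslash-parameter syntax shown above; expand_macro and the ADD3_* names are hypothetical and not part of any real assembler:

```python
import re


def expand_macro(params, body, args):
    # Bind each formal parameter name to the argument text supplied at the call site.
    binding = dict(zip(params, args))
    # Replace every \name in the body with the bound argument.
    pattern = re.compile(r"\\(\w+)")
    return [pattern.sub(lambda m: binding[m.group(1)], line) for line in body]


ADD3_PARAMS = ["arg1", "arg2", "arg3"]
ADD3_BODY = [r"add \arg1,\arg3", r"add \arg2,\arg3"]

# add3 r10,r11,r12
for line in expand_macro(ADD3_PARAMS, ADD3_BODY, ["r10", "r11", "r12"]):
    print(line)
```

Only the expanded lines ever reach the encoder; the processor never sees anything called add3.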

Another difference between the instruction set and assembly language is pseudo instructions. For this particular instruction set, for example, there is no pop instruction for popping things off the stack, at least not by that name, and I will explain why. But you are allowed to save some typing and use a pop in your code:

pop r12

The reason there is no pop is that the addressing modes are flexible enough to read from the address in the source register, put the value in the destination register, and increment the source register by a word. In assembler for this instruction set that is

mov @r1+,r12

Both the pop and the mov result in the opcode 0x413C.
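A pseudo instruction can be modeled as a rewrite that happens before encoding. The sketch below assumes the same MSP430-style field layout that reproduces 0x413C (an assumption, since the processor is not named here), and it only understands the single form mov @rS+,rD; all names are hypothetical.

```python
import re

# Hypothetical table: pseudo instruction -> real instruction template.
PSEUDO = {"pop": "mov @r1+,{0}"}


def lower_pseudo(line):
    """Rewrite a pseudo instruction into the real instruction it stands for."""
    mnemonic, _, operands = line.strip().partition(" ")
    template = PSEUDO.get(mnemonic)
    return template.format(operands.strip()) if template else line.strip()


def encode_mov_autoincrement(line):
    """Encode 'mov @rS+,rD' only; assumed layout: opcode 4, source mode 3."""
    match = re.fullmatch(r"mov @r(\d+)\+,r(\d+)", line)
    if match is None:
        raise ValueError("this sketch only handles mov @rS+,rD")
    src, dst = int(match.group(1)), int(match.group(2))
    return (0x4 << 12) | (src << 8) | (0x3 << 4) | dst


print(lower_pseudo("pop r12"))                                  # mov @r1+,r12
print(hex(encode_mov_autoincrement(lower_pseudo("pop r12"))))   # 0x413c
print(hex(encode_mov_autoincrement("mov @r1+,r12")))            # 0x413c
```

The two spellings are indistinguishable once assembled, which is why a disassembler is free to show either one.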

Another example of differences between the instruction set and assembler, switching instruction sets, is something like this:

ldr r0,=bob

To this assembly language that means load the address of bob into register 0. There is no instruction for that; what the assembler does with it is generate something that would look like this if you were to write it in assembler by hand:

ldr r0,ZZ123
...
ZZ123: .word bob

Essentially, in a place reachable from that instruction but not in the execution path, a word is created which the linker will fill in with the address of bob. The ldr instruction is then encoded, by the assembler or the linker, as a pc-relative load of that word.
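The rewrite of ldr r0,=bob can be sketched as a source-to-source pass that collects the constants into a pool. This is a simplified illustration, not how any particular assembler is implemented: the generated labels (ZZ0, ZZ1, ...) are hypothetical, and the pool is simply appended after the code, which assumes the code ends in a return or branch so that the pool is never executed.

```python
def expand_literal_loads(lines):
    """Turn 'ldr rX,=symbol' into a load from a generated pool label."""
    code, pool = [], []
    for line in lines:
        text = line.strip()
        if text.startswith("ldr ") and ",=" in text:
            head, symbol = text.split(",=", 1)
            label = f"ZZ{len(pool)}"                    # hypothetical generated label
            pool.append(f"{label}: .word {symbol}")     # the linker fills in the address
            code.append(f"{head},{label}")              # later encoded pc-relative
        else:
            code.append(text)
    return code + pool


for line in expand_literal_loads(["ldr r0,=bob", "bx lr"]):
    print(line)
```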

That leads to a whole category of differences between the instruction set and the assembly language:

call fun

Machine code has no way of knowing what fun is or where to find it. For this instruction set with its many addressing modes (note I am specifically and intentionally avoiding naming the instruction sets I am using, as that is not relevant to the discussion), the assembler or the linker, depending on where the fun function ends up relative to this instruction, chooses the encoding.

The assembler may choose to encode that instruction as pc relative: if the fun function is 40 bytes ahead of the call instruction, it may encode it with the equivalent of call pc+36 (take four off because the pc is one instruction ahead at execution time and this is a 4 byte instruction).

Or the assembler may not know where or what fun is and leave it up to the linker, and in that case the linker may put in the absolute address of the function, something similar to call #0xD00D.
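The arithmetic behind the pc+36 figure, as a small sketch. The function name and the example addresses are made up; the only assumptions taken from the text are a 4-byte instruction and a pc that already points at the next instruction when the call executes.

```python
def pc_relative_operand(call_address, target_address, instruction_size=4):
    """Distance the instruction must encode, measured from the pc at execution time."""
    pc_at_execution = call_address + instruction_size  # pc is one instruction ahead
    return target_address - pc_at_execution


# fun sits 40 bytes after the call instruction
print(pc_relative_operand(0x1000, 0x1000 + 40))  # 36
```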

Same goes for loads and stores, some instruction sets have near and far pc relative, some have absolute address, etc. And you may not care to choose, you may just say

mov bob,r1

and the assembler or linker or a combination of the two takes care of the rest.

Note that for some instruction sets the assembling and linking may happen at once in one program. These days we are used to the model of compiling to objects and then linking objects, but not all assemblers follow that model.

Some more cases where the assembly language can take some shortcuts:

hang: b hang
  b .
  b 2f
1:
  b 1b
  b 1f
1:
  b 1b
2:

The hang: b hang makes sense: branch to the label called hang, essentially a branch to self, and as the name implies this is an infinite loop. But for this assembly language b . also means branch to self, an infinite loop, and I didn't have to invent a label, type it, and branch to it. Another shortcut is numeric labels: b 1b means branch to 1 back, so the assembler looks for the label number 1 behind (above) the instruction. The b 1f, which is not a branch to self, means branch to 1 forward; this is perfectly valid code for this assembler. It will look forward (below the line of code) for a label number 1:. You can re-use number 1 like crazy in your assembly language program for this assembler, which saves having to invent label names for simple short branches. The second b 1b branches to the second 1: and is a branch to self.
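How an assembler might resolve the numeric labels can be sketched as a two-pass lookup. This is an illustration only: it handles just lines of the form "N:" and "b Nf" / "b Nb", and it reports destinations as instruction indices rather than addresses.

```python
def resolve_numeric_labels(lines):
    """Return, for each branch, the index of the instruction it jumps to."""
    instructions = []   # branch lines, in order
    definitions = {}    # label number -> instruction indices the label precedes
    for line in lines:
        text = line.strip()
        if text.endswith(":"):
            definitions.setdefault(text[:-1], []).append(len(instructions))
        else:
            instructions.append(text)

    destinations = []
    for index, text in enumerate(instructions):
        target = text.split()[1]                  # e.g. "1b" or "2f"
        number, direction = target[:-1], target[-1]
        places = definitions[number]
        if direction == "b":                      # nearest definition at or above this line
            destinations.append(max(p for p in places if p <= index))
        else:                                     # nearest definition below this line
            destinations.append(min(p for p in places if p > index))
    return destinations


source = ["b 2f", "1:", "b 1b", "b 1f", "1:", "b 1b", "2:"]
print(resolve_numeric_labels(source))  # [4, 1, 3, 3]
```

Instructions 1 and 3 resolve to their own index, which is the branch to self described above, while b 1f at index 2 lands on the second 1: label.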

It is important to understand that the company that created the processor defines the instruction set and the machine code, or opcodes, or whatever term they or you use for the bits and bytes the processor decodes and executes. Very often that company will produce a document with an assembly language for those instructions, a syntax. Often that company will also produce an assembler program to assemble that assembly language, using that syntax. But that doesn't mean that any other person on the planet who chooses to write an assembler for that instruction set has to use that syntax. This is very evident with the x86 instruction set. Likewise, pseudo instructions like the pop above, macro syntax, and other shortcuts like b 1b do not have to be honored from one assembler to another, and very often are not. You see this with ARM: for example, the universal comment symbol ; does not work with the GNU assembler, where you have to use @ instead. ARM's assembler does use ; (note that I write my ARM assembler comments with ;@ to make them portable). It gets even worse with the GNU tools: for example, you can put C language things like #define and /* comment */ in your assembler source, use the C compiler instead of the assembler, and it will work. I prefer to stay as pure as I can for maximum portability, but naturally you may choose to use whatever features the tool offers.

Answer 2:

The instruction set is composed of all the instructions a processor can execute, while assembly is the programming language that uses these instructions to make programs.
In other words, the instruction set is just a group of byte patterns a CPU can understand, but on their own you can't do anything useful with them (think of the instructions as the letters of the alphabet), while assembly is a language which lets you combine these instructions (or letters) to make a program (something like a speech).

Answer 3:

An assembly language will include mnemonics for the instructions but normally adds quite a bit more, such as:

- macros
- some way to define data
- ways to define names (e.g., for functions)

Edit: An instruction (per se) is encoded in binary for the CPU to read. The mnemonic is a name for the instruction. For example, in assembly language I might write "mov ax, 1". In 16-bit x86 code the corresponding instruction is encoded as the bytes B8 01 00 (in hexadecimal): the opcode B8 followed by the 16-bit immediate 0001, stored low byte first.
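As a sketch of how the mnemonic maps to bytes, the snippet below encodes the x86 "mov r16, imm16" form, assuming 16-bit code with no operand-size prefix. The opcode is B8 plus the register number, and the immediate follows in little-endian order, so in memory the bytes for "mov ax, 1" are B8 01 00. The function and table names are hypothetical.

```python
# Register numbers used by the "B8 + register" opcode family (16-bit registers).
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3}


def encode_mov_r16_imm16(register, value):
    """Encode 'mov r16, imm16' as raw bytes (16-bit code assumed)."""
    opcode = 0xB8 + REG16[register]
    return bytes([opcode]) + value.to_bytes(2, "little")


print(encode_mov_r16_imm16("ax", 1).hex(" "))  # b8 01 00
```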

Defining data, macros, names for functions, etc., are not actual instructions. A macro (much like a macro in C, etc.) allows you to define names during the assembly process. It might (often will) result in generating some instructions, but those are separate from the macro definition itself. Much like in C, when you define some data that will typically result in a record in the object file specifying some amount of space for name X, but doesn't directly generate any instructions.

Answer 4:

An assembly language is more than just a superset of the instruction set: it's a way of generating object files, symbols, debug info, linkage, and also to have some minimal structured programming even at this level. (Somewhat building on other answers/comments here)

- Object file layout. For example, sections: code, data, read-only, debug, dynamic linkage. The common 'org' directive tells the assembler the location of instructions/data.
- Pre-processing. This includes macros (inline expansion, repetition) and sometimes structured programming (structure layout, defining alias names for registers).
- Data definition. Either including files wholesale, or defining a byte/word at a time, e.g. ".byte", ".word", ".dw", depending on your architecture.

Most C compilers generate assembly, which is then passed to the assembler to create object files. If you look at the output of gcc when run with flag '-S', you'll see most of the above being used. If you have debug turned on ('-g') and any dynamic linkage (default these days) you'll see a huge amount of assembly not devoted to just instructions.

Answer 5:

A computer (more precisely, a processor) can only do computation, i.e. perform arithmetic and logical operations.

A single arithmetic or logical operation is called an instruction.

The collection of all instructions is called the instruction set of that computer (more precisely, of that processor).

The instruction set is either hard-wired in the processor or implemented using a technique called microcode.

A computer can only be programmed if it has a language, i.e. something it understands. Raw binary code by itself is not the language of the computer; the binary-coded instruction set is.

A language is nothing but a specification on paper. The first language designed on paper was machine language. Its implementation in a computer was only possible through hardware (or, more recently, microcode). That implementation is called the instruction set. All other languages are designed on top of machine language.

Machine language was difficult to work with, since in daily life we mostly work with alphabets. Therefore a mnemonic language called assembly language was introduced on top of machine language. The implementation of assembly language is called an assembler.

[You may wonder how the first assembler was written. The first assembler may or may not have been written in machine language. I'm not going into the concept of bootstrapping here, for the sake of simplicity.]

SUMMARY:

The assembler converts assembly language into instructions from the instruction set. The two are different sides of a coin, with a layer of abstraction, the mnemonic code, between them. Machine language is the "bit encoding" of a processor's instruction set. Assembly language is the "symbolic encoding" of a processor's instruction set.

Answer 6:

When you look into the Wikipedia article on Assembly language you linked to in your question, there is an example below showing assembly language instructions and corresponding object code. Both are different representations of the same thing: instructions from a processor's instruction set. But only the column with the title "Instruction (AT&T syntax)" contains assembly language.

Hope this makes it clearer.

Answer 7:

Everything is built as a layered architecture with "strict (most of the time) and well-defined interfaces".

Start from hardware

There are many layers before you reach the processor.

By layers I mean that we start from physics -> devices (electronics) -> analog (amplifiers) -> gates -> digital circuits -> micro-architecture -> architecture (ISA, processor).
Starting from the processor, it has two parts (as most embedded systems have): hardware and software.
The software part is called the ISA (Instruction Set Architecture).

It contains all the instructions that a given processor can support. This means an ISA is bound to one kind of processor (hardware such as x86).
The important question is why this ISA is required. As I said earlier, it is a strict and well-defined interface. The processor cannot run any instruction outside the ISA. [Strict]

But anyone who wants to use this processor can use the commands from the ISA to get their work done. [Well-defined interface]

Now come to assembly, C, assembler, compiler ...

As you know, we use a layered architecture in hardware to implement a processor.

You can read more about why this layered architecture is used. It makes it easy to deal with a big problem step by step.
The same applies here. What do we want? What is our goal?

We want the user to be able to use this processor easily. Here the user is the programmer.
Now see the difficulty for the programmer.

Can a programmer remember all the instructions of a processor in binary form? And the processor may change in the next application, say from Intel to IBM (ignoring specific versions for now).
- So here we also have a layered architecture [not fixed].
- 1) Assembler - Compiler
- 2) Assembler

The assembler is also a layer, and it has two interfaces. The same goes for the compiler.

Example: you write code in C. The processor cannot understand this code. It understands only what is written in binary form and defined by the instructions in the ISA. But it is difficult to write (maintain, modify) a program directly in ISA instructions.

1) So the user writes code in C, which a C compiler understands, because the user is restricted to the syntax given by C. That means the C compiler gives the user a standard and well-defined interface at one end. At the other end it can emit ISA instructions directly or use another interface called the "assembler".

2) If you are using an assembler, the compiler translates all the C code into the syntax given by the assembler. The syntax that the assembler provides to the compiler is called assembly language. It is also a well-defined interface, and anyone can use it to program in assembly language. At the other end, the assembler converts all its syntax (mnemonics and directives, which are not present in the ISA) into the binary-coded instructions of the ISA.

Here are some examples of this translation.

In C = hello.c
In Assembly Code = hello.s
In Object Code = hello.obj (No Linking Done: More Info)

In this file, the line "Machine: Advanced Micro Devices X86-64" tells us which processor is targeted, and therefore which ISA and assembler are being used. The C programmer does not need to be aware of this and is free to code in C. That is the benefit of a "well-defined interface".

In Machine Code = hello.binary (After Linking: More Info)

To compare, just see:

hello.c (C program)
hello.asm2bin (Object File Table: direct mapping Mnemonics and Binary Instruction)
hello.asm2bin_exe (Binary File Table: More mapping after linking)

You will see the line "Disassembly of section .." in these files. What the assembler does is assemble ISA instructions (bit patterns) from assembly language, so here we see the ISA instruction first and then its disassembly back into mnemonics.

All the files are at this link (download and open):

https://www.dropbox.com/sh/v2moak4ztvs5vb7/AABRTxl7KQlqU2EkkMkKssqYa?dl=0

In Linux you can use vim or emacs to open these files.
In Windows, just use vim, or right-click the file, choose "Open -> Select a program from ...", and select a text editor of your choice.

Source: https://stackoverflow.com/questions/5382130/are-instruction-set-and-assembly-language-the-same-thing

Tags

assembly

instruction-set