Encoding/decoding PDP-11 assembly language to binary and hex?

问题

I've been given an assignment where I'm to write the actionPreformed methods for Encode and Decode buttons for an application where a user can hypothetically encode/decode PDP-11 instructions (that are limited to ADD, SUB, MOV[B], CMP[B]). The application is supposed to be able to take a 4 digit hex number or a 16 digit binary and decode it to assembly, or enter one of the assembly instructions and encode it to binary/hex.

This is all rather new to me and I can't seem to wrap my head around how I can do this. Does anyone have a way to break this process down a little more simply, or have any pointers of where I should get started? I'm not too sure where to begin. Thank you!

回答1:

If you have a list of items, say

lampshade
tweezers
glass
door
flowers

^{_{Courtesy of randomlists.com}}

You can assign each one a number (1 = lampshade, 2 = tweezers, etc.), we call this encoding.
This maps things into numbers, but numbers are still abstract concepts, we need to use concrete things to encode them.
The number of concrete things used is finite and fixed at design times, it determines how big the numbers can be.
Due to the current and past technology used to build computers, it is customary to encode numbers using a finite set of bi-stable entities, each entity has two states that we traditionally denote as "0" and "1" and n such entries have 2ⁿ possible state.
If we denote a specific state with a graphism where all the states of the single entities are presented, we end up with strings like "01101".
In mapping the first k natural numbers, thus including 0, there arises a natural map between the encoding of a number (e.g. "01101") and the binary system (e.g. the number, 13, expressed by the numeral 01101).

We can, thus, encoding things with numbers of finite size and express those number with any numerical base we want.
As seen, of particular interest is the base 2, other important bases are the octal base and the hexadecimal base.
Their importance arose from the fact that a digit in each of those bases is expressed with an exact number of binary digits (a.k.a. bits), 3 for the octal, 4 for the hexadecimal.
This transform long numbers into short ones.

Instructions are (abstract) things, we can then encode them with numbers.
A difficulty pops up: when considered with their operands, instructions have a lot of variants: add r0 r1 and add r0 r1 are two different things thus they must have two different numbers.
Proceeding this way would be too long, instead, we do a categorial abstraction, we break the thing we are encoding into ontological parts and encode each one separately.
The trick is that some of them are trivial.

Let's see an example, suppose we have a crossroad like this

 \ | /
 --+--
 / | \

Where 8 streets are converging.
We can encode each of the streets with a number from 0 to 7, setting the 0 on the top road and counting clockwise.
This is exactly one octal digit, how handy!
If we wanted to encode the path taken by a salesman that enter the crossroad on any street and exit it on any other (including the same) we could do it with two octal digits: the first one is the entering path, the second the egressing one.
For example, 37 would mean that they entered from SE and continue to NW.

We reduced a problem of encoding a large number of things (64 possible paths) to encoding a small set of things (the streets).
If we further wanted to encode the honesty of the salesman (either honest or not) we could have added another digit to do that: 0 = dishonest, 1 = honest.
Thus 124 is an honest salesman.
Note that 234 is not a valid encoding, this is because each octal digit is 3 bits but we needed only one (that's whey binary is the preferred way).

We can use this technique to encode the operands, addressing modes and any relevant part of an instruction.
One part that is always present is the type of instruction, this is called the operational code or opcode for short.
Luckily we don't have to do the encoding, the PDP-11 designers already did it.
We need to find the Instruction Set Architecture (ISA) reference, one is available here.

You need to read that document carefully, especially the conventions used.
Here is how the encoding for add works:

From the format we see that each instruction is 16 bits in length, thus we need ceiling(16/3) = 6 octal digits (the first one can only be 0 or 1 though).
The format also tells us that bits 0-5 are the destination (2 octal digits), bits 6-11 are the source (2 octal digits) and bits 12-15 are the opcode.
You can see that this is exactly what we did for encoding our salesman journey.
The destination and the source are the same conceptual thing in the PDP-11 ISA, this concept is called Addressing mode.
We will only consider the register addressing mode, the link above has the others.
When the destination and the source are registers, their first digit is 0 and the second one is the number of the register.
Can you see how the number of bits reserved for encoding a register (i.e. 3) and the number of registers (i.e. 8) relate?

The format is

   2 digits      1  digit     1 digit
  ______         ___         ___
 |opcode|   0   |src|   0   |dst|

This is the general format for each two operands instructions (makes sense), to see how add is encoded specifically, we take a look at the table just below the format.
It tells us that the opcode is 06.

Bingo! We have everything, the general format for add (remember, only with registers) is 060s0d where s and d are two digits for the source and destination registers.

add r4 r6 is 060406.

The rest of the homework is converting between numerals (e.g. 060406 oct in 6106 hex) and it is really independed of the PDP-11 encoding.
There is a lot of material available everywhere.

The encoding of the rest of the instructions, the rest of the addressing modes and the implementation is left to the reader as an exercise.

来源：https://stackoverflow.com/questions/48726139/encoding-decoding-pdp-11-assembly-language-to-binary-and-hex

标签

assembly

binary

hex

machine-language

pdp-11