I am reading the book "Computer Organization and Design RISC-V Edition", and I came across the encoding for the S-B and U-J instruction types.
Those types I hav
The official RISC-V spec does an excellent job of explaining the design choices in the instruction set and why each one was made in that specific way. When in doubt, have a look at it.
So the rationale for instruction encoding is described in chapter 2.2 - Base Instruction Formats. It's all for making instruction decoding simpler and faster by
The RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding. Except for the 5-bit immediates used in CSR instructions (Chapter 9), immediates are always sign-extended, and are generally packed towards the leftmost available bits in the instruction and have been allocated to reduce hardware complexity. In particular, the sign bit for all immediates is always in bit 31 of the instruction to speed sign-extension circuitry.
Decoding register specifiers is usually on the critical paths in implementations, and so the instruction format was chosen to keep all register specifiers at the same position in all formats at the expense of having to move immediate bits across formats (a property shared with RISC-IV aka. SPUR [11]).
Look at the instruction encoding and you'll see that a single decoder suffices for each of rs1, rs2 and rd in every instruction format that uses them, and that bit 31 is always the sign bit of the immediate regardless of its length, for fast sign extension.
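As a rough illustration of that point, here is a minimal Python sketch that pulls the register specifiers and the sign bit out of a 32-bit instruction word without looking at the format first. The helper names (`reg_fields`, `sign_bit`) are made up for this example, and the three instruction words were assembled by hand from the base formats.

```python
def reg_fields(inst: int) -> dict:
    """Extract the register specifiers from their fixed bit positions."""
    return {
        "rd": (inst >> 7) & 0x1F,    # inst[11:7]
        "rs1": (inst >> 15) & 0x1F,  # inst[19:15]
        "rs2": (inst >> 20) & 0x1F,  # inst[24:20]
    }


def sign_bit(inst: int) -> int:
    """The sign of every immediate lives in inst[31]."""
    return (inst >> 31) & 1


ADD = 0x002081B3  # add x3, x1, x2   (R-type)
SW = 0x0020A423   # sw  x2, 8(x1)    (S-type, has no rd)
BEQ = 0xFE208EE3  # beq x1, x2, -4   (B-type, negative offset)

for word in (ADD, SW, BEQ):
    f = reg_fields(word)
    print(hex(word), "rs1 =", f["rs1"], "rs2 =", f["rs2"], "sign =", sign_bit(word))
```

The same shifts and masks give rs1 = 1 and rs2 = 2 for all three words. For formats without an rd (S and B here), the bits at inst[11:7] hold immediate bits instead, so a real decoder would simply ignore that field there.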
Now focus on the immediates. They appear to be arranged in a "weird" order, but that order lets decoders be shared between formats. For example, imm[10:5] always sits in inst[30:25] in the I, S, B and J formats. The same goes for imm[19:12] in U/J (inst[19:12]) and imm[4:1] in S/B (inst[11:8]). The formats in each of those two pairs are actually almost the same, except that the immediate is shifted left by one bit in J and B. By interleaving the bits that way, most of the hard work of shifting is left to the assembler, simplifying the hardware even more.
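To make the "work left to the assembler" concrete, here is a small Python sketch of how an assembler might scatter a branch or jump offset into the B and J immediate fields. The function names are hypothetical; the bit placements follow the B and J format layouts in the spec.

```python
def encode_b_imm(offset: int) -> int:
    """Place a branch offset into the B-format immediate bits of an instruction word."""
    if offset % 2 != 0 or not -4096 <= offset < 4096:
        raise ValueError("B-type offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF  # 13-bit two's complement
    return (
        ((imm >> 12) & 0x1) << 31    # imm[12]   -> inst[31]
        | ((imm >> 5) & 0x3F) << 25  # imm[10:5] -> inst[30:25]
        | ((imm >> 1) & 0xF) << 8    # imm[4:1]  -> inst[11:8]
        | ((imm >> 11) & 0x1) << 7   # imm[11]   -> inst[7]
    )


def encode_j_imm(offset: int) -> int:
    """Place a jump offset into the J-format immediate bits of an instruction word."""
    if offset % 2 != 0 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("J-type offset must be even and fit in 21 signed bits")
    imm = offset & 0x1FFFFF  # 21-bit two's complement
    return (
        ((imm >> 20) & 0x1) << 31     # imm[20]    -> inst[31]
        | ((imm >> 1) & 0x3FF) << 21  # imm[10:1]  -> inst[30:21]
        | ((imm >> 11) & 0x1) << 20   # imm[11]    -> inst[20]
        | ((imm >> 12) & 0xFF) << 12  # imm[19:12] -> inst[19:12]
    )


# beq x1, x2, -4: immediate bits OR-ed with rs2, rs1 and the BRANCH opcode
beq = encode_b_imm(-4) | (2 << 20) | (1 << 15) | 0x63
# jal x0, -4: immediate bits OR-ed with the JAL opcode (rd = x0)
jal = encode_j_imm(-4) | 0x6F
print(hex(beq), hex(jal))
```

Note that for a short forward branch such as an offset of 8, only imm[4:1] is non-zero, so the encoding degenerates to a single shifted field.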
2.3 Immediate Encoding Variants
The only difference between the S and B formats is that the 12-bit immediate field is used to encode branch offsets in multiples of 2 in the B format. Instead of shifting all bits in the instruction-encoded immediate left by one in hardware as is conventionally done, the middle bits (imm[10:1]) and sign bit stay in fixed positions, while the lowest bit in S format (inst[7]) encodes a high-order bit in B format.
Similarly, the only difference between the U and J formats is that the 20-bit immediate is shifted left by 12 bits to form U immediates and by 1 bit to form J immediates. The location of instruction bits in the U and J format immediates is chosen to maximize overlap with the other formats and with each other.
Sign-extension is one of the most critical operations on immediates (particularly for XLEN>32), and in RISC-V the sign bit for all immediates is always held in bit 31 of the instruction to allow sign-extension to proceed in parallel with instruction decoding.
Although more complex implementations might have separate adders for branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead of using dynamic hardware muxes to multiply the immediate by 2, we reduce instruction signal fanout and immediate mux costs by around a factor of 2. The scrambled immediate encoding will add negligible time to static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most common short forward branches have straightforward immediate encodings.
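The decoding side of what the spec describes can be sketched in Python as follows. This is only a software model with made-up helper names (`field`, `sext`, `imm_i`, ...), not how hardware is written, but it shows which instruction bits feed each immediate and that the sign always comes from inst[31].

```python
def field(inst: int, hi: int, lo: int) -> int:
    """Return inst[hi:lo] as an unsigned integer."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext(value: int, width: int) -> int:
    """Sign-extend a width-bit value to a Python int."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_i(inst: int) -> int:
    return sext(field(inst, 31, 20), 12)


def imm_s(inst: int) -> int:
    return sext(field(inst, 31, 25) << 5 | field(inst, 11, 7), 12)


def imm_b(inst: int) -> int:
    # Same as S except inst[7] supplies imm[11] and bit 0 is implicitly zero.
    return sext(
        field(inst, 31, 31) << 12 | field(inst, 7, 7) << 11
        | field(inst, 30, 25) << 5 | field(inst, 11, 8) << 1,
        13,
    )


def imm_u(inst: int) -> int:
    return sext(field(inst, 31, 12) << 12, 32)


def imm_j(inst: int) -> int:
    return sext(
        field(inst, 31, 31) << 20 | field(inst, 19, 12) << 12
        | field(inst, 20, 20) << 11 | field(inst, 30, 21) << 1,
        21,
    )


print(imm_i(0xFFF00093))       # addi x1, x0, -1
print(imm_s(0x0020A423))       # sw   x2, 8(x1)
print(imm_b(0xFE208EE3))       # beq  x1, x2, -4
print(hex(imm_u(0x123450B7)))  # lui  x1, 0x12345
print(imm_j(0xFFDFF06F))       # jal  x0, -4
```

In `imm_s` and `imm_b` the inst[30:25] slice feeds imm[10:5] in both cases, and in `imm_u` and `imm_j` the inst[19:12] slice feeds imm[19:12], which is the overlap the quoted text is referring to.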
If you're interested, you can find more discussion on the official GitHub page.