I\'m implementing binary translation and have to deal with sequences of NOPs (0x90) with length about 16 opcodes. Is it better for performance to place JMP (to the end) at s
The Intel Architecture Software developer's guide, volume 2B (instructions N-Z) contains the following table (pg 4-12) about NOP:
Table 4-9. Recommended Multi-Byte Sequence of NOP Instruction
Length Assembly Byte Sequence ================================================================================= 2 bytes 66 NOP 66 90H 3 bytes NOP DWORD ptr [EAX] 0F 1F 00H 4 bytes NOP DWORD ptr [EAX + 00H] 0F 1F 40 00H 5 bytes NOP DWORD ptr [EAX + EAX*1 + 00H] 0F 1F 44 00 00H 6 bytes 66 NOP DWORD ptr [EAX + EAX*1 + 00H] 66 0F 1F 44 00 00H 7 bytes NOP DWORD ptr [EAX + 00000000H] 0F 1F 80 00 00 00 00H 8 bytes NOP DWORD ptr [EAX + EAX*1 + 00000000H] 0F 1F 84 00 00 00 00 00H 9 bytes 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H] 66 0F 1F 84 00 00 00 00 00H
This allows you to construct "padding NOP" of certain sizes. With two of those, you can bridge 16 Bytes, although I second the suggestion to check the optimization guides (for the CPU you're targeting) whether a JMP is faster than two such NOPs.