What methods can be used to efficiently extend instruction length on modern x86?
Imagine you want to align a series of x86 assembly instructions to certain boundaries. For example, you may want to align loops to a 16- or 32-byte boundary, or pack instructions so they are efficiently placed in the uop cache or whatever. The simplest way to achieve this is with single-byte NOP instructions, followed closely by multi-byte NOPs. Although the latter is generally more efficient, neither method is free: NOPs use front-end execution resources, and also count against your 4-wide¹ rename limit on modern x86. Another option is to somehow lengthen some instructions to get the alignment