How would the MONITOR instruction (_mm_monitor intrinsic) be used by a driver?

强颜欢笑 submitted on 2019-12-06 01:54:13

Quoting Intel: "The monitor instruction arms the address monitoring hardware using the address specified in RAX/EAX/AX."
The state of the monitor is then used by the mwait instruction.

The effective address size used (16, 32 or 64-bit) depends on the effective address size of the encoded instruction (i.e. it can be overridden with the 67h prefix and by default it is the same as the code size).

The address given in rax/eax/ax is the offset part of the logical address from which the linear address used to arm the monitor is computed.
The segment part is ds by default; a segment override prefix can be used to select a different segment.
Since the monitor is armed with a linear address, paging doesn't affect the monitoring.

The availability of the monitor (and mwait) instruction is indicated by bit CPUID.01H:ECX.MONITOR[bit 3] (see footnote 1).
It is a privileged instruction but Intel claims:

The instructions are conditionally available at levels greater than 0.

The suggested method to detect this condition is to try executing monitor and handle the resulting #UD exception (in whatever way the OS reports it to a userland program).
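A minimal sketch of that probe, assuming a POSIX OS such as Linux, where a #UD raised in user mode is delivered as SIGILL. The function name `monitor_usable_here` is hypothetical. MONITOR is emitted as its raw encoding (0F 01 C8) so the block does not need `-msse3`. If the instruction is allowed, the monitor is simply armed, which is harmless without a following mwait.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction / sigsetjmp */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: jump back past the faulting MONITOR instead of re-executing it. */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: true if MONITOR ran without #UD at the current privilege level. */
static bool monitor_usable_here(void)
{
#if defined(__x86_64__) || defined(__i386__)
    static volatile char probe_target;        /* any ordinary write-back memory */
    struct sigaction sa, old;
    volatile bool ok = false;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return false;

    if (sigsetjmp(probe_env, 1) == 0) {
        /* 0F 01 C8 = MONITOR: address in RAX/EAX, extensions in ECX, hints in EDX */
        __asm__ volatile(".byte 0x0f, 0x01, 0xc8"
                         : : "a"(&probe_target), "c"(0), "d"(0) : "memory");
        ok = true;                             /* no #UD: the monitor is merely armed */
    }
    sigaction(SIGILL, &old, NULL);            /* restore the previous handler */
    return ok;
#else
    return false;                              /* not an x86 target */
#endif
}
```

On a typical Linux system with the default configuration, I would expect this to return false in user mode. A true result means the OS or firmware enabled the instruction for ring 3.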

The address range monitored must be write-back cacheable.
Because the monitor relies on the cache and cache-coherence subsystems, the size of the address range is given in terms of a minimal and a maximal size.
CPUID.05H:EAX[bits 15:0] gives the minimal range size. This is the length of the region monitored by the hardware monitor.
However, cache-coherence traffic may operate on bigger "chunks" (lines). A write to an address adjacent to the monitored region will still trigger the monitor if that address falls in the same chunk.
This gives rise to the maximal range size, found in CPUID.05H:EBX[bits 15:0].
To use monitor properly, make sure that the monitored data structure fits in the minimal range size. Also make sure that no agent writes to the addresses next to it, up to the maximal range size.
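Decoding those two fields is just a matter of masking the low 16 bits of EAX and EBX from leaf 05H. In the sketch below, the names `monitor_range`, `decode_monitor_range` and `query_monitor_range` are hypothetical. The live query uses the `__get_cpuid` helper from GCC/Clang's `<cpuid.h>`, which returns 0 when the leaf is not supported.

```c
#include <assert.h>
#include <stdint.h>

/* Monitor range sizes reported by CPUID leaf 05H, in bytes. */
struct monitor_range {
    uint16_t min_bytes;   /* CPUID.05H:EAX[15:0], smallest monitor-line size */
    uint16_t max_bytes;   /* CPUID.05H:EBX[15:0], largest monitor-line size  */
};

static struct monitor_range decode_monitor_range(uint32_t eax, uint32_t ebx)
{
    struct monitor_range r;
    r.min_bytes = (uint16_t)(eax & 0xFFFFu);   /* bits 31:16 are reserved */
    r.max_bytes = (uint16_t)(ebx & 0xFFFFu);
    return r;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header */

/* Reads the live values; returns 0 if leaf 05H is not available. */
static int query_monitor_range(struct monitor_range *out)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))
        return 0;
    *out = decode_monitor_range(eax, ebx);
    return 1;
}
#endif
```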

For example, suppose the minimal range size is 8 bytes and the maximal size is 16 bytes. Make sure the watched structure fits in 8 bytes, then pad it with eight more bytes (for a total of sixteen) so that no other write lands in bytes 8 to 15.
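The 8/16-byte example can be written as a C11 layout with compile-time checks. The macro values are the illustrative ones from the example, not real CPUID output, and the type names are hypothetical. Aligning the block to the maximal range size is my own assumption and is not stated above. The reasoning is that the coherence chunks are aligned, so an unaligned block could straddle two of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values matching the example; real code should read CPUID.05H. */
#define MIN_MONITOR_RANGE 8
#define MAX_MONITOR_RANGE 16

struct watched {
    volatile uint64_t flag;   /* the data actually monitored */
};

/* Assumption: aligning to the maximal range keeps the block inside one
   coherence chunk; the padding keeps unrelated data out of that chunk. */
struct padded_watched {
    _Alignas(MAX_MONITOR_RANGE) struct watched w;
    char padding[MAX_MONITOR_RANGE - sizeof(struct watched)];
};

_Static_assert(sizeof(struct watched) <= MIN_MONITOR_RANGE,
               "watched data must fit the minimal range");
_Static_assert(sizeof(struct padded_watched) == MAX_MONITOR_RANGE,
               "padded block must span exactly the maximal range");
```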

In a single-cluster system the two values above are equal. On my machine both are 64 bytes.
The BIOS is responsible for reporting the cache coherence line size in IA32_MONITOR_FILTER_LINE_SIZE in multi-clustered systems.

For the purposes of instruction ordering and access rights, monitor is a load.

monitor allows the programmer to specify hints and extensions.
Extensions are specified in ecx while hints are in edx.
Unsupported extensions raise a #GP exception, unsupported hints are ignored.
I'm not aware of any extension or hint defined for monitor; the Intel manual states:

For the Pentium 4 processor (family 15, model 3), no extensions or hints are defined.

I believe that line is true in general, it just has an outdated processor model in it.
Further, the pseudocode for monitor reports a #GP if ECX ≠ 0.

Arming the monitor without checking its state afterwards (with mwait) doesn't cause any harm.

The intrinsic is void _mm_monitor(void const *p, unsigned extensions, unsigned hints).


Once the monitor is armed, it can be triggered by different conditions:

  • External interrupts: NMI, SMI, INIT, BINIT, MCERR
  • Faults and aborts, including Machine Check
  • Architectural TLB invalidations, including writes to CR0, CR3, CR4 and certain MSR writes
  • Voluntary transitions due to fast system calls and far calls
  • Masked interrupts (if enabled)
  • A write to the monitored address range

The state of the monitor is not visible to the programmer but it can be tested with mwait.
mwait enters an implementation-defined low power state until the monitor is in a triggered state.
If the monitor is not armed, or if it has already been triggered, mwait behaves as a nop. Otherwise it makes the processor stop executing instructions until the monitor is triggered.
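The rule above can be captured as a tiny state model. This is purely a software illustration and does not touch the hardware, whose monitor state is not visible. The enum and function names are hypothetical. The model shows why arming first and then re-checking the data is safe: a write that arrives after arming leaves the monitor triggered, so a later mwait falls through instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Conceptual model only: the real monitor state is not architecturally visible. */
enum monitor_state { MONITOR_IDLE, MONITOR_ARMED, MONITOR_TRIGGERED };

/* Would mwait actually stop the processor, or behave like a nop? */
static bool mwait_would_sleep(enum monitor_state s)
{
    return s == MONITOR_ARMED;   /* not armed, or already triggered: falls through */
}

/* A write to the range (or any other trigger event) fires an armed monitor. */
static enum monitor_state on_trigger_event(enum monitor_state s)
{
    return s == MONITOR_ARMED ? MONITOR_TRIGGERED : s;
}
```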

mwait can also be given extensions and hints.
Extensions are set in ecx and hints in eax.
At the time of writing the only extension is:

Bit 0 Treat interrupts as break events even if masked (e.g., even if EFLAGS.IF=0). May be set only if CPUID.05H:ECX[bit 1] = 1.
Bits 31-1 Reserved
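Below is a sketch of building the ECX value for mwait under that rule. Bit 0 is set only when requested and only when CPUID.05H:ECX[bit 1] advertises it, and the reserved bits stay zero. The helper name `mwait_extensions` and its "return false when not allowed" convention are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MWAIT_ECX_BREAK_ON_MASKED_INT (1u << 0)   /* mwait ECX bit 0 */
#define CPUID_05H_ECX_INT_BREAK       (1u << 1)   /* CPUID.05H:ECX[bit 1] */

/* Builds the mwait extension value; returns false if bit 0 was requested
   but the CPU does not advertise it (setting it anyway is not allowed). */
static bool mwait_extensions(bool break_on_masked_int, uint32_t cpuid05_ecx,
                             uint32_t *ecx_out)
{
    uint32_t ecx = 0;   /* bits 31:1 are reserved and stay 0 */
    if (break_on_masked_int) {
        if (!(cpuid05_ecx & CPUID_05H_ECX_INT_BREAK))
            return false;
        ecx |= MWAIT_ECX_BREAK_ON_MASKED_INT;
    }
    *ecx_out = ecx;
    return true;
}
```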

The hints let the programmer select the implementation-defined low-power state:

Bits 3:0 Sub C-state within a C-state, indicated by bits [7:4]
Bits 7:4 Target C-state
Value of 0 means C1; 1 means C2 and so on
Value of 01111B means C0
Note: Target C states for MWAIT extensions are processor-specific C-states, not ACPI C-states
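The hint layout can be encoded as shown below. The target C-state goes in bits 7:4 with an offset of one (0 means C1, 1 means C2, and so on), 1111B stands for C0, and the sub-state goes in bits 3:0. The function name `mwait_hint` and the -1 error value are hypothetical. As the note says, which C-states actually exist is processor-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes the mwait hint (EAX) for processor-specific C-state C<target_c>
   (0..15) and sub-state sub_state (0..15); returns -1 for invalid input. */
static int32_t mwait_hint(unsigned target_c, unsigned sub_state)
{
    unsigned field;
    if (sub_state > 0xFu)
        return -1;
    if (target_c == 0)
        field = 0xFu;              /* 1111B means C0 */
    else if (target_c <= 15)
        field = target_c - 1;      /* 0 means C1, 1 means C2, ... */
    else
        return -1;
    return (int32_t)((field << 4) | sub_state);   /* bits 7:4 | bits 3:0 */
}
```

For example, `mwait_hint(2, 1)` requests sub-state 1 of C2 and yields 0x11.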

The number of sub-states of each C-state (and thus its availability) is given in CPUID.05H:EDX:

Bits 03 - 00: Number of C0* sub C-states supported using MWAIT.
Bits 07 - 04: Number of C1* sub C-states supported using MWAIT.
Bits 11 - 08: Number of C2* sub C-states supported using MWAIT.
Bits 15 - 12: Number of C3* sub C-states supported using MWAIT.
Bits 19 - 16: Number of C4* sub C-states supported using MWAIT.
Bits 23 - 20: Number of C5* sub C-states supported using MWAIT.
Bits 27 - 24: Number of C6* sub C-states supported using MWAIT.
Bits 31 - 28: Number of C7* sub C-states supported using MWAIT.
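Each C-state gets a 4-bit count in EDX, with C0 in bits 3:0, so C<n> sits at bits 4n+3:4n. The helper names below are hypothetical. The validity check also reflects my own reading of the table: a sub-state index is treated as usable only if it is below the reported count, so a count of 0 means that C-state is not reachable through mwait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of mwait sub C-states for C<n>, read from CPUID.05H:EDX. */
static unsigned mwait_substates(uint32_t cpuid05_edx, unsigned n)
{
    if (n > 7)
        return 0;                               /* only C0..C7 are described */
    return (cpuid05_edx >> (4 * n)) & 0xFu;     /* bits 4n+3 : 4n */
}

/* Assumption: sub-state indices run from 0 to count-1. */
static bool mwait_substate_supported(uint32_t cpuid05_edx, unsigned n, unsigned sub)
{
    return sub < mwait_substates(cpuid05_edx, n);
}
```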

Note that putting the CPU into a state higher than C1 disables other threads too, so the write that triggers the monitor must come from another agent.

The intrinsic is void _mm_mwait(unsigned extensions, unsigned hints).


The monitor/mwait machinery was introduced to help synchronisation between threads. It is not well suited for monitoring accesses to a memory range, because the trigger conditions include frequently occurring events.
After an mwait it is always necessary to check whether the monitored range was actually written to.
There is an example here where the pattern is as follows:

  1. The watched structure is initialized with a specific value (say 0).
  2. The monitor/mwait pair is used.
  3. At some point later, another agent writes a specific value (say 1) to the watched structure.
  4. The monitor is triggered and mwait "returns". The watched structure's value is compared to 1 (a write occurred) and, if it is not equal, execution jumps back to 2.

Some sample, untested pseudo-code may be:

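#include <pmmintrin.h>   /* _mm_monitor, _mm_mwait (SSE3); compile with -msse3 */

/* Placeholders: use the values from CPUID.05H:EAX[15:0] and CPUID.05H:EBX[15:0] */
#define MIN_MONITOR_RANGE 64
#define MAX_MONITOR_RANGE 64

struct AnyType { volatile int value; };             /*Placeholder for the watched data*/

struct MonitoredType
{
  int (*event)(struct MonitoredType const* m);      /*Return 0 to keep monitoring*/
  struct AnyType data;                              /*Less, in size, than MIN_MONITOR_RANGE*/
  char padding[MAX_MONITOR_RANGE - sizeof(struct AnyType)];
};

void wait_for_write(struct MonitoredType const* m)
{
   for (;;)
   {
     _mm_monitor(&m->data, 0, 0);  /* Arm the monitor on the watched data */
     if (m->event(m))              /* Check AFTER arming: a write made before MONITOR is
                                      seen here, a write made after it triggers the
                                      monitor so MWAIT returns immediately */
       break;
     _mm_mwait(0, 0);              /* May also wake up on non-write events, so loop */
   }
}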

Care must be taken to ensure that the exit condition of mwait was a write and not one of the other events.
That's the reason for the function pointer event.

For monitoring reads/writes to a linear address, the debug registers can be an alternative.
See chapter 17 of the Intel manual, volume 3, and check your OS documentation for the proper way to use those registers.
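For comparison, here is a sketch of how a data watchpoint is described in DR7. Each slot n (0 to 3) has a local-enable bit at 2n, a 2-bit R/W field at 16+4n and a 2-bit LEN field at 18+4n. The watched address itself goes in DRn. The function and enum names are hypothetical. The code only computes the DR7 value: loading the debug registers is privileged, so from user space this goes through the OS (on Linux, for example, via ptrace or perf_event_open hardware breakpoints).

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings (2 bits per breakpoint). I/O (10B) requires CR4.DE = 1. */
enum dr7_rw { DR7_EXEC = 0, DR7_WRITE = 1, DR7_IO = 2, DR7_RW = 3 };

/* LEN field encodings: note that 4 bytes is 11B and 8 bytes is 10B. */
static int dr7_len_bits(unsigned bytes)
{
    switch (bytes) {
    case 1: return 0;
    case 2: return 1;
    case 8: return 2;   /* 8-byte length is only defined on some processors/modes */
    case 4: return 3;
    default: return -1;
    }
}

/* DR7 value that locally enables breakpoint `slot` (address in DR<slot>); 0 if invalid. */
static uint64_t dr7_watchpoint(unsigned slot, enum dr7_rw rw, unsigned bytes)
{
    int len = dr7_len_bits(bytes);
    if (slot > 3 || len < 0)
        return 0;
    return (1ull << (2 * slot))                   /* L<slot>: local enable */
         | ((uint64_t)rw  << (16 + 4 * slot))     /* R/W<slot> */
         | ((uint64_t)len << (18 + 4 * slot));    /* LEN<slot> */
}
```

For example, a 4-byte write watchpoint in slot 0 gives DR7 = 0xD0001.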


1 Meaning: execute cpuid with eax set to 01h and test bit 3 of ecx afterwards. Note that IA32_MISC_ENABLE allows the OS or the firmware to disable monitor/mwait.
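The footnote's check can be written with GCC/Clang's `<cpuid.h>` helper `__get_cpuid`. The function names are hypothetical. This only reads the CPUID flag and says nothing about whether ring 3 may execute the instruction, which still has to be probed as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header */
#endif

#define CPUID_01H_ECX_MONITOR (1u << 3)   /* CPUID.01H:ECX.MONITOR[bit 3] */

static bool ecx_has_monitor(uint32_t cpuid01_ecx)
{
    return (cpuid01_ecx & CPUID_01H_ECX_MONITOR) != 0;
}

/* Executes cpuid with eax = 01h and tests bit 3 of ecx. */
static bool cpu_has_monitor(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* leaf 01H not available */
    return ecx_has_monitor(ecx);
#else
    return false;                     /* not an x86 target */
#endif
}
```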
