InterlockedExchange and memory alignment

守給你的承諾、 submitted on 2019-12-17 18:42:02

Question


I am confused: Microsoft says memory alignment is required for InterlockedExchange, but the Intel documentation says that memory alignment is not required for LOCK. Am I missing something? Thanks.

From the Microsoft MSDN Library (Platform SDK: DLLs, Processes, and Threads, InterlockedExchange):

The variable pointed to by the Target parameter must be aligned on a 32-bit boundary; otherwise, this function will behave unpredictably on multiprocessor x86 systems and any non-x86 systems.

From the Intel Software Developer’s Manual:

  • LOCK instruction

    Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted.

    The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields.

  • Memory Ordering in P6 and More Recent Processor Families

    Locked instructions have a total order.

  • Software Controlled Bus Locking

    The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance:

      • Any boundary for an 8-bit access (locked or otherwise).
      • 16-bit boundary for locked word accesses.
      • 32-bit boundary for locked doubleword accesses.
      • 64-bit boundary for locked quadword accesses.


Answer 1:


Once upon a time, Microsoft supported Windows NT on processors other than x86, such as MIPS, PowerPC, and Alpha. These processors all require alignment for their interlocked instructions, so Microsoft put the requirement in their spec to ensure that these primitives would be portable to different architectures.
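To make the 32-bit boundary requirement concrete, here is a minimal C sketch of how a target variable can end up misaligned in the first place. The struct names and the helper `on_32bit_boundary` are hypothetical, and the expected offsets assume a mainstream ABI where `int32_t` has 4-byte alignment; the static assertions fail at compile time on a platform where that does not hold.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* int32_t */

/* Natural layout: the compiler pads after `tag`, so `value` lands on a
   4-byte offset (assumption: int32_t is 4-byte aligned on this ABI). */
struct natural_rec {
    char    tag;
    int32_t value;
};

/* Packed layout: padding is removed, so `value` starts at byte offset 1. */
#pragma pack(push, 1)
struct packed_rec {
    char    tag;
    int32_t value;
};
#pragma pack(pop)

/* Hypothetical helper: 1 if a byte offset falls on a 32-bit boundary. */
static int on_32bit_boundary(size_t offset)
{
    return offset % 4 == 0;
}

static_assert(offsetof(struct natural_rec, value) % 4 == 0,
              "value should be 32-bit aligned in the natural layout");
static_assert(offsetof(struct packed_rec, value) == 1,
              "value should sit at offset 1 in the packed layout");
```

A field like `packed_rec.value` is the kind of target the MSDN text warns about: it may happen to work on x86, but it is not something to pass to an interlocked primitive in portable code.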




Answer 2:


Even though the LOCK prefix doesn't require memory to be aligned, and the cmpxchg operation that's probably used to implement InterlockedExchange() doesn't require alignment, if the OS has enabled alignment checking then the cmpxchg will raise an alignment check exception (#AC) when executed with unaligned operands. Check the documentation for cmpxchg and similar instructions, looking at the list of protected-mode exceptions. I don't know for sure that Windows enables alignment checking, but it wouldn't surprise me.




Answer 3:


Hey, I answered a few questions related to this. Also keep in mind:

  1. There is no byte-level InterlockedExchange; there is, however, a 16-bit (short) InterlockedExchange.
  2. The documentation discrepancy you refer to is probably just a documentation oversight.
  3. If you want to do byte- or bit-level atomic access, there are plenty of ways to do this with the existing intrinsics, e.g. Interlocked[And8|Or8|Xor8].
  4. Any operation where you're doing high-performance locking (using machine code, as you discuss) should not operate on unaligned data (a performance anti-pattern).
  5. xchg with a memory operand carries an implicit LOCK prefix and is optimized: it can take a cache lock and avoid a full bus lock to main memory. It can do 8-bit interlocked operations.
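As a rough illustration of the byte-level atomic access mentioned in the list, here is a minimal sketch that uses C11 `<stdatomic.h>` as a portable stand-in for the Windows intrinsics; it is not the Windows API itself. The function names `set_flags` and `swap_byte` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically OR `mask` into the byte and return the byte's previous value
   (similar in spirit to an 8-bit interlocked OR). */
static uint8_t set_flags(_Atomic uint8_t *flags, uint8_t mask)
{
    assert(flags != NULL);
    return atomic_fetch_or(flags, mask);
}

/* Atomically replace the byte and return its previous value
   (an 8-bit exchange, the operation item 5 says xchg can perform). */
static uint8_t swap_byte(_Atomic uint8_t *target, uint8_t value)
{
    assert(target != NULL);
    return atomic_exchange(target, value);
}
```

A single byte can never straddle a boundary, which is why the Intel text quoted in the question allows any alignment for 8-bit locked accesses.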

I nearly forgot: Intel's TBB defines 8-byte (64-bit) loads and stores without implicit or explicit locking (in some cases):

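.code 
    ALIGN 4
    PUBLIC c __TBB_machine_load8
__TBB_machine_load8:
    ; If location is on stack, compiler may have failed to align it correctly, so we do dynamic check.
    mov ecx,4[esp]          ; ecx = address of the 8-byte location (first argument)
    test ecx,7              ; any of the low 3 address bits set means not 8-byte aligned
    jne load_slow           ; misaligned: slow path (load_slow is not part of this excerpt)
    ; Load within a cache line
    sub esp,12
    fild qword ptr [ecx]    ; one 64-bit load through the x87 FPU
    fistp qword ptr [esp]   ; spill the 64-bit value to the stack
    mov eax,[esp]           ; low 32 bits of the result
    mov edx,4[esp]          ; high 32 bits of the result
    add esp,12
    ret

EXTRN __TBB_machine_store8_slow:PROC
.code 
    ALIGN 4
    PUBLIC c __TBB_machine_store8
__TBB_machine_store8:
    ; If location is on stack, compiler may have failed to align it correctly, so we do dynamic check.
    mov ecx,4[esp]          ; ecx = destination address (first argument)
    test ecx,7              ; same 8-byte alignment check as in the load
    jne __TBB_machine_store8_slow ;; tail call to tbb_misc.cpp
    fild qword ptr 8[esp]   ; load the 64-bit value argument from the stack
    fistp qword ptr [ecx]   ; one 64-bit store to the destination
    ret
end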
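The dynamic check at the top of both routines (`test ecx,7` followed by a conditional jump) can be sketched in C roughly as follows. This only models the path selection, not the 64-bit load or store itself; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which path the routine would take for a given address. */
typedef enum { LOAD_FAST, LOAD_SLOW } load_path;

/* Mirrors `test ecx,7` / `jne load_slow`: if any of the low three address
   bits is set, the location is not 8-byte aligned and the slow path runs. */
static load_path choose_load_path(const void *location)
{
    assert(location != NULL);
    return ((uintptr_t)location & 7u) == 0 ? LOAD_FAST : LOAD_SLOW;
}
```

The check is done at run time because, as the listing's own comment says, a location on the stack may not have been aligned by the compiler.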

Anyhow, hope that clears at least some of this up for you.




Answer 4:


I don't understand where your Intel information is coming from.

To me, it's pretty clear that Intel cares A LOT about alignment and/or spanning cache lines.

For example, on a Core i7 processor, you STILL have to make sure your data does not span cache lines, or else the operation is NOT guaranteed to be atomic.

In Volume 3-I, System Programming, for x86/x64, Intel clearly states:

8.1.1 Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

  • Reading or writing a byte
  • Reading or writing a word aligned on a 16-bit boundary
  • Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

  • Reading or writing a quadword aligned on a 64-bit boundary
  • 16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:

  • Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.
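The "fits within a cache line" condition in the quoted text can be checked with simple address arithmetic. Below is a minimal sketch; the 64-byte line size is an assumption (common on current x86 parts, but it should be queried from the CPU in real code), and `spans_cache_line` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Query the actual size in real code. */
#define ASSUMED_CACHE_LINE 64u

/* Returns 1 if a `size`-byte access starting at `addr` crosses a
   cache-line boundary, 0 if it fits entirely within one line. */
static int spans_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= ASSUMED_CACHE_LINE);
    uintptr_t first_line = addr / ASSUMED_CACHE_LINE;
    uintptr_t last_line  = (addr + size - 1) / ASSUMED_CACHE_LINE;
    return first_line != last_line;
}
```

For instance, under the 64-byte assumption an 8-byte access at offset 56 fits in one line, while the same access at offset 60 is split across two lines and falls into the "not guaranteed to be atomic" case of the quoted passage. A naturally aligned operand can never be split this way.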



Source: https://stackoverflow.com/questions/881820/interlockedexchange-and-memory-alignment
