smp

Concurrent stores seen in a consistent order

馋奶兔 submitted on 2019-12-11 03:36:34
Question: The Intel Architectures Software Developer's Manual, Aug. 2012, vol. 3A, sect. 8.2.2, states: "Any two stores are seen in a consistent order by processors other than those performing the stores." But can this be so? The reason I ask is this: consider a dual-core Intel i7 processor with HyperThreading. According to the Manual's vol. 1, Fig. 2-8, the i7's logical processors 0 and 1 share an L1/L2 cache, but its logical processors 2 and 3 share a different L1/L2 cache -- whereas all the logical processors
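
The property being quoted is what memory-model people call IRIW (independent reads of independent writes), and it can be probed with a small litmus test. Below is a minimal sketch using C++11 atomics; the variable names and thread layout are illustrative assumptions, not code from the question. On x86 these loads and stores compile to plain MOVs, so if the two observers ever disagree about which store happened first, the sect. 8.2.2 guarantee would be violated.

    #include <atomic>
    #include <iostream>
    #include <thread>

    // Two independent writers and two observers (the "IRIW" litmus test).
    std::atomic<int> x{0}, y{0};
    int r1, r2, r3, r4;

    int main() {
        std::thread w1([] { x.store(1, std::memory_order_release); });
        std::thread w2([] { y.store(1, std::memory_order_release); });
        std::thread o1([] {   // observer on one core / cache
            r1 = x.load(std::memory_order_acquire);
            r2 = y.load(std::memory_order_acquire);
        });
        std::thread o2([] {   // observer on another core / cache
            r3 = y.load(std::memory_order_acquire);
            r4 = x.load(std::memory_order_acquire);
        });
        w1.join(); w2.join(); o1.join(); o2.join();

        // The outcome 8.2.2 forbids: observer 1 saw x's store but not yet y's,
        // while observer 2 saw y's store but not yet x's -- i.e. the two
        // observers disagree about which store happened first.
        bool disagree = (r1 == 1 && r2 == 0) && (r3 == 1 && r4 == 0);
        std::cout << (disagree ? "stores seen in inconsistent order\n"
                               : "consistent (or inconclusive) run\n");
        // A single run proves nothing; in practice the test is repeated many
        // times with the four threads pinned to different logical processors.
        return 0;
    }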

Linux Scheduler on NUMA and SMP

泄露秘密 submitted on 2019-12-10 12:07:26
Question: I wanted to know whether a copy of the schedule() function runs on each processor, or whether there is just one schedule() running for all processors. If a copy of schedule() runs on each processor/core, how are processes dispatched to a particular CPU/per-CPU runqueue? Is that the job of the load balancer? Is there a single load balancer running for all CPUs, or is balancing done in a distributed fashion using flags/some communication method? P.S. I know how the scheduling classes work, etc., but I am having a hard
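
This is not the scheduler's internal dispatch path, but user space can at least steer which per-CPU runqueue (and therefore which CPU's copy of schedule()) a task ends up on. A minimal sketch, assuming Linux/glibc, of pinning the calling process to one CPU with sched_setaffinity(); the CPU number is an arbitrary example.

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <sched.h>     // CPU_ZERO, CPU_SET, sched_setaffinity, sched_getcpu
    #include <cstdio>

    int main() {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);  // example: pin to logical CPU 2

        // pid 0 = "the calling process"; after this call the task sits on
        // CPU 2's runqueue and the load balancer will not migrate it away.
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        std::printf("now running on CPU %d\n", sched_getcpu());
        return 0;
    }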

Boost threads and non-existent speedups on Linux SMPs

故事扮演 submitted on 2019-12-08 03:51:22
Question: I have written a small example C++ program using boost::thread. Since it's 215 lines, I've posted it on pastebin instead: http://pastebin.com/LRZ24W7D The program creates a large number of floats (currently 1 GB) and adds them up, first sequentially and then using a number of threads (hosted inside the device_matrix class). Assuming the machine is an SMP, I'd expect to see a speedup from the code. And on my Windows machine I do see a four-fold speedup when using 4 device_matrix instances
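
The pastebin code is not reproduced here, but the structure the question describes -- split a large float buffer into chunks and sum each chunk in its own thread -- looks roughly like the sketch below. It uses std::thread instead of boost::thread (the two are nearly interchangeable for this), and the buffer size and thread count are placeholders. On a real SMP this kind of streaming sum is usually memory-bandwidth bound, which is one common reason the expected N-fold speedup fails to appear.

    #include <cstddef>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        const std::size_t n = 1u << 26;        // placeholder: ~64M floats (256 MB)
        const unsigned nthreads = 4;           // placeholder thread count
        std::vector<float> data(n, 1.0f);

        // One partial sum per thread; each thread reduces its own chunk.
        std::vector<double> partial(nthreads, 0.0);
        std::vector<std::thread> workers;
        const std::size_t chunk = n / nthreads;

        for (unsigned t = 0; t < nthreads; ++t) {
            workers.emplace_back([&, t] {
                const std::size_t begin = t * chunk;
                const std::size_t end = (t + 1 == nthreads) ? n : begin + chunk;
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0.0);
            });
        }
        for (auto &w : workers) w.join();

        const double total = std::accumulate(partial.begin(), partial.end(), 0.0);
        std::cout << "sum = " << total << '\n';
        return 0;
    }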

How would the MONITOR instruction (_mm_monitor intrinsic) be used by a driver?

眉间皱痕 submitted on 2019-12-07 12:13:07
Question: I am exploring the usage of the MONITOR instruction (or the equivalent intrinsic, _mm_monitor). Although I found literature describing it, I could not find any concrete examples/samples of how to use it. Can anyone share an example of how this instruction/intrinsic would be used in a driver? Essentially, I would like to use it to watch memory ranges. Answer 1: The monitor instruction arms the address-monitoring hardware using the address specified in RAX/EAX/AX. Quote from Intel: The state of the monitor is used by the instruction mwait. The effective address size used (16, 32 or 64-bit) depends on the
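
Intel's headers expose the instruction pair as _mm_monitor and _mm_mwait (pmmintrin.h, SSE3). A minimal sketch of the usual idiom follows; it assumes ring-0 execution (MONITOR/MWAIT fault in user mode on most OSes), and the flag variable and function name are made up for illustration. The key points are re-checking the condition after arming the monitor, and treating MWAIT wake-ups as possibly spurious, so the whole thing sits in a loop.

    #include <pmmintrin.h>   // _mm_monitor, _mm_mwait

    // Hypothetical driver-side helper: park this core until another CPU (or a
    // DMA write) touches the cache line containing *flag, or an interrupt
    // wakes us.
    void wait_for_flag(volatile int *flag)
    {
        while (*flag == 0) {
            // Arm the monitor on the cache line holding *flag.
            _mm_monitor((const void *)flag, 0 /*extensions*/, 0 /*hints*/);

            // Re-check after arming: the store may have landed between the
            // first test and MONITOR, otherwise MWAIT could sleep forever.
            if (*flag != 0)
                break;

            // Sleep until the monitored line is written or an interrupt
            // arrives; wake-ups can be spurious, hence the enclosing loop.
            _mm_mwait(0 /*extensions*/, 0 /*hints*/);
        }
    }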

Limit the number of cores used by Erlang

若如初见. submitted on 2019-12-06 02:20:22
Question: I'm running experiments on a node with 2 x Quad-Core Xeon E5520 2.2GHz, 24.0GB RAM, and Erlang R15B02 (SMP enabled). I wonder whether I can limit the number of cores used by the Erlang VM, so that I can temporarily disable some cores and then increase the number step by step to test scalability. I don't have root access on this node, so I'm looking for a method that works either by passing parameters to erl or from Erlang code. Answer 1: You can limit the number of cores Erlang uses via the +S
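
For reference, +S is the emulator flag that sets how many scheduler threads are created and how many start online, and schedulers_online can also be changed at run time from Erlang code without restarting the VM. The counts below are example values, not output copied from the question's machine.

    $ erl +S 8:2          # create 8 scheduler threads, only 2 online
    ...
    1> erlang:system_info(schedulers_online).
    2
    2> erlang:system_flag(schedulers_online, 4).   % returns the old value
    2
    3> erlang:system_info(schedulers_online).
    4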

Running code on different processor (x86 assembly)

久未见 submitted on 2019-12-04 18:20:47
Question: In real mode on x86, what instructions would need to be used to run code on a different processor in a multiprocessor system? (I'm writing some pre-boot code in assembler that needs to set certain CPU registers, and do this on every CPU in the system, before the actual operating system boots.) Answer 1: So you have a stand-alone (you said "pre-boot") program, like a bootloader, running in real mode? And this is on a PeeCee with the usual BIOS? In that case you have only one CPU running. In order to spin up the other CPUs, an operating system will typically execute what is called the universal
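
The sequence the answer is about to name is the INIT-SIPI-SIPI ("universal startup") algorithm: the bootstrap processor sends an INIT IPI and then two STARTUP IPIs through the local APIC's interrupt command register. A rough sketch, in C++ rather than assembly, is below; the APIC base address, the delay helper, and the function names are assumptions for illustration, and the vector must point at a real-mode trampoline page below 1 MiB.

    #include <stdint.h>

    // Assumes the local APIC sits at its default physical base 0xFEE00000 and
    // that this address is usable directly (a pre-boot environment must
    // arrange that).
    constexpr uintptr_t LAPIC_BASE = 0xFEE00000u;
    constexpr uintptr_t ICR_LOW    = LAPIC_BASE + 0x300;
    constexpr uintptr_t ICR_HIGH   = LAPIC_BASE + 0x310;

    static inline void lapic_write(uintptr_t reg, uint32_t val) {
        *reinterpret_cast<volatile uint32_t *>(reg) = val;
    }

    static void delay_us(unsigned us) {
        // Crude placeholder busy-wait; real code would calibrate a timer.
        for (volatile unsigned long i = 0; i < us * 100UL; ++i) { }
    }

    // Wake the application processor with the given APIC id.  'vector' is the
    // page number (physical address >> 12) of the real-mode trampoline where
    // the AP begins executing.
    void start_ap(uint8_t apic_id, uint8_t vector)
    {
        lapic_write(ICR_HIGH, uint32_t(apic_id) << 24);
        lapic_write(ICR_LOW,  0x0000C500);        // INIT IPI, level assert
        delay_us(200);
        lapic_write(ICR_LOW,  0x00008500);        // INIT IPI, level de-assert
        delay_us(10000);                          // ~10 ms per the MP spec

        for (int i = 0; i < 2; ++i) {             // two STARTUP IPIs
            lapic_write(ICR_HIGH, uint32_t(apic_id) << 24);
            lapic_write(ICR_LOW,  0x00000600u | vector);
            delay_us(200);
        }
    }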

How is a memory barrier used in the Linux kernel

拜拜、爱过 submitted on 2019-12-03 13:51:01
There is an illustration in the kernel source Documentation/memory-barriers.txt, like this:

        CPU 1                       CPU 2
        =======================     =======================
                { B = 7; X = 9; Y = 8; C = &Y }
        STORE A = 1
        STORE B = 2
        <write barrier>
        STORE C = &B                LOAD X
        STORE D = 4                 LOAD C (gets &B)
                                    LOAD *C (reads B)

Without intervention, CPU 2 may perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1. (The excerpt then reproduces the ASCII-art diagram from memory-barriers.txt, labelled "Sequence of update of perception on CPU 2", in which CPU 1's stores B=2, A=1, C=&B, D=4 reach CPU 2 in a different order; the diagram is cut off here.)
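
The point of that illustration is that the writer's barrier alone is not enough: the reader needs a matching read barrier (or an address dependency) before the write barrier buys it anything. A user-space analogue of the same pattern, using C++11 atomics instead of the kernel's smp_wmb()/smp_rmb(), is sketched below; the variable names follow the excerpt, but C starts as null rather than &Y to keep the sketch small.

    #include <atomic>
    #include <thread>

    int b = 7;
    std::atomic<int *> c{nullptr};

    void cpu1() {                      // writer, like CPU 1 in the excerpt
        b = 2;                         // STORE B = 2
        // release plays the role of the write barrier: B's store cannot be
        // reordered after the store to C
        c.store(&b, std::memory_order_release);   // STORE C = &B
    }

    void cpu2() {                      // reader, like CPU 2 in the excerpt
        int *p;
        // acquire is the reader-side pairing the kernel document insists on;
        // with a relaxed load there would be no guarantee that *p reads 2
        while ((p = c.load(std::memory_order_acquire)) == nullptr) { /* spin */ }
        int value = *p;                // LOAD *C -- guaranteed to see B == 2
        (void)value;
    }

    int main() {
        std::thread t2(cpu2), t1(cpu1);
        t1.join(); t2.join();
    }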

Linux: find out the hyper-threaded core ID

删除回忆录丶 submitted on 2019-12-03 09:26:50
Question: I spent this morning trying to find out how to determine which processor ID is the hyper-threaded core, but without luck. I want to find this out and then use set_affinity() to bind a process to a hyper-threaded thread or to a non-hyper-threaded thread in order to profile its performance. Answer 1: I discovered a simple trick to do what I need: cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list If the first number is equal to the CPU number (0 in this example) then it's a real core; if not, it
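
A small sketch of the same trick applied to every CPU: read each cpuN/topology/thread_siblings_list and treat a CPU as a physical ("real") core when its own number is the first entry in the list. The sysfs paths are the ones from the answer; the parsing and loop handling are just illustrative glue.

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    int main() {
        for (int cpu = 0; ; ++cpu) {
            std::ostringstream path;
            path << "/sys/devices/system/cpu/cpu" << cpu
                 << "/topology/thread_siblings_list";
            std::ifstream f(path.str());
            if (!f)                    // no more CPUs
                break;

            std::string siblings;      // e.g. "0,4" or "0-1" depending on kernel
            std::getline(f, siblings);

            // First entry equals this CPU  =>  it is the physical core;
            // otherwise it is the hyper-threaded sibling of an earlier CPU.
            int first = std::stoi(siblings);
            std::cout << "cpu" << cpu << " siblings=" << siblings
                      << (first == cpu ? "  (physical core)" : "  (HT sibling)")
                      << '\n';
        }
        return 0;
    }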