smp

Concurrent stores seen in a consistent order

馋奶兔 submitted on 2019-12-11 03:36:34
Question: The Intel Architectures Software Developer's Manual, Aug. 2012, vol. 3A, sect. 8.2.2, states: "Any two stores are seen in a consistent order by processors other than those performing the stores." But can this be so? The reason I ask is this: consider a dual-core Intel i7 processor with HyperThreading. According to the Manual's vol. 1, Fig. 2-8, the i7's logical processors 0 and 1 share an L1/L2 cache, but its logical processors 2 and 3 share a different L1/L2 cache -- whereas all the logical processors
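
The property being quoted is what memory-model people call IRIW (independent reads of independent writes), and it can be probed with a small litmus test. Below is a minimal sketch using C++11 atomics; the variable names and thread layout are illustrative assumptions, not code from the question. On x86 these loads and stores compile to plain MOVs, so if the two observers ever disagree about which store happened first, the sect. 8.2.2 guarantee would be violated.

    #include <atomic>
    #include <iostream>
    #include <thread>

    // Two independent writers and two observers (the "IRIW" litmus test).
    std::atomic<int> x{0}, y{0};
    int r1, r2, r3, r4;

    int main() {
        std::thread w1([] { x.store(1, std::memory_order_release); });
        std::thread w2([] { y.store(1, std::memory_order_release); });
        std::thread o1([] {   // observer on one core / cache
            r1 = x.load(std::memory_order_acquire);
            r2 = y.load(std::memory_order_acquire);
        });
        std::thread o2([] {   // observer on another core / cache
            r3 = y.load(std::memory_order_acquire);
            r4 = x.load(std::memory_order_acquire);
        });
        w1.join(); w2.join(); o1.join(); o2.join();

        // The outcome 8.2.2 forbids: observer 1 saw x's store but not yet y's,
        // while observer 2 saw y's store but not yet x's -- i.e. the two
        // observers disagree about which store happened first.
        bool disagree = (r1 == 1 && r2 == 0) && (r3 == 1 && r4 == 0);
        std::cout << (disagree ? "stores seen in inconsistent order\n"
                               : "consistent (or inconclusive) run\n");
        // A single run proves nothing; in practice the test is repeated many
        // times with the four threads pinned to different logical processors.
        return 0;
    }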

Linux Scheduler on NUMA and SMP

泄露秘密 submitted on 2019-12-10 12:07:26
Question: I wanted to know whether a copy of the schedule() function runs on each processor, or whether there is just one schedule() running for all processors. If a copy of schedule() runs on each processor/core, how are processes dispatched to a particular CPU/per-CPU runqueue? Is that the job of the load balancer? Is there a single load balancer running for all CPUs, or is balancing done in a distributed fashion using flags/some communication method? P.S. I know how the scheduling classes work, etc., but I am having a hard
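
This is not the scheduler's internal dispatch path, but user space can at least steer which per-CPU runqueue (and therefore which CPU's copy of schedule()) a task ends up on. A minimal sketch, assuming Linux/glibc, of pinning the calling process to one CPU with sched_setaffinity(); the CPU number is an arbitrary example.

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <sched.h>     // CPU_ZERO, CPU_SET, sched_setaffinity, sched_getcpu
    #include <cstdio>

    int main() {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);  // example: pin to logical CPU 2

        // pid 0 = "the calling process"; after this call the task sits on
        // CPU 2's runqueue and the load balancer will not migrate it away.
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        std::printf("now running on CPU %d\n", sched_getcpu());
        return 0;
    }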

Boost threads and non-existent speedups on Linux SMPs

故事扮演 submitted on 2019-12-08 03:51:22
Question: I have written a small example C++ program using boost::thread. Since it's 215 lines, I've posted it on pastebin instead: http://pastebin.com/LRZ24W7D The program creates a large number of floats (currently 1 GB) and adds them up, first sequentially and then using a number of threads (hosted inside the device_matrix class). Assuming the machine is an SMP, I'd expect to see a speedup from the code. And on my Windows machine I do see a four-fold speedup when using 4 device_matrix instances
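
The pastebin code is not reproduced here, but the structure the question describes -- split a large float buffer into chunks and sum each chunk in its own thread -- looks roughly like the sketch below. It uses std::thread instead of boost::thread (the two are nearly interchangeable for this), and the buffer size and thread count are placeholders. On a real SMP this kind of streaming sum is usually memory-bandwidth bound, which is one common reason the expected N-fold speedup fails to appear.

    #include <cstddef>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        const std::size_t n = 1u << 26;        // placeholder: ~64M floats (256 MB)
        const unsigned nthreads = 4;           // placeholder thread count
        std::vector<float> data(n, 1.0f);

        // One partial sum per thread; each thread reduces its own chunk.
        std::vector<double> partial(nthreads, 0.0);
        std::vector<std::thread> workers;
        const std::size_t chunk = n / nthreads;

        for (unsigned t = 0; t < nthreads; ++t) {
            workers.emplace_back([&, t] {
                const std::size_t begin = t * chunk;
                const std::size_t end = (t + 1 == nthreads) ? n : begin + chunk;
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0.0);
            });
        }
        for (auto &w : workers) w.join();

        const double total = std::accumulate(partial.begin(), partial.end(), 0.0);
        std::cout << "sum = " << total << '\n';
        return 0;
    }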

How would the MONITOR instruction (_mm_monitor intrinsic) be used by a driver?

眉间皱痕 submitted on 2019-12-07 12:13:07
Question: I am exploring the usage of the MONITOR instruction (or the equivalent intrinsic, _mm_monitor). Although I found literature describing it, I could not find any concrete examples/samples of how to use it. Can anyone share an example of how this instruction/intrinsic would be used in a driver? Essentially, I would like to use it to watch memory ranges. Answer 1: The monitor instruction arms the address-monitoring hardware using the address specified in RAX/EAX/AX. Quote from Intel: The state of the monitor is used by the instruction mwait. The effective address size used (16, 32 or 64-bit) depends on the
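
Intel's headers expose the instruction pair as _mm_monitor and _mm_mwait (pmmintrin.h, SSE3). A minimal sketch of the usual idiom follows; it assumes ring-0 execution (MONITOR/MWAIT fault in user mode on most OSes), and the flag variable and function name are made up for illustration. The key points are re-checking the condition after arming the monitor, and treating MWAIT wake-ups as possibly spurious, so the whole thing sits in a loop.

    #include <pmmintrin.h>   // _mm_monitor, _mm_mwait

    // Hypothetical driver-side helper: park this core until another CPU (or a
    // DMA write) touches the cache line containing *flag, or an interrupt
    // wakes us.
    void wait_for_flag(volatile int *flag)
    {
        while (*flag == 0) {
            // Arm the monitor on the cache line holding *flag.
            _mm_monitor((const void *)flag, 0 /*extensions*/, 0 /*hints*/);

            // Re-check after arming: the store may have landed between the
            // first test and MONITOR, otherwise MWAIT could sleep forever.
            if (*flag != 0)
                break;

            // Sleep until the monitored line is written or an interrupt
            // arrives; wake-ups can be spurious, hence the enclosing loop.
            _mm_mwait(0 /*extensions*/, 0 /*hints*/);
        }
    }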

Limit the number of cores used by Erlang

若如初见. submitted on 2019-12-06 02:20:22
Question: I'm running experiments on a node with 2 x Quad-Core Xeon E5520 2.2GHz, 24.0GB RAM, and Erlang R15B02 (SMP enabled). I wonder whether I can limit the number of cores used by the Erlang VM, so that I can temporarily disable some cores and then increase the number step by step to test scalability. I don't have root access on this node, so I'm looking for a method that works either by passing parameters to erl or from Erlang code. Answer 1: You can limit the number of cores Erlang uses via the +S
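
For reference, +S is the emulator flag that sets how many scheduler threads are created and how many start online, and schedulers_online can also be changed at run time from Erlang code without restarting the VM. The counts below are example values, not output copied from the question's machine.

    $ erl +S 8:2          # create 8 scheduler threads, only 2 online
    ...
    1> erlang:system_info(schedulers_online).
    2
    2> erlang:system_flag(schedulers_online, 4).   % returns the old value
    2
    3> erlang:system_info(schedulers_online).
    4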

Running code on different processor (x86 assembly)

久未见 submitted on 2019-12-04 18:20:47
Question: In real mode on x86, what instructions would need to be used to run code on a different processor in a multiprocessor system? (I'm writing some pre-boot code in assembler that needs to set certain CPU registers, and do this on every CPU in the system, before the actual operating system boots.) Answer 1: So you have a stand-alone (you said "pre-boot") program, like a bootloader, running in real mode? And this is on a PeeCee with the usual BIOS? In that case you have only one CPU running. In order to spin up the other CPUs, an operating system will typically execute what is called the universal
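
The sequence the answer is about to name is the INIT-SIPI-SIPI ("universal startup") algorithm: the bootstrap processor sends an INIT IPI and then two STARTUP IPIs through the local APIC's interrupt command register. A rough sketch, in C++ rather than assembly, is below; the APIC base address, the delay helper, and the function names are assumptions for illustration, and the vector must point at a real-mode trampoline page below 1 MiB.

    #include <stdint.h>

    // Assumes the local APIC sits at its default physical base 0xFEE00000 and
    // that this address is usable directly (a pre-boot environment must
    // arrange that).
    constexpr uintptr_t LAPIC_BASE = 0xFEE00000u;
    constexpr uintptr_t ICR_LOW    = LAPIC_BASE + 0x300;
    constexpr uintptr_t ICR_HIGH   = LAPIC_BASE + 0x310;

    static inline void lapic_write(uintptr_t reg, uint32_t val) {
        *reinterpret_cast<volatile uint32_t *>(reg) = val;
    }

    static void delay_us(unsigned us) {
        // Crude placeholder busy-wait; real code would calibrate a timer.
        for (volatile unsigned long i = 0; i < us * 100UL; ++i) { }
    }

    // Wake the application processor with the given APIC id.  'vector' is the
    // page number (physical address >> 12) of the real-mode trampoline where
    // the AP begins executing.
    void start_ap(uint8_t apic_id, uint8_t vector)
    {
        lapic_write(ICR_HIGH, uint32_t(apic_id) << 24);
        lapic_write(ICR_LOW,  0x0000C500);        // INIT IPI, level assert
        delay_us(200);
        lapic_write(ICR_LOW,  0x00008500);        // INIT IPI, level de-assert
        delay_us(10000);                          // ~10 ms per the MP spec

        for (int i = 0; i < 2; ++i) {             // two STARTUP IPIs
            lapic_write(ICR_HIGH, uint32_t(apic_id) << 24);
            lapic_write(ICR_LOW,  0x00000600u | vector);
            delay_us(200);
        }
    }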

How is a memory barrier used in the Linux kernel

拜拜、爱过 submitted on 2019-12-03 13:51:01
There is an illustration in the kernel source Documentation/memory-barriers.txt, like this:

        CPU 1                       CPU 2
        =======================     =======================
                { B = 7; X = 9; Y = 8; C = &Y }
        STORE A = 1
        STORE B = 2
        <write barrier>
        STORE C = &B                LOAD X
        STORE D = 4                 LOAD C (gets &B)
                                    LOAD *C (reads B)

Without intervention, CPU 2 may perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1. (The excerpt then reproduces the ASCII-art diagram from memory-barriers.txt, labelled "Sequence of update of perception on CPU 2", in which CPU 1's stores B=2, A=1, C=&B, D=4 reach CPU 2 in a different order; the diagram is cut off here.)
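
The point of that illustration is that the writer's barrier alone is not enough: the reader needs a matching read barrier (or an address dependency) before the write barrier buys it anything. A user-space analogue of the same pattern, using C++11 atomics instead of the kernel's smp_wmb()/smp_rmb(), is sketched below; the variable names follow the excerpt, but C starts as null rather than &Y to keep the sketch small.

    #include <atomic>
    #include <thread>

    int b = 7;
    std::atomic<int *> c{nullptr};

    void cpu1() {                      // writer, like CPU 1 in the excerpt
        b = 2;                         // STORE B = 2
        // release plays the role of the write barrier: B's store cannot be
        // reordered after the store to C
        c.store(&b, std::memory_order_release);   // STORE C = &B
    }

    void cpu2() {                      // reader, like CPU 2 in the excerpt
        int *p;
        // acquire is the reader-side pairing the kernel document insists on;
        // with a relaxed load there would be no guarantee that *p reads 2
        while ((p = c.load(std::memory_order_acquire)) == nullptr) { /* spin */ }
        int value = *p;                // LOAD *C -- guaranteed to see B == 2
        (void)value;
    }

    int main() {
        std::thread t2(cpu2), t1(cpu1);
        t1.join(); t2.join();
    }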

Linux: find out the hyper-threaded core ID

删除回忆录丶 submitted on 2019-12-03 09:26:50
Question: I spent this morning trying to find out how to determine which processor ID is the hyper-threaded core, but without luck. I want to find this out and then use set_affinity() to bind a process to a hyper-threaded thread or to a non-hyper-threaded thread in order to profile its performance. Answer 1: I discovered a simple trick to do what I need: cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list If the first number is equal to the CPU number (0 in this example) then it's a real core; if not, it
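
A small sketch of the same trick applied to every CPU: read each cpuN/topology/thread_siblings_list and treat a CPU as a physical ("real") core when its own number is the first entry in the list. The sysfs paths are the ones from the answer; the parsing and loop handling are just illustrative glue.

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    int main() {
        for (int cpu = 0; ; ++cpu) {
            std::ostringstream path;
            path << "/sys/devices/system/cpu/cpu" << cpu
                 << "/topology/thread_siblings_list";
            std::ifstream f(path.str());
            if (!f)                    // no more CPUs
                break;

            std::string siblings;      // e.g. "0,4" or "0-1" depending on kernel
            std::getline(f, siblings);

            // First entry equals this CPU  =>  it is the physical core;
            // otherwise it is the hyper-threaded sibling of an earlier CPU.
            int first = std::stoi(siblings);
            std::cout << "cpu" << cpu << " siblings=" << siblings
                      << (first == cpu ? "  (physical core)" : "  (HT sibling)")
                      << '\n';
        }
        return 0;
    }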