numa

Why is my .Net app only using single NUMA node?

故事扮演 submitted on 2019-11-29 01:42:38
Question: I have a server with 2 NUMA nodes with 16 CPUs each. I can see all 32 CPUs in Task Manager: the first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows. In my app I am starting 64 threads using Thread.Start(). When I run the app, which is CPU-intensive, only the first 16 CPUs are busy and the other 16 CPUs are idle. Why? I am using Interlocked.Increment() a lot. Could this be the reason? Is there a way I can start threads on a specific NUMA node? Answer 1: In addition to gcServer, we should enable GCCpuGroup and Thread_UseAllCpuGroups, so the config should be more like:

NUMA aware cache aligned memory allocation

允我心安 submitted on 2019-11-28 19:50:33
Question: On Linux systems, POSIX provides a function (posix_memalign) for cache-aligned allocation to prevent false sharing, and to choose a specific NUMA node of the architecture we can use the libnuma library. What I need is something that combines the two. I bind certain threads to certain processors, and I want to allocate local data structures for each thread from the corresponding NUMA node in order to decrease the latency of the threads' memory operations. How can I do this? Answer 1: If you're just looking to get the alignment functionality around a NUMA allocator, you can easily build your own. The idea is to
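The answer above is cut off. As a hedged sketch of one common way to add alignment on top of any node-local allocator (not necessarily the approach the truncated answer goes on to describe), the code below over-allocates, rounds the pointer up to the requested power-of-two alignment, and stores the raw pointer and raw size just below the aligned block so it can be released later. The callback types, `aligned_alloc_with`, `aligned_free_with`, and the malloc-backed test pair are all hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical interface for the underlying node-local allocator. With
   libnuma these would wrap numa_alloc_onnode(size, node) and
   numa_free(ptr, size); ctx can carry the node number. */
typedef void *(*raw_alloc_fn)(size_t size, void *ctx);
typedef void (*raw_free_fn)(void *ptr, size_t size, void *ctx);

/* Stored immediately below each aligned block so it can be freed later. */
typedef struct {
    void *raw;       /* pointer returned by the underlying allocator */
    size_t raw_size; /* size requested from it (numa_free needs this) */
} align_header;

/* Allocate `size` bytes aligned to `align` (a power of two, e.g. 64 for a
   typical cache line) from `alloc`. Returns NULL on failure, on a
   non-power-of-two alignment, or if the padded size would overflow. */
void *aligned_alloc_with(size_t size, size_t align, raw_alloc_fn alloc, void *ctx) {
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;
    if (align < _Alignof(align_header))
        align = _Alignof(align_header); /* keep the header itself aligned */
    if (size > SIZE_MAX - sizeof(align_header) - (align - 1))
        return NULL;
    /* Over-allocate: room for the header plus worst-case padding. */
    size_t raw_size = size + sizeof(align_header) + (align - 1);
    unsigned char *raw = alloc(raw_size, ctx);
    if (raw == NULL)
        return NULL;
    /* First address after the header, rounded up to the alignment. */
    uintptr_t first = (uintptr_t)(raw + sizeof(align_header));
    uintptr_t aligned = (first + align - 1) & ~(uintptr_t)(align - 1);
    align_header *hdr = (align_header *)aligned - 1;
    hdr->raw = raw;
    hdr->raw_size = raw_size;
    return (void *)aligned;
}

/* Release a block obtained from aligned_alloc_with via the matching free. */
void aligned_free_with(void *ptr, raw_free_fn release, void *ctx) {
    if (ptr == NULL)
        return;
    align_header *hdr = (align_header *)ptr - 1;
    release(hdr->raw, hdr->raw_size, ctx);
}

/* Plain malloc-backed pair, handy for testing without libnuma. */
void *heap_raw_alloc(size_t size, void *ctx) {
    (void)ctx;
    return malloc(size);
}

void heap_raw_free(void *ptr, size_t size, void *ctx) {
    (void)size;
    (void)ctx;
    free(ptr);
}
```

To back it with libnuma instead of malloc, pass callbacks whose alloc calls `numa_alloc_onnode(size, node)` and whose free calls `numa_free(ptr, size)` (link with `-lnuma`), with the node number passed through `ctx`. Storing `raw_size` in the header is what makes this work with `numa_free`, which, unlike `free`, needs the original size.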

Measuring NUMA (Non-Uniform Memory Access). No observable asymmetry. Why?

放肆的年华 submitted on 2019-11-28 15:19:51
Question: I've tried to measure the asymmetric memory-access effects of NUMA, and failed. The experiment: performed on an Intel Xeon X5570 @ 2.93 GHz, 2 CPUs, 8 cores. On a thread pinned to core 0, I allocate an array x of 10,000,000 bytes on core 0's NUMA node with numa_alloc_local. Then I iterate over array x 50 times, reading and writing each byte in the array, and measure the elapsed time for the 50 iterations. Then, on each of the other cores in my server, I pin a new thread and again measure the

NUMA aware cache aligned memory allocation

空扰寡人 submitted on 2019-11-27 12:29:50
Question: On Linux systems, POSIX provides a function (posix_memalign) for cache-aligned allocation to prevent false sharing, and to choose a specific NUMA node of the architecture we can use the libnuma library. What I need is something that combines the two. I bind certain threads to certain processors, and I want to allocate local data structures for each thread from the corresponding NUMA node in order to decrease the latency of the threads' memory operations. How can I do this? Answer 1: If you're just

Why is my .Net app only using single NUMA node?

£可爱£侵袭症+ submitted on 2019-11-27 03:24:09
Question: I have a server with 2 NUMA nodes with 16 CPUs each. I can see all 32 CPUs in Task Manager: the first 16 (NUMA node 1) in the first 2 rows and the next 16 (NUMA node 2) in the last 2 rows. In my app I am starting 64 threads using Thread.Start(). When I run the app, which is CPU-intensive, only the first 16 CPUs are busy and the other 16 CPUs are idle. Why? I am using Interlocked.Increment() a lot. Could this be the reason? Is there a way I can start threads on a specific NUMA node? Answer 1: In addition

Can I get the NUMA node from a pointer address (in C on Linux)?

五迷三道 submitted on 2019-11-27 01:40:58
Question: I've set up my code to carefully load and process data locally on my NUMA system. I think. That is, for debugging purposes I'd really like to be able to take the pointer addresses being accessed inside a particular function, which have been set up by many other functions, and directly identify the NUMA node(s) on which the pointed-to memory resides, so I can check that everything is located where it should be. Is this possible? I found this request on msdn http://social.msdn
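One way to look up the node backing a given address on Linux is the `move_pages(2)` system call: when its `nodes` argument is NULL it migrates nothing and only reports, in `status`, the node each listed page currently resides on. The hedged sketch below invokes it through `syscall()` so it needs no extra library. libnuma users could instead call the `move_pages()` wrapper from `<numaif.h>`, or `get_mempolicy()` with `MPOL_F_NODE | MPOL_F_ADDR` (link with `-lnuma`). `numa_node_of` is a hypothetical helper name, and this is not necessarily what the answers to the question proposed.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Return the NUMA node holding the page that contains addr, or -1 with
   errno set if it cannot be determined (e.g. ENOENT for a page that has
   never been touched, or EPERM if the syscall is blocked). Linux-only. */
int numa_node_of(const void *addr) {
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    /* Round down to the start of the page containing addr. */
    void *pages[1] = { (void *)((uintptr_t)addr & ~(page - 1)) };
    int status[1] = { -1 };
    /* pid 0 = this process; nodes == NULL means "query only", so the
       kernel fills status[] with each page's current node. */
    long rc = syscall(SYS_move_pages, 0, 1UL, pages,
                      (const int *)NULL, status, 0);
    if (rc != 0)
        return -1;              /* errno already set by syscall() */
    if (status[0] < 0) {
        errno = -status[0];     /* per-page error code, e.g. -ENOENT */
        return -1;
    }
    return status[0];
}
```

For a debugging check like the one described, call it on pointers whose pages have already been written, since untouched pages have no physical placement yet and report ENOENT.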