numa

NUMA: How to check in which part of RAM a C++ array is allocated?

走远了吗. Submitted on 2019-12-18 17:56:15

Question: I have a server with 2 CPUs and 64GB of RAM, 32GB per CPU. I know that each CPU has its own part of the RAM; let's call them RAM1 and RAM2. I would like my program to know on which RAM (RAM1 or RAM2) it allocates its data. I tried checking pointer values:

    // pin the thread to the i-th CPU, using pthread_setaffinity_np
    TData *a = new TData[N];
    ...
    cout << "CPU = " << i << " address = " << a << endl;

but the output looks random. I suppose that is because the addresses are virtual. Is there any…

Array memory management

我与影子孤独终老i Submitted on 2019-12-13 00:09:06

Question: I am doing my Computing Science project. I am doing multiprocessor programming in C. One requirement for us is that we cannot keep allocating small chunks of memory; memory can be allocated in big chunks when needed. So imagine I use structures in my program, and the way my program works requires dynamic memory allocation, but that is very costly on the hardware we are using. So the best solution would be to allocate a big pool of memory at the beginning and, whenever needed…

Many-core CPUs: programming techniques to avoid disappointing scalability

别等时光非礼了梦想. Submitted on 2019-12-12 09:38:10

Question: We've just bought a 32-core Opteron machine, and the speedups we get are a little disappointing: beyond about 24 threads we see no speedup at all (it actually gets slower overall), and after about 6 threads it becomes significantly sub-linear. Our application is very thread-friendly: the job breaks down into about 170,000 little tasks which can each be executed separately, each taking 5-10 seconds. They all read from the same memory-mapped file of about 4Gb. They make occasional writes to it…

Move memory pages per-thread in NUMA architecture

浪尽此生 Submitted on 2019-12-11 08:39:26

Question: I have two questions in one: (i) Suppose thread X is running on CPU Y. Is it possible to use the syscalls migrate_pages - or, even better, move_pages (or their libnuma wrappers) - to move the pages associated with X to the node to which Y is connected? This question arises because the first argument of both syscalls is a PID (and I need a per-thread approach for some research I'm doing). (ii) If the answer to (i) is positive, how can I get all the pages used by some thread? My aim is to move the…

Linux Scheduler on NUMA and SMP

泄露秘密 Submitted on 2019-12-10 12:07:26

Question: I wanted to know whether a copy of the schedule() function runs on each processor, or whether just one schedule() runs for all processors. If a copy of schedule() runs on each processor/core, how are processes dispatched to a particular CPU's runqueue? Is that the job of the load balancer? Is there only one load balancer running for all CPUs, or is it done in a distributed fashion using flags or some communication method? PS - I know how the scheduling classes work, etc., but I am having a hard…

numa, mbind, segfault

蹲街弑〆低调 Submitted on 2019-12-10 11:56:47

Question: I have allocated memory using valloc - say, an array A of 15*sizeof(double) bytes. Now I divide it into three pieces, and I want to bind each piece (of length 5) to one of three NUMA nodes (say 0, 1, and 2). Currently I am doing the following:

    double* A = (double*)valloc(15*sizeof(double));
    piece = 5;
    nodemask = 1;
    mbind(&A[0], piece*sizeof(double), MPOL_BIND, &nodemask, 64, MPOL_MF_MOVE);
    nodemask = 2;
    mbind(&A[5], piece*sizeof(double), MPOL_BIND, &nodemask, 64, MPOL_MF_MOVE);
    nodemask = 4;
    mbind(&A[10], piece…

How is NUMA represented in virtual memory?

被刻印的时光 ゝ Submitted on 2019-12-10 11:25:20

Question: There are many resources describing the architecture of NUMA from a hardware perspective and the performance implications of writing NUMA-aware software, but I have not yet found information on how the mapping between virtual pages and physical frames is decided with respect to NUMA. More specifically, an application running on modern Linux still sees a single contiguous virtual address space. How can the application tell which parts of the address space are mapped onto…

Allocating a Thread's Stack on a specific NUMA memory

我只是一个虾纸丫 Submitted on 2019-12-08 21:09:26

I would like to know if there is a way to create the stack of a thread on a specific NUMA node. I have written this code, but I'm not sure whether it does the trick or not:

    pthread_t thread1;

    int main(int argc, char** argv)
    {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        char** stackarray;
        int numanode = 1;
        stackarray = (char**) numa_alloc_onnode(sizeof(char*), numanode);

        // considering that the newly created thread
        // will be running on a core on node 1
        pthread_attr_setstack(&attr, stackarray[0], 1000000);

        pthread_create(&thread1, &attr, function, (void*)0);
        ...
        ...
    }

Thank you for your help. Here's…

How to get the size of memory pointed by a pointer?

旧时模样 Submitted on 2019-12-08 08:24:23

Question: I am currently working on a NUMA machine. I am using numa_free to free my allocated memory. However, unlike free, numa_free needs to know how many bytes are to be freed. Is there any way to find out how many bytes are pointed to by a pointer without tracking it myself?

Answer 1: There is no way to obtain the memory size through the underlying API. You must remember the size somewhere during allocation. For example, you may write your own allocator that allocates a few extra bytes and stores the size in the first bytes of…

Which architecture to call Non-uniform memory access (NUMA)?

主宰稳场 Submitted on 2019-12-07 07:51:35

Question: According to the wiki: "Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor." But it is not clear whether this covers any memory, including caches, or main memory only. For example, the Xeon Phi processor has the following architecture: memory access to main memory (GDDR) is the same for all cores, while memory access to the L2 cache differs between cores, since the first native L2 cache…