shared-memory

Efficiency of Multithreaded Loops

北城余情 submitted on 2019-11-29 07:10:50
Greetings noble community, I want to have the following loop:

    for (i = 0; i < MAX; i++)
        A[i] = B[i] + C[i];

run in parallel on a shared-memory quad-core computer using threads. The two alternatives below are being considered for the code to be executed by these threads, where tid is the id of the thread: 0, 1, 2 or 3 (for simplicity, assume MAX is a multiple of 4).

Option 1:

    for (i = tid; i < MAX; i += 4)
        A[i] = B[i] + C[i];

Option 2:

    for (i = tid*(MAX/4); i < (tid+1)*(MAX/4); i++)
        A[i] = B[i] + C[i];

My question is whether one is more efficient than the other, and why. …
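A minimal sketch of the two partitionings with std::thread (MAX, A, B, C are the question's names; a thread count of 4 is assumed). Option 2's contiguous chunks are usually the better bet: each thread streams through its own region of A, B and C, so no cache line is written by two threads, whereas Option 1 interleaves four threads on neighboring elements and invites false sharing on the writes to A.

    #include <thread>
    #include <vector>

    const int MAX = 1 << 20;            // assume a multiple of 4
    const int NTHREADS = 4;
    static float A[MAX], B[MAX], C[MAX];

    void option1(int tid) {             // interleaved: adjacent elements go to different threads
        for (int i = tid; i < MAX; i += NTHREADS)
            A[i] = B[i] + C[i];
    }

    void option2(int tid) {             // blocked: each thread owns one contiguous chunk
        int chunk = MAX / NTHREADS;
        for (int i = tid * chunk; i < (tid + 1) * chunk; i++)
            A[i] = B[i] + C[i];
    }

    int main() {
        std::vector<std::thread> ts;
        for (int t = 0; t < NTHREADS; t++)
            ts.emplace_back(option2, t);   // swap in option1 to compare timings
        for (auto &t : ts) t.join();
    }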

Does madvise(___, ___, MADV_DONTNEED) instruct the OS to lazily write to disk?

不羁岁月 submitted on 2019-11-29 07:02:47
Hypothetically, suppose I want to perform sequential writing to a potentially very large file. If I mmap() a gigantic region and madvise(MADV_SEQUENTIAL) on that entire region, then I can write to the memory in a relatively efficient manner. This I have gotten to work just fine. Now, in order to free up various OS resources as I am writing, I occasionally perform a munmap() on small chunks of memory that have already been written to. My concern is that munmap() and msync() will block my thread, waiting for the data to be physically committed to disk. I cannot slow down my writer at all, so I …
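For what it's worth, the explicit non-blocking way to request writeback is msync() with MS_ASYNC, which schedules the flush and returns without waiting for the disk (MS_SYNC is the blocking variant). A sketch of the chunked write/flush/unmap cycle, with a made-up file name and sizes and error checks omitted:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        const size_t region = 1UL << 30;      // 1 GiB mapping (hypothetical)
        const size_t chunk  = 1UL << 20;      // flush/unmap in 1 MiB pieces

        int fd = open("big.out", O_RDWR | O_CREAT, 0644);
        ftruncate(fd, region);
        char *p = (char *)mmap(nullptr, region, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
        madvise(p, region, MADV_SEQUENTIAL);

        for (size_t off = 0; off < region; off += chunk) {
            // ... sequential writes into p[off .. off+chunk) ...
            msync(p + off, chunk, MS_ASYNC);  // request writeback, returns immediately
            munmap(p + off, chunk);           // release just this address range
        }
        close(fd);
    }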

Create a shared-memory vector of strings

霸气de小男生 submitted on 2019-11-29 04:37:30
I am trying to create a class managing a shared-memory vector of (std) strings.

    typedef boost::interprocess::allocator<std::string,
            boost::interprocess::managed_shared_memory::segment_manager> shmem_allocator;
    typedef boost::interprocess::vector<std::string, shmem_allocator> shmem_vector;

    shmem_mgr::shmem_mgr()
      : shmem_(create_only, SHMEM_KEY, SHMEM_SIZE),
        allocator_(shmem_.get_segment_manager())
    {
        mutex_     = shmem_.find_or_construct<interprocess_mutex>(SHMEM_MUTEX)();
        condition_ = shmem_.find_or_construct<interprocess_condition>(SHMEM_CONDITION)();
        // buffer_ is of type shmem_vector
        buffer_ = …
    }
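Note the likely trap here: a std::string keeps its characters on the process's private heap, so placing std::string objects in shared memory hands other processes dangling pointers. The usual fix is an interprocess string whose allocator also comes from the segment. A minimal sketch of that approach (segment name and size are made up):

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/containers/vector.hpp>
    #include <boost/interprocess/containers/string.hpp>
    #include <string>

    namespace bip = boost::interprocess;
    typedef bip::managed_shared_memory::segment_manager segment_manager_t;
    typedef bip::allocator<char, segment_manager_t> char_allocator;
    typedef bip::basic_string<char, std::char_traits<char>, char_allocator> shm_string;
    typedef bip::allocator<shm_string, segment_manager_t> string_allocator;
    typedef bip::vector<shm_string, string_allocator> shm_string_vector;

    int main() {
        bip::managed_shared_memory shm(bip::open_or_create, "demo_shm", 65536);
        char_allocator ca(shm.get_segment_manager());
        // the vector, its strings, and every string's characters all live in the segment
        shm_string_vector *v = shm.find_or_construct<shm_string_vector>("strings")
                                   (shm.get_segment_manager());
        v->push_back(shm_string("hello", ca));
    }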

R and shared memory for parallel::mclapply

杀马特。学长 韩版系。学妹 submitted on 2019-11-29 04:09:26
I am trying to take advantage of a quad-core machine by parallelizing a costly operation that is performed on a list of about 1000 items. I am currently using R's parallel::mclapply function:

    res = rbind.fill(parallel::mclapply(lst, fun, mc.cores=3, mc.preschedule=T))

This works. The problem is that any additional subprocess that is spawned has to allocate a large chunk of memory. Ideally, I would like each core to access shared memory from the parent R process, so that as I increase the number of cores used in mclapply, I don't hit RAM limitations before core limitations. I'm currently at a loss on …
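Background that helps frame this: mclapply spawns workers with fork(), and on Linux fork is copy-on-write, so children share the parent's physical pages until one side writes to them; read-only access to a big object should cost little extra RAM. A C++ analogue of that mechanism (buffer size made up):

    #include <cstdio>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <vector>

    int main() {
        std::vector<double> big(10000000, 1.0);  // ~80 MB allocated in the parent

        if (fork() == 0) {            // child: reads hit the parent's pages, no copy
            double sum = 0;
            for (double x : big) sum += x;
            std::printf("child sum = %.0f\n", sum);
            _exit(0);
        }
        wait(nullptr);
    }

The flip side is that any write (including things like R's garbage collector touching object headers) forces the touched pages to be copied, which is one way the per-worker memory growth in the question can arise.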

Mutex in shared memory when one user crashes?

余生颓废 submitted on 2019-11-29 02:29:43
Question: Suppose a process creates a mutex in shared memory, locks it, and dumps core while the mutex is locked. Now in another process, how do I detect that the mutex is already locked but not owned by any process?

Answer 1: If you're working in Linux or something similar, consider using named semaphores instead of (what I assume are) pthreads mutexes. I don't think there is a way to determine the locking PID of a pthreads mutex, short of building your own registration table and also putting it in …
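A different route on Linux, sketched below, is a robust process-shared mutex: initialize it with PTHREAD_MUTEX_ROBUST, and when the owner dies while holding it, the next pthread_mutex_lock() returns EOWNERDEAD instead of hanging, letting the survivor repair the protected state. (Shared-memory setup is omitted; m is assumed to live in a segment both processes map.)

    #include <errno.h>
    #include <pthread.h>

    void init_robust(pthread_mutex_t *m) {
        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(m, &a);
    }

    int lock_recovering(pthread_mutex_t *m) {
        int rc = pthread_mutex_lock(m);
        if (rc == EOWNERDEAD) {          // previous owner crashed while holding the lock
            // ... repair the data the mutex protects here ...
            pthread_mutex_consistent(m); // mark the mutex usable again
            rc = 0;
        }
        return rc;
    }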

Dynamically create a list of shared arrays using python multiprocessing

谁说我不能喝 submitted on 2019-11-29 02:25:40
I'd like to share several numpy arrays between different child processes with python's multiprocessing module. I'd like the arrays to be separately lockable, and I'd like the number of arrays to be dynamically determined at runtime. Is this possible? In this answer, J.F. Sebastian lays out a nice way to use python's numpy arrays in shared memory while multiprocessing. The array is lockable, which is what I want. I would like to do something very similar, except with a variable number of shared arrays, the number being determined at runtime. His example code is very clear and does …
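Not the Python answer, but the same idea expressed against the underlying OS facility: create a runtime-chosen number of shared-memory segments, each carrying its own process-shared mutex next to its data, so every array is separately lockable. A C++ sketch with hypothetical names and sizes (link with -lrt -pthread):

    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <string>
    #include <vector>

    struct Segment {                 // layout of one shared-memory block
        pthread_mutex_t lock;        // process-shared mutex guarding data[]
        double data[1024];
    };

    Segment *make_segment(const std::string &name) {
        int fd = shm_open(name.c_str(), O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(Segment));
        Segment *s = (Segment *)mmap(nullptr, sizeof(Segment),
                                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&s->lock, &a);
        return s;
    }

    int main() {
        int n = 5;                               // count decided at runtime
        std::vector<Segment *> arrays;
        for (int i = 0; i < n; i++)
            arrays.push_back(make_segment("/arr" + std::to_string(i)));
        pthread_mutex_lock(&arrays[2]->lock);    // lock one array independently
        arrays[2]->data[0] = 3.14;
        pthread_mutex_unlock(&arrays[2]->lock);
    }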

Fully managed shared memory .NET implementations? [closed]

我的梦境 submitted on 2019-11-29 02:16:07
I'm looking for free, fully-managed implementations of shared memory for .NET (P/Invoke is acceptable, mixed C++/CLI is not).

Sounds like you are looking for Memory-Mapped Files, which are supported in the .NET 4.0 BCL. Starting with the .NET Framework version 4, you can use managed code to access memory-mapped files in the same way that native Windows functions access memory-mapped files, as described in Managing Memory-Mapped Files in Win32 in the MSDN Library.

Well, the .NET framework is free, recommended. .NET 4.0 supports the System.IO.MemoryMappedFiles namespace classes. Shared memory is …
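For orientation, System.IO.MemoryMappedFiles wraps the Win32 file-mapping API the answer alludes to. A sketch of that native facility (the mapping name is made up; any process opening the same name sees the same bytes):

    #include <windows.h>
    #include <cstring>

    int main() {
        // pagefile-backed named mapping, 4 KiB
        HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr,
                                      PAGE_READWRITE, 0, 4096,
                                      "Local\\MySharedMem");
        void *view = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 4096);
        std::strcpy((char *)view, "hello from process A");
        // another process: OpenFileMapping(FILE_MAP_ALL_ACCESS, FALSE, "Local\\MySharedMem")
        UnmapViewOfFile(view);
        CloseHandle(h);
    }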

What does it mean to configure MPI for shared memory?

我与影子孤独终老i submitted on 2019-11-29 01:31:02
Question: I have a bit of a research-related question. I have finished implementing a structural skeleton framework based on MPI (specifically using openmpi 6.3). The framework is supposed to be used on a single machine. Now, I am comparing it with other previous skeleton implementations (such as scandium, fast-flow, ...). One thing I have noticed is that the performance of my implementation is not as good as the other implementations. I think this is because my implementation is based on MPI …
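One relevant detail: whether ranks communicate over shared memory is decided at launch time, not in the source code. With Open MPI the shared-memory transport can be requested explicitly via the MCA system, e.g. mpirun --mca btl self,sm -np 4 ./app (the "sm" BTL in the 1.x series); on a single machine Open MPI normally picks it automatically. The program itself stays unchanged:

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        // with all ranks on one node and the sm BTL selected,
        // these messages never touch the network stack
        std::printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
    }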

Do different processes have separate copies of a shared library's static variable, or a common copy?

南笙酒味 submitted on 2019-11-29 00:10:09
I am trying to understand the fundamentals of the shared memory concept. I am trying to create a shared library with one function and one STATIC array variable, and I want to access the static array variable through that function. Here is my shared library:

    // foo.c
    #include <stdio.h>

    static int DATA[1024] = {1, 2, 3, ...., 1024};

    inline void foo(void)
    {
        int j, k = 0;
        for (j = 0; j < 1024; j++) {
            k = DATA[j];
        }
        k += 0;
    }

I have created the shared library object (libfoo.so) by following the instructions from shared library. Now my questions are: 1> If I access foo() from two different programs (program1 and …
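The short version of the usual answer: the library's read-only code pages are shared between processes, but each process gets its own copy-on-write copy of writable static data, so a write in one program is invisible to the other. A self-contained C++ sketch of that behavior using fork() (a single int stands in for the library's array):

    #include <cstdio>
    #include <sys/wait.h>
    #include <unistd.h>

    static int DATA = 1;   // stands in for the shared library's static data

    int main() {
        if (fork() == 0) {                                  // child process
            DATA = 42;                                      // write triggers a private page copy
            std::printf("child  sees DATA = %d\n", DATA);   // prints 42
            return 0;
        }
        wait(nullptr);
        std::printf("parent sees DATA = %d\n", DATA);       // still prints 1
        return 0;
    }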

Performance difference between IPC shared memory and threads memory

て烟熏妆下的殇ゞ submitted on 2019-11-28 18:22:56
I hear frequently that accessing a shared memory segment between processes has no performance penalty compared to accessing process memory between threads. In other words, a multi-threaded application will not be faster than a set of processes using shared memory (excluding locking or other synchronization issues). But I have my doubts:

1) shmat() maps the local process virtual memory to the shared segment. This translation has to be performed for each shared memory address and can represent a significant cost. In a multi-threaded application there is no extra translation required: all VM …
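For reference, the lifecycle under discussion, sketched below. The point that usually settles doubt 1): shmat() establishes the page-table entries once, and after that, loads and stores through the returned pointer go through exactly the same hardware MMU translation as any other memory access, so there is no per-access software translation step.

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <cstdio>

    int main() {
        int id = shmget(IPC_PRIVATE, 1 << 20, IPC_CREAT | 0600); // 1 MiB segment
        char *p = (char *)shmat(id, nullptr, 0);  // mapping set up once, here
        p[0] = 'x';                               // plain store: ordinary page-table walk
        std::printf("%c\n", p[0]);
        shmdt(p);                                 // unmap from this process
        shmctl(id, IPC_RMID, nullptr);            // destroy the segment
    }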