shared-memory

How to use coalesced memory access

微笑、不失礼 submitted on 2019-11-30 20:10:53
I have N threads running simultaneously on the device, and together they need M*N floats from global memory. What is the correct way to access global memory in a coalesced fashion? And how can shared memory help with this? Usually, good coalescing is achieved when neighbouring threads access neighbouring cells in memory. So, if tid holds the index of your thread, then accessing:

arr[tid]          --- gives perfect coalescence
arr[tid+5]        --- is almost perfect, probably misaligned
arr[tid*4]        --- is not that good any more, because of the gaps
arr[random(0..N)] --- horrible!

I am talking from the
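A hedged sketch of the staging idea the excerpt hints at (not from the original post): each thread needs a whole row of a row-major matrix, which by itself would be the strided arr[tid*4]-style pattern, so the block first loads tiles cooperatively with coalesced accesses and each thread then reads its row out of shared memory. The kernel name, the row-sum workload, and TILE = 32 (assumed to equal blockDim.x) are illustrative assumptions.

#define TILE 32   // threads per block == rows handled per block (assumption)

__global__ void rowSums(const float *in, float *out, int nRows, int rowLen)
{
    // Padded to TILE+1 columns to avoid shared-memory bank conflicts.
    __shared__ float tile[TILE][TILE + 1];

    int row = blockIdx.x * TILE + threadIdx.x;   // row owned by this thread
    float acc = 0.0f;

    for (int colBase = 0; colBase < rowLen; colBase += TILE) {
        // Coalesced phase: for each staged row r, consecutive threads read
        // consecutive global addresses.
        for (int r = 0; r < TILE; ++r) {
            int gRow = blockIdx.x * TILE + r;
            int gCol = colBase + threadIdx.x;
            if (gRow < nRows && gCol < rowLen)
                tile[r][threadIdx.x] = in[gRow * rowLen + gCol];
        }
        __syncthreads();

        // Each thread now walks its own row out of fast shared memory,
        // where global-memory coalescing rules no longer apply.
        for (int c = 0; c < TILE && colBase + c < rowLen; ++c)
            acc += tile[threadIdx.x][c];
        __syncthreads();
    }

    if (row < nRows)
        out[row] = acc;
}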

ftruncate not working on POSIX shared memory in Mac OS X

…衆ロ難τιáo~ submitted on 2019-11-30 20:04:07
I have written some code on Mac OS X to use POSIX shared memory, as shown below:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

int main() {
    int fileHandle = shm_open("TW_ShMem1", O_CREAT|O_RDWR, 0666);
    if (fileHandle == -1) {
        //error.
    } else {
        //Here, it is failing on Mac OS X
        if (-1 == ftruncate(fileHandle, 8192)) {
            shm_unlink("TW_ShMem1");
            fileHandle = -1;
        } else {
            return 0;
        }
    }
    return 1;
}

ftruncate works on Linux without any problem. On Mac OS X it returns -1 and errno is EINVAL (as seen in the debugger). Why is it failing? What
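On macOS, ftruncate on a POSIX shared memory object is commonly reported to succeed only while the freshly created object still has zero size; if a previous run left "TW_ShMem1" behind, the call fails with EINVAL. A hedged sketch of the usual workaround (an assumption, not the original poster's solution) is to size the object only when this process actually created it:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

int main() {
    // O_EXCL tells us whether we are the first owner and therefore the one
    // allowed to set the size.
    int fd = shm_open("TW_ShMem1", O_CREAT | O_EXCL | O_RDWR, 0666);
    if (fd != -1) {
        if (ftruncate(fd, 8192) == -1) {     // only the creator truncates
            shm_unlink("TW_ShMem1");
            return 1;
        }
    } else if (errno == EEXIST) {
        // The object already exists (e.g. from a previous run): open it
        // as-is and do NOT call ftruncate on it again.
        fd = shm_open("TW_ShMem1", O_RDWR, 0666);
        if (fd == -1)
            return 1;
    } else {
        return 1;
    }
    return 0;
}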

Is it possible to store pointers in shared memory without using offsets?

南楼画角 submitted on 2019-11-30 19:17:58
When using shared memory, each process may mmap the shared region into a different area of its respective address space. This means that when storing pointers within the shared region, you need to store them as offsets from the start of the shared region. Unfortunately, this complicates the use of atomic instructions (e.g. if you're trying to write a lock-free algorithm). For example, say you have a bunch of reference-counted nodes in shared memory, created by a single writer. The writer periodically atomically updates a pointer 'p' to point to a valid node with a positive reference count. Readers
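A minimal sketch of the offset technique the question describes, under the assumption that std::atomic<std::ptrdiff_t> is lock-free on the platform (otherwise it must not be placed in memory shared between processes); the Node layout and function names are illustrative only:

#include <atomic>
#include <cstddef>

struct Node {
    std::atomic<int> refcount;
    int payload;
};

// Lives inside the shared region itself, so every process sees the same word.
struct SharedHeader {
    // Byte offset of the currently published Node, relative to the start of
    // the shared region; 0 is used here to mean "no node yet".
    std::atomic<std::ptrdiff_t> current_offset;
};

// Each process knows its own mapping base; the offsets stored in shared
// memory are identical for everyone.
inline Node *resolve(void *base, std::ptrdiff_t off) {
    return off ? reinterpret_cast<Node *>(static_cast<char *>(base) + off) : nullptr;
}

inline std::ptrdiff_t to_offset(void *base, Node *n) {
    return n ? reinterpret_cast<char *>(n) - static_cast<char *>(base) : 0;
}

// Writer publishes a node; readers load the offset and convert locally.
inline void publish(SharedHeader *hdr, void *base, Node *n) {
    hdr->current_offset.store(to_offset(base, n), std::memory_order_release);
}

inline Node *acquire_current(SharedHeader *hdr, void *base) {
    return resolve(base, hdr->current_offset.load(std::memory_order_acquire));
}

For plain (non-atomic) pointers, boost::interprocess::offset_ptr packages the same base-relative idea; for atomically updated links the offset generally has to be managed by hand, as sketched here.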

Sharing a complex python object in memory between separate processes

故事扮演 submitted on 2019-11-30 19:16:10
Question: I have a complex Python object, of size ~36GB in memory, which I would like to share between multiple separate Python processes. It is stored on disk as a pickle file, which I currently load separately for every process. I want to share this object so that more processes can run in parallel within the amount of memory available. This object is used, in a sense, as a read-only database. Every process initiates multiple access requests per second, and every request is just for a small

C++ boost libraries shared_memory_object undefined reference to 'shm_open'

不想你离开。 submitted on 2019-11-30 17:30:28
I tried to compile the following code on Ubuntu 11.04:

#include <boost/interprocess/shared_memory_object.hpp>
#include <iostream>

int main() {
    boost::interprocess::shared_memory_object shdmem(boost::interprocess::open_or_create, "Highscore", boost::interprocess::read_write);
    shdmem.truncate(1024);
    std::cout << shdmem.get_name() << std::endl;
    boost::interprocess::offset_t size;
    if (shdmem.get_size(size))
        std::cout << size << std::endl;
}

only to get the following errors:

/tmp/cc786obC.o: In function `boost::interprocess::shared_memory_object::priv_open_or_create(boost::interprocess::detail:
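Undefined references to shm_open from boost::interprocess usually mean the POSIX real-time library is missing from the link line: on Linux (at least with the glibc shipped around Ubuntu 11.04), shm_open lives in librt, so a common fix (assuming the source file is called main.cpp) is to link with something like g++ main.cpp -o main -lrt -lpthread.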

How do I measure the size of a boost interprocess vector in shared memory?

社会主义新天地 submitted on 2019-11-30 17:17:28
Question: I'm using boost::interprocess::vector to share some strings between processes, and I want to make sure I do not overflow the shared memory segment it lives in. How do I find out how much space the vector takes in memory, and how much memory such a segment-allocated string will take?

typedef boost::interprocess::managed_shared_memory::segment_manager SegmentManager;
typedef boost::interprocess::allocator<char, SegmentManager> CharAllocator;
typedef boost::interprocess::basic_string<char, std:
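One way to measure this (a hedged sketch, not from the original question) is to compare managed_shared_memory::get_free_memory() before and after constructing the container; the segment name, sizes, and typedef spellings below are assumptions for the example:

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/containers/string.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <iostream>

namespace bip = boost::interprocess;

typedef bip::allocator<char, bip::managed_shared_memory::segment_manager> CharAllocator;
typedef bip::basic_string<char, std::char_traits<char>, CharAllocator> ShmString;
typedef bip::allocator<ShmString, bip::managed_shared_memory::segment_manager> StringAllocator;
typedef bip::vector<ShmString, StringAllocator> ShmStringVector;

int main() {
    bip::shared_memory_object::remove("MeasureDemo");
    bip::managed_shared_memory segment(bip::create_only, "MeasureDemo", 65536);

    std::size_t before = segment.get_free_memory();

    StringAllocator strAlloc(segment.get_segment_manager());
    CharAllocator charAlloc(segment.get_segment_manager());
    ShmStringVector *vec = segment.construct<ShmStringVector>("vec")(strAlloc);
    vec->push_back(ShmString("hello shared memory", charAlloc));

    std::size_t after = segment.get_free_memory();
    // The difference covers the vector object, its heap buffer, the string's
    // storage, and the segment's own allocation bookkeeping overhead.
    std::cout << "bytes consumed in segment: " << (before - after) << std::endl;

    segment.destroy<ShmStringVector>("vec");
    bip::shared_memory_object::remove("MeasureDemo");
}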

Shared Memory With Two Processes In C?

眉间皱痕 submitted on 2019-11-30 15:51:32
Question: I want to do the following: the parent process creates a child process; the child process then reads n ints from the user and stores them in shared memory; the parent process then displays them. I got this far:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

#define SHMSIZE 27

int main() {
    int shmid;
    int *shm;
    int *n;
    if (fork() == 0) {
        shmid = shmget(2009, SHMSIZE, 0);
        shm = shmat(shmid, 0, 0);
        n = shm;
        int i;
        for (i = 0; i < 5; i++) {
            printf("Enter
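A hedged sketch of one way the described flow can be wired up (an illustration, not the original poster's final code): the parent creates the segment with IPC_CREAT before forking, the child writes, and the parent waits before reading. The key 2009 and the count of 5 integers are carried over from the excerpt; everything else is an assumption.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>

#define SHMSIZE (5 * sizeof(int))

int main(void) {
    /* Parent creates the segment before forking so both processes can attach to it. */
    int shmid = shmget(2009, SHMSIZE, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return 1; }

    if (fork() == 0) {
        /* Child: attach, read 5 ints from the user, detach. */
        int *shm = (int *)shmat(shmid, NULL, 0);
        for (int i = 0; i < 5; i++) {
            printf("Enter value %d: ", i);
            scanf("%d", &shm[i]);
        }
        shmdt(shm);
        exit(0);
    }

    /* Parent: wait until the child has finished writing, then display. */
    wait(NULL);
    int *shm = (int *)shmat(shmid, NULL, 0);
    for (int i = 0; i < 5; i++)
        printf("value %d = %d\n", i, shm[i]);
    shmdt(shm);
    shmctl(shmid, IPC_RMID, NULL);   /* remove the segment when done */
    return 0;
}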

Making my NumPy array shared across processes

折月煮酒 submitted on 2019-11-30 13:57:55
Question: I have read quite a few of the questions on SO about sharing arrays, and it seems simple enough for simple arrays, but I am stuck trying to get it working for the array I have.

import numpy as np
data = np.zeros(250, dtype='float32, (250000,2)float32')

I have tried converting this to a shared array by trying to somehow make mp.Array accept the data. I have also tried creating the array using ctypes, as such:

import multiprocessing as mp
data = mp.Array('c_float, (250000)c_float', 250)

The only way

Where is Linux shared memory actually located?

▼魔方 西西 submitted on 2019-11-30 13:19:51
Question: I just wanted to know where shared memory resides in a Linux system. Is it in physical memory or virtual memory? I am aware of each process's virtual memory sandbox: address spaces differ from process to process and processes don't see each other's memory, yet we can pass data between processes using IPC. To try out a simple scenario, I wrote a small shared memory program and tried to print the shared memory address and the value returned from the shmat function, however both the
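To make the virtual-versus-physical point concrete, here is a hedged sketch (not the poster's program) in which the parent and child each attach the same System V segment: the addresses printed for shmat belong to each process's own virtual address space and need not match, while the underlying physical pages, and therefore the stored value, are shared.

#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0666);

    if (fork() == 0) {
        /* Child: attach and write a value. */
        int *p = (int *)shmat(shmid, NULL, 0);
        *p = 42;
        printf("child : virtual address %p, value %d\n", (void *)p, *p);
        shmdt(p);
        exit(0);
    }

    wait(NULL);
    /* Parent: its mapping may land at a different virtual address, yet it
     * sees the value the child wrote, because both mappings are backed by
     * the same physical pages. */
    int *p = (int *)shmat(shmid, NULL, 0);
    printf("parent: virtual address %p, value %d\n", (void *)p, *p);
    shmdt(p);
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}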

Use shared GPU memory with TensorFlow?

一曲冷凌霜 submitted on 2019-11-30 11:20:00
So I installed the GPU version of TensorFlow on a Windows 10 machine with a GeForce GTX 980 graphics card. Admittedly, I know very little about graphics cards, but according to dxdiag it has 4060 MB of dedicated memory (VRAM) and 8163 MB of shared memory, for a total of about 12224 MB. What I noticed, though, is that this "shared" memory seems to be pretty much useless. When I start training a model, the VRAM fills up, and if the memory requirement exceeds those 4 GB, TensorFlow crashes with a "resource exhausted" error message. I CAN, of course, prevent reaching that point by