I hear frequently that accessing a shared memory segment between processes has no performance penalty compared to accessing process memory between threads. In other words, a
Setting up shared memory requires some extra work by the kernel, so attaching a shared memory region to your process, or detaching it, may be slower than a regular memory allocation (or it may not be... I've never benchmarked that). But once it's attached to your process's virtual memory map, shared memory is no different from any other memory for accesses. The one case to watch is multiple processors contending for the same cache-line-sized chunks, and that affects threads sharing ordinary process memory just as much. So, in general, shared memory should be just as fast as any other memory for most accesses, but depending on what you put there and how many different threads/processes access it, you can see some slowdown for specific usage patterns.
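To make the "once it's attached, it's just memory" point concrete, here is a minimal sketch. It assumes a Unix-like system (it relies on `os.fork` and on `mmap.mmap` defaulting to a shared mapping there), and the function name `shared_counter_demo` is made up for illustration. The kernel is involved when the mapping is created; after that, the child's reads and writes go straight to the mapped pages like any other memory access.

```python
import mmap
import os
import struct


def shared_counter_demo(n_writes: int = 1000) -> int:
    """Child increments a counter in a shared mapping; parent reads it back."""
    # Setup cost: the kernel creates an anonymous mapping. On Unix,
    # mmap.mmap defaults to MAP_SHARED, so the pages stay shared after fork().
    buf = mmap.mmap(-1, mmap.PAGESIZE)
    struct.pack_into("q", buf, 0, 0)  # 64-bit counter at offset 0

    pid = os.fork()
    if pid == 0:
        # Child: plain loads and stores on the mapped pages,
        # with no system call per access.
        try:
            for _ in range(n_writes):
                (value,) = struct.unpack_from("q", buf, 0)
                struct.pack_into("q", buf, 0, value + 1)
        finally:
            os._exit(0)  # leave without running the parent's cleanup code

    # Parent: wait for the child, then read the same physical memory.
    os.waitpid(pid, 0)
    (result,) = struct.unpack_from("q", buf, 0)
    buf.close()
    return result
```

Calling `shared_counter_demo(1000)` should return the count written by the child. Only one process writes here, so there is no contention. If several processes each updated their own counter, spacing the counters at least a cache line apart (commonly 64 bytes, though that depends on the hardware) would be one way to avoid the slowdown described above.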