I have a bunch of buffers (25 to 30 of them) in my application that are are fairly large (.5mb) and accessed simulataneousley. To make it even worse the data in them is gene
How to avoid polluting the caches with data like this is covered in What Every Programmer Should Know About Memory (PDF) - This is written from the perspective of Red Hat development so perfect for you. However, most of it is cross-platform.
What you want is called "Non-Temporal Access" and tell the processor to expect that the value you are reading now will not be needed again for a while. The processor then avoids caching that value.
See page 49 of the PDF I linked above. It uses the intel intrinsic to do the streaming around the cache.
On the read side, processors, until recently, lacked support aside from weak hints using non-temporal access (NTA) prefetch instructions. There is no equivalent to write-combining for reads, which is especially bad for uncacheable memory such as memory-mapped I/O. Intel, with the SSE4.1 extensions, introduced NTA loads. They are implemented using a small number of streaming load buffers; each buffer contains a cache line. The first movntdqa instruction for a given cache line will load a cache line into a buffer, possibly replacing another cache line. Subsequent 16-byte aligned accesses to the same cache line will be serviced from the load buffer at little cost. Unless there are other reasons to do so, the cache line will not be loaded into a cache, thus enabling the loading of large amounts of memory without polluting the caches. The compiler provides an intrinsic for this instruction:
#include
__m128i _mm_stream_load_si128 (__m128i *p);
This intrinsic should be used multiple times, with addresses of 16-byte blocks passed as the parameter, until each cache line is read. Only then should the next cache line be started. Since there are a few streaming read buffers it might be possible to read from two memory locations at once
It would be perfect for you if when reading, the buffers are read in linear order through memory. You use streaming reads to do so. When you want to modify them, the buffers are modified in linear order, and you can use streaming writes to do that if you don't expect to read them again any time soon from the same thread.