Although I have read about the movntdqa instruction in this regard, I haven't figured out a clean way to mark a memory range as uncacheable, or to read data so as not to pollute the cache.
All good advice from user786653; the Ulrich Drepper article especially. I'll add:
Uncached or not, the virtual-memory hardware still has to look up page information in the TLB, which has limited capacity. Don't underestimate the impact of TLB thrashing on random-access performance. If you aren't already, see the results here for why you really want to be using huge pages for your array data rather than the teeny 4K default (which goes back to the days of "640K ought to be enough for anybody"). Of course, if you're talking about arrays so large that even a TLB full of 2MB pages can't cover them, even that won't help.
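To put rough numbers on it: with a hypothetical 64-entry data TLB, 4K pages cover only 64 × 4 KB = 256 KB, while 2MB pages cover 64 × 2 MB = 128 MB. Below is a minimal Linux-only sketch of backing an array with 2MB pages, assuming glibc and a default huge page size of 2MB (typical on x86-64). The names `alloc_huge` and `HUGE_2MB` are hypothetical. It first tries explicit hugetlbfs pages via `MAP_HUGETLB`, which only succeeds if pages have been reserved (e.g. via `vm.nr_hugepages`). Otherwise it falls back to a 2MB-aligned ordinary mapping plus a `madvise(MADV_HUGEPAGE)` hint, which the kernel is free to ignore.

```c
#define _GNU_SOURCE            /* for MAP_HUGETLB and MADV_HUGEPAGE */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGE_2MB ((size_t)2 * 1024 * 1024)   /* assumed huge page size */

/* Round len up to a whole number of 2MB pages. */
static size_t round_up_2mb(size_t len) {
    return (len + HUGE_2MB - 1) & ~(HUGE_2MB - 1);
}

/* Hypothetical helper: map at least len bytes, preferring 2MB pages.
   On success, *mapped_len receives the size to pass to munmap(). */
static void *alloc_huge(size_t len, size_t *mapped_len) {
    size_t n = round_up_2mb(len);

    /* 1. Explicit huge pages: fails unless pages are reserved. */
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *mapped_len = n;
        return p;
    }

    /* 2. Fallback: over-allocate by 2MB so a 2MB-aligned window fits,
          then unmap the unaligned head and tail. */
    size_t span = n + HUGE_2MB;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + HUGE_2MB - 1) & ~(uintptr_t)(HUGE_2MB - 1);
    size_t head = aligned - addr;
    size_t tail = span - head - n;
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)aligned + n, tail);

    /* Transparent huge page hint: advisory only, result ignored. */
    madvise((void *)aligned, n, MADV_HUGEPAGE);
    *mapped_len = n;
    return (void *)aligned;
}
```

Release the array with `munmap(p, mapped_len)`. Whether you actually got huge pages can be checked afterwards (e.g. the `AnonHugePages` field in `/proc/self/smaps`); the code above only requests them.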
What have you got against the 'nt' instructions (e.g. the _mm_stream_ps intrinsic)? I'm unconvinced that declaring pages uncached will get you any better performance than appropriate use of those, and they're much easier to use than the alternatives. I'd be very interested to see evidence to the contrary, though.
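For illustration, here is a minimal sketch of a copy loop that writes its destination with `_mm_stream_ps` (MOVNTPS), assuming an x86 target with SSE and a 16-byte-aligned destination. The function name `copy_stream_ps` is hypothetical. Loads are ordinary cached loads; only the stores bypass the cache, and `_mm_sfence` orders the weakly-ordered streaming stores before any later stores.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE: _mm_loadu_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: copy n floats using non-temporal stores.
   Assumes dst is 16-byte aligned (required by _mm_stream_ps);
   src may be unaligned and n need not be a multiple of 4. */
static void copy_stream_ps(float *dst, const float *src, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);  /* ordinary (cached) load */
        _mm_stream_ps(dst + i, v);         /* store that writes around the cache */
    }
    for (; i < n; i++)                     /* scalar tail, regular stores */
        dst[i] = src[i];
    _mm_sfence();  /* make streaming stores globally ordered before later stores */
}
```

Streaming stores pay off mainly when the destination is large and won't be read again soon; for small buffers that are reused immediately, ordinary stores are usually faster.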