How do I declare a memory range as uncacheable using gcc on an x86 platform?

眉间皱痕 submitted on 2019-11-29 08:29:37

Question


I have read about the movntdqa instruction in this context, but I haven't figured out a clean way to mark a memory range as uncacheable, or to read data without polluting the cache. I want to do this from gcc. My main goal is to swap two random locations in a large array. I'm hoping to speed this up by avoiding caching, since there is very little data reuse.


Answer 1:


I think what you're describing is Memory Type Range Registers (MTRRs). You can control these under Linux (if available and you're root, i.e. user 0) using /proc/mtrr and ioctl(2); see here for an example. Since MTRRs work on physical address ranges, I think you're going to have a hard time using them in a reasonable way from user space.

A better way is to look at the compiler intrinsics GCC provides and find one or more that express your intent. Have a look at Ulrich Drepper's series "What every programmer should know about memory", in particular part 5, which deals with bypassing the cache. It looks like _mm_prefetch(ptr, _MM_HINT_NTA) might be appropriate for your needs.
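
As a rough sketch of what that could look like for the swap workload in the question: the function below walks a list of index pairs and issues a non-temporal prefetch for the next pair while swapping the current one. The function name, the idx layout (pairs stored back to back) and the uint32_t element type are my assumptions for illustration, not something from the question. Whether it helps at all depends on the CPU, so treat it as something to benchmark rather than a recommendation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Hypothetical helper: perform npairs swaps on a[0..n-1].
 * idx holds 2 * npairs indices; pair k is (idx[2k], idx[2k+1]). */
static void swap_pairs_nta(uint32_t *a, size_t n,
                           const size_t *idx, size_t npairs)
{
    for (size_t k = 0; k < npairs; k++) {
        if (k + 1 < npairs) {
            /* Hint that the next pair is needed soon but will not be
             * reused, so the CPU may limit how far up the cache
             * hierarchy these lines are installed. */
            _mm_prefetch((const char *)&a[idx[2 * (k + 1)]], _MM_HINT_NTA);
            _mm_prefetch((const char *)&a[idx[2 * (k + 1) + 1]], _MM_HINT_NTA);
        }

        size_t i = idx[2 * k];
        size_t j = idx[2 * k + 1];
        assert(i < n && j < n);

        uint32_t tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

The prefetch is only a hint: the result is the same with or without it, so it is easy to compare both variants under a profiler.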

As always when it comes to performance - measure, measure, measure. Drepper's series has excellent parts detailing how this can be done (part 7) as well as code examples and other strategies to try when speeding up the memory performance of your code.




Answer 2:


All good advice from user786653; the Ulrich Drepper article especially. I'll add:

  • Uncached or not, the VM hardware still has to look up page information in the TLB, which has limited capacity. Don't underestimate the impact of TLB thrashing on random-access performance. If you haven't already, see the results here for why you really want huge pages for your array data rather than the tiny 4K default (which goes back to the days of "640K ought to be enough for anybody"). Of course, if your array is larger than even a TLB full of 2MB pages can cover, huge pages won't fully solve this either.

  • What have you got against the 'nt' instructions (e.g. the _mm_stream_ps intrinsic)? I'm not convinced that declaring pages uncached will perform any better than appropriate use of those, and they're much easier to use than the alternatives. I'd be very interested to see evidence to the contrary, though.
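
For the huge-page point, one way to try it on Linux is to ask for transparent huge pages on a 2MB-aligned allocation. This is a minimal sketch that assumes Linux with transparent huge pages available; alloc_thp and HUGE_2MB are names I made up for the example. madvise(MADV_HUGEPAGE) is only advice, so the kernel may still back the range with 4K pages.

```c
#define _GNU_SOURCE      /* for MADV_HUGEPAGE on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>      /* posix_memalign, free */
#include <sys/mman.h>    /* madvise, MADV_HUGEPAGE */

#define HUGE_2MB ((size_t)2 << 20)

/* Hypothetical helper: allocate at least `bytes` bytes, 2MB-aligned and
 * rounded up to a multiple of 2MB, and request huge-page backing.
 * Returns NULL on failure; release the memory with free(). */
static void *alloc_thp(size_t bytes, size_t *mapped)
{
    assert(bytes > 0 && mapped != NULL);

    size_t len = (bytes + HUGE_2MB - 1) & ~(HUGE_2MB - 1);
    void *p = NULL;
    if (posix_memalign(&p, HUGE_2MB, len) != 0)
        return NULL;

    /* Advisory only: ignore failure (e.g. THP disabled) and keep the
     * ordinary 4K-backed memory. */
    (void)madvise(p, len, MADV_HUGEPAGE);

    *mapped = len;
    return p;
}
```

On a system with THP enabled, the AnonHugePages line in /proc/meminfo should grow once the array has been touched, which is a quick way to check that the request took effect.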

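To make the 'nt' suggestion concrete, here is a sketch of a swap that writes both elements back with non-temporal stores. I'm using _mm_stream_si32 (the SSE2 sibling of _mm_stream_ps for 32-bit integers) because the question's element type isn't stated; swap_stream is a made-up name. Note that the two loads are ordinary loads and still bring the lines into the cache, so only the write side is non-temporal here, and it may well not be faster than a plain swap.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* _mm_stream_si32 (SSE2); also provides _mm_sfence */

/* Hypothetical helper: swap a[i] and a[j] using non-temporal stores. */
static void swap_stream(int *a, size_t n, size_t i, size_t j)
{
    assert(i < n && j < n);

    int x = a[i];               /* ordinary (cached) loads */
    int y = a[j];

    _mm_stream_si32(&a[i], y);  /* non-temporal stores */
    _mm_stream_si32(&a[j], x);
}

/* Non-temporal stores are weakly ordered. Call this after a batch of
 * swaps, before another thread is told the data is ready. */
static void swap_stream_finish(void)
{
    _mm_sfence();
}
```

A typical use would be a loop of swap_stream calls followed by a single swap_stream_finish, rather than a fence per swap.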


Source: https://stackoverflow.com/questions/7412169/how-do-declare-a-memory-range-as-uncacheable-using-gcc-on-x86-platform
