cpu-cache

How do data caches route the object in this example?

Submitted by 限于喜欢 on 2019-12-31 00:58:10
Question: Consider the diagrammed data cache architecture:

    --------------------------------------
    | CPU core A | CPU core B |          |
    |------------|------------| Devices  |
    | Cache A1   | Cache B1   | with DMA |
    |-------------------------|          |
    | Cache 2                 |          |
    |------------------------------------|
    | RAM                                |
    --------------------------------------

Suppose that an object is shadowed on a dirty line of Cache A1, an older version of the same object is shadowed on a clean line of Cache 2, and the newest

WC vs WB memory? Other types of memory on x86_64?

Submitted by 假如想象 on 2019-12-30 11:32:41
Question: Could you describe the meanings of, and the differences between, WC and WB memory on x86_64? For completeness, please describe the other memory types on x86_64, if any.

Answer 1: I will start with write-back caching (WB), since it is easier to understand.

Write-back caching: As the name implies, this caching strategy tries to delay writes to system memory for as long as possible. The idea is, ideally, to use only the cache. However, since the cache has a finite size smaller than the finite size of

Interconnect between per-core L2 and L3 in Core i7

Submitted by 和自甴很熟 on 2019-12-30 09:58:08
Question: The Intel Core i7 has per-core L1 and L2 caches and a large shared L3 cache. I need to know what kind of interconnect connects the multiple L2s to the single L3. I am a student and need to write a rough behavioral model of the cache subsystem. Is it a crossbar? A single bus? A ring? The references I came across mention structural details of the caches, but none of them mention what kind of on-chip interconnect exists. Thanks, -neha

Answer 1: Modern i7's use a ring. From Tom's Hardware:

Are two consequent CPU stores on x86 flushed to the cache keeping the order?

Submitted by 泄露秘密 on 2019-12-30 06:42:05
Question: Assume there are two threads running on x86 CPU0 and CPU1, respectively. The thread running on CPU0 executes the following stores: A=1 B=1. The cache line containing A is initially owned by CPU1, and the one containing B is owned by CPU0. I have two questions: If I understand correctly, both stores will be put into the CPU's store buffer. However, for the first store A=1 the cache line held by CPU1 must be invalidated, while the second store B=1 can be flushed immediately since CPU0 owns the cache line containing it. I know

What is the best NHibernate cache L2 provider?

Submitted by 邮差的信 on 2019-12-30 04:56:27
Question: I've seen there are plenty of them: NCache, Velocity, and so forth, but I haven't found a table comparing them. What's the best, considering the following criteria: easy to understand; has been maintained lately; is free, or has a good-enough free version; works. Answer 1: I can't speak for what's best or worst, but I'll throw in my experience with NCache in case it helps. Disclaimer: NHibernate and I had some disagreements; we have since gone our separate ways :) The Good: The performance was great

What use is the INVD instruction?

Submitted by 假如想象 on 2019-12-30 03:43:05
Question: The x86 INVD instruction invalidates the cache hierarchy without writing the contents back to memory, apparently. I'm curious: what use is such an instruction? Given how little control one has over what data may be in the various cache levels, and even less control over what may have already been flushed asynchronously, it seems to be little more than a way to make sure you no longer know what data is held in memory. Answer 1: Excellent question! One use case for such a blunt-acting instruction

WBINVD instruction usage

Submitted by 一世执手 on 2019-12-28 16:04:05
Question: I'm trying to use the WBINVD instruction on Linux to clear the processor's L1 cache. The following program compiles, but produces a segmentation fault when I try to run it: int main() { asm("wbinvd"); return 1; } I'm using gcc 4.4.3 and running Linux kernel 2.6.32-33 on my x86 box. Processor info: Intel(R) Core(TM)2 Duo CPU T5270 @ 1.40GHz. I built the program as follows: $ gcc $ ./a.out Segmentation Fault Can somebody tell me what I'm doing wrong? How do I get this to run? P.S.: I'm running a few

C++ cache aware programming

Submitted by 元气小坏坏 on 2019-12-28 07:40:19
Question: Is there a way in C++ to determine the CPU's cache size? I have an algorithm that processes a lot of data, and I'd like to break this data down into chunks such that they fit into the cache. Is this possible? Can you give me any other hints on programming with cache size in mind (especially in regard to multithreaded/multicore data processing)? Thanks! Answer 1: According to "What Every Programmer Should Know About Memory" by Ulrich Drepper, you can do the following on Linux: Once we have a formula

What specifically marks an x86 cache line as dirty - any write, or is an explicit change required?

Submitted by 青春壹個敷衍的年華 on 2019-12-28 03:05:27
Question: This question is specifically aimed at modern x86-64 cache-coherent architectures; I appreciate that the answer can be different on other CPUs. If I write to memory, the MESI protocol requires that the cache line first be read into the cache and then modified in the cache (the value is written to the cache line, which is then marked dirty). In older write-through microarchitectures, this would then trigger the cache line being flushed; under write-back, the flush of the cache line can be delayed for some