cpu-cache

Does the processor stall during a cache coherence operation?

Submitted by 天大地大妈咪最大 on 2021-02-07 23:43:54
Question: Let's assume that a variable a = 0. Processor 1 executes a = 1; in the next cycle, Processor 2 reads the variable to print it. So: will Processor 2 stall until the cache coherence operation completes, and then print 1?

    P1:   |--a=1--|---cache--coherence---|----------------
    P2:   ------|stalls due to coherence-|--print(a=1)---|
    time: ----------------------------------------------->

Or will Processor 2 operate before the cache coherence operation completes and …

Force a migration of a cache line to another core

Submitted by 天涯浪子 on 2021-02-07 22:43:21
Question: In C++ (using any of the low-level intrinsics available on the platform), on x86 hardware (say Intel Skylake, for example), is it possible to send a cache line to another core without forcing the thread on that core to load the line explicitly? My use case is a concurrent data structure. In it, a core sometimes walks through places in memory that might be owned by other core(s) while probing for spots. The threads on those cores are typically blocked on a condition …

MESI protocol. Write with cache miss. Why is a main-memory fetch needed?

Submitted by 我与影子孤独终老i on 2021-02-05 08:09:06
Question: I'm wondering about the MESI protocol implementation of writing with the allocate-on-write-miss policy. Let's say we have a write request that misses in the cache, with no other copies of the cache line. This diagram says the next step is to fetch the value from main memory (or the L2 cache), store it, and mark the cache line as M (modified). I suppose the new value is then stored in the cache block. The question is: why do we need the step of fetching data from main memory? Why can't we simply write the new value …
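The fetch is needed because the store usually covers only part of the 64-byte line, and the cache must hold valid data for all the other bytes too. A toy model (my own sketch, not the protocol specification) makes the merge step concrete:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLine = 64;
using Line = std::array<uint8_t, kLine>;

// Simulated write-allocate miss: fetch the whole line, then merge the
// 8-byte store into it. Skipping the fetch would leave the other 56
// bytes of the cached line as garbage, and a later read or writeback
// of the line would corrupt memory.
Line write_miss(const Line& memory_copy, std::size_t off, uint64_t val) {
    Line line = memory_copy;                           // 1. fetch from memory/L2
    std::memcpy(line.data() + off, &val, sizeof val);  // 2. merge the store
    return line;                                       // 3. state -> M (modified)
}
```

The exception is a store known to overwrite the entire line (e.g. x86 non-temporal full-line stores), which can legitimately skip the fetch; that special case is exactly what the question is circling.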

What happens for a RIP-relative load next to the current instruction? Cache hit?

Submitted by 旧街凉风 on 2021-02-05 07:11:25
Question: I am reading Agner Fog's book on x86 assembly. I am wondering how RIP-relative addressing works in this scenario. Specifically, assume my RIP offset is +1. This suggests the data I want to read is right next to this instruction in memory. This piece of data has likely already been fetched into the L1 instruction cache. Assuming that this data is not also in L1d, what exactly will happen on the CPU? Let's assume it's a relatively recent Intel architecture like Kaby Lake. Answer 1: Yes, it's …

Do store instructions block subsequent instructions on a cache miss?

Submitted by ⅰ亾dé卋堺 on 2021-02-05 05:10:24
Question: Let's say we have a processor with two cores (C0 and C1) and a cache line starting at address k that is initially owned by C0. If C1 issues a store instruction to an 8-byte slot in line k, will that affect the throughput of the following instructions being executed on C1? The Intel optimization manual has the following paragraph: "When an instruction writes data to a memory location [...], the processor ensures that it has the line containing this memory location in its L1d cache [...]"

How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB

Submitted by 旧城冷巷雨未停 on 2021-02-04 21:08:05
Question: Following is the question: We have a memory system with 64-bit virtual addresses and 48-bit physical addresses. The L1 TLB is fully associative with 64 entries. The page size in virtual memory is 16KB. The L1 cache is 32KB and 2-way set associative; the L2 cache is 2MB and 4-way set associative. The block size of both the L1 and L2 caches is 64B. The L1 cache uses a virtually indexed, physically tagged (VIPT) scheme. We are required to compute tags, indices and offsets. This is the solution that I have …

Some questions related to cache performance (computer architecture)

Submitted by 孤街醉人 on 2021-02-04 08:28:45
Question: Details about the X5650 processor are at https://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%20X5650%20-%20AT80614004320AD%20(BX80614X5650).html Important notes: L3 cache size: 12288KB; cache line size: 64. Consider the following two functions, each of which increments the values in an array by 100.

    void incrementVector1(INT4* v, int n) {
      for (int k = 0; k < 100; ++k) {
        for (int i = 0; i < n; ++i) {
          v[i] = v[i] + 1;
        }
      }
    }

    void incrementVector2(INT4* v, int n) {
      for (int i = 0; i < n; ++i) {
        for (int k = 0 …

Minimum associativity for a PIPT L1 cache to also be VIPT, accessing a set without translating the index to physical

Submitted by 断了今生、忘了曾经 on 2021-02-04 07:31:49
Question: This question comes up in the context of a section on virtual memory in an undergraduate computer architecture course. Neither the teaching assistants nor the professor were able to answer it sufficiently, and online resources are limited. Question: Suppose a processor with the following specifications: 8KB pages; 32-bit virtual addresses; 28-bit physical addresses; a two-level page table, with a 1KB page table at the first level and 8KB page tables at the second level; 4-byte page table entries; a 16 …