cpu-cache

Does the processor stall during a cache coherence operation?

Submitted by 天大地大妈咪最大 on 2021-02-07 23:43:54
Question: Let's assume that a variable a = 0. Processor 1 executes a = 1; in the next cycle, Processor 2 reads the variable to print it. So: will Processor 2 stall until the cache coherence operation completes, and then print 1?

    P1:   |--a=1--|---cache--coherence---|----------------
    P2:   ------|stalls due to coherence-|--print(a=1)---|
    time: ----------------------------------------------->

Or will Processor 2 operate before the cache coherence operation completes and …

Force a migration of a cache line to another core

Submitted by 天涯浪子 on 2021-02-07 22:43:21
Question: In C++ (using any of the low-level intrinsics available on the platform), on x86 hardware (say Intel Skylake, for example), is it possible to send a cache line to another core without forcing the thread on that core to load the line explicitly? My use case is a concurrent data structure. In it, a core sometimes walks through places in memory that might be owned by other core(s) while probing for spots. The threads on those cores are typically blocked on a condition …

MESI protocol. Write with cache miss. Why is a main-memory fetch needed?

Submitted by 我与影子孤独终老i on 2021-02-05 08:09:06
Question: I'm wondering about the MESI protocol implementation of writing with the allocate-on-write-miss policy. Let's say we have a write request that misses in the cache, with no other copies of the cache line. This diagram says the next step is to fetch the value from main memory (or the L2 cache), store it, and mark the cache line as M (modified). I suppose the new value is then stored in the cache block. The question is: why do we need the step of fetching data from main memory? Why can't we simply write the new value …
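The fetch is needed because the store usually covers only part of the 64-byte line, and the cache must hold valid data for all the other bytes too. A toy model (my own sketch, not the protocol specification) makes the merge step concrete:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLine = 64;
using Line = std::array<uint8_t, kLine>;

// Simulated write-allocate miss: fetch the whole line, then merge the
// 8-byte store into it. Skipping the fetch would leave the other 56
// bytes of the cached line as garbage, and a later read or writeback
// of the line would corrupt memory.
Line write_miss(const Line& memory_copy, std::size_t off, uint64_t val) {
    Line line = memory_copy;                           // 1. fetch from memory/L2
    std::memcpy(line.data() + off, &val, sizeof val);  // 2. merge the store
    return line;                                       // 3. state -> M (modified)
}
```

The exception is a store known to overwrite the entire line (e.g. x86 non-temporal full-line stores), which can legitimately skip the fetch; that special case is exactly what the question is circling.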

What happens for a RIP-relative load next to the current instruction? Cache hit?

Submitted by 旧街凉风 on 2021-02-05 07:11:25
Question: I am reading Agner Fog's book on x86 assembly. I am wondering how RIP-relative addressing works in this scenario. Specifically, assume my RIP offset is +1. This suggests the data I want to read is right next to this instruction in memory. This piece of data has likely already been fetched into the L1 instruction cache. Assuming that this data is not also in L1d, what exactly will happen on the CPU? Let's assume it's a relatively recent Intel architecture like Kaby Lake. Answer 1: Yes, it's …

Do store instructions block subsequent instructions on a cache miss?

Submitted by ⅰ亾dé卋堺 on 2021-02-05 05:10:24
Question: Let's say we have a processor with two cores (C0 and C1) and a cache line starting at address k that is initially owned by C0. If C1 issues a store instruction to an 8-byte slot in line k, will that affect the throughput of the following instructions being executed on C1? The Intel optimization manual has the following paragraph: "When an instruction writes data to a memory location [...], the processor ensures that it has the line containing this memory location in its L1d cache [...]"

How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB

Submitted by 旧城冷巷雨未停 on 2021-02-04 21:08:05
Question: Following is the question: We have a memory system with 64-bit virtual addresses and 48-bit physical addresses. The L1 TLB is fully associative with 64 entries. The page size in virtual memory is 16KB. The L1 cache is 32KB and 2-way set associative; the L2 cache is 2MB and 4-way set associative. The block size of both the L1 and L2 caches is 64B. The L1 cache uses a virtually indexed, physically tagged (VIPT) scheme. We are required to compute tags, indices and offsets. This is the solution that I have …

Some questions related to cache performance (computer architecture)

Submitted by 孤街醉人 on 2021-02-04 08:28:45
Question: Details about the X5650 processor are at https://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%20X5650%20-%20AT80614004320AD%20(BX80614X5650).html Important notes: L3 cache size: 12288KB; cache line size: 64. Consider the following two functions, each of which increments the values in an array by 100.

    void incrementVector1(INT4* v, int n) {
      for (int k = 0; k < 100; ++k) {
        for (int i = 0; i < n; ++i) {
          v[i] = v[i] + 1;
        }
      }
    }

    void incrementVector2(INT4* v, int n) {
      for (int i = 0; i < n; ++i) {
        for (int k = 0 …

Minimum associativity for a PIPT L1 cache to also be VIPT, accessing a set without translating the index to physical

Submitted by 断了今生、忘了曾经 on 2021-02-04 07:31:49
Question: This question comes up in the context of a section on virtual memory in an undergraduate computer architecture course. Neither the teaching assistants nor the professor were able to answer it sufficiently, and online resources are limited. Question: Suppose a processor with the following specifications: 8KB pages; 32-bit virtual addresses; 28-bit physical addresses; a two-level page table, with a 1KB page table at the first level and 8KB page tables at the second level; 4-byte page table entries; a 16 …