cpu-cache

Why does false sharing still affect non-atomics, but much less than atomics?

Submitted by 别来无恙 on 2020-06-16 18:58:29
Question: Consider the following example, which demonstrates the existence of false sharing: using type = std::atomic<std::int64_t>; struct alignas(128) shared_t { type a; type b; } sh; struct not_shared_t { alignas(128) type a; alignas(128) type b; } not_sh; One thread increments a in steps of 1, another thread increments b. The increments compile to lock xadd with MSVC, even though the result is unused. For the structure where a and b are separated, the values accumulated in a few seconds are about ten times greater for …
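
A minimal runnable sketch of the benchmark the excerpt describes (only the two struct definitions come from the question; the thread bodies and the timing loop are my assumptions):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

using type = std::atomic<std::int64_t>;

struct alignas(128) shared_t { type a{0}; type b{0}; } sh;  // a and b share a cache line
struct not_shared_t {                                       // a and b on separate lines
    alignas(128) type a{0};
    alignas(128) type b{0};
} not_sh;

// Increment `counter` as fast as possible for about one second.
void hammer(type& counter) {
    auto end = std::chrono::steady_clock::now() + std::chrono::seconds(1);
    while (std::chrono::steady_clock::now() < end)
        for (int i = 0; i < 1024; ++i)                       // amortize the clock reads
            counter.fetch_add(1, std::memory_order_relaxed); // lock xadd on x86
}

void run(type& a, type& b, const char* name) {
    std::thread t1(hammer, std::ref(a));
    std::thread t2(hammer, std::ref(b));
    t1.join();
    t2.join();
    std::cout << name << ": a=" << a.load() << " b=" << b.load() << '\n';
}

int main() {
    run(sh.a, sh.b, "same line (false sharing)");
    run(not_sh.a, not_sh.b, "separate lines");
}
```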

When to use a write-through cache policy for pages

Submitted by £可爱£侵袭症+ on 2020-05-30 03:37:05
Question: I was reading the MDS attack paper RIDL: Rogue In-Flight Data Load. The authors set pages as write-back, write-through, write-combined or uncacheable, and with different experiments determine that the Line Fill Buffer is the cause of the micro-architectural leaks. On a tangent: I was aware that memory can be uncacheable, but I assumed that cacheable data was always cached in a write-back cache, i.e. I assumed that L1, L2 and the LLC were always write-back caches. I read up on the differences between …
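
To make the policy difference concrete, here is a toy single-line software model (an illustration of the semantics only, not of real hardware, and not from the question): under write-back, memory is updated only when a dirty line is evicted; under write-through, every store also goes to memory immediately:

```cpp
#include <cstdio>

// Toy model of a single cache line under the two write policies.
// `memory` is just an int standing in for the backing DRAM location.
struct ToyCache {
    int  cached = 0;
    bool dirty  = false;
    bool write_back = false;  // true: write-back, false: write-through

    void write(int value, int& memory) {
        cached = value;
        if (write_back)
            dirty = true;      // memory is updated only at eviction time
        else
            memory = value;    // write-through: memory updated on every store
    }
    void evict(int& memory) {
        if (write_back && dirty)
            memory = cached;   // dirty line is written back
        dirty = false;
    }
};

int main() {
    int mem_wb = 0, mem_wt = 0;
    ToyCache wb{0, false, true};   // write-back line
    ToyCache wt{0, false, false};  // write-through line

    wb.write(42, mem_wb);
    wt.write(42, mem_wt);
    std::printf("after store: write-back memory=%d, write-through memory=%d\n",
                mem_wb, mem_wt);                                 // 0 vs 42
    wb.evict(mem_wb);
    std::printf("after evict: write-back memory=%d\n", mem_wb);  // 42
}
```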

Can an inner level of cache be write-back inside an inclusive outer-level cache?

Submitted by 本秂侑毒 on 2020-05-29 07:40:07
Question: I have asked a similar question: Can a lower level cache have higher associativity and still hold inclusion? Suppose we have two levels of cache (L1 being nearest to the CPU, i.e. the inner / lower level, and L2 being outside that, nearest to main memory). Can the L1 cache be write-back? My attempt) I think we can only have a write-through L1 and cannot have a write-back L1: if a block is replaced in the L1 cache, then it has to be written back to L2 and also to main memory in order to hold inclusion. …
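
A toy two-level model (my illustration, not from the question) of why a write-back L1 is in fact compatible with inclusion: a dirty L1 eviction only has to update L2, and an L2 eviction back-invalidates the matching L1 line before memory is updated:

```cpp
#include <cstdio>

// Toy model: one address, a write-back L1 inside an inclusive L2.
struct Line { bool valid = false; bool dirty = false; int data = 0; };

struct TwoLevel {
    Line l1, l2;
    int  memory = 0;

    // Inclusion: when the CPU writes into L1, L2 also holds (a possibly
    // stale, clean copy of) the line.
    void cpu_write(int v) { l1 = {true, true, v}; l2.valid = true; }

    // Dirty L1 eviction: write back into L2 only; memory is NOT touched.
    void evict_l1() {
        if (l1.valid && l1.dirty) { l2.data = l1.data; l2.dirty = true; }
        l1 = {};
    }
    // L2 eviction: back-invalidate L1 first (pulling in its dirty data),
    // then write memory. Inclusion is preserved throughout.
    void evict_l2() {
        evict_l1();
        if (l2.valid && l2.dirty) memory = l2.data;
        l2 = {};
    }
};

int main() {
    TwoLevel c;
    c.cpu_write(7);
    c.evict_l1();
    std::printf("after L1 evict: L2=%d memory=%d\n", c.l2.data, c.memory); // 7, 0
    c.evict_l2();
    std::printf("after L2 evict: memory=%d\n", c.memory);                  // 7
}
```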

When accessing memory, will the page table accessed/dirty bit be set on a cache hit?

Submitted by 可紊 on 2020-05-15 06:01:06
Question: As far as I know, a CPU memory access involves the CPU cache and the MMU. The CPU will try to find its target in the cache, and if a cache miss happens it will turn to the MMU. During an access through the MMU, the accessed/dirty bit of the corresponding page table entry is set by hardware. However, to the best of my knowledge, most CPU designs won't trigger the MMU unless there's a cache miss, and my problem is: will the accessed/dirty bit of the page table entry still be set under a cache hit? Or is it architecture- …
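
Tangentially, Linux exposes a software analogue of per-page dirty tracking that can be observed from user space: the "soft-dirty" bit, maintained by the kernel (via write protection and page faults) rather than read straight from the hardware PTE dirty bit. A sketch, assuming a kernel built with CONFIG_MEM_SOFT_DIRTY:

```cpp
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Return the /proc/self/pagemap entry for the page containing `addr`.
// Bit 55 of the entry is the kernel-maintained "soft-dirty" flag.
std::uint64_t pagemap_entry(const void* addr) {
    int fd = open("/proc/self/pagemap", O_RDONLY);
    std::uint64_t entry = 0;
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset = (std::uintptr_t)addr / page_size * sizeof(entry);
    pread(fd, &entry, sizeof entry, offset);
    close(fd);
    return entry;
}

int main() {
    // Writing "4" to clear_refs clears the soft-dirty bits of all pages
    // (see Documentation/admin-guide/mm/soft-dirty.rst).
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    write(fd, "4", 1);
    close(fd);

    static volatile char page[4096];
    page[0] = 1;  // store: the kernel marks this page soft-dirty again

    std::uint64_t e = pagemap_entry((const void*)page);
    std::printf("soft-dirty after write: %d\n", (int)((e >> 55) & 1));
}
```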

How do the store buffer and Line Fill Buffer interact with each other?

Submitted by 醉酒当歌 on 2020-05-14 19:47:47
Question: I was reading the MDS attack paper RIDL: Rogue In-Flight Data Load. They discuss how the Line Fill Buffer can cause leakage of data. There is the question About the RIDL vulnerabilities and the "replaying" of loads, which discusses the micro-architectural details of the exploit. One thing that isn't clear to me after reading that question is why we need a Line Fill Buffer if we already have a store buffer. John McCalpin discusses how the store buffer and Line Fill Buffer are connected in How …
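
One place the distinction shows up at the instruction level (my illustration, not from the question): an ordinary store retires through the store buffer and commits to L1d, while a non-temporal store bypasses the cache and drains through write-combining buffers, which on Intel CPUs are commonly implemented in the Line Fill Buffers:

```cpp
#include <immintrin.h>

// An ordinary store: sits in the store buffer until retirement, then
// commits to the L1d cache (allocating the line via an RFO if needed).
void plain_store(int* p, int v) {
    *p = v;
}

// A non-temporal store: bypasses the cache hierarchy and drains through
// write-combining buffers (on Intel CPUs, the Line Fill Buffers).
void nt_store(int* p, int v) {
    _mm_stream_si32(p, v);
    _mm_sfence();  // make the write-combining buffer contents globally visible
}
```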

Intel's CLWB instruction invalidating cache lines

Submitted by 允我心安 on 2020-03-09 05:34:40
Question: I am trying to find a configuration or memory access pattern for Intel's clwb instruction that would not invalidate the cache line. I am testing on an Intel Xeon Gold 5218 processor with NVDIMMs. The Linux version is 5.4.0-3-amd64. I tried using Device-DAX mode and directly mapping this char device into the address space. I also tried adding this non-volatile memory as a new NUMA node and using the numactl --membind command to bind memory to it. In both cases, when I use clwb on a cached address, it is evicted. I …
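
For reference, a sketch of the usual persist sequence being benchmarked here (the function name and pointer type are mine; whether the line remains cached afterwards is exactly what the question is probing):

```cpp
#include <cstdint>
#include <immintrin.h>

// Persist one value through a persistent-memory mapping at `p`
// (e.g. obtained from a Device-DAX mmap). Compile with -mclwb.
void persist_store(std::uint64_t* p, std::uint64_t v) {
    *p = v;        // ordinary store to the pmem mapping
    _mm_clwb(p);   // write the line back; unlike clflush, the ISA allows the
                   // line to stay cached, but the question observes an eviction
    _mm_sfence();  // order the write-back before subsequent stores
}
```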

What happens to expected memory semantics (such as read after write) when a thread is scheduled on a different CPU core?

Submitted by  ̄綄美尐妖づ on 2020-02-24 11:13:30
Question: Code within a single thread has certain memory guarantees, such as read after write (i.e. writing some value to a memory location and then reading it back should give the value you wrote). What happens to such memory guarantees if a thread is rescheduled to execute on a different CPU core? Say a thread writes 10 to memory location X, then gets rescheduled to a different core. That core's L1 cache might have a different value for X (from another thread that was executing on that core previously), …
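
A small Linux-specific sketch (my illustration; it assumes at least two online CPUs) that forces exactly this migration between the write and the read. The read still returns 10: cache coherence invalidates any stale copy in the new core's L1, and the barriers in the kernel's context-switch path preserve single-thread semantics:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <sched.h>
#include <cstdio>

// Pin the calling thread to `cpu`, forcing a migration if it was elsewhere.
void move_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof set, &set);  // 0 means the calling thread
}

int main() {
    int x = 0;
    move_to_cpu(0);
    x = 10;          // write X on core 0
    move_to_cpu(1);  // reschedule onto core 1
    std::printf("x = %d\n", x);  // always 10: read-after-write still holds
}
```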

How does the indexing of Ice Lake's 48 KiB L1 data cache work?

Submitted by 孤街浪徒 on 2020-01-24 04:27:05
Question: The Intel optimization manual (revision of September 2019) shows a 48 KiB, 8-way associative L1 data cache for the Ice Lake microarchitecture, footnoted with "Software-visible latency/bandwidth will vary depending on access patterns and other factors." This baffled me because: There are 96 sets (48 KiB / 64 / 8), which is not a power of two. The indexing bits of the set and the indexing bits of the byte offset add up to more than 12 bits, which makes the cheap PIPT-as-VIPT trick unavailable for 4 KiB pages. All in …
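
Spelling out the arithmetic behind those two observations:

```cpp
#include <cstdio>

int main() {
    constexpr int cache_bytes = 48 * 1024;  // 48 KiB
    constexpr int line_bytes  = 64;
    constexpr int ways        = 8;

    constexpr int sets = cache_bytes / line_bytes / ways;  // 96: not a power of two
    constexpr int offset_bits = 6;  // log2(64-byte line)
    constexpr int index_bits  = 7;  // 96 sets need ceil(log2(96)) = 7 bits

    // VIPT indexing is free only if index + offset fit in the 12-bit page offset.
    std::printf("sets = %d, index + offset = %d bits (page offset = 12 bits)\n",
                sets, index_bits + offset_bits);  // 96, 13 > 12
}
```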

Inclusive or exclusive? L1, L2 cache in Intel Core IvyBridge processor

Submitted by 帅比萌擦擦* on 2020-01-21 02:13:06
Question: I have an Intel Core IvyBridge processor, Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (L1: 32 KB, L2: 256 KB, L3: 8 MB). I know L3 is inclusive and shared among multiple cores. I want to know the following with respect to my system. PART 1: Is L1 inclusive or exclusive? Is L2 inclusive or exclusive? PART 2: If L1 and L2 are both inclusive, then to find the access time of L2 we first declare an array (1 MB) larger than the L2 cache (256 KB), then start accessing the whole array to load it into the L2 cache. …
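
A sketch of the kind of latency measurement PART 2 describes (the buffer size, the random pointer chase, and the use of __rdtsc are my choices, not from the question): a dependent chase over a buffer that overflows the 32 KB L1d but fits in the 256 KB L2 measures roughly the L2 load-to-use latency:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>
#include <x86intrin.h>   // __rdtsc

int main() {
    // 128 KiB buffer: bigger than the 32 KB L1d, smaller than the 256 KB L2.
    constexpr std::size_t N = (128 * 1024) / sizeof(std::size_t);
    constexpr std::size_t LINE = 64 / sizeof(std::size_t);  // elements per cache line
    std::vector<std::size_t> chain(N);

    // Build one random cycle that touches each cache line exactly once,
    // so the dependent chase defeats the hardware prefetchers.
    const std::size_t nodes = N / LINE;
    std::vector<std::size_t> order(nodes);
    std::iota(order.begin(), order.end(), 0);
    std::mt19937_64 rng(42);
    std::shuffle(order.begin() + 1, order.end(), rng);
    for (std::size_t i = 0; i < nodes; ++i)
        chain[order[i] * LINE] = order[(i + 1) % nodes] * LINE;

    std::size_t idx = 0;
    for (std::size_t i = 0; i < nodes; ++i) idx = chain[idx];  // warm-up: pull into L2

    constexpr std::size_t iters = 10'000'000;
    std::uint64_t t0 = __rdtsc();
    for (std::size_t i = 0; i < iters; ++i) idx = chain[idx];  // serialized L2 hits
    std::uint64_t t1 = __rdtsc();

    std::printf("idx=%zu  ~%.2f cycles per load\n", idx, double(t1 - t0) / iters);
}
```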
