cpu-architecture

Can an inner level of cache be write back inside an inclusive outer-level cache?

本秂侑毒 submitted on 2020-05-29 07:40:07

Question: I have asked a similar question: Can a lower level cache have higher associativity and still hold inclusion? Suppose we have 2 levels of cache (L1 being nearest to the CPU, i.e. inner/lower-level, and L2 being outside that, nearest to main memory). Can the L1 cache be write-back? My attempt: I think we can only have a write-through L1 cache and cannot have a write-back one. If a block is replaced in the L1 cache, then it has to be written back to L2 and also to main memory in order to maintain inclusion.
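
One common way to reconcile a write-back L1 with an inclusive L2 is to let a dirty L1 victim write back to L2 only (not to memory), and to have L2 back-invalidate L1 on its own evictions, collecting any dirty data first. A toy Python sketch of that scheme (all class and method names are hypothetical; FIFO replacement for brevity):

```python
# Toy model: inclusive L2 beneath a write-back L1.
# All names are hypothetical, for illustration only.

class ToyInclusiveHierarchy:
    def __init__(self, l1_size=2, l2_size=4):
        self.l1 = {}          # addr -> (data, dirty)
        self.l2 = {}          # addr -> data; inclusion: l1 keys are a subset
        self.mem = {}
        self.l1_size = l1_size
        self.l2_size = l2_size

    def write(self, addr, data):
        self._fill(addr)
        self.l1[addr] = (data, True)   # write-back: mark dirty, no L2 update yet

    def read(self, addr):
        self._fill(addr)
        return self.l1[addr][0]

    def _fill(self, addr):
        if addr in self.l1:
            return
        if addr not in self.l2:
            if len(self.l2) >= self.l2_size:
                self._evict_l2()
            self.l2[addr] = self.mem.get(addr, 0)
        if len(self.l1) >= self.l1_size:
            self._evict_l1()
        self.l1[addr] = (self.l2[addr], False)

    def _evict_l1(self):
        victim = next(iter(self.l1))        # FIFO victim
        data, dirty = self.l1.pop(victim)
        if dirty:
            self.l2[victim] = data          # write back to L2 only

    def _evict_l2(self):
        victim = next(iter(self.l2))
        # Inclusion: back-invalidate the L1 copy, merging dirty data first.
        if victim in self.l1:
            data, dirty = self.l1.pop(victim)
            if dirty:
                self.l2[victim] = data
        self.mem[victim] = self.l2.pop(victim)

h = ToyInclusiveHierarchy()
h.write(0, 42)                   # dirty in L1 only
h.write(1, 7)
h.write(2, 9)                    # L1 eviction writes addr 0's data back to L2
assert h.l2[0] == 42             # dirty data landed in L2, not yet in memory
assert set(h.l1) <= set(h.l2)    # inclusion holds at every step
```

In this sketch nothing forces the L1 to be write-through: inclusion only requires that L2 track (and reclaim) whatever L1 holds, which the back-invalidation path provides.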

Why not just predict both branches?

别等时光非礼了梦想. submitted on 2020-05-25 04:57:05

Question: CPUs use branch prediction to speed up code, but only if the first branch is actually taken. Why not simply take both branches? That is, assume both branches will be hit, cache both sides, and then take the proper one when necessary. The cache does not need to be invalidated. While this requires the compiler to load both branches beforehand (more memory, proper layout, etc.), I imagine that proper optimization could streamline both so that one can get near optimal results from a single

what is Interruptible-restartable instructions in ARM cortex m0/m0+

怎甘沉沦 submitted on 2020-05-15 10:25:48

Question: I am currently reading the ARM Cortex M0+ User Guide on the ARM website, shown below: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0662b/CHDBIBGJ.html In the User Manual, the following paragraph is mentioned: Interruptible-restartable instructions The interruptible-restartable instructions are LDM, STM, PUSH, POP and, in 32-cycle multiplier implementations, MULS. When an interrupt occurs during the execution of one of these instructions, the processor abandons execution of the instruction.

When accessing memory, will the page table accessed/dirty bit be set under a cache hit situation?

可紊 submitted on 2020-05-15 06:01:06

Question: As far as I know, a memory access by the CPU involves the CPU cache and the MMU. The CPU will try to find its target in the cache, and if a cache miss happens, the CPU will turn to the MMU. During an access through the MMU, the accessed/dirty bit of the corresponding page table entry will be set by hardware. However, to the best of my knowledge, most CPU designs won't trigger the MMU unless there's a cache miss, and here my problem is: will the accessed/dirty bit of the page table entry still be set under a cache hit? Or it's architecture
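
One way to reason about the premise: on typical designs the address translation (TLB) is consulted for every access, in parallel with or before the cache lookup, and the accessed/dirty bits are set by the page-walk hardware whenever the TLB lacks a suitable entry. On that view the data cache's hit/miss outcome is a separate question from A/D-bit maintenance. A toy Python sketch of this separation (all names hypothetical; this is a model of the reasoning, not of any specific CPU):

```python
# Toy model: A/D bits travel with translation, not with the data cache,
# so a data-cache hit can still end up with the bits set.
class ToyTLB:
    def __init__(self, page_table):
        self.page_table = page_table   # vpn -> {'accessed': ..., 'dirty': ...}
        self.entries = {}              # cached translations: vpn -> {'dirty': bool}

    def access(self, vpn, is_write):
        entry = self.entries.get(vpn)
        # Walk the page table on a TLB miss, or on the first write
        # through an entry that was filled without dirty permission.
        if entry is None or (is_write and not entry['dirty']):
            pte = self.page_table[vpn]
            pte['accessed'] = True     # hardware sets A on the walk
            if is_write:
                pte['dirty'] = True    # and D on a write
            self.entries[vpn] = {'dirty': is_write}

pt = {0: {'accessed': False, 'dirty': False}}
tlb = ToyTLB(pt)
tlb.access(0, is_write=False)   # first read: walk sets the accessed bit
assert pt[0] == {'accessed': True, 'dirty': False}
tlb.access(0, is_write=True)    # write (cache hit or not): dirty gets set
assert pt[0] == {'accessed': True, 'dirty': True}
```

Note the model also shows why the bits are not re-set on every access: once the TLB entry records the needed permission, no further walk occurs.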

Why is LOCK a full barrier on x86?

倖福魔咒の submitted on 2020-05-15 03:45:27

Question: Why does the LOCK prefix cause a full barrier on x86? (And thus it drains the store buffer and has sequential consistency.) For LOCKed read-modify-write operations, a full barrier shouldn't be required, and exclusive access to the cache line seems to be sufficient. Is it a design choice or is there some other limitation? Answer 1: A long time ago, before the Intel 80486, Intel processors didn't have on-chip caches or write buffers. Therefore, by design, all writes become immediately globally visible in

How do the store buffer and Line Fill Buffer interact with each other?

醉酒当歌 submitted on 2020-05-14 19:47:47

Question: I was reading the MDS attack paper RIDL: Rogue In-Flight Data Load. It discusses how the Line Fill Buffer can cause leakage of data. There is the About the RIDL vulnerabilities and the "replaying" of loads question that discusses the micro-architectural details of the exploit. One thing that isn't clear to me after reading that question is why we need a Line Fill Buffer if we already have a store buffer. John McCalpin discusses how the store buffer and Line Fill Buffer are connected in How

32-byte aligned routine does not fit the uops cache

て烟熏妆下的殇ゞ submitted on 2020-05-07 05:27:10

Question: KbL i7-8550U. I'm researching the behavior of the uops cache and ran into something I don't understand about it. As specified in the Intel Optimization Manual, section 2.5.2.2 (emphasis mine):

The Decoded ICache consists of 32 sets. Each set contains eight Ways. Each Way can hold up to six micro-ops.
- All micro-ops in a Way represent instructions which are statically contiguous in the code and have their EIPs within the same aligned 32-byte region.
- Up to three Ways may be dedicated to the same 32-byte aligned
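
The two quoted limits combine into a hard per-region ceiling: at most 3 Ways x 6 micro-ops = 18 micro-ops per aligned 32-byte chunk of code. A small sketch of that arithmetic (figures from the quoted manual text; the helper name is hypothetical, and this ignores the other conditions that can terminate a Way, such as taken branches):

```python
import math

# Capacity figures from the quoted manual text (Intel Optimization
# Manual, section 2.5.2.2):
UOPS_PER_WAY = 6            # each Way holds up to six micro-ops
WAYS_PER_32B_REGION = 3     # up to three Ways per aligned 32-byte region

def fits_in_uop_cache(uops_in_region):
    """Hypothetical helper: can all micro-ops decoded from one aligned
    32-byte code region be cached, looking at raw capacity only?"""
    ways_needed = math.ceil(uops_in_region / UOPS_PER_WAY)
    return ways_needed <= WAYS_PER_32B_REGION

assert fits_in_uop_cache(18)        # 18 uops fill exactly 3 Ways
assert not fits_in_uop_cache(19)    # a 19th uop would need a 4th Way
```

So a 32-byte region that decodes to more than 18 micro-ops cannot be held in the Decoded ICache and forces delivery from the legacy decode pipeline instead.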

Error while creating a module that implements a register file that does vector subtraction (Verilog)

风流意气都作罢 submitted on 2020-04-18 03:46:55

Question: I am very new to Verilog and I have been given a task to create a module that implements a register file with subtraction functionality. I do have a basic idea (I think): I need to supply the output of a XOR gate bundle as the second operand of the adder, with a carry-in that is 1 (high, true) when the operation is subtraction and 0 (low, false) when it is anything else. I don't know how to do this. Any help would be appreciated. Here is what I have so far: module
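
The trick the question describes (XOR the second operand against the subtract flag, then assert carry-in) is the standard two's-complement subtraction-by-addition: a + ~b + 1 == a - b. Since the Verilog snippet is cut off, here is a sketch of just the arithmetic in Python rather than Verilog (function names hypothetical), useful for checking the idea before wiring it up:

```python
MASK = 0xFFFF_FFFF  # model a 32-bit datapath

def ripple_add(a, b, carry_in):
    """The single adder in the design: plain addition with a carry-in,
    truncated to the datapath width."""
    return (a + b + carry_in) & MASK

def alu(a, b, subtract):
    # The XOR gate bundle: when subtract is true, every bit of b is
    # XORed with 1 (i.e. b is inverted); otherwise b passes through.
    b_in = b ^ (MASK if subtract else 0)
    # Carry-in is 1 for subtraction, 0 otherwise: a + ~b + 1 == a - b.
    return ripple_add(a, b_in, 1 if subtract else 0)

assert alu(10, 3, subtract=True) == 7
assert alu(3, 10, subtract=True) == (3 - 10) & MASK   # wraps, two's complement
assert alu(10, 3, subtract=False) == 13
```

In Verilog the same structure is the adder instance fed by `b ^ {32{subtract}}` with `subtract` as the carry-in; the register file around it is unchanged.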

Does the store buffer hold physical or virtual addresses on modern x86?

别说谁变了你拦得住时间么 submitted on 2020-04-14 07:35:54

Question: Modern Intel and AMD chips have large store buffers that buffer stores before commit to the L1 cache. Conceptually, these entries hold the store data and the store address. For the address part, do these buffer entries hold virtual or physical addresses, or both? Source: https://stackoverflow.com/questions/61190976/does-the-store-buffer-hold-physical-or-virtual-addresses-on-modern-x86