cpu-architecture | 易学教程

Does page walk take advantage of shared tables?

Question: Suppose two address spaces share a largish lump of non-contiguous memory. The system might want to share physical page table(s) between them. These tables wouldn't use Global bits (even if supported), and would tie them to ASIDs if supported. There are immediate benefits, since the data cache will be less polluted than by a copy, less RAM is pinned, etc. Does the page walk take explicit advantage of this in any known architecture? If so, does that imply the MMU is explicitly caching & sharing

Long latency instruction

Question: I would like a long-latency single-uop x86 instruction, in order to create long dependency chains as part of testing microarchitectural features. Currently I'm using fsqrt, but I'm wondering if there is something better. Ideally, the instruction will score well on the following criteria: long latency; stable/fixed latency; one or a few uops (especially: not microcoded); consumes as few uarch resources as possible (load/store buffers, page walkers, etc.); able to chain (latency-wise) with itself

Why do longer pipelines make a single delay slot insufficient?

Question: I read the following statement in Patterson & Hennessy's Computer Organization and Design textbook: "As processors go to both longer pipelines and issuing multiple instructions per clock cycle, the branch delay becomes longer, and a single delay slot is insufficient." I can understand why "issuing multiple instructions per clock cycle" can make a single delay slot insufficient, but I don't know why "longer pipelines" cause it. Also, I do not understand why longer pipelines cause the branch
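
The quoted statement can be illustrated with a little arithmetic. The sketch below is only a simplified model, not something taken from the textbook: it assumes fetch is stage 1, that the branch outcome becomes known at the end of a given stage, and that a fixed number of instructions enters the pipeline each cycle. The function name and parameters are hypothetical.

```python
def instructions_fetched_before_resolve(resolve_stage: int, issue_width: int = 1) -> int:
    """Count instructions fetched after a branch before its outcome is known.

    Simplified model (an assumption, not a real pipeline):
      - fetch is stage 1,
      - the branch is resolved at the end of `resolve_stage`,
      - `issue_width` new instructions are fetched every cycle.
    """
    if resolve_stage < 2 or issue_width < 1:
        raise ValueError("resolve_stage must be >= 2 and issue_width >= 1")
    # Cycles during which the front end keeps fetching without knowing
    # where the branch goes.
    cycles_waiting = resolve_stage - 1
    return cycles_waiting * issue_width


# Classic 5-stage pipeline resolving branches in stage 2: one delay slot suffices.
print(instructions_fetched_before_resolve(2))
# Deeper pipeline resolving branches in stage 4: three slots would be needed.
print(instructions_fetched_before_resolve(4))
# Same deeper pipeline, fetching two instructions per cycle.
print(instructions_fetched_before_resolve(4, issue_width=2))
```

Under these assumptions, the number of instructions that would have to fill delay slots grows with both the resolving stage and the issue width, which is why one architectural delay slot stops covering the branch delay.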

What are shadow registers in MIPS and how are they used?

Question: When I read about the MIPS architecture, I came across shadow registers, which are said to be copies of the general-purpose registers. I couldn't understand the following: When are shadow registers used? Answer 1: MIPS shadow registers are used to reduce register load/store overhead in handling interrupts. An interrupt to which a shadow register set is assigned does not need to save any of the existing context to provide free registers or load any interrupt-specific data stored in the shadow registers at

Store forwarding Address vs Data: What's the difference between STD and STA in the Intel Optimization guide?

Question: I'm wondering if any Intel experts out there can tell me the difference between STD and STA with respect to the Intel Skylake core. In the Intel optimization guide, there's a picture describing the "super-scalar ports" of the Intel cores. Here's the PDF. The picture is on page 40. Here's another picture from page 78; this picture describes "Store Address" and "Store Data": Prepares the store forwarding and store retirement logic with the address of the data being stored. Prepares the store

Can a lower level cache have higher associativity and still hold inclusion?

Question: Can a lower level cache have higher associativity and still hold inclusion? Suppose we have two levels of cache (L1 being nearest to the CPU and L2 being nearest to main memory). The L1 cache is 2-way set associative with 4 sets, and let's say the L2 cache is direct mapped with 16 cache lines; assume that both caches have the same block size. Then I think it will follow the inclusion property even though L1 (lower level) has higher associativity than L2 (upper level). As per my understanding, lower level cache can
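
The configuration in the question is small enough to simulate. The following is a toy model under stated assumptions, not the asker's or any real hardware's design: block numbers map to sets by simple modulo, L1 uses LRU replacement, L2 is only consulted on an L1 miss, and `back_invalidate` is a hypothetical switch that makes an L2 eviction also remove the block from L1.

```python
from collections import OrderedDict

L1_SETS, L1_WAYS = 4, 2   # 2-way set associative, 4 sets (from the question)
L2_LINES = 16             # direct mapped, 16 lines (from the question)


def simulate(blocks, back_invalidate=False):
    """Run a sequence of block numbers through the toy hierarchy.

    Returns the sets of blocks resident in L1 and in L2 afterwards.
    """
    l1 = [OrderedDict() for _ in range(L1_SETS)]  # one LRU-ordered dict per set
    l2 = [None] * L2_LINES
    for b in blocks:
        l1_set = l1[b % L1_SETS]
        if b in l1_set:                 # L1 hit: refresh LRU order only
            l1_set.move_to_end(b)
            continue
        idx = b % L2_LINES              # L1 miss: go to L2
        victim = l2[idx]
        if victim != b:                 # L2 miss: replace whatever is there
            if back_invalidate and victim is not None:
                l1[victim % L1_SETS].pop(victim, None)
            l2[idx] = b
        if len(l1_set) == L1_WAYS:      # make room in L1 (evict LRU block)
            l1_set.popitem(last=False)
        l1_set[b] = True
    l1_blocks = {b for s in l1 for b in s}
    l2_blocks = {b for b in l2 if b is not None}
    return l1_blocks, l2_blocks


def inclusion_holds(blocks, back_invalidate=False):
    """True if every block in L1 is also present in L2."""
    l1_blocks, l2_blocks = simulate(blocks, back_invalidate)
    return l1_blocks <= l2_blocks


# Blocks 0 and 16 share L2 line 0 and also share L1 set 0.
print(inclusion_holds([0, 16]))
print(inclusion_holds([0, 16], back_invalidate=True))
```

In this model, blocks 0 and 16 can sit side by side in the 2-way L1 set while the direct-mapped L2 can hold only one of them, so inclusion is not automatic for this geometry; it survives only if L2 evictions are propagated to L1.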

How does Spectre attack read the cache it tricked CPU to load?

Question: I understand the part of the paper where they trick the CPU into speculatively loading part of the victim's memory into the CPU cache. The part I do not understand is how they retrieve it from the cache. Answer 1: They don't retrieve it directly (bytes read out of bounds are not "retired" by the CPU and cannot be seen by the attacker in the attack). One vector of attack is to do the "retrieval" a bit at a time. After the CPU cache has been prepared (flushing the cache where it has to be), and has been "taught"
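
The "a bit at a time" idea in the answer can be shown with a pure simulation. This sketch involves no real speculation or timing: `ToyCache`, the `FAST`/`SLOW` latencies, and `speculative_victim` are all made up for illustration, and the victim's speculative access is modelled as an ordinary function call whose only lasting effect is a cache fill.

```python
FAST, SLOW = 1, 100  # made-up latencies, standing in for cache hit vs. miss


class ToyCache:
    """A cache modelled as a set of resident line numbers."""

    def __init__(self):
        self.lines = set()

    def flush(self, line):
        self.lines.discard(line)

    def access(self, line):
        """Return the simulated latency, then leave the line cached."""
        latency = FAST if line in self.lines else SLOW
        self.lines.add(line)
        return latency


def speculative_victim(cache, secret, bit_index):
    """Stand-in for the mispredicted path: the result is discarded,
    but the cache line selected by the secret bit stays loaded."""
    bit = (secret >> bit_index) & 1
    cache.access(bit)  # touches probe line 0 or probe line 1


def leak_byte(secret):
    """Recover an 8-bit secret using only simulated access latencies."""
    cache = ToyCache()
    recovered = 0
    for i in range(8):
        cache.flush(0)                      # prepare: both probe lines uncached
        cache.flush(1)
        speculative_victim(cache, secret, i)
        # Time both probe lines; the faster one was touched by the victim.
        _, fastest_line = min((cache.access(line), line) for line in (0, 1))
        recovered |= fastest_line << i
    return recovered


print(hex(leak_byte(0xA5)))
```

The attacker code never reads `secret` itself; it only compares latencies of its own probe lines, which is the sense in which the data is inferred rather than retrieved.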

Microarchitectural zeroing of a register via the register renamer: performance versus a mov?

Question: I read in a blog post that recent x86 microarchitectures are also able to handle common register zeroing idioms (such as xor-ing a register with itself) in the register renamer; in the words of the author: "the register renamer also knows how to execute these instructions – it can zero the registers itself." Does anybody know how this works in practice? I know that some ISAs, like MIPS, contain an architectural register that is always set to zero in hardware; does this mean that internally,

Why can't segmentation be completely disabled?

Question: According to the AMD manual, segmentation cannot be disabled. My question is: why is that impossible? Another question: it says that 64-bit mode disables it; what does that mean? Is segmentation completely disabled in 64-bit mode? AMD Manual: https://s7.postimg.cc/hk15o6swr/Capture.png Answer 1: Introduction In 64-bit mode, whenever a non-null segment selector is loaded into any of the segment registers, the processor automatically loads the corresponding segment descriptor in the hidden part of the