cpu-architecture

What happens for a RIP-relative load next to the current instruction? Cache hit?

旧街凉风 · submitted 2021-02-05 07:11:25
Question: I am reading Agner Fog's book on x86 assembly, and I am wondering how RIP-relative addressing works in this scenario. Specifically, assume my RIP offset is +1, meaning the data I want to read sits right next to this instruction in memory. That data has likely already been fetched into the L1 instruction cache. Assuming it is not also in L1d, what exactly happens on the CPU? Let's assume a relatively recent Intel architecture like Kaby Lake. Answer 1: Yes, it's …

Is TLB inclusive?

好久不见. · submitted 2021-02-05 06:40:08
Question: Is the TLB hierarchy inclusive on modern x86 CPUs (e.g. Skylake, or maybe other Lakes)? For example, prefetchtN brings data into cache level N + 1 along with a corresponding TLB entry in the DTLB. Will that entry be contained in the STLB as well? Answer 1: AFAIK, on Intel SnB-family CPUs the 2nd-level TLB is a victim cache for the first-level iTLB and dTLB. (I can't find a source for this and IDK where I read it originally, so take this with a grain of salt. I had originally thought this was a well-known fact, but it might …

Do store instructions block subsequent instructions on a cache miss?

ⅰ亾dé卋堺 · submitted 2021-02-05 05:10:24
Question: Say we have a processor with two cores (C0 and C1) and a cache line starting at address k that is initially owned by C0. If C1 issues a store instruction to an 8-byte slot in line k, will that affect the throughput of the subsequent instructions executing on C1? The Intel optimization manual has the following paragraph: When an instruction writes data to a memory location [...], the processor ensures that the line containing this memory location is in its L1d cache […

How to build as an ia32 solution from visual studio using cmake

僤鯓⒐⒋嵵緔 · submitted 2021-02-04 21:28:32
Question: I have a module project using cmake with the following configuration: cmake_minimum_required(VERSION 3.13) project(app) set(CMAKE_CXX_STANDARD 11) add_library(app MODULE src/library.cpp src/library.h) Once the solution is generated using cmake .. -G "Visual Studio 15 2017 Win64" -DCMAKE_BUILD_TYPE=Release, I can find an app.sln solution. I open it with Visual Studio 2019 and click the Local Windows Debugger button. I can also see a drop-down menu containing the value x64 and an item …
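A likely answer sketch (assuming CMake 3.1 or later): with the Visual Studio generators the target architecture is fixed at generation time, not switched from inside the IDE, so a 32-bit (IA-32/x86) solution has to be regenerated with the Win32 platform:

```shell
# Regenerate the solution for a 32-bit (x86/IA-32) target.
# With the Visual Studio generators the platform is selected with -A
# at generation time; it cannot be changed from the IDE drop-down:
cmake .. -G "Visual Studio 15 2017" -A Win32
# (Older style: the generator name carried the width, e.g.
#  "Visual Studio 15 2017 Win64" for x64, the plain name for x86.)
# Note: CMAKE_BUILD_TYPE is ignored by multi-config VS generators;
# pick the configuration at build time instead:
cmake --build . --config Release
```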

How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB

旧城冷巷雨未停 · submitted 2021-02-04 21:08:05
Question: Here is the problem: we have a memory system with 64-bit virtual and 48-bit physical addresses. The L1 TLB is fully associative with 64 entries. The page size is 16KB. The L1 cache is 32KB and 2-way set associative; the L2 cache is 2MB and 4-way set associative. The block size of both L1 and L2 is 64B. The L1 cache uses a virtually indexed, physically tagged (VIPT) scheme. We are required to compute the tags, indices and offsets. This is the solution that I have …

Can memory store be reordered really, in an OoOE processor?

自古美人都是妖i · submitted 2021-02-04 16:12:48
Question: We know that two instructions can be reordered by an OoOE processor. For example, say there are two global variables shared among different threads: int data; bool ready; A writer thread produces data and turns on the flag ready to allow readers to consume that data: data = 6; ready = true; Now, on an OoOE processor, these two instructions can be reordered (instruction fetch, execution). But what about the final commit/write-back of the results? That is, will the stores be in order? From what I learned, …

Can two processes simultaneously run on one CPU core?

邮差的信 · submitted 2021-02-04 14:49:28
Question: Can two processes run simultaneously on one CPU core that has hyper-threading? I have been learning from the Internet, but I cannot find a clear, straight answer. Edit: Thanks for the discussion and sharing! My purpose in posting this question is not to discuss parallel computing; that topic is too big to cover here. I just want to know whether a multithreaded application can benefit more from hyper-threading than a multi-process application. After further reading, I have the following learning notes. 1) …