cpu-architecture

What happens for a RIP-relative load next to the current instruction? Cache hit?

旧街凉风 · submitted 2021-02-05 07:11:25
Question: I am reading Agner Fog's book on x86 assembly, and I am wondering how RIP-relative addressing works in this scenario. Specifically, assume my RIP offset is +1, meaning the data I want to read sits right next to this instruction in memory. That data has likely already been fetched into the L1 instruction cache. Assuming it is not also in L1d, what exactly happens on the CPU? Let's assume a relatively recent Intel architecture like Kaby Lake. Answer 1: Yes, it's …

Is TLB inclusive?

好久不见. · submitted 2021-02-05 06:40:08
Question: Is the TLB hierarchy inclusive on modern x86 CPUs (e.g. Skylake, or maybe other Lakes)? For example, prefetchtN brings data into cache level N + 1 along with a corresponding TLB entry in the DTLB. Will that entry be contained in the STLB as well? Answer 1: AFAIK, on Intel SnB-family CPUs the 2nd-level TLB is a victim cache for the first-level iTLB and dTLB. (I can't find a source for this and IDK where I read it originally, so take this with a grain of salt. I had originally thought this was a well-known fact, but it might …

Do store instructions block subsequent instructions on a cache miss?

ⅰ亾dé卋堺 · submitted 2021-02-05 05:10:24
Question: Say we have a processor with two cores (C0 and C1) and a cache line starting at address k that is initially owned by C0. If C1 issues a store instruction to an 8-byte slot in line k, will that affect the throughput of the subsequent instructions executing on C1? The Intel optimization manual has the following paragraph: When an instruction writes data to a memory location [...], the processor ensures that the line containing this memory location is in its L1d cache […

How to build as an ia32 solution from visual studio using cmake

僤鯓⒐⒋嵵緔 · submitted 2021-02-04 21:28:32
Question: I have a module project using cmake with the following configuration: cmake_minimum_required(VERSION 3.13) project(app) set(CMAKE_CXX_STANDARD 11) add_library(app MODULE src/library.cpp src/library.h) Once the solution is generated using cmake .. -G "Visual Studio 15 2017 Win64" -DCMAKE_BUILD_TYPE=Release, I can find an app.sln solution. I open it with Visual Studio 2019 and click the Local Windows Debugger button. I can also see a drop-down menu containing the value x64 and an item …
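A likely answer sketch (assuming CMake 3.1 or later): with the Visual Studio generators the target architecture is fixed at generation time, not switched from inside the IDE, so a 32-bit (IA-32/x86) solution has to be regenerated with the Win32 platform:

```shell
# Regenerate the solution for a 32-bit (x86/IA-32) target.
# With the Visual Studio generators the platform is selected with -A
# at generation time; it cannot be changed from the IDE drop-down:
cmake .. -G "Visual Studio 15 2017" -A Win32
# (Older style: the generator name carried the width, e.g.
#  "Visual Studio 15 2017 Win64" for x64, the plain name for x86.)
# Note: CMAKE_BUILD_TYPE is ignored by multi-config VS generators;
# pick the configuration at build time instead:
cmake --build . --config Release
```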

How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB

旧城冷巷雨未停 · submitted 2021-02-04 21:08:05
Question: Here is the problem: we have a memory system with 64-bit virtual and 48-bit physical addresses. The L1 TLB is fully associative with 64 entries. The page size is 16KB. The L1 cache is 32KB and 2-way set associative; the L2 cache is 2MB and 4-way set associative. The block size of both L1 and L2 is 64B. The L1 cache uses a virtually indexed, physically tagged (VIPT) scheme. We are required to compute the tags, indices and offsets. This is the solution that I have …

Can memory store be reordered really, in an OoOE processor?

自古美人都是妖i · submitted 2021-02-04 16:12:48
Question: We know that two instructions can be reordered by an OoOE processor. For example, say there are two global variables shared among different threads: int data; bool ready; A writer thread produces data and turns on the flag ready to allow readers to consume that data: data = 6; ready = true; Now, on an OoOE processor, these two instructions can be reordered (instruction fetch, execution). But what about the final commit/write-back of the results? That is, will the stores be in order? From what I learned, …

Can two processes simultaneously run on one CPU core?

邮差的信 · submitted 2021-02-04 14:49:28
Question: Can two processes run simultaneously on one CPU core that has hyper-threading? I have been learning from the Internet, but I cannot find a clear, straight answer. Edit: Thanks for the discussion and sharing! My purpose in posting this question is not to discuss parallel computing; that topic is too big to cover here. I just want to know whether a multithreaded application can benefit more from hyper-threading than a multi-process application. After further reading, I have the following learning notes. 1) …