micro-architecture

How modern X86 processors actually compute multiplications?

女生的网名这么多〃 posted on 2020-06-08 18:44:51
Question: I was watching a lecture on algorithms, and the professor used multiplication as an example of how naive algorithms can be improved... It made me realize that multiplication is not that obvious. When I am coding I just consider it a simple atomic operation, but multiplication requires an algorithm to run; it does not work like summing numbers. So I wonder, what algorithm do modern desktop processors actually use? I guess they don't rely on logarithm tables, and don't make loops with
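
For reference, the naive baseline that a lecture like this typically starts from is schoolbook shift-and-add multiplication. The sketch below is an illustration only: the function name and the 64-bit default width are assumptions, and it makes no claim about what an x86 multiplier does in hardware.

```python
def shift_and_add_multiply(a: int, b: int, width: int = 64) -> int:
    """Multiply two unsigned integers with the schoolbook shift-and-add loop.

    Illustrative only: `width` (an assumed operand size) bounds the loop,
    and the result is the full double-width product.
    """
    mask = (1 << width) - 1
    a &= mask  # treat both operands as unsigned `width`-bit values
    b &= mask
    product = 0
    for bit in range(width):
        # For every set bit in b, add a copy of a shifted to that position.
        if (b >> bit) & 1:
            product += a << bit
    return product
```

The loop runs once per operand bit, which is what makes this approach a natural "naive" starting point for comparison.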

Are two store buffer entries needed for split line/page stores on recent Intel?

心不动则不痛 posted on 2020-06-08 16:57:10
Question: It is generally understood that one store buffer entry is allocated per store, and this store buffer entry holds the store data and physical address [1]. In the case that a store crosses a 4096-byte page boundary, two different translations may be needed, one for each page, and hence two different physical addresses may need to be stored. Does this mean that page-crossing stores take 2 store buffer entries? If so, does it apply also to line-crossing stores? [1] ... and perhaps some/all of the
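
To make "page-crossing" and "line-crossing" concrete, here is a minimal sketch that checks whether a store of a given size spans two pages or two cache lines. The 4096-byte page size comes from the question; the 64-byte line size and the helper name `crosses_boundary` are assumptions for illustration.

```python
PAGE_SIZE = 4096  # page size used in the question
LINE_SIZE = 64    # assumed cache-line size; not stated in the question


def crosses_boundary(addr: int, size: int, boundary: int) -> bool:
    """Return True if bytes [addr, addr + size) fall in two different
    boundary-sized blocks (pages or cache lines)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_block = addr // boundary
    last_block = (addr + size - 1) // boundary
    return first_block != last_block
```

For example, an 8-byte store at address 4092 touches bytes 4092 through 4099, so it crosses both a line and a page boundary, while an 8-byte store at address 60 crosses only a line boundary.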

How to tell length of an x86-64 instruction opcode using CPU itself?

亡梦爱人 posted on 2020-06-08 12:19:13
Question: I know that there are libraries that can "parse" binary machine code / opcodes to tell the length of an x86-64 CPU instruction. But I'm wondering, since the CPU has internal circuitry to determine this, is there a way to use the processor itself to tell the instruction size from binary code? (Maybe even a hack?) Answer 1: The Trap Flag (TF) in EFLAGS/RFLAGS makes the CPU single-step, i.e. take an exception after running one instruction. So if you write a debugger, you can use the CPU's single-stepping

How do the store buffer and Line Fill Buffer interact with each other?

醉酒当歌 posted on 2020-05-14 19:47:47
Question: I was reading the MDS attack paper RIDL: Rogue In-Flight Data Load. They discuss how the Line Fill Buffer can cause leakage of data. There is the question "About the RIDL vulnerabilities and the "replaying" of loads", which discusses the micro-architectural details of the exploit. One thing that isn't clear to me after reading that question is why we need a Line Fill Buffer if we already have a store buffer. John McCalpin discusses how the store buffer and Line Fill Buffer are connected in How

Does the store buffer hold physical or virtual addresses on modern x86?

别说谁变了你拦得住时间么 posted on 2020-04-14 07:35:54
Question: Modern Intel and AMD chips have large store buffers to buffer stores before they commit to the L1 cache. Conceptually, these entries hold the store data and store address. For the address part, do these buffer entries hold virtual or physical addresses, or both? Source: https://stackoverflow.com/questions/61190976/does-the-store-buffer-hold-physical-or-virtual-addresses-on-modern-x86

Are load ops deallocated from the RS when they dispatch, complete or some other time?

主宰稳场 posted on 2020-02-24 00:38:11
Question: On modern Intel [1] x86, are load uops freed from the RS (Reservation Station) at the point they dispatch [2], or when they complete [3], or somewhere in-between [4]? [1] I am also interested in AMD Zen and sequels, so feel free to include that too, but for the purposes of making the question manageable I limit it to Intel. Also, AMD seems to have a somewhat different load pipeline from Intel, which may make investigating this on AMD a separate task. [2] Dispatch here means leave the RS for execution.
