cpu-architecture

How do the branch predictor and branch target buffer co-exist?

╄→гoц情女王★ posted on 2020-12-30 03:51:16
Question: My question is how they co-exist and work together in a modern CPU architecture.

Answer 1: You've got it slightly reversed. On every fetch you index into your branch predictor, which tells you whether the instruction that you have just received will be decoded into a taken branch. If not, you fetch the next sequential address. But if your branch predictor says that it will be a taken branch, you don't know which instruction to fetch next, since you haven't decoded this instruction yet. So in order
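
The fetch-time decision described in the answer can be sketched as a toy model. This is only an illustration under assumptions that are not in the original: the table size `ENTRIES`, fixed 4-byte instructions, a 2-bit saturating counter as the direction predictor, and the names `next_fetch_pc` and `resolve_branch` are all hypothetical. Real predictors are considerably more elaborate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES    16u   /* hypothetical table size */
#define INSN_BYTES 4u    /* assumption: fixed 4-byte instructions */

/* Direction predictor: one 2-bit saturating counter (0..3) per entry. */
static uint8_t counters[ENTRIES];

/* Branch target buffer: tagged entries mapping a branch PC to its target. */
typedef struct {
    bool     valid;
    uint32_t tag;      /* full PC used as the tag, for simplicity */
    uint32_t target;
} BtbEntry;
static BtbEntry btb[ENTRIES];

static unsigned index_of(uint32_t pc) { return (pc / INSN_BYTES) % ENTRIES; }

/* Fetch stage: choose the next fetch address before the instruction at
 * `pc` has been decoded. */
uint32_t next_fetch_pc(uint32_t pc)
{
    unsigned i = index_of(pc);
    bool predict_taken = counters[i] >= 2;
    if (predict_taken && btb[i].valid && btb[i].tag == pc)
        return btb[i].target;      /* predicted taken and target is known */
    return pc + INSN_BYTES;        /* otherwise fetch sequentially */
}

/* Execute stage: train both structures with the resolved outcome. */
void resolve_branch(uint32_t pc, bool taken, uint32_t target)
{
    unsigned i = index_of(pc);
    assert(i < ENTRIES);
    if (taken) {
        if (counters[i] < 3) counters[i]++;
        btb[i] = (BtbEntry){ .valid = true, .tag = pc, .target = target };
    } else if (counters[i] > 0) {
        counters[i]--;
    }
}
```

In this sketch the predictor answers "taken or not?" and the BTB answers "where to?"; a redirect happens only when both agree, and every other case falls back to the next sequential address.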

Matrix multiplication: why does the non-blocked version outperform the blocked one?

本秂侑毒 posted on 2020-12-30 02:22:19
Question: I'm trying to speed up a matrix multiplication algorithm by blocking the loops to improve cache performance, yet the non-blocked version remains significantly faster regardless of matrix size, block size (I've tried lots of values between 2 and 200, powers of 2 and others), and optimization level.

Non-blocked version:

    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            int r = a[i][k];
            for (size_t j = 0; j < n; ++j) {
                c[i][j] += r * b[k][j];
            }
        }
    }

Blocked version: for(size_t kk =
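
The asker's blocked version is cut off in this excerpt, so the following is not their code. It is a minimal sketch of what a loop tiled over `k` and `j` might look like, assuming flat row-major `int` arrays instead of the `a[i][k]` indexing above; `matmul_blocked`, `min_size`, and the tile parameter `bs` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the smaller of two sizes. */
static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Computes c += a * b for n x n row-major int matrices, tiled over k and j.
 * `bs` is the tile edge length, an assumed tuning parameter (must be > 0). */
void matmul_blocked(size_t n, size_t bs,
                    const int *a, const int *b, int *c)
{
    assert(bs > 0);
    for (size_t kk = 0; kk < n; kk += bs) {
        size_t k_end = min_size(kk + bs, n);      /* clamp the last tile */
        for (size_t jj = 0; jj < n; jj += bs) {
            size_t j_end = min_size(jj + bs, n);
            for (size_t i = 0; i < n; ++i) {
                for (size_t k = kk; k < k_end; ++k) {
                    int r = a[i * n + k];
                    /* Same inner loop as the non-blocked version,
                     * restricted to one tile of b and c. */
                    for (size_t j = jj; j < j_end; ++j)
                        c[i * n + j] += r * b[k * n + j];
                }
            }
        }
    }
}
```

The innermost loop is the same row-wise update as in the non-blocked i-k-j version, which already walks `b` and `c` sequentially; that may be part of why tiling does not obviously help here, though the excerpt does not include an answer confirming this.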

Does memory fencing block threads in multi-core CPUs?

蓝咒 posted on 2020-12-29 13:54:34
Question: I was reading the Intel 64 and IA-32 instruction set guide to get an idea of memory fences. Taking SFENCE as an example: in order to make sure that all store operations are globally visible, does the multi-core CPU park all threads, even those running on other cores, until cache coherence is achieved?

Answer 1: Barriers don't make other threads/cores wait. They make some operations in the current thread wait, depending on what kind of barrier it is. Out-of-order execution of
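
The point that a barrier only orders operations in the thread that executes it can be illustrated with portable C11 fences. This is a sketch under assumptions: it uses `atomic_thread_fence` rather than the SFENCE instruction itself, and `payload`, `ready`, `publish`, and `try_consume` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;          /* plain data being published */
static atomic_bool ready;    /* flag meaning "payload is valid" */

/* Producer side: the release fence orders THIS thread's earlier store to
 * `payload` before its later store to `ready`. No other core is stalled. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer side: a non-blocking poll. Returns true and fills *out only if
 * the flag has been observed; otherwise the caller can keep doing other work. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

A consumer that calls `try_consume` before the producer has published simply gets `false` and continues running. Nothing in the producer's fence parks it.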

Can a hyper-threaded processor core execute two threads at the exact same time?

孤街浪徒 posted on 2020-12-27 05:29:20
Question: I'm having a hard time understanding hyper-threading. If the logical core doesn't actually exist, what's the point of using hyper-threading? The Wikipedia article states: "For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible." If the two logical cores share the same execution unit, that means one of the threads will have to be put on hold while the other executes, that being said,