branch-prediction

How to understand macro `likely` affecting branch prediction?

吃可爱长大的小学妹 submitted on 2021-02-11 13:59:49
Question: I noticed that when we know a branch condition is very likely to be true or false, we can tell the compiler so. The Linux kernel, for instance, uses the likely and unlikely macros extensively; they are implemented with __builtin_expect, which gcc provides. To find out how this works, I checked the generated assembly:

    20:branch_prediction_victim.cpp **** if (array_aka[j] >= 128)
    184               .loc 3 20 0 is_stmt 1
    185 00f1 488B85D0 movq -131120(%rbp), %rax
    185      FFFDFF
    186 00f8 8B8485F0 movl -131088(%rbp,%rax,4), %eax
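For readers unfamiliar with the macros the question mentions, here is a minimal sketch of how likely and unlikely are commonly defined on top of __builtin_expect (a GCC/Clang builtin). The helper count_large and its threshold test are hypothetical, chosen only to mirror the `>= 128` comparison in the excerpt. The hint typically influences how the compiler lays out the two paths; it does not change what the code computes.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-style hint macros. __builtin_expect is a GCC/Clang builtin;
 * !!(x) normalizes any nonzero value to 1 before the comparison. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: counts elements >= 128 while hinting that such
 * elements are rare. The hint affects code generation only. */
size_t count_large(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(data[i] >= 128)) {
            count++;
        }
    }
    return count;
}
```

Compiling this with and without the unlikely wrapper (for example with gcc -O2 -S) and comparing the output is one way to see which path the compiler places on the fall-through side.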

Performance optimisations of x86-64 assembly - Alignment and branch prediction

故事扮演 submitted on 2021-02-08 19:50:37
Question: I’m currently coding highly optimised versions of some C99 standard library string functions, like strlen(), memset(), etc., using x86-64 assembly with SSE2 instructions. So far I’ve managed to get excellent results in terms of performance, but I sometimes get weird behaviour when I try to optimise further. For instance, adding or even removing some simple instructions, or simply reorganising some local labels used with jumps, completely degrades the overall performance. And there’s absolutely

branch prediction on a function pointer

ぃ、小莉子 submitted on 2021-02-04 05:48:14
Question: I have a loop that runs over and over again. The logic inside that loop depends on the mode that the program is in. To improve performance, I was thinking that an array of function pointers, functionPtr[], could be initialized so that the loop would just call functionPtr[mode], which runs the right logic. The loop will stay in the same mode for many cycles (the number is unknown upfront, but many thousands). The program runs on an Intel x64 machine only and needs no portability. I was hoping that
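The dispatch scheme described in the question can be sketched as follows. The modes, the handlers step_add and step_sub, and the driver run_loop are all hypothetical names introduced for illustration; only functionPtr comes from the question. Because the mode stays fixed for the whole loop, every iteration makes an indirect call to the same target, which is generally the easy case for an indirect branch predictor.

```c
#include <assert.h>

/* Hypothetical modes and handlers, for illustration only. */
enum mode { MODE_ADD, MODE_SUB, MODE_COUNT };

static long step_add(long acc) { return acc + 1; }
static long step_sub(long acc) { return acc - 1; }

/* Table indexed by mode, playing the role of functionPtr[]. */
static long (*const functionPtr[MODE_COUNT])(long) = {
    [MODE_ADD] = step_add,
    [MODE_SUB] = step_sub,
};

/* Runs `iterations` steps in one fixed mode and returns the result. */
long run_loop(enum mode mode, long iterations)
{
    assert((int)mode >= 0 && (int)mode < MODE_COUNT);
    long acc = 0;
    for (long i = 0; i < iterations; i++) {
        /* Indirect call; the target is the same on every iteration. */
        acc = functionPtr[mode](acc);
    }
    return acc;
}
```

For example, run_loop(MODE_ADD, 1000) applies step_add a thousand times. Whether this beats a switch inside the loop depends on the compiler and the workload, so it is worth measuring both.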

Viewing the parameters of the branch predictor in gem5

穿精又带淫゛_ submitted on 2021-01-28 13:41:44
Question: This is a two-part question. First, how do I configure the size of a branch predictor? I can see that I can set the type using the se.py config script and the --bp-type argument (in my case I'm setting it to LTAGE), but how do I change the size of the tables? And is there an easy way to see the total size of all tables? Second, looking at the code, I don't understand the LTAGE constructor:

    LTAGE::LTAGE(const LTAGEParams *params)
        : TAGE(params), loopPredictor(params->loop_predictor)
    {
    }

The

How many instructions need to be killed on a mispredict in a 6-stage scalar or superscalar MIPS?

我与影子孤独终老i submitted on 2021-01-27 14:31:32
Question: I am working on a pipeline with 6 stages: F D I X0 X1 W. I am asked how many instructions need to be killed when a branch mispredict happens. I have come up with 4, because branch resolution happens in X1 and we need to kill all the instructions that came after the branch. In the pipeline diagram, it looks like this requires killing 4 instructions that are in flight in the pipeline. Is that correct? I am also asked how many need to be killed if
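The counting argument in the question can be written out as a small sketch. It rests on assumptions that are not stated in the excerpt: every stage before the resolving one is full of wrong-path instructions, there are no branch delay slots, and in a multi-issue design the younger instructions sharing the branch's issue group are also killed. The function killed_on_mispredict and its parameters width and branch_slot are hypothetical names for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Stage names from the question, in pipeline order. */
static const char *const STAGES[] = { "F", "D", "I", "X0", "X1", "W" };
enum { NUM_STAGES = sizeof STAGES / sizeof STAGES[0] };

/* Counts instructions younger than a branch that resolves in
 * `resolve_stage`, under the assumptions stated above.
 *   width       - instructions per stage (1 for a scalar pipeline)
 *   branch_slot - 0-based position of the branch within its group
 * Returns -1 if the stage name is unknown. */
int killed_on_mispredict(const char *resolve_stage, int width, int branch_slot)
{
    assert(width >= 1);
    assert(branch_slot >= 0 && branch_slot < width);
    for (int i = 0; i < NUM_STAGES; i++) {
        if (strcmp(STAGES[i], resolve_stage) == 0) {
            /* i earlier stages, each holding `width` instructions,
             * plus the younger instructions in the branch's own group. */
            return i * width + (width - 1 - branch_slot);
        }
    }
    return -1;
}
```

Under this model, killed_on_mispredict("X1", 1, 0) counts the instructions in F, D, I, and X0, which agrees with the 4 the asker arrived at for the scalar case. The multi-issue figure depends heavily on the assumptions above, so treat it as a starting point for the pipeline diagram, not a final answer.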

Using rdmsr/rdpmc for branch prediction accuracy

我与影子孤独终老i submitted on 2021-01-27 04:21:27
Question: I am trying to understand how the branch prediction unit in a CPU works. I have used PAPI and also Linux's perf_events, but neither gives accurate results (for my case). This is my code:

    void func(int* arr, int sequence_len){
        for(int i = 0; i < sequence_len; i++){
            // region starts
            if(arr[i]){
                do_sth();
            }
            // region ends
        }
    }

My array consists of 0's and 1's. It has a pattern with a size of sequence_len. For example, if my size is 8, then it has a pattern of 0 1 0 1 0 0 1 1 or
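To make the test input concrete, here is a minimal sketch of building an array that repeats a fixed 0/1 pattern and of a stand-in loop that counts how often the branch is taken. The helpers fill_pattern and count_taken are hypothetical and are not the asker's code; count_taken simply replaces do_sth() with a counter so the result can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Fills arr[0..n) by repeating pattern[0..pattern_len). */
void fill_pattern(int *arr, size_t n, const int *pattern, size_t pattern_len)
{
    assert(pattern_len > 0);
    for (size_t i = 0; i < n; i++) {
        arr[i] = pattern[i % pattern_len];
    }
}

/* Stand-in for the measured loop: counts iterations where the
 * condition arr[i] is true. */
size_t count_taken(const int *arr, size_t n)
{
    size_t taken = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i]) {
            taken++;
        }
    }
    return taken;
}
```

Knowing the exact number of taken branches gives a baseline to compare hardware counter readings against. Note that at higher optimization levels a compiler may turn a simple `if` like this into branch-free code, so it is worth checking the generated assembly before interpreting any misprediction counts.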
