intel

How is the bootstrap processor (BSP) selected on Intel ring and mesh architectures

Submitted by 大城市里の小女人 on 2019-12-24 01:07:51
Question: Section 2.13.2 mentions that the arbitration ID is used to determine which processor issues the no-op cycle first; I have seen this in multiple sources and in the Intel manual. However, the part of the Intel manual that describes the MP initialization sequence only covers the Pentium 4, from the era when there was a 'system bus' (and, before that, an 'APIC bus'). I am under the impression that the arbitration ID was only needed in architectures where multiple CPUs shared the same bus. But now, with the…
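The excerpt stops before any answer about how ring and mesh parts pick the BSP, but software can at least observe the outcome: the Intel SDM documents a BSP flag in bit 8 of the IA32_APIC_BASE MSR (address 0x1B), set only on the processor chosen as bootstrap processor. Reading the MSR needs ring 0 (on Linux, for example, through the msr driver's /dev/cpu/N/msr files), so the minimal sketch below only decodes a raw value already obtained; `decode_apic_base` and `struct apic_base_info` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR address of IA32_APIC_BASE (for reference only). */
#define IA32_APIC_BASE_MSR 0x1Bu

/* Hypothetical decoded view of IA32_APIC_BASE. */
struct apic_base_info {
    bool is_bsp;          /* bit 8: set only on the bootstrap processor */
    bool x2apic_enabled;  /* bit 10: x2APIC mode enable (EXTD) */
    bool apic_enabled;    /* bit 11: APIC global enable */
    uint64_t base;        /* bits 12 and up: physical base of the APIC page */
};

static struct apic_base_info decode_apic_base(uint64_t msr_value) {
    struct apic_base_info info;
    info.is_bsp = (msr_value >> 8) & 1u;
    info.x2apic_enabled = (msr_value >> 10) & 1u;
    info.apic_enabled = (msr_value >> 11) & 1u;
    /* Clear the low 12 flag/reserved bits; bits above MAXPHYADDR are
       reserved and assumed to be zero in the input. */
    info.base = msr_value & ~(uint64_t)0xFFF;
    return info;
}
```

A value such as 0xFEE00900 decodes as the BSP with the APIC enabled at the default base 0xFEE00000; the same value with bit 8 clear (0xFEE00800) would come from an application processor.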

Performance difference between two seemingly equivalent assembly codes

Submitted by 倖福魔咒の on 2019-12-23 19:49:41
Question: tl;dr: I have two functionally equivalent pieces of C code that I compile with Clang (the fact that it's C code doesn't matter much; only the assembly is interesting, I think). IACA tells me that one should be faster, but I don't understand why, and my benchmarks show the same performance for both. I have the following C code (ignore #include "iacaMarks.h", IACA_START, and IACA_END for now):
ref.c:
#include "iacaMarks.h"
#include <x86intrin.h>
#define AND(a,b) _mm_and_si128(a,b)
#define OR…

Return statement does not get executed in C

Submitted by 筅森魡賤 on 2019-12-23 18:39:53
Question: I have a curious case and can't quite figure out what I've done wrong. Here's the scenario: I have written a creator function that should return a pointer to a function. To fill the structure with data, I read in a text file. Depending on which text file I use as input, the error either occurs or it doesn't (it occurs for a text file with ~4000 lines but not for one with ~200, if that makes a difference). The strange thing is that the code executes until right before the…
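The excerpt does not show the asker's code, so the cause cannot be confirmed here. One possible explanation for "works with ~200 lines, fails with ~4000" is writing past the end of a fixed-size array: if that array lives on the stack, the overflow can overwrite the saved return address, and the program then fails exactly when the function tries to return. A minimal sketch of a creator function that avoids fixed limits by growing a heap buffer with `realloc`; `struct line_table`, `line_table_create`, and `line_table_free` are hypothetical names, not the asker's API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure filled by the creator function. */
struct line_table {
    char **lines;    /* heap-allocated copies of each line */
    size_t count;    /* number of lines stored */
    size_t capacity; /* allocated slots in lines */
};

static void line_table_free(struct line_table *t) {
    if (t == NULL) return;
    for (size_t i = 0; i < t->count; i++) free(t->lines[i]);
    free(t->lines);
    free(t);
}

/* Reads every line of fp into a heap-allocated table. Storage grows with
   realloc, so the line count is limited only by available memory.
   Returns NULL on allocation failure. */
static struct line_table *line_table_create(FILE *fp) {
    struct line_table *t = calloc(1, sizeof *t);
    if (t == NULL) return NULL;

    char buf[1024]; /* lines longer than this are split across entries */
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (t->count == t->capacity) {
            size_t new_cap = t->capacity ? t->capacity * 2 : 16;
            char **grown = realloc(t->lines, new_cap * sizeof *grown);
            if (grown == NULL) { line_table_free(t); return NULL; }
            t->lines = grown;
            t->capacity = new_cap;
        }
        buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
        size_t len = strlen(buf);
        char *copy = malloc(len + 1);
        if (copy == NULL) { line_table_free(t); return NULL; }
        memcpy(copy, buf, len + 1);
        t->lines[t->count++] = copy;
    }
    return t;
}
```

Running the original program under a tool such as AddressSanitizer or Valgrind is a more direct way to find out whether an out-of-bounds write is really involved.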

Compiler macro to detect BMI2 instruction set

Submitted by 牧云@^-^@ on 2019-12-23 13:13:12
Question: I searched the web for a proper solution without much success, so I hope one of you knows something about it: is there any way to detect the Intel Bit Manipulation Instruction Set 2 (BMI2) at compile time? I want to make some code conditional on its availability.
Answer 1: With GCC you can check for the __BMI2__ macro. This macro is defined if the target supports BMI2 (e.g. -mbmi2, -march=haswell). This is the macro that the intrinsics headers (x86intrin.h,…
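A minimal sketch of the `__BMI2__` check from the answer, used to pick between the PEXT intrinsic and a portable loop. Clang defines the same macro; MSVC is not covered by this check. `_pext_u64` exists only in 64-bit mode, hence the extra `__x86_64__` test; `extract_bits` and `HAVE_BMI2_PEXT` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* __BMI2__ is predefined when the target enables BMI2 (e.g. -mbmi2 or
   -march=haswell). _pext_u64 also requires 64-bit mode. */
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_BMI2_PEXT 1
#else
#define HAVE_BMI2_PEXT 0
#endif

/* Parallel bit extract: gathers the bits of src selected by mask into the
   low bits of the result. */
static uint64_t extract_bits(uint64_t src, uint64_t mask) {
#if HAVE_BMI2_PEXT
    return _pext_u64(src, mask);
#else
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (0 - mask); /* lowest set bit of mask */
        if (src & lowest) result |= out_bit;
        mask &= mask - 1;                    /* clear that bit */
    }
    return result;
#endif
}
```

Compiling once with and once without -mbmi2 exercises both branches; they should return identical results, for example `extract_bits(0xA, 0xC)` gives 2 either way.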

Best way to shuffle 64-bit portions of two __m128i's

Submitted by 我们两清 on 2019-12-23 07:49:54
Question: I have two __m128is, a and b, that I want to shuffle so that the upper 64 bits of a end up in the lower 64 bits of dst and the lower 64 bits of b end up in the upper 64 bits of dst, i.e.
dst[ 0:63] = a[64:127]
dst[64:127] = b[0:63]
This is equivalent to:
__m128i dst = _mm_unpacklo_epi64(_mm_srli_si128(a, 8), b);
or
__m128i dst = _mm_castpd_si128(_mm_shuffle_pd(_mm_castsi128_pd(a), _mm_castsi128_pd(b), 1));
Is there a better way to do this than the first method? The second one is just one instruction, but the…
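Both approaches from the question, written as SSE2 functions with a helper to inspect the result, as a sketch for checking that they really produce the same `dst`. The function names are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Goal: dst[63:0] = a[127:64], dst[127:64] = b[63:0] */

/* Method 1: shift a right by 8 bytes, then interleave the low qwords. */
static __m128i hi_a_lo_b_unpack(__m128i a, __m128i b) {
    return _mm_unpacklo_epi64(_mm_srli_si128(a, 8), b);
}

/* Method 2: one SHUFPD. Immediate bit 0 = 1 takes a's upper element for the
   low half; bit 1 = 0 takes b's lower element for the high half. */
static __m128i hi_a_lo_b_shufpd(__m128i a, __m128i b) {
    return _mm_castpd_si128(
        _mm_shuffle_pd(_mm_castsi128_pd(a), _mm_castsi128_pd(b), 1));
}

/* Stores v so that out[0] is the low qword and out[1] the high qword. */
static void to_u64x2(__m128i v, uint64_t out[2]) {
    _mm_storeu_si128((__m128i *)out, v);
}
```

With SSSE3 available, `_mm_alignr_epi8(b, a, 8)` (PALIGNR) also yields this `dst` in a single integer-domain instruction; on some Intel cores, passing integer data through an FP shuffle such as SHUFPD can add a bypass delay, though whether that is measurable depends on the surrounding code.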

Android Back Button Exiting Apps Instead Of Running Its New Code

Submitted by [亡魂溺海] on 2019-12-23 05:33:09
Question: I am making an app for Android using HTML and JavaScript on Intel-XDK. I'm overriding the Android Back Button and Android Menu Button functions using the following code:
<script src="cordova.js" type="text/javascript"></script>
<script type="text/javascript">
/* Android Back Button ----------------------------------------------- */
function backButtonPressed() {
    isPaused = true; // To Pause
}
document.addEventListener("backbutton", backButtonPressed, false);
/* Android Menu Button ----------------…

How is 'push imm' encoded?

Submitted by ☆樱花仙子☆ on 2019-12-22 13:17:11
Question: The "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2B: Instruction Set Reference, N-Z" says:
| Opcode* | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
| 6A      | PUSH imm8   | C     | Valid       | Valid           | Push imm8.  |
| 68      | PUSH imm16  | C     | Valid       | Valid           | Push imm16. |
| 68      | PUSH imm32  | C     | Valid       | Valid           | Push imm32. |
# cat -n test.asm
     1  bits 64
     2
     3  push byte 12
     4  push word 12
     5  push dword 12
     6  push qword 12
     7
# nasm test.asm
test.asm:5: error:…
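In the table, imm16 and imm32 share opcode 68; the operand-size attribute decides which one applies. In 64-bit mode PUSH defaults to a 64-bit operand size, a 66h prefix selects 16 bits, and a 32-bit push cannot be encoded, which is the likely reason NASM rejects `push dword 12` on line 5. Both 6A ib and 68 id then push 8 bytes after sign-extending the immediate. A minimal sketch of an encoder for the default-size form; `encode_push_imm64` is a hypothetical name and the 66h-prefixed 16-bit form is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes `push imm` in 64-bit mode with the default (64-bit) operand size.
   The CPU sign-extends the immediate to 64 bits and pushes 8 bytes:
     6A ib  when the value fits in int8_t
     68 id  when the value fits in int32_t
   No PUSH form takes a full 64-bit immediate.
   Returns the number of bytes written to out, or 0 if not encodable. */
static size_t encode_push_imm64(int64_t imm, uint8_t out[5]) {
    if (imm >= INT8_MIN && imm <= INT8_MAX) {
        out[0] = 0x6A;
        out[1] = (uint8_t)imm; /* two's-complement low byte */
        return 2;
    }
    if (imm >= INT32_MIN && imm <= INT32_MAX) {
        uint32_t v = (uint32_t)imm;
        out[0] = 0x68;
        for (int i = 0; i < 4; i++) /* imm32 is stored little-endian */
            out[1 + i] = (uint8_t)(v >> (8 * i));
        return 5;
    }
    return 0;
}
```

For example, `push 12` becomes the two bytes 6A 0C, while 0x1000 needs the five-byte form 68 00 10 00 00.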

Is it possible for the RESOURCE_STALLS.RS event to occur even when the RS is not completely full?

Submitted by 若如初见. on 2019-12-22 08:54:00
Question: The description of the RESOURCE_STALLS.RS hardware performance event for Intel Broadwell is the following: This event counts stall cycles caused by absence of eligible entries in the reservation station (RS). This may result from RS overflow, or from RS deallocation because of the RS array Write Port allocation scheme (each RS entry has two write ports instead of four. As a result, empty entries could not be used, although RS is not really full). This counts cycles that the pipeline backend…
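One way to study the event empirically is to count it around a chosen workload. A minimal Linux sketch using `perf_event_open` with a raw event code; the encoding event 0xA2, umask 0x04 for RESOURCE_STALLS.RS is an assumption taken from Intel's Haswell/Broadwell event lists and should be checked for the exact CPU, since the same raw code means something else on other microarchitectures. `count_raw_event`, `raw_event_config`, and `sample_work` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed encoding of RESOURCE_STALLS.RS on Haswell/Broadwell. */
#define RESOURCE_STALLS_RS_EVENT 0xA2
#define RESOURCE_STALLS_RS_UMASK 0x04

/* Raw config for Intel core events: umask in bits 15:8, event in bits 7:0. */
static uint64_t raw_event_config(uint8_t event, uint8_t umask) {
    return ((uint64_t)umask << 8) | event;
}

/* Placeholder workload; replace with the code under investigation. */
static void sample_work(void) {
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < 1000000; i++) sink += i;
}

/* Counts a raw event in user mode for the calling thread while work() runs.
   Returns -1 if the counter cannot be opened or read (no PMU access,
   restrictive perf_event_paranoid, virtualized CPU, and so on). */
static long long count_raw_event(uint64_t config, void (*work)(void)) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof attr;
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t value = 0;
    ssize_t n = read(fd, &value, sizeof value);
    close(fd);
    return n == (ssize_t)sizeof value ? (long long)value : -1;
}
```

Comparing this count with the cycles in which the RS is actually full (for instance, by varying how many independent long-latency uops the workload keeps in flight) is one way to probe the write-port explanation in the event description.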

Xeon CPU (E5-2603) backward memory prefetch

Submitted by 十年热恋 on 2019-12-22 08:37:08
Question: Is backward memory prefetch as fast as forward memory prefetch on a Xeon CPU (E5-2603)? I want to implement an algorithm that requires both a forward loop and a backward loop over the data. Since each iteration requires the result of the previous one, I can't reverse the order of the loops. Thank you.
Answer 1: You can run experiments to determine whether the data prefetchers are able to handle forward sequential accesses and backward sequential accesses. I have a Haswell CPU, so the prefetchers might…
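Following the answer's suggestion to run experiments, a minimal timing sketch that streams over the same array forward and backward. Results depend on the CPU, compiler flags, and array size; the array should be much larger than the last-level cache so that both passes actually stream from memory. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Allocates n elements initialized to 0, 1, 2, ...; caller frees. */
static uint64_t *make_test_array(size_t n) {
    uint64_t *a = malloc(n * sizeof *a);
    if (a != NULL)
        for (size_t i = 0; i < n; i++) a[i] = i;
    return a;
}

/* Sequential passes over the same data in opposite directions. */
static uint64_t sum_forward(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

static uint64_t sum_backward(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = n; i-- > 0;) s += a[i];
    return s;
}

/* Wall-clock seconds for one pass; the sum is written to *sum_out so the
   compiler cannot discard the pass. */
static double time_pass(uint64_t (*pass)(const uint64_t *, size_t),
                        const uint64_t *a, size_t n, uint64_t *sum_out) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sum_out = pass(a, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

For a measurement, something like 8M elements (64 MiB) and several alternating forward/backward runs give a fairer picture than a single pass, because the first run also pays for page faults and the tail of one pass may still be cached when the next one starts.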

BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?

Submitted by 被刻印的时光 ゝ on 2019-12-22 06:55:59
Question: Is there any way to determine, or any resource where I can find, the branch target buffer (BTB) size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors?
Answer 1: Check "Software optimization resources" by Agner Fog, http://www.agner.org/optimize/. The BTB should be covered in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf, section 3.7 "Branch prediction in Intel Sandy Bridge and Ivy…