intel | 易学教程

Understanding %rip register in intel assembly

Question: Concerning the following small piece of code, which was shown in another post about structure size and the ways to align data correctly:

    struct {
        char Data1;
        short Data2;
        int Data3;
        char Data4;
    } x;

    unsigned fun ( void )
    {
        x.Data1=1;
        x.Data2=2;
        x.Data3=3;
        x.Data4=4;
        return(sizeof(x));
    }

I get the corresponding disassembly (64-bit):

    0000000000000000 <fun>:
       0: 55                      push %rbp
       1: 48 89 e5                mov %rsp,%rbp
       4: c6 05 00 00 00 00 01    movb $0x1,0x0(%rip) # b <fun+0xb>
       b: 66 c7 05 00

Why does Hyper-threading get reported as supported on processors without it?

Question: I'm trying to gather system information and noticed the following on an Intel Xeon E5420: after executing CPUID(EAX=1), EDX[28] is set, indicating Hyper-threading support, even though the processor is listed on the Intel website (ark.intel.com) as not supporting Hyper-threading. Does anyone have an explanation for this?

Answer 1: Here's the definition of that bit according to the Intel Developer's Manual: Max APIC IDs reserved field is Valid. A value of 0 for HTT indicates there is only
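
As a rough illustration of the bit being discussed, the sketch below decodes the two CPUID(EAX=1) fields involved: the HTT flag in EDX[28] and the "maximum addressable logical processor IDs" field in EBX[23:16]. The function names are hypothetical, and the register values are passed in as plain integers (assumed to have been obtained elsewhere) so the code does not depend on any particular compiler intrinsic.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if EDX[28] (the HTT flag) is set in the
 * EDX value returned by CPUID with EAX=1, otherwise 0. */
int cpuid1_htt_flag(uint32_t edx)
{
    return (int)((edx >> 28) & 1u);
}

/* Hypothetical helper: extracts EBX[23:16] from the EBX value returned by
 * CPUID with EAX=1. This field is only meaningful when the HTT flag is set,
 * so 0 is returned when it is clear. */
unsigned cpuid1_max_logical_ids(uint32_t ebx, uint32_t edx)
{
    if (!cpuid1_htt_flag(edx))
        return 0u;
    return (unsigned)((ebx >> 16) & 0xFFu);
}
```

A set HTT flag therefore says that the ID-count field is valid, not that two hardware threads share a core; a multi-core package without Hyper-threading can report it as well.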

How might I convert Intel 80386 Machine Code to Assembly Language?

Question: I've been given the following task:

Consider the following sequence of hexadecimal values:

    55 89 E5 83 EC 08 83 E4 F0 31 C9 BA 01 00 00 00 B8 0D 00 00 00 01 D1 01 CA 48 79 F9 31 C0 C9 C3

This sequence of bytes represents a subroutine in Intel 80386 machine language in 32-bit mode. When the instructions in this subroutine are executed, they leave values in the registers %ecx and %edx. What are the values? What is the program in C that carries out the computation done by this subroutine, then
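
One possible hand decoding of those bytes, offered as a sketch rather than as the answer from the original thread, is shown in the comments below together with a C function that mirrors the loop. The type `regs_t` and the function `run_subroutine` are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Hand decoding of the byte sequence (AT&T syntax), an assumption of this
 * sketch:
 *   55              push %ebp
 *   89 E5           mov  %esp,%ebp
 *   83 EC 08        sub  $8,%esp
 *   83 E4 F0        and  $-16,%esp
 *   31 C9           xor  %ecx,%ecx        ; ecx = 0
 *   BA 01 00 00 00  mov  $1,%edx          ; edx = 1
 *   B8 0D 00 00 00  mov  $13,%eax         ; eax = 13
 *   01 D1           add  %edx,%ecx        ; loop: ecx += edx
 *   01 CA           add  %ecx,%edx        ;       edx += ecx
 *   48              dec  %eax
 *   79 F9           jns  loop             ; repeat while eax >= 0
 *   31 C0           xor  %eax,%eax        ; return value 0
 *   C9              leave
 *   C3              ret
 */
typedef struct {
    int32_t ecx;
    int32_t edx;
} regs_t;

/* Mirrors the loop above and returns the final %ecx and %edx values. */
regs_t run_subroutine(void)
{
    int32_t ecx = 0, edx = 1, eax = 13;
    do {
        ecx += edx;
        edx += ecx;
        eax--;
    } while (eax >= 0);   /* jns: loop while the decremented value is non-negative */
    regs_t r = { ecx, edx };
    return r;
}
```

Under this decoding the loop body runs 14 times and steps through the Fibonacci sequence two terms at a time, leaving 317811 in %ecx and 514229 in %edx.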

How much should I worry about the Intel C++ compiler emitting suboptimal code for AMD?

Question: We've always been an Intel shop. All the developers use Intel machines, the recommended platform for end users is Intel, and if end users want to run on AMD it's their lookout. Maybe the test department had an AMD machine somewhere to check we didn't ship anything completely broken, but that was about it. Up until a few years ago we just used the MSVC compiler, and since it doesn't really offer a lot of processor tuning options beyond SSE level, no one worried too much about whether the code

Can one construct a “good” hash function using CRC32C as a base?

Question: Given that SSE 4.2 (Intel Core i7 & i5 parts) includes a CRC32 instruction, it seems reasonable to investigate whether one could build a faster general-purpose hash function. According to this, only 16 bits of a CRC32 are evenly distributed. So what other transformation would one apply to overcome that?

Update: How about this? Only 16 bits are suitable for a hash value. Fine. If your table is 65535 or less then great. If not, run the CRC value through the Nehalem POPCNT (population count)
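
For experimenting with the idea without SSE 4.2 hardware, here is a minimal bitwise software CRC32C sketch. It uses the Castagnoli polynomial in reflected form (0x82F63B78), which is the polynomial the SSE 4.2 instruction implements, with the conventional initial value and final inversion. The names `crc32c_sw` and `fold16` are hypothetical, and `fold16` is just one simple way to reduce the value to 16 bits, not a transformation proposed in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise (slow but portable) CRC32C over a byte buffer.
 * Conventional parameters: init 0xFFFFFFFF, reflected polynomial
 * 0x82F63B78, final XOR 0xFFFFFFFF. */
uint32_t crc32c_sw(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical reduction to a 16-bit table index: XOR the two halves.
 * Whether this distributes well is exactly what would need measuring. */
uint16_t fold16(uint32_t crc)
{
    return (uint16_t)((crc >> 16) ^ (crc & 0xFFFFu));
}
```

Calling `crc32c_sw("123456789", 9)` yields the standard CRC32C check value 0xE3069283, which is a quick way to confirm the implementation before measuring how evenly its bits spread across table buckets.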

How are denormalized floats handled in C#?

Question: I just read this fascinating article about the 20x-200x slowdowns you can get on Intel CPUs with denormalized floats (floating-point numbers very close to 0). There is an option with SSE to flush these to 0, restoring performance when such floating-point values are encountered. How do C# apps handle this? Is there an option to enable/disable _MM_FLUSH_ZERO?

Answer 1: There is no such option. The FPU control word in a C# app is initialized by the CLR at startup. Changing it is not an option
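
To make "denormalized" concrete, the following C sketch (C rather than C#, and assuming default floating-point settings with no flush-to-zero mode or fast-math compiler flags) classifies a value as subnormal using the standard `fpclassify` macro. The function name `is_subnormal` is hypothetical.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if v is a subnormal (denormalized) float: non-zero, but smaller
 * in magnitude than FLT_MIN, the smallest normalized float. */
int is_subnormal(float v)
{
    return fpclassify(v) == FP_SUBNORMAL;
}
```

For example, `is_subnormal(FLT_MIN / 2.0f)` is true under IEEE 754 arithmetic, while `is_subnormal(1.0f)` and `is_subnormal(0.0f)` are false. Values in that range are the ones that can trigger the slow path the question describes.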

Is the Intel Xeon Phi usable without a costly Intel Compiler?

Question: Does the Intel Xeon Phi coprocessor, to be usable as a parallel platform, require a license for the Intel Composer XE compiler, or are there alternative compilers?

Answer 1: There are a few options I can list here for getting and using the Intel compiler. gcc, as you know, is not equipped to vectorize code for this platform. There is a non-commercial license of the Intel compiler for Linux* that provides the same Intel Xeon Phi coprocessor enabled Intel Development tools as a commercial/eval/academic license

C code loop performance [continued]

Question: This question continues my earlier question here (on the advice of Mystical): C code loop performance

Continuing from that question, when I use packed instructions instead of scalar instructions, the code using intrinsics would look very similar:

    for(int i=0; i<size; i+=16) {
        y1 = _mm_load_ps(output[i]);
        …
        y4 = _mm_load_ps(output[i+12]);
        for(k=0; k<ksize; k++){
            for(l=0; l<ksize; l++){
                w = _mm_set_ps1(weight[i+k+l]);
                x1 = _mm_load_ps(input[i+k+l]);
                y1 = _mm_add_ps(y1,_mm_mul_ps(w,x1));
                …
                x4 = _mm_load

Loop in Intel x86 Assembly going on forever

Question: I'm currently learning Intel x86 Assembly, and I've run into a problem while trying to construct a simple loop that runs 10 times. It's supposed to stop after 10 iterations, but it keeps going forever. This is the code that I am using:

    section .data
        msg db "Hello, World!", 0x0a
        len equ $-msg

    section .text
        global _start

    _start:
        mov cx, 10          ; loop counter
    _loop_start:
        mov ebx, 0x01
        mov ecx, msg
        mov edx, len
        mov eax, 0x04
        int 0x80
        dec cx
        cmp cx, 0
        jge _loop_start
    _done:
        mov ebx, 0x00