intel | 易学教程

Are RMW instructions considered harmful on modern x86?

Question: I recall that read-modify-write instructions are generally to be avoided when optimizing x86 for speed. That is, you should avoid something like add [rsi], 10 , which adds to the memory location whose address is stored in rsi . The recommendation was usually to split it into a read-modify instruction followed by a store, so something like: mov rax, 10 add rax, [rsp] mov [rsp], rax Alternatively, you might use explicit loads and stores and a reg-reg add operation: mov rax, [rsp] add rax, 10 mov [rsp], rax Is

How to check with Intel intrinsics if AVX extensions are supported by the CPU?

Question: I'm writing a program using Intel intrinsics. I want to use the _mm_permute_pd intrinsic, which is only available on CPUs with AVX. For CPUs without AVX I can use _mm_shuffle_pd , but according to the specs it is much slower than _mm_permute_pd . Do the header files for Intel intrinsics define constants that allow me to distinguish whether AVX is supported, so that I can write something like this: #ifdef __IS_AVX_SUPPORTED__ // is there something like this defined? // use _mm_permute_pd #else // use _mm
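
One possible compile-time approach, sketched in C. GCC, Clang, and the Intel compiler predefine the macro __AVX__ when AVX code generation is enabled (for example with -mavx), and MSVC does so under /arch:AVX. The helper name swap_pair is hypothetical and exists only for illustration. This tests what the compiler is targeting, not the CPU the binary eventually runs on.

```c
#if defined(__AVX__)
#include <immintrin.h>   /* AVX intrinsics, including _mm_permute_pd */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, including _mm_shuffle_pd */
#endif

/* Hypothetical helper: writes the two doubles of `in` to `out` in swapped order. */
static void swap_pair(const double in[2], double out[2])
{
#if defined(__AVX__)
    /* AVX path: immediate 1 puts the high element in the low slot
       and the low element in the high slot. */
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_permute_pd(v, 1));
#elif defined(__SSE2__)
    /* SSE2 path: the same swap, using the vector as both shuffle inputs. */
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_shuffle_pd(v, v, 1));
#else
    /* Scalar fallback for targets without either extension. */
    double low = in[0];
    out[0] = in[1];
    out[1] = low;
#endif
}
```

Detecting AVX on the machine that actually runs the program needs a runtime check based on CPUID; GCC and Clang expose one as __builtin_cpu_supports("avx").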

Keep target address of load in register until instruction is retired

Question: I want to use Precise Event-Based Sampling (PEBS) to record all the addresses of specific events (cache misses, for example) on a Xeon E5 Sandy Bridge. However, the Performance Analysis Guide for Intel® Core™ i7 Processor and Intel® Xeon™ 5500 processors , p. 24, contains the following warning: As the PEBS mechanism captures the values of the register at completion of the instruction, the dereferenced address for the following type of load instruction (Intel asm convention) cannot be

Intel AVX2 Assembly Development

Question: I am optimizing my video decoder using Intel assembly for the 64-bit architecture. For the optimization I am using the AVX2 instruction set. My development environment: OS: Windows 7 (64-bit); IDE: MSVS 2008 (Professional); CPU: Core i5 (supports up to AVX); Assembler: YASM. I would like to know whether there are any emulators that can run and debug my AVX2 code without upgrading the hardware. Mainly, I am looking to run and debug my application in the existing environment. Any suggestions? Answer 1: You can download the Intel SDE (Software

Why are there three leal instructions for this IA32 assembly code?

Question: I compiled this C function: int calc(int x, int y, int z) { return x + 3*y + 19*z; } I got this in calc.s, which I have annotated with what I think is happening: .file "calc.c" .text .globl calc .type calc, @function calc: pushl %ebp //Save parameters movl %esp, %ebp //Move stack pointer into %ebp movl 12(%ebp), %eax //Move y into %eax movl 16(%ebp), %ecx //Move z into %ecx leal (%eax,%eax,2), %eax //%eax = 3*y addl 8(%ebp), %eax //%eax = x+3y leal (%ecx,%ecx,8), %edx // ? leal (%ecx,%edx,2), %edx // ?
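
The two instructions marked with ? can be traced by hand. In AT&T syntax, leal (base,index,scale), dst computes base + index*scale, so the sequence below mirrors each instruction in C. The function name calc_lea is hypothetical, and the final addition is an assumption, since the listing above is cut off before the function returns.

```c
/* The original function from the question, for comparison. */
static int calc(int x, int y, int z)
{
    return x + 3 * y + 19 * z;
}

/* Hypothetical step-by-step mirror of the compiler's lea sequence. */
static int calc_lea(int x, int y, int z)
{
    int eax = y;
    int ecx = z;
    int edx;

    eax = eax + eax * 2;   /* leal (%eax,%eax,2), %eax : y + 2*y = 3*y       */
    eax = eax + x;         /* addl 8(%ebp), %eax       : x + 3*y             */
    edx = ecx + ecx * 8;   /* leal (%ecx,%ecx,8), %edx : z + 8*z = 9*z       */
    edx = ecx + edx * 2;   /* leal (%ecx,%edx,2), %edx : z + 2*(9*z) = 19*z  */

    /* Assumed final step (not visible in the truncated listing). */
    return eax + edx;
}
```

The scale factor in an address can only be 1, 2, 4, or 8, which is why a multiplication by 19 takes two leal steps (9*z, then z + 2*9*z) rather than one.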

Does using an Intel register for its “intended purpose” increase efficiency?

Question: This article claims that each register has an intended purpose and, more importantly: When the engineers at Intel designed the original 8086 processor, they had a special purpose in mind for each register. As they designed the instruction set, they created many optimizations and special instructions based on the function they expected each register to perform. Using registers according to Intel's original plan allows the code to take full advantage of these optimizations. Unfortunately, this

Is an Intel-based graphics card compatible with tensorflow/GPU?

Question: Is this graphics card compatible with tensorflow/GPU? *-display description: VGA compatible controller product: Haswell-ULT Integrated Graphics Controller vendor: Intel Corporation physical id: 2 bus info: pci@0000:00:02.0 version: 09 width: 64 bits clock: 33MHz capabilities: msi pm vga_controller bus_master cap_list rom configuration: driver=i915 latency=0 resources: irq:44 memory:c2000000-c23fffff memory:b0000000-bfffffff ioport:7000(size=64) Answer 1: At the moment, no. Only Nvidia GPUs and

Remotely Verifying the Application in execution

Question: Is it possible to prove to a remote party, using DRTM or SRTM, that the application running on my system is the one I claim to be running? If so, how? Answer 1: Theoretically, yes. The concept is called remote attestation. The basic idea is: first, you have a sound chain of trust built on your platform, like: BIOS ==> Boot loader ==> OS ==> Applications The resulting measurements are stored in the PCRs. Now you can let the TPM sign this set of PCRs; this is called a quote . You

intel pin RTN_InsertCall multiple function arguments

Question: I'm trying to obtain the values of the arguments to a function using Intel Pin. Single-argument functions are simple enough using the example ManualExamples/malloctrace.cpp . However, when I try to get the argument values of a function with multiple arguments, I run into trouble. E.g., trying to capture the argument values of the following function: void funcA(int a, int b, int c) { printf("Actual: %i %i %i\n", a,b,c); } with the following Pin code: VOID funcHandler(CHAR* name, int a, int b, int c) { printf(

Half-precision floating-point arithmetic on Intel chips

Question: Is it possible to perform half-precision floating-point arithmetic on Intel chips? I know how to load, store, and convert half-precision floating-point numbers [1], but I do not know how to add or multiply them without converting to single-precision floating-point numbers. [1] https://software.intel.com/en-us/articles/performance-benefits-of-half-precision-floats Answer 1: Is it possible to perform half-precision floating-point arithmetic on Intel chips? Yes, apparently the on-chip GPU in Skylake and later