intel | 易学教程

What is floating point speculation and how does it differ from the compiler's floating point model

Question: The Intel C++ compiler provides two options for controlling floating point: -fp-speculation (fast/safe/strict/off) and -fp-model (precise/fast/strict and source/double/extended). I think I understand what -fp-model does. But what is -fp-speculation, and how does it relate to -fp-model? I have yet to find any Intel doc that explains this!

Answer 1: -fp-model influences how floating-point computations are carried out, and can change the numeric result (by licensing unsafe optimizations or by changing the
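As an illustration of the point in the answer that licensing unsafe optimizations can change the numeric result, here is a minimal sketch. It is not taken from the original answer, and the function names `sum_left` and `sum_right` are hypothetical. It only shows that floating-point addition is not associative, so a compiler that is allowed to reorder a sum may produce a different value than the source order would.

```cpp
// Evaluates (a + b) + c in the order written in the source.
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluates a + (b + c): the kind of reassociated form an optimizer
// could choose when value-unsafe floating-point optimizations are
// permitted. (Illustrative assumption; which transformations a given
// compiler actually applies depends on its flags and version.)
double sum_right(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 1.0, `sum_left` returns 1.0 while `sum_right` returns 0.0, because 1.0 is smaller than the spacing between adjacent doubles near 1e20 and is lost when added to -1e20 first.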

haxm hangs emulator on osx

Question: After installing Intel HAXM on OS X 10.6.8 (Eclipse Juno, ADT 21.1.0), I am not able to run any of the emulators. The emulator process fries my CPU: usage never goes below 100%, yet all I get is a huge black screen in the emulator. The HAXM extension does not throw any errors. At the console, I read:

[2013-04-02 20:09:58 - myapp] Launching a new emulator with Virtual Device 'x86'
[2013-04-02 20:10:03 - Emulator] HAX is working and emulator runs in fast virt mode
[2013-04-02 20:10:07 - myapp]

64 bit Assembly introduction

Question: I am looking for articles that introduce the Intel 64-bit processor and assembly (list of x64 registers, command syntax, etc.) for programmers familiar with 32-bit assembly. A kind of "What's new" for the 64-bit processor.

Answer 1: The Intel 64 and IA-32 Architectures Software Developer's Manuals have everything you need.

Answer 2:
http://www.codeproject.com/KB/vista/vista_x64.aspx
http://msdn.microsoft.com/en-us/library/ms235286%28VS.80%29.aspx
What are the calling conventions for UNIX & Linux system calls

What is the overhead of using Intel Last Branch Record?

Question: Last Branch Record (LBR) refers to a collection of register pairs (MSRs) that store the source and destination addresses of recently executed branches. The document at http://css.csail.mit.edu/6.858/2012/readings/ia32/ia32-3b.pdf has more information in case you are interested.

a) Can someone give an idea of how much LBR slows down the execution of common programs, both CPU-intensive and I/O-intensive?
b) Will branch prediction be turned off when LBR tracing is on?

Answer 1: The paper Intel Code Execution

SIMD instructions lowering CPU frequency

Question: I read this article. It talks about AVX-512 instructions:

Intel’s latest processors have advanced instructions (AVX-512) that may cause the core, or maybe the rest of the CPU to run slower because of how much power they use.

I think Agner's blog also mentioned something similar (but I can't find the exact post). I wonder what other instructions supported by Skylake have a similar effect, in that they lower the power to maximize throughput later? All the v prefixed instructions

What are “non-virtualizable” instructions in x86 architecture?

Question: Before the advent of hardware-assisted virtualization, there were instructions that could not be virtualized for various reasons. Can somebody please explain what those instructions are and why they cannot be virtualized?

Answer 1: To virtualize an ISA, certain requirements must be met. Popek and Goldberg used something like the following: A machine has at least two modes: (a) user mode and (b) system mode. Typically, applications run in user mode and the operating system runs in system mode.

How can I write self-modifying code that runs efficiently on modern x64 processors?

Question: I'm trying to speed up a variable-bitwidth integer compression scheme, and I'm interested in generating and executing assembly code on the fly. Currently a lot of time is spent on mispredicted indirect branches, and generating code based on the series of bitwidths as found seems to be the only way to avoid this penalty. The general technique is referred to as "subroutine threading" (or "call threading", although this has other definitions as well). The goal is to take advantage of the processors

What does Intel mean by “retired”?

Question: The Intel manual mentions a lot of performance events with descriptions like "Mispredicted taken branch instructions retired." What exactly does "retired" mean in this context? Note that I have already looked at Intel's Performance Analysis Guide, which states that "retired" has a very precise meaning (on page 8), referring to the diagram on page 7, but I think I lack the background knowledge to understand exactly what is meant by Retirement / Writeback. What exactly is

Intel SSE and AVX Examples and Tutorials [closed]

Question: Closed. This question is off-topic. It is not currently accepting answers. Closed 4 years ago.

Are there any good C/C++ tutorials or examples for learning Intel SSE and AVX instructions? I found a few on the Microsoft MSDN and Intel sites, but it would be great to understand them from the basics.

Answer 1: For the visually inclined SIMD programmer, Stefano Tommesani's site is the best introduction to x86 SIMD

Ineffective remainder loop in my code

Question: I have this function:

    bool interpolate(const Mat &im, float ofsx, float ofsy, float a11, float a12, float a21, float a22, Mat &res)
    {
        bool ret = false;
        // input size (-1 for the safe bilinear interpolation)
        const int width = im.cols-1;
        const int height = im.rows-1;
        // output size
        const int halfWidth = res.cols >> 1;
        const int halfHeight = res.rows >> 1;
        float *out = res.ptr<float>(0);
        const float *imptr = im.ptr<float>(0);
        for (int j=-halfHeight; j<=halfHeight; ++j)
        {
            const float rx = ofsx +