cpu-architecture | 易学教程

How to determine SSE prefetch instruction size?

Question: I am working with code which contains inline assembly for SSE prefetch instructions. A preprocessor constant determines whether the instructions for 32-, 64- or 128-byte prefetches are used. The application is used on a wide variety of platforms, and so far I have had to investigate in each case which is the best option for the given CPU. I understand that this is the cache line size. Is this information obtainable automatically? It doesn't seem to be explicitly present in /proc/cpuinfo. Answer 1:
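As an illustration only (not taken from the truncated answer), here is a minimal C sketch of one way to query the cache line size at run time on Linux. It assumes glibc's `_SC_LEVEL1_DCACHE_LINESIZE` extension to `sysconf`, and falls back to the sysfs file `coherency_line_size` for CPU 0. The helper names `cache_line_size` and `pick_prefetch_size`, and the default of 64 bytes when nothing can be detected, are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns the L1 data cache line size in bytes,
 * or 0 if it cannot be determined on this system. */
long cache_line_size(void)
{
    long size = 0;

#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; may return 0 or -1 when the value is unknown. */
    size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif

    if (size <= 0) {
        /* Fallback: Linux sysfs entry for the first cache of CPU 0. */
        FILE *f = fopen(
            "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size",
            "r");
        size = 0;
        if (f != NULL) {
            if (fscanf(f, "%ld", &size) != 1 || size < 0)
                size = 0;
            fclose(f);
        }
    }

    assert(size >= 0); /* postcondition: 0 means "unknown" */
    return size;
}

/* Hypothetical helper: maps a detected line size onto the 32-, 64- or
 * 128-byte variants mentioned in the question. The 64-byte default for
 * an unknown size is an assumption of this sketch. */
int pick_prefetch_size(long line_size)
{
    if (line_size <= 0)
        return 64;
    if (line_size <= 32)
        return 32;
    if (line_size <= 64)
        return 64;
    return 128;
}
```

Calling `pick_prefetch_size(cache_line_size())` once at startup would select the variant at run time; because the question uses a preprocessor constant, the same query could instead run in a configure step that emits the constant.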

Why predict a branch, instead of simply executing both in parallel?

Question: I believe that when creating CPUs, branch prediction is a major slowdown when the wrong branch is chosen. So why do CPU designers choose a branch instead of simply executing both branches, then cutting one off once you know for sure which one was chosen? I realize that this could only go 2 or 3 branches deep within a short number of instructions or the number of parallel stages would get ridiculously large, so at some point you would still need some branch prediction since you definitely

How would the MONITOR instruction (_mm_monitor intrinsic) be used by a driver?

Question: I am exploring the usage of the MONITOR instruction (or the equivalent intrinsic, _mm_monitor). Although I found literature describing them, I could not find any concrete examples/samples on how to use it. Can anyone share an example of how this instruction/intrinsic would be used in a driver? Essentially, I would like to use it to watch memory ranges. Answer 1: The monitor instruction arms the address monitoring hardware using the address specified in RAX/EAX/AX. Quote from Intel: The state of the

Sandy-Bridge CPU specification

Question: I was able to put together bits here and there about the Sandy Bridge-E architecture but I am not totally sure about all the parameters, e.g. the size of the L2 cache. Can anyone please confirm they are all correct? My main source was the 64-ia-32-architectures-optimization-manual.pdf Answer 1: On Sandy Bridge, each core has 256KB of L2 (see the datasheet, section 1.1). For 6 cores, that's 1.5MB, but since each core only accesses its own, it's better to always look at it as 256KB per core. Moreover

Is Intel's Last Branch Record feature unique to Intel processors?

Question: Last Branch Record refers to a collection of register pairs (MSRs) that store the source and destination addresses related to recently executed branches. They are supported across Intel Core 2, Intel Xeon and Intel Atom processor families. The document at http://css.csail.mit.edu/6.858/2012/readings/ia32/ia32-3b.pdf has more information in case you are interested. Is an LBR-like feature available only in Intel microprocessors, or does something similar exist in ARM, etc.? Answer 1: To sum up, as Carl mentioned,

According to Intel my cache should be 24-way associative though it's 12-way, how is that?

Question: According to “Intel 64 and IA-32 architectures optimization reference manual,” April 2012, page 2-23: The physical addresses of data kept in the LLC data arrays are distributed among the cache slices by a hash function, such that addresses are uniformly distributed. The data array in a cache block may have 4/8/12/16 ways corresponding to 0.5M/1M/1.5M/2M block size. However, due to the address distribution among the cache blocks from the software point of view, this does not appear as a normal N
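Read arithmetically, the quoted figures scale the way count in step with the block capacity, which implies a constant number of sets per block. The following C sketch checks that reading. It assumes 64-byte cache lines and interprets "0.5M" as 512 KiB; neither is stated in the excerpt, and the helper name `slice_sets` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption of this sketch: 64-byte cache lines. */
enum { LINE_BYTES = 64 };

/* Hypothetical helper: number of sets in one cache block (slice),
 * given its capacity in bytes and its associativity. */
size_t slice_sets(size_t slice_bytes, size_t ways)
{
    assert(ways > 0);
    return slice_bytes / (ways * LINE_BYTES);
}
```

Under these assumptions, each of the four quoted configurations (4, 8, 12 and 16 ways) works out to 2048 sets, so only the associativity changes with the block size.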

While pipelining, can you consecutively write mov to the same register, or does it require 3 NOPs like add does?

This is the correct way to implement mov and add through x86 when incorporating pipelining and the necessary NOPs you need.

    mov $10, eax
    NOP
    NOP
    NOP
    add $2, eax

If I wanted to change eax with mov, could I immediately overwrite it with another mov since you're just overwriting what is already there, or do I need to write 3 NOPs again so it can finish the WMEDF cycle?

    mov $10, eax
    mov $12, eax

or

    mov $10, eax
    NOP
    NOP
    NOP
    mov $12, eax

"This is the correct way to implement mov and add through x86 when incorporating pipelining and the necessary NOPs you need." Totally incorrect for x86. NOP is never

In x86 Intel VT-X non-root mode, can an interrupt be delivered at every instruction boundary?

Question: Other than certain normal specified conditions where interrupts are not delivered to the virtual processor (cli, IF=0, etc.), are all instructions in the guest actually interruptible? That is to say, when an incoming hardware interrupt is given to the LAPIC and then to the processor, supposedly some internal magic happens to translate that to a virtual interrupt to the guest (using virtual APIC, no exiting). When that happens, does the currently executing instruction immediately serialize the OOO

Which architecture to call Non-uniform memory access (NUMA)?

Question: According to wiki: Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. But it is not clear whether this applies to any memory, including caches, or to main memory only. For example, the Xeon Phi processor has the following architecture: Memory access to main memory (GDDR) is the same for all cores. Meanwhile, memory access to L2 cache is different for different cores, since first native L2 cache

Using System.getProperty(“os.arch”) to check if it is armeabi cpu

Question: I'm having the following issue with RenderScript on some old 4.2.2- devices (galaxy s3 mini, galaxy ace 3, galaxy fresh, etc.) - Android - Renderscript Support Library - Error loading RS jni library. I want to implement the suggested solution, but what exactly will be the value returned by System.getProperty("os.arch") for armeabi devices (not armeabi-v7 devices)? Thanks. Answer 1: The method System.getProperty is a generic Java method (see its documentation). On Linux it returns