cpu | 易学教程

Why is a conditional move not vulnerable for Branch Prediction Failure?

After reading this post (an answer on StackOverflow, in its optimization section), I was wondering why conditional moves are not vulnerable to branch prediction failure. I found an article on conditional moves here (PDF by AMD). It also claims a performance advantage for conditional moves. But why is this? I don't see it. At the moment that assembly instruction is evaluated, the result of the preceding CMP instruction is not known yet. Thanks.
Mis-predicted branches are expensive
A modern processor generally executes between one and three instructions each cycle if things go well (if it does

Associativity gives us parallelizability. But what does commutativity give?

Question: Alexander Stepanov notes in one of his brilliant lectures at A9 (highly recommended, by the way) that the associative property gives us parallelizability, an extremely useful and important trait these days that compilers, CPUs and programmers themselves can leverage:
    // expressions in parentheses can be done in parallel
    // because matrix multiplication is associative
    Matrix X = (A * B) * (C * D);
But what, if anything, does the commutative property give us? Reordering? Out of order
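The distinction the question draws can be illustrated with a small sketch. It assumes string concatenation as a stand-in for matrix multiplication (both are associative but not commutative); `tree_reduce` is a hypothetical helper, not something from the lecture. Regrouping leaves the result unchanged, while reordering the operands is only safe when the operation is also commutative.

```python
from functools import reduce
import operator


def tree_reduce(op, items):
    """Combine items pairwise, level by level; only valid if op is associative."""
    items = list(items)
    if not items:
        raise ValueError("tree_reduce needs at least one item")
    while len(items) > 1:
        # The pairs on one level are independent of each other, so a parallel
        # runtime could evaluate them at the same time; here they run in sequence.
        paired = [op(items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])  # odd element carries over, order preserved
        items = paired
    return items[0]


words = ["a", "b", "c", "d", "e"]
left_fold = reduce(operator.add, words)                       # ((((a+b)+c)+d)+e)
regrouped = tree_reduce(operator.add, words)                  # same order, new grouping
reordered = tree_reduce(operator.add, list(reversed(words)))  # operand order changed

if __name__ == "__main__":
    print(left_fold, regrouped, reordered)
```

With integer addition, which is both associative and commutative, the reversed input would give the same sum, so a scheduler would be free to combine operands in whatever order they become available.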

How does the CPU do subtraction?

Question: I have some basic doubts, and every time I sit down to try my hand at interview questions, these doubts come up again. Say A = 5, B = -2. Assuming that A and B are 4 bytes each, how does the CPU do the A + B addition? I understand that A will have its sign bit (MSB) set to 0 to signify a positive value, and B will have its sign bit set to 1 to signify a negative integer. Now when, in a C++ program, I want to print A + B, does the addition module of the ALU (Arithmetic Logic Unit) first check for sign bit and
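As a rough illustration of the A = 5, B = -2 case, the sketch below models 32-bit two's-complement arithmetic, which is the representation mainstream CPUs use. The helper names are made up for this example. The point is that the adder treats both operands as plain bit patterns and never inspects the sign bit; subtraction reuses the same adder by negating the second operand.

```python
MASK32 = 0xFFFFFFFF


def to_bits32(value):
    """Return the 32-bit two's-complement bit pattern of a Python int."""
    return value & MASK32


def from_bits32(bits):
    """Interpret a 32-bit pattern as a signed integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def add32(a_bits, b_bits):
    """Unsigned 32-bit addition; the carry out of bit 31 is discarded."""
    return (a_bits + b_bits) & MASK32


def sub32(a_bits, b_bits):
    """A - B computed as A + (~B + 1), reusing the same adder."""
    return add32(a_bits, add32(~b_bits & MASK32, 1))


a = to_bits32(5)     # 0x00000005
b = to_bits32(-2)    # 0xFFFFFFFE
total = add32(a, b)  # 0x00000003 once the carry out is dropped

if __name__ == "__main__":
    print(hex(a), hex(b), hex(total), from_bits32(total))
```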

Accessing CPU temperature in python

Question: I need example code for accessing the CPU temperature in Python. I'm running Windows 7, BTW.
Answer 1: Use the WMI module + Open Hardware Monitor + its WMI interface described here. Sample code:
    import wmi

    # Raw string, so the backslash in the namespace is taken literally.
    w = wmi.WMI(namespace=r"root\OpenHardwareMonitor")
    temperature_infos = w.Sensor()
    for sensor in temperature_infos:
        # Open Hardware Monitor exposes many sensor types; keep temperatures only.
        if sensor.SensorType == u'Temperature':
            print(sensor.Name)
            print(sensor.Value)
Answer 2: Download http://openhardwaremonitor.org/downloads/ and http://www.cputhermometer.com/ and extract

Why is a CPU branch instruction slow?

Question: Since I started programming, I have read everywhere that wasteful branches should be avoided at all costs. That's fine, although none of the articles explained why I should do this. What exactly happens when the CPU decodes a branch instruction and decides to do a jump? And what is the "thing" that makes it slower than other instructions (like addition)? Answer 1: A branch instruction is not inherently slower than any other instruction. However, the reason you heard that branches should be avoided is because

User CPU time vs System CPU time?

Question: Could you explain more about "user CPU time" and "system CPU time"? I have read a lot, but I couldn't understand it well. Answer 1: The difference is whether the time is spent in user space or kernel space. User CPU time is time spent on the processor running your program's code (or code in libraries); system CPU time is the time spent running code in the operating system kernel on behalf of your program. Answer 2: The term 'user CPU time' can be a bit misleading at first. To be clear, the total time
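One way to see the two counters side by side is Python's standard `os.times()`, which reports the user and system CPU time consumed by the current process. The sketch below is illustrative only: `cpu_times_of`, `compute_heavy` and `syscall_heavy` are hypothetical helpers, and the exact split depends on the operating system and the clock resolution.

```python
import os


def cpu_times_of(work):
    """Run work() and return the (user_seconds, system_seconds) it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def compute_heavy():
    # Pure arithmetic inside our own process: expected to count mostly as user time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def syscall_heavy():
    # Many small requests to the operating system: expected to add system time,
    # since the kernel does work on the program's behalf for each call.
    for _ in range(20_000):
        os.urandom(16)


if __name__ == "__main__":
    print("compute-heavy (user, system):", cpu_times_of(compute_heavy))
    print("syscall-heavy (user, system):", cpu_times_of(syscall_heavy))
```

On a typical system the first line should be dominated by user time, while the second should show a noticeably larger system share, though the numbers vary from run to run.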

CPU Privilege Rings: Why rings 1 and 2 aren't used?

Question: A couple of questions regarding the x86 CPU privilege rings: Why aren't rings 1 and 2 used by most operating systems? Is it just to maintain code compatibility with other architectures, or is there a better reason? Are there any operating systems which actually use those rings? Or are they completely unused? Answer 1: As a hobbyist operating system writer, I found that because paging (a major part of the modern protection model) only has a concept of privileged (ring 0,1,2) and unprivileged, the

Is bit shifting O(1) or O(n)?

Question: Are shift operations O(1) or O(n)? Does it make sense that computers generally require more operations to shift 31 places than to shift 1 place? Or does it make sense that the number of operations required for shifting is constant regardless of how many places we need to shift? PS: wondering if hardware is an appropriate tag. Answer 1: Some instruction sets are limited to one bit shift per instruction. And some instruction sets allow you to specify any number of bits to shift in one instruction
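The two cases in the answer can be contrasted with a toy model. The sketch below is a software analogy under stated assumptions, not a description of any particular CPU: the first function shifts one bit per step, so its step count grows with the shift amount, while the second mimics a barrel shifter, where a 32-bit value passes through a fixed five stages (shift by 1, 2, 4, 8, 16) whatever the amount. Function names are made up for illustration, and shift amounts are assumed to lie in 0..31.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def shift_left_iterative(value, amount):
    """One-bit-per-step shifter: the step count grows with the shift amount."""
    steps = 0
    for _ in range(amount):
        value = (value << 1) & MASK
        steps += 1
    return value, steps


def shift_left_barrel(value, amount):
    """Barrel-shifter model: one stage per bit of the shift amount."""
    stages = 0
    for bit in range(WIDTH.bit_length() - 1):  # stages for 1, 2, 4, 8, 16
        if amount & (1 << bit):
            value = (value << (1 << bit)) & MASK
        stages += 1  # every stage is traversed, whether or not it shifts
    return value, stages


if __name__ == "__main__":
    for amount in (1, 31):
        print(amount, shift_left_iterative(1, amount), shift_left_barrel(1, amount))
```

In this model, shifting by 31 costs 31 steps in the iterative version but the same five stages as shifting by 1 in the barrel version, which is the sense in which a fixed-width shift can be treated as constant time.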

Return address prediction stack buffer vs stack-stored return address?

Question: I have been reading Agner Fog's "The microarchitecture of Intel, AMD and VIA CPUs", and on page 34 he describes "return address prediction": http://www.agner.org/optimize/microarchitecture.pdf
3.15 Returns (all processors except P1)
A better method is used for returns. A Last-In-First-Out buffer, called the return stack buffer, remembers the return address every time a call instruction is executed, and it uses this for predicting where the corresponding return will go. This mechanism makes sure
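The relationship between the two stacks in the question can be sketched with a toy model. Everything here is an illustrative assumption: the class and function names, the buffer depth, the addresses, and the choice to drop the oldest entry on overflow (real processors differ in how they handle overflow). The return address stored on the memory stack is always the one actually used; the return stack buffer only supplies an early guess.

```python
class ReturnStackBuffer:
    """Toy model of a fixed-depth LIFO return-address predictor."""

    def __init__(self, depth=16):  # depth is an illustrative assumption
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # assumed overflow policy: oldest entry is lost
        self.entries.append(return_address)

    def predict_return(self):
        # An empty buffer has no guess to offer.
        return self.entries.pop() if self.entries else None


def count_correct_predictions(call_depth, rsb_depth):
    """Nest call_depth calls, then return from all; count correct predictions."""
    rsb = ReturnStackBuffer(rsb_depth)
    memory_stack = []  # the architectural stack holding the real return addresses
    for level in range(call_depth):
        return_address = 0x1000 + 4 * level  # made-up addresses
        memory_stack.append(return_address)
        rsb.on_call(return_address)
    correct = 0
    while memory_stack:
        actual = memory_stack.pop()  # the address the return really goes to
        if rsb.predict_return() == actual:
            correct += 1
    return correct


if __name__ == "__main__":
    print(count_correct_predictions(call_depth=4, rsb_depth=16))
    print(count_correct_predictions(call_depth=20, rsb_depth=4))
```

In this model, calls nested no deeper than the buffer are all predicted correctly, while deeper nesting loses the oldest entries and the corresponding returns are mispredicted.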

How can I do a CPU cache flush in x86 Windows?

I am interested in forcing a CPU cache flush in Windows (for benchmarking reasons, I want to emulate starting with no data in the CPU cache), preferably with a basic C implementation or Win32 call. Is there a known way to do this with a system call, or even something as sneaky as doing, say, a large memcpy? Intel i686 platform (P4 and up is okay as well).
Gunther Piez: Fortunately, there is more than one way to explicitly flush the caches. The instruction "wbinvd" writes back modified cache content and marks the caches empty. It executes a bus cycle to make external caches flush their data. Unfortunately,