cpu-architecture

difference between speculation and prediction

倾然丶 夕夏残阳落幕 submitted on 2019-12-02 23:10:58
In computer architecture, what is the difference between (branch) prediction and speculation? They seem very similar, but I think there is a subtle distinction between them. Guffa Branch prediction is done by the processor to try to determine where execution will continue after a conditional jump, so that it can read the next instruction(s) from memory. Speculative execution goes one step further and determines what the result would be from executing the next instruction(s). If the branch prediction was correct, the result is used; otherwise it is discarded. Note that speculative execution …
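The distinction can be sketched in code. In this hypothetical illustration, the `if` in `max_branchy` is the point where a predictor guesses a fetch direction, and where a speculative core additionally executes the guessed path and discards its results on a mispredict; the branchless variant is a form compilers often turn into a conditional move, which needs neither.

```c
/* Branchy version: the predictor guesses the direction of the `if`
 * so the front end can keep fetching; a speculative core also runs
 * the guessed side and throws the results away if it guessed wrong. */
int max_branchy(int a, int b) {
    if (a > b)      /* prediction picks a fetch direction here        */
        return a;   /* speculation may compute this result before the */
    return b;       /* compare actually resolves                      */
}

/* Branchless alternative: compilers often emit a conditional move
 * (e.g. CMOV on x86), so no direction needs to be guessed at all. */
int max_branchless(int a, int b) {
    return a > b ? a : b;
}
```

Both functions return the same value; only the branchy one gives the predictor and the speculation machinery anything to do.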

Do sse instructions consume more power/energy?

故事扮演 submitted on 2019-12-02 22:20:19
Very simple question, probably a difficult answer: does using SSE instructions, for example for parallel sum/min/max/average operations, consume more power than other instructions (e.g. a single sum)? For example, on Wikipedia I couldn't find any information in this respect. The only hint of an answer I could find is here, but it's a little bit generic and there is no reference to any published material. Mysticial I actually did a study on this a few years ago. The answer depends on what exactly your question is: in today's processors, power consumption is not much …
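A minimal sketch of the comparison the question is about, assuming an x86 target with SSE: one `_mm_add_ps` processes four floats per instruction versus one for the scalar loop. Retiring fewer instructions per element is the usual reason SIMD often costs less *energy* per unit of work even when instantaneous power draw is higher.

```c
#include <immintrin.h>  /* SSE intrinsics; x86/x86-64 only */
#include <stddef.h>

/* Scalar baseline: one addition per element. */
float sum_scalar(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* SSE version: four additions per instruction. */
float sum_sse(const float *a, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)                    /* 4 lanes per add */
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)                            /* leftover tail   */
        s += a[i];
    return s;
}
```

Both return the same sum for these inputs; the question is which one does it in fewer instructions, and hence at what energy cost.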

Intel CPUs Instruction Queue provides static branch prediction?

空扰寡人 submitted on 2019-12-02 22:16:45
Volume 3 of the Intel manuals contains the description of a hardware event counter: BACLEAR_FORCE_IQ Counts the number of times a BACLEAR was forced by the Instruction Queue. The IQ is also responsible for providing conditional branch prediction direction based on a static scheme and dynamic data provided by the L2 Branch Prediction Unit. If the conditional branch target is not found in the Target Array and the IQ predicts that the branch is taken, then the IQ will force the Branch Address Calculator to issue a BACLEAR. Each BACLEAR asserted by the BAC generates approximately an 8-cycle …
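A common static scheme (hedged: the exact heuristic varies by microarchitecture) predicts forward conditional branches not-taken and backward branches taken. GCC and Clang's `__builtin_expect` lets source code cooperate with that layout by pushing the rare path onto the forward, statically-not-taken branch, as in this sketch:

```c
/* Assumed static heuristic: forward branch -> predicted not-taken,
 * backward branch (loop) -> predicted taken. __builtin_expect hints
 * the compiler to lay out the rare case as the forward branch.      */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* The error check is the cold, forward, statically-not-taken path. */
int checked_div(int a, int b, int *out) {
    if (unlikely(b == 0))   /* rare: division by zero */
        return -1;
    *out = a / b;
    return 0;
}
```

This affects code layout and the static fallback only; once the dynamic predictor has history for the branch, the hint no longer matters.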

Difference between armeabi and armeabi-v7a

白昼怎懂夜的黑 submitted on 2019-12-02 22:15:07
As far as I can tell from the docs, the difference between the two supported flavors of ARM architecture in the Android NDK is only in the set of supported CPU instructions. Is that really so? Is there no difference in calling conventions, or system call sequence, or something else? I'm wondering what will happen if I compile a module to an ARM object file (with a compiler other than the NDK, Free Pascal specifically), specifying ARMv6 as the architecture, and then link it into both armeabi and armeabi-v7a shared libraries. The FPC bits are not supposed to perform either system calls or Java calls, …
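For reference, the legacy ndk-build system selects the ABIs via `APP_ABI` in `Application.mk`; a fragment like the one below (a sketch, not taken from the question) would build one module for both flavors so the linking experiment above could be tried. Note that `armeabi` was removed in NDK r17; current NDKs support `armeabi-v7a` and `arm64-v8a` instead.

```make
# Application.mk (legacy ndk-build): build the same module for both
# ARM ABIs discussed above. "armeabi" only exists in NDK r16 and older.
APP_ABI := armeabi armeabi-v7a
```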

How prevalent is branch prediction on current CPUs?

大城市里の小女人 submitted on 2019-12-02 22:04:33
Due to its huge impact on performance, I never wonder whether my current-day desktop CPU has branch prediction; of course it does. But what about the various ARM offerings? Do iPhone or Android phones have branch prediction? The older Nintendo DS? What about the PowerPC-based Wii? The PS3? Whether they have a complex prediction unit is not so important; what matters is whether they have at least some dynamic prediction, and whether they do some execution of instructions following an expected branch. What is the cutoff for CPUs with branch prediction? A handheld calculator from decades ago obviously doesn't have one, …
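A classic way to observe whether a CPU has dynamic branch prediction is the sorted-vs-unsorted filter loop: the branch below is almost perfectly predictable when the data is sorted (long runs of taken, then not-taken) and roughly 50/50 on shuffled data. On a core with a dynamic predictor the sorted case runs much faster; on a core without one, the two cases take the same time. The returned sum is identical either way.

```c
#include <stddef.h>

/* Sums only the bytes >= 128. The `if` is trivially predictable on
 * sorted input and essentially random on shuffled input, which is
 * what makes this loop a branch-prediction litmus test.            */
long sum_ge_128(const unsigned char *data, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        if (data[i] >= 128)   /* the branch under test */
            sum += data[i];
    return sum;
}
```

Timing this over a large sorted array versus the same array shuffled is the usual experiment; the result never changes, only the mispredict rate does.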

How many bits is a WORD and is that constant over different architectures?

有些话、适合烂在心里 submitted on 2019-12-02 20:54:37
Is a machine WORD always the same size, or does it depend on the machine architecture? And is the meaning of the word WORD context-sensitive or generally applicable? The machine word size depends on the architecture, but also on how the operating system runs the application. On Windows x64, for example, an application can run either as a 64-bit application (having a 64-bit machine word) or as a 32-bit application (having a 32-bit machine word). So the size of a machine word can differ even on the same machine. The term WORD has different meanings depending on how it's used. It can either mean a …
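One way to make the Windows x64 point concrete: if we take the machine word to be the pointer/register width of the compilation target (one common reading, not the only one), the same source reports 32 when built as a 32-bit binary and 64 when built as 64-bit.

```c
#include <stddef.h>

/* Treats "machine word" as the target's pointer width, so the answer
 * depends on how the binary is built, not just on the hardware.     */
size_t machine_word_bits(void) {
    return sizeof(void *) * 8;
}
```

Compiling the same file with and without a 32-bit flag (e.g. `-m32` on GCC, an assumption about the toolchain) demonstrates the "different word size on the same machine" effect described above.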

Why does instruction/data alignment exist?

无人久伴 submitted on 2019-12-02 19:16:58
Question: I frequently see information that, architecturally, instructions and data must be aligned to word, half-word, etc. boundaries. While it is not difficult to follow these rules, I am just wondering if someone can tell me why (generally) this requirement exists? EDIT: basically found my answer here: why is data structure alignment important for performance? Answer 1: This is due to the design of an efficient data bus. With aligned memory addresses, the data fits onto the data bus efficiently. For a …
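The bus-efficiency argument is visible in how compilers lay out structs: padding is inserted so each member sits at an address its loads can fetch in one aligned access. A small sketch (exact offsets are ABI-dependent; on typical 32/64-bit ABIs `i` lands at offset 4, not 1):

```c
#include <stddef.h>

/* The compiler usually inserts 3 bytes of padding after `c` so that
 * `i` is naturally aligned and can be loaded in a single bus access. */
struct Sample {
    char c;   /* 1 byte, then (typically) 3 bytes of padding */
    int  i;   /* wants 4-byte alignment                      */
};
```

Printing `offsetof(struct Sample, i)` and `sizeof(struct Sample)` on any given target shows exactly how much padding that target's ABI pays for alignment.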

How many and what size cycles are needed for a longword transfer to the CPU?

一世执手 submitted on 2019-12-02 19:09:37
Question: The task concerns the ColdFire MCF5271 processor: I don't understand how many and what size cycles are needed to perform a longword transfer to the CPU, or word transfers. I'm reading the chart and I don't see what the connection is. Any comments are very much appreciated. I've attached 2 examples with the answers. DATA BUS SIZE Answer 1: The MCF5271 manual discusses the external interface of the processor in Chapter 17. The processor implements a byte-addressable address space with a 32-bit …
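The chart arithmetic reduces to a division, sketched below under the simplifying assumption of aligned transfers (misaligned transfers on real hardware can need extra cycles): a transfer of `size_bytes` over a port `port_bytes` wide takes ceil(size/port) bus cycles, so a longword (4 bytes) over a 16-bit (2-byte) port needs two word cycles, and only one cycle over a full 32-bit port.

```c
/* Cycles needed to move `size_bytes` over a bus port `port_bytes`
 * wide, assuming aligned accesses: ceiling division.
 * longword (4 B) over 16-bit port -> 2 cycles
 * longword (4 B) over 32-bit port -> 1 cycle
 * word     (2 B) over 16-bit port -> 1 cycle                       */
unsigned bus_cycles(unsigned size_bytes, unsigned port_bytes) {
    return (size_bytes + port_bytes - 1) / port_bytes;
}
```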

What exactly is a dual-issue processor?

三世轮回 submitted on 2019-12-02 17:30:40
I came across several references to the concept of a dual-issue processor (I hope this even makes sense in a sentence). I can't find any explanation of what exactly dual issue is. Google gives me links to microcontroller specifications, but the concept isn't explained anywhere. Here's an example of such a reference. Am I looking in the wrong place? A brief paragraph on what it is would be very helpful. Dual issue means that each clock cycle the processor can move two instructions from one stage of the pipeline to another. Where this happens depends on the processor and the company's terminology …
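What dual issue needs from the code is independence between adjacent instructions. In this sketch, the two accumulators carry no dependence on each other, so a dual-issue core can retire both adds in the same cycle, whereas a single dependent chain forces one add per cycle at best. The result is identical either way; only the attainable instructions-per-cycle differs.

```c
#include <stddef.h>

/* Two independent accumulation chains: on a dual-issue core the two
 * adds inside the loop body can issue in the same cycle because
 * neither depends on the other's previous result.                  */
long sum_two_accumulators(const int *a, size_t n) {
    long s0 = 0, s1 = 0;          /* independent chains */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += a[i];               /* these two adds can */
        s1 += a[i + 1];           /* issue together     */
    }
    for (; i < n; i++)            /* odd leftover element */
        s0 += a[i];
    return s0 + s1;
}
```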

Design code to fit in CPU Cache?

非 Y 不嫁゛ submitted on 2019-12-02 17:17:44
When writing simulations, my buddy says he likes to try to write the program small enough to fit into cache. Does this have any real meaning? I understand that cache is faster than RAM and main memory. Is it possible to specify that you want the program to run from cache, or at least load the variables into cache? We are writing simulations, so any performance/optimization gain is a huge benefit. If you know of any good links explaining CPU caching, please point me in that direction. At least with a typical desktop CPU, you can't really specify much about cache usage directly. You can still try …
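While you cannot pin a program into cache on a typical desktop CPU, you can arrange accesses so the cache works in your favor. The standard illustration: row-major traversal of a matrix touches memory sequentially (one miss per cache line), while column-major traversal strides by a whole row per access and can miss on nearly every load once the matrix outgrows the cache. Both functions compute the same sum.

```c
#define N 64  /* small here; the cache effect appears at larger N */

/* Sequential traversal: consecutive addresses, cache-friendly. */
long sum_row_major(int m[N][N]) {
    long s = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            s += m[r][c];
    return s;
}

/* Strided traversal: jumps N*sizeof(int) bytes per access. */
long sum_col_major(int m[N][N]) {
    long s = 0;
    for (int c = 0; c < N; c++)
        for (int r = 0; r < N; r++)
            s += m[r][c];
    return s;
}
```

Timing both on a matrix larger than the last-level cache is a simple way to see what "designing for the cache" buys, without any special instructions at all.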