cpu-architecture | 易学教程

Difference between core and processor

What is the difference between a core and a processor? I have already searched on Google, but I only find definitions of multi-core and multi-processor systems, which is not what I am looking for.

Leeor: A core is usually the basic computation unit of the CPU. It can run a single program context (or multiple ones if it supports hardware threads, such as Hyper-Threading on Intel CPUs), maintaining the correct program state, registers, and execution order, and performing the operations through ALUs. For optimization purposes, a core can also hold on-core caches with copies of ...

Globally Invisible load instructions

Can some load instructions never become globally visible because of store-to-load forwarding? To put it another way, if a load instruction gets its value from the store buffer, it never has to read from the cache. Since it is generally stated that a load becomes globally visible when it reads from the L1D cache, loads that do not read from the L1D should be globally invisible.

The concept of global visibility for loads is tricky, because a load doesn't modify the global state of memory, and other threads can't directly observe it. But once the dust settles after out-of-order / speculative ...
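The forwarding path described in the question can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyCore`, its method names, and the dictionary standing in for coherent cache/memory are all hypothetical, and real hardware is far more involved.

```python
class ToyCore:
    """Toy model of one core with a private store buffer (illustration only)."""

    def __init__(self, shared_memory):
        self.memory = shared_memory   # stands in for the coherent cache/memory
        self.store_buffer = []        # pending (address, value) pairs, oldest first

    def store(self, address, value):
        # A store first waits in the store buffer; other cores cannot see it yet.
        self.store_buffer.append((address, value))

    def load(self, address):
        # Store-to-load forwarding: the youngest matching buffered store wins.
        for buffered_address, value in reversed(self.store_buffer):
            if buffered_address == address:
                return value, "store buffer"
        # No pending store to this address, so the load reads the cache.
        return self.memory.get(address, 0), "cache"

    def drain(self):
        # Commit buffered stores in program order; only now can other cores see them.
        for address, value in self.store_buffer:
            self.memory[address] = value
        self.store_buffer.clear()


memory = {0x100: 1}
core_a, core_b = ToyCore(memory), ToyCore(memory)

core_a.store(0x100, 42)
forwarded = core_a.load(0x100)    # core A sees its own pending store
remote = core_b.load(0x100)       # core B still sees the old value
core_a.drain()
after_drain = core_b.load(0x100)  # the store has reached shared memory
```

In this model the forwarded load never touches the shared dictionary, which is the situation the question asks about.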

Does lock xchg have the same behavior as mfence?

What I'm wondering is whether lock xchg behaves similarly to mfence from the perspective of one thread accessing a memory location that is being mutated (let's just say at random) by other threads. Does it guarantee that I get the most up-to-date value? And what about the memory read/write instructions that follow it? The reason for my confusion is:

8.2.2 "Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions." - Intel 64 Developers Manual Vol. 3

Does this apply across threads? mfence states: Performs a serializing operation on all load-from-memory and ...

On 32-bit CPUs, is an 'integer' type more efficient than a 'short' type?

On a 32-bit CPU, an integer is 4 bytes and a short integer is 2 bytes. If I am writing a C/C++ application that uses many numeric values that always fit within the range of a short integer, is it more efficient to use 4-byte integers or 2-byte integers? I have heard it suggested that 4-byte integers are more efficient because they match the width of the bus from memory to the CPU. However, if I am adding two short integers together, would the CPU pack both values into a single pass in parallel (thus spanning the 4-byte width of the bus)?

Yes, you should definitely use a 32 bit ...
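One reason 16-bit arithmetic is not automatically cheaper: in C, short operands are promoted to int before an addition, so the result has to be narrowed again when it is stored back into a short. The sketch below models that narrowing step in Python; `wrap_int16` and `add_as_short` are hypothetical helper names, and whether the extra step costs anything in practice depends on the compiler and CPU.

```python
def wrap_int16(value):
    """Reduce an integer to the range of a signed 16-bit short (two's complement)."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


def add_as_short(a, b):
    # Step 1: the addition itself happens at full register width.
    wide_sum = a + b
    # Step 2: storing into a short needs an extra truncate/sign-extend step.
    return wrap_int16(wide_sum)


wide = 30000 + 30000                  # fits in a 32-bit int
narrow = add_as_short(30000, 30000)   # wraps around when narrowed to 16 bits
```

Here `wide` is 60000, while `narrow` wraps to -5536 because 60000 does not fit in a signed 16-bit value.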

Setup targeting both x86 and x64?

Question: I have a program that requires both x64 and x86 DLLs (it figures out which ones it needs at run time), but when I try to create a setup, it complains:

File 'AlphaVSS.WinXP.x64.dll' targeting 'AMD64' is not compatible with the project's target platform 'x86'
File 'AlphaVSS.Win2003.x64.dll' targeting 'AMD64' is not compatible with the project's target platform 'x86'
File 'AlphaVSS.Win2008.x64.dll' targeting 'AMD64' is not compatible with the project's target platform 'x86'

How can I make my setup ...

CPU and Data alignment

Pardon me if you feel this has been answered numerous times, but I need answers to the following questions. Why does data have to be aligned (on 2-byte / 4-byte / 8-byte boundaries)? My doubt here is that when the CPU has address lines Ax Ax-1 Ax-2 ... A2 A1 A0, it is quite possible to address memory locations sequentially. So why is there a need to align data at specific boundaries? How do I find the alignment requirements when I am compiling my code and generating the executable? If, for example, the data alignment is a 4-byte boundary, does that mean each consecutive byte is located at modulo 4 ...
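As a rough illustration of what "aligned on an N-byte boundary" means: the rule applies to the starting address of an object, not to each of its bytes. The sketch below uses Python's ctypes, which follows the platform's C ABI, to query alignment and to show the padding a compiler-style layout inserts. `is_aligned` and the `Mixed` structure are hypothetical names chosen for this example, and the exact offsets depend on the platform.

```python
import ctypes


def is_aligned(address, boundary):
    """An address is aligned to `boundary` when it is a multiple of it."""
    return address % boundary == 0


class Mixed(ctypes.Structure):
    # A 1-byte field followed by a 4-byte field; the layout rules insert
    # padding so that `value` starts on its required boundary.
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_int32)]


required = ctypes.alignment(ctypes.c_int32)  # alignment requirement of int32
tag_offset = Mixed.tag.offset                # offset of the first field
value_offset = Mixed.value.offset            # offset after padding
total_size = ctypes.sizeof(Mixed)            # size including padding
```

On common platforms `required` is 4, so `value` sits at offset 4 rather than 1, and the structure occupies 8 bytes instead of 5.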

How has CPU architecture evolution affected virtual function call performance?

Question: Years ago I was learning about x86 assembler, CPU pipelining, cache misses, branch prediction, and all that jazz. It was a tale of two halves. I read about all the wonderful advantages of the lengthy pipelines in the processor, viz. instruction reordering, cache preloading, dependency interleaving, etc. The downside was that any deviation from the norm was enormously costly. For example, IIRC a certain AMD processor in the early-gigahertz era had a 40-cycle penalty every time you called a ...

What is a store buffer?

Can anyone explain what a load buffer is and how it differs from invalidation queues, and also the difference between store buffers and write-combining buffers? The paper by Paul E. McKenney, http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.07.23a.pdf, explains store buffers and invalidation queues very nicely, but unfortunately doesn't talk about write-combining buffers.

Nathan Binkert: An invalidate queue is more like a store buffer, but it's part of the memory system, not the CPU. Basically it is a queue that keeps track of invalidations and ensures that they complete ...

What are stalled-cycles-frontend and stalled-cycles-backend in 'perf stat' result?

Question: Does anybody know the meaning of stalled-cycles-frontend and stalled-cycles-backend in the perf stat result? I searched the internet but did not find the answer. Thanks.

$ sudo perf stat ls

Performance counter stats for 'ls':

    0.602144 task-clock                # 0.762 CPUs utilized
           0 context-switches          # 0.000 K/sec
           0 CPU-migrations            # 0.000 K/sec
         236 page-faults               # 0.392 M/sec
      768956 cycles                    # 1.277 GHz
      962999 stalled-cycles-frontend   # 125.23% frontend cycles idle
      634360 stalled-cycles-backend    # 82.50%
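The annotations after each `#` appear to be simple ratios of the raw counters, and that can be checked with a few lines of arithmetic. The sketch below recomputes them from the numbers in the output above. It assumes task-clock is reported in milliseconds, and the variable and function names are made up for this example.

```python
# Raw counters copied from the perf stat output in the question.
task_clock_ms = 0.602144   # assumption: task-clock is in milliseconds
cycles = 768956
stalled_frontend = 962999
stalled_backend = 634360
page_faults = 236


def percent_of(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


task_clock_s = task_clock_ms * 1e-3

frontend_idle_pct = percent_of(stalled_frontend, cycles)   # stalled cycles / cycles
backend_idle_pct = percent_of(stalled_backend, cycles)
clock_ghz = cycles / task_clock_s / 1e9                    # cycles per second
page_faults_m_per_sec = page_faults / task_clock_s / 1e6   # events per second

print(f"{frontend_idle_pct:.2f}% frontend cycles idle")
print(f"{backend_idle_pct:.2f}% backend cycles idle")
print(f"{clock_ghz:.3f} GHz")
print(f"{page_faults_m_per_sec:.3f} M/sec")
```

The recomputed values match the printed ones (125.23%, 82.50%, 1.277 GHz, 0.392 M/sec), which shows that the stalled-cycle percentages are taken relative to the cycles counter and can exceed 100% in this output.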

How does an assembly instruction turn into voltage changes on the CPU?

Question: I've been working in C and CPython for the past 3-5 years. Consider that my base of knowledge here. If I were to issue an assembly instruction such as MOV AL, 61h to a processor that supported it, what exactly is inside the processor that interprets this code and dispatches it as voltage signals? How would such a simple instruction likely be carried out? Assembly even feels like a high-level language when I try to think of the multitude of steps contained in MOV AL, 61h or even XOR EAX, EBX.
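Before any voltage changes happen, the assembler turns each mnemonic into a few bytes, and the processor's decoder picks those bytes apart into fields. The toy decoder below sketches that first step for the two instructions mentioned in the question. It assumes 32-bit x86 encodings (MOV AL, 61h as B0 61, and XOR EAX, EBX as 31 D8), handles only those two forms, and `decode` is a hypothetical helper, not how real hardware is built.

```python
# Register numbering used by x86 encodings (index = 3-bit register field).
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode(code):
    """Decode one instruction from `code` (bytes); toy subset only."""
    opcode = code[0]

    # MOV r8, imm8: opcodes B0..B7, register number in the low 3 opcode bits,
    # followed by a 1-byte immediate.
    if 0xB0 <= opcode <= 0xB7:
        return f"MOV {REG8[opcode & 0x07]}, {code[1]:02X}h"

    # XOR r/m32, r32: opcode 31 followed by a ModRM byte (mod | reg | rm).
    if opcode == 0x31:
        modrm = code[1]
        mod = modrm >> 6
        reg = (modrm >> 3) & 0x07
        rm = modrm & 0x07
        if mod == 0b11:  # register-to-register form
            return f"XOR {REG32[rm]}, {REG32[reg]}"

    raise ValueError("not handled by this toy decoder")


mov_text = decode(bytes([0xB0, 0x61]))
xor_text = decode(bytes([0x31, 0xD8]))
```

In this sketch the bit fields select a register name; in hardware the same fields would drive control signals that select a register and an ALU operation.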