cpu-architecture

What exactly is a dual-issue processor?

Submitted by 大兔子大兔子 on 2019-12-20 08:53:47

Question: I came across several references to the concept of a dual-issue processor (I hope this even makes sense in a sentence). I can't find any explanation of what exactly dual issue is. Google gives me links to microcontroller specifications, but the concept isn't explained anywhere. Here's an example of such a reference. Am I looking in the wrong place? A brief paragraph on what it is would be very helpful.

Answer 1: Dual issue means that each clock cycle the processor can move two instructions from one …

What does “extend immediate to 32 bits” mean in MIPS?

Submitted by 此生再无相见时 on 2019-12-20 06:28:59

Question: I'm reading about the Instruction Decode (ID) phase in the MIPS datapath, and I've got the following quote: "Once operands are known, read the actual data (from registers) or extend the data to 32 bits (immediates)." Can someone explain what the "extend the data to 32 bits (immediates)" part means? I know that registers all contain 32 bits, and I know what an immediate is. I just don't understand why you need to extend the immediate from 26 to 32 bits. Thanks!

Answer 1: On a 32-bit CPU, most of …

Virtually addressed Cache

Submitted by 那年仲夏 on 2019-12-20 04:04:08

Question: Relation between cache size and page size: how do the associativity and page size constrain the cache size in a virtually addressed cache architecture? In particular, I am looking for an example of the following statement: if C ≤ (page_size × associativity), the cache index bits come only from the page offset (the same in the virtual address and the physical address).

Answer 1: Intel CPUs have used 8-way associative 32 KiB L1D with 64 B lines for many years, for exactly this reason. Pages are 4k, so the page offset is …

Which is optimal: a bigger cache block size or a smaller one?

Submitted by ﹥>﹥吖頭↗ on 2019-12-20 03:05:08

Question: Given a cache with constant capacity and associativity, for a given code to determine the average of array elements, would a cache with a higher block size be preferred? [from comments] Examine the code given below to compute the average of an array:

    total = 0;
    for (j = 0; j < k; j++) {
        sub_total = 0;  /* Nested loops to avoid overflow */
        for (i = 0; i < N; i++) {
            sub_total += A[j*N + i];
        }
        total += sub_total/N;
    }
    average = total/k;

Answer 1: Related: in the more general case of typical access patterns …

What happens when you use a memory override prefix but all the operands are registers?

Submitted by 大兔子大兔子 on 2019-12-20 02:22:11

Question: What happens when you use a memory override prefix but all the operands are registers? So, let's say you code mov eax, ebx or add eax, ebx and the default is 32-bit, but you use a 67h override. How does the processor handle that situation?

Answer 1: The Intel Software Developer's Manual, volume 2, section 2.1, details the behavior of each instruction prefix. It says use of the address-size prefix (67h) with an instruction that doesn't have a memory operand is reserved and may cause unpredictable …

Why do 32-bit applications work on 64-bit x86 CPUs?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-19 21:44:39

Question: 32-bit application executables contain machine code for a 32-bit CPU, but the assembly and internal architecture (number of registers, register width, calling convention) of 32-bit and 64-bit Intel CPUs differ, so how can a 32-bit exe run on a 64-bit machine? Wikipedia's x86-64 article says: "x86-64 is fully backwards compatible with 16-bit and 32-bit x86 code. Because the full x86 16-bit and 32-bit instruction sets remain implemented in hardware without any intervening emulation, existing …"

What happens when different CPU cores write to the same RAM address without synchronization?

Submitted by 折月煮酒 on 2019-12-19 10:35:06

Question: Let's assume that 2 cores are trying to write different values to the same RAM address (1 byte), at the same moment in time (plus or minus eta), and without using any interlocked instructions or memory barriers. What happens in this case, and what value will be written to main RAM? The first one wins? The last one wins? Undetermined behavior?

Answer 1: x86 (like every other mainstream SMP CPU architecture) has coherent data caches. It's impossible for two different caches (e.g. L1D of 2 …

Why do x86 jump/call instructions use relative displacements instead of absolute destinations?

Submitted by 为君一笑 on 2019-12-19 10:16:32

Question: I am learning 8086, and there is one particular question which is bothering me that I have not been able to find any satisfactory answer to yet. I understand that the CPU executes code sequentially, and if we want to change the code flow we would like the IP to point to the new/old address where the code of our interest is sitting. Now, my question is: why don't we (I mean the CPU) just go and update the IP with the address corresponding to the label when we encounter a jump instruction? What is the need to …

Is CPU access to the network card asymmetric?

Submitted by 纵然是瞬间 on 2019-12-19 08:57:35

Question: When we have 2 CPUs on a machine, do they have symmetric access to network cards (PCI)? Essentially, for packet-processing code handling 14M packets per second from a network card, does it matter which CPU it runs on?

Answer 1: Not sure if you still need an answer, but I will post one anyway in case someone else might need it. And I assume you are asking about hardware topology rather than OS IRQ-affinity problems. The comment from Jerry is not 100% correct. While NUMA is a form of SMP, access to …

boost lockfree spsc_queue cache memory access

Submitted by 那年仲夏 on 2019-12-19 08:44:08

Question: I need to be extremely concerned with speed/latency in my current multi-threaded project. Cache access is something I'm trying to understand better, and I'm not clear on how lock-free queues (such as boost::lockfree::spsc_queue) access/use memory at the cache level. I've seen queues used where the pointer to a large object that needs to be operated on by the consumer core is pushed into the queue. If the consumer core pops an element from the queue, I presume that means the element (a …