cpu-architecture

What exactly is a dual-issue processor?

Submitted by 大兔子大兔子 on 2019-12-20 08:53:47

Question: I came across several references to the concept of a dual-issue processor (I hope this even makes sense in a sentence). I can't find any explanation of what exactly dual issue is. Google gives me links to microcontroller specifications, but the concept isn't explained anywhere. Here's an example of such a reference. Am I looking in the wrong place? A brief paragraph on what it is would be very helpful.

Answer 1: Dual issue means that each clock cycle the processor can move two instructions from one …

What does “extend immediate to 32 bits” mean in MIPS?

Submitted by 此生再无相见时 on 2019-12-20 06:28:59

Question: I'm reading about the Instruction Decode (ID) phase in the MIPS datapath, and I've got the following quote: "Once operands are known, read the actual data (from registers) or extend the data to 32 bits (immediates)." Can someone explain what the "extend the data to 32 bits (immediates)" part means? I know that registers all contain 32 bits, and I know what an immediate is. I just don't understand why you need to extend the immediate from 26 to 32 bits. Thanks!

Answer 1: On a 32-bit CPU, most of …

Virtually addressed Cache

Submitted by 那年仲夏 on 2019-12-20 04:04:08

Question: Relation between cache size and page size: how do the associativity and page size constrain the cache size in a virtually addressed cache architecture? In particular, I am looking for an example of the following statement: if C ≤ (page_size × associativity), the cache index bits come only from the page offset (the same in the virtual address and the physical address).

Answer 1: Intel CPUs have used 8-way associative 32 KiB L1D with 64 B lines for many years, for exactly this reason. Pages are 4k, so the page offset is …

Which is optimal: a bigger cache block size or a smaller one?

Submitted by ﹥>﹥吖頭↗ on 2019-12-20 03:05:08

Question: Given a cache with constant capacity and associativity, for a given code to determine the average of array elements, would a cache with a higher block size be preferred? [from comments] Examine the code given below to compute the average of an array:

    total = 0;
    for (j = 0; j < k; j++) {
        sub_total = 0;  /* Nested loops to avoid overflow */
        for (i = 0; i < N; i++) {
            sub_total += A[j*N + i];
        }
        total += sub_total/N;
    }
    average = total/k;

Answer 1: Related: in the more general case of typical access patterns …

What happens when you use a memory override prefix but all the operands are registers?

Submitted by 大兔子大兔子 on 2019-12-20 02:22:11

Question: What happens when you use a memory override prefix but all the operands are registers? So, let's say you code mov eax, ebx or add eax, ebx and the default is 32-bit, but you use a 67h override. How does the processor handle that situation?

Answer 1: The Intel Software Developer's Manual, volume 2, section 2.1, details the behavior of each instruction prefix. It says use of the address-size prefix (67h) with an instruction that doesn't have a memory operand is reserved and may cause unpredictable …

Why do 32-bit applications work on 64-bit x86 CPUs?

Submitted by 我怕爱的太早我们不能终老 on 2019-12-19 21:44:39

Question: 32-bit application executables contain machine code for a 32-bit CPU, but the assembly and internal architecture (number of registers, register width, calling convention) of 32-bit and 64-bit Intel CPUs differ, so how can a 32-bit exe run on a 64-bit machine? Wikipedia's x86-64 article says: "x86-64 is fully backwards compatible with 16-bit and 32-bit x86 code. Because the full x86 16-bit and 32-bit instruction sets remain implemented in hardware without any intervening emulation, existing …"

What happens when different CPU cores write to the same RAM address without synchronization?

Submitted by 折月煮酒 on 2019-12-19 10:35:06

Question: Let's assume that 2 cores are trying to write different values to the same RAM address (1 byte), at the same moment in time (plus or minus eta), and without using any interlocked instructions or memory barriers. What happens in this case, and what value will be written to main RAM? The first one wins? The last one wins? Undetermined behavior?

Answer 1: x86 (like every other mainstream SMP CPU architecture) has coherent data caches. It's impossible for two different caches (e.g. L1D of 2 …

Why do x86 jump/call instructions use relative displacements instead of absolute destinations?

Submitted by 为君一笑 on 2019-12-19 10:16:32

Question: I am learning 8086, and there is one particular question which is bothering me that I have not been able to find any satisfactory answer to yet. I understand that the CPU executes code sequentially, and if we want to change the code flow we would like the IP to point to the new/old address where the code of our interest is sitting. Now, my question is: why don't we (I mean the CPU) just go and update the IP with the address corresponding to the label when we encounter a jump instruction? What is the need to …

Is CPU access to the network card asymmetric?

Submitted by 纵然是瞬间 on 2019-12-19 08:57:35

Question: When we have 2 CPUs on a machine, do they have symmetric access to network cards (PCI)? Essentially, for packet-processing code handling 14M packets per second from a network card, does it matter which CPU it runs on?

Answer 1: Not sure if you still need an answer, but I will post one anyway in case someone else might need it. And I assume you are asking about hardware topology rather than OS IRQ-affinity problems. The comment from Jerry is not 100% correct. While NUMA is a form of SMP, access to …

boost lockfree spsc_queue cache memory access

Submitted by 那年仲夏 on 2019-12-19 08:44:08

Question: I need to be extremely concerned with speed/latency in my current multi-threaded project. Cache access is something I'm trying to understand better, and I'm not clear on how lock-free queues (such as boost::lockfree::spsc_queue) access/use memory at the cache level. I've seen queues used where the pointer to a large object that needs to be operated on by the consumer core is pushed into the queue. If the consumer core pops an element from the queue, I presume that means the element (a …