cpu-architecture

CPU and GPU differences

北战南征 submitted on 2019-12-03 06:18:54
What is the difference between a single processing unit of a CPU and a single processing unit of a GPU? Most places I've come across on the internet cover the high-level differences between the two. I want to know what instructions each can perform, how fast they are, and how these processing units are integrated into the complete architecture. It seems like a question with a long answer, so lots of links are fine.

Edit: In the CPU, the FPU runs real-number operations. How fast are the same operations in each GPU core? If they are fast, why are they fast? I know my question is very generic, but …
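To make the contrast concrete, here is a tiny loop (SAXPY) in C, the language used by the other code on this page. Broadly, a single CPU core (with out-of-order execution, branch prediction and a wide FPU) runs these iterations sequentially, perhaps a few at a time via SIMD, while a GPU maps roughly one lightweight thread to each iteration across thousands of much simpler in-order execution units - so individual operations are not especially fast, but the aggregate throughput is.

    #include <stddef.h>

    /* SAXPY: y[i] = a * x[i] + y[i].
     * CPU view: one core executes the iterations one after another
     * (or a handful in parallel using its SIMD units).
     * GPU view: the same loop body would typically be launched as one
     * thread per index i, spread over thousands of simple ALUs/FPUs. */
    void saxpy(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }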

TLB misses vs cache misses?

风格不统一 submitted on 2019-12-03 05:41:33
Question: Could someone please explain the difference between a TLB (Translation Lookaside Buffer) miss and a cache miss? I gathered that the TLB has something to do with virtual memory addresses, but I wasn't entirely clear what that actually means. I understand that a block of memory (the size of a cache line) is loaded into the (L3?) cache, and that if a required address is not held within the current cache lines, this is a cache miss.

Answer 1: Well, all of today's modern operating systems use …
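One way to see the two kinds of misses separately is to vary the access stride (a sketch; the 4 KB page size, 64-byte line size and 256 MB buffer below are assumptions, and you would time the two calls yourself, e.g. with clock_gettime): a cache-line stride mostly causes cache misses, while a page-sized stride touches a new TLB entry on every access, so once the buffer spans more pages than the TLB holds you also pay for page-table walks.

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    #define PAGE 4096   /* assumed page size       */
    #define LINE 64     /* assumed cache-line size  */

    /* Walk the buffer with a given stride.  The buffer is read through a
     * volatile pointer so the loads are not optimised away.
     * stride = LINE exercises the data caches; stride = PAGE exercises the
     * TLB as well, because every access lands on a different page. */
    static unsigned touch(volatile unsigned char *buf, size_t len, size_t stride)
    {
        unsigned sum = 0;
        for (size_t i = 0; i < len; i += stride)
            sum += buf[i];
        return sum;
    }

    int main(void)
    {
        size_t len = 256u << 20;            /* 256 MB, illustrative */
        unsigned char *buf = malloc(len);
        if (!buf) return 1;
        memset(buf, 1, len);                /* fault the pages in first */
        touch(buf, len, LINE);              /* time this: cache-miss bound */
        touch(buf, len, PAGE);              /* time this: adds TLB misses  */
        free(buf);
        return 0;
    }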

Design code to fit in CPU Cache?

百般思念 submitted on 2019-12-03 04:44:27
Question: When writing simulations, my buddy says he likes to try to write the program small enough to fit into the cache. Does this have any real meaning? I understand that the cache is faster than RAM/main memory. Is it possible to specify that you want the program to run from the cache, or at least to load the variables into the cache? We are writing simulations, so any performance/optimization gain is a huge benefit. If you know of any good links explaining CPU caching, point me in that direction.

Answer 1: At …
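You normally cannot lock a program into the cache; what you can do is organise the work so its working set fits and stays hot. A common illustration is loop blocking (tiling). The sketch below, with an illustrative tile size of 64, transposes a matrix tile by tile so the data being read and written fits in cache while it is in use, instead of streaming the whole arrays past the cache on every pass.

    #include <stddef.h>

    #define N     1024
    #define BLOCK 64      /* illustrative tile size; tune so a tile fits in L1/L2 */

    /* Naive transpose: the writes to dst stride through memory, so nearly
     * every iteration touches a different cache line and the effective
     * working set is the whole matrix. */
    void transpose_naive(const double *src, double *dst)
    {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                dst[j * N + i] = src[i * N + j];
    }

    /* Blocked transpose: work on BLOCK x BLOCK tiles so that both the
     * source and destination tiles stay resident in cache while used. */
    void transpose_blocked(const double *src, double *dst)
    {
        for (size_t ii = 0; ii < N; ii += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK; i++)
                    for (size_t j = jj; j < jj + BLOCK; j++)
                        dst[j * N + i] = src[i * N + j];
    }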

Differences between ARM “versions” (ARMv7 only)?

↘锁芯ラ submitted on 2019-12-03 04:30:41
Basically, I would like to know the difference between ARMv7l and ARMv7hl. I have an ARM processor reporting armv7l, yet there are a lot of RPMs for armv7hl. I don't know exactly what to search for to get information about this. What is this "suffix" called? Are there any other types? What do they do differently?

I would assume the suffixes indicate packages compiled for the little-endian (l) and hard-float (h) ABI as appropriate - i.e. it's a software thing and only tangentially related to the hardware. In other words, you don't actually have an "armv7l" processor - you have an ARMv7 processor …
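To see that this is an ABI difference rather than a hardware one, you can build the same C source for both conventions; the GCC flags below are real, though the exact -march/-mfpu values you want depend on your toolchain and distribution, and the file name is made up.

    /* float_abi_demo.c - the same source builds for either ABI.
     *
     * Soft-float calling convention (what "armv7l"-style distros typically use):
     *   gcc -march=armv7-a -mfloat-abi=softfp -mfpu=neon -S float_abi_demo.c
     * Hard-float calling convention (the "hl" / armhf world):
     *   gcc -march=armv7-a -mfloat-abi=hard   -mfpu=neon -S float_abi_demo.c
     *
     * In the generated assembly, the hard-float build passes x and returns
     * the result directly in a VFP register (d0), while the softfp build
     * shuffles them through the integer registers (r0/r1) at every call
     * boundary - which is why binaries built for the two ABIs cannot be
     * freely mixed, even on the same ARMv7 chip. */
    double scale(double x)
    {
        return x * 2.5;
    }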

Descriptor concept in a NIC

大兔子大兔子 submitted on 2019-12-03 03:13:34
Question: I am trying to understand the concept of Rx and Tx descriptors used in network driver code. Are descriptors in software (RAM) or in hardware (the NIC card)? How do they get filled?

Edit: In a Realtek card driver I have the following struct defined, and the transmit path fills it like this:

    struct Desc {
        uint32_t opts1;
        uint32_t opts2;
        uint64_t addr;
    };

    txd->addr  = cpu_to_le64(mapping);
    txd->opts2 = cpu_to_le32(opts2);
    txd->opts1 = cpu_to_le32(opts1 & ~DescOwn);

So are opts1 and opts2, and their bits like DescOwn, card-specific? Will …
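For orientation, here is a rough sketch of how such descriptors are typically used (the field names, ring size and DESC_OWN bit position are illustrative, not taken from any specific driver): the descriptors live in ordinary DMA-capable RAM, the NIC is told the ring's base address once through a device register, and ownership of each entry is handed back and forth via an OWN bit - the driver sets it after filling the descriptor, and the NIC clears it once it has consumed the buffer.

    #include <stdint.h>

    #define RING_SIZE 256          /* illustrative ring length */
    #define DESC_OWN  (1u << 31)   /* hypothetical "owned by NIC" bit in opts1 */

    struct desc {
        uint32_t opts1;            /* length, flags, OWN bit */
        uint32_t opts2;            /* e.g. VLAN / checksum-offload flags */
        uint64_t addr;             /* DMA (bus) address of the packet buffer */
    };

    /* The ring sits in normal RAM that the NIC can reach via DMA; the NIC
     * learns its base address once, through a device register. */
    static struct desc tx_ring[RING_SIZE];

    /* Queue one packet: fill the descriptor, then hand it to the hardware
     * by setting the OWN bit last (the NIC DMA-reads this field). */
    static void queue_tx(unsigned idx, uint64_t dma_addr, uint32_t len)
    {
        struct desc *d = &tx_ring[idx];
        d->addr  = dma_addr;
        d->opts2 = 0;
        d->opts1 = len;            /* fill everything except OWN first      */
        /* a real driver converts fields with cpu_to_le32()/cpu_to_le64()
         * and issues a write barrier before the next line */
        d->opts1 |= DESC_OWN;      /* now the NIC owns this descriptor      */
    }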

What does “subsequent read” mean in the context of volatile variables?

别等时光非礼了梦想. submitted on 2019-12-03 02:17:39
The Java memory visibility documentation says that: "A write to a volatile field happens-before every subsequent read of that same field." I'm confused about what "subsequent" means in the context of multithreading. Does this sentence imply some global clock for all processors and cores? So, for example, if I assign a value to a variable in cycle c1 in some thread, is a second thread able to see this value in the subsequent cycle c1 + 1?

It sounds to me like it's saying that it provides lockless acquire/release memory-ordering semantics between threads. See Jeff Preshing's article explaining the concept …
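To keep the examples on this page in one language, here is a minimal sketch of that acquire/release publication pattern in C11 atomics rather than Java (the volatile guarantee quoted above behaves analogously; the names data and ready are made up): if the reader's acquire load observes the writer's release store, then everything the writer did before the store - including the plain write to data - is guaranteed to be visible to the reader. "Subsequent" means exactly a read that observes that write, not a read at a later clock tick.

    #include <stdatomic.h>
    #include <stdbool.h>

    static int data;                       /* plain (non-atomic) payload        */
    static atomic_bool ready = false;      /* plays the role of the volatile field */

    void writer(void)
    {
        data = 42;                                      /* 1: write the payload    */
        atomic_store_explicit(&ready, true,
                              memory_order_release);    /* 2: publish (release)    */
    }

    void reader(void)
    {
        if (atomic_load_explicit(&ready, memory_order_acquire)) {  /* "subsequent" read */
            /* Because the acquire load saw the release store, the earlier
             * write to `data` is visible here as well. */
            int v = data;
            (void)v;
        }
    }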

How does CLFLUSH work for an address that is not in cache yet?

烂漫一生 submitted on 2019-12-03 02:12:26
We are trying to use the Intel CLFLUSH instruction to flush the cached content of a process on Linux from userspace. We wrote a very simple C program that first accesses a large array and then calls CLFLUSH over the virtual address range of the whole array, and we measure the latency CLFLUSH takes to flush the whole array. The array size is an input to the program, which we vary from 1 MB to 40 MB in steps of 2 MB. In our understanding, CLFLUSH should flush the content that is in the cache, so we expect the latency of flushing the whole array to first increase linearly …
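A minimal sketch of that kind of measurement (the 64-byte line size, the 8 MB buffer and the use of clock_gettime are assumptions; a careful experiment would also pin the thread, repeat the runs and verify the data is actually resident before flushing):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */

    #define LINE 64          /* assumed cache-line size in bytes */

    int main(void)
    {
        size_t size = 8u << 20;                 /* 8 MB, illustrative */
        char *buf = malloc(size);
        if (!buf) return 1;

        /* Touch every line so (some of) the array is actually cached. */
        for (size_t i = 0; i < size; i += LINE)
            buf[i] = (char)i;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < size; i += LINE)
            _mm_clflush(buf + i);               /* flush one line per iteration */
        _mm_mfence();                           /* wait for the flushes to complete */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("flushed %zu bytes in %.0f ns\n", size, ns);
        free(buf);
        return 0;
    }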

What is meant by data cache and instruction cache?

走远了吗. submitted on 2019-12-03 01:43:44
Question: From here: "Instructions and data have different access patterns, and access different regions of memory. Thus, having the same cache for both instructions and data may not always work out. Thus, it's rather common to have two caches: an instruction cache that only stores instructions, and a data cache that only stores data." It's intuitive to know the distinction between instructions and data, but now I'm not so sure of the difference in this context. What constitutes data and gets put …
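As a concrete illustration (the function and array names are made up): when the loop below runs, the machine code compiled from sum() is fetched through the instruction cache, while the bytes of table that the loads read come through the data cache. Code only ends up in the data cache if the program treats it as data - for example a JIT writing out new instructions - which is then exactly the case where the two caches have to be synchronized.

    #include <stddef.h>

    static int table[1024];          /* read via load instructions -> data cache   */

    int sum(size_t n)                /* its compiled machine code -> instruction cache */
    {
        int s = 0;
        for (size_t i = 0; i < n && i < 1024; i++)
            s += table[i];           /* each access may hit or miss in the D-cache */
        return s;
    }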

Where is the L1 memory cache of Intel x86 processors documented?

[亡魂溺海] submitted on 2019-12-03 01:34:29
Question: I am trying to profile and optimize algorithms, and I would like to understand the specific impact of the caches on various processors. For recent Intel x86 processors (e.g. the Q9300), it is very hard to find detailed information about the cache structure. In particular, most web sites (including Intel.com) that post processor specs do not include any reference to the L1 cache. Is this because the L1 cache does not exist, or is this information for some reason considered unimportant? Are there any …
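On Linux, one low-effort way to see the cache hierarchy of the machine you are actually running on, rather than hunting through spec pages, is to read sysfs. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpu0/cache layout (indexes typically cover L1d, L1i, L2 and L3); lscpu reports the same information.

    #include <stdio.h>

    /* Print level, type and size for each cache index reported for CPU 0. */
    int main(void)
    {
        for (int idx = 0; idx < 8; idx++) {
            char path[128], level[16] = "", type[16] = "", size[16] = "";

            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
            FILE *f = fopen(path, "r");
            if (!f) break;                       /* no more cache levels */
            fscanf(f, "%15s", level);
            fclose(f);

            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu0/cache/index%d/type", idx);
            if ((f = fopen(path, "r"))) { fscanf(f, "%15s", type); fclose(f); }

            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu0/cache/index%d/size", idx);
            if ((f = fopen(path, "r"))) { fscanf(f, "%15s", size); fclose(f); }

            printf("L%s %-12s %s\n", level, type, size);
        }
        return 0;
    }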

Write-back vs Write-Through caching?

那年仲夏 submitted on 2019-12-03 00:29:15
Question: My understanding is that the main difference between the two methods is that in the "write-through" method data is written to main memory through the cache immediately, while in "write-back" the data is written at a "later time". We still have to wait for the memory at that "later time", so what is the benefit of "write-through"?

Answer 1: The benefit of write-through to main memory is that it simplifies the design of the computer system. With write-through, the main memory always has an up-to-date copy …
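To make the two policies concrete, here is a toy sketch in C of a single cache line handling stores (no real hardware detail; the memory_write() stub and the one-line "cache" are purely illustrative): write-through pays a memory write on every store but keeps memory current, while write-back only marks the line dirty and pays the memory write once, on eviction.

    #include <stdbool.h>
    #include <stdint.h>

    /* One toy cache line; a real cache has many lines and sets. */
    struct line {
        uint32_t tag;
        uint32_t value;
        bool     valid;
        bool     dirty;      /* only meaningful for write-back */
    };

    static struct line cache_line;

    /* Stand-in for an actual write to main memory. */
    static void memory_write(uint32_t addr, uint32_t value)
    {
        (void)addr; (void)value;
    }

    /* Write-through: every store updates the cache AND main memory right
     * away, so memory is always up to date, at the cost of one memory
     * write per store. */
    void store_write_through(uint32_t addr, uint32_t value)
    {
        cache_line = (struct line){ .tag = addr, .value = value, .valid = true };
        memory_write(addr, value);
    }

    /* Write-back: the store only updates the cache and marks the line
     * dirty; main memory is written later, when the line is evicted, so
     * repeated stores to the same line cost a single write-back. */
    void store_write_back(uint32_t addr, uint32_t value)
    {
        if (cache_line.valid && cache_line.dirty && cache_line.tag != addr)
            memory_write(cache_line.tag, cache_line.value);   /* evict: flush */
        cache_line = (struct line){ .tag = addr, .value = value,
                                    .valid = true, .dirty = true };
    }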