cpu-architecture

What is locality of reference?

爱⌒轻易说出口 posted on 2019-11-28 16:24:27
I am having a problem understanding locality of reference. Can anyone please help me understand what it means, and what the following are?

Spatial locality of reference
Temporal locality of reference

This would not matter if your computer was filled with super-fast memory. But unfortunately that's not the case, and computer memory looks something like this [1]:

+----------+
| CPU      |  <<-- Our beloved CPU, superfast and always hungry for more data.
+----------+
|L1 - Cache|  <<-- works at 100% of CPU speed (fast)
+----------+
|L2 - Cache|  <<-- works at 25% of CPU speed (medium)
+----+-----+
     |
     |        <<-- This
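To make the two kinds of locality concrete, here is a minimal sketch in C. The names `sum_row_major` and `sum_col_major` and the 512 x 512 grid size are illustrative assumptions, not taken from the answer. Both functions compute the same sum; only the order in which memory is touched differs.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row-major walk: consecutive iterations read adjacent addresses
   (spatial locality), and `sum` is reused on every iteration
   (temporal locality). */
long sum_row_major(int grid[ROWS][COLS])
{
    assert(grid != NULL);
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Column-major walk: same result, but each step jumps
   COLS * sizeof(int) bytes ahead, so the neighbours that share a
   cache line are not used before the walk moves on. */
long sum_col_major(int grid[ROWS][COLS])
{
    assert(grid != NULL);
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}
```

On typical cached hardware the row-major version tends to run faster once the grid no longer fits in cache, although the size of the gap depends on the machine and the compiler.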

Cycles/cost for L1 Cache hit vs. Register on x86?

廉价感情. posted on 2019-11-28 16:24:20
I remember assuming that an L1 cache hit is 1 cycle (i.e. identical to register access time) in my architecture class, but is that actually true on modern x86 processors? How many cycles does an L1 cache hit take? How does it compare to register access?

paulsm4

Here's a great article on the subject: http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/1

To answer your question - yes, a cache hit has approximately the same cost as a register access. And of course a cache miss is quite costly ;)

PS: The specifics will vary, but this link has some good ballpark figures: Approximate cost to

Which CPU architectures support Compare And Swap (CAS)?

▼魔方 西西 posted on 2019-11-28 16:23:00
Just curious to know: which CPU architectures support compare-and-swap atomic primitives?

jdkoftinoff

PowerPC has more powerful primitives available: "lwarx" and "stwcx". lwarx loads a value from memory but remembers the location. Any other thread or CPU that touches that location will cause the "stwcx", a conditional store instruction, to fail. So the lwarx/stwcx combo allows you to implement atomic increment/decrement, compare and swap, and more powerful atomic operations like "atomic increment circular buffer index".

A different and easier way to answer this question may be to list
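The operations the answer mentions can be sketched portably with C11 atomics. This is an illustration, not the answerer's code: `cas_increment` and `next_slot` are hypothetical names. A compiler is free to lower the weak compare-exchange to a CAS instruction or to a load-reserve/store-conditional pair such as lwarx/stwcx, depending on the target.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomic increment built from a compare-and-swap retry loop.
   Returns the value after the increment. */
int cas_increment(atomic_int *counter)
{
    int seen = atomic_load(counter);
    /* On failure the call stores the current value into `seen`,
       so the next attempt uses fresh data. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + 1)) {
        /* retry */
    }
    return seen + 1;
}

/* "Atomic increment circular buffer index": advance the index and
   wrap at `capacity` in one atomic step. Returns the new index. */
unsigned next_slot(atomic_uint *index, unsigned capacity)
{
    assert(capacity > 0);
    unsigned seen = atomic_load(index);
    unsigned next;
    do {
        next = (seen + 1) % capacity;
    } while (!atomic_compare_exchange_weak(index, &seen, next));
    return next;
}
```

The weak form may fail spuriously, which is why both functions loop. That matches the store-conditional behaviour described above, where any interference makes the store fail and the sequence is retried.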

How is CPU usage calculated?

烈酒焚心 posted on 2019-11-28 16:03:10
On my desktop, I have a little widget that tells me my current CPU usage. It also shows the usage for each of my two cores. I always wondered, how does the CPU calculate how much of its processing power is being used? Also, if the CPU is hung up doing some intense calculations, how can it (or whatever handles this activity) examine the usage, without getting hung up as well?

In silico

The CPU doesn't do the usage calculations by itself. It may have hardware features to make that task easier, but it's mostly the job of the operating system. So obviously the details of implementations will vary
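One common way an operating system or widget can derive a usage figure is to sample cumulative busy and idle time twice and compare the differences. The sketch below assumes that model. The `cpu_sample` struct and `cpu_usage_percent` function are hypothetical and do not correspond to any particular OS interface.

```c
#include <assert.h>

/* Hypothetical snapshot of cumulative time counters since boot,
   in whatever tick unit the OS accounts in. */
struct cpu_sample {
    unsigned long long busy_ticks;
    unsigned long long idle_ticks;
};

/* Usage over the interval between two snapshots:
   busy time divided by total elapsed time, as a percentage. */
double cpu_usage_percent(struct cpu_sample earlier, struct cpu_sample later)
{
    assert(later.busy_ticks >= earlier.busy_ticks);
    assert(later.idle_ticks >= earlier.idle_ticks);

    unsigned long long busy = later.busy_ticks - earlier.busy_ticks;
    unsigned long long idle = later.idle_ticks - earlier.idle_ticks;
    unsigned long long total = busy + idle;

    if (total == 0)
        return 0.0; /* no time elapsed between the samples */
    return 100.0 * (double)busy / (double)total;
}
```

Because the counters are maintained by the OS during its normal timer or scheduling work, reading them only needs a short slice of CPU time, even when another task is keeping the processor busy.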

What are stalled-cycles-frontend and stalled-cycles-backend in 'perf stat' result?

人走茶凉 posted on 2019-11-28 15:57:05
Does anybody know the meaning of stalled-cycles-frontend and stalled-cycles-backend in the perf stat result? I searched on the internet but did not find the answer. Thanks

$ sudo perf stat ls

 Performance counter stats for 'ls':

          0.602144 task-clock                #    0.762 CPUs utilized
                 0 context-switches          #    0.000 K/sec
                 0 CPU-migrations            #    0.000 K/sec
               236 page-faults               #    0.392 M/sec
            768956 cycles                    #    1.277 GHz
            962999 stalled-cycles-frontend   #  125.23% frontend cycles idle
            634360 stalled-cycles-backend    #   82.50% backend cycles idle
            890060 instructions              #    1.16  insns per cycle
                                             #    1.08  stalled cycles per insn
            179378
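The figures in the comment column of that output are plain ratios of the raw counters, which the short sketch below reproduces. The helper names `percent_of` and `ratio` are hypothetical. In this particular output the "stalled cycles per insn" value matches the frontend stall count divided by the instruction count.

```c
#include <assert.h>

/* `part` expressed as a percentage of `whole`. */
double percent_of(unsigned long long part, unsigned long long whole)
{
    assert(whole > 0);
    return 100.0 * (double)part / (double)whole;
}

/* Simple quotient of two counters, e.g. instructions per cycle. */
double ratio(unsigned long long numerator, unsigned long long denominator)
{
    assert(denominator > 0);
    return (double)numerator / (double)denominator;
}

/* With the counters from the listing:
     percent_of(962999, 768956)  is about 125.23  (frontend cycles idle)
     percent_of(634360, 768956)  is about  82.50  (backend cycles idle)
     ratio(890060, 768956)       is about   1.16  (insns per cycle)
     ratio(962999, 890060)       is about   1.08  (stalled cycles per insn) */
```

Seen this way, the 125.23% figure simply means the frontend stall counter was larger than the cycle counter for this run.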

What is a cache hit and a cache miss? Why would context-switching cause cache miss?

喜你入骨 posted on 2019-11-28 15:54:57
Question

From the 11th chapter (Performance and Scalability), in the section named Context Switching, of the JCIP book:

"When a new thread is switched in, the data it needs is unlikely to be in the local processor cache, so a context-switch causes a flurry of cache misses, and thus threads run a little more slowly when they are first scheduled."

Can someone explain in an easy-to-understand way the concept of a cache miss and its probable opposite (a cache hit)? Why context-switching would cause a lot of
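A toy model can make the quoted passage concrete. The sketch below simulates a tiny direct-mapped cache. Its 8 lines of 64 bytes, and the names `toy_cache`, `cache_access` and `touch_lines`, are all assumptions chosen for illustration, and real caches are larger and usually set-associative. An access is a hit when the requested line is already present and a miss when it has to be loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u
#define NUM_LINES  8u

struct toy_cache {
    bool     valid[NUM_LINES];
    uint64_t tag[NUM_LINES];
    unsigned hits;
    unsigned misses;
};

void cache_reset(struct toy_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Returns true on a hit. On a miss the line is loaded,
   evicting whatever previously occupied that slot. */
bool cache_access(struct toy_cache *c, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    unsigned slot = (unsigned)(line % NUM_LINES);
    uint64_t tag  = line / NUM_LINES;

    if (c->valid[slot] && c->tag[slot] == tag) {
        c->hits++;
        return true;
    }
    c->valid[slot] = true;
    c->tag[slot] = tag;
    c->misses++;
    return false;
}

/* Touch `count` consecutive cache lines starting at `base`;
   returns how many of those accesses missed. */
unsigned touch_lines(struct toy_cache *c, uint64_t base, unsigned count)
{
    assert(c != NULL);
    unsigned before = c->misses;
    for (unsigned i = 0; i < count; i++)
        cache_access(c, base + (uint64_t)i * LINE_BYTES);
    return c->misses - before;
}
```

If thread A touches its 8 lines twice, the second pass hits every time. If a hypothetical thread B then runs on the same core and touches 8 lines of its own, it evicts A's data, so A's next pass misses on every line again. That is the "flurry of cache misses" after a context switch.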

What's the difference between a word and byte?

随声附和 posted on 2019-11-28 15:07:31
I've done some research. A byte is 8 bits and a word is the smallest unit that can be addressed in memory. The exact length of a word varies. What I don't understand is: what's the point of having a byte? Why not say 8 bits? I asked a prof this question and he said most machines these days are byte-addressable, but what would that make a word?

Byte: Today, a byte is almost always 8 bits. However, that wasn't always the case, and there's no "standard" or something that dictates this. Since 8 bits is a convenient number to work with, it became the de facto standard.

Word: The natural size with

Multiple accesses to main memory and out-of-order execution

眉间皱痕 posted on 2019-11-28 12:50:05
Question

Let us assume that I have two pointers that point to unrelated addresses that are not cached, so both values will have to come all the way from main memory when the pointers are dereferenced.

    int load_and_add(int *pA, int *pB)
    {
        int a = *pA; // will most likely miss in cache
        int b = *pB; // will most likely miss in cache

        // ... some code that does not use a or b

        int c = a + b;
        return c;
    }

If out-of-order execution allows executing the code before the value of c is computed, how will the fetching of

Does Program Counter hold current address or the address of the next instruction?

我怕爱的太早我们不能终老 posted on 2019-11-28 12:29:15
Being a beginner and self-learner, I am learning assembly and currently reading chapter 3 of the book The C Companion by Allen Holub. I can't understand his description of the Program Counter (PC) in an imaginary demo machine with a two-byte word. Here is the description of the PC on page 57:

"The PC always holds the address of the instruction currently being executed. It is automatically updated as each instruction executed to hold the address of the next instruction to be executed. ... ... The important concept here is that the PC holds the address of the next instruction, not the
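The two statements in the quote can be reconciled by looking at when the PC is updated within a fetch-execute cycle. The sketch below is a toy machine invented for illustration, not the demo machine from the book. It uses two-byte words, a word-addressed PC, and three made-up opcodes. The PC is bumped right after the fetch, so while an instruction executes, the PC already points at the next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up opcodes: high byte is the opcode, low byte the operand. */
enum { OP_HALT = 0, OP_ADD = 1, OP_JMP = 2 };
#define INSN(op, arg) ((unsigned short)(((op) << 8) | ((arg) & 0xFF)))

struct toy_cpu {
    size_t pc;  /* index of the next word to fetch */
    int    acc; /* accumulator */
};

/* Runs one fetch-execute cycle. Returns false once HALT is executed. */
bool toy_step(struct toy_cpu *cpu, const unsigned short *mem, size_t mem_words)
{
    assert(cpu->pc < mem_words);

    unsigned short insn = mem[cpu->pc]; /* fetch: PC names this instruction */
    cpu->pc += 1;                       /* PC now names the NEXT instruction */

    unsigned op  = (unsigned)insn >> 8;
    unsigned arg = (unsigned)insn & 0xFFu;

    switch (op) {
    case OP_ADD:
        cpu->acc += (int)arg;
        return true;
    case OP_JMP:
        cpu->pc = arg; /* a jump just overwrites the "next" address */
        return true;
    default:
        return false;  /* OP_HALT or unknown opcode stops the machine */
    }
}
```

In this model a jump needs no special handling of the current instruction's address. It only replaces the address of the next instruction, which is what the PC holds by the time the jump executes.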

With variable length instructions how does the computer know the length of the instruction being fetched? [duplicate]

北慕城南 posted on 2019-11-28 11:20:22
This question already has an answer here: Instruction decoding when instructions are length-variable (4 answers)

In architectures where not all the instructions are the same length, how does the computer know how much to read for one instruction? For example, in Intel IA-32 some instructions are 4 bytes and some are 8 bytes, so how does it know whether to read 4 or 8 bytes? Is it that the first instruction read when the machine is powered on has a known size, and each instruction contains the size of the next one?

First, the processor does not need to know how many bytes to fetch; it can fetch a
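One way to picture variable-length decoding is an encoding in which the leading byte of each instruction determines its total length, so the decoder can find the next instruction boundary without any help from the previous instruction. The sketch below uses a made-up encoding, not IA-32, where the top two bits of the first byte give the number of operand bytes. The names `insn_length` and `count_instructions` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up rule: the top two bits of the first byte say how many
   operand bytes follow (0 to 3), so an instruction is 1 to 4 bytes. */
size_t insn_length(unsigned char first_byte)
{
    return 1u + (size_t)(first_byte >> 6);
}

/* Walks a byte stream one instruction at a time and counts them.
   Returns (size_t)-1 if the final instruction is cut short. */
size_t count_instructions(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t pos = 0;
    size_t count = 0;
    while (pos < len) {
        size_t n = insn_length(buf[pos]);
        if (n > len - pos)
            return (size_t)-1; /* not enough bytes left for this one */
        pos += n;
        count++;
    }
    return count;
}
```

Here the fetch side can simply hand the decoder a block of bytes, and the decoder works out from the bytes themselves where each instruction ends.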