computer-architecture

What are stalled-cycles-frontend and stalled-cycles-backend in 'perf stat' result?

人走茶凉 submitted on 2019-11-28 15:57:05
Does anybody know what stalled-cycles-frontend and stalled-cycles-backend mean in the perf stat output? I searched the internet but did not find an answer. Thanks.

$ sudo perf stat ls

 Performance counter stats for 'ls':

        0.602144 task-clock               #    0.762 CPUs utilized
               0 context-switches         #    0.000 K/sec
               0 CPU-migrations           #    0.000 K/sec
             236 page-faults              #    0.392 M/sec
          768956 cycles                   #    1.277 GHz
          962999 stalled-cycles-frontend  #  125.23% frontend cycles idle
          634360 stalled-cycles-backend   #   82.50% backend cycles idle
          890060 instructions             #    1.16  insns per cycle
                                          #    1.08  stalled cycles per insn
          179378
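The percentages in this output are plain ratios of the raw counters, so they can be checked by hand. The sketch below recomputes them from the numbers in the question; the struct and function names are illustrative and not part of perf, and the "stalled cycles per insn" formula (larger stall counter divided by instructions) is an assumption that happens to reproduce the 1.08 shown. The frontend figure exceeds 100% simply because, in this run, the stalled-cycles-frontend count (962999) is larger than the cycles count (768956).

```c
#include <assert.h>
#include <stdio.h>

/* Raw counter values as printed by perf stat. */
struct perf_counts {
    double cycles;
    double stalled_frontend;
    double stalled_backend;
    double instructions;
};

/* Share of cycles counted as stalled, in percent: stalled / cycles * 100. */
double stalled_percent(double stalled, double cycles)
{
    assert(cycles > 0);
    return 100.0 * stalled / cycles;
}

/* Instructions per cycle: instructions / cycles. */
double insns_per_cycle(const struct perf_counts *c)
{
    assert(c->cycles > 0);
    return c->instructions / c->cycles;
}

/* Assumed formula: the larger of the two stall counters divided by the
   instruction count. */
double stalled_per_insn(const struct perf_counts *c)
{
    double worst = c->stalled_frontend > c->stalled_backend
                       ? c->stalled_frontend
                       : c->stalled_backend;
    assert(c->instructions > 0);
    return worst / c->instructions;
}

/* Print the derived ratios in roughly the form perf uses. */
void print_ratios(const struct perf_counts *c)
{
    printf("%.2f%% frontend cycles idle\n",
           stalled_percent(c->stalled_frontend, c->cycles));
    printf("%.2f%% backend cycles idle\n",
           stalled_percent(c->stalled_backend, c->cycles));
    printf("%.2f insns per cycle\n", insns_per_cycle(c));
    printf("%.2f stalled cycles per insn\n", stalled_per_insn(c));
}
```

Calling print_ratios with {768956, 962999, 634360, 890060} prints 125.23%, 82.50%, 1.16 and 1.08, matching the perf output line for line.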

What is meant by word size in a computer?

我们两清 submitted on 2019-11-28 15:49:18
Question: I have tried to get a grasp of what "word" means, and I have looked at the wiki, but the definition is vague. So my question is: what is "word size"? Is it the width of the data bus, or of the address bus?

Answer 1: "Word size" refers to the number of bits a computer's CPU processes in one go (these days, typically 32 or 64 bits). Data bus size, instruction size and address size are usually multiples of the word size. Just to confuse matters, for backwards compatibility, the Microsoft Windows API defines a
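C has no direct "word size" query, but the width of a pointer and of size_t are common proxies for the native word on mainstream platforms. The sketch below (hypothetical helper names) reports them; treat it as a heuristic, since C does not tie these types to the register width, and ABIs such as Linux x32 use 32-bit pointers on 64-bit x86 processors.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Convert a size in bytes (as returned by sizeof) to bits. */
unsigned bits_in(size_t bytes)
{
    assert(bytes <= UINT_MAX / CHAR_BIT); /* avoid overflow */
    return (unsigned)(bytes * CHAR_BIT);
}

/* Width of a data pointer: usually matches the native word size
   (32 on i386, 64 on x86-64), but C does not guarantee this. */
unsigned pointer_bits(void) { return bits_in(sizeof(void *)); }

/* Width of size_t, the type used for object sizes and array indices. */
unsigned size_t_bits(void) { return bits_in(sizeof(size_t)); }

void print_widths(void)
{
    printf("pointer: %u bits, size_t: %u bits\n",
           pointer_bits(), size_t_bits());
}
```

On a typical x86-64 Linux build, print_widths reports 64 for both; an i386 build reports 32.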

Aligning to cache line and knowing the cache line size

陌路散爱 submitted on 2019-11-28 15:13:30
To prevent false sharing, I want to align each element of an array to a cache line. So first I need to know the size of a cache line, so that I can give each element that many bytes. Second, I want the start of the array to be aligned to a cache line.

I am using Linux on an 8-core x86 platform. First, how do I find the cache line size? Second, how do I align to a cache line in C? I am using the gcc compiler.

So the structure would be the following, for example, assuming a cache line size of 64:

element[0] occupies bytes 0-63
element[1] occupies bytes 64-127
element[2] occupies bytes 128-191

and so on.
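Both parts of the question can be sketched in standard C11 plus one glibc extension. The sketch assumes a compile-time line size of 64 bytes (CACHE_LINE, a hypothetical name), pads each element to a full line with alignas, allocates the array with aligned_alloc so the start is line-aligned, and queries the real L1 data cache line size at run time with sysconf(_SC_LEVEL1_DCACHE_LINESIZE), which is glibc-specific and therefore guarded. With gcc, `__attribute__((aligned(64)))` on the member is an equivalent spelling of alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Line size assumed at compile time; 64 bytes is typical on current x86. */
#define CACHE_LINE 64

/* Each element is aligned to, and therefore padded to, a full line. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};
static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "one element per cache line");

/* Query the L1 data cache line size at run time. _SC_LEVEL1_DCACHE_LINESIZE
   is a glibc extension, so fall back to CACHE_LINE if it is unavailable. */
long cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return n;
#endif
    return CACHE_LINE;
}

/* Allocate n zeroed counters whose array start is line-aligned.
   C11 aligned_alloc wants the size to be a multiple of the alignment,
   which holds because sizeof(struct padded_counter) == CACHE_LINE. */
struct padded_counter *alloc_counters(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(struct padded_counter))
        return NULL;
    struct padded_counter *p = aligned_alloc(CACHE_LINE, n * sizeof *p);
    if (p != NULL)
        for (size_t i = 0; i < n; i++)
            p[i].value = 0;
    return p;
}
```

Because CACHE_LINE is fixed at compile time, a program could compare it with cache_line_size() at startup and warn on a mismatch. From the shell, `getconf LEVEL1_DCACHE_LINESIZE` or /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size report the same value on typical Linux systems.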

Where is -32768 coming from?

点点圈 submitted on 2019-11-28 11:25:20
Question: This is the LC3 assembly code I am working with:

.ORIG x3000
LOOP LDI R0, KBSR
     BRzp LOOP

From LC3 Assembly, I know that LDI is the load-indirect addressing mode, meaning it reads an address stored at a location and then reads the value at that address. From LC3 Keyboard, I know that KBSR is the keyboard status register, which is 1 when the keyboard has received a new character.

Here is my test run in the LC3 simulator. I entered the character 'a'. After executing LDI R0, KBSR, register 0 stores a value
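Assuming the value shown was -32768 (as the title suggests), it is most likely the ready bit itself. On the LC-3 the KBSR is memory-mapped at xFE00 with bit 15 as the ready bit, so once a character arrives, LDI (through a KBSR label assumed to hold .FILL xFE00, not shown in the question) loads x8000, which is -32768 when a 16-bit word is read as two's complement. The sketch below models that reading and the BRzp test; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* KBSR bit 15 is the "ready" bit: set when a new character has arrived. */
#define KBSR_READY 0x8000u

/* Read a 16-bit LC-3 word as two's complement, as the simulator does when
   it shows a register in decimal. Plain arithmetic avoids relying on
   implementation-defined narrowing casts. */
int32_t lc3_signed(unsigned word)
{
    assert(word <= 0xFFFFu); /* LC-3 words are 16 bits wide */
    return (word & 0x8000u) ? (int32_t)word - 0x10000 : (int32_t)word;
}

/* BRzp is taken when the value just loaded is zero or positive, i.e. while
   bit 15 is clear, so LOOP keeps polling until the ready bit is set. */
int brzp_taken(unsigned r0)
{
    return lc3_signed(r0) >= 0;
}
```

Here lc3_signed(KBSR_READY) is -32768 and brzp_taken(KBSR_READY) is 0, so the loop exits. The character 'a' itself is read from the keyboard data register (KBDR, at xFE02), not from KBSR.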

Why does a misaligned address access incur 2 or more accesses?

元气小坏坏 submitted on 2019-11-28 10:29:41
The usual answers to why data alignment matters are that it makes access more efficient and simplifies the design of the CPU. A relevant question and its answers are here, and another source is here, but neither resolves my question.

Suppose a CPU has an access granularity of 4 bytes, meaning the CPU reads 4 bytes at a time. Both sources I listed above say that if I access misaligned data, say at address 0x1, then the CPU has to do 2 accesses (one from addresses 0x0, 0x1, 0x2 and 0x3, one from addresses 0x4, 0x5, 0x6 and 0x7) and combine the results. I can't see why. Why can't the CPU just read data
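One way to see the answer: if the memory (or cache) can only hand back whole aligned 4-byte words, then there is no single row starting at address 0x1; the requested bytes live in two different words. The sketch below models that assumption with hypothetical function names: blocks_touched counts how many aligned blocks an access overlaps, and load32 performs a misaligned 4-byte load on a little-endian word-addressed memory by reading both words and shifting and merging the pieces.

```c
#include <assert.h>
#include <stdint.h>

/* How many aligned `granule`-byte blocks an access of `size` bytes at
   `addr` overlaps: each one is a separate access for such a memory. */
unsigned blocks_touched(uintptr_t addr, unsigned size, unsigned granule)
{
    assert(size > 0 && granule > 0);
    uintptr_t first = addr / granule;
    uintptr_t last = (addr + size - 1) / granule;
    return (unsigned)(last - first + 1);
}

/* Model of a memory that can only deliver whole, aligned 32-bit words
   (little-endian). A misaligned 4-byte load needs two word reads whose
   halves are shifted and merged. */
uint32_t load32(const uint32_t *words, uintptr_t addr)
{
    uintptr_t i = addr / 4;
    unsigned shift = (unsigned)(addr % 4) * 8;
    if (shift == 0)
        return words[i];                   /* aligned: one read */
    return (words[i] >> shift)             /* tail of the first word */
         | (words[i + 1] << (32 - shift)); /* head of the second word */
}
```

With bytes 00, 11, ..., 77 stored at addresses 0 to 7 (words 0x33221100 and 0x77665544), load32 at address 1 returns 0x44332211 after two reads. The granule parameter also covers wider units: blocks_touched(0x3C, 8, 64) is 2, an 8-byte access that straddles two 64-byte cache lines.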

How do x86 page tables work?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-28 04:23:56
I'm familiar with the MIPS architecture, which has a software-managed TLB, so how and where you (the operating system) want to store the page tables and the page table entries is completely up to you. For example, I did a project with a single inverted page table; I saw others using 2-level page tables per process.

But what's the story with x86? From what I know, the TLB is hardware-managed. Does x86 basically tell you, "Hey, this is where the page table entries you're currently using need to go [physical address range]"? But wait, I've always thought x86 uses multi-level page tables, so
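On x86 the page-table format is fixed by the architecture: the OS builds the tables in ordinary physical memory, loads the physical address of the top-level table into CR3, and on a TLB miss the hardware walks the tables itself. The sketch below (hypothetical names) shows only how a virtual address is split into per-level indices for 4-level x86-64 paging with 4 KiB pages; 32-bit non-PAE paging instead uses a 10/10/12 split over two levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Table indices and page offset used by a 4-level x86-64 page walk with
   4 KiB pages and 48-bit virtual addresses. Each table is one 4 KiB page
   of 512 eight-byte entries, so every index is 9 bits wide. */
struct walk {
    unsigned pml4, pdpt, pd, pt; /* indices into the four levels */
    unsigned offset;             /* byte offset within the 4 KiB page */
};

/* Bits 63..47 must all equal bit 47 (a "canonical" address). */
bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;
    return top == 0 || top == 0x1FFFF;
}

struct walk split_va(uint64_t va)
{
    assert(is_canonical(va));
    struct walk w;
    w.pml4   = (unsigned)((va >> 39) & 0x1FF);
    w.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    w.pd     = (unsigned)((va >> 21) & 0x1FF);
    w.pt     = (unsigned)((va >> 12) & 0x1FF);
    w.offset = (unsigned)(va & 0xFFF);
    return w;
}
```

For example, split_va(0x00007FFFFFFFF000) gives indices 255, 511, 511, 511 and offset 0: the last page of the lower half of the address space.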

CPU Switches from User Mode to Kernel Mode: What exactly does it do? How does it make this transition?

两盒软妹~` submitted on 2019-11-27 19:23:04
CPU switches from user mode to kernel mode: what exactly does it do? How does it make this transition? EDIT: Even if it is architecture-dependent, please provide me with an answer. The architecture is up to you; tell me about the architecture you know. I want to get an idea of all the things involved.

Note: this is mostly relevant to the x86 architecture. Here's a somewhat simplified explanation. The transition is usually caused by one of the following:

Fault (e.g. a page fault or some other exception caused by executing an instruction)
Interrupt (e.g. a keyboard interrupt
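The fault case can be observed from user space on Linux. The sketch below (assuming glibc, mmap and getrusage; the function names are hypothetical) maps fresh anonymous pages and writes one byte into each. Each first write raises a page fault: the CPU switches to kernel mode, the kernel allocates a page, and execution returns to the faulting instruction in user mode. The kernel counts these as minor faults. With transparent huge pages enabled, one fault may map a larger page, so the count can be lower than the number of pages touched.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken so far by this process, as counted by the kernel. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Map `pages` fresh anonymous pages, write one byte into each, and return
   how many minor faults that caused (-1 on error). */
long faults_from_touching(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(pages > 0 && page > 0);
    size_t len = pages * (size_t)page;
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    volatile unsigned char *p = m; /* volatile: keep every write */
    long before = minor_faults();
    for (size_t i = 0; i < len; i += (size_t)page)
        p[i] = 1; /* first write to each page traps into the kernel */
    long after = minor_faults();
    munmap(m, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Calling faults_from_touching(16) typically returns a value close to 16 on a system without huge-page backing for the mapping.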

What branch misprediction does the Branch Target Buffer detect?

我是研究僧i submitted on 2019-11-27 15:00:21
Question: I am currently looking at the various parts of the CPU pipeline that can detect branch mispredictions. I have found these to be:

1. Branch Target Buffer (BPU CLEAR)
2. Branch Address Calculator (BA CLEAR)
3. Jump Execution Unit (not sure of the signal name here??)

I know what 2 and 3 detect, but I do not understand what misprediction is detected within the BTB. The BAC detects where the BTB has erroneously predicted a branch for a non-branch instruction, where the BTB has failed to detect a branch, or
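The first BAC case in the question, a branch predicted for a non-branch instruction, can arise from aliasing: a BTB is looked up by fetch address before the bytes are decoded, and if it keeps only a partial tag, an unrelated address can hit an entry. The toy model below is a direct-mapped table with 16 entries and 4-bit tags; all names and sizes are hypothetical and this is not Intel's design. It illustrates such a false hit and does not attempt to model what the BPU CLEAR signal itself reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy direct-mapped BTB; the sizes are illustrative assumptions. */
#define BTB_ENTRIES 16u /* must be a power of two */
#define TAG_BITS 4u     /* partial tag: only a few address bits are kept */
static_assert((BTB_ENTRIES & (BTB_ENTRIES - 1)) == 0, "power of two");

struct btb_entry {
    bool valid;
    uint32_t tag;
    uint32_t target; /* predicted target of the branch at this address */
};

struct btb {
    struct btb_entry e[BTB_ENTRIES];
};

static uint32_t btb_index(uint32_t pc) { return pc & (BTB_ENTRIES - 1); }

static uint32_t btb_tag(uint32_t pc)
{
    return (pc / BTB_ENTRIES) & ((1u << TAG_BITS) - 1);
}

/* Consulted at fetch, before decode: a hit means "predict a taken
   branch to *target". */
bool btb_lookup(const struct btb *b, uint32_t pc, uint32_t *target)
{
    const struct btb_entry *en = &b->e[btb_index(pc)];
    if (en->valid && en->tag == btb_tag(pc)) {
        *target = en->target;
        return true;
    }
    return false;
}

/* Record a taken branch once its real target is known. */
void btb_update(struct btb *b, uint32_t pc, uint32_t target)
{
    struct btb_entry *en = &b->e[btb_index(pc)];
    en->valid = true;
    en->tag = btb_tag(pc);
    en->target = target;
}
```

After btb_update(&b, 0x10, 0x80), a lookup at 0x110 also hits (same index, same 4-bit tag) and predicts a jump to 0x80, even if 0x110 holds an ordinary instruction; only decode can reveal the mistake.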

Aligning to cache line and knowing the cache line size

萝らか妹 submitted on 2019-11-27 09:04:34
Question: To prevent false sharing, I want to align each element of an array to a cache line. So first I need to know the size of a cache line, so that I can give each element that many bytes. Second, I want the start of the array to be aligned to a cache line.

I am using Linux on an 8-core x86 platform. First, how do I find the cache line size? Second, how do I align to a cache line in C? I am using the gcc compiler.

So the structure would be the following, for example, assuming a cache line size of 64.

What happens when a computer program runs?

旧城冷巷雨未停 submitted on 2019-11-27 02:28:11
I know the general theory, but I can't fit in the details. I know that a program resides in the secondary memory of a computer. Once the program begins execution, it is entirely copied to RAM. Then the processor retrieves a few instructions at a time (how many depends on the size of the bus), puts them in registers and executes them.

I also know that a computer program uses two kinds of memory, stack and heap, which are also part of the computer's primary memory. The stack is used for non-dynamic memory, and the heap for dynamic memory (for example, everything related to the new operator in C++
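The stack/heap distinction in the question can be made concrete in C, where malloc plays roughly the role that the C++ new operator plays. The sketch below uses hypothetical function names: one uses automatic storage that exists only while the function runs, the other allocates a block whose size is chosen at run time and which outlives the call. Where automatic variables actually live is up to the compiler; they are normally in the stack frame but may stay in registers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Automatic storage: the array exists only while the function runs;
   compilers normally place it in the current stack frame. */
int sum_local_array(void)
{
    int local[4] = {1, 2, 3, 4};
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += local[i];
    return sum; /* `local` is gone after this return */
}

/* Dynamic storage: the size is chosen at run time, the block outlives the
   call, and the caller must release it with free(). */
int *make_sequence(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(int)); /* guard n * sizeof(int) overflow */
    int *p = malloc(n * sizeof *p);
    if (p != NULL)
        for (size_t i = 0; i < n; i++)
            p[i] = (int)i;
    return p;
}
```

Returning a pointer to `local` from sum_local_array would be a bug, because its storage is reclaimed on return; returning the malloc'd pointer from make_sequence is fine until the caller frees it.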