cpu-architecture

Associativity gives us parallelizability. But what does commutativity give?

…衆ロ難τιáo~ submitted on 2019-11-26 21:16:10
Question: Alexander Stepanov notes in one of his brilliant lectures at A9 (highly recommended, by the way) that the associative property gives us parallelizability, an extremely useful and important trait these days that compilers, CPUs and programmers themselves can leverage:

    // expressions in parentheses can be done in parallel
    // because matrix multiplication is associative
    Matrix X = (A * B) * (C * D);

But what, if anything, does the commutative property give us? Reordering? Out of order
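A small C sketch of the distinction, using 2x2 integer matrices as a stand-in for the Matrix type in the question (Mat2, mat_mul and mat_eq are hypothetical names). Associativity is what lets (A * B) and (C * D) be computed independently and then combined; because matrix multiplication is not commutative, the two partial results must still be combined in their original order. For an operation that is both associative and commutative, such as integer addition, partial results could be combined in whichever order they become available.

```c
/* Hypothetical 2x2 integer matrix, standing in for the question's Matrix. */
typedef struct { long a, b, c, d; } Mat2;   /* [[a, b], [c, d]] */

/* Ordinary matrix product: associative, but not commutative. */
static Mat2 mat_mul(Mat2 x, Mat2 y) {
    Mat2 r = {
        x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
        x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d
    };
    return r;
}

/* Element-wise equality check. */
static int mat_eq(Mat2 x, Mat2 y) {
    return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
}
```

With A = [[1,2],[3,4]], B = [[0,1],[1,0]], C = [[2,0],[0,1]] and D = [[1,1],[0,1]], the sequential product ((A*B)*C)*D equals the grouped (A*B)*(C*D), while combining the same two partial results as (C*D)*(A*B) gives a different matrix.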

What is the stack engine in the Sandybridge microarchitecture?

…衆ロ難τιáo~ submitted on 2019-11-26 21:02:09
I am reading http://www.realworldtech.com/sandy-bridge/ and I'm having trouble understanding a few points:

The dedicated stack pointer tracker is also present in Sandy Bridge and renames the stack pointer, eliminating serial dependencies and removing a number of uops.

What actually is a dedicated stack pointer tracker?

For Sandy Bridge (and the P4), Intel still uses the term ROB. But it is critical to understand that, in this context, it only refers to the status array for in-flight uops.

What does this actually mean? Please clarify.

As Agner Fog's microarchitecture doc explains, the stack
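A deliberately simplified toy model of the idea, loosely following the behavior Agner Fog describes: instead of issuing an ALU uop to update RSP for every push or pop, the front end keeps a running offset, and only when an instruction reads RSP explicitly while that offset is nonzero does it insert an extra synchronization uop. The enum, the function name and the 8-byte slot size are assumptions for illustration, not Intel's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds for this toy model; not a real decoder. */
typedef enum { OP_PUSH, OP_POP, OP_READ_RSP } ToyOp;

/* Counts the sync uops the toy stack engine would insert.
   push/pop only adjust a pending offset (no ALU uop, no serial RSP dependency);
   an explicit RSP read needs one sync uop if the offset is nonzero. */
static int count_sync_uops(const ToyOp *ops, size_t n) {
    assert(ops != NULL || n == 0);
    long offset = 0;   /* pending RSP delta in bytes, assuming 8-byte slots */
    int syncs = 0;
    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_PUSH: offset -= 8; break;
        case OP_POP:  offset += 8; break;
        case OP_READ_RSP:
            if (offset != 0) {   /* architectural RSP is stale: synchronize */
                syncs++;
                offset = 0;
            }
            break;
        }
    }
    return syncs;
}
```

For example, push, push, read RSP, pop, read RSP, read RSP needs two sync uops in this model, while push, pop, read RSP needs none because the offset is back to zero.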

Does an x86 CPU reorder instructions?

故事扮演 submitted on 2019-11-26 21:00:40
Question: I have read that some CPUs reorder instructions, but that this is not a problem for single-threaded programs (the instructions are still reordered in single-threaded programs, but it appears as if they were executed in order); it is only a problem for multithreaded programs. To solve the instruction-reordering problem, we can insert memory barriers in the appropriate places in the code. But does an x86 CPU reorder instructions? If it does not, then there is no need to use
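As an illustration, here is the classic store-buffer litmus test written with C11 atomics and POSIX threads. x86 is documented to allow a later load to complete before an earlier store to a different address becomes globally visible (StoreLoad reordering), so without the fences the outcome r1 == 0 && r2 == 0 can occur on real hardware. With atomic_thread_fence(memory_order_seq_cst) between each store and load, C11 forbids that outcome. The names x, y, r1, r2, thread_a, thread_b and store_buffer_trial are hypothetical, and this is a sketch rather than a reliable reordering detector.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;   /* shared flags, reset before each trial */
static int r1, r2;        /* each written by one thread, read after join */

static void *thread_a(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* Full barrier; on x86 compilers typically emit mfence or a locked instruction. */
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs one trial; returns 1 if both loads saw 0 (the StoreLoad-reordered
   outcome), which the seq_cst fences rule out. */
static int store_buffer_trial(void) {
    pthread_t a, b;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    int rc = pthread_create(&a, NULL, thread_a, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, thread_b, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return r1 == 0 && r2 == 0;
}
```

Removing the two fences makes the forbidden outcome legal, but a single trial rarely shows it because thread start-up dominates; practical demonstrations run many iterations with the threads released at the same moment.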

Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC

╄→尐↘猪︶ㄣ submitted on 2019-11-26 20:21:50
On recent CPUs (at least for the last decade or so), Intel has offered three fixed-function hardware performance counters in addition to various configurable performance counters. The three fixed counters are INST_RETIRED.ANY, CPU_CLK_UNHALTED.THREAD and CPU_CLK_UNHALTED.REF_TSC. The first counts retired instructions, the second the number of actual cycles, and the last is what interests us. The description in Volume 3 of the Intel Software Developer's Manual is: This event counts the number of reference cycles at the TSC rate when the core is not in a halt state and not in a TM stop-clock state. The core
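A rough sketch of the arithmetic these counters support, assuming both deltas are sampled over the same interval on the same core: if REF_TSC advances at the TSC rate only while the core is unhalted, then its delta divided by the rdtsc delta gives the fraction of time spent unhalted, and the ratio CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC scales the TSC frequency to the average core frequency while unhalted. The function names are hypothetical, and reading the counters themselves (for example through perf or rdpmc) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of the interval the core was unhalted, assuming REF_TSC ticks at
   the TSC rate only while unhalted (per the SDM description above). */
static double unhalted_fraction(uint64_t ref_tsc_delta, uint64_t rdtsc_delta) {
    assert(rdtsc_delta != 0);
    return (double)ref_tsc_delta / (double)rdtsc_delta;
}

/* Average core frequency while unhalted:
   TSC frequency * CPU_CLK_UNHALTED.THREAD / CPU_CLK_UNHALTED.REF_TSC. */
static double avg_unhalted_hz(double tsc_hz, uint64_t core_cycles, uint64_t ref_cycles) {
    assert(ref_cycles != 0);
    return tsc_hz * (double)core_cycles / (double)ref_cycles;
}
```

Under these assumptions, a busy loop that never halts should give an unhalted fraction close to 1; a noticeably smaller value over such an interval is the kind of gap the question title calls "lost cycles".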

Why are C++ int and long types both 4 bytes?

这一生的挚爱 submitted on 2019-11-26 20:13:29
Question: Many sources, including Microsoft, describe both the int and long types as being 4 bytes, with a (signed) range of -2,147,483,648 to 2,147,483,647. What is the point of having a long primitive type if it doesn't actually provide a larger range of values?

Answer 1: The only things guaranteed about integer types are:

    sizeof(char) == 1
    sizeof(char) <= sizeof(short)
    sizeof(short) <= sizeof(int)
    sizeof(int) <= sizeof(long)
    sizeof(long) <= sizeof(long long)
    sizeof(char) * CHAR_BIT >= 8
    sizeof(short
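A small C11 sketch that turns the guarantees listed above into compile-time checks and prints the sizes a particular compiler actually chose. The sizes differ by data model: 64-bit Windows uses LLP64, where long stays 32 bits, while 64-bit Linux and macOS use LP64, where long is 64 bits. The standard also sets minimum ranges, so long must cover at least the 32-bit signed range. The function name print_integer_sizes is hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* The ordering guarantees from the answer, checked at compile time (C11). */
_Static_assert(sizeof(char) == 1, "char is one byte by definition");
_Static_assert(sizeof(char) <= sizeof(short), "char <= short");
_Static_assert(sizeof(short) <= sizeof(int), "short <= int");
_Static_assert(sizeof(int) <= sizeof(long), "int <= long");
_Static_assert(sizeof(long) <= sizeof(long long), "long <= long long");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
/* Minimum range required by the standard for long. */
_Static_assert(LONG_MAX >= 2147483647L, "long covers the 32-bit signed range");

/* Prints the sizes (in bytes) chosen by this compiler and target. */
static void print_integer_sizes(void) {
    printf("short=%zu int=%zu long=%zu long long=%zu\n",
           sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
}
```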

How can I get the iOS device CPU architecture at runtime?

为君一笑 submitted on 2019-11-26 19:38:34
Question: Is there a way to identify the iOS device CPU architecture at runtime? Thank you.

Answer 1: You can use sysctlbyname:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <mach/machine.h>

    NSString *getCPUType(void) {
        NSMutableString *cpu = [[NSMutableString alloc] init];
        size_t size;
        cpu_type_t type;
        cpu_subtype_t subtype;

        size = sizeof(type);
        sysctlbyname("hw.cputype", &type, &size, NULL, 0);

        size = sizeof(subtype);
        sysctlbyname("hw.cpusubtype", &subtype, &size, NULL, 0);

        // values for

Why is the page size of Linux (x86) 4 KB, and how is that calculated?

我怕爱的太早我们不能终老 submitted on 2019-11-26 19:35:43
Question: The default memory page size of the Linux kernel on the x86 architecture is 4 KB. I wonder how that was calculated, and why?

Answer 1: The default page size is fixed by what the MMU (memory management unit) of the CPU supports. In 32-bit protected mode, x86 supports two kinds of pages: normal pages of 4 KiB and huge pages of 4 MiB. Not all x86 processors support large pages; one needs a CPU with Page Size Extension (PSE) capabilities. This excludes pre-Pentium processors. Virtually all current
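A sketch of where 4 KiB and 4 MiB come from in classic 32-bit (non-PAE) x86 paging: a 32-bit linear address is split 10/10/12, so a page holds 2^12 = 4096 bytes, and a page table of 1024 four-byte entries is itself exactly one 4 KiB page. When PSE lets a page-directory entry map memory directly, the low 22 bits become the offset, giving 2^22 bytes = 4 MiB. The struct and helper names are hypothetical; sysconf(_SC_PAGESIZE) is the POSIX way to ask the running system for its page size.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <unistd.h>

/* Hypothetical decomposition of a 32-bit linear address (non-PAE paging). */
typedef struct {
    uint32_t dir;     /* bits 31..22: page-directory index (1024 entries) */
    uint32_t table;   /* bits 21..12: page-table index (1024 entries) */
    uint32_t offset;  /* bits 11..0: byte offset inside a 4 KiB page */
} LinearAddr;

static LinearAddr split_4k(uint32_t addr) {
    LinearAddr la = { addr >> 22, (addr >> 12) & 0x3FFu, addr & 0xFFFu };
    return la;
}

/* With PSE, the directory entry maps a 4 MiB page: the low 22 bits are the offset. */
static uint32_t offset_4m(uint32_t addr) {
    return addr & 0x3FFFFFu;
}

/* Page size reported by the running system. */
static long runtime_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}
```

For example, 0x12345678 splits into directory index 0x48, table index 0x345 and offset 0x678 with 4 KiB pages, or offset 0x345678 within a 4 MiB page.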

Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths

拈花ヽ惹草 submitted on 2019-11-26 19:09:48
I was playing with the code in this answer, slightly modifying it:

    BITS 64
    GLOBAL _start

    SECTION .text

    _start:
        mov ecx, 1000000

    .loop:
        ;T is a symbol defined with the CLI (-DT=...)
        TIMES T imul eax, eax
        lfence
        TIMES T imul edx, edx

        dec ecx
        jnz .loop

        mov eax, 60      ;sys_exit
        xor edi, edi
        syscall

Without the lfence, the results I get are consistent with the static analysis in that answer. When I introduce a single lfence, I'd expect the CPU to execute the imul edx, edx sequence of the k-th iteration in parallel with the imul eax, eax sequence of the next (k+1-th) iteration. Something like this

Difference between word addressable and byte addressable

核能气质少年 submitted on 2019-11-26 18:55:57
Question: Can someone explain the difference between word-addressable and byte-addressable memory? How is it related to memory size, etc.?

Answer 1: A byte is a unit of memory storage, and a memory chip is full of such bytes. Memory units are addressable; that is the only way we can use memory. In reality, memory is only byte-addressable. That means a binary address always points to a single byte only. A word is just a group of bytes (2, 4 or 8, depending on the data bus size of the CPU). To understand the memory
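A minimal sketch of the arithmetic behind the distinction, with hypothetical helper names: on a byte-addressable machine each address names one byte, so a byte sits in word number addr / word_size at position addr % word_size within that word. With the same number of address bits, a word-addressable design reaches word_size times as many bytes, because each address names a whole word instead of a single byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which word a byte lives in, and where inside it. */
typedef struct { uint64_t word_index; unsigned byte_in_word; } WordPos;

static WordPos locate_byte(uint64_t byte_addr, unsigned word_bytes) {
    assert(word_bytes > 0);
    WordPos p = { byte_addr / word_bytes, (unsigned)(byte_addr % word_bytes) };
    return p;
}

/* Bytes reachable with addr_bits address bits when each address names one
   unit of unit_bytes (1 = byte-addressable, word size = word-addressable). */
static uint64_t addressable_bytes(unsigned addr_bits, unsigned unit_bytes) {
    assert(addr_bits < 64);
    return ((uint64_t)1 << addr_bits) * unit_bytes;
}
```

For example, byte address 10 with 4-byte words is byte 2 of word 2, and 16 address bits reach 65,536 bytes if byte-addressable but 262,144 bytes if each address names a 4-byte word.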

What is the difference between Trap and Interrupt?

风流意气都作罢 submitted on 2019-11-26 17:53:09
Question: What is the difference between a trap and an interrupt? If the terminology differs between systems, what do they mean on x86?

Answer 1: A trap is an exception in a user process. It's caused by, for example, division by zero or an invalid memory access. It's also the usual way to invoke a kernel routine (a system call), because those run with higher privilege than user code. Handling is synchronous (so the user code is suspended and continues afterwards). In a sense they are "active": most of the time,
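A Linux-only sketch of a system call as a synchronous trap into the kernel: the C library's syscall(2) wrapper executes the architecture's system-call instruction, the calling thread is suspended while the kernel runs, and execution continues afterwards with the result. Hardware interrupts, by contrast, arrive asynchronously and are not something user code triggers this way. raw_getpid is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Enters the kernel directly via the system-call mechanism and returns
   when the kernel has finished handling the request. */
static long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

The result should match the ordinary getpid() wrapper, which makes the same request.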