cpu-architecture | 易学教程

Why do we need to compile for different platforms (e.g. Windows/Linux)?

I've learned the basics about CPUs/ASM/C and don't understand why we need to compile C code differently for different OS targets. What the compiler does is generate assembly code, which then gets assembled into binary machine code. The assembly code is of course different per CPU architecture (e.g. ARM), because the instruction set architecture is different. But as Linux and Windows run on the same CPU, machine operations like MOV/ADD/... should be identical. While I do know that there are OS-specific functions, like printing to a terminal, this functionality could be provided by different implementations…
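The intuition about MOV/ADD is right: the same x86 instructions run unchanged under both OSes. What differs is every point where the program talks to the OS. A minimal sketch, assuming a C compiler that sets the usual predefined macros (`_WIN32`, `__linux__`, `__APPLE__`): one hypothetical `write_stdout` call site compiles to the POSIX `write` call on Linux and to `WriteFile` on Windows, so the generated code and the imported symbols differ even on the same CPU. The names `target_os` and `write_stdout` are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Which OS this translation unit is being compiled for (predefined macros). */
const char *target_os(void) {
#if defined(_WIN32)
    return "Windows";
#elif defined(__linux__)
    return "Linux";
#elif defined(__APPLE__)
    return "Apple";
#else
    return "other";
#endif
}

/* One portable call site, two different OS entry points underneath.
   Returns the number of bytes written, or -1 on failure. */
long write_stdout(const char *msg) {
    assert(msg != NULL);
    size_t len = strlen(msg);
#ifdef _WIN32
    DWORD written = 0;
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (!WriteFile(out, msg, (DWORD)len, &written, NULL))
        return -1;
    return (long)written;
#else
    ssize_t n = write(STDOUT_FILENO, msg, len); /* POSIX write(2) */
    return (long)n;
#endif
}
```

Beyond what this sketch shows, the executable container (ELF on Linux, PE on Windows) and the default calling convention (System V vs. Microsoft x64 on x86-64) also differ, so a linked binary is not interchangeable between the two OSes.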

CPU and GPU differences

Question: What is the difference between a single processing unit of a CPU and a single processing unit of a GPU? Most places I've come across on the internet cover the high-level differences between the two. I want to know what instructions each can perform, how fast they are, and how these processing units are integrated into the complete architecture. It seems like a question with a long answer, so lots of links are fine. Edit: In the CPU, the FPU runs real-number operations. How fast are the same…

Differences between ARM “versions?” (ARMv7 only)

Question: Basically, I would like to know the difference between ARMv7l and ARMv7hl. I have an ARM processor that reports armv7l, and there are a lot of RPMs for armv7hl. I don't know exactly what to search for to get information about that. What is this "suffix" called? Are there any other types? What do they do differently? Answer 1: I would assume that it's indicating packages compiled for little-endian ("l") and hard-float ("h") ABI as appropriate, i.e. it's a software thing and only tangentially related to…
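Following the answer's reading of the suffix, a small sketch can show what a given build actually targets. It assumes GCC or Clang: a runtime byte-order check, plus the compiler's predefined float-ABI macros on 32-bit ARM (`__ARM_PCS_VFP` for the hard-float calling convention, `__SOFTFP__` for pure soft-float). The helper names `is_little_endian` and `arm_float_abi` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime check: is the lowest-addressed byte of the integer 1 equal to 1? */
int is_little_endian(void) {
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    /* Cross-check against the compiler's own view (GCC/Clang predefined macros). */
    assert((first == 1) == (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__));
#endif
    return first == 1;
}

/* Which float ABI this code was compiled for on 32-bit ARM. */
const char *arm_float_abi(void) {
#if !defined(__arm__)
    return "not 32-bit ARM";
#elif defined(__ARM_PCS_VFP)
    return "hard-float: FP arguments passed in VFP registers";
#elif defined(__SOFTFP__)
    return "soft-float: no FPU instructions used";
#else
    return "softfp: FPU instructions, but FP arguments passed in integer registers";
#endif
}
```

Because hard-float and soft-float code disagree on where floating-point arguments live, libraries built one way generally cannot be called from code built the other way, which is why such builds are packaged separately.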

How to deal with linker error: cannot find -lgcc

This is my makefile:

task0: main.o numbers.o add.o
	gcc -m32 -g -Wall -o task0 main.o numbers.o add.o

main.o: main.c
	gcc -g -Wall -m32 -ansi -c -o main.o main.c

numbers.o: numbers.c
	gcc -g -Wall -m32 -ansi -c -o numbers.o numbers.c

add.o: add.s
	nasm -g -f elf -w+all -o add.o add.s

clean:
	rm -f *.o task0

And this is the terminal output:

gcc -m32 -g -Wall -o task0 main.o numbers.o add.o
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc_s.so when…

What does “subsequent read” mean in the context of volatile variables?

Question: The Java memory visibility documentation says: "A write to a volatile field happens-before every subsequent read of that same field." I'm confused about what "subsequent" means in the context of multithreading. Does this sentence imply some global clock for all processors and cores? For example, if I assign a value to a variable in cycle c1 in one thread, is a second thread then able to see this value in the subsequent cycle c1 + 1? Answer 1: It sounds to me like it's saying that it provides lockless acquire…
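In the Java Language Specification, "subsequent" is defined by the synchronization order, not by a global clock cycle: what matters is that the volatile read comes after the write in that order, i.e. it observes the written value. The answer's acquire/release reading can be sketched in C11, assuming `<stdatomic.h>` and POSIX threads. This is an analogy, not a translation: Java `volatile` accesses are in fact stronger (sequentially consistent). `run_publish_demo`, `writer`, and `reader` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;       /* ordinary, non-atomic data */
static atomic_bool ready; /* plays the role of the Java volatile field */

static void *writer(void *arg) {
    (void)arg;
    payload = 42;                                              /* plain write */
    atomic_store_explicit(&ready, true, memory_order_release); /* "volatile write" */
    return NULL;
}

static void *reader(void *arg) {
    int *seen = arg;
    /* Spin until the acquire load observes the release store. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    /* The release/acquire pair makes the earlier plain write visible here. */
    assert(payload == 42);
    *seen = payload;
    return NULL;
}

/* Runs one writer and one reader; returns the value the reader saw, or -1. */
int run_publish_demo(void) {
    pthread_t w, r;
    int seen = 0;
    payload = 0;
    atomic_store(&ready, false);
    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, &seen) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

The guarantee is conditional: only a read that actually sees `ready == true` is promised to also see `payload == 42`; nothing says how many cycles that takes.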

How does CLFLUSH work for an address that is not in cache yet?

Question: We are trying to use the Intel CLFLUSH instruction to flush the cache contents of a process in Linux from user space. We created a very simple C program that first accesses a large array and then calls CLFLUSH to flush the virtual address space of the whole array. We measure how long CLFLUSH takes to flush the whole array. The array size is an input to the program, and we vary it from 1 MB to 40 MB in steps of 2 MB. In our understanding, CLFLUSH should flush the…
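The measurement described above might look roughly like this. It is a sketch assuming x86 with SSE2 and 64-byte cache lines (`LINE_SIZE` is an assumption, not queried from the CPU): one `_mm_clflush` per line covering the buffer, timed with `clock_gettime`. The loop issues the instruction for every line, cached or not, which matches the question's setup. `flush_range_ns` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush, _mm_mfence */
#endif

#define LINE_SIZE 64 /* assumed cache-line size in bytes */

/* Flushes every cache line covering [buf, buf + len) and returns the elapsed
   time in nanoseconds, or -1 when CLFLUSH is not available for this target. */
long long flush_range_ns(const void *buf, size_t len) {
    assert(buf != NULL || len == 0);
#if defined(__SSE2__)
    uintptr_t first = (uintptr_t)buf & ~(uintptr_t)(LINE_SIZE - 1); /* align down */
    uintptr_t end = (uintptr_t)buf + len;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uintptr_t addr = first; addr < end; addr += LINE_SIZE)
        _mm_clflush((const void *)addr);
    _mm_mfence(); /* order all flushes before taking the second timestamp */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
           (long long)(t1.tv_nsec - t0.tv_nsec);
#else
    (void)buf;
    (void)len;
    return -1;
#endif
}
```

Calling it right after filling the array, and then a second time immediately afterwards (when the lines were just evicted), is one way to compare the cached and uncached cases.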

Why do INC and ADD 1 have different performance? [duplicate]

This question already has answers here: INC instruction vs ADD 1: Does it matter? (2 answers). I've read many times over the years that you should do XOR ax, ax because it is faster; or that when programming in C you should use counter++ or counter+=1 because they would compile to INC or ADD; or that on the NetBurst Pentium 4, INC was slower than ADD 1, so the compiler had to be told that the target was NetBurst so it would translate every var++ into ADD 1. My question is: why do INC and ADD have different performance? Why, for example, was INC claimed to be slower on NetBurst while faster than ADD on other…
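One commonly cited reason the two differ: INC updates every arithmetic flag except CF, while ADD writes all of them, including CF. Intel's optimization guidance for Pentium 4 recommended ADD/SUB over INC/DEC for this reason, since preserving CF makes INC depend on whichever earlier instruction last set the flags. The sketch below, assuming GCC/Clang inline assembly on x86, only demonstrates that flag difference; it does not measure speed. All function names are hypothetical.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
/* Set CF, run INC on 0, then read CF back: INC leaves CF untouched. */
int carry_after_inc(void) {
    unsigned char cf;
    unsigned int x = 0;
    __asm__ volatile("stc\n\t"
                     "incl %1\n\t"
                     "setc %0"
                     : "=q"(cf), "+r"(x)
                     :
                     : "cc");
    return cf;
}

/* Same sequence with ADD 1: ADD recomputes CF (0 + 1 does not carry). */
int carry_after_add(void) {
    unsigned char cf;
    unsigned int x = 0;
    __asm__ volatile("stc\n\t"
                     "addl $1, %1\n\t"
                     "setc %0"
                     : "=q"(cf), "+r"(x)
                     :
                     : "cc");
    return cf;
}
#endif

/* Checks the documented flag behaviour; a no-op on non-x86 targets. */
void check_flag_semantics(void) {
#if defined(__x86_64__) || defined(__i386__)
    assert(carry_after_inc() == 1); /* CF preserved from STC */
    assert(carry_after_add() == 0); /* CF overwritten by ADD */
#endif
}
```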

How would you generically detect cache line associativity from user mode code?

Question: I'm putting together a small patch for the cachegrind/callgrind tool in Valgrind which will auto-detect, using completely generic code, the CPU instruction and cache configuration (right now only x86/x64 auto-configures, and other architectures don't provide CPUID-type configuration to non-privileged code). This code will need to execute entirely in a non-privileged context, i.e. pure user-mode code. It also needs to be portable across very different POSIX implementations, so grokking /proc…
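A common user-mode approach is timing-based. A rough sketch, assuming POSIX `clock_gettime`: touch `k` lines spaced `stride` bytes apart, with `stride` a multiple of the cache's way size (cache size / associativity), so they all compete for the same set; the per-access time should jump once `k` exceeds the number of ways. Hardware prefetchers, out-of-order overlap, and the virtual-to-physical mapping (for physically indexed caches) can blur the signal, so a sturdier version would likely randomize the access order and use dependent pointer chasing. `conflict_walk_ns` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Average nanoseconds per access when repeatedly cycling through k addresses
   spaced `stride` bytes apart. Returns -1.0 if the buffer cannot be allocated. */
double conflict_walk_ns(size_t k, size_t stride, size_t rounds) {
    assert(k > 0 && stride > 0 && rounds > 0);
    char *buf = malloc(k * stride);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < k; i++)
        buf[i * stride] = (char)i; /* bring each line in once */

    volatile const char *v = buf; /* volatile: keep every load in the timed loop */
    unsigned sink = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t r = 0; r < rounds; r++)
        for (size_t i = 0; i < k; i++)
            sink += (unsigned char)v[i * stride];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    (void)sink;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)(k * rounds);
}
```

For example, on a hypothetical 32 KiB, 8-way L1 data cache (4 KiB per way), sweeping `k` from 1 to 32 with `stride = 4096` would be expected to show latency rising after `k = 8`.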

Difference between speculation and prediction

Question: In computer architecture, what is the difference between (branch) prediction and speculation? These seem very similar, but I think there is a subtle distinction between them. Answer 1: Branch prediction is done by the processor to try to determine where execution will continue after a conditional jump, so that it can read the next instruction(s) from memory. Speculative execution goes one step further and determines what the result would be from executing the next instruction(s). If the branch…
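To make the answer's distinction concrete, the sketch below models only the prediction half: a 2-bit saturating counter, a classic textbook (bimodal) scheme rather than what any particular modern CPU uses. Speculation would be the separate step of actually fetching and executing along the predicted path before the branch resolves, and discarding that work on a misprediction. The names `predictor2`, `predict`, `train`, and `count_mispredictions` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* 2-bit saturating counter: states 0,1 predict not-taken; 2,3 predict taken. */
typedef struct { unsigned char state; } predictor2;

bool predict(const predictor2 *p) { return p->state >= 2; }

/* Move one step toward the actual outcome, saturating at 0 and 3. */
void train(predictor2 *p, bool taken) {
    assert(p->state <= 3);
    if (taken && p->state < 3)
        p->state++;
    else if (!taken && p->state > 0)
        p->state--;
}

/* Replays a history of branch outcomes and counts wrong guesses. */
int count_mispredictions(const bool *outcomes, int n) {
    predictor2 p = { .state = 1 }; /* start "weakly not-taken" */
    int misses = 0;
    for (int i = 0; i < n; i++) {
        if (predict(&p) != outcomes[i])
            misses++;
        train(&p, outcomes[i]);
    }
    return misses;
}
```

For a loop branch taken 9 times and then not taken once, repeated three times, this predictor mispredicts 4 times: twice in the first pass (warm-up and loop exit) and once in each later pass.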

Do SSE instructions consume more power/energy?

Question: Very simple question, probably difficult answer: does using SSE instructions, for example for parallel sum/min/max/average operations, consume more power than using other instructions (e.g. a single sum)? I couldn't find any information on this, for example on Wikipedia. The only hint of an answer I could find is here, but it's rather generic and doesn't reference any published material on the subject. Answer 1: I actually did a study on this a few years ago. The answer…
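For reference, this is what an SSE "parallel sum" does compared with a scalar loop. A sketch assuming an x86 compiler that defines `__SSE__` (it falls back to plain scalar code otherwise): `_mm_add_ps` performs four single-precision additions in one instruction, so the SSE loop issues roughly a quarter as many add instructions. It says nothing about energy by itself; whether energy per element goes down has to be measured on the specific chip. `sum_scalar` and `sum_sse` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* __m128, _mm_add_ps, _mm_loadu_ps, _mm_storeu_ps */
#endif

/* One addition per loop iteration. */
float sum_scalar(const float *a, size_t n) {
    assert(a != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four additions per _mm_add_ps, then a scalar tail. The summation order
   differs from sum_scalar, so results can differ in the last bits. */
float sum_sse(const float *a, size_t n) {
    assert(a != NULL || n == 0);
    size_t i = 0;
    float total = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; i++)
        total += a[i]; /* leftover elements (or the whole array without SSE) */
    return total;
}
```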