cpu-architecture

Why do we need to compile for different platforms (e.g. Windows/Linux)?

非 Y 不嫁゛ posted on 2019-12-03 17:36:02
I've learned the basics of CPUs, assembly, and C, and I don't understand why we need to compile C code differently for different OS targets. The compiler generates assembly code, which is then assembled into binary machine code. The assembly code is of course different per CPU architecture (e.g. ARM), because the instruction set architecture is different. But since Linux and Windows run on the same CPU, the machine operations like MOVE/ADD/... should be identical. While I do know that there are OS-specific functions like printing to a terminal, this functionality could be provided by different implementations

CPU and GPU differences

一个人想着一个人 posted on 2019-12-03 17:00:33
Question: What is the difference between a single processing unit of a CPU and a single processing unit of a GPU? Most places I've come across on the internet cover only the high-level differences between the two. I want to know what instructions each can perform, how fast they are, and how these processing units are integrated into the complete architecture. It seems like a question with a long answer, so lots of links are fine. Edit: In the CPU, the FPU runs real number operations. How fast are the same

Differences between arm “versions?” (ARMv7 only)

我的未来我决定 posted on 2019-12-03 14:44:50
Question: Basically, I would like to know the difference between armv7l and armv7hl. I have an ARM processor with armv7l, and there are a lot of RPMs for armv7hl. I don't know exactly what to search for to get information about this. What is this "suffix" called? Are there any other types? What do they do differently? Answer 1: I would assume that it's indicating packages compiled for little-endian and hard-float ABI as appropriate - i.e. it's a software thing and only tangentially related to

How to deal with linker error : error-cannot find -lgcc

南笙酒味 posted on 2019-12-03 12:21:54
This is my makefile:
task0 : main.o numbers.o add.o
	gcc -m32 -g -Wall -o task0 main.o numbers.o add.o
main.o : main.c
	gcc -g -Wall -m32 -ansi -c -o main.c
numbers.o : numbers.c
	gcc -g -Wall -m32 -ansi -c -o numbers.c
add.o: add.s
	nasm -g -f elf -w+all -o add.o add.s
clean :
	rm -f *.o task0
and this is the terminal output:
gcc -m32 -g -Wall -o task0 main.o numbers.o add.o
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc_s.so when

What does “subsequent read” mean in the context of volatile variables?

时光总嘲笑我的痴心妄想 posted on 2019-12-03 11:49:00
Question: The Java memory visibility documentation says: "A write to a volatile field happens-before every subsequent read of that same field." I'm confused about what "subsequent" means in the context of multithreading. Does this sentence imply some global clock for all processors and cores? For example, if I assign a value to a variable in cycle c1 in some thread, is a second thread then able to see this value in the subsequent cycle c1 + 1? Answer 1: It sounds to me like it's saying that it provides lockless acquire

How does CLFLUSH work for an address that is not in cache yet?

南笙酒味 posted on 2019-12-03 11:48:53
Question: We are trying to use the Intel CLFLUSH instruction to flush the cache content of a process from user space on Linux. We wrote a very simple C program that first accesses a large array and then calls CLFLUSH to flush the virtual address range of the whole array. We measure the time it takes CLFLUSH to flush the whole array. The size of the array is an input to the program, and we vary it from 1MB to 40MB in steps of 2MB. In our understanding, the CLFLUSH should flush the

Why INC and ADD 1 have different performances? [duplicate]

非 Y 不嫁゛ posted on 2019-12-03 11:29:11
This question already has answers here: INC instruction vs ADD 1: Does it matter? (2 answers) I've read many times over the years that you should do XOR ax, ax because it is faster, or that when programming in C you should use counter++ or counter+=1 because they would compile to INC or ADD, or that on the Netburst Pentium 4 INC was slower than ADD 1, so the compiler had to be told that your target was Netburst so that it would translate all var++ to ADD 1. My question is: Why do INC and ADD have different performance? Why, for example, was INC claimed to be slower on Netburst while faster than ADD in other

How would you generically detect cache line associativity from user mode code?

一世执手 posted on 2019-12-03 11:21:47
Question: I'm putting together a small patch for the cachegrind/callgrind tool in valgrind which will auto-detect, using completely generic code, CPU instruction and cache configuration (right now only x86/x64 auto-configures, and other architectures don't provide CPUID type configuration to non-privileged code). This code will need to execute entirely in a non-privileged context i.e. pure user mode code. It also needs to be portable across very different POSIX implementations, so grokking /proc

difference between speculation and prediction

你离开我真会死。 posted on 2019-12-03 09:51:25
Question: In computer architecture, what is the difference between (branch) prediction and speculation? They seem very similar, but I think there is a subtle distinction between them. Answer 1: Branch prediction is done by the processor to try to determine where execution will continue after a conditional jump, so that it can read the next instruction(s) from memory. Speculative execution goes one step further and determines what the result would be from executing the next instruction(s). If the branch

Do sse instructions consume more power/energy?

浪子不回头ぞ posted on 2019-12-03 07:28:27
Question: Very simple question, probably difficult answer: Does using SSE instructions, for example for parallel sum/min/max/average operations, consume more power than using other instructions (e.g. a single sum)? On Wikipedia, for example, I couldn't find any information on this. The only hint of an answer I could find is here, but it's a little bit generic and there is no reference to any published material on the subject. Answer 1: I actually did a study on this a few years ago. The answer