intel

How does the communication between CPU happen?

六月ゝ 毕业季﹏ posted on 2021-02-19 05:40:08
Question: Another question about L2/L3 caches explained that L3 can be used for inter-process communication (IPC). Are there other methods or pathways for this communication? It seems that there must be, because Intel nearly halved the amount of L3 cache per core in its newest processor lineup (1.375 MiB per core in SKL-X) compared with previous generations (2.5 MiB per core in Broadwell EP). Per-core private L2 increased from 256 KiB to 1 MiB, though. Answer 1: There are inter

x86_64 Opcode encoding formats in the intel manual

試著忘記壹切 posted on 2021-02-16 14:04:29
Question: What are the "Op/En" formats listed in the Intel x86_64 reference manual? For example, for the ADD opcode I can guess at some of them, such as "I" = Immediate, but is there a comprehensive list? Answer 1: The intro sections of Intel's vol. 2 manual explain how to read each entry: Section 3.1.1.4, Operand Encoding Column in the Instruction Summary Table: The “operand encoding” column is abbreviated as Op/En in the Instruction Summary table heading. Instruction operand encoding information is

Parallel program giving error “Undefined reference to _Kmpc_ok_to_fork”

一世执手 posted on 2021-02-10 22:42:19
Question: I am trying to compile OpenMP Fortran code on Linux. I have around 230 subroutines. The commands I used to build the code are as follows: 1) First I compiled each subroutine with the following command: ifort -c -override-limits -openmp *.for Each subroutine now has a separate object file. 2) Then I tried to link the object files into an executable with the following command: ifort *.o -o myprogram I got the following error: WINDWAVE.F90:(.text+0x1c9d): undefined reference to `_

What does the D flag in the code segment descriptor do for x86-64 instructions?

谁说我不能喝 posted on 2021-02-10 18:14:50
Question: I'm trying to understand how the D flag in the code segment descriptor works when used with x86-64 code. It is the D/B bit (bit 22) of the code segment descriptor, as shown on this diagram: The Intel documentation (from section 3.4.5 Segment Descriptors) states the following: D/B (default operation size/default stack pointer size and/or upper bound) flag Performs different functions depending on whether the segment descriptor is an executable code segment, an expand-down data segment,
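To make the bit position concrete, here is a minimal C sketch that reads the L and D/B flags out of a raw 8-byte descriptor. The function names and the two sample descriptor values in the usage note are illustrative assumptions, not taken from the question. Bit 22 of the descriptor's upper doubleword is bit 54 of the full 64-bit value, and the L flag sits just below it at bit 53.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions within the full 64-bit segment descriptor.
   L is bit 21 and D/B is bit 22 of the upper doubleword,
   which makes them bits 53 and 54 of the 8-byte value. */
#define DESC_L_BIT 53
#define DESC_D_BIT 54

/* Returns the L (64-bit code segment) flag as 0 or 1. */
int desc_l_flag(uint64_t desc)
{
    return (int)((desc >> DESC_L_BIT) & 1u);
}

/* Returns the D/B (default operation size) flag as 0 or 1. */
int desc_d_flag(uint64_t desc)
{
    return (int)((desc >> DESC_D_BIT) & 1u);
}

/* Returns 1 when the flags describe a 64-bit code segment (L=1, D=0).
   The L=1, D=1 combination is reserved, so it is reported as 0 here. */
int desc_is_64bit_code(uint64_t desc)
{
    return desc_l_flag(desc) == 1 && desc_d_flag(desc) == 0;
}
```

For example, with the flat-model values 0x00CF9A000000FFFF (a 32-bit code segment) and 0x00AF9A000000FFFF (a 64-bit code segment), desc_d_flag returns 1 for the first and 0 for the second, while desc_l_flag returns 1 only for the second.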

Working example Intel RdRand in C language. How to generate a float type number in the range -100.001 through +100.001

戏子无情 posted on 2021-02-10 14:41:00
Question: There is an Intel DRNG Library that lets you use a random number generator based on the processor's crystal entropy effect. The library itself and instructions for its use: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-library-implementation-and-uses There is an example inside the library that just prints the contents of a randomly generated array. Please share a working example in C that uses this library to generate a float type number
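One possible approach, sketched here under stated assumptions: it does not use the DRNG library from the question, but the compiler intrinsic _rdrand32_step from immintrin.h, which is only compiled in when the compiler defines __RDRND__ (for example with -mrdrnd on GCC or Clang). The function names, the retry count of 10, and the linear mapping are choices made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__RDRND__)
#include <immintrin.h>
#endif

#define RANGE_LO (-100.001)
#define RANGE_HI (100.001)

/* Map a 32-bit random value linearly onto [lo, hi]. */
float map_u32_to_range(uint32_t r, double lo, double hi)
{
    assert(lo < hi);
    double unit = (double)r / (double)UINT32_MAX; /* 0.0 .. 1.0 inclusive */
    return (float)(lo + unit * (hi - lo));
}

/* Fetch one 32-bit value from RDRAND.
   Returns 1 on success, 0 if the instruction kept failing or
   RDRAND support was not enabled at compile time. */
int hw_random32(uint32_t *out)
{
#if defined(__RDRND__)
    unsigned int value = 0;
    for (int attempt = 0; attempt < 10; ++attempt) {
        if (_rdrand32_step(&value)) { /* returns 1 when value is valid */
            *out = value;
            return 1;
        }
    }
    return 0;
#else
    (void)out;
    return 0;
#endif
}

/* Produce one float in [-100.001, +100.001]; returns 1 on success. */
int random_float_in_range(float *result)
{
    uint32_t r;
    if (!hw_random32(&r))
        return 0;
    *result = map_u32_to_range(r, RANGE_LO, RANGE_HI);
    return 1;
}
```

A caller would check the return value of random_float_in_range before using the result, since RDRAND can report that no random value was available.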

Working example Intel RdRand in C language. How to generate a float type number in the range -100.001 through +100.001

混江龙づ霸主 posted on 2021-02-10 14:40:32
Question: There is an Intel DRNG Library that lets you use a random number generator based on the processor's crystal entropy effect. The library itself and instructions for its use: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-library-implementation-and-uses There is an example inside the library that just prints the contents of a randomly generated array. Please share a working example in C that uses this library to generate a float type number

I don't understand cache miss count between cachegrind vs. perf tool

别等时光非礼了梦想. posted on 2021-02-08 19:46:37
Question: I am studying cache effects using a simple micro-benchmark. I think that if N is larger than the cache size, the cache takes a miss on the first read of every cache line. On my machine the cache line size is 64 bytes, so I expect N/8 cache misses in total, and cachegrind shows that. However, the perf tool reports a different result: only 34,265 cache misses. I suspected hardware prefetching, so I turned it off in the BIOS, but the result is the same. I really don't know
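The N/8 estimate can be written out as a small calculation. This is a sketch that assumes 8-byte elements (which is what N/8 with 64-byte lines implies), a buffer that starts on a cache-line boundary, and one miss per line on first touch; the function name is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines covered by n_elems elements of elem_size bytes,
   i.e. the expected count of first-touch misses for a sequential read
   of a line-aligned buffer when nothing is prefetched. */
size_t expected_first_touch_misses(size_t n_elems, size_t elem_size,
                                   size_t line_size)
{
    assert(line_size > 0);
    size_t bytes = n_elems * elem_size;
    return (bytes + line_size - 1) / line_size; /* round up to whole lines */
}
```

With 8-byte elements and 64-byte lines this reduces to N/8 whenever N is a multiple of 8.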

Pin tool and itrace

╄→尐↘猪︶ㄣ posted on 2021-02-08 10:12:53
Question: Hello, I ran the Pin tool's itrace.cpp file to get an instruction trace of the code.

    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // This function is called before every instruction is executed
    // and prints the IP
    VOID printip(VOID *ip) { fprintf(trace, "%p\n", ip); }

    // Pin calls this function every time a new instruction is encountered
    VOID Instruction(INS ins, VOID *v)
    {
        // Insert a call to printip before every instruction, and pass it the IP
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip,

What is the meaning of Perf events: dTLB-loads and dTLB-stores?

情到浓时终转凉″ posted on 2021-02-08 07:46:34
Question: I'm trying to understand the meaning of the perf events dTLB-loads and dTLB-stores. Answer 1: When virtual memory is enabled, the virtual address of every single memory access needs to be looked up in the TLB to obtain the corresponding physical address and determine access permissions and privileges (or raise an exception in case of an invalid mapping). The dTLB-loads and dTLB-stores events represent a TLB lookup for a data memory load or store access, respectively. This is the perf definition of
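As a rough illustration of how these event names map onto the Linux perf_event interface, the sketch below builds the config value for a PERF_TYPE_HW_CACHE event on the data TLB. The helper names are hypothetical; the constants come from linux/perf_event.h, so this assumes a Linux system with the kernel headers installed. What the hardware actually counts for each event depends on the CPU model.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Encode a data-TLB cache event: cache id in bits 0-7,
   operation in bits 8-15, result in bits 16-23. */
uint64_t dtlb_event_config(unsigned op, unsigned result)
{
    return (uint64_t)PERF_COUNT_HW_CACHE_DTLB |
           ((uint64_t)op << 8) |
           ((uint64_t)result << 16);
}

/* Fill a perf_event_attr for the given dTLB operation and result.
   The caller would pass the attr to the perf_event_open system call. */
void init_dtlb_attr(struct perf_event_attr *attr, unsigned op,
                    unsigned result)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HW_CACHE;
    attr->size = sizeof(*attr);
    attr->config = dtlb_event_config(op, result);
}
```

Under this encoding, dTLB-loads corresponds to PERF_COUNT_HW_CACHE_OP_READ with PERF_COUNT_HW_CACHE_RESULT_ACCESS, and dTLB-stores to PERF_COUNT_HW_CACHE_OP_WRITE with the same result; the matching miss events use PERF_COUNT_HW_CACHE_RESULT_MISS instead.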

What is the meaning of Perf events: dTLB-loads and dTLB-stores?

别说谁变了你拦得住时间么 posted on 2021-02-08 07:45:07
Question: I'm trying to understand the meaning of the perf events dTLB-loads and dTLB-stores. Answer 1: When virtual memory is enabled, the virtual address of every single memory access needs to be looked up in the TLB to obtain the corresponding physical address and determine access permissions and privileges (or raise an exception in case of an invalid mapping). The dTLB-loads and dTLB-stores events represent a TLB lookup for a data memory load or store access, respectively. This is the perf definition of