OpenMP: don't use hyperthreading cores (half `num_threads()` w/ hyperthreading)

丶灬走出姿态 posted on 2019-12-05 08:58:14

If you are running under Linux [also assuming an x86 arch], you can look at /proc/cpuinfo. There are two fields, cpu cores and siblings. The first is the number of [real] cores per physical package and the second is the number of logical processors (hyperthreads) per package (e.g. on my system they are 4 and 8 respectively, for my four-core hyperthreaded machine).
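
As a rough sketch of reading those two fields from C: the function names parse_field and read_core_counts are made up for this example, and only the first processor entry is inspected, so a multi-socket machine would need the per-package values combined.

```c
#include <stdio.h>
#include <string.h>

/* If `line` starts with `key`, followed by optional blanks and a ':',
 * store the integer after the colon in *value and return 1; else return 0. */
int parse_field(const char *line, const char *key, int *value) {
    size_t n = strlen(key);
    if (strncmp(line, key, n) != 0) return 0;
    const char *p = line + n;
    while (*p == ' ' || *p == '\t') p++;
    if (*p != ':') return 0;
    return sscanf(p + 1, "%d", value) == 1;
}

/* Read the first "cpu cores" and "siblings" entries from /proc/cpuinfo.
 * Returns 1 if both were found, 0 otherwise (e.g. not Linux, or not x86). */
int read_core_counts(int *cores, int *siblings) {
    FILE *fp = fopen("/proc/cpuinfo", "r");
    if (!fp) return 0;
    char line[4096];
    int have_cores = 0, have_siblings = 0;
    while (!(have_cores && have_siblings) && fgets(line, sizeof line, fp)) {
        if (!have_cores) have_cores = parse_field(line, "cpu cores", cores);
        if (!have_siblings) have_siblings = parse_field(line, "siblings", siblings);
    }
    fclose(fp);
    return have_cores && have_siblings;
}
```

If read_core_counts succeeds, siblings divided by cpu cores gives the number of hardware threads per core (2 on the machine described above).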

Because Linux can detect this [and from the link in Zulan's comment], the information is also available from the x86 cpuid instruction.

Either way, there is also an environment variable that sets the thread count, OMP_NUM_THREADS, which may be easier to use in conjunction with a launcher/wrapper script.
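
A launcher along those lines could be sketched in C as below. This is only an illustration: set_omp_num_threads_env and launch_with_threads are hypothetical names, and setenv/execvp are POSIX calls, so this does not apply to Windows as written.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Put OMP_NUM_THREADS=n into this process's environment, where it is
 * inherited by any program exec'd afterwards. Returns 0 on success. */
int set_omp_num_threads_env(int n) {
    char buf[16];
    if (n < 1) return -1;
    snprintf(buf, sizeof buf, "%d", n);
    return setenv("OMP_NUM_THREADS", buf, 1);   /* 1 = overwrite existing value */
}

/* Replace the current process with argv[0], run with the given thread count.
 * Only returns (with -1) if something failed. */
int launch_with_threads(int nthreads, char *const argv[]) {
    if (set_omp_num_threads_env(nthreads) != 0) return -1;
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

The thread count passed in would be the physical core count obtained by whichever detection method you trust; the OpenMP program itself then needs no code changes.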

One thing you may wish to consider is that beyond a certain number of threads you can saturate the memory bus, and no increase in threads [or cores] will improve performance; it may, in fact, reduce performance.

The question "Atomically increment two integers with CAS" links to a video talk from CppCon 2015 that is in two parts: https://www.youtube.com/watch?v=lVBvHbJsg5Y and https://www.youtube.com/watch?v=1obZeHnAwz4

They're about 1.5 hours each, but, IMO, well worth it.

In the talk, the speaker [who has done a lot of multithread/multicore optimization] says that, in his experience, the memory bus/system tends to get saturated after about four threads.

Z boson

Hyper-Threading is Intel's implementation of simultaneous multithreading (SMT). AMD processors before Zen don't implement SMT (the Bulldozer microarchitecture family has something else, which AMD calls cluster-based multithreading), but the Zen microarchitecture does have SMT. OpenMP has no built-in support for detecting SMT.

If you want a general function to detect Hyper-Threading you need to support different generations of processors and make sure that the processor is an Intel processor and not AMD. It's best to use a library for this.
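
To illustrate just the vendor check, here is a minimal sketch that decodes the vendor string from the registers returned by CPUID leaf 0. It assumes the register order eax, ebx, ecx, edx used by the cpuid() helper in this answer; vendor_string and is_intel are names invented for the example.

```c
#include <string.h>

/* Build the 12-character vendor string from CPUID leaf 0 output.
 * regs[] is ordered eax, ebx, ecx, edx; the string is stored in
 * EBX, EDX, ECX (in that order), lowest byte first. */
void vendor_string(const int regs[4], char out[13]) {
    const int order[3] = {1, 3, 2};            /* ebx, edx, ecx */
    for (int i = 0; i < 3; i++) {
        unsigned int r = (unsigned int)regs[order[i]];
        for (int b = 0; b < 4; b++)
            out[4 * i + b] = (char)((r >> (8 * b)) & 0xFF);
    }
    out[12] = '\0';
}

/* Non-zero when the leaf 0 registers identify an Intel processor. */
int is_intel(const int regs[4]) {
    char v[13];
    vendor_string(regs, v);
    return strcmp(v, "GenuineIntel") == 0;
}
```

With the helper from this answer the call would look like cpuid(regs, 0) followed by is_intel(regs). A library still covers far more cases than this single check.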

But you can create a function using OpenMP that works for many modern Intel processors as I described here.

The following code counts the number of physical cores on modern Intel processors (it has worked on every Intel processor I have tried it on). You have to bind the threads to get this to work. With GCC you can use export OMP_PROC_BIND=true; otherwise you can bind with code (which is what I do).

Note that I am not sure this method is reliable with VirtualBox. With VirtualBox on a 4 core/8 logical processor CPU, with Windows as host and Linux as guest, and the number of cores for the VM set to 4, this code reports 2 cores, and /proc/cpuinfo shows that two of the cores are actually logical processors.

#include <stdio.h>
#ifdef _MSC_VER
#include <intrin.h>   // declares __cpuidex
#endif

//cpuid function defined in instrset_detect.cpp by Agner Fog (2014 GNU General Public License)
//http://www.agner.org/optimize/vectorclass.zip

// Define interface to cpuid instruction.
// input:  eax = functionnumber, ecx = 0
// output: eax = output[0], ebx = output[1], ecx = output[2], edx = output[3]
static inline void cpuid (int output[4], int functionnumber) {
#if defined (_MSC_VER) || defined (__INTEL_COMPILER)       // Microsoft or Intel compiler, intrin.h included

  __cpuidex(output, functionnumber, 0);                  // intrinsic function for CPUID

#elif defined(__GNUC__) || defined(__clang__)              // use inline assembly, Gnu/AT&T syntax

  int a, b, c, d;
  __asm("cpuid" : "=a"(a),"=b"(b),"=c"(c),"=d"(d) : "a"(functionnumber),"c"(0) : );
  output[0] = a;
  output[1] = b;
  output[2] = c;
  output[3] = d;

#else                                                      // unknown platform. try inline assembly with masm/intel syntax

  __asm {
    mov eax, functionnumber
      xor ecx, ecx
      cpuid;
    mov esi, output
      mov [esi],    eax
      mov [esi+4],  ebx
      mov [esi+8],  ecx
      mov [esi+12], edx
      }

  #endif
}

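int getNumCores(void) {
  // Assumes an Intel processor with CPUID leaf 11 (0x0B) and OpenMP threads
  // bound one per logical processor.
  int cores = 0;
  #pragma omp parallel reduction(+:cores)
  {
    int regs[4];
    cpuid(regs,11);
    // EDX (regs[3]) is the x2APIC ID of the logical processor running this
    // thread. On a Hyper-Threaded core, bit 0 separates the two hardware
    // threads, so count only the threads where it is clear.
    if(!(regs[3]&1)) cores++;
  }
  return cores;
}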

int main(void) {
  printf("cores %d\n", getNumCores());
}
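
The test regs[3]&1 hard-codes a one-bit SMT ID. As a hedged sketch of a slightly more general check, the width of the SMT ID can be taken from EAX[4:0] of the same leaf (sub-leaf 0) instead. The helper names smt_id_bits and is_first_thread_of_core are invented here, and this variant has not been tested on the range of processors the original code was.

```c
/* EAX[4:0] of CPUID leaf 0x0B, sub-leaf 0: the number of low x2APIC ID
 * bits that identify the hardware thread within a core. */
unsigned smt_id_bits(unsigned eax) {
    return eax & 0x1Fu;
}

/* Non-zero when the logical processor with this x2APIC ID is the first
 * hardware thread of its core, i.e. all of its SMT ID bits are zero. */
int is_first_thread_of_core(unsigned x2apic_id, unsigned smt_bits) {
    unsigned mask = (1u << smt_bits) - 1u;
    return (x2apic_id & mask) == 0;
}
```

Inside getNumCores the condition would then read is_first_thread_of_core((unsigned)regs[3], smt_id_bits((unsigned)regs[0])), since cpuid(regs,11) already queries sub-leaf 0.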