Compiling with g++ using multiple cores

Asked by 猫巷女王i on 2020-11-29 15:47

Quick question: what is the compiler flag that lets g++ spawn multiple instances of itself in order to compile large projects more quickly (for example, 4 source files at a time)?

8 Answers
  • 2020-11-29 15:51

    GNU parallel

    I was making a synthetic compilation benchmark and couldn't be bothered to write a Makefile, so I used:

    sudo apt-get install parallel
    ls | grep -E '\.c$' | parallel -t --will-cite "gcc -c -o '{.}.o' '{}'"
    

    Explanation:

    • {.} takes the input argument and removes its extension
    • -t prints out the commands being run to give us an idea of progress
    • --will-cite removes the request to cite the software if you publish results using it...

    parallel is so convenient that I could even do a timestamp check myself:

    ls | grep -E '\.c$' | parallel -t --will-cite "\
      if ! [ -f '{.}.o' ] || [ '{}' -nt '{.}.o' ]; then
        gcc -c -o '{.}.o' '{}'
      fi
    "
    

    xargs -P can also run jobs in parallel, but it is a bit less convenient to do the extension manipulation or run multiple commands with it: Calling multiple commands through xargs
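    As a sketch of the xargs -P alternative mentioned above (the sh -c wrapper is my own workaround, since xargs has no built-in equivalent of parallel's {.} extension stripping):

    ```shell
    # Sketch: compile every .c file in the current directory with up to
    # 4 concurrent jobs. The small sh -c wrapper strips the .c extension
    # to derive the .o name, since xargs cannot do that itself.
    ls *.c 2>/dev/null | xargs -P 4 -I {} sh -c 'gcc -c -o "${1%.c}.o" "$1"' _ {}
    ```

    As with the ls | parallel pipeline above, this assumes file names without newlines.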

    Parallel linking was asked at: Can gcc use multiple cores when linking?

    TODO: I think I read somewhere that compilation can be reduced to matrix multiplication, so maybe it is also possible to speed up single file compilation for large files. But I can't find a reference now.

    Tested in Ubuntu 18.10.

  • 2020-11-29 15:53

People have mentioned make, but bjam also supports a similar concept. Using bjam -jx instructs bjam to run up to x concurrent build commands.

    We use the same build scripts on Windows and Linux and using this option halves our build times on both platforms. Nice.

  • 2020-11-29 15:53

I'm not sure about g++, but if you're using GNU Make then "make -j N" (where N is the number of jobs make may run at once) will let make run multiple g++ jobs at the same time (as long as the files do not depend on each other).
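    A minimal, self-contained sketch of what make -j parallelizes (the file names are invented for illustration; .RECIPEPREFIX, a GNU make 3.82+ feature, is used so the recipes don't depend on literal tab characters):

    ```shell
    # Sketch: a two-file project built with two parallel jobs. make
    # compiles main.c and helper.c concurrently because neither .o
    # depends on the other; the link step waits for both to finish.
    cat > main.c <<'EOF'
    int helper(void);
    int main(void) { return helper(); }
    EOF
    cat > helper.c <<'EOF'
    int helper(void) { return 0; }
    EOF
    cat > Makefile <<'EOF'
    .RECIPEPREFIX = >
    app: main.o helper.o
    > cc -o app main.o helper.o
    %.o: %.c
    > cc -c -o $@ $<
    EOF
    make -j 2
    ```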

  • 2020-11-29 16:00

make will do this for you. Investigate the -j and -l switches in the man page. I don't think g++ itself can parallelize the compilation of a single file; the speedup comes from make running several compiler processes at once.
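    A sketch combining the two switches (nproc comes from GNU coreutils, so this assumes Linux or similar):

    ```shell
    # Sketch: run up to one job per core (-j), and additionally hold off
    # starting new jobs while the system load average exceeds that
    # number (-l), which keeps an already-busy machine responsive.
    make -j"$(nproc)" -l"$(nproc)"
    ```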

  • 2020-11-29 16:05

If using make, issue it with -j. From man make:

      -j [jobs], --jobs[=jobs]
           Specifies the number of jobs (commands) to run simultaneously.  
           If there is more than one -j option, the last one is effective.
           If the -j option is given without an argument, make will not limit the
           number of jobs that can run simultaneously.
    

And most notably, if you want to script around the number of available cores (which can vary a lot if you build in many different environments), you can use Python's ubiquitous cpu_count() function:

    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count

    Like this:

    make -j $(python3 -c 'import multiprocessing as mp; print(int(mp.cpu_count() * 1.5))')
    

If you're asking why 1.5, I'll quote user artless-noise from a comment above:

    The 1.5 number is because of the noted I/O bound problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better and you could even go as high as 2x.
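    The same 1.5x heuristic can be had without Python; as a sketch using coreutils' nproc and shell integer arithmetic (which truncates, like the int() in the Python snippet above):

    ```shell
    # Sketch: 1.5x the core count via integer arithmetic, cores * 3 / 2.
    # On a 4-core machine this runs make with -j6.
    make -j"$(( $(nproc) * 3 / 2 ))"
    ```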

  • 2020-11-29 16:05

distcc can also be used to distribute compiles, not only across the cores of the current machine but also to other machines in a farm that have distcc installed.
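    A hedged sketch of a typical distcc setup (the host names buildbox1 and buildbox2 are invented; this assumes distcc is installed on every listed machine and reachable over its default transport):

    ```shell
    # Sketch: list the machines distcc may farm jobs out to, then tell
    # make to invoke the compilers through distcc. The -j value is set
    # higher than the local core count so the remote hosts stay busy.
    export DISTCC_HOSTS='localhost buildbox1 buildbox2'
    make -j8 CC='distcc gcc' CXX='distcc g++'
    ```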
