Are C++17 Parallel Algorithms implemented already?

刺人心 2020-11-27 17:54

I was trying to play around with the new parallel library features proposed in the C++17 standard, but I couldn't get it to work. I tried compiling with the up-to-date vers

4 Answers
  • 2020-11-27 18:28

    Intel has released a Parallel STL library which follows the C++17 standard:

    • https://github.com/intel/parallelstl

    It is being merged into GCC.

  • 2020-11-27 18:30

    GCC 9 has them but you have to install TBB separately

    In Ubuntu 19.10, all components have finally aligned:

    • GCC 9 is the default compiler, and it is the minimum GCC version that provides the parallel algorithms
    • TBB (Intel Threading Building Blocks) is at 2019~U8-1, so it meets the minimum 2018 requirement

    so you can simply do:

    sudo apt install g++ libtbb-dev
    g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -o main.out main.cpp -ltbb
    ./main.out
    

    and use as:

    #include <execution>
    #include <algorithm>
    
    std::sort(std::execution::par_unseq, input.begin(), input.end());
    

    see also the full runnable benchmark below.

    GCC 9 and TBB 2018 are the first versions that work, as mentioned in the release notes: https://gcc.gnu.org/gcc-9/changes.html

    Parallel algorithms and <execution> (requires Thread Building Blocks 2018 or newer).

    Related threads:

    • How to install TBB from source on Linux and make it work
    • trouble linking INTEL tbb library

    Ubuntu 18.04 installation

    Ubuntu 18.04 is a bit more involved:

    • GCC 9 can be obtained from a trustworthy PPA, so it is not so bad
    • TBB is at version 2017, which does not work, and I could not find a trustworthy PPA for it. Compiling from source is easy, but there is no install target, which is annoying.

    Here are fully automated tested commands for Ubuntu 18.04:

    # Install GCC 9
    sudo add-apt-repository ppa:ubuntu-toolchain-r/test
    sudo apt-get update
    sudo apt-get install gcc-9 g++-9
    
    # Compile libtbb from source.
    sudo apt-get build-dep libtbb-dev
    git clone https://github.com/intel/tbb
    cd tbb
    git checkout 2019_U9
    make -j `nproc`
    TBB="$(pwd)"
    TBB_RELEASE="${TBB}/build/linux_intel64_gcc_cc7.4.0_libc2.27_kernel4.15.0_release"
    
    # Use them to compile our test program.
    g++-9 -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -I "${TBB}/include" -L "${TBB_RELEASE}" \
      -Wl,-rpath,"${TBB_RELEASE}" -o main.out main.cpp -ltbb
    ./main.out
    

    Test program analysis

    I have tested with this program that compares the parallel and serial sorting speed.

    main.cpp

    #include <algorithm>
    #include <cassert>
    #include <chrono>
    #include <cstdint>   // uint64_t
    #include <cstdlib>   // std::strtoull
    #include <execution>
    #include <iostream>
    #include <random>
    #include <string>    // std::stoi
    #include <vector>
    
    int main(int argc, char **argv) {
        using clk = std::chrono::high_resolution_clock;
        decltype(clk::now()) start, end;
        std::vector<unsigned long long> input_parallel, input_serial;
        unsigned int seed;
        unsigned long long n;
    
        // CLI arguments: element count (default 10) and PRNG seed (default random).
        std::uniform_int_distribution<uint64_t> zero_ull_max(0);
        if (argc > 1) {
            n = std::strtoull(argv[1], nullptr, 0);
        } else {
            n = 10;
        }
        if (argc > 2) {
            seed = std::stoi(argv[2]);
        } else {
            seed = std::random_device()();
        }
    
        std::mt19937 prng(seed);
        for (unsigned long long i = 0; i < n; ++i) {
            input_parallel.push_back(zero_ull_max(prng));
        }
        input_serial = input_parallel;
    
        // Sort and time parallel.
        start = clk::now();
        std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end());
        end = clk::now();
        std::cout << "parallel " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
    
        // Sort and time serial.
        start = clk::now();
        std::sort(std::execution::seq, input_serial.begin(), input_serial.end());
        end = clk::now();
        std::cout << "serial " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;
    
        assert(input_parallel == input_serial);
    }
    

    On Ubuntu 19.10, on a Lenovo ThinkPad P51 laptop (CPU: Intel Core i7-7820HQ, 4 cores / 8 threads, 2.90 GHz base, 8 MB cache; RAM: 2x Samsung M471A2K43BB1-CRC, 2x 16GiB, 2400 Mbps), a typical output for an input of 100 million numbers to be sorted:

    ./main.out 100000000
    

    was:

    parallel 2.00886 s
    serial 9.37583 s
    

    so the parallel version was about 4.7 times faster! See also: What do the terms "CPU bound" and "I/O bound" mean?

    We can confirm that the process is spawning threads with strace:

    strace -f -s999 -v ./main.out 100000000 |& grep -E 'clone'
    

    which shows several lines of type:

    [pid 25774] clone(strace: Process 25788 attached
    [pid 25774] <... clone resumed> child_stack=0x7fd8c57f4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd8c57f59d0, tls=0x7fd8c57f5700, child_tidptr=0x7fd8c57f59d0) = 25788
    

    Also, if I comment out the serial version and run with:

    time ./main.out 100000000
    

    I get:

    real    0m5.135s
    user    0m17.824s
    sys     0m0.902s
    

    which confirms again that the algorithm was parallelized, since real < user, and gives an idea of how effectively it can be parallelized on my system (about 3.5x on 4 cores / 8 threads).

    Error messages

    Google, index this please.

    If you don't have TBB installed, the error is:

    In file included from /usr/include/c++/9/pstl/parallel_backend.h:14,
                     from /usr/include/c++/9/pstl/algorithm_impl.h:25,
                     from /usr/include/c++/9/pstl/glue_execution_defs.h:52,
                     from /usr/include/c++/9/execution:32,
                     from parallel_sort.cpp:4:
    /usr/include/c++/9/pstl/parallel_backend_tbb.h:19:10: fatal error: tbb/blocked_range.h: No such file or directory
       19 | #include <tbb/blocked_range.h>
          |          ^~~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    

    so we see that <execution> depends on an uninstalled TBB component.

    If TBB is too old, e.g. the default Ubuntu 18.04 one, it fails with:

    #error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
    
  • 2020-11-27 18:39

    GCC does not yet implement the Parallelism TS (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017).

    However, libstdc++ (with GCC) has an experimental mode for some equivalent parallel algorithms. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html

    Getting it to work:

    Any use of parallel functionality requires additional compiler and runtime support, in particular support for OpenMP. Adding this support is not difficult: just compile your application with the compiler flag -fopenmp. This will link in libgomp, the GNU Offloading and Multi Processing Runtime Library, whose presence is mandatory.

    Code example

    #include <vector>
    #include <parallel/algorithm>
    
    int main()
    {
      std::vector<int> v(100);
    
      // ...
    
      // Explicitly force a call to parallel sort.
      __gnu_parallel::sort(v.begin(), v.end());
      return 0;
    }
    
  • 2020-11-27 18:44

    You can refer to https://en.cppreference.com/w/cpp/compiler_support to check the implementation status of every C++ feature. For your case, search for "Standardization of Parallelism TS", and you will find that only the MSVC and Intel C++ compilers support this feature now.
