Multithread program in C++ shows the same performance as a serial one

一生所求 2020-12-19 14:16

I just want to write a simple C++ program that creates two threads, each of which fills a vector with the squares of integers (0, 1, 4, 9, ...). Here is my code:



        
3 Answers
  • 2020-12-19 14:56

    When I run your code with MSVC2015 on an i7, I observe:

    • in debug mode, the multithreaded version takes 14 s compared to 26 s for the single-threaded one, so it is almost twice as fast, as expected.
    • in release mode, the multithreaded version takes 0.3 compared to 0.2 for the single-threaded one, so it is slower, as you reported.

    This suggests that the optimized fill() runs for too short a time compared to the overhead of creating a thread.

    Note also that even when there is enough work to do in fill() (e.g. the unoptimized version), multithreading will not necessarily halve the run time. It increases overall throughput on a multicore processor, but each thread taken separately might run a little slower than it would on its own.

    Edit: additional information

    Multithreading performance depends on many factors, for example the number of cores on your processor, the cores used by other processes running during the test and, as doug remarked in his comment, the profile of the multithreaded task (i.e. memory-bound vs. compute-bound).

    To illustrate this, here are the results of an informal benchmark, which show that individual thread throughput decreases much faster for memory-intensive than for floating-point-intensive computations, and that global throughput grows much more slowly (if at all):

    Using the following functions for each thread:

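    #include <cmath>
    #include <vector>

    // Computation intensive: floating-point work only, almost no memory traffic.
    void mytask(unsigned long long loops)
    {
        volatile double x;  // volatile so the loop is not optimized away
        for (unsigned long long i = 0; i < loops; i++) {
            // For i == 0 this evaluates 0/0 and yields NaN, which is harmless here.
            x = std::sin(std::sqrt(i) / i * 3.14159);
        }
    }

    // Memory intensive: every iteration appends to the vector.
    void mytask2(std::vector<unsigned long long>& v, unsigned long long loops)
    {
        for (unsigned long long i = 0; i < loops; i++) {
            v.push_back(i * 3 + 10);
        }
    }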
    
  • 2020-12-19 14:58

    Most of the suggestions are right: threading a task improves the execution time only if the CPU load of each thread (in your case the multiplication i * i) outweighs the shared memory access load (in your case v.push_back). Try the code below and you will see the gains from threading. You can also use the Unix command

    >time ./a.out 
    

    to time your code more easily.

    #include <cmath>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    #define MULTI 1  // set to 0 to run the two calls serially
    #define SIZE 10000000

    // CPU-heavy work: 100 square roots per index and a single push_back at the end.
    void fill(std::vector<unsigned long long int> &v, size_t n)
    {
        unsigned long long int sum = 0;  // a plain int would overflow for this SIZE
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = 0; j < 100; ++j) {
                sum += static_cast<unsigned long long int>(std::sqrt(i * j));
            }
        }
        v.push_back(sum);
    }

    int main()
    {
        std::vector<unsigned long long int> v1, v2;
        v1.reserve(SIZE);
        v2.reserve(SIZE);
        #if !MULTI
        fill(v1, SIZE);
        fill(v2, SIZE);
        #else
        std::thread first(fill, std::ref(v1), SIZE);
        std::thread second(fill, std::ref(v2), SIZE);

        first.join();
        second.join();
        #endif
        return 0;
    }
    
  • 2020-12-19 15:09

    The fill function runs so fast that the thread overhead is likely as long as the execution itself.

    Replace fill with something that takes a significant amount of time to execute. As a first pass, use std::this_thread::sleep_for.
