Multithread program in C++ shows the same performance as a serial one

I just want to write a simple program in C++ that creates two threads, each of which fills a vector with the squares of integers (0, 1, 4, 9, ...). Here is my code:

3 Answers

梦谈多话

2020-12-19 14:56
When I run your code with MSVC2015 on an i7, I observe:
- in debug mode, the multithreaded version takes 14s compared to 26s for the single-threaded one. So it's almost twice as fast, as expected.
- in release mode, the multithreaded version takes 0.3 compared to 0.2 for the single-threaded one, so it's slower, as you've reported.
This suggests that your issue comes from the optimized fill() being too short compared to the overhead of creating a thread.

Note also that even when there is enough work to do in fill() (e.g. the unoptimized version), multithreading will not necessarily cut the time in half. Multithreading increases overall throughput on a multicore processor, but each thread taken separately might run a little slower than it would alone.
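To get a feel for the fixed cost mentioned above, here is a minimal sketch that times how long it takes to start and join a thread that does nothing. The helper name `average_spawn_join_us` is made up for this illustration, and the number it returns depends heavily on the OS, compiler and machine.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not from the question): average cost, in microseconds,
// of starting a thread that does nothing and then joining it.
double average_spawn_join_us(int rounds)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < rounds; ++r) {
        std::thread t([] {});  // empty body: only the thread overhead is measured
        t.join();
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / rounds;
}
```

If the value returned by something like `average_spawn_join_us(1000)` is in the same range as the time one optimized fill() call takes, threading cannot pay off for that workload. With GCC or Clang on Linux you will probably need `-pthread` when compiling.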

Edit: additional information

Multithreading performance depends on many factors, for example the number of cores on your processor, the cores used by other processes running during the test and, as doug remarked in his comment, the profile of the multithreaded task (i.e. memory-bound vs. compute-bound).

To illustrate this, here are the results of an informal benchmark. It shows that individual thread throughput drops much faster for memory-intensive work than for floating-point-intensive work, and that global throughput grows much more slowly (if at all):

The following functions were used for each thread:
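```
#include <cmath>
#include <vector>

// Computation intensive: floating-point work only, no memory traffic.
void mytask(unsigned long long loops)
{
    volatile double x;  // volatile keeps the optimizer from discarding the loop
    for (unsigned long long i = 0; i < loops; i++) {
        x = std::sin(std::sqrt(i) / i * 3.14159);  // i == 0 gives NaN, harmless here
    }
}

// Memory intensive: every iteration appends to the vector.
void mytask2(std::vector<unsigned long long>& v, unsigned long long loops)
{
    for (unsigned long long i = 0; i < loops; i++) {
        v.push_back(i * 3 + 10);
    }
}
```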
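A harness along the following lines could be used to run such a task in a varying number of threads and compare wall-clock times. This is only a sketch: `compute_task` and `timed_run` are names invented here, and the loop starts at 1 rather than 0 to avoid the 0/0 division.

```cpp
#include <chrono>
#include <cmath>
#include <thread>
#include <vector>

// Same shape as the compute-bound task above; starts at 1 to avoid 0/0.
void compute_task(unsigned long long loops)
{
    volatile double x = 0.0;  // volatile so the loop is not optimized away
    for (unsigned long long i = 1; i <= loops; ++i) {
        x = std::sin(std::sqrt(static_cast<double>(i)) / i * 3.14159);
    }
}

// Hypothetical harness: runs compute_task in n_threads threads at once and
// returns the wall-clock time in seconds until all of them have finished.
double timed_run(unsigned n_threads, unsigned long long loops_per_thread)
{
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back(compute_task, loops_per_thread);
    }
    for (auto& w : workers) {
        w.join();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `timed_run(n, loops)` for n = 1, 2, 4, ... and dividing `n * loops` by the result gives a rough global throughput figure. `std::thread::hardware_concurrency()` can serve as a hint for where the curve is likely to flatten.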

北荒

2020-12-19 14:58

Most of the suggestions are right: threading a task improves execution time only if the CPU load of each thread (in your case the multiplication i * i) outweighs the shared memory access load (in your case v.push_back). Try the code below and you will see the gains from threading. You can also use the Unix command

>time ./a.out

to time your code more easily.

```
#include <cmath>
#include <functional>
#include <thread>
#include <vector>

#define MULTI 1        // 1: run both fills in threads, 0: run them serially
#define SIZE 10000000

// CPU-heavy work: 100 square roots per outer iteration, one push_back at the end.
void fill(std::vector<unsigned long long int> &v, size_t n)
{
    double sum = 0.0;  // double, because the total far exceeds the range of int
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < 100; ++j) {
            sum += std::sqrt(static_cast<double>(i * j));
        }
    }
    v.push_back(static_cast<unsigned long long int>(sum));
}

int main()
{
    std::vector<unsigned long long int> v1, v2;
    v1.reserve(SIZE);
    v2.reserve(SIZE);
#if !MULTI
    fill(v1, SIZE);
    fill(v2, SIZE);
#else
    std::thread first(fill, std::ref(v1), SIZE);
    std::thread second(fill, std::ref(v2), SIZE);

    first.join();
    second.join();
#endif
    return 0;
}
```
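If the `time` command is not available (on Windows, for instance), the measurement can also be taken inside the program. A small sketch using `std::chrono`; the helper `seconds_taken` is a name invented for this example, not something from the code above.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: calls `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename F>
double seconds_taken(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

You could then wrap each variant in a lambda, e.g. `seconds_taken([&] { fill(v1, SIZE); fill(v2, SIZE); })` for the serial case, and print both results side by side.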

借酒劲吻你

2020-12-19 15:09

The fill function runs so fast that the thread overhead is likely as long as the execution itself.

Replace fill with something that takes a significant amount of time to execute. As a first pass, use std::this_thread::sleep_for.
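A rough sketch of that experiment, assuming a stand-in called `slow_fill` that only sleeps (all three function names here are made up for the illustration):

```cpp
#include <chrono>
#include <thread>

// Stand-in for fill(): does no real work, just blocks for `pause`.
void slow_fill(std::chrono::milliseconds pause)
{
    std::this_thread::sleep_for(pause);
}

// Runs slow_fill twice, one call after the other; returns elapsed milliseconds.
double serial_ms(std::chrono::milliseconds pause)
{
    const auto start = std::chrono::steady_clock::now();
    slow_fill(pause);
    slow_fill(pause);
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Runs slow_fill in two threads at once; returns elapsed milliseconds.
double threaded_ms(std::chrono::milliseconds pause)
{
    const auto start = std::chrono::steady_clock::now();
    std::thread first(slow_fill, pause);
    std::thread second(slow_fill, pause);
    first.join();
    second.join();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

With a pause of, say, one second, the serial version should take about two seconds and the threaded one about one second plus the thread overhead. That makes the overhead easy to see next to the "work". Sleeping threads do not compete for cores, so this only shows the overhead, not how a real CPU-bound fill() would scale.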
