Why are 50 threads faster than 4?

野趣味 2020-12-29 06:06
DWORD WINAPI MyThreadFunction(LPVOID lpParam) {
    volatile auto x = 1;
    for (auto i = 0; i < 800000000 / MAX_THREADS; ++i) {
        x += i / 3;
    }
    return 0;
}

5 answers
  •  南笙 (original poster)
    2020-12-29 06:33

    I took some code that I had "lying about" for some other purposes and re-used it, so please be aware that it's not "pretty", nor is it supposed to be a good example of how you should do this.

    Here's the code I came up with (this is on a Linux system, so I'm using pthreads, and I removed the "WINDOWS-isms"):

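    #include <cstdlib>   // strtol
    #include <cstring>   // strcmp
    #include <iostream>  // cout, cerr, endl
    #include <pthread.h> // pthread_create, pthread_join
    
    int MAX_THREADS = 4;
    
    // Busy loop: each thread does an equal share of 800 million iterations,
    // so the total amount of work is the same whatever MAX_THREADS is.
    // volatile keeps the compiler from optimizing the loop away; unsigned
    // makes the wrap-around of x well-defined (signed overflow is undefined).
    void * MyThreadFunction(void *) {
        volatile unsigned x = 1;
        for (auto i = 0; i < 800000000 / MAX_THREADS; ++i) {
            x += i / 3;
        }
        return 0;
    }
    
    
    using namespace std;
    
    int main(int argc, char **argv)
    {
        // Parse "-t <number of threads>" from the command line.
        for(int i = 1; i < argc; i++)
        {
            if (strcmp(argv[i], "-t") == 0 && argc > i+1)
            {
                i++;
                MAX_THREADS = strtol(argv[i], NULL, 0);
                if (MAX_THREADS <= 0)
                {
                    cerr << "Hmm, seems like the thread count is not a positive number..." << endl;
                    return 1;
                }
            }
        }
        cout << "Using " << MAX_THREADS << " threads" << endl;
        pthread_t *thread_id = new pthread_t [MAX_THREADS];
        // Start all threads...
        int created = 0;
        for(; created < MAX_THREADS; created++)
        {
            int rc = pthread_create(&thread_id[created], NULL, MyThreadFunction, NULL);
            if (rc != 0)
            {
                cerr << "Huh? Pthread couldn't be created. rc=" << rc << endl;
                break;  // don't join a thread that was never started
            }
        }
        // ...then wait for every one that was started to finish.
        for(int i = 0; i < created; i++)
        {
            pthread_join(thread_id[i], NULL);
        }
        delete [] thread_id;
    }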
    

    Running this with a variety of thread counts:

    [MatsP@linuxhost junk]$ g++ -Wall -O3 -o thread_speed thread_speed.cpp -std=c++0x -lpthread
    [MatsP@linuxhost junk]$ time ./thread_speed -t 4
    Using 4 threads
    
    real    0m0.448s
    user    0m1.673s
    sys     0m0.004s
    [MatsP@linuxhost junk]$ time ./thread_speed -t 50
    Using 50 threads
    
    real    0m0.438s
    user    0m1.683s
    sys     0m0.008s
    [MatsP@linuxhost junk]$ time ./thread_speed -t 1
    Using 1 threads
    
    real    0m1.666s
    user    0m1.658s
    sys     0m0.004s
    [MatsP@linuxhost junk]$ time ./thread_speed -t 2
    Using 2 threads
    
    real    0m0.847s
    user    0m1.670s
    sys     0m0.004s
    [MatsP@linuxhost junk]$ time ./thread_speed -t 50
    Using 50 threads
    
    real    0m0.434s
    user    0m1.670s
    sys     0m0.005s
    

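    As you can see, the "user" time (the CPU time summed over all threads) stays almost identical, at about 1.66-1.68 seconds, because the total amount of work is the same. I actually tried a lot of other values too, but the results are the same, so I won't bore y'all with a dozen more runs that show almost the same thing.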
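
    The constant "user" time follows from the fact that the total work does not depend on the thread count: each thread runs 800000000 / MAX_THREADS iterations, so N threads together run (almost exactly) 800 million. A rough back-of-the-envelope model, assuming perfectly parallel CPU-bound work and ignoring thread-creation and scheduling overhead, is real time ≈ user time / min(threads, cores). The sketch below is only an illustration of that arithmetic, not a measurement; total_iterations and ideal_real_seconds are hypothetical helper names.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cstdio>

    // Total loop iterations performed by all threads together, using the same
    // integer division as MyThreadFunction (800000000 / MAX_THREADS each).
    long long total_iterations(int threads) {
        const long long per_thread = 800000000LL / threads;
        return per_thread * threads;
    }

    // Idealized wall-clock ("real") time for CPU-bound work: the fixed amount
    // of CPU ("user") time is shared among at most `cores` cores at a time.
    // Assumes no thread-creation, scheduling or cache overhead.
    double ideal_real_seconds(double user_seconds, int threads, int cores) {
        return user_seconds / std::min(threads, cores);
    }

    int main() {
        const int cores = 4;            // quad-core machine, as in the timings above
        const double user_time = 1.67;  // roughly the measured "user" time
        const int counts[] = {1, 2, 4, 50};
        for (int t : counts) {
            // The work never grows with the thread count; it can only shrink
            // slightly from rounding when 800000000 is not divisible by t.
            assert(total_iterations(t) <= 800000000LL);
            std::printf("%2d threads: %lld iterations, ideal real time %.3f s\n",
                        t, total_iterations(t),
                        ideal_real_seconds(user_time, t, cores));
        }
        return 0;
    }
    ```

    With user ≈ 1.67 s the model gives 1.67 s, about 0.835 s and about 0.42 s for 1, 2 and 4-or-more threads, which is close to the measured 1.666 s, 0.847 s and roughly 0.44 s.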

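    This is running on a quad-core processor, so the "real" time drops roughly in proportion to the thread count up to 4 threads (1.67 s, 0.85 s, 0.45 s), and with more than 4 threads it stays the same as with 4: once all four cores are busy, the extra threads just take turns on them.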
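
    The "4" is specific to that machine. If you want the program to choose a thread count by itself, standard C++11 can report the number of hardware threads. A minimal sketch, assuming a C++11 compiler: std::thread::hardware_concurrency() is only a hint, usually counts logical processors (so a 4-core CPU with hyper-threading typically reports 8), and may return 0 when the value is unknown, so the fallback of 4 below is an arbitrary assumption. default_thread_count is a hypothetical name.

    ```cpp
    #include <cassert>
    #include <iostream>
    #include <thread>

    // Picks a thread count for CPU-bound work: one thread per hardware thread.
    // hardware_concurrency() is only a hint and returns 0 when the value is not
    // computable, so fall back to `fallback` in that case.
    unsigned default_thread_count(unsigned fallback = 4) {
        const unsigned hw = std::thread::hardware_concurrency();
        return hw != 0 ? hw : fallback;
    }

    int main() {
        const unsigned n = default_thread_count();
        assert(n >= 1);
        std::cout << "Using " << n << " threads" << std::endl;
        return 0;
    }
    ```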

    I doubt very much there is anything different in how Windows deals with threads.
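
    If you want to check that on Windows without the pthreads code, the same experiment can be written with std::thread, which runs on top of the platform's native threads. This is a sketch, not the code that produced the timings above: worker, run_with_threads and kTotalIterations are hypothetical names, timing uses std::chrono::steady_clock instead of the shell's time command, and on Linux it needs to be built with something like g++ -O3 -pthread.

    ```cpp
    #include <cassert>
    #include <chrono>
    #include <cstdlib>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Same total work as the question: 800 million loop iterations overall.
    constexpr long long kTotalIterations = 800000000;

    // Busy loop equivalent to MyThreadFunction. volatile keeps the compiler from
    // removing the loop; unsigned makes the wrap-around of x well-defined.
    void worker(long long iterations) {
        volatile unsigned x = 1;
        for (long long i = 0; i < iterations; ++i) {
            x = x + static_cast<unsigned>(i / 3);
        }
    }

    // Splits `total` iterations evenly over `num_threads` std::threads, waits
    // for all of them, and returns the elapsed wall-clock ("real") seconds.
    double run_with_threads(int num_threads, long long total = kTotalIterations) {
        const auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> threads;
        threads.reserve(num_threads);
        for (int t = 0; t < num_threads; ++t) {
            threads.emplace_back(worker, total / num_threads);
        }
        for (auto& th : threads) {
            th.join();
        }
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        return elapsed.count();
    }

    int main(int argc, char** argv) {
        // Thread count from the command line, defaulting to 4.
        const int n = argc > 1 ? std::atoi(argv[1]) : 4;
        if (n <= 0) {
            std::cerr << "Thread count must be a positive number" << std::endl;
            return 1;
        }
        const double seconds = run_with_threads(n);
        assert(seconds >= 0.0);
        std::cout << "Using " << n << " threads: " << seconds << " s real time"
                  << std::endl;
        return 0;
    }
    ```

    Running it with 1, 2, 4 and 50 as the argument should show the same pattern as the pthreads version if the reasoning above holds.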

    I also compiled the code with #define MAX_THREADS 50, and again with 4, to cover the possibility that the compiler optimizes the loop differently when the thread count is a compile-time constant. It made no difference compared with the code posted above.
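
    One way to repeat that compile-time-constant experiment without editing a #define each time is to make the thread count a template parameter, so the compiler again sees the iteration count as a constant. This is only an illustration of the idea, not what was actually compiled for the results above; fixed_thread_function and run_fixed are hypothetical names, and the extra Total parameter exists just so the work can be shrunk for quick checks.

    ```cpp
    #include <cassert>
    #include <pthread.h>

    // Worker with compile-time parameters, mimicking "#define MAX_THREADS":
    // Total / N is a constant the compiler can see.
    template <int N, long long Total = 800000000>
    void* fixed_thread_function(void*) {
        static_assert(N > 0, "thread count must be positive");
        volatile unsigned x = 1;  // unsigned: wrap-around is well-defined
        for (long long i = 0; i < Total / N; ++i) {
            x = x + static_cast<unsigned>(i / 3);
        }
        return nullptr;
    }

    // Starts N pthreads running fixed_thread_function<N, Total> and joins them.
    // Returns the number of threads that were successfully created.
    template <int N, long long Total = 800000000>
    int run_fixed() {
        pthread_t ids[N];
        int created = 0;
        while (created < N &&
               pthread_create(&ids[created], nullptr,
                              fixed_thread_function<N, Total>, nullptr) == 0) {
            ++created;
        }
        for (int i = 0; i < created; ++i) {
            pthread_join(ids[i], nullptr);
        }
        return created;
    }

    int main() {
        // Change 4 to 50 (and recompile) to repeat the comparison.
        const int created = run_fixed<4>();
        assert(created == 4);
        return created == 4 ? 0 : 1;
    }
    ```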

    By the way, my code runs some three to ten times faster than the original, which makes me wonder whether the originally posted code was built in debug mode (without optimization).
