Big array of size 1mega caused high CPU?

眉间皱痕 submitted on 2019-12-13 06:13:05

Question


I have a multithreaded server application. It receives data from sockets and then processes that data: unpacking packets, adding them to a data queue, and so on. The function below is called frequently; a select statement checks the sockets, and when data is available it calls this function to receive it:

         //the main function used to receive 
         //file data from clients
         void service(void){
              while(1){
                   ....
                   struct timeval timeout;
                   timeout.tv_sec = 3;

                   ...
                   ret = select(maxFd+1, &read_set, NULL, NULL, &timeout);
                   if (ret > 0){
                        //get the socket fd from SocketsMap;
                        //if fd is in SocketsMap and its bit is set in read_set,
                        //then receive data from that socket
                       receive_data(fd);
                   }
              }
         } 

         void receive_data(int fd){
              const int ONE_MEGA = 1024 * 1024;

              //char *buffer = new char[ONE_MEGA]; consumes much less CPU
               char buffer[ONE_MEGA]; // causes high CPU
              int readn = recv(fd, buffer, ONE_MEGA, 0);

              //handle the data
         }
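
For reference, here is a minimal sketch of the heap variant mentioned in the comment above, written so the buffer is allocated once per thread and reused (std::vector and thread_local are choices of this sketch, not the asker's actual code):

#include <sys/socket.h>
#include <vector>

void receive_data_heap(int fd)
{
    const int ONE_MEGA = 1024 * 1024;

    // Allocated on the heap once per thread and reused on every call,
    // so there is no per-call allocation and no large stack frame.
    static thread_local std::vector<char> buffer(ONE_MEGA);

    int readn = recv(fd, buffer.data(), buffer.size(), 0);
    (void)readn; // handle 'readn' bytes of data here
}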

I found that the version above consumes too much CPU -- usually 80% to 90% -- but if I allocate the buffer from the heap instead, CPU usage is only about 14%. Why?

[update]
Added more code

[update2]
The strangest thing is that I also wrote another simple data-receiving server and client: the server simply receives data from the sockets and discards it. There, both kinds of allocation perform almost the same, with no big difference in CPU usage. In the multithreaded server application that has the problem, I even increased the process stack size to 30M; using the array still triggers the problem, but allocating from the heap solves it. I don't know why.

Regarding "sizeof(buffer)": thanks for pointing this out, but I am 100% sure it is not the problem, because my application does not use sizeof(buffer); it passes ONE_MEGA (1024*1024) instead.

By the way, there is one more thing to mention, though I am not sure whether it is useful: replacing the array with a smaller one such as "char buffer[1024];" also decreases the CPU usage dramatically.

[update3]
All sockets are in non-blocking mode.


Answer 1:


I just wrote this:

#include <iostream>
#include <cstdio>

using namespace std;

static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

const int M = 1024*1024;

void bigstack()
{
    FILE *f = fopen("test.txt", "r");
    unsigned long long time;
    char buffer[M];

    time = rdtsc();
    fread(buffer, M, 1, f);
    time = rdtsc() - time;
    fclose(f);
    cout << "bs: Time = " << time / 1000 << endl;
}


void bigheap()
{
    FILE *f = fopen("test.txt", "r");
    unsigned long long time;
    char *buffer = new char[M];

    time = rdtsc();
    fread(buffer, M, 1, f);
    time = rdtsc() - time;
    delete [] buffer;
    fclose(f);
    cout << "bh: Time = " << time / 1000 << endl;
}



int main()
{
    for(int i = 0; i < 10; i++)
    {
        bigstack();
        bigheap();
    }
}

The output is something like this:

bs: Time = 8434
bh: Time = 7242
bs: Time = 1094
bh: Time = 2060
bs: Time = 842
bh: Time = 830
bs: Time = 785
bh: Time = 781
bs: Time = 782
bh: Time = 804
bs: Time = 782
bh: Time = 778
bs: Time = 792
bh: Time = 809
bs: Time = 785
bh: Time = 786
bs: Time = 782
bh: Time = 829
bs: Time = 786
bh: Time = 781

In other words, allocating from the stack or the heap makes absolutely no difference. The small amount of "slowness" in the beginning has to do with "warming up the caches".

And I'm fairly convinced that the reason your code behaves differently between the two is something else - perhaps what simonc says: that sizeof(buffer) is the problem?




Answer 2:


If all things are equal, memory is memory and it shouldn't matter whether your buffer is on the heap or on the stack.

But clearly all things aren't equal. I suspect the allocation of the 1M buffer on the stack INTERFERES/OVERLAPS with the stack space allocated to the OTHER threads. That is, growing the stack requires either relocating the stack of the current thread or relocating the stacks of the other threads. This takes time. That time is not needed when allocating from the heap, or when the stack allocation is small enough not to interfere, as it is with the 1K example.

Assuming you are using a Posix-compatible thread implementation, take a look at

pthread_create
pthread_attr_getstack
pthread_attr_setstack

for giving the thread with the 1M buffer more stack space at thread creation time.
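
A minimal sketch of what that could look like, using pthread_attr_setstacksize (the simpler sibling of the calls listed above); the 8 MB figure and the worker function are assumptions of this sketch, not something from the question:

#include <pthread.h>
#include <stdio.h>

// Hypothetical worker that would run the service()/receive_data() loop.
void *service_thread(void *arg)
{
    // ... service(); ...
    return NULL;
}

int start_service_thread(pthread_t *tid)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    // Assumed size: request an 8 MB stack so the 1 MB local buffer
    // fits comfortably; adjust to the real requirements.
    if (pthread_attr_setstacksize(&attr, 8 * 1024 * 1024) != 0) {
        fprintf(stderr, "pthread_attr_setstacksize failed\n");
        return -1;
    }

    int rc = pthread_create(tid, &attr, service_thread, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}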

-Jeff




Answer 3:


You're ignoring the return value from recv. That's not good. Partial reads are a fact of life, and are very likely when you pass such a large buffer. If you start processing parts of the buffer that don't contain valid data, unexpected things can happen.
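
For illustration, here is a sketch of how the return value might be handled on a non-blocking socket (handle_data and the error-handling policy are assumptions of this sketch, not the asker's code):

#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

// Hypothetical handler for the bytes actually received.
void handle_data(const char *data, int len);

void receive_checked(int fd, char *buffer, int bufsize)
{
    int readn = recv(fd, buffer, bufsize, 0);
    if (readn > 0) {
        handle_data(buffer, readn);   // process only 'readn' bytes; partial reads are normal
    } else if (readn == 0) {
        close(fd);                    // peer closed the connection
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
        // no data available right now on this non-blocking socket; try again after select()
    } else {
        perror("recv");               // real error
        close(fd);
    }
}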

The maximum frame size for the most commonly used protocol is 64kB. It's even possible (although unlikely) that something in the system only uses the lowest 16 bits of the buffer size, which incidentally you've set to zero. That would cause recv to return immediately without doing anything, resulting in an endless loop and high CPU usage.

Of course, none of this should be any different with a dynamically allocated buffer, but if you also used sizeof(buffer) and the heap-using code ended up reading only a pointer-sized chunk at a time, it could be different.
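
For illustration, the pitfall being described would look like this (a hypothetical variant reusing the question's names, not the asker's actual code):

char *buffer = new char[ONE_MEGA];
// sizeof(buffer) is sizeof(char*) here -- typically 8 bytes --
// so this call asks recv for at most 8 bytes at a time:
int readn = recv(fd, buffer, sizeof(buffer), 0);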



Source: https://stackoverflow.com/questions/17921632/big-array-of-size-1mega-caused-high-cpu
