Why does moving the buffer pointer slow down fread (C programming language)?

Posted by 牧云@^-^@ on 2019-12-11 03:23:15

Question


I am reading a 1 GB file using fread in C. I am reading the file in 1MB chunks, using the following loop:

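FILE *fp = fopen(filename, "rb");

unsigned char *buf = malloc(CHUNK_SIZE);

for (size_t i = 0; i < NUMBER_OF_CHUNKS; ++i)
{
    /* Every iteration overwrites the same CHUNK_SIZE-byte buffer. */
    if (fread(buf, CHUNK_SIZE, 1, fp) != 1)
        break;  /* short read or I/O error */

    //Do something with contents of buffer
}
free(buf);
fclose(fp);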

Reading the file this way takes ~2 seconds.

However, I decided that I wanted to allocate one big buffer for the contents of the whole file instead and "move the buffer pointer" inside the fread function at each iteration, like this:

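FILE *fp = fopen(filename, "rb");

/* One buffer large enough to hold the whole file. */
unsigned char *buf = malloc((size_t)CHUNK_SIZE * NUMBER_OF_CHUNKS);

for (size_t i = 0; i < NUMBER_OF_CHUNKS; ++i)
{
    /* Each chunk is written at its own offset in the big buffer. */
    if (fread(&buf[i * CHUNK_SIZE], CHUNK_SIZE, 1, fp) != 1)
        break;  /* short read or I/O error */
}
fclose(fp);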

This slows down the reading significantly: it now takes ~40 seconds.

My questions are:

  1. Why does this have such a huge impact on performance?
  2. What would you recommend I do if I want to read the file in the second way, but I want to keep time low?

The file consists of a single line of alphanumeric characters.

I want to read it in the second way, so that I can have other threads access the chunks in the buffer that are already read, while the reading thread continues filling the rest of the buffer.

Thank you!


Answer 1:


It's possible that you are running out of memory on your machine. A gigabyte is rather a lot of memory to allocate. Your OS may have to swap some of the data to disk, which will cause an order-of-magnitude slowdown.

You could consider allocating each chunk individually and freeing each one when you are done with it. This way the total memory usage of your program is bounded by the working set, rather than by the size of the entire file.
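
As a rough sketch of that idea (not the asker's actual code), the loop below allocates a fresh buffer for every chunk and frees it as soon as the chunk has been handled, so at most one chunk is resident at a time. The function name process_in_chunks and its return convention are made up for illustration; in a multithreaded design the free() would instead happen in whichever worker finishes with the chunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads `path` in pieces of `chunk_size` bytes,
 * using a separately allocated buffer per chunk.
 * Returns the total number of bytes read, or -1 on error. */
long process_in_chunks(const char *path, size_t chunk_size)
{
    assert(chunk_size > 0);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long total = 0;
    for (;;) {
        unsigned char *chunk = malloc(chunk_size);
        if (chunk == NULL) {
            fclose(fp);
            return -1;
        }

        /* Item size 1 so the return value is the byte count,
         * which lets us handle a short final chunk. */
        size_t n = fread(chunk, 1, chunk_size, fp);

        /* ... process chunk[0 .. n) here ... */
        total += (long)n;

        free(chunk);            /* memory use stays bounded by one chunk */

        if (n < chunk_size)     /* end of file or read error */
            break;
    }

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```

Calling process_in_chunks(filename, CHUNK_SIZE) would walk the whole file while keeping only CHUNK_SIZE bytes allocated at any moment.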




Answer 2:


When you run out of memory and the OS swaps pages to and from the swap partition, you not only cause about 3x as much disk traffic as intended. On a mechanical (rotating) hard disk [yes, those are still quite common], the head also has to seek back and forth between the swap space and the file you are reading, even when the files are not fragmented. This most likely accounts for the additional 10-15x speed penalty.

A possible workaround is to use mmap to map the file into memory as one contiguous region, allowing the OS to decide the best paging strategy.
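
A minimal sketch of the mmap approach, assuming a POSIX system, might look like the following. The helper names map_file and unmap_file are hypothetical, and the read-only, private mapping is just one reasonable choice. Pages are loaded from the file on demand, so no explicit fread loop is needed and other threads can read any offset of the returned pointer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole file read-only.
 * On success returns a pointer to the file contents and stores the
 * length in *len_out; returns NULL on error or for an empty file
 * (mmap rejects zero-length mappings). */
const unsigned char *map_file(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */

    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: releases a mapping obtained from map_file. */
void unmap_file(const unsigned char *p, size_t len)
{
    munmap((void *)p, len);
}
```

With this, the chunk that starts at byte offset i * CHUNK_SIZE is simply the pointer returned by map_file plus that offset; the kernel decides which pages to keep resident and can drop clean pages without writing them to swap.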



Source: https://stackoverflow.com/questions/22038509/why-does-moving-the-buffer-pointer-slow-down-fread-c-programming-language
