Reading all content from a text file - C

Submitted by 孤人 on 2019-12-03 16:27:29

You should look into the functions fsize (about fsize, see the update below) and fread. This could be a huge performance improvement.

Use fsize to get the size of the file you are reading, and use that size to allocate memory only once. (About fsize, see the update below; the idea of getting the file size and doing a single allocation is still the same.)

Use fread to read the file in blocks. This is much faster than reading it one character at a time.

Something like this:

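long size = fsize(fp);              /* fsize is not standard C; see the update below */
char *fcontent = malloc(size + 1);  /* +1 for the terminating '\0' */
if (fcontent != NULL) {
    size_t n = fread(fcontent, 1, size, fp);  /* bytes actually read */
    fcontent[n] = '\0';
}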

Update

I'm not sure that fsize is cross-platform (it is not part of standard C), but you can use this method to get the size of the file:

fseek(fp, 0, SEEK_END);
size = ftell(fp);        /* returns -1L on failure */
fseek(fp, 0, SEEK_SET);  /* go back to the start before reading */
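Putting the update together with the fread advice, a minimal sketch might look like the following. The function name read_file and its out_len parameter are hypothetical; it assumes fp is a seekable stream (a regular file) and uses the byte count returned by fread rather than trusting the ftell value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch: read an entire seekable file into a NUL-terminated buffer.
 * Returns NULL on failure; on success *out_len (if non-NULL) receives
 * the number of bytes actually read. The caller must free() the result. */
char *read_file(FILE *fp, size_t *out_len)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;                        /* not seekable, e.g. a pipe */
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0)
        return NULL;

    char *buf = malloc((size_t)size + 1);   /* +1 for the '\0' */
    if (buf == NULL)
        return NULL;

    /* Trust fread's count, not ftell: in text mode they can differ. */
    size_t n = fread(buf, 1, (size_t)size, fp);
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    if (out_len)
        *out_len = n;
    return buf;
}
```

A caller would do something like char *text = read_file(fp, &len); and free(text) when finished.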

People often realloc to twice the existing size to get amortized constant time per appended byte instead of linear time (growing by a fixed amount makes the total copying quadratic). This makes the buffer no more than twice as large as needed, which is usually okay, and you have the option of reallocating back down to the exact size when you're done.
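For streams whose size can't be known in advance (stdin, pipes), the doubling strategy could be sketched like this. The name read_stream and the 4096-byte starting capacity are assumptions for illustration; the final realloc is the optional shrink back to the exact size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch: read a whole stream of unknown size. Capacity doubles each
 * time the buffer fills, so the total copying done by realloc stays
 * linear in the input size. Returns a NUL-terminated buffer (caller
 * frees) or NULL on failure. */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Leave one byte free for the terminating '\0'. */
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        len += n;
        if (len < cap - 1)
            break;                  /* short read: EOF or error */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';

    /* Optional: give back the unused tail of the buffer. */
    char *shrunk = realloc(buf, len + 1);
    if (shrunk != NULL)
        buf = shrunk;
    if (out_len)
        *out_len = len;
    return buf;
}
```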

But even better is to call stat(2) to get the file size and allocate once (with some extra room if the file size may change while you read it).
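A POSIX-only sketch of the stat(2) approach, using fstat(2) on the descriptor behind the FILE *. The name read_file_stat and the slack parameter (the "extra room") are hypothetical; it rejects anything that isn't a regular file, since st_size is not meaningful for pipes or terminals.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Sketch (POSIX only): size the buffer with fstat() and allocate once.
 * `slack` extra bytes allow for a file that grows slightly between the
 * fstat() call and the read. Caller frees the result; NULL on failure. */
char *read_file_stat(FILE *fp, size_t slack, size_t *out_len)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0 || !S_ISREG(st.st_mode))
        return NULL;                /* not a regular file: size is meaningless */

    size_t cap = (size_t)st.st_size + slack + 1;   /* +1 for '\0' */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    size_t n = fread(buf, 1, cap - 1, fp);
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    if (out_len)
        *out_len = n;
    return buf;
}
```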

Also, why don't you either use fgets(3) instead of reading character by character, or, even better, mmap(2) the entire thing (or the relevant chunk if it's too large for memory)?
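As an illustration of reading a line at a time with fgets(3), here is a hypothetical count_lines function. It assumes a 256-byte line buffer; longer lines simply arrive in several pieces, and only newline-terminated lines are counted.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: process a text stream with fgets() instead of one getc() per
 * character. Each call copies at most sizeof line - 1 characters, so a
 * long line is delivered over several calls; a chunk containing '\n'
 * completes a line. Returns the number of newline-terminated lines. */
size_t count_lines(FILE *fp)
{
    char line[256];
    size_t lines = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strchr(line, '\n') != NULL)
            lines++;
    }
    return lines;
}
```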

Your code is probably slower and certainly more complex than:

int c;  /* int, not char, so that EOF can be told apart from valid characters */
while ((c = getc(fp)) != EOF) {
    putchar(c);
}

which does the same thing as your code.

This is from a quick reading, so I might have missed a few issues.

First, a = realloc(a, ...); is wrong. If realloc() fails, it returns NULL but doesn't free the original memory. Since you assign the result back to a, the pointer to the original memory is lost (i.e., it is a memory leak). The right way is: tmp = realloc(a, ...); if (tmp) a = tmp; else handle the error (a still points to valid memory that must eventually be freed).
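A small helper that applies this pattern might look like the following; grow_buffer is a hypothetical name, not a standard function. On failure the caller's pointer and capacity are left untouched, so nothing leaks.

```c
#include <stdlib.h>

/* Sketch: grow a heap buffer without leaking it when realloc() fails.
 * On failure the original block is still valid and still owned by the
 * caller, and *bufp and *capp are unchanged. Returns 0 on success. */
int grow_buffer(char **bufp, size_t *capp, size_t new_cap)
{
    char *tmp = realloc(*bufp, new_cap);
    if (tmp == NULL)
        return -1;      /* *bufp untouched: caller can still use or free it */
    *bufp = tmp;
    *capp = new_cap;
    return 0;
}
```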

Second, about determining the file size using fseek(fp, 0, SEEK_END);, note that this may or may not work. If the file is not random-access (such as stdin), you won't be able to go back to the beginning to read it. Also, fseek() followed by ftell() may not give a meaningful result for binary files. And for text files, it may not give you the right number of characters that can be read. There is some useful information on this topic on comp.lang.c FAQ question 19.2.

Also, in your original code, you don't reset index to 0 when it equals PAGE_SIZE, so if your file is longer than 2*PAGE_SIZE, you will write past the end of the buffer.

Your freecontent() function:

static void freecontent(char *content)
{
    if(content) {
        free(content);
        content = NULL;
    }
}

is useless. It only sets its local copy of content to NULL; the caller's pointer is unchanged. It is just as if you wrote a function setzero like this:

void setzero(int i) { i = 0; }

A much better idea is to keep track of memory yourself and not free anything more or less than needed.
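If you do want a helper that both frees the memory and clears the caller's pointer, it has to receive the address of that pointer. This variant of freecontent taking a char ** is one way to do it (a sketch; call sites would pass &content).

```c
#include <stdlib.h>

/* Sketch: receives the address of the caller's pointer, so setting
 * *content to NULL is visible to the caller. free(NULL) is a no-op,
 * so calling this twice on the same pointer is harmless. */
static void freecontent(char **content)
{
    if (content == NULL)
        return;
    free(*content);
    *content = NULL;
}
```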

You shouldn't cast the return value of malloc() or realloc() in C, since a void * is implicitly converted to any other object pointer type in C.

Hope that helps.

sudish

One problem I can see here is that the variable index never decreases, so the condition if(!fcontent || index == PAGE_SIZE) will be true only once. I think the check should be index % PAGE_SIZE == 0 instead of index == PAGE_SIZE.
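A sketch of a character-at-a-time reader using the index % PAGE_SIZE check. The question's code is not shown here, so the function name read_by_pages and the PAGE_SIZE value of 1024 are assumptions; it also grows with the leak-free realloc pattern and adds a terminating '\0' at the end.

```c
#include <stdio.h>
#include <stdlib.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 1024   /* assumed value; the question's constant isn't shown */
#endif

/* Sketch: `index` never decreases, so the buffer must grow whenever
 * index reaches a multiple of PAGE_SIZE, not only when it equals it.
 * Returns a NUL-terminated buffer (caller frees) or NULL on failure. */
char *read_by_pages(FILE *fp, size_t *out_len)
{
    char *fcontent = NULL;
    size_t index = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if (!fcontent || index % PAGE_SIZE == 0) {
            /* Grow by one page, keeping the old block if realloc fails. */
            char *tmp = realloc(fcontent, index + PAGE_SIZE);
            if (tmp == NULL) {
                free(fcontent);
                return NULL;
            }
            fcontent = tmp;
        }
        fcontent[index++] = (char)c;
    }
    if (ferror(fp)) {
        free(fcontent);
        return NULL;
    }

    /* Room for '\0': index may sit exactly on a page boundary. */
    char *tmp = realloc(fcontent, index + 1);
    if (tmp == NULL) {
        free(fcontent);
        return NULL;
    }
    fcontent = tmp;
    fcontent[index] = '\0';
    if (out_len)
        *out_len = index;
    return fcontent;
}
```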

On POSIX systems (e.g. Linux) you could get the same effect with the system call mmap, which maps your whole file into memory. It has an option to map the file copy-on-write, so you wouldn't overwrite your file if you change the buffer.

This would usually be much more efficient, since you leave as much as you can to the system. No need to do realloc or similar.

In particular, if you are only reading and several processes do that at the same time, there would be only one copy in memory for the whole system.
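A minimal POSIX sketch of the mmap approach, mapping the file read-only with MAP_PRIVATE. The function name map_file is hypothetical. The mapping is not NUL-terminated, so treat it as len bytes rather than as a C string; an empty file is rejected because mmap cannot map zero bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch (POSIX only): map a whole file read-only. With MAP_PRIVATE,
 * even a writable mapping would not write changes back to the file.
 * Returns the mapping (release with munmap(p, *len)) or NULL. */
const char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);                  /* mmap with length 0 fails */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```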
