Why use 4096 elements for a char array buffer?

Submitted by 冷暖自知 on 2019-12-23 01:10:48

Question


I found a program that reads from standard input:

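#include <stdio.h>

/* Defined elsewhere in the program. This prototype is an assumption;
 * the original post does not show it. */
int rgrep_matches(char *line, char *pattern);

int main(int argc, char **argv) {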
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <PATTERN>\n", argv[0]);
        return 2;
    }

    /* we're not going to worry about long lines */
    char buf[4096]; // 4kibi

    while (!feof(stdin) && !ferror(stdin)) { // with input redirection, the given file becomes stdin
        if (!fgets(buf, sizeof(buf), stdin)) { // fgets reads at most sizeof(buf) - 1 characters from stdin into buf; it stops after a newline or at EOF
            break;
        }
        if (rgrep_matches(buf, argv[1])) {
            fputs(buf, stdout); // writes the string into stdout
            fflush(stdout);
        }
    }

    if (ferror(stdin)) {
        perror(argv[0]); // interprets error
        return 1;
    }

    return 0;
}

Why is buf set to 4096 elements? Is it because a line can contain at most 4096 characters?


Answer 1:


The answer is in the code you pasted:

/* we're not going to worry about long lines */
char buf[4096]; // 4kibi

Lines longer than 4096 characters can occur, but the author didn't deem them worth caring about.

Note also the definition of fgets:

fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte (\0) is stored after the last character in the buffer.

So if a line is longer than 4095 characters (the 4096th byte is reserved for the null byte), it will be split across multiple iterations of the while loop.
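
To make the splitting concrete, here is a minimal sketch that counts how many fgets() calls one line needs for a given buffer size. The helper name count_chunks_for_line and the 64-byte ceiling are assumptions made for illustration; they are not part of the program in the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original program): reads one logical
 * line from `in` using only the first `cap` bytes of a small buffer and
 * returns how many fgets() calls were needed to reach the newline or the
 * end of the input. */
static int count_chunks_for_line(FILE *in, size_t cap) {
    char buf[64];
    int chunks = 0;

    /* fgets needs room for at least one character plus the null byte */
    assert(in != NULL && cap >= 2 && cap <= sizeof(buf));

    while (fgets(buf, (int)cap, in)) {
        chunks++;
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            break; /* the newline was stored, so the line is complete */
        }
        /* No newline yet: the line is longer than cap - 1 characters
         * (or the input ended), so the next call continues the same line. */
    }
    return chunks;
}
```

With a 10-character line followed by a newline, a buffer of 8 bytes takes two calls (7 characters, then the remaining 3 plus the newline), while a 64-byte buffer takes one. The program in the question behaves the same way with 4096, except that it runs the pattern match on each chunk separately.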




Answer 2:


The program just reads at most 4095 characters (plus the terminating null byte) per iteration.

There's no limit on the size of a line, but there may be a limit on the size of the stack (8 MB is a common default on modern Linux systems).

Most programmers choose what fits best for the program being implemented; in this case the programmer commented that there's no need to worry about longer lines.
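
If long lines did matter, one option is to grow a heap buffer until the whole line fits instead of using a fixed stack array. The following is only a sketch of that idea: read_full_line is a hypothetical helper, and the starting size of 16 bytes is chosen just to make the growth easy to observe.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed string holding the next full
 * line of `in` (including the newline, if there is one), or NULL at end
 * of input or on error. The caller must free() the result. */
static char *read_full_line(FILE *in) {
    size_t cap = 16; /* deliberately small starting size for illustration */
    size_t len = 0;
    char *line;

    assert(in != NULL);
    line = malloc(cap);
    if (!line) {
        return NULL;
    }

    while (fgets(line + len, (int)(cap - len), in)) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            return line; /* complete line */
        }
        if (len + 1 < cap) {
            return line; /* buffer not full and no newline: last line of input */
        }
        /* Buffer is full without a newline: double it and keep reading. */
        char *bigger = realloc(line, cap * 2);
        if (!bigger) {
            free(line);
            return NULL;
        }
        line = bigger;
        cap *= 2;
    }

    if (len > 0 && !ferror(in)) {
        return line; /* input ended exactly at a buffer boundary */
    }
    free(line);
    return NULL;
}
```

On POSIX systems getline() does the same job, so a hand-written loop like this is mainly useful where only standard C is available.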




Answer 3:


The author seems to just have a very large memory block for his expected input, to avoid dealing with chunks.

The seemingly awkward number 4096 is most likely explained by the fact that it is a) a power of two and b) a common memory page size. So when the system chooses to swap a page out to disk, it can do so in one go without extra overhead.

Whether this really helps is another question, because a block allocated with 'malloc' may not be aligned on a page boundary.

I often use such a number myself, because it doesn't hurt and in the best case it might help. However, it is only really relevant if you are worried about speed and you really have control over the allocation process in detail. If you allocate a page directly from the OS, then such a size might have some real benefits.
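
For the case where alignment is actually wanted, here is a rough sketch assuming a POSIX system: it asks for the page size with sysconf() and requests a page-aligned block with posix_memalign(). The helper names page_size and alloc_one_page are made up for this example, and the 4096 fallback is an assumption, not something the system guarantees.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns the system page size, falling back to
 * 4096 (an assumption) if the value cannot be queried. */
static size_t page_size(void) {
    long n = sysconf(_SC_PAGESIZE);
    return n > 0 ? (size_t)n : 4096;
}

/* Hypothetical helper: allocates one page that starts on a page
 * boundary. Returns NULL on failure; release the block with free(). */
static void *alloc_one_page(void) {
    void *p = NULL;
    size_t n = page_size();

    if (posix_memalign(&p, n, n) != 0) {
        return NULL;
    }
    assert((uintptr_t)p % n == 0); /* alignment promised by posix_memalign */
    return p;
}
```

Note that the buf array in the question lives on the stack, so it gets no such alignment; there the choice of 4096 is a convention rather than a guarantee of any paging benefit.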




Answer 4:


There is no such thing as a maximum number of characters in a line. 4096 was chosen on the assumption that, under normal conditions, no line will be longer than 4096 bytes.

It is more like preparing for the worst case.

If the array is smaller than the line, the read is broken into more than one step until the newline or EOF is encountered.




Answer 5:


I think the author simply chose the char buffer size to be 4*kibi* (4096 = 1024 * 4) by design, as the comment in the code says.



Source: https://stackoverflow.com/questions/22060177/why-use-4096-elements-for-a-char-array-buffer
