mmap slower than getline?

小鲜肉 2020-12-29 14:17

I face the challenge of reading/writing files (gigabytes in size) line by line.

Reading many forum entries and sites (including a bunch of SO's), mmap was suggested as the f

4 answers
  •  抹茶落季
    2020-12-29 15:10

    The real power of mmap is being able to freely seek in a file, use its contents directly as a pointer, and avoid the overhead of copying data from kernel cache memory to userspace. However, your code sample is not taking advantage of this.

    In your loop, you scan the buffer one character at a time, appending to a stringstream. The stringstream doesn't know how long the string is, and so has to reallocate several times in the process. At this point you've killed off any performance increase from using mmap - even the standard getline implementation avoids multiple reallocations (by using a 128-byte on-stack buffer, in the GNU C++ implementation).

    If you want to use mmap to its fullest power:

    • Don't copy your strings. At all. Instead, copy around pointers right into the mmap buffer.
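    • Use library functions such as memchr to find newlines; these rely on hand-tuned assembly and other optimizations and usually run faster than open-coded search loops. (strnchr is not part of the standard C library, and strchr is unsafe here because the mapped data is not guaranteed to be NUL-terminated.)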
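
The two points above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: `for_each_line` and `count_lines_mmap` are hypothetical helper names, the file is assumed to use `\n` line endings, and a POSIX system with C++17 is assumed. Each line is handed to the callback as a `std::string_view` pointing straight into the mapping, so nothing is copied and `memchr` does the newline search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: calls fn(line) for every line in [data, data + size).
// The views point into the buffer itself; they are only valid while the
// buffer (for example the mapping) is alive. The '\n' is not part of the view.
template <typename Fn>
void for_each_line(const char* data, std::size_t size, Fn&& fn) {
    assert(data != nullptr || size == 0);
    const char* cur = data;
    const char* end = data + size;
    while (cur < end) {
        // memchr takes an explicit length, so no NUL terminator is needed.
        const void* nl = std::memchr(cur, '\n', static_cast<std::size_t>(end - cur));
        const char* stop = nl ? static_cast<const char*>(nl) : end;
        fn(std::string_view(cur, static_cast<std::size_t>(stop - cur)));
        cur = nl ? stop + 1 : end;  // skip the newline, or finish on the last line
    }
}

// Hypothetical helper: maps `path` read-only and counts its lines.
// Returns -1 on error.
long long count_lines_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // zero-length mappings are rejected

    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    long long lines = 0;
    for_each_line(static_cast<const char*>(p), size,
                  [&lines](std::string_view) { ++lines; });

    munmap(p, size);
    return lines;
}
```

A caller that needs the text of a line can use the `string_view` inside the callback; copying it into a `std::string` is only necessary if the line must outlive the mapping.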
