I face the challenge of reading and writing very large files (several gigabytes) line by line.
Reading many forum entries and sites (including a bunch of SO's), mmap was suggested as the f
The real power of mmap is being able to freely seek in a file, use its contents directly as a pointer, and avoid the overhead of copying data from kernel cache memory to userspace. However, your code sample is not taking advantage of this.
In your loop, you scan the buffer one character at a time, appending to a stringstream. The stringstream doesn't know how long the string is, and so has to reallocate several times in the process. At this point you've killed off any performance increase from using mmap - even the standard getline implementation avoids multiple reallocations (by using a 128-byte on-stack buffer, in the GNU C++ implementation).
If you want to use mmap to its fullest power:
Use memchr (or strnchr, where your platform provides it; it is not part of standard C) to find newlines; these make use of hand-rolled assembler and other optimizations to run faster than most open-coded search loops.
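As a rough illustration of the advice above, here is a minimal sketch of scanning a mapped file with memchr instead of a per-character loop. The names `for_each_line` and `count_lines_mmap` are made up for this example, and it assumes a POSIX system and C++17 (for `std::string_view`). Each line is handed to the callback as a view into the mapping, so nothing is copied or reallocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: calls fn(std::string_view) once per line in
// [data, data + size). The views point into the buffer; no copies are made.
template <typename Fn>
void for_each_line(const char* data, std::size_t size, Fn&& fn) {
    assert(data != nullptr || size == 0);
    const char* cur = data;
    const char* end = data + size;
    while (cur < end) {
        // memchr searches the remaining bytes for the next newline.
        const void* hit =
            std::memchr(cur, '\n', static_cast<std::size_t>(end - cur));
        const char* nl = hit ? static_cast<const char*>(hit) : end;
        fn(std::string_view(cur, static_cast<std::size_t>(nl - cur)));
        // Skip past the newline, or stop if the last line had none.
        cur = (nl == end) ? end : nl + 1;
    }
}

// Hypothetical helper: maps `path` read-only and counts its lines.
// Returns -1 on any error.
long count_lines_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // mmap rejects length 0

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    long n = 0;
    for_each_line(static_cast<const char*>(p), len,
                  [&](std::string_view) { ++n; });

    munmap(p, len);
    return n;
}
```

If a line does need to outlive the mapping, constructing a `std::string` from the view copies it in a single allocation, since the length is already known. For files too large to map at once (for example on a 32-bit system), the same loop could be applied to successive mapped windows, though a line that straddles a window boundary would need extra handling that is not shown here.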