I've got some huge files that I need to parse, and people have been recommending mmap because it should avoid having to allocate the entire file in memory.
But looking at
"allocate the whole file in memory" conflates two issues. One is how much virtual memory you allocate; the other is which parts of the file are read from disk into memory. Here you are allocating enough space to contain the whole file. However, only the pages that you touch will actually be changed on disk. And, they will be changed correctly no matter what happens with the process, once you have updated the bytes in the memory that mmap allocated for you. You can allocate less memory by mapping only a section of the file at a time by using the "size" and "offset" parameters of mmap. Then you have to manage a window into the file yourself by mapping and unmapping, perhaps moving the window through the file. Allocating a big chunk of memory takes appreciable time. This can introduce an unexpected delay into the application. If your process is already memory-intensive, the virtual memory may have become fragmented and it may be impossible to find a big enough chunk for a large file at the time you ask. It may therefore necessary to try to do the mapping as early as possible, or to use some strategy to keep a large enough chunk of memory available until you need it.
However, since you say you need to parse the file, why not avoid this entirely by organizing your parser to operate on a stream of data? Then the most you will need is some look-ahead and some history, instead of mapping discrete chunks of the file into memory.
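As an illustration of that shape, here is a minimal sketch using plain stdio; the 1 MiB buffer size and the parse_chunk contract are inventions for this example, with a toy line counter standing in for your real parser:

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE (1 << 20) /* 1 MiB: the most we ever hold in memory */

static long lines = 0;

/* Toy stand-in for a real parser: counts complete lines. Returns the
   number of bytes consumed; the unconsumed tail (a partial line) is
   the history/look-ahead kept for the next call. */
static size_t parse_chunk(const char *buf, size_t len, int final)
{
    size_t used = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            lines++;
            used = i + 1;
        }
    }
    if (final && used < len) { /* count an unterminated final line */
        lines++;
        used = len;
    }
    return used;
}

int main(void)
{
    static char buf[BUF_SIZE];
    size_t have = 0;

    for (;;) {
        size_t n = fread(buf + have, 1, sizeof buf - have, stdin);
        have += n;
        int final = (n == 0); /* EOF (or error): flush what's left */

        size_t used = parse_chunk(buf, have, final);

        /* Slide the unconsumed tail to the front as look-ahead. */
        memmove(buf, buf + used, have - used);
        have -= used;

        if (final)
            break;
    }

    printf("%ld lines\n", lines);
    return 0;
}
```

Run it as `./parser < hugefile`: memory use stays fixed at BUF_SIZE no matter how large the file is. A real parser would also need a policy for any single token larger than the buffer, which this toy does not handle.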