Assuming the following for...
Output:
The file is opened...
Data is 'streamed' to disk. The data in memory is in a large contiguous buffer. It is
General advice: turn off buffering and read/write in large chunks. Don't make the chunks too large, though, or you will waste time waiting for the whole I/O to complete when you could already be munching away at the first megabyte. Finding the sweet spot is trivial, because there's only one knob to turn: the chunk size.
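As a rough sketch of the chunked approach, assuming a POSIX system: the function below bypasses stdio buffering by calling read() directly and processes each chunk as soon as it arrives. The name sum_file_chunked and the byte-sum "work" are made up for illustration; chunk_size is the one knob to tune.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads `path` with unbuffered read() calls of at most
 * `chunk_size` bytes and sums all bytes into *sum_out.
 * Returns the number of bytes read, or -1 on error. */
static long long sum_file_chunked(const char *path, size_t chunk_size,
                                  uint64_t *sum_out)
{
    if (chunk_size == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char *buf = malloc(chunk_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk_size);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, just retry */
            total = -1;
            break;
        }
        if (n == 0)
            break;                  /* end of file */
        /* Process this chunk right away instead of waiting for the rest. */
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
        total += n;
    }

    free(buf);
    close(fd);
    if (total >= 0)
        *sum_out = sum;
    return total;
}
```

To find the sweet spot, time the same call with a few different chunk_size values (say 64 KiB up to a few MiB) on your actual hardware.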
Beyond that, for input, mmap()ing the file shared and read-only is, if not the fastest, then the most efficient way. Call madvise() if your platform has it to tell the kernel how you will traverse the file, so that it can read ahead and quickly throw out the pages again once you are done with them.
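A minimal sketch of the mapped-input idea, assuming a POSIX system. It uses posix_madvise(), the POSIX-spelled variant of madvise(); sum_file_mmap and the byte-sum are hypothetical placeholders for your real processing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` shared and read-only, hints sequential
 * access, and sums all bytes into *sum_out. Returns 0 on success, -1 on error. */
static int sum_file_mmap(const char *path, uint64_t *sum_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        *sum_out = 0;
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: tell the kernel we will walk the file front to back. */
    (void)posix_madvise(p, len, POSIX_MADV_SEQUENTIAL);

    const unsigned char *bytes = p;
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += bytes[i];

    munmap(p, len);
    *sum_out = sum;
    return 0;
}
```

The advice is only a hint, so the return value of posix_madvise() is deliberately ignored here; the code works the same way without it, just possibly slower.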
For output, if you already have a buffer, consider backing it with a file (again with mmap()), so you don't have to copy the data in user space.
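One way this could look, sketched under the assumption that you know the output size up front: create the file, size it with ftruncate(), map it shared and writable, and produce the data directly in the mapping. The name write_pattern_mmap and the byte pattern are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates `path` with `len` bytes and fills it through a
 * shared mapping, so the output buffer *is* the file and no extra copy is
 * made in user space. Returns 0 on success, -1 on error. */
static int write_pattern_mmap(const char *path, size_t len)
{
    /* MAP_SHARED with PROT_WRITE needs the file opened for read and write. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (len == 0) {                 /* nothing to map for an empty file */
        close(fd);
        return 0;
    }

    /* The file must already have its final size before it is mapped. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Generate the data in place; here just a repeating 0..255 pattern. */
    unsigned char *out = p;
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)(i & 0xFF);

    /* Optional: block until the dirty pages have been written back. */
    int rc = msync(p, len, MS_SYNC);
    munmap(p, len);
    return rc == 0 ? 0 : -1;
}
```

If you don't need the data to be on disk by the time the function returns, the msync() call can be dropped and the kernel will write the pages back on its own schedule.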
If mmap() is not to your liking, then there's posix_fadvise(), and, for the really tough ones, asynchronous file I/O.
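For the advice-based route, POSIX spells the call posix_fadvise(). A small sketch, assuming a platform that provides it (Linux does; some other systems do not): count_bytes_fadvise is a hypothetical name, and counting bytes stands in for real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads `path` front to back with plain read() calls,
 * after hinting the access pattern to the kernel.
 * Returns the number of bytes read, or -1 on error. */
static long long count_bytes_fadvise(const char *path, size_t chunk_size)
{
    if (chunk_size == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Offset 0 with length 0 means "the whole file". Advisory only, so a
     * failure is ignored. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    unsigned char *buf = malloc(chunk_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk_size);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (n == 0)
            break;
        total += n;
    }

    /* We are done with the data; let the kernel drop the cached pages. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    free(buf);
    close(fd);
    return total;
}
```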
(All of the above is POSIX; the Windows names may be different.)