When I seek to some position in a file and write a small amount of data (20 bytes), what goes on behind the scenes?
My understanding
To my k
Indeed, at least on my system with GNU libc, it looks like stdio is reading 4kB blocks before writing back the changed portion. Seems bogus to me, but I imagine somebody thought it was a good idea at the time.
I checked by writing a trivial C program to open a file, write a small amount of data once, and exit; then I ran it under strace to see which syscalls it actually triggered. Writing at an offset of 10000, I saw these syscalls:
lseek(3, 8192, SEEK_SET) = 8192
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1808) = 1808
write(3, "hello", 5) = 5
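A minimal sketch of that kind of test program, not the exact one used for the trace above. The helper name `stdio_write_at` is made up for illustration, and it assumes the file already exists and is at least as large as the offset being written.

```c
#include <stdio.h>

/* Hypothetical helper: write `len` bytes at `offset` through stdio.
 * Returns 0 on success, -1 on failure. Assumes the file already exists. */
int stdio_write_at(const char *path, long offset, const void *data, size_t len)
{
    /* "r+b": open for update without truncating the existing contents. */
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;

    int ok = fseek(fp, offset, SEEK_SET) == 0
          && fwrite(data, 1, len, fp) == len;

    /* fclose flushes the stdio buffer, which is when the data
     * actually reaches write(2). */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling something like `stdio_write_at("test.dat", 10000, "hello", 5)` from `main` (the file name is just a placeholder) and running the binary under `strace -e trace=lseek,read,write` should show whether your libc does the same read-before-write.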
Seems that you'll want to stick with the low-level Unix-style I/O for this project, eh?
The C standard library functions perform additional buffering, and are generally optimized for streaming reads rather than random I/O. On my system, I don't always observe the spurious reads that Jamey Sharp saw: I only see them when the offset is not aligned to a page size. It could be that the C library always tries to keep its I/O buffer aligned to 4 kB or something.
In your case, if you're doing lots of random reads and writes across a reasonably small dataset, you'd likely be best served using pread/pwrite, which take the offset as an argument and so avoid separate seek syscalls, or simply mmap-ing the dataset and writing to it in memory (likely to be the fastest, if your dataset fits in memory).
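Here is a rough sketch of both approaches, assuming a POSIX system. The helper names `write_at` and `mmap_write_at` are invented for this example, and the mmap version assumes the range being written already lies inside the file. In real code you would normally map the dataset once and keep the mapping around rather than mapping per write as done here.

```c
#define _POSIX_C_SOURCE 200809L  /* expose pread/pwrite/mmap under strict -std modes */

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes at `offset` with pwrite(2).
 * No lseek is needed and the file position of `fd` is left untouched.
 * Loops because pwrite may write fewer bytes than requested. */
int write_at(int fd, off_t offset, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0)
            return -1;          /* a fuller version would retry on EINTR */
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: patch `len` bytes at `offset` through a shared mapping.
 * Assumes offset + len does not exceed the current file size. */
int mmap_write_at(int fd, off_t offset, const void *data, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* The file offset passed to mmap must be a multiple of the page size. */
    off_t start = offset - (offset % page);
    size_t span = (size_t)(offset - start) + len;

    char *map = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, start);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map + (offset - start), data, len);

    /* msync forces the dirty page(s) back to the file before unmapping. */
    int rc = msync(map, span, MS_SYNC);
    munmap(map, span);
    return rc == 0 ? 0 : -1;
}
```

With a descriptor opened via `open(path, O_RDWR)`, a call like `write_at(fd, 10000, "hello", 5)` should show up under strace as a single pwrite syscall (named `pwrite64` on Linux), with no preceding lseek or read.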