How many random / sequential accesses per fread / fwrite?

Question

I have the following question regarding C file I/O.

At the physical level (hard drive), is it valid to assume that every fread(buffer, size, count, fp) call costs one random access for the first page (block) and n-1 sequential accesses for the remaining blocks of that same buffer, where n is the number of blocks the buffer spans?

I assumed this because the OS runs so many processes that one of them is almost certainly reading from or writing to a file between consecutive fread calls of my program, so by then the drive head is positioned at another sector / cylinder.

Is this assumption OK?

Answer 1:

You can assume whatever you want, but reality is much more complicated.

fread/fwrite will usually read from and write to an internal buffer in the memory of your process. When that buffer is empty (on read) or full (on write), they forward the request to the operating system, which has its own cache. If you are reading and the OS can't find that portion of the file in its cache, your program waits until the data is actually fetched from the hard drive, which is an expensive operation. If you are writing, the data is just copied to the OS cache and stays there until the OS flushes it to disk, which may happen long after your program has closed the file. On top of that, today's hard drives have their own caches, which the OS may not even be aware of.
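To make those layers concrete, here is a minimal sketch of a write that is pushed through each one explicitly. It assumes a POSIX system (for fileno and fsync); the helper name write_through_caches is made up for illustration and is not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes to path and pushes them through
 * each cache layer in turn. Returns 0 on success, -1 on failure. */
int write_through_caches(const char *path, const void *data, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* 1. fwrite normally only copies into the stdio buffer of this process. */
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }

    /* 2. fflush hands the buffered bytes to the OS; they now sit in the
     *    kernel's cache, not necessarily on the disk. */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* 3. fsync asks the OS to push its cached copy to the device. What the
     *    drive does with its own cache is outside the program's view. */
    if (fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1;
    }

    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling write_through_caches("out.bin", "hello", 5) returns 0 once all three steps succeed. Without steps 2 and 3, the physical write happens whenever the library and the OS decide, which is why a single fwrite cannot be mapped to a fixed number of disk accesses.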

Answer 2:

No, it's not. The blocks of a single file may be scattered all over the hard disk if the filesystem is fragmented.

Answer 3:

No, it's not. You can't even assume that an fread will trigger physical I/O. Your OS is free to do a lot with I/O requests, including caching the results and reordering, coalescing, or splitting reads (and sometimes even writes).

If there is a lot of I/O going on, you can't count on getting sequential reads either, depending on what size buffer you (and possibly the I/O stream library) use. Some operating systems provide ways to "hint" that you will be reading sequentially on a file descriptor (or mmapped region), which could help.
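As one example of such a hint, POSIX defines posix_fadvise. The sketch below assumes a POSIX system that implements it; read_sequential is a hypothetical helper name, and the 64 KiB buffer size is an arbitrary choice. The call is advice only, so the kernel may ignore it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: reads the whole file with fread after hinting
 * sequential access. Returns the number of bytes read, or -1 on error. */
long read_sequential(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Offset 0 with length 0 means "the whole file". This is only a hint,
     * so a failure here is deliberately not treated as fatal. */
    (void)posix_fadvise(fileno(fp), 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[1 << 16];  /* 64 KiB per fread call (arbitrary) */
    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```

Whether this changes anything measurable depends on the kernel, the filesystem, and the device, so it is worth timing with and without the hint.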

Answer 4:

From the point of view of an application programmer, the exact process of reading the blocks is nondeterministic. It all comes down to the disk scheduler, which organizes the access operations of multiple simultaneous requests from multiple processes. There are several algorithms for this, but a model as simplistic as "1 random seek, n-1 sequential accesses" is not realistic at all. In the end, neither the C standard nor the C++ standard defines such a thing, for obvious reasons.

Answer 5:

As many explained, caching (perhaps at several levels) has to be taken into account.

Perhaps you want to know how to accelerate or tune it from your C code. This is highly operating system specific.

On recent Linux systems, you could use readahead, madvise (with mmap), and other system calls.
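A minimal sketch of the mmap plus madvise route, assuming Linux or another system that provides madvise with MADV_SEQUENTIAL. The helper name sum_bytes_mapped is hypothetical, and summing the bytes is just a stand-in for whatever processing touches the file front to back.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the file read-only, advises sequential access,
 * and sums its bytes into *sum. Returns 0 on success, -1 on error. */
int sum_bytes_mapped(const char *path, unsigned long *sum)
{
    *sum = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Hint that pages will be touched in order; failure is not fatal. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    for (size_t i = 0; i < len; i++)
        *sum += p[i];

    return munmap(p, len) == 0 ? 0 : -1;
}
```

As with the other hints, the kernel decides how much read-ahead to do, so the effect should be measured rather than assumed.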

Often, you can simply read the file in advance (perhaps just with cat yourfile > /dev/null), and your program will then run faster on Linux because the data is already in the OS cache.

Try, for instance, running the wc word-counting utility twice on some big file. The second run usually goes much faster than the first.

Source: https://stackoverflow.com/questions/8457668/how-many-random-sequential-access-per-fread-fwrite

Tags

c++

fread