Suppose I have a dataset that is an array of 1e12 32-bit ints (4 TB) stored in a file on a 4 TB HDD with an ext4 filesystem.
Consider that the data is most likely random (or at
I'd say performance should be similar if access is truly random. The OS will use a similar caching strategy whether the data page is mapped from a file into your address space or the file data is simply held in the page cache without being mapped.
Assuming cache is ineffective:
You can use fadvise to declare your access pattern in advance and disable readahead. So I'd go with explicit reads.
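
For what it's worth, here is a minimal sketch of what explicit reads with an fadvise hint could look like. It assumes Linux, little-endian ints packed back to back in the file, and Python's os wrappers around posix_fadvise and pread. The helper names open_for_random_reads and read_int32_at are made up for illustration, not taken from any library.

```python
import os
import struct

INT_SIZE = 4  # bytes per 32-bit int


def open_for_random_reads(path):
    """Open the file read-only and hint that access will be random."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 applies the advice to the whole file.
        # POSIX_FADV_RANDOM asks the kernel not to read ahead.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    except OSError:
        # The advice is only a hint; reads still work without it.
        pass
    return fd


def read_int32_at(fd, index):
    """Return element `index` using a single positioned read (pread)."""
    data = os.pread(fd, INT_SIZE, index * INT_SIZE)
    if len(data) != INT_SIZE:
        raise EOFError(f"index {index} is past the end of the file")
    # Assumption: values are stored as little-endian signed 32-bit ints.
    return struct.unpack("<i", data)[0]
```

Each lookup is one pread at offset index * 4, so there is no shared file position to seek and the same descriptor can be used from several threads.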