Suppose I have a dataset that is an array of 1e12 32-bit ints (4 × 10^12 bytes, i.e. 4 TB or about 3.64 TiB) stored in a single file on a 4 TB HDD with an ext4 filesystem.
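For reference, here is a minimal sketch of random access into such a file. It assumes the file is a raw, headerless dump of little-endian int32 values (the post doesn't say how the ints are encoded), so element `i` starts at byte offset `i * 4` and each lookup is a single positioned read:

```python
import os
import struct

ELEM_SIZE = 4  # bytes per 32-bit int


def read_element(fd: int, index: int) -> int:
    """Return the int32 at position `index`.

    Assumes a headerless file of little-endian int32 values, so
    element i lives at byte offset i * ELEM_SIZE.
    """
    offset = index * ELEM_SIZE
    data = os.pread(fd, ELEM_SIZE, offset)  # one positioned read, no seek needed
    if len(data) != ELEM_SIZE:
        raise IndexError(f"index {index} is past the end of the file")
    return struct.unpack("<i", data)[0]
```

Usage would be something like `fd = os.open("data.bin", os.O_RDONLY)` followed by `read_element(fd, 123_456_789)`, where `data.bin` is a hypothetical filename. On an HDD, every such call that misses the page cache costs roughly one seek.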
Consider that the data is most likely random (or at
Probably, for a single flat 4 TB dataset, you don't need a filesystem at all. I guess reading the raw block device directly might bring some performance benefit.
Also, is there perhaps a way to optimize the queries or the data layout so that caching can be used more effectively?
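One common idea along these lines, sketched here only as an illustration: if queries can be collected into batches (an assumption, since the post doesn't describe the query pattern), sort the requested indices so the disk head sweeps in one direction, and merge indices that are close together into one contiguous read. The `batch_read` function and its `max_gap` parameter are hypothetical names. The sketch uses the same headerless little-endian int32 layout assumption as above:

```python
import os
import struct

ELEM_SIZE = 4  # bytes per 32-bit int


def _pread_full(fd: int, size: int, offset: int) -> bytes:
    """Read up to `size` bytes at `offset`, looping over short reads until EOF."""
    buf = bytearray()
    while len(buf) < size:
        chunk = os.pread(fd, size - len(buf), offset + len(buf))
        if not chunk:  # reached end of file
            break
        buf += chunk
    return bytes(buf)


def batch_read(fd: int, indices, max_gap: int = 1024) -> dict:
    """Fetch many int32 elements using fewer, ascending reads.

    Indices are deduplicated and sorted so reads move forward through
    the file; indices within `max_gap` elements of the previous one are
    coalesced into a single contiguous read. Returns {index: value}.
    """
    order = sorted(set(indices))
    result = {}
    i = 0
    while i < len(order):
        start = end = order[i]
        j = i + 1
        # Grow the run while the next index is close enough to the current end.
        while j < len(order) and order[j] - end <= max_gap:
            end = order[j]
            j += 1
        data = _pread_full(fd, (end - start + 1) * ELEM_SIZE, start * ELEM_SIZE)
        for idx in order[i:j]:
            off = (idx - start) * ELEM_SIZE
            if off + ELEM_SIZE > len(data):
                raise IndexError(f"index {idx} is past the end of the file")
            result[idx] = struct.unpack_from("<i", data, off)[0]
        i = j
    return result
```

The `max_gap` trade-off is between reading some unneeded bytes and paying for an extra seek. On an HDD a seek typically costs far more than transferring a few kilobytes, so a fairly generous gap is likely reasonable, but the right value would have to be measured. With truly random indices over 4 TB, coalescing will rarely trigger, and the main gain comes from the sorted, one-directional access order.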