Linux: Large int array: mmap vs seek file?

清酒与你 2021-02-05 11:14

Suppose I have a dataset that is an array of 1e12 32-bit ints (4 TB), stored in a file on an ext4 filesystem on a 4 TB HDD.

Consider that the data is most likely random (or at

4 answers
  •  长发绾君心
    2021-02-05 11:55

    I'd say performance should be similar if access is truly random. The OS uses much the same caching strategy either way: the file data ends up in the page cache whether those pages are mapped into your process's address space (mmap) or simply cached behind explicit reads without any mapping.

    Assuming cache is ineffective:

    • With explicit reads you can use posix_fadvise() (for example with POSIX_FADV_RANDOM) to declare your access pattern in advance and disable readahead.
    • Due to address space layout randomization, there might not be a contiguous block of 4 TB in your virtual address space.
    • If your data set ever expands, the address space issue might become more pressing.

    So I'd go with explicit reads.
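
    As a rough illustration of the explicit-read approach, here is a minimal Python sketch. It assumes the file holds little-endian signed 32-bit ints packed back to back; the byte order, the signedness, and the helper names open_for_random_reads and read_int are my own assumptions for the example, not something stated in the question. It uses os.posix_fadvise to hint random access and os.pread to fetch one element without moving a shared file offset.

```python
import os
import struct

INT_SIZE = 4  # bytes per 32-bit int


def open_for_random_reads(path):
    """Open the data file read-only and hint that access will be random."""
    fd = os.open(path, os.O_RDONLY)
    # offset=0, len=0 applies the advice to the whole file.
    # POSIX_FADV_RANDOM asks the kernel not to bother with readahead.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd


def read_int(fd, index):
    """Return element `index` of the on-disk array.

    Assumes little-endian signed 32-bit ints (an assumption of this sketch).
    """
    data = os.pread(fd, INT_SIZE, index * INT_SIZE)
    if len(data) != INT_SIZE:
        raise IndexError(f"index {index} is past the end of the file")
    return struct.unpack("<i", data)[0]
```

    Each lookup costs one system call and, on a cache miss, one disk seek. Since pread takes the offset as an argument, several threads can share the same file descriptor without coordinating seeks.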
