Linux: Large int array: mmap vs seek file?

Backend · Unresolved · 4 answers · 609 views

清酒与你 2021-02-05 11:14

Suppose I have a dataset that is an array of 1e12 32-bit ints (4 TB) stored in a file on a 4 TB HDD with an ext4 filesystem.

Consider that the data is most likely random (or at

4 Answers
  •  感动是毒
    2021-02-05 12:04

    On the one hand, you have extensive use of memory mapping resulting in minor page faults, transparent to the application. On the other, you have numerous system calls, with their well-known overhead. The Wikipedia page about memory-mapped files seems quite clear to me; it covers the pros and cons comprehensively.

    I think a 64-bit architecture plus a large file calls for a memory-mapped file approach, at least to keep from complicating the application; I have been told that complexity often leads to poor performance. However, mmap() is typically used for sequential access, which is not the purpose here.

    Because this is pure random access, there is little chance that two accesses will fall in the same RAM-resident page. A full 4 KiB page will be transferred from the HDD to RAM just for 4 bytes of data... This is wasted bus traffic and will probably result in poor performance.

    Hope this helps.
