array of structures, or structure of arrays?

后端 未结 11 518
甜味超标
甜味超标 2021-01-01 01:32

Hmmm. I have a table which is an array of structures I need to store in Java. The naive don\'t-worry-about-memory approach says do this:

public class Record          


        
11条回答
  •  星月不相逢
    2021-01-01 01:56

    If you have millions of records, the second approach has several advantages:

    • Memory usage: the first approach uses more memory because a) every Java object in heap has a header (containing class id, lock state etc.); b) objects are aligned in memory; c) each reference to an object costs 4 bytes (on 64-bit JVMs with Compressed OOPs or 32-bit JVMs) or 8 bytes (64-bit JVMs without Compressed OOPs). See e. g. CompressedOops for more details. So the first approach takes about two times more memory (more precisely: according to my benchmark, an object with 16 bytes of payload + a reference to it took 28 bytes on 32-bit Java 7, 36 bytes on 64-bit Java 7 with compressed OOPs, and 40 bytes on 64-bit Java 7 w/o compressed OOPs).
    • Garbage collection: although the second approach seems to create many objects (one on each call of getRecord), it might not be so, as modern server JVMs (e. g. Oracle's Java 7) can apply escape analysis and stack allocation to avoid heap allocation of temporary objects in some cases; anyway, GCing short-lived objects is cheap. On the other hand, it is probably easier for the garbage collector if there are not millions of long-lived objects (as there are in the first approach) whose reachability to check (or at least, such objects may make your application need more careful tuning of GC generation sizes). Thus the second approach may be better for GC performance. However, to see whether it makes a difference in the real situation, one should make a benchmark oneself.
    • Serialization speed: the speed of (de)serializing a large array of primitives on disk is only limited by HDD speed; serializing many small objects is inevitably slower (especially if you use Java's default serialization).

    Therefore I have used the second approach quite often for very large collections. But of course, if you have enough memory and don't care about serialization, the first approach is simpler.

提交回复
热议问题