Multimap Space Issue: Guava


In my Java code, I am using Guava's Multimap (com.google.common.collect.Multimap) like this:

 Multimap Index = HashMultimap.crea         


        
4 Answers
  • 2021-01-02 08:36

    It sounds like you need a sparse boolean matrix. The question "Sparse matrices / arrays in Java" should provide pointers to library code. Then, instead of putting (i, j) into the multimap, just put a 1 into the matrix at [i][j].
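    If no library is at hand, the idea can be sketched with the JDK alone. The class below is a minimal illustration, not a library API: the name SparseBooleanMatrix and its methods are hypothetical, and it assumes column indices are non-negative and reasonably dense within each row, since a BitSet grows with the largest index it holds.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse boolean matrix: one BitSet per non-empty row. */
class SparseBooleanMatrix {
    // Only rows that contain at least one set bit get an entry.
    private final Map<Integer, BitSet> rows = new HashMap<>();

    /** Marks the pair (i, j) as present; j must be non-negative. */
    public void set(int i, int j) {
        rows.computeIfAbsent(i, k -> new BitSet()).set(j);
    }

    /** Returns true if (i, j) was previously set. */
    public boolean get(int i, int j) {
        BitSet row = rows.get(i);
        return row != null && row.get(j);
    }

    /** Counts the stored pairs across all rows. */
    public long size() {
        long n = 0;
        for (BitSet row : rows.values()) {
            n += row.cardinality();
        }
        return n;
    }

    public static void main(String[] args) {
        SparseBooleanMatrix m = new SparseBooleanMatrix();
        m.set(3, 7);
        m.set(3, 7); // setting the same pair twice stores it once
        m.set(3, 9);
        System.out.println(m.get(3, 7) + " " + m.get(4, 7) + " " + m.size());
    }
}
```

    Row keys are still boxed here, but each stored pair costs roughly one bit instead of a boxed Integer plus a hash entry. If the column indices are large and widely scattered, this layout can waste memory and a dedicated sparse-matrix library is likely the better fit.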

  • 2021-01-02 08:37

    Probably the simplest way to minimize the memory overhead would be to combine Trove's primitive collection implementations (which avoid the memory overhead of boxing) with Guava's Multimap, something like

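    // Guava: Multimaps, SetMultimap, com.google.common.base.Supplier
    // Trove 3: gnu.trove.TDecorators, TIntObjectHashMap, TIntHashSet
    SetMultimap<Integer, Integer> multimap = Multimaps.newSetMultimap(
      // Backing map keyed by primitive ints, exposed as a Map<Integer, ...>
      TDecorators.wrap(new TIntObjectHashMap<Collection<Integer>>()),
      new Supplier<Set<Integer>>() {
        @Override
        public Set<Integer> get() {
          // Each key's values live in a primitive int set
          return TDecorators.wrap(new TIntHashSet());
        }
      });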
    

    That still has the overhead of boxing and unboxing on queries, but the memory it consumes just sitting there would be significantly reduced.

  • 2021-01-02 08:43

    You could probably use an ArrayListMultimap, which requires less memory than a HashMultimap, since ArrayLists are smaller than HashSets. Or you could modify Louis's Trove solution, replacing the Set with a List, to reduce memory usage further.

    Some applications depend on the fact that HashMultimap satisfies the SetMultimap interface, but most don't.

  • 2021-01-02 08:59

    There's a huge amount of overhead associated with Multimap. At a minimum:

    • Each key and value is an Integer object, which (at a minimum) doubles the storage requirements of each int value.
    • Each unique key value in the HashMultimap is associated with a Collection of values (according to the source, the Collection is a HashSet).
    • Each HashSet is created with default space for 8 values.

    So each key/value pair requires (at a minimum) perhaps an order of magnitude more space than you might expect for two int values. (Somewhat less when multiple values are stored under a single key.) I would expect 10 million key/value pairs to take perhaps 400MB.

    Although you have 2.5GB of heap space, I wouldn't be all that surprised if that's not enough. The above estimate is, I think, on the low side. Plus, it only accounts for how much is needed to store the map once it is built. As the map grows, the table needs to be reallocated and rehashed, which temporarily at least doubles the amount of space used. Finally, all this assumes that object references require 4 bytes, as int values do. If the JVM is using 64-bit addressing without compressed references, each reference takes 8 bytes and the reference-related overhead roughly doubles.
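    To make the arithmetic concrete, here is a rough back-of-envelope sketch. The byte sizes are assumptions for a 64-bit HotSpot JVM with compressed references, not measurements, and the class and method names are made up for illustration. It counts only the boxed value, one hash node, and one table slot per pair, so it ignores the per-key sets and any unused table capacity.

```java
/** Hypothetical back-of-envelope estimate for int pairs held in hash-based sets. */
class MultimapMemoryEstimate {
    // ASSUMED object sizes (64-bit HotSpot, compressed references); not measured.
    static final long INTEGER_BYTES = 16;   // boxed Integer: 12-byte header + 4-byte int
    static final long HASH_NODE_BYTES = 32; // hash node: header, hash, key, value, next (padded)
    static final long TABLE_SLOT_BYTES = 4; // one reference in the bucket array

    /** Estimated bytes for the given number of stored values. */
    public static long estimateBytes(long pairs) {
        return pairs * (INTEGER_BYTES + HASH_NODE_BYTES + TABLE_SLOT_BYTES);
    }

    /** Bytes for the same pairs kept as two primitive ints each. */
    public static long rawBytes(long pairs) {
        return pairs * 2 * Integer.BYTES;
    }

    public static void main(String[] args) {
        long pairs = 10_000_000L;
        System.out.println(estimateBytes(pairs) / 1_000_000 + " MB boxed vs "
                + rawBytes(pairs) / 1_000_000 + " MB raw");
    }
}
```

    Under these assumed sizes, 10 million pairs come to about 520 MB against 80 MB for the raw ints, which is in the same range as the figure above. Real usage depends on the JVM, its settings, and how values are distributed across keys, so measuring on the target JVM is the only reliable check.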
