In my Java code, I am using Guava's Multimap (com.google.common.collect.Multimap) like this:
Multimap Index = HashMultimap.crea
There's a huge amount of overhead associated with Multimap. At a minimum:
- Each key and each value is stored as an Integer object, which (at a minimum) doubles the storage requirements of each int value.
- Each key in the HashMultimap is associated with a Collection of values (according to the source, the Collection is a HashSet).
- Each HashSet is created with default space for 8 values.
So each key/value pair requires (at a minimum) perhaps an order of magnitude more space than you might expect for two int values. (Somewhat less when multiple values are stored under a single key.) I would expect 10 million key/value pairs to take perhaps 400MB.
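To make the arithmetic behind that figure explicit, here is a rough sketch. The class name, method names, and overhead factors are illustrative assumptions, not measurements: two raw int values take 8 bytes, a full order of magnitude would be a factor of 10, and the 400MB guess corresponds to a factor of about 5 (the "somewhat less" case).

```java
// Back-of-envelope arithmetic only; every constant here is an assumption.
final class MultimapEstimate {
    static final long BYTES_PER_INT = 4;

    // Two primitive ints per key/value pair, with no object overhead.
    static long rawBytes(long pairs) {
        return pairs * 2 * BYTES_PER_INT;
    }

    // Scale the raw size by an assumed overhead multiplier.
    static long estimatedBytes(long pairs, long overheadFactor) {
        return rawBytes(pairs) * overheadFactor;
    }

    public static void main(String[] args) {
        long pairs = 10_000_000L;
        System.out.println(rawBytes(pairs) / 1_000_000 + " MB raw");              // 80 MB raw
        System.out.println(estimatedBytes(pairs, 5) / 1_000_000 + " MB at 5x");   // 400 MB at 5x
        System.out.println(estimatedBytes(pairs, 10) / 1_000_000 + " MB at 10x"); // 800 MB at 10x
    }
}
```

The real factor depends on the JVM, the Guava version, and how many values share a key, so treat these numbers as a range rather than a prediction.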
Although you have 2.5GB of heap space, I wouldn't be all that surprised if that's not enough. The above estimate is, I think, on the low side. Plus, it only accounts for how much is needed to store the map once it is built. As the map grows, the table needs to be reallocated and rehashed, which temporarily at least doubles the amount of space used. Finally, all this assumes that int values and object references require 4 bytes. If the JVM is using 64-bit addressing, the byte count probably doubles.
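As a rough illustration of how those two adjustments stack up, the sketch below takes the 400MB steady-state guess and applies each doubling at face value. The class and method names are hypothetical, and treating both effects as a flat factor of 2 over the whole map is a simplifying assumption, not a measured result.

```java
// Applies the two "doubling" adjustments to a steady-state estimate.
// Both factors of 2 are assumptions taken from the reasoning above.
final class PeakHeapEstimate {
    static long peakBytes(long steadyBytes, boolean duringRehash, boolean wideReferences) {
        long bytes = steadyBytes;
        if (duringRehash) {
            bytes *= 2; // old and new tables coexist while rehashing
        }
        if (wideReferences) {
            bytes *= 2; // assumed 8-byte references under 64-bit addressing
        }
        return bytes;
    }

    public static void main(String[] args) {
        long steady = 400_000_000L; // the 400MB guess for 10 million pairs
        System.out.println(peakBytes(steady, true, false) / 1_000_000 + " MB"); // 800 MB
        System.out.println(peakBytes(steady, true, true) / 1_000_000 + " MB");  // 1600 MB
    }
}
```

With the starting figure already on the low side, a worst case of this size leaves little headroom in a 2.5GB heap.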