Map alternative for primitive values

时光总嘲笑我的痴心妄想 提交于 2021-02-06 09:42:09

问题


I did some profiling on my application and one of the results turned out that about 18% of memory on the heap is used by objects of type Double. It turns out these objects are the values in Maps, where I cannot use the primitive type.

My reasoning is that the primitive type of double consumes less memory than it's object Double. Is there a way to have a map like data structure, that would accept any type as key and a primitive double as values?

Main operations would be:

  • insertion (probably only once)
  • Lookup (contains by key)
  • Retrieval (by key)
  • Iteration

Typical maps that I have are:

  • HashMap<T, HashMap<NodeData<T>, Double>> graph
  • HashMap<Point2D, Boolean> onSea (though not a double value)
  • ConcurrentHashMap<Point2D, HashMap<Point2D, Double>>

All used with Java 8.

Addendum

I am mainly not interested in frameworks that have a solution to these type of maps, but on what has to be considered when addressing these issues. If you want, what are the concepts/ideas/approaches behind any such framework. Or the solution may be also on another level, where the maps are replaced with objects following a certain pattern like @Ilmari Karonen pointed out in his answer.


回答1:


Others have already suggested several third-party implementations of primitive-valued maps. For completeness, I'd like to mention some ways to get rid of the maps entirely that you might want to consider. These solutions won't always be possible, but when they are, they will usually be both faster and more memory-efficient than any map can be.

Alternative 1: Use plain old arrays.

A simple double[] array may not be as sexy as a fancy map, but very little can beat it in compactness and speed of access.

Of course, arrays come with a bunch of limitations: their size is fixed (although you can always create a new array and copy the old one's content into it) and their keys can only be small positive integers which, for efficiency, should be reasonably dense (i.e. the total number of used keys should be a reasonably large fraction of the highest key value). But if that happens to be the case for your keys, or if you can arrange for it to be the case, arrays of primitive values can be very efficient.

In particular, if you can assign a unique small integer ID to each key object, then you can use that ID as an index into an array. Similarly, if you're already storing your objects in an array (e.g. as part of some more complex data structure) and looking them up by index, then you can simply use the same index to look up any additional metadata values in another array.

You could even dispense with the ID uniqueness requirement, if you implemented some kind of a collision handling mechanism, but at that point you're well on your way towards implementing your own hash table. In some cases that might actually make sense, but usually at that point it's probably easier to use an existing third-party implementation.

Alternative 2: Customize your objects.

Instead of maintaining a map from key objects into primitive values, why not just make those values into properties of the objects themselves? This is, after all, what object-oriented programming is all about — grouping related data into meaningful objects.

For example, instead of maintaining a HashMap<Point2D, Boolean> onSea, why not just give your points a boolean onSea property? Of course, you'll need to define your own custom point class for this, but there's no reason why you can't make it extend the standard Point2D class if you want, so that you can pass your custom points into any method that expects a Point2D.

Again, this approach may not always work directly, e.g. if you need to work with classes that you cannot modify (but see below), or if the values you want to store are associated with more than one object (as in your ConcurrentHashMap<Point2D, HashMap<Point2D, Double>>).

However, for the latter case, you may still be able to solve the problem by suitably redesigning your data representation. For example, instead of representing a weighted graph as a Map<Node, Map<Node, Double>>, you can define an Edge class like:

class Edge {
    Node a, b;
    double weight;
}

and then add an Edge[] (or Vector<Edge>) property to each node that contains any edges connected to that node.

Alternative 3: Combine multiple maps into one.

If you have multiple maps with the same keys, and cannot just turn the values into new properties of the key objects as suggested above, consider grouping them into a single metadata class and creating a single map from the keys into objects of that class. For example, instead of a Map<Item, Double> accessFrequency and a Map<Item, Long> creationTime, consider defining a single metadata class like:

class ItemMetadata {
    double accessFrequency;
    long creationTime;
}

and having a single Map<Item, ItemMetadata> to store all the metadata values. This is more memory-efficient than having multiple maps, and may also save time by avoiding redundant map lookups.

In some cases, for convenience, you may also wish to include in each metadata object a reference to its corresponding main object, so that you can access both through a single reference to the metadata object. Which naturally segues into...

Alternative 4: Use a decorator.

As a combination of the previous two alternatives, if you cannot directly add extra metadata properties into the key objects, consider instead wrapping them with decorators that can hold the extra values. Thus, for example, instead of directly creating your own point class with extra properties, you could simply do something like:

class PointWrapper {
    Point2D point;
    boolean onSea;
    // ...
}

If you like, you can even turn this wrapper into a full-blown decorator by implementing method forwarding, but even just a simple "dumb" wrapper may be sufficient for many purposes.

This approach is most useful if you can then arrange to store and work with just the wrappers, so that you never need to look up the wrapper corresponding to an unwrapped object. Of course, if you do need to do that occasionally (e.g. because you're only receiving the unwrapped objects from other code), then you can do that with a single Map<Point2D, PointWrapper>, but then you're effectively back at the previous alternative.




回答2:


Eclipse Collections has object and primitive maps and has Mutable and Immutable versions for both.

MutableObjectDoubleMap<String> doubleMap = ObjectDoubleMaps.mutable.empty();
doubleMap.put("1", 1.0d);
doubleMap.put("2", 2.0d);

MutableObjectBooleanMap<String> booleanMap = ObjectBooleanMaps.mutable.empty();
booleanMap.put("ok", true);

ImmutableObjectDoubleMap<String> immutableMap = doubleMap.toImmutable();
Assert.assertEquals(doubleMap, immutableMap);

A MutableMap can be used as a factory for an ImmutableMap in Eclipse Collections by calling toImmutable as I have done in the example above. Both mutable and immutable maps share a common parent interface, which in the case of the MutableObjectDoubleMap and ImmutableObjectDoubleMap above, is named ObjectDoubleMap.

Eclipse Collections also has synchronized and unmodifiable versions for all mutable containers in the library. The following code will give you a synchronized view wrapped around the primitive maps.

MutableObjectDoubleMap<String> doubleMap = 
        ObjectDoubleMaps.mutable.<String>empty().asSynchronized();
doubleMap.put("1", 1.0d);
doubleMap.put("2", 2.0d);

MutableObjectBooleanMap<String> booleanMap = 
        ObjectBooleanMaps.mutable.<String>empty().asSynchronized();
booleanMap.put("ok", true);

This performance comparison of large maps was published a couple of years ago.

Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove – January 2015 version

GS Collections has since been migrated to the Eclipse Foundation and is now Eclipse Collections.

Note: I am a committer for Eclipse Collections.




回答3:


What you are looking for is a Object2DoubleOpenHashMap from fastutil (Collections Framework with a small memory footprint and fast access and insertion) which provides methods of type double getDouble(Object k) and double put(K k, double v).

For example:

// Create a Object2DoubleOpenHashMap instance
Object2DoubleMap<String> map = new Object2DoubleOpenHashMap<>();
// Put a new entry
map.put("foo", 12.50d);
// Access to the entry
double value = map.getDouble("foo");

The class Object2DoubleOpenHashMap is a real implementation of a Map that is not thread-safe, however you can still use the utility method Object2DoubleMaps.synchronize(Object2DoubleMap<K> m) to make it thread-safe thanks to a decorator.

The creation code would then be:

// Create a thread safe Object2DoubleMap
Object2DoubleMap<String> map =  Object2DoubleMaps.synchronize(
    new Object2DoubleOpenHashMap<>()
);



回答4:


There are several implementations:

  • http://labs.carrotsearch.com/hppc.html
  • http://trove.starlight-systems.com/

Here are questions related to best performance:

  • Most efficient Java primitive collections library
  • What is a fast alternative to HashMap for mapping to primitive types?

The actual implementation can also influence performance

  • Difference between HashMap, LinkedHashMap and TreeMap



回答5:


To have a better estimate of how these various libraries stack up against each other, I put together a small benchmark that checks the performance of:

  • total time for 300'000 insertions
  • average time for a check of containments with 1000 samples that are in the map
  • Memory size of the data structure I had a look at the Map-like structure which takes a String as key and double as value. The checked frameworks are Eclipse Collection, HPPC, Trove and FastUtil, as well as for comparison the HashMap and ConcurrentHashMap.

In short, these are the results:

Filling in 300000 into the JDK HashMap took 107ms
Filling in 300000 into the JDK ConcurrentHashMap took 152ms
Filling in 300000 into the Eclipse map took 107ms
Filling in 300000 into the Trove map took 855ms
Filling in 300000 into the HPPC map took 93ms
Filling in 300000 into the FastUtil map took 163ms
1000 lookups average in JDK HashMap took: 550ns
1000 lookups average in JDK Concurrent HashMap took: 748ns
1000 lookups average in Eclipse Map took: 894ns
1000 lookups average in Trove Map took: 1033ns
1000 lookups average in HPPC Map took: 523ns
1000 lookups average in FastUtil Map took: 680ns
JDK HashMap:            43'809'895B
JDK Concurrent HashMap: 43'653'740B => save  0.36%
Eclipse Map:            35'755'084B => save 18.39%
Trove Map:              32'147'798B => save 26.62%
HPPC Map:               27'366'533B => save 37.53%
FastUtil Map:           31'560'889B => save 27.96%

For all the details, as well as the test application take a look at my blog entry.



来源:https://stackoverflow.com/questions/41726286/map-alternative-for-primitive-values

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!