Best way to synchronize infrequently updated hashmap

Submitted by 旧城冷巷雨未停 on 2020-01-04 03:11:46

Question


I have a HashMap that we use in our application. The data is populated from the database during the application's initial load, and after that it is only read, never updated. Multiple threads read the data constantly. Since the data is never updated, we currently don't use any synchronization and just use a plain HashMap. This is how we define it now:

private volatile Map<Integer, MyData> myMap = new HashMap<>();

Now we want to update the data in this map once a day by repopulating it from the database. My plan is to load the data from the database into a local map, say myLocalMap, every day at midnight. Once the data is loaded into myLocalMap, I will just swap myMap to point to it.

So my concern is: at the point where I do myMap = myLocalMap, is there a possibility that some other thread reading from myMap gets an empty or unexpected result? If so, I will have to synchronize myMap. For synchronization I have the following options:

synchronized(myMap) {} OR // synchronize all map get and update operations
ConcurrentHashMap OR
Collections.synchronizedMap(myMap)

But I'm hesitant to use synchronization because then I'm synchronizing all the reads too. I think synchronizing the map for a once-a-day refresh will impact all the reads that occur constantly throughout the day. This is especially bad since I have many maps in my application that are read and updated this way. Any thoughts/comments? Thanks!


Answer 1:


At the point where I do myMap = myLocalMap, is there a possibility that some other thread that is reading data from myMap get an empty or unexpected result?

No, there is not. Reads and writes of reference variables are atomic, so a reader never observes a half-written reference: any thread reading myMap gets either the old map or the new map, never an empty or inconsistent one. Atomicity alone says nothing about visibility, though; that is what the volatile keyword on myMap provides. A write to a volatile field happens-before every subsequent read of that field, so a thread that reads myMap after the assignment has completed sees the new reference together with everything that was put into the new map before the swap. This holds as long as the new map is fully populated before the assignment and is not modified afterwards.

Supporting documentation from Oracle's Java tutorial:

  • Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).
  • any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable

Vogella:

If a variable is declared with the volatile keyword then it is guaranteed that any thread that reads the field will see the most recently written value.

Also from the same article on Vogella:

The Java language specification guarantees that reading or writing a variable is an atomic operation

Also see this reference, specifically "Listing 3. Using a volatile variable for safe one-time publication" which describes a scenario very similar to yours.

I agree with Giovanni about ConcurrentHashMap, by the way. But in your case you don't need ConcurrentHashMap, since all of your updates occur in a single transaction and you are just pointing the Map reference at the new data.
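
The swap described in this answer could look roughly like the sketch below. The class name DailySnapshot, the String value type (standing in for MyData), and the method names are assumptions made for illustration; only myMap and myLocalMap come from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class; String stands in for the question's MyData type.
class DailySnapshot {
    // volatile: the write in refresh() happens-before any later read of this field,
    // so readers see a fully populated map.
    private volatile Map<Integer, String> myMap = new HashMap<>();

    // Intended to be called by a single refresh thread, e.g. once a day.
    void refresh(Map<Integer, String> rowsFromDb) {
        // Build the new map completely before publishing it.
        Map<Integer, String> myLocalMap = new HashMap<>(rowsFromDb);
        // Publish with one volatile write; the map must not be mutated afterwards.
        myMap = myLocalMap;
    }

    // Single lookup: uses whichever map is current at the time of the read.
    String get(Integer key) {
        return myMap.get(key);
    }

    // Several lookups that must agree with each other: read the field once
    // and reuse the local reference, so a swap in between cannot mix two maps.
    String getBoth(Integer a, Integer b) {
        Map<Integer, String> snapshot = myMap;
        return snapshot.get(a) + "," + snapshot.get(b);
    }
}
```

The getBoth method is there to illustrate one caveat: a reader that calls myMap.get(...) twice may hit the old map the first time and the new map the second time, so code that needs a consistent view should copy the reference into a local variable first.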




Answer 2:


Use ConcurrentHashMap. From the javadoc:

[...] retrieval operations do not entail locking [...]

EDIT: As per the comment, since your map is effectively immutable, you might as well use Guava's ImmutableMap:

private volatile Map<K,V> map = ImmutableMap.of();

The above creates an empty immutable map, which is read-only and thread-safe (clients can't accidentally modify it and risk a ConcurrentModificationException).

When you need to repopulate the map from the database you just build a new one:

ImmutableMap.Builder<K,V> builder = ImmutableMap.builder();
// add your k,v pairs to the builder
builder.put(foo,bar);
// now swap the map with the new one:
map = builder.build();

As @KyleM noted, the assignment of the reference is atomic, and because the map is declared volatile, once the assignment has completed all clients will see the new map. So this should solve all your concurrency concerns in two simple steps.
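
If Guava is not on the classpath, roughly the same build-then-swap pattern can be sketched with JDK classes only. To be clear about the substitution: this uses Collections.unmodifiableMap over a private copy instead of Guava's ImmutableMap, and the class and method names (ReadOnlySnapshot, reload, current) are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class showing the build-then-swap pattern with JDK types only.
class ReadOnlySnapshot<K, V> {
    // Starts out as an empty, read-only map.
    private volatile Map<K, V> map = Collections.emptyMap();

    void reload(Map<K, V> rowsFromDb) {
        // Copy first, so later changes to rowsFromDb cannot leak into the published map.
        Map<K, V> copy = new HashMap<>(rowsFromDb);
        // Wrap the private copy in a read-only view and publish it with one volatile write.
        map = Collections.unmodifiableMap(copy);
    }

    // Callers get a read-only view; put/remove throw UnsupportedOperationException.
    Map<K, V> current() {
        return map;
    }
}
```

Because nothing keeps a reference to the underlying copy after reload returns, the published map behaves as effectively immutable, which is the property the Guava version gives you directly.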

Note that this pattern is somewhat similar to having an immutable collection stored in a var in Scala.

As a side note, you should check out Guava's caches: they might solve even more problems for you.




Answer 3:


I agree with Giovanni, unless you want to lock out reads while you're updating your hash? Then you might have to do some custom synchronization.

I would guess you wouldn't want a read to happen mid-update, but I'm not sure.
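
If reads really do have to be locked out while the map is being refreshed in place, one possible form of that custom synchronization is a read/write lock. This is only a sketch under that assumption; the class name LockedMap, its methods, and the String value type are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class: many readers may hold the read lock at once,
// but the refresh takes the write lock and blocks them until it is done.
class LockedMap {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Integer, String> map = new HashMap<>();

    String get(Integer key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Replaces the whole content in place; readers wait until the write lock is released.
    void replaceAll(Map<Integer, String> rowsFromDb) {
        lock.writeLock().lock();
        try {
            map.clear();
            map.putAll(rowsFromDb);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that every read now pays for acquiring a lock, which is exactly the cost the question was trying to avoid, so this only makes sense if the map has to be mutated in place rather than swapped.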




Answer 4:


If you are just swapping the volatile field myMap for an already initialized map, you do not need any additional synchronization.

The ConcurrentHashMap approach would be good if you need to update some values regularly.
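
For that second case, where individual entries change while other threads keep reading, a minimal sketch might look like this. The class name UpdatableMap, its methods, and the String value type are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical class: entries are updated in place, one key at a time,
// while readers call get() concurrently without any external locking.
class UpdatableMap {
    private final ConcurrentMap<Integer, String> data = new ConcurrentHashMap<>();

    // Safe to call from any thread; each put is atomic for its key.
    void update(Integer key, String value) {
        data.put(key, value);
    }

    String get(Integer key) {
        return data.get(key);
    }
}
```

Unlike the swap approach, readers here can observe a refresh that is only partly applied (some keys new, some still old), so it fits independent per-key updates better than a once-a-day bulk reload.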



Source: https://stackoverflow.com/questions/22562165/best-way-to-synchronize-infrequently-updated-hashmap
