I'm attempting to create a `ConcurrentHashMap` that supports "snapshots" in order to provide consistent iterators, and am wondering if there's a more efficient way to do this. The problem is that if two iterators are created at the same time, they need to read the same values, and `ConcurrentHashMap`'s weakly consistent iterators do not guarantee this. I'd also like to avoid locks if possible: there are several thousand values in the map and processing each item takes several dozen milliseconds, so blocking writers for the duration of an iteration could stall them for a minute or longer.
What I have so far:
- The `ConcurrentHashMap`'s keys are Strings, and its values are instances of `ConcurrentSkipListMap<Long, T>`.
- When an element is added to the hashmap, a new skip list is allocated and installed with `putIfAbsent`, and the object is then added via `skipList.put(System.nanoTime(), t)`.
- To query the map, I use `map.get(key).lastEntry().getValue()` to return the most recent value. To query a snapshot (e.g. with an iterator), I use `map.get(key).lowerEntry(iteratorTimestamp).getValue()`, where `iteratorTimestamp` is the result of `System.nanoTime()` called when the iterator was initialized.
- If an object is deleted, I use `map.get(key).put(timestamp, SnapShotMap.DELETED)`, where `DELETED` is a static final object.
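The scheme in the list above can be sketched roughly like this, assuming Java 8+ and String keys. The class and method names (`SnapshotMap`, `getAsOf`, and so on) are hypothetical. Two small deviations from the description: values are stored as `Object` so that the `DELETED` tombstone can live in the same skip list as the `T` values, and `computeIfAbsent` is used instead of `putIfAbsent` so that a skip list is not allocated and thrown away on every write to an existing key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of the versioned map: each key maps to a skip list of
// (nanoTime -> value) versions; deletions are recorded as a tombstone.
class SnapshotMap<T> {
    // Tombstone marking a deletion; compared by identity.
    static final Object DELETED = new Object();

    private final ConcurrentHashMap<String, ConcurrentSkipListMap<Long, Object>> map =
            new ConcurrentHashMap<>();

    // Records a new version of key, stamped with System.nanoTime().
    void put(String key, T value) {
        versions(key).put(System.nanoTime(), value);
    }

    // Records a deletion as a new tombstone version; older versions stay
    // visible to snapshots taken before this call.
    void remove(String key) {
        versions(key).put(System.nanoTime(), DELETED);
    }

    // Most recent value, or null if the key is absent or deleted.
    T get(String key) {
        ConcurrentSkipListMap<Long, Object> v = map.get(key);
        return v == null ? null : unwrap(v.lastEntry());
    }

    // Value as seen by a snapshot taken at snapshotTime (versions stamped
    // strictly before it), or null if absent or deleted at that point.
    T getAsOf(String key, long snapshotTime) {
        ConcurrentSkipListMap<Long, Object> v = map.get(key);
        return v == null ? null : unwrap(v.lowerEntry(snapshotTime));
    }

    // computeIfAbsent on ConcurrentHashMap is atomic, so each key gets
    // exactly one skip list even under concurrent writers.
    private ConcurrentSkipListMap<Long, Object> versions(String key) {
        return map.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>());
    }

    @SuppressWarnings("unchecked")
    private T unwrap(Map.Entry<Long, Object> e) {
        return (e == null || e.getValue() == DELETED) ? null : (T) e.getValue();
    }
}
```

One caveat with timestamps as keys: two writes to the same key that happen to read the same `System.nanoTime()` value land on the same skip-list key, so whichever `put` runs second silently replaces the other. Also, `nanoTime` values are only comparable within a single JVM.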
Questions:
- Is there a library that already implements this? Or, barring that, is there a data structure that would be more appropriate than the `ConcurrentHashMap` and the `ConcurrentSkipListMap`? My keys are comparable, so maybe some sort of concurrent tree would support snapshots better than a concurrent hash table.
- How do I prevent this thing from growing without bound? I can delete all of the skip list entries with keys less than X (except for the newest entry below X, which later snapshots may still read) after all iterators that were initialized on or before X have completed, but I don't know of a good way to determine when this has happened. I can flag that an iterator has completed when its `hasNext` method returns false, but not all iterators are necessarily going to run to completion. I can keep a `WeakReference` to each iterator so that I can detect when it has been garbage collected, but the only way I can think of to detect this is a thread that iterates through the collection of weak references and then sleeps for several minutes. Ideally the thread would block on the `WeakReference` and be notified when the wrapped reference is GC'd, but I don't think this is an option.

```java
ConcurrentSkipListMap<Long, WeakReference<Iterator>> iteratorMap;

while (true) {
    long latestGC = 0;
    // Entries are ordered by timestamp: drop collected iterators from the
    // oldest end, stopping at the first one that is still alive.
    for (Map.Entry<Long, WeakReference<Iterator>> entry : iteratorMap.entrySet()) {
        if (entry.getValue().get() == null) {
            iteratorMap.remove(entry.getKey());
            latestGC = entry.getKey();
        } else {
            break;
        }
    }
    // remove ConcurrentHashMap entries with timestamps less than `latestGC`
    Thread.sleep(300000); // five minutes
}
```
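For the pruning question, one possible direction, sketched under assumptions rather than battle-tested: `java.lang.ref.ReferenceQueue` does let a thread block until a `WeakReference` registered with it has been cleared, which would replace the poll-and-sleep loop. `SnapshotTracker`, `register`, `safeHorizon`, `awaitCollected` and `prune` are hypothetical names. `register` and `safeHorizon` share a lock so that an iterator cannot pick its snapshot time and then register after the horizon has been computed; writers never take this lock. `prune` keeps the newest version below the horizon, because a snapshot taken at or after the horizon may still resolve to it via `lowerEntry`.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical helper (not from any library): tracks live snapshot iterators
// and decides how much version history can safely be discarded.
class SnapshotTracker {
    // A weak reference that remembers the snapshot time of its iterator.
    private static final class Ref extends WeakReference<Object> {
        final long snapshotTime;

        Ref(Object iterator, long snapshotTime, ReferenceQueue<Object> queue) {
            super(iterator, queue);
            this.snapshotTime = snapshotTime;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Holds the Ref objects strongly so they survive until they are enqueued.
    private final Set<Ref> live = ConcurrentHashMap.newKeySet();

    // Called when a snapshot iterator is created; returns its snapshot time.
    synchronized long register(Object iterator) {
        long t = System.nanoTime();
        live.add(new Ref(iterator, t, queue));
        return t;
    }

    // Returns a time no later than the snapshot time of any tracked iterator
    // or of any iterator registered after this call returns.
    synchronized long safeHorizon() {
        long horizon = System.nanoTime();
        for (Ref r : live) {
            horizon = Math.min(horizon, r.snapshotTime);
        }
        return horizon;
    }

    // Blocks until at least one tracked iterator has been garbage collected,
    // then drains any others that are already queued.
    void awaitCollected() throws InterruptedException {
        Reference<?> r = queue.remove(); // blocks until a Ref is enqueued
        do {
            live.remove(r);
        } while ((r = queue.poll()) != null);
    }

    // Removes versions no snapshot at or after horizon can read: everything
    // older than the newest version strictly below horizon.
    static void prune(ConcurrentSkipListMap<Long, ?> versions, long horizon) {
        Map.Entry<Long, ?> floor = versions.lowerEntry(horizon);
        if (floor != null) {
            versions.headMap(floor.getKey()).clear(); // keys < floor key only
        }
    }
}
```

A watcher thread would loop on `awaitCollected()`, then call `safeHorizon()` and run `prune` on each value of the outer `ConcurrentHashMap`. Iterators that do reach the end could also deregister themselves early (not shown). Note that a reference is only enqueued after the garbage collector actually clears it, which can take a long time for long-lived iterators, so some history may still accumulate between collections.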
Edit: To clear up some confusion in the answers and comments, I'm currently passing weakly consistent iterators to code written by another division in the company, and they have asked me to strengthen the iterators' consistency. They already know that it is infeasible for me to make 100% consistent iterators; they just want a best effort on my part. They care more about throughput than iterator consistency, so coarse-grained locks are not an option.