Background:
I maintain several Winforms apps and class libraries that either could or already do benefit from caching. I'm also aware of the Caching Application Block and the System.Web.Caching namespace (which, from what I've gathered, is perfectly OK to use outside ASP.NET).
I've found that, although both of the above classes are technically "thread safe" in the sense that individual methods are synchronized, they don't really seem to be designed particularly well for multi-threaded scenarios. Specifically, they don't implement a GetOrAdd
method similar to the one in the new ConcurrentDictionary
class in .NET 4.0.
I consider such a method to be a primitive for caching/lookup functionality, and obviously the Framework designers realized this too - that's why the methods exist in the concurrent collections. However, aside from the fact that I'm not using .NET 4.0 in production apps yet, a dictionary is not a full-fledged cache - it doesn't have features like expirations, persistent/distributed storage, etc.
Why this is important:
A fairly typical design in a "rich client" app (or even some web apps) is to start pre-loading a cache as soon as the app starts, blocking if the client requests data that is not yet loaded (subsequently caching it for future use). If the user is plowing through his workflow quickly, or if the network connection is slow, it's not unusual at all for the client to be competing with the preloader, and it really doesn't make a lot of sense to request the same data twice, especially if the request is relatively expensive.
So I seem to be left with a few equally lousy options:
Don't try to make the operation atomic at all, and risk the data being loaded twice (and possibly have two different threads operating on different copies);
Serialize access to the cache, which means locking the entire cache just to load a single item;
Start reinventing the wheel just to get a few extra methods.
Clarification: Example Timeline
Say that when an app starts, it needs to load 3 datasets which each take 10 seconds to load. Consider the following two timelines:
00:00 - Start loading Dataset 1 00:10 - Start loading Dataset 2 00:19 - User asks for Dataset 2
In the above case, if we don't use any kind of synchronization, the user has to wait a full 10 seconds for data that will be available in 1 second, because the code will see that the item is not yet loaded into the cache and try to reload it.
00:00 - Start loading Dataset 1 00:10 - Start loading Dataset 2 00:11 - User asks for Dataset 1
In this case, the user is asking for data that's already in the cache. But if we serialize access to the cache, he'll have to wait another 9 seconds for no reason at all, because the cache manager (whatever that is) has no awareness of the specific item being asked for, only that "something" is being requested and "something" is in progress.
The Question:
Are there any caching libraries for .NET (pre-4.0) that do implement such atomic operations, as one might expect from a thread-safe cache?
Or, alternatively, is there some means to extend an existing "thread-safe" cache to support such operations, without serializing access to the cache (which would defeat the purpose of using a thread-safe implementation in the first place)? I doubt that there is, but maybe I'm just tired and ignoring an obvious workaround.
Or... is there something else I'm missing? Is it just standard practice to let two competing threads steamroll each other if they happen to both be requesting the same item, at the same time, for the first time or after an expiration?
See Question&Answers more detail:os