Single-digit response times

Single-digit millisecond response times are fully possible, and I think we should try to get there. It may not always be achievable, but that's no reason not to explore our options. A fast site feels more responsive, and using fewer resources per request means you can handle more load on the same hardware.

Caching content is perhaps the single most effective measure. Living in a web world, we have three main ways to go:

HTTP caching (think RFC 7234 and headers such as Cache-Control: max-age, ETag and If-Modified-Since)
Output cache, either built-in or products like Varnish
Application cache.

In this post we're going to look at the last of these, the one closest to your stack. This is where you end up when you need to alter the response before you emit data from the server.

When you start out with a small solution, often hosted on a single server, the application cache normally lives in memory and the objects are kept as is, with no serialization. It's fast and relatively easy, and the memory management is often taken care of for you.
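
To make the local case concrete, here is a minimal sketch of an in-memory cache with a time to live. It is written in Python for brevity rather than .NET, and the names (LocalCache, get_or_add, the injectable clock) are my own for illustration, not part of any library.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LocalCache:
    """In-process cache that stores live objects, with a per-entry time to live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_add(self, key: str, factory: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # hit: the stored object is returned as is
        value = factory()        # miss or expired: go to the real source
        self._items[key] = (now + self._ttl, value)
        return value
```

A hit hands back the very same object that was stored, which is why the local cache is so cheap: no copy, no network, no serialization.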

After a while your application grows, you have multiple servers, and having local caches may no longer be optimal. Problems such as cache sync arise: serving the same user different data just because the user ended up on a different server makes your application feel brittle and awkward. Enter the remote cache. These days it is often implemented using Redis (Redis is really on a roll, with both AWS and Microsoft offering Redis as a service).

So, what happens when you switch from a local cache to a remote cache? You get two penalties:

Network IO
Serialization

Redis is fast. We're talking about 6-8 ms response times including network IO and serialization, but when your goal is single-digit response times, that's too much.

So, how can we have both? A layered cache, using both a remote cache and a local cache. This also plays well with this cloud thing where instances pop up when needed and instance memory is not really expensive.
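
A rough sketch of the lookup order in a layered cache, again in Python. FakeRemoteStore is a hypothetical stand-in for Redis that only holds bytes, and JSON stands in for whatever serializer you pick; neither reflects a real client API.

```python
import json
from typing import Any, Callable, Dict, Optional


class FakeRemoteStore:
    """Stand-in for a remote store such as Redis: it only ever holds bytes."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.reads = 0  # counts round trips, to show what the local layer saves

    def get(self, key: str) -> Optional[bytes]:
        self.reads += 1
        return self.data.get(key)

    def set(self, key: str, payload: bytes) -> None:
        self.data[key] = payload


class LayeredCache:
    def __init__(self, remote: FakeRemoteStore) -> None:
        self._local: Dict[str, Any] = {}
        self._remote = remote

    def get(self, key: str, factory: Callable[[], Any]) -> Any:
        if key in self._local:
            return self._local[key]        # 1. local hit: no IO, no serialization
        payload = self._remote.get(key)    # 2. remote lookup: pays for network IO
        if payload is not None:
            value = json.loads(payload)    #    and for deserialization
        else:
            value = factory()              # 3. miss everywhere: load from the source
            self._remote.set(key, json.dumps(value).encode("utf-8"))
        self._local[key] = value           # later reads on this instance stay local
        return value
```

Only the first read on each instance pays the remote penalties. A new instance that pops up warms itself from the remote cache instead of hitting the original data source.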

How much memory do you really need to keep your entire site in memory?

To get this to play well we also need to:

Keep our local caches in sync
Serialize quickly when we hit the remote cache

To keep our local caches in sync we need a pub/sub mechanism. Redis has this built in, which makes the implementation quite trivial. It's not a reliable sync: messages are published and forgotten, so if a client is disconnected for some reason, it will not get the update. I find that this is something we can normally live with.
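
The idea can be sketched without a Redis server. Below, FakeBus is a hypothetical in-process stand-in for a pub/sub channel, a plain dict stands in for the remote cache, and evicting on every message is just one possible strategy (pushing the new value along with the message is another).

```python
from typing import Any, Callable, Dict, List


class FakeBus:
    """In-process stand-in for a pub/sub channel: fire and forget, no replay."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, handler: Callable[[str, str], None]) -> None:
        self._subscribers.append(handler)

    def unsubscribe(self, handler: Callable[[str, str], None]) -> None:
        self._subscribers.remove(handler)

    def publish(self, sender_id: str, key: str) -> None:
        for handler in list(self._subscribers):
            handler(sender_id, key)


class SyncedLocalCache:
    def __init__(self, instance_id: str, shared: Dict[str, Any], bus: FakeBus) -> None:
        self.instance_id = instance_id
        self.local: Dict[str, Any] = {}
        self._shared = shared  # stands in for the remote cache
        self._bus = bus
        bus.subscribe(self.on_message)

    def get(self, key: str) -> Any:
        if key not in self.local:
            self.local[key] = self._shared[key]
        return self.local[key]

    def set(self, key: str, value: Any) -> None:
        self._shared[key] = value
        self.local[key] = value
        self._bus.publish(self.instance_id, key)  # tell the other instances

    def on_message(self, sender_id: str, key: str) -> None:
        if sender_id != self.instance_id:
            self.local.pop(key, None)  # evict; the next get refetches from remote
```

If an instance is unsubscribed while a key changes, it keeps serving its old local copy. That is the unreliability described above, and a short local expiry is one way to put an upper bound on how long stale data survives.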

The serialization part is also solved for us. There are several fast serialization protocols available, such as MsgPack, Protobuf, Bond or Avro. You will find that these protocols have their pros and cons. In .NET land, MsgPack does not require class markup but cannot cope with inheritance. Protobuf can, but requires a serialization definition. Yan Cui has a nice benchmark over at http://theburningmonk.com/benchmarks/.

Whatever you do, BinaryFormatter is slow.
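
Benchmarks are only a guide, so it pays to measure with your own payloads. Here is a rough harness sketch in Python. The standard library's json and pickle are used purely as stand-ins for the formats above, and the names CODECS and measure are made up for this example.

```python
import json
import pickle
import timeit
from typing import Any, Callable, Dict, Tuple

# name -> (encode to bytes, decode from bytes)
Codec = Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]

CODECS: Dict[str, Codec] = {
    "json": (lambda obj: json.dumps(obj).encode("utf-8"), json.loads),
    "pickle": (pickle.dumps, pickle.loads),
}


def measure(payload: Any, rounds: int = 1000) -> Dict[str, Dict[str, float]]:
    """Return payload size and average round-trip time for each codec."""
    results: Dict[str, Dict[str, float]] = {}
    for name, (encode, decode) in CODECS.items():
        blob = encode(payload)
        if decode(blob) != payload:  # a fast codec that loses data is useless
            raise ValueError(f"{name} did not round-trip the payload")
        seconds = timeit.timeit(lambda: decode(encode(payload)), number=rounds)
        results[name] = {
            "bytes": len(blob),
            "microseconds_per_round_trip": seconds / rounds * 1e6,
        }
    return results
```

Both numbers matter for a remote cache: time per round trip is CPU you spend on every remote hit, and payload size is what travels over the network.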

So, by creating a layered distributed cache we can achieve fast response times and a synchronized cache. As an added benefit we can use the pub/sub mechanism to push data changes to the cache from other services. This comes in real handy in the microservices world.

DoubleCache is my own implementation of a layered distributed cache for .NET. It uses Redis as the remote storage and System.Runtime.Caching.MemoryCache as the local cache (which can easily be switched to HttpCache). I've tried to stick to a single interface, making it possible to switch between a local cache, a remote cache, local and remote, or a synced local and remote cache. The implementation follows the cache-aside and decorator patterns.
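
To show how cache-aside and the decorator pattern fit together behind one interface, here is a small Python sketch. It is not DoubleCache's actual API; CacheAside, DictCache and TwoLevel are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, Protocol


class CacheAside(Protocol):
    """The single interface: return the cached value, or build it with factory."""

    def get(self, key: str, factory: Callable[[], Any]) -> Any: ...


class DictCache:
    """Simplest possible layer; stands in for either a local or a remote cache."""

    def __init__(self) -> None:
        self.items: Dict[str, Any] = {}

    def get(self, key: str, factory: Callable[[], Any]) -> Any:
        if key not in self.items:
            self.items[key] = factory()
        return self.items[key]


class TwoLevel:
    """Decorator: same interface as its parts, with first in front of second."""

    def __init__(self, first: CacheAside, second: CacheAside) -> None:
        self._first = first
        self._second = second

    def get(self, key: str, factory: Callable[[], Any]) -> Any:
        # A miss in the first layer falls through to the second layer,
        # and a miss there falls through to the caller's factory.
        return self._first.get(key, lambda: self._second.get(key, factory))
```

Because every layer exposes the same get(key, factory), the calling code does not change when you go from DictCache() alone to TwoLevel(local, remote), and a syncing layer could be wrapped around the result in the same way.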