Cache! A-ah. Saviour of the Universe

Caching is awesome: instead of repeating an expensive operation, just reuse the result from last time! But using cache properly can be tricky, and there are a couple of things I'd rather not have to deal with:

Stampede protection

Cache is essential if you have a high-traffic application. But unless there’s a steady stream of predictable traffic, you risk having no data in cache at the time you need it most: when traffic spikes.

When, all of a sudden, there are a lot of simultaneous requests for data that is not in cache, the expensive operation that generates that data will be executed many times, all at once. Given enough traffic, the application server likely won't be able to handle that and will crash.

It should be possible for the very first request to signal the others that it's already executing the expensive code, and to tell them to just sit tight and wait for the result instead. Not every process should have to compute the same result the first one is already working on…
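A minimal sketch of that idea, (ab)using Memcached's add() as a lock. The function name, lock key suffix & timings are made up for illustration, not how any particular library does it:

<?php

/**
 * Fetch $key from cache, or compute it via $generate.
 * Only one process gets to run $generate; the others poll
 * cache until the fresh value shows up.
 */
function getOrCompute(\Memcached $cache, $key, callable $generate, $ttl = 300)
{
    $value = $cache->get($key);
    if ($cache->getResultCode() === \Memcached::RES_SUCCESS) {
        return $value;
    }

    // add() only succeeds for whoever gets there first:
    // that process computes, everybody else waits
    if ($cache->add($key . '.lock', 1, 10)) {
        $value = $generate();
        $cache->set($key, $value, $ttl);
        $cache->delete($key . '.lock');

        return $value;
    }

    // someone else is already computing: sit tight & poll
    for ($i = 0; $i < 50; $i++) {
        usleep(100000); // 100ms
        $value = $cache->get($key);
        if ($cache->getResultCode() === \Memcached::RES_SUCCESS) {
            return $value;
        }
    }

    // lock holder took too long (or crashed); compute it ourselves after all
    $value = $generate();
    $cache->set($key, $value, $ttl);

    return $value;
}

Note that the lock itself expires, so a crashed process can't block everyone else forever.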

Repeat requests

Given a sufficiently large application, chances are you’re going to fetch the same value multiple times, because you just happen to need it in multiple places throughout your app. Since your key-value store is likely on another server, you’re probably losing a valuable millisecond for every repeat request.

Once a result is fetched from cache, it could just be kept in memory. When you later attempt to fetch the same key in the same request, it could then just be served from memory. Or when you store a new value & need it again later, that too could be duplicated in memory. Just make sure to evict data from memory before it fills up completely and causes your app to crash.
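Sketched out, that's little more than a decorator with an array in front of the real cache (the class name & eviction strategy below are invented for the example):

<?php

/**
 * Wraps the real (networked) cache & keeps everything it has
 * seen during this request in a plain old PHP array.
 */
class MemoizedCache
{
    protected $backend;
    protected $local = [];
    protected $limit;

    public function __construct(\Memcached $backend, $limit = 1000)
    {
        $this->backend = $backend;
        $this->limit = $limit;
    }

    public function get($key)
    {
        if (array_key_exists($key, $this->local)) {
            // no network round trip for repeat requests
            return $this->local[$key];
        }

        $value = $this->backend->get($key);
        $this->remember($key, $value);

        return $value;
    }

    public function set($key, $value, $expire = 0)
    {
        $this->remember($key, $value);

        return $this->backend->set($key, $value, $expire);
    }

    protected function remember($key, $value)
    {
        // crude eviction: drop the oldest entry once the limit is
        // reached, so a long-running process can't eat all memory
        if (count($this->local) >= $this->limit) {
            array_shift($this->local);
        }
        $this->local[$key] = $value;
    }
}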

Transactions

A lot of the data stored to cache is a denormalization of data that's also in the primary data store (e.g. a database) or can be derived from it. Updating it may touch multiple tables and happen in several places throughout the code. To ensure data integrity in the database, I can use transactions, but where does that leave my cache?

Similar to buffered (in-memory) repeat cache requests, we should be able to "buffer" writes and store them all at once, when we're ready to do so. Instead of immediately storing data to cache, the write could be delayed until we want to commit it. Meanwhile, we could also keep it in memory, so that value can already be reused even though it's not yet persisted to the real cache. That would even allow for some optimizations, like not storing data to a key we're going to delete again in that same request, or combining multiple separate set operations into one setMulti.
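Roughly, such a buffering layer could look like this (again just a sketch with a made-up class name, not any library's real implementation):

<?php

/**
 * Defers writes until commit() & answers reads from the buffer
 * in the meantime, so uncommitted values can already be reused.
 */
class TransactionalCache
{
    protected $backend;
    protected $sets = [];
    protected $deletes = [];

    public function __construct(\Memcached $backend)
    {
        $this->backend = $backend;
    }

    public function get($key)
    {
        if (isset($this->deletes[$key])) {
            // pretend it's already gone
            return false;
        }
        if (array_key_exists($key, $this->sets)) {
            return $this->sets[$key]['value'];
        }

        return $this->backend->get($key);
    }

    public function set($key, $value, $expire = 0)
    {
        unset($this->deletes[$key]);
        $this->sets[$key] = ['value' => $value, 'expire' => $expire];
    }

    public function delete($key)
    {
        // a set followed by a delete never has to hit the server at all
        unset($this->sets[$key]);
        $this->deletes[$key] = true;
    }

    public function commit()
    {
        // group buffered sets per expiration time, so each group
        // can go out as a single setMulti instead of many sets
        $groups = [];
        foreach ($this->sets as $key => $data) {
            $groups[$data['expire']][$key] = $data['value'];
        }

        $success = true;
        foreach ($groups as $expire => $items) {
            $success = $this->backend->setMulti($items, $expire) && $success;
        }
        foreach (array_keys($this->deletes) as $key) {
            $this->backend->delete($key);
        }

        $this->sets = $this->deletes = [];

        return $success;
    }
}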

But atomicity is the tricky part here. If one of the delayed operations fails, the rest should not go through, and whatever has already happened should be rolled back to what it was. This could be accomplished by ordering the operations in a thoughtful manner: first execute the "conditional" operations (e.g. replace can only succeed if a value already exists in cache), then do the ones where no failure is expected regardless of what may have happened to cache before we committed (e.g. delete will always succeed).

And to ensure we can actually recover if one of those conditional operations fails, we should first retrieve their current values. We can then cas (check-and-set) them back should anything go wrong! The only remaining problem is restoring the expiration time, which most key-value stores won't let you fetch.
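Put together, committing with that safety net could look something like this. The CasStore interface is hypothetical, standing in for whatever your store's check-and-set API looks like (Memcached hands out a cas token on reads), and the lost expiration times are simply glossed over:

<?php

/**
 * Hypothetical store interface, just for this sketch: getWithToken()
 * returns the value plus a cas token, and cas() only writes if that
 * token still matches (i.e. nobody else changed the key meanwhile).
 */
interface CasStore
{
    public function getWithToken($key); // returns [$value, $token]
    public function cas($token, $key, $value, $expire = 0);
    public function replace($key, $value, $expire = 0);
    public function delete($key);
}

/**
 * Commit buffered operations: conditional ones first (they can fail),
 * unconditional ones last. If a conditional one fails, cas the old
 * values back. The original expiration times are lost, though.
 */
function commitOrRollback(CasStore $store, array $replaces, array $deletes)
{
    // snapshot the current values of everything we might have to restore
    $snapshot = [];
    foreach (array_keys($replaces) as $key) {
        list($value) = $store->getWithToken($key);
        $snapshot[$key] = $value;
    }

    $done = [];
    foreach ($replaces as $key => $value) {
        if (!$store->replace($key, $value)) {
            // roll back what already went through: cas the old value back,
            // but only if no other process has changed the key since
            // (a key that didn't exist before should really be deleted
            // instead; glossed over here)
            foreach ($done as $doneKey) {
                list(, $token) = $store->getWithToken($doneKey);
                $store->cas($token, $doneKey, $snapshot[$doneKey]);
            }

            return false;
        }
        $done[] = $key;
    }

    // deletes are expected to always succeed, so they go last
    foreach ($deletes as $key) {
        $store->delete($key);
    }

    return true;
}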

Compatibility

Ideally, there would be just one API for a variety of cache backends. Open source projects will want to support multiple platforms. Or maybe you're running a cluster of Redis servers in production, but would like to use a memory-based cache to run your tests?

The PHP FIG has recently settled on psr/cache 1.0, an interface to implement if you'd like to support a wide variety of cache backends. It doesn't offer much beyond the basic get & set, but it'll do for most cases!
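The gist of it: ask a pool for an item, check whether it was a hit, and save it back if it wasn't. Something like this (loadConfigFromDatabase is just a stand-in for your own expensive code):

<?php

use Psr\Cache\CacheItemPoolInterface;

/**
 * Typical psr/cache usage: the calling code doesn't care
 * which backend $pool actually talks to.
 */
function getConfig(CacheItemPoolInterface $pool)
{
    $item = $pool->getItem('config');

    if (!$item->isHit()) {
        // stand-in for the expensive operation
        $item->set(loadConfigFromDatabase());
        $item->expiresAfter(3600);
        $pool->save($item);
    }

    return $item->get();
}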

I'm sure many more psr/cache implementations will emerge, but there are a couple already. There's one that I've been working on: Scrapbook. It comes with adapters for Memcached, Redis, Couchbase, APC, MySQL, PostgreSQL, SQLite, the filesystem (via league/flysystem), and an in-memory cache (very useful for testing). And it has all of the aforementioned goodies: stampede protection, a buffered cache & (nested) transactions. All of it sits behind a simple Memcached-like API (for maximum features) or psr/cache & psr/simple-cache (for maximum compatibility).
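Getting started looks something like this; the exact class names may differ between versions, so consult Scrapbook's documentation for the details:

<?php

// a regular Memcached client, pointed at your server(s)
$client = new \Memcached();
$client->addServer('localhost', 11211);

// Scrapbook's Memcached adapter exposes the full Memcached-like API
$cache = new \MatthiasMullie\Scrapbook\Adapters\Memcached($client);
$cache->set('key', 'value');

// or wrap it in a psr/cache pool for maximum compatibility
$pool = new \MatthiasMullie\Scrapbook\Psr6\Pool($cache);
$item = $pool->getItem('key');
var_dump($item->get()); // string(5) "value"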