Lock-free

Scalability isn’t everything

In the beginning, you write threaded apps with great ignorance to scalability. That’s usually okay — most apps don’t need it, but sooner or later you will come across a problem that demands it. With enough searching, you will come across lock–free algorithms. Tricky to get right, but promising fantastic scalability if you do.

Even trickier, though, is knowing when to not use them. Lock–free algorithms come with a price: although they are indeed very scalable, their performance can be much worse than a well designed algorithm for single–threaded applications. Do a little benchmarking and you might find something surprising: the performance hit can actually be so large that a simple locked single–threaded algorithm with no scalability will give better overall performance than a 100% scalable lock–free version.

This is more common than you might think. Take a queue. A single–threaded version will typically have very minimal memory overhead: maybe a pointer for every n objects. A lock–free version will need two pointers for every object (or one, if you use a GC). Now the amount of overhead greatly depends on what your object is. If your object is large, a lock–free queue will probably be a better choice. But if your object is small—say one or two pointers—the overhead can be enough that cache misses will significantly affect your application.

I recently had to tackle this problem. My application needed a queue of small objects, and on a modern quad–core CPU the cache misses were hurting performance so much that although a lock–free queue did have near 100% scalability, the overall operation was completing 165% faster with a locked queue with zero scalability.

The next best thing is to combines the best of both worlds: design a queue with low overhead and medium scalability. Using a reader–writer lock with a combination of lock–free operations, I came up with a queue that only needs to do a full lock once every 32 or 64 operations. The result? Scalability 5% lower than a lock–free queue, with overall performance 210% better.

OK, I’ll admit it: I cheated, somewhat. Lock–free algorithms are good for more than just scalability. They also offer immunity to nasty effects like deadlock, livelock, and priority inversion. In my case I wasn’t in a situation to worry about these, but you might be. The lesson here is to know your situation and decide carefully, and don’t trust what others tell you: always try things yourself and profile.