March 2008 Archives

Life in Windows 2008

Given that I got a free copy of Windows Server 2008 at the big launch, I thought I’d try it out on my desktop. It and Vista are basically the same OS, just different artificial limits. Following Vijayshinva Karnure’s directions, I had a fully functional desktop in no time.

I’ve only run into a couple minor issues. Half-Life 2 doesn’t seem to have a proper codec to play their intro video, and the “Windows Live Messenger Download” link in the start menu takes you to an installer that refuses to run on a server OS.

The internationalization test

When choosing a development framework there are many things to look for, but the first thing I notice is solid internationalization support.

Does that seem weird to anyone? That is exactly why. Internationalization is extremely hard to get right, and is usually the last thing on anyone’s mind. If a framework offers fully integrated internationalization, there is a good chance it will be well thought out in other areas and be worth my time to use.

GCC 4.3, C++0x preview

GCC 4.3 came out a couple weeks ago, and I finally got time to give its experimental C++0x support a go. Specifically, I was interested in two features of it: variadic templates and rvalue references.

There is one prime example of what these two features are awesome for: perfect forwarding. Take a memory pool. You might have something like this:

class pool {
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }
};

But that is hardly satisfactory. What if you want to construct T with an argument?

class pool {
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }

   template<typename T, typename ArgT>
   T* construct(const ArgT &arg) { return new(alloc()) T(arg); }
};

So we add a new function to handle passing an arg. Better, but still not very great. What if you want to pass it multiple arguments?

C++ has very few problem that can’t be worked around in a relatively straitforward way. Unfortunately, this is one of those problems. The current solution most library developers employ will involve some really nasty preprocessor hacks to generate separate functions for 1, 2, 3, up to maybe 10 or 15 arguments. So, how do we solve this?

Enter variadic templates, the new C++ feature built specifically to solve this. Here is an updated pool class that takes any number of arguments:

class pool {
   void* alloc();

   template<typename T, typename Args...>
   T* construct(const Args&... args) { return new(alloc()) T(args...); }
};

Pretty simple! Those ellipses will expand into zero or more args. Great – we’re almost there. But we still have a problem here: what happens if the constructor for T takes some arguments as non-const references? This construct function will try to pass them as const references, resulting in a compile error. We can’t have it pass args as non-const references, because then if you pass it an rvalue—such as a temporary—it will generate another compile error as rvalues can only be bound to const references.

This is where the second part of our pool upgrades come in: rvalue references.

class pool {
   void* alloc();

   template<typename T, typename Args...>
   T* construct(Args&&... args) {
      return new(alloc()) T(std::forward(args)...);
   }
};

We’ve finally got our solution. That double-reference looking thing is the new syntax for rvalue references. This construct implements perfect forwarding: calling construct<foo>(a, b, c, d) will behave exactly as if we had called the constructor directly via new(alloc()) T(a, b, c, d).

This works because Args is a templated type that will resolve to references and const references if it needs to. One problem I have yet to figure out how to solve is a constructor where you know the type you want, and want to accept any reference type:

struct foo {
   foo(const bar &b) : m_b(b) {}
   foo(bar &&b) : m_b(std::move(b)) {}

   bar m_b;
};

I don’t care if b is a lvalue or rvalue reference: I just want the construction of m_b to be as efficient as possible so that it can use move semantics when you pass it an rvalue. So far the only way I can find to do it is with two separate constructors, which could mean a lot of code duplication on something less trivial than this example.

Scalability isn’t everything

In the beginning, you write threaded apps with great ignorance to scalability. That’s usually okay — most apps don’t need it, but sooner or later you will come across a problem that demands it. With enough searching, you will come across lock–free algorithms. Tricky to get right, but promising fantastic scalability if you do.

Even trickier, though, is knowing when to not use them. Lock–free algorithms come with a price: although they are indeed very scalable, their performance can be much worse than a well designed algorithm for single–threaded applications. Do a little benchmarking and you might find something surprising: the performance hit can actually be so large that a simple locked single–threaded algorithm with no scalability will give better overall performance than a 100% scalable lock–free version.

This is more common than you might think. Take a queue. A single–threaded version will typically have very minimal memory overhead: maybe a pointer for every n objects. A lock–free version will need two pointers for every object (or one, if you use a GC). Now the amount of overhead greatly depends on what your object is. If your object is large, a lock–free queue will probably be a better choice. But if your object is small—say one or two pointers—the overhead can be enough that cache misses will significantly affect your application.

I recently had to tackle this problem. My application needed a queue of small objects, and on a modern quad–core CPU the cache misses were hurting performance so much that although a lock–free queue did have near 100% scalability, the overall operation was completing 165% faster with a locked queue with zero scalability.

The next best thing is to combines the best of both worlds: design a queue with low overhead and medium scalability. Using a reader–writer lock with a combination of lock–free operations, I came up with a queue that only needs to do a full lock once every 32 or 64 operations. The result? Scalability 5% lower than a lock–free queue, with overall performance 210% better.

OK, I’ll admit it: I cheated, somewhat. Lock–free algorithms are good for more than just scalability. They also offer immunity to nasty effects like deadlock, livelock, and priority inversion. In my case I wasn’t in a situation to worry about these, but you might be. The lesson here is to know your situation and decide carefully, and don’t trust what others tell you: always try things yourself and profile.

Digging into TR1

Channel 9 has an interview with Stephan T. Lavavej of the Visual C++ team showing off some of the new features in TR1, the new draft standard library additions for C++0x. While the interview will probably have nothing new for competent C++ developers, Stephan does go into good detail explaining the unique strengths of C++ that newbies may not immediately see.

If you’ve thought of C++ but have been scared by its complexity, or have been tempted by newer garbage collected languages like C#, this will probably be a good eye opener.