Coding, Page 4

GCC 4.3, C++0x preview

GCC 4.3 came out a couple weeks ago, and I finally got time to give its experimental C++0x support a go. Specifically, I was interested in two features of it: variadic templates and rvalue references.

There is one prime example of what these two features are awesome for: perfect forwarding. Take a memory pool. You might have something like this:

class pool {
public:
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }
};

But that is hardly satisfactory. What if you want to construct T with an argument?

class pool {
public:
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }

   template<typename T, typename ArgT>
   T* construct(const ArgT &arg) { return new(alloc()) T(arg); }
};

So we add a new function to handle passing an arg. Better, but still not very great. What if you want to pass it multiple arguments?

C++ has very few problems that can’t be worked around in a relatively straightforward way. Unfortunately, this is one of those problems. The current solution most library developers employ involves some really nasty preprocessor hacks to generate separate functions for 1, 2, 3, up to maybe 10 or 15 arguments. So, how do we solve this?
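To make the old workaround concrete, here is a sketch of what those hand-enumerated overloads look like (the `alloc()` stub returning `malloc` is my assumption for illustration; the real allocator isn’t shown). In real libraries these overloads are usually stamped out by preprocessor macros up to some arbitrary arity limit:

```cpp
#include <new>
#include <cstdlib>
#include <cassert>

class pool {
public:
    // Stub so the sketch compiles; a real pool would carve from a block.
    void* alloc() { return std::malloc(64); }

    template<typename T>
    T* construct() { return new(alloc()) T; }

    // One overload per arity -- this is the part macros generate,
    // repeated up to whatever maximum the library author chose.
    template<typename T, typename A1>
    T* construct(const A1 &a1) { return new(alloc()) T(a1); }

    template<typename T, typename A1, typename A2>
    T* construct(const A1 &a1, const A2 &a2) { return new(alloc()) T(a1, a2); }

    template<typename T, typename A1, typename A2, typename A3>
    T* construct(const A1 &a1, const A2 &a2, const A3 &a3) { return new(alloc()) T(a1, a2, a3); }
};
```

Every new arity means another overload, and every overload still has the const-reference problem discussed below.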

Enter variadic templates, the new C++ feature built specifically to solve this. Here is an updated pool class that takes any number of arguments:

class pool {
public:
   void* alloc();

   template<typename T, typename... Args>
   T* construct(const Args&... args) { return new(alloc()) T(args...); }
};

Pretty simple! Those ellipses will expand into zero or more args. Great – we’re almost there. But we still have a problem here: what happens if the constructor for T takes some arguments as non-const references? This construct function will try to pass them as const references, resulting in a compile error. We can’t have it pass args as non-const references, because then if you pass it an rvalue—such as a temporary—it will generate another compile error as rvalues can only be bound to const references.

This is where the second part of our pool upgrade comes in: rvalue references.

class pool {
public:
   void* alloc();

   template<typename T, typename... Args>
   T* construct(Args&&... args) {
      return new(alloc()) T(std::forward<Args>(args)...);
   }
};

We’ve finally got our solution. That double-reference-looking thing is the new syntax for rvalue references. This construct implements perfect forwarding: calling construct<foo>(a, b, c, d) will behave exactly as if we had called the constructor directly via new(alloc()) foo(a, b, c, d).
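A compilable sketch of the forwarding behavior (the `alloc()` stub and the `record` type are mine, added so the example is self-contained): lvalue arguments stay lvalues and get copied, while rvalue arguments stay rvalues and get moved.

```cpp
#include <new>
#include <cstdlib>
#include <string>
#include <utility>

class pool {
public:
    // Stub allocator so the sketch runs; the real alloc() is not shown.
    void* alloc() { return std::malloc(256); }

    template<typename T, typename... Args>
    T* construct(Args&&... args) {
        // std::forward<Args> preserves each argument's value category,
        // so T's constructor sees exactly what the caller passed.
        return new(alloc()) T(std::forward<Args>(args)...);
    }
};

// Illustrative payload type: takes its string by value and moves it.
struct record {
    std::string name;
    int id;
    record(std::string n, int i) : name(std::move(n)), id(i) {}
};
```

Calling `p.construct<record>(someString, 7)` copies the string, while `p.construct<record>(std::string("bob"), 8)` moves the temporary straight through.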

This works because each type in Args is deduced separately: thanks to reference collapsing, it resolves to an lvalue reference, a const reference, or a plain type as needed. One problem I have yet to figure out how to solve is a constructor where you know the type you want, and want to accept any reference type:

struct foo {
   foo(const bar &b) : m_b(b) {}
   foo(bar &&b) : m_b(std::move(b)) {}

   bar m_b;
};

I don’t care if b is an lvalue or rvalue reference: I just want the construction of m_b to be as efficient as possible so that it can use move semantics when you pass it an rvalue. So far the only way I can find to do it is with two separate constructors, which could mean a lot of code duplication on something less trivial than this example.

Scalability isn’t everything

In the beginning, you write threaded apps in blissful ignorance of scalability. That’s usually okay — most apps don’t need it, but sooner or later you will come across a problem that demands it. With enough searching, you will come across lock-free algorithms: tricky to get right, but promising fantastic scalability if you do.

Even trickier, though, is knowing when not to use them. Lock-free algorithms come with a price: although they are indeed very scalable, their raw performance can be much worse than that of a well-designed single-threaded algorithm. Do a little benchmarking and you might find something surprising: the performance hit can actually be so large that a simple locked algorithm with no scalability gives better overall performance than a 100% scalable lock-free version.

This is more common than you might think. Take a queue. A single-threaded version will typically have very minimal memory overhead: maybe a pointer for every n objects. A lock-free version will need two pointers for every object (or one, if you use a GC). Now the amount of overhead greatly depends on what your object is. If your object is large, a lock-free queue will probably be a better choice. But if your object is small—say one or two pointers—the overhead can be enough that cache misses will significantly affect your application.

I recently had to tackle this problem. My application needed a queue of small objects, and on a modern quad-core CPU the cache misses were hurting performance so much that although a lock-free queue did have near 100% scalability, the overall operation was completing 165% faster with a locked queue with zero scalability.

The next best thing is to combine the best of both worlds: design a queue with low overhead and medium scalability. Using a reader-writer lock with a combination of lock-free operations, I came up with a queue that only needs to do a full lock once every 32 or 64 operations. The result? Scalability 5% lower than a lock-free queue, with overall performance 210% better.
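My queue’s code isn’t shown here, but the amortized-locking idea can be sketched in miniature (all names below are mine, and the reader-writer lock is simplified to a plain mutex with a single consumer): the consumer pays for one lock acquisition per batch of items rather than one per item.

```cpp
#include <deque>
#include <mutex>

// Sketch of amortized locking: producers push under the lock, but the
// (single) consumer only touches the lock when its private batch runs
// dry, swapping out the entire shared buffer in one shot.
template<typename T>
class batched_queue {
    std::mutex m_lock;
    std::deque<T> m_shared;   // producers push here, under the lock
    std::deque<T> m_local;    // the consumer drains this without locking
public:
    void push(const T &v) {
        std::lock_guard<std::mutex> g(m_lock);
        m_shared.push_back(v);
    }

    // Returns false when both queues are empty.
    bool pop(T &out) {
        if (m_local.empty()) {
            std::lock_guard<std::mutex> g(m_lock);
            m_local.swap(m_shared); // one lock for the whole batch
        }
        if (m_local.empty())
            return false;
        out = m_local.front();
        m_local.pop_front();
        return true;
    }
};
```

The batch swap is what keeps overhead low: no per-node link pointers beyond what std::deque already uses, and the lock is acquired once per batch instead of once per operation.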

OK, I’ll admit it: I cheated, somewhat. Lock-free algorithms are good for more than just scalability. They also offer immunity to nasty effects like deadlock, livelock, and priority inversion. In my case I wasn’t in a situation to worry about these, but you might be. The lesson here is to know your situation and decide carefully. Don’t trust what others tell you: always try things yourself and profile.

Digging into TR1

Channel 9 has an interview with Stephan T. Lavavej of the Visual C++ team showing off some of the new features in TR1, the new draft standard library additions for C++0x. While the interview will probably have nothing new for competent C++ developers, Stephan does go into good detail explaining the unique strengths of C++ that newbies may not immediately see.

If you’ve thought of C++ but have been scared by its complexity, or have been tempted by newer garbage collected languages like C#, this will probably be a good eye opener.

strncpy is not your friend

Hang around IRC and every so often you will find someone heralding the use of strncpy for writing secure code. A lot of the time they are just going off what others have said, and can’t even tell you what strncpy really does. strncpy is a problem for two reasons:

  1. If the source string is longer than the limit, strncpy silently truncates it and leaves the destination without a null terminator.
  2. If the source string is shorter, strncpy zero-fills every remaining byte of the destination, which is wasted work for large buffers.

Bugs happen. Sometimes we build sanity checks into programs to catch unknown bugs before they become a problem. But strncpy is not a sanity check or a security feature: using it instead of resizing the buffer to accommodate the data (or outright rejecting the data if it gets too big) is itself a bug.
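A minimal demonstration of the termination pitfall (the helper name is mine): when the source doesn’t fit, none of the copied bytes is a null terminator, so the destination is not a valid C string until you terminate it by hand.

```cpp
#include <cstring>
#include <cstddef>

// Copies with strncpy and reports whether the destination ended up
// null-terminated. When src is longer than n, strncpy writes exactly
// n non-null bytes -- and no terminator at all.
inline bool strncpy_terminated(const char *src, char *dst, std::size_t n) {
    std::strncpy(dst, src, n);
    // Scan the n bytes strncpy was allowed to write for a terminator.
    return std::memchr(dst, '\0', n) != nullptr;
}
```

Any code that passes the destination to printf, strlen, or another strcpy without first forcing a terminator is reading past the end of the buffer.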

C++ sucks less than you think it does

C++ seems to be the quickest language to get a knee-jerk “it sucks!” reaction out of people, and the complaints they list rarely change.

C++ is a very powerful, very complex language. Being a multi-paradigm language (procedural, functional, object-oriented, and meta-programming) that encourages the coder to always use the best tool for the job, C++ forces you to think differently than other popular languages, and it will take the average coder years of working with it before they start to get truly good at it.

C++ does have its flaws. Some are fixable, some aren’t. Most of what is fixable is being addressed in a new standard due some time next year (2009).

Its biggest problem is that newcomers tend to only see its complexity and syntax, not its power. The primary reason for this is education. In introductory and advanced courses, students are taught a token amount about templates without any of the practices that make them so epically powerful. The interesting parts of the standard library, such as iterators and functional programming, are put aside in favor of object-oriented design and basic data structures which, while useful, are only a small part of what C++ is about. RAII, an easy zero-cost design pattern that makes resource leaks almost impossible, is virtually untaught.
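RAII fits in a few lines. A sketch wrapping a FILE* (the class name is mine): the destructor runs on every exit path, whether you return normally, return early, or throw, so no code path can leak the handle.

```cpp
#include <cstdio>

// RAII wrapper: acquiring the resource IS initialization, and the
// destructor is the one and only place the resource is released.
class file {
    std::FILE *m_f;
public:
    explicit file(const char *path, const char *mode)
        : m_f(std::fopen(path, mode)) {}
    ~file() { if (m_f) std::fclose(m_f); }

    // Non-copyable: two owners of one FILE* would double-close it.
    file(const file&) = delete;
    file& operator=(const file&) = delete;

    bool ok() const { return m_f != nullptr; }
    std::FILE *get() const { return m_f; }
};
```

Compare that to the C discipline of remembering an fclose on every error path; the compiler now does the remembering, at zero runtime cost.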

C++ does tend to produce larger executables. This is a trade-off – do you want smaller executables or faster code? Whereas in C you might use callbacks (as the qsort and bsearch functions in the standard library do) and produce less code, in C++ everything is specialized with a template, which gives improved performance. This follows C++’s “don’t pay for what you don’t use” philosophy.
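The qsort-versus-std::sort pair makes the trade-off concrete: qsort is one shared routine that pays an indirect call for every comparison, while std::sort stamps out a specialized instantiation in which the comparison can be inlined.

```cpp
#include <algorithm>
#include <cstdlib>
#include <cstddef>

// C style: a single generic qsort lives in the binary, but every
// comparison goes through this function pointer.
static int cmp_int(const void *a, const void *b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_c(int *v, std::size_t n) {
    std::qsort(v, n, sizeof(int), cmp_int);
}

// C++ style: std::sort<int*> is generated for this call site, so the
// < comparison inlines -- more machine code, no per-comparison call.
void sort_cpp(int *v, std::size_t n) {
    std::sort(v, v + n);
}
```

Same result either way; the difference is where the cost lands: code size for std::sort, per-element call overhead for qsort.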

Don’t get me wrong, C++ is not the right tool for all jobs. But among all the languages I have used, C++ stands out from the crowd as one that is almost never a bad choice, and a lot of times is the best choice. It might take longer to get good at than most other languages, but once you’ve got it down its power is hard to match.

Visual C++ 2008 Feature Pack beta

It’s here! A beta of the promised TR1 library update has been put up for download.

Included in the pack is an update to MFC that adds a number of Office-style controls. Wish they’d put these out as plain Win32 controls, because I’ve got no intention of using MFC!

Writing a good parser

From a simple binary protocol used over sockets to a complex XML document, many applications depend on parsing. Why, then, do the great majority of parsers out there just plain suck?

I’ve come to believe a good parser should have two properties:

Several parsers meet the first requirement, but almost none meet the second: they expect their input to come in without a break in execution. So how do you accomplish this? Lately I’ve been employing this fairly simple design:

struct buffer {
  struct buffer *next;
  char *buf;
  size_t len;
};

struct state;   /* filled with parsed data; definition depends on the protocol */

struct parser {
  struct buffer *input;
  struct buffer *lastinput;
  struct buffer *output;
  int (*func)(struct parser*, struct state*);
};

enum {
  CONTINUE,
  NEEDMORE,
  GOTFOO,
  GOTBAR
};

int parse(struct parser *p, struct state *s) {
  int ret;
  while((ret = p->func(p, s)) == CONTINUE);

  return ret;
}

The idea should be easy to understand:

  1. Add buffer(s) to input queue.
  2. If parse() returns NEEDMORE, add more input to the queue and call it again.
  3. If parse() returns GOTFOO or GOTBAR, state is filled with data.
  4. The function pointer is continually updated with a parser specialized for the current data: in this case, a FOO or a BAR, or even just bits and pieces of a FOO or a BAR. It returns CONTINUE if parse() should just call a new function pointer.
  5. As the parser function pointers eat at the input queue, put used buffers into the output stack.

Other than meeting my two requirements above, the best thing about this design? It doesn’t sacrifice cleanliness, and it won’t cause your code size to increase.
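As a toy instantiation of the idea (the FOO/BAR grammar and all names here are mine, and the state parameter and buffer chaining are simplified to a single string), a parser built this way happily accepts input split across arbitrary chunk boundaries:

```cpp
#include <string>
#include <cstddef>

// Same return-code protocol as the design above.
enum { CONTINUE, NEEDMORE, GOTFOO, GOTBAR };

struct parser {
    std::string pending;     // stands in for the input buffer queue
    int (*func)(parser*);    // current specialized parsing step
};

// One specialized step: recognize a three-letter word. Returns NEEDMORE
// when the word is split across buffers and more input is required.
static int match_word(parser *p) {
    if (p->pending.size() < 3)
        return NEEDMORE;
    int ret = NEEDMORE;
    if (p->pending.compare(0, 3, "FOO") == 0) ret = GOTFOO;
    if (p->pending.compare(0, 3, "BAR") == 0) ret = GOTBAR;
    p->pending.erase(0, 3);  // consume the bytes this step used
    return ret;
}

void feed(parser *p, const char *buf, std::size_t len) {
    p->pending.append(buf, len);  // enqueue another input buffer
}

int parse(parser *p) {
    int ret;
    while ((ret = p->func(p)) == CONTINUE)
        ;                         // CONTINUE: run the next step function
    return ret;
}
```

The caller drives the loop exactly as in the numbered steps: feed a chunk, call parse(), and if it returns NEEDMORE, feed more and call it again.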

MSDN Content Service

Microsoft just put out a neat web service that lets you query all the content of MSDN and TechNet. I can see it now… code browsers that have rollover documentation on WinAPI functions.

C++0x work progressing

A bit delayed, but I just found the results of the October 2007 C++ meeting, where several really nice things were voted into the C++0x draft.

Working with cryptography; turns out it’s not so simple

While coming up with a new list format for PeerGuardian 3, I decided it should have built-in digital signatures, so everyone getting lists can verify their integrity and where they came from.

Although I’ve used crypto systems like GPG before and understood the basics of it, I’d never implemented one myself. So after much research, I decided on LibTomCrypt due to its simple API, stellar documentation, and support for modern algorithms like AES and ECC. Being entirely in the public domain is a good perk, too.

The first iteration is a very basic public key system. After further reading, I’ve decided it would be useful to implement a full public key infrastructure – that is, signed keys and possibility of revocation. This allows Phoenix Labs (or anyone else) to sign other public keys to verify they’re legit and trustworthy, and later revoke the key if something happens with it (such as the private key being leaked).

All in all, it’s turning out to be a lot more work than I expected, but I don’t mind – it’s something new and interesting, which seems to happen less and less these days.