January 2008 Archives

strncpy is not your friend

Being in IRC, every so often you will find someone heralding the use of strncpy for writing secure code. A lot of the time they are just going off what others have said, and can’t even tell you what strncpy really does. strncpy is a problem for two reasons:

Bugs happen. Sometimes we build sanity checks into programs to combat unknown ones before they become a problem. But strncpy is not a sanity check or security feature—using it instead of resizing a buffer to accommodate the data, or just outright rejecting the data if it gets too big is a bug.

C++ sucks less than you think it does

C++ seems to be the quickest language to get a knee-jerk “it sucks!” reaction out of people. The common reasons they list:

C++ is a very powerful, very complex language. Being a multi-paradigm (procedural-, functional-, object-oriented–, and meta-programming) language that implores the coder to always use the best tool for the job, C++ forces you to think differently than other popular languages and will take the average coder years of working with it before they start to get truly good at it.

C++ does have its flaws. Some are fixable, some aren’t. Most of what is fixable is being addressed in a new standard due some time next year (2009).

Its biggest problem is that newcomers tend to only see its complexity and syntax, not its power. The primary reason for this is education. In introductory and advanced courses, students are taught an underwhelming amount of templates without any of the useful practices that can make them so epically powerful. Use of the interesting parts of the standard library, such as iterators and functional programming, are put aside in favor of object-oriented design and basic data structures which, while useful, is only a small part of what C++ is about. RAII, an easy zero-cost design pattern that makes resource leaks almost impossible, is virtually untaught.

C++ does tend to produce larger executables. This is a trade off – do you want smaller executables or faster code? Whereas in C you might use callbacks for something (as shown by the qsort and bsearch functions in the standard library) and produce less code, in C++ everything is specialized with a template that gives improved performance. This follows C++’s “don’t pay for what you don’t use” philosophy.

Don’t get me wrong, C++ is not the right tool for all jobs. But among all the languages I have used, C++ stands out from the crowd as one that is almost never a bad choice, and a lot of times is the best choice. It might take longer to get good at than most other languages, but once you’ve got it down its power is hard to match.

Visual C++ 2008 Feature Pack beta

It’s here! A beta of the promised TR1 library update has been put up for download.

Included in the pack is an update to MFC that adds a number of Office-style controls. Wish they’d put these out as plain Win32 controls, because I’ve got no intention of using MFC!

Writing a good parser

From a simple binary protocol used over sockets to a complex XML document, many applications depend on parsing. Why, then, do the great majority of parsers out there just plain suck?

I’ve come to believe a good parser should have two properties:

Several parsers meet the first requirement, but almost none meet the second: they expect their input to come in without a break in execution. So how do you accomplish this? Lately I’ve been employing this fairly simple design:

struct buffer {
  struct buffer *next;
  char *buf;
  size_t len;
};

struct parser {
  struct buffer *input;
  struct buffer *lastinput;
  struct buffer *output;
  int (*func)(struct parser*, struct *state);
};

enum {
  CONTINUE,
  NEEDMORE,
  GOTFOO,
  GOTBAR
};

int parse(struct parser *p) {
  int ret;
  while((ret = p->func(p)) == CONTINUE);

  return ret;
}

The idea should be easy to understand:

  1. Add buffer(s) to input queue.
  2. If parse() returns NEEDMORE, add more input to the queue and call it again.
  3. If parse() returns GOTFOO or GOTBAR, state is filled with data.
  4. The function pointer is continually updated with a parser specialized for the current data: in this case, a FOO or a BAR, or even just bits and pieces of a FOO or a BAR. It returns CONTINUE if parse() should just call a new function pointer.
  5. As the parser function pointers eat at the input queue, put used buffers into the output stack.

Other than meeting my two requirements above, the best thing about this design? It doesn’t sacrifice cleanliness, and it won’t cause your code size to increase.