Posts Tagged ‘c’

Efficient stream parsing in C++

Posted in Coding, Feature Article, Scalability on September 1st, 2009 by Cory – Comments Off

A while ago I wrote about creating a good parser and while the non-blocking idea was spot-on, the rest of it really isn’t very good in C++ where we have the power of templates around to help us.

I’m currently finishing up a HTTP library and have been revising my views on stream parsing because of it. I’m still not entirely set on my overall implementation, but I’m nearing completion and am ready to share my ideas. First, I’ll list my requirements:

  • I/O agnostic: the parser does not call any I/O functions and does not care where the data comes from.
  • Pull parsing: expose a basic stream of parsed elements that the program reads one at a time.
  • Non-blocking: when no more elements can be parsed from the input stream, it must immediately return something indicating that instead of waiting for more data.
  • In-situ reuse: for optimal performance and scalability the parser should avoid copying and allocations, instead re-using data in-place from buffers.
  • A simple, easy to follow parser: having the parser directly handle buffers can easily lead to spaghetti code, so I’m simply getting rid of that. The core parser must operate on a single iterator range.

To accomplish this I broke this out into three layers: a core parser, a buffer, and a buffer parser.

The core parser

Designing the core parser was simple. I believe I already have a solid C++ parser design in my XML library, so I’m reusing that concept. This is fully in-situ pull parser that operates on a range of bidirectional iterators and returns back a sub-range of those iterators. The pull function returns ok when it parses a new element, done when it has reached a point that could be considered an end of the stream, and need_more when an element can’t be extracted from the passed in iterator range. Using this parser is pretty simple:

typedef std::deque<char> buffer_type;
typedef http::parser<buffer_type::iterator> parser_type;

buffer_type buffer;

parser_type p;
parser_type::node_type n;
parser_type::result_type r;

do
{
  push_data(buffer); // add data to buffer from whatever I/O source.

  std::deque<char>::iterator first = buffer.begin();

  while((r = p.parse(first, buffer.end(), n)) == http::result_types::ok)
  {
    switch(n.type)
    {
      case http::node_types::method:
      case http::node_types::uri:
      case http::node_types::version:
    }
  }

  buffer.erase(buffer.begin(), first); // remove all the used
                                       // data from the buffer.
} while(r == http::result_types::need_more);

By letting the user pass in a new range of iterators to parse each time, we have the option of updating the stream with more data when need_more is returned. The parse() function also updates the first iterator so that we can pop any data prior to it from the data stream.

By default the parser will throw an exception when it encounters an error. This can be changed by calling an overload and handling the error result type:

typedef std::deque<char> buffer_type;
typedef http::parser<buffer_type::iterator> parser_type;

buffer_type buffer;

parser_type p;
parser_type::node_type n;
parser_type::error_type err;
parser_type::result_type r;

do
{
  push_data(buffer); // add data to buffer from whatever I/O source.

  std::deque<char>::iterator first = buffer.begin();

  while((r = p.parse(first, buffer.end(), n, err)) == http::result_types::ok)
  {
    switch(n.type)
    {
      case http::node_types::method:
      case http::node_types::uri:
      case http::node_types::version:
    }
  }

  buffer.erase(buffer.begin(), first); // remove all the used
                                       // data from the buffer.
} while(r == http::result_types::need_more);

if(r == http::result_types::error)
{
  std::cerr
    << "an error occured at "
    << std::distance(buffer.begin(), err.position())
    << ": "
    << err.what()
    << std::endl;
}

The buffer

Initially I was testing my parser with a deque<char> like above. This let me test the iterator-based parser very easily by incrementally pushing data on, parsing some of it, and popping off what was used. Unfortunately, using a deque means we always have an extra copy, from an I/O buffer into the deque. Iterating a deque is also a lot slower than iterating a range of pointers because of the way deque is usually implemented. This inefficiency is acceptable for testing, but just won't work in a live app.

My buffer class is I/O- and parsing-optimized, operating on pages of data. It allows pages to be inserted directly from I/O without copying. Ones that weren't filled entirely can still be filled later, allowing the user to commit more bytes of a page as readable. One can use scatter/gather I/O to make operations span multiple pages contained in a buffer.

The buffer exposes two types of iterators. The first type is what we are used to in deque: just a general byte stream iterator. But this incurs the same cost as deque: each increment to the iterator must check if it's at the end of the current page and move to the next. A protocol like HTTP can fit a lot of elements into a single 4KiB page, so it doesn't make sense to have this cost. This is where the second iterator comes in: the page iterator. A page can be thought of as a Range representing a subset of the data in the full buffer. Overall the buffer class looks something like this:

struct page
{
  const char *first;    // the first byte of the page.
  const char *last;     // one past the last byte of the page.
  const char *readpos;  // the first readable byte of the page.
  const char *writepos; // the first writable byte of the page,
                        // one past the last readable byte.
};

class buffer
{
public:
  typedef ... size_type;
  typedef ... iterator;
  typedef ... page_iterator;

  void push(page *p); // pushes a page into the buffer.  might
                      // be empty, semi-full, or full.

  page* pop(); // pops the first fully read page from from the buffer.

  void commit_write(size_type numbytes); // merely moves writepos
                                         // by some number of bytes.

  void commit_read(size_type numbytes); // moves readpos by
                                        // some number of bytes.

  iterator begin() const;
  iterator end() const;

  page_iterator pages_begin() const;
  page_iterator pages_end() const;
};

One thing you may notice is it expects you to push() and pop() pages directly onto it, instead of allocating its own. I really hate classes that allocate memory – in terms of scalability the fewer places that allocate memory, the easier it will be to optimize. Because of this I always try to design my code to – if it makes sense – have the next layer up do allocations. When it doesn't make sense, I document it. Hidden allocations are the root of evil.

The buffer parser

Unlike the core parser, the buffer parser isn't a template class. The buffer parser exposes the same functionality as a core parser, but using a buffer instead of iterator ranges.

This is where C++ gives me a big advantage. The buffer parser is actually implemented with two core parsers. The first is a very fast http::parser<const char*>. It uses this to parse as much of a single page as possible, stopping when it encounters need_more and no more data can be added to the page. The second is a http::parser<buffer::iterator>. This gets used when the first parser stops, which will happen very infrequently – only when a HTTP element spans multiple pages.

This is fairly easy to implement, but required a small change to my core parser concept. Because each has separate internal state, I needed to make it so I could move the state between two parsers that use different iterators. The amount of state is actually very small, making this a fast operation.

The buffer parser works with two different iterator types internally, so I chose to always return a buffer::iterator range. The choice was either that or silently copy elements spanning multiple pages, and this way lets the user of the code decide how they want to handle it.

Using the buffer parser is just as easy as before:

http::buffer buffer;
http::buffer_parser p;
http::buffer_parser::node_type n;
http::buffer_parser::result_type r;

do
{
  push_data(buffer); // add data to buffer from whatever I/O source.

  while((r = p.parse(buffer, n)) == http::result_types::ok)
  {
    switch(n.type)
    {
      case http::node_types::method:
      case http::node_types::uri:
      case http::node_types::version:
    }
  }

  pop_used(buffer); // remove all the used
                    // data from the buffer.
} while(r == http::result_types::need_more)

The I/O layer

I'm leaving out an I/O layer for now. I will probably write several small I/O systems for it once I'm satisfied with the parser. Perhaps one using asio, one using I/O completion ports, and one using epoll. I've designed this from the start to be I/O agnostic but with optimizations that facilitate efficient forms of all I/O, so I think it could be an good benchmark of the various I/O subsystems that different platforms provide.

One idea I've got is to use Winsock Kernel to implement a kernel-mode HTTPd. Not a very good idea from a security standpoint, but would still be interesting to see the effects on performance. Because the parser performs no allocation, no I/O calls, and doesn't force the use of exceptions, it should actually be very simple to use in kernel-mode.

C++1x loses Concepts

Posted in Coding on July 22nd, 2009 by Cory – Comments Off

Bjarne Stroustrup and Herb Sutter have both reported on the ISO C++ meeting in Frankfurt a week ago, in which the much-heralded feature “concepts” were removed from C++1x.

Concepts are a powerful feature aimed at improving overloading (basically, removing the extra work in using things like iterator categories) and moving type checking up the ladder so that more reasonable error messages can be produced when a developer passes in the wrong type (think a single error line instead of multiple pages of template crap). Apparently the feature was a lot less solid than most of us thought, with a huge amount of internal arguing within the committee on a lot of the fundamental features of it. It seems that while most agreed concepts were a good idea, nobody could agree on how to implement them.

I’m definitely disappointed by this, but I’m also glad they chose to remove concepts instead of further delaying the standard, or worse putting out a poorly designed one. Instead, it seems like there is hope for a smaller C++ update to come out in 4-5 years that adds a more well thought out concepts feature. There are plenty of other C++1x language features to be happy about for though, like variadic templates, rvalue references, and lambda functions!

You may notice I’ve been saying C++1x here instead of C++0x — that’s because it’s pretty obvious to everyone now that we won’t be getting the next C++ standard in 2009, but more likely 2011 or 2012. Just in time for the end of the world!

C++ ORM framework for SQLite

Posted in Coding on May 18th, 2009 by Cory – Comments Off

Over the past week I’ve been rewriting my rather dated SQLite wrapper to have an efficient, modern C++ feel. The basic wrapper is there, but I was looking for something a little more this time.

While looking at the problem I decided I was spending too much time writing boilerplate SQL for all my types so I decided to look at existing ORM frameworks. I’m pretty picky about my C++ though, and couldn’t find anything I liked so I started writing my own. Instead of creating a tool to generate C++, I wanted to take a pure approach using native C++ types and template metaprogramming.

What I ended up with is not a full ORM framework, and I’m not particularly interested in making it one. All I’m aiming for is removing boilerplate code while leaving it easy to extend it for more complex queries. Here’s what I’ve got so far:

struct my_object
{
  int id;
  std::string value;
  boost::posix_time::ptime time;
};

typedef boost::mpl::vector<
  sqlite3x::column<
    my_object, int, &my_object::id,
    sqlite3x::primary_key, sqlite3x::auto_increment
  >,
  sqlite3x::column<
    my_object, std::string, &my_object::value,
    sqlite3x::unique_key
  >,
  sqlite3x::column<
    my_object, boost::posix_time::ptime, &my_object::time
  >
> my_object_columns;

typedef sqlite3x::table<
  my_object,
  my_object_columns
> my_object_table;

Using it is pretty simple. It uses the primary key as expected, generating the proper WHERE conditions and even extracting the type to let find() and others specify only the primary key:

sqlite3x::connection con("test.db3");

my_object_table my_objects(con, "t_objects");

my_objects.add(my_object());
my_objects.edit(my_object());
my_objects.remove(int());
my_objects.exists(int());
my_objects.find(int());

One benefit of the approach taken is it makes working with single- and multiple-inheritance just as easy:

struct my_derived :
  my_object
{
  float extra;
};

typedef boost::mpl::copy<
  boost::mpl::vector<
    sqlite3x::column<my_derived, float, &my_object::extra>
  >,
  boost::mpl::back_inserter<my_object_columns>
> my_derived_columns;

typedef sqlite3x::table<
  my_derived,
  my_derived_columns
> my_object_table;

The next thing on the list was supporting types not known natively to sqlite3x. I did not want to have the headache of sub-tables, so I took the easy route and implemented basic serialization support:

struct my_derived :
  my_object
{
  std::vector<boost::uuid> uuids;
};

struct uuids_serializer
{
  static void serialize(std::vector<boost::uint8_t> &buffer,
     const std::vector<boost::uuid> &uuids);

  template<typename Iterator>
  static Iterator deserialize(std::vector<boost::uuid> &uuids,
     Iterator first, Iterator last);
};

typedef boost::mpl::copy<
  boost::mpl::vector<
    sqlite3x::column<
      my_derived, float, &my_object::extra,
      sqlite3x::serializer<uuids_serializer>
    >
  >,
  boost::mpl::back_inserter<my_object_columns>
> my_derived_columns;

A few things aren’t finished, like specifying indexes and support for multi-column primary keys.

Overall though, I’m pretty happy with it. The majority of what I use SQLite for doesn’t require many complex queries, so this should greatly help lower the amount of code I have to manage.

Best of all this ORM code is in an entirely isolated header file — if you don’t want it, just don’t include it and you’ll still have access to all the basic SQLite functions. Even with it included I kept to the C++ mantra of “dont pay for what you don’t use” — as it is entirely template-driven, code will only be generated if you actually use it.

Once I’m finished the code will replace what I have up on the SQLite wrapper page, but until then it will exist in the subversion repository only.

C++0x is now feature complete

Posted in Coding on October 28th, 2008 by Cory – Comments Off

Herb Sutter posts to tell us C++0x is now feature complete. There will now be about a year of bugfixing and clarification, but that’s it: all the features are now known, and their interfaces are solid barring any bugs being found. This means compilers can finally start implementing C++0x at full speed without too much worry of surprises.

The Committee Draft is not yet available, but it is about the same as the September 2008 Working Draft.

GCC 4.3, C++0x preview

Posted in Coding on March 19th, 2008 by Cory – Comments Off

GCC 4.3 came out a couple weeks ago, and I finally got time to give its experimental C++0x support a go. Specifically, I was interested in two features of it: variadic templates and rvalue references.

There is one prime example of what these two features are awesome for: perfect forwarding. Take a memory pool. You might have something like this:

class pool {
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }
};

But that is hardly satisfactory. What if you want to construct T with an argument?

class pool {
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }

   template<typename T, typename ArgT>
   T* construct(const ArgT &arg) { return new(alloc()) T(arg); }
};

So we add a new function to handle passing an arg. Better, but still not very great. What if you want to pass it multiple arguments?

C++ has very few problem that can’t be worked around in a relatively straitforward way. Unfortunately, this is one of those problems. The current solution most library developers employ will involve some really nasty preprocessor hacks to generate separate functions for 1, 2, 3, up to maybe 10 or 15 arguments. So, how do we solve this?

Enter variadic templates, the new C++ feature built specifically to solve this. Here is an updated pool class that takes any number of arguments:

class pool {
   void* alloc();

   template<typename T, typename Args...>
   T* construct(const Args&... args) { return new(alloc()) T(args...); }
};

Pretty simple! Those ellipses will expand into zero or more args. Great – we’re almost there. But we still have a problem here: what happens if the constructor for T takes some arguments as non-const references? This construct function will try to pass them as const references, resulting in a compile error. We can’t have it pass args as non-const references, because then if you pass it an rvalue—such as a temporary—it will generate another compile error as rvalues can only be bound to const references.

This is where the second part of our pool upgrades come in: rvalue references.

class pool {
   void* alloc();

   template<typename T, typename Args...>
   T* construct(Args&&... args) {
      return new(alloc()) T(std::forward(args)...);
   }
};

We’ve finally got our solution. That double-reference looking thing is the new syntax for rvalue references. This construct implements perfect forwarding: calling construct<foo>(a, b, c, d); will behave exactly as if we had called the constructor directly via new(alloc()) T(a, b, c, d).

This works because Args is a templated type that will resolve to references and const references if it needs to. One problem I have yet to figure out how to solve is a constructor where you know the type you want, and want to accept any reference type:

struct foo {
   foo(const bar &b) : m_b(b) {}
   foo(bar &&b) : m_b(std::move(b)) {}

   bar m_b;
};

I don’t care if b is a lvalue or rvalue reference: I just want the construction of m_b to be as efficient as possible so that it can use move semantics when you pass it an rvalue. So far the only way I can find to do it is with two separate constructors, which could mean a lot of code duplication on something less trivial than this example.

C++ sucks less than you think it does

Posted in Coding on January 10th, 2008 by Cory – Comments Off

C++ seems to be the quickest language to get a knee-jerk “it sucks!” reaction out of people.  The common reasons they list:

  • It’s ugly, hard to read, and unmaintainable.
  • It’s easy to get memory leaks – computers are fast enough, use Java or another language with garbage collection!
  • It results in larger, bloated executables.

C++ is a very powerful, very complex language.  Being a multi-paradigm (procedural-, functional-, object-oriented–, and meta-programming) language that implores the coder to always use the best tool for the job, C++ forces you to think differently than other popular languages and will take the average coder years of working with it before they start to get truly good at it.

C++ does have its flaws.  Some are fixable, some aren’t.  Most of what is fixable is being addressed in a new standard due some time next year (2009).

Its biggest problem is that newcomers tend to only see its complexity and syntax, not its power.  The primary reason for this is education.  In introductory and advanced courses, students are taught an underwhelming amount of templates without any of the useful practices that can make them so epically powerful.  Use of the interesting parts of the standard library, such as iterators and functional programming, are put aside in favor of object-oriented design and basic data structures which, while useful, is only a small part of what C++ is about.  RAII, an easy zero-cost design pattern that makes resource leaks almost impossible, is virtually untaught.

C++ does tend to produce larger executables.  This is a trade off – do you want smaller executables or faster code?  Whereas in C you might use callbacks for something (as shown by the qsort and bsearch functions in the standard library) and produce less code, in C++ everything is specialized with a template that gives improved performance.  This follows C++’s “don’t pay for what you don’t use” philosophy.

Don’t get me wrong, C++ is not the right tool for all jobs.  But among all the languages I have used, C++ stands out from the crowd as one that is almost never a bad choice, and a lot of times is the best choice.  It might take longer to get good at than most other languages, but once you’ve got it down its power is hard to match.

Writing a good parser

Posted in Coding on January 2nd, 2008 by Cory – Comments Off

From a simple binary protocol used over sockets to a complex XML document, many applications depend on parsing.  Why, then, do the great majority of parsers out there just plain suck?

I’ve come to believe a good parser should have two properties:

  • It should be completely decoupled from any I/O.  The same parser should work on a file, a pipe, a socket, or from straight memory.  This not only gets you flexibility but the ability to make test cases simpler: protocol handler acting up?  Just write a test case that reads it from a file instead.
  • Other than perhaps memory allocation, a parser should never block.  To accomplish the above task, I’ve seen several parsers use read/write callbacks.  This causes the parser to block on I/O.  Should you ever need to use the parser in a scalable environment, this just won’t do—a good server only has one or two threads per CPU and blocking on I/O doesn’t allow this.

Several parsers meet the first requirement, but almost none meet the second: they expect their input to come in without a break in execution.  So how do you accomplish this?  Lately I’ve been employing this fairly simple design:

struct buffer {
  struct buffer *next;
  char *buf;
  size_t len;
};

struct parser {
  struct buffer *input;
  struct buffer *lastinput;
  struct buffer *output;
  int (*func)(struct parser*, struct *state);
};

enum {
  CONTINUE,
  NEEDMORE,
  GOTFOO,
  GOTBAR
};

int parse(struct parser *p) {
  int ret;
  while((ret = p->func(p)) == CONTINUE);

  return ret;
}

The idea should be easy to understand:

  1. Add buffer(s) to input queue.
  2. If parse() returns NEEDMORE, add more input to the queue and call it again.
  3. If parse() returns GOTFOO or GOTBAR, state is filled with data.
  4. The function pointer is continually updated with a parser specialized for the current data: in this case, a FOO or a BAR, or even just bits and pieces of a FOO or a BAR.  It returns CONTINUE if parse() should just call a new function pointer.
  5. As the parser function pointers eat at the input queue, put used buffers into the output stack.

Other than meeting my two requirements above, the best thing about this design?  It doesn’t sacrifice cleanliness, and it won’t cause your code size to increase.

C++0x work progressing

Posted in Coding on November 28th, 2007 by Cory – Be the first to comment

A bit delayed, but I just found the results of the October 2007 C++ meeting.  In it, they voted in several really nice things for the C++0x draft:

  • nullptr – no more using 0 or NULL and getting int/pointer overload issues. (N2431)
  • Atomic library – comes with several classes and utility functions for working with memory ordering and lock–free algorithms. (N2427)
  • Threading library – basic threading, mutexes, and condition variables. (N2447)
  • Unicode literals – UTF-8 and UTF-16 string literals. (N2442)
  • Unicode codecvt facets – UTF-8 and UTF-16 codecvt facets for reading Unicode streams. (N2401)

Visual Studio 2008 released, TR1 support coming.

Posted in Coding, Microsoft on November 24th, 2007 by Cory – Be the first to comment

Anyone following Visual Studio 2008 will know that although it offers a plethora of new features for the managed world, there was little focus on the unmanaged side of things.  Now that it is finally out the door, I guess it’s a good time to look at what few new features are there for us unmanaged C++ coders.

  • Improved standards conformance with support for friend templates, an uncommon but powerful C++ feature.
  • Intrinsic support for SSSE3, SSE4.x, and SSE4a.  These are modern vector instructions (SSE4a literally just came out with AMD’s Phenom processors!) that anyone interested in writing high‐performance code will want to be familiar with.
  • Intrinsic support for the CMPXCHG16B instruction.  This instruction is essential when writing many lock-free algorithms for the x64 platform.  I’ve been lobbying to have it added for a long time, so I’m especially happy to finally see it.  Unfortunately, the generated code in Beta 2 was very sub‐optimal (considering the instruction is typically used in very tight loops) so I may end up using assembly anyway!  I’m anxious to see if it is improved in RTM.
  • Improved optimizer, with support for inlining transcendental functions and scheduling for the latest CPUs.
  • Linker options updated for Vista – ability to specify UAC and address space randomization properties.  For some reason, still no support for DPI independence so we’ll end up writing manifests anyway.
  • We can finally use those quad‐core CPUs that are coming out to reduce our compile times with Multi‐threaded compiling.

Not much, huh?  That’s because Microsoft was running under the assumption that people would flock to C# and only use unmanaged C++ to maintain "legacy" code.  Perhaps the best news so far, they’ve finally realized their mistake.  Although they didn’t have time to put things into VC++ 2008, they have re‐committed to unmanaged code for the next version and in the meantime made a small separate announcement that they will be bringing VC++ 2008 users a mostly complete TR1 implementation update in early January.