Archive for May, 2009

High Performance I/O on Windows

Posted in Coding, Feature Article, Scalability on May 13th, 2009 by Cory

I/O completion ports were first introduced in Windows NT 3.5 as a highly scalable, multi-processor capable alternative to the then-typical practices of using select, WSAWaitForMultipleEvents, WSAAsyncSelect, or even running a single thread per client. Short of writing a kernel-mode driver, I/O completion ports are the most efficient way to perform I/O on Windows.

A recent post on Slashdot claimed sockets have run their course, which I think is absolutely not true! The author seems to believe the Berkeley sockets API is the only way to perform socket I/O, which is nonsense. Much more modern, scalable and high performance APIs exist today such as I/O completion ports on Windows, epoll on Linux, and kqueue on FreeBSD. In light of this I thought I’d write a little about completion ports here.

The old ways of multiplexing I/O still work pretty well for scenarios with a low number of concurrent connections, but when writing a server daemon to handle a thousand or even tens of thousands of concurrent clients at once, we need to use something different. In this sense the old de facto standard Berkeley sockets API has run its course, because the overhead of managing so many connections is simply too great and makes using multiple processors hard.

An I/O completion port is a multi-processor aware queue. You create a completion port, bind file or socket handles to it, and start asynchronous I/O operations. When an operation completes, whether successfully or with an error, a completion packet is queued up on the completion port, and the application can dequeue it from any of several threads. The completion port uses some special voodoo to make sure only a specific number of threads can run at once: if a running thread blocks in kernel-mode, the port will automatically release another waiting thread.

First you need to create a completion port with CreateIoCompletionPort:

HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE,
   NULL, 0, 0);

The last parameter, NumberOfConcurrentThreads, is not the total number of threads that can access the completion port (you can have as many as you want). It instead controls how many threads the port will allow to run concurrently. Once that many are running, any other threads waiting on the port stay blocked. If one of the running threads blocks for any reason in kernel-mode, the completion port will automatically release one of the waiting threads. Specifying 0 sets the limit to the number of logical processors in the system.

Next is creating and associating a file or socket handle, using CreateFile, WSASocket, and CreateIoCompletionPort:

#define OPERATION_KEY 1

HANDLE file = CreateFile(L"file.txt", GENERIC_READ,
   FILE_SHARE_READ, NULL, OPEN_EXISTING,
   FILE_FLAG_OVERLAPPED, NULL);

SOCKET sock = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP,
   NULL, 0, WSA_FLAG_OVERLAPPED);

CreateIoCompletionPort(file, iocp, OPERATION_KEY, 0);
CreateIoCompletionPort((HANDLE)sock, iocp, OPERATION_KEY, 0);

Files and sockets must be opened with the FILE_FLAG_OVERLAPPED and WSA_FLAG_OVERLAPPED flags before they are used asynchronously.

Reusing CreateIoCompletionPort for associating file/socket handles is a weird design choice from Microsoft, but that’s how it’s done. The CompletionKey parameter can be anything you want; it is handed back to you when packets are dequeued. I define an OPERATION_KEY here to use as the CompletionKey, the significance of which I’ll get to later.

Next we have to start up some I/O operations. I’ll skip setting up the socket and go right to sending data. We start these operations using ReadFile and WSASend:

OVERLAPPED readop, sendop;
WSABUF sendwbuf;
char readbuf[256], sendbuf[256];

memset(&readop, 0, sizeof(readop));
memset(&sendop, 0, sizeof(sendop));

sendwbuf.len = sizeof(sendbuf);
sendwbuf.buf = sendbuf;

BOOL readstatus = ReadFile(file, readbuf,
   sizeof(readbuf), NULL, &readop);

if(!readstatus)
{
  DWORD readerr = GetLastError();

  if(readerr != ERROR_IO_PENDING)
  {
    // error reading.
  }
}

int writestatus = WSASend(sock, &sendwbuf, 1, NULL,
   0, &sendop, NULL);

if(writestatus)
{
  int writeerr = WSAGetLastError();

  if(writeerr != WSA_IO_PENDING)
  {
    // error sending.
  }
}

Every I/O operation using a completion port has an OVERLAPPED structure associated with it. Windows uses most of this structure internally; for file I/O the Offset and OffsetHigh members give the position to read or write at, and the rest should be zeroed out before starting an operation. The structure must stay valid until the operation completes. The OVERLAPPED structure will be given back to us when we dequeue the completion packets, and can be used to pass along our own context data.

I have left out the standard error checking up until now for brevity’s sake, but this one doesn’t work quite like one might expect so it is important here. When starting an I/O operation, an error might not really be an error. If the function succeeds all is well, but if the function fails, it is important to check the error code with GetLastError or WSAGetLastError.

If the error code is ERROR_IO_PENDING or WSA_IO_PENDING, the call actually succeeded. All these error codes mean is that an asynchronous operation has been started and its completion is pending, as opposed to it completing immediately. By default, a completion packet will be queued up whether the operation completes immediately or asynchronously.

Dequeuing packets from a completion port is handled by the GetQueuedCompletionStatus function:

OVERLAPPED *ovl;
ULONG_PTR completionkey;
DWORD transferred;

BOOL ret = GetQueuedCompletionStatus(iocp, &transferred,
   &completionkey, &ovl, INFINITE);

if(ret)
{
  // I/O completed successfully.
}
else if(ovl)
{
  // dequeued successfully but the I/O operation
  // failed, get extended information.
  DWORD err = GetLastError();
}
else
{
  // error dequeuing a packet.
}

This function dequeues completion packets, each consisting of the completion key we specified in CreateIoCompletionPort and the OVERLAPPED structure we gave when starting the I/O. If the I/O transferred any data, it will also retrieve how many bytes were transferred. Again the error handling is a bit weird on this one, with three possible outcomes: success, a dequeued packet for a failed I/O operation, and a failure to dequeue anything at all.

This can be run from as many threads as you like, even a single one. It is common practice to run a pool with twice as many threads as there are logical processors, so that when a running thread blocks the completion port has another one ready to release and the CPU stays busy.

Unless you are going for a single-threaded design, I recommend starting two threads per logical CPU. Even if your app is designed to be 100% asynchronous, you will still run into blocking when locking shared data and even get unavoidable hidden blocking I/Os like reading in paged out memory. Keeping two threads per logical CPU will keep the processor busy without overloading the OS with too much context switching.

This is all well and good, but two I/O operations were initiated — a file read and a socket send. We need a way to tell their completion packets apart. This is why we need to attach some state to the OVERLAPPED structure when we call those functions:

struct my_context
{
  OVERLAPPED ovl;
  int isread;
};

struct my_context readop, sendop;

memset(&readop.ovl, 0, sizeof(readop.ovl));
memset(&sendop.ovl, 0, sizeof(sendop.ovl));

readop.isread = 1;
sendop.isread = 0;

ReadFile(file, readbuf, sizeof(readbuf), NULL, &readop.ovl);
WSASend(sock, &sendwbuf, 1, NULL, 0, &sendop.ovl, NULL);

Now we can tell the operations apart when we dequeue them:

OVERLAPPED *ovl;
ULONG_PTR completionkey;
DWORD transferred;

GetQueuedCompletionStatus(iocp, &transferred,
   &completionkey, &ovl, INFINITE);

struct my_context *ctx = (struct my_context*)ovl;

if(ctx->isread)
{
  // read completed.
}
else
{
  // send completed.
}

The last important thing to know is how to queue up your own completion packets. This is useful if you want to split an operation up to be run on the thread pool, or if you want to exit a thread waiting on a call to GetQueuedCompletionStatus. To do this, we use the PostQueuedCompletionStatus function:

#define EXIT_KEY 0

struct my_context ctx;

PostQueuedCompletionStatus(iocp, 0, OPERATION_KEY, &ctx.ovl);
PostQueuedCompletionStatus(iocp, 0, EXIT_KEY, NULL);

Here we post two things onto the queue. First, we post our own structure, to be processed by our thread pool. Second, we post a packet with a new completion key: EXIT_KEY. The thread which processes this packet can test whether the completion key is EXIT_KEY to know when it needs to stop dequeuing packets and shut down.

Other than the completion port handle, Windows does not use any of the parameters given to PostQueuedCompletionStatus. They are entirely for our use, to be dequeued with GetQueuedCompletionStatus.

That’s all I have to write for now, and should be everything one would need to get started learning these high performance APIs! I will make another post shortly detailing some good patterns for completion port usage, and some optimization tips to ensure efficient usage of these I/O APIs.

Update: this subject continued in I/O completion ports made easy.

Star Trek rocks!

Posted in Entertainment on May 11th, 2009 by Cory

The new Star Trek movie is fantastic! Everyone I’ve talked to, even non-geeks, is really loving this movie. If you haven’t seen it yet, definitely find the time for this one.

The acting from everyone was phenomenal. Zachary Quinto came along at just the right time for a Star Trek movie — he really looks and plays the part well. I was worried at first because Chris Pine looked very similar to Dexter in the advertisements, but that turned out to not be a problem.

The story was a little light, but I think they did a pretty subtle yet cool thing with time travel. Most movies that involve messed-up timelines will eventually have their stars come up with some way to fix them, but this movie decided not to do that, basically clearing the slate by making all the stuff in the original TV series not happen. I’m not sure how the fans will appreciate that, but I thought it was cool enough.

The only questionable things I saw were the set brightness (really, is the future going to be so blindingly bright that you can’t take pictures of anything without a lens flare?) and the product placement. At least they did it early in the movie so it didn’t detract much, but as soon as Nokia and Budweiser showed up on screen it took me right out of the story.

I’ve never been a big fan of the original Star Trek series so I can’t really say if it did justice to that. It was just too campy and outdated by the time I came across it, so I’ve only seen a handful of episodes. I did watch a lot of The Next Generation, Deep Space 9, and Voyager though.

Overall I thought it did a good job of keeping with the Star Trek style of futuristic-yet-plausible. The more different looking alien races were either downplayed or not shown at all, which I think was a good thing because it helped keep it more realistic.

It appears everything J. J. Abrams touches turns to gold nowadays. I hope this does well so we can get more.

Seishun concert in a few weeks

Posted in Entertainment, Japanese on May 10th, 2009 by Cory

Kyodo Taiko is having their Spring concert “Seishun” in a few weeks. “Seishun” means a time of youth, so I wonder if it will feature some new sets from their newbie members. Or maybe I’m reading too much into it.

It is on two days, the 29th and 31st. I’ll be there on the 31st!

Battlestar concerts hitting LA and Comic-Con

Posted in Entertainment on May 8th, 2009 by Cory

Bear McCreary has announced four new Battlestar concerts.

The first is on June 13th at the California Plaza in downtown LA, and is free as part of the Grand Performances series! So go get some shabu shabu for dinner in Little Tokyo and enjoy the music of BSG a few blocks over.

The last three concerts will be on the nights of Comic-Con 2009 at the House of Blues in San Diego, which is just a few blocks away from the convention center. These also mark the launch of the dual-disc Season 4 soundtrack, so people going might get their hands on an early copy for signing, but I bet the tickets will go very quickly!

Needless to say (but who cares about need) I am ecstatic, as the first concert is five minutes down the road from me and I’ll be at Comic-Con on Saturday.

Boys and Girls with Short Hair

Posted in Entertainment, Japanese on May 7th, 2009 by Cory

Fusion X over at UCSD finished up not too long ago. I didn’t get to go, but I was watching some videos of the performances over at That’s Fresh and this one caught my eye. Boys and Girls with Short Hair is a pretty incredible collaboration between several dance teams.

I love the whole video but the second to last segment (about 3:10 in) is something really special. Performed to Adele’s Crazy for You, it’s a rare heartfelt piece that communicates a lot in a minute and a half. I can’t stop watching it!

The YouTube video has a few more details in its description.

Windows 7 RC is available for two months

Posted in Coding, Microsoft on May 5th, 2009 by Cory

A couple weeks before the Windows 7 RC came out, I formatted back to Vista so I could test the upgrade process. I found myself missing various smaller features that make Win7 so much nicer to use, even one as simple as “show desktop” moving to the bottom right of the taskbar instead of being a quick launch shortcut.

Well, a little under a week ago the RC was released to testers and I’ve been pretty happy with it. Some features were added but if you’ve used the beta you probably won’t notice many significant changes — it is mostly bug fixes and optimizations. I’ve not only been running it on my desktop, but on my Eee PC where it has been performing quite well with no tweaks.

Today the RC was released for anyone to download, and will be available for the next two months. Developers can also download the SDK to get a head start on writing applications for it.

I’m happy that Windows Media Center, perhaps the most problematic portion of the beta, has got the polish it needed. It is much more stable and finally looks like something other than the TV Pack that was kind of released for Vista. One of the biggest feature complaints was recording to an incompatible “.wtv” format, which has been somewhat alleviated by a “convert to .dvr-ms” option which is enabled for non-DRMed .wtv recordings.

Dinner at Sushi Zo

Posted in Food, Japanese on May 2nd, 2009 by Cory

I ate dinner at Sushi Zo tonight. It is thought by many to have the best sushi in LA, so I was determined to try it out. Sushi Zo only serves omakase (where the chef chooses what to give you), and they have it very streamlined with three itamaes all working in unison to pump out single pieces of sushi and sashimi very quickly. For such a small area (it looks like it could only seat about 30-40 people, including 8 at the bar) it is by far the busiest sushi place I’ve ever been to. The itamaes and waitresses are always announcing orders back and forth in Japanese, and the place goes through about 1 plate per person every minute.

Unlike most places, Sushi Zo makes their own ponzu sauce and readily instructs you when to not use soy sauce because it will overpower the flavor. That might seem weird but with such fresh fish you really don’t need much to enhance the flavor. I had just about everything they had to offer, my favorites being hamachi, ankimo, and an oyster. It was all fantastic with each bite having its own subtle flavor, easily living up to the hype. If it wasn’t so damned expensive I would make it my regular place, but En Sushi will have to suffice for now.