Asynchronous

Limiting asynchrony with task parallelism

When I started re-writing this website, I wanted to make good use of my multi-core CPU. Generating hundreds of pages using XSL transforms and plenty of pre-processing in C#, there's a lot of parallelism to be had.

I began by using the TPL's data parallelism features: mainly Parallel.ForEach and Parallel.Invoke. These are super easy to use, and made an immediate huge difference.

Then the Visual Studio 11 developer preview came out, and I felt compelled to make use of its new async features. This meant ditching the Parallel methods all together and writing for task parallelism.

There are still parts of the .NET Framework which don't support async, and XML is one of them. Because I'm reading relatively small documents, I was able to work around these limitations by asynchronously filling a MemoryStream from a file and feeding the MemoryStream to the XML classes:

Task<FileStream> OpenReadAsync(string fileName)
{
    return Task.Factory.StartNew(state =>
        new FileStream((string)state, FileMode.Open, FileAccess.Read,
                       FileShare.Read, 4096, true), fileName);
}

async Task<XmlReader> CreateXmlReader(string fileName,
                                      XmlReaderSettings settings = null)
{
    MemoryStream ms = new MemoryStream();
    
    using (FileStream fs = await OpenReadAsync(fileName))
    {
        await fs.CopyToAsync(ms);
    }

    ms.Position = 0;
    return XmlReader.Create(ms, settings, fileName);
}

But I had one more problem to solve. For efficiency, Parallel.ForEach partitions its items into ranges which will be operated on concurrently. A side effect of this that I was relying on was that only so many I/O operations would be able to happen at once. In my new code I'm simply launching all these tasks at once rather than partitioning—this absolutely killed performance as potentially hundreds of concurrent I/Os caused my disk to seek like crazy.

What I ended up doing here was creating a ticket system which can be used to allow only a limited number of I/Os to happen concurrently: essentially a safe task-based semaphore.

sealed class AsyncLimiter
{
    public AsyncLimiter(int max);
    public Task<IDisposable> Lock();
}

The full implementation is available in Subversion and under a 2-clause BSD license. Using it is very simple:

AsyncLimiter limiter = new AsyncLimiter(4);

async Task<FileStream> OpenReadAsync(string fileName)
{
    using (IDisposable limiterlock = await limiter.Lock())
    {
        return await Task.Factory.StartNew(state =>
            new FileStream((string)state, FileMode.Open, FileAccess.Read,
                           FileShare.Read, 4096, true), fileName);
    }
}

async Task<XmlReader> CreateXmlReader(string fileName,
                                      XmlReaderSettings settings = null)
{
    MemoryStream ms = new MemoryStream();

    using (FileStream fs = await OpenReadAsync(fileName))
    using (IDisposable limiterlock = await limiter.Lock())
    {
        await fs.CopyToAsync(ms);
    }

    ms.Position = 0;
    return XmlReader.Create(ms, settings, fileName);
}

When the lock gets disposed, it'll let the next operation in line progress. This was simple to implement efficiently using Interlocked methods and a ConcurrentQueue.

Some operations—file opening and existence testing, directory creation, etc.—have no asynchronous analog. For these there is no good solution, so I simply wrapped them in a task as in the OpenReadAsync example above. They're rare enough that it hasn't been a problem.

The end result? Actually about 50% better performance than using the Parallel methods. When all the files are in cache, I'm able to generate this entire website from scratch in about 0.7 seconds.

Asynchronous page faults

With I/O, we’ve got some choices:

  1. Synchronous, copying from OS cache ( fread). This is the simplest form of I/O, but isn’t very scalable.
  2. Synchronous, reading directly from OS cache (memory mapping). This is wicked fast and efficient once memory is filled, but aside from some cases with read-​ahead, your threads will still block with page faults.
  3. Asynchronous, copying from OS cache ( ReadFile). Much more scalable than fread, but each read still involves duplicating data from the OS cache into your buffer. Fine if you’re reading some data only to modify the buffer in place, but still not very great when you’re treating it as read only (such as to send over a socket).
  4. Asynchronous, maintaining your own cache ( FILE_FLAG_NO_BUFFERING). More scalable still than ReadFile, but you need to do your own caching and it’s not shared with other processes.

Note that there’s one important choice missing: memory mapping with asynchronous page faults. As far as I know there are no operating systems that actually offer this—it’s kind of a dream feature of mine. There are two APIs that will help support this:

HANDLE CreateMemoryManager();
BOOL MakeResident(HANDLE, LPVOID, SIZE_T, LPOVERLAPPED);

CreateMemoryManager opens a handle to the Windows memory manager, and MakeResident will fill the pages you specify (returning true for synchronous completion, false for error/async like everything else). The best of both worlds: fast, easy access through memory, a full asynchronous workflow, and shared cache usage. This would be especially useful on modern CPUs that offer gigantic address spaces.

The memory manager already has similar functionality in there somewhere, so it might not be difficult to pull into user-​mode. Just an educated guess. Maybe it’d be terribly difficult. Dream feature!

Is C# the Boost of C-family languages?

For all the cons of giving a single entity control over C#, one pro is that it gives the language an unmatched agility to try new things in the C family of languages. LINQ—both its language integration and its backing APIs—is an incredibly powerful tool for querying and transforming data with very concise code. I really can’t express how much I’ve come to love it.

The new async support announced at PDC10 is basically the holy grail of async coding, letting you focus on what your task is and not how you’re going to implement a complex async code path for it. It’s an old idea that many async coders have come up with, but, as far as I know, has never been successfully implemented simply because it required too much language support.

The lack of peer review and standards committee for .​NET shows—there’s a pretty high rate of turnover as Microsoft tries to iron down the right way to tackle problems, and it results in a very large library with lots of redundant functionality. As much as this might hurt .​NET, I’m starting to view C# as a sort of Boost for the C language family. Some great ideas are getting real-​world use, and if other languages eventually feel the need to get something similar, they will have a bounty of experience to pull from.

C++, at least, is a terrifyingly complex language. Getting new features into it is an uphill battle, even when they address a problem that everyone is frustrated with. Getting complex new features like these into it would be a very long process, with a lot of arguing and years of delay. Any extra incubation time we can give them is a plus.