Chris Umbel

Parallel Programming with the Task Parallel Library and PLINQ in .Net 4.0

It's no secret that parallel computing is becoming more important. As clock speeds have stagnated and the number of cores per die has increased, one thing has become clear: software developers have to adapt to the current state of processors by writing code that's more parallelizable.

In the past many programmers have avoided parallel processing when possible, mainly due to its complexity, even in the face of an obvious increase in throughput. Those who have parallelized have often done it poorly and suffered through some serious misery as a result.

We're running out of options, though. In order to get more done faster we have to do more at once.

Development platform providers have been scrambling lately to try to simplify parallel development and minimize the amount of work we'll have to invest in writing parallel code. It's clear that there's much market share to be gained by handling this problem well.

Microsoft's Approach

In .Net 4.0 Microsoft's attacking the problem by way of the Task Parallel Library and Parallel LINQ. These were previously available as a separate package under the name Parallel Extensions but have now officially joined the framework.

Task Parallel Library

The Task Parallel Library, or TPL, provides a set of tools that simplify common chores associated with parallel programming. Not only does it ease initial development but it also simplifies maintenance by auto-scaling to the capabilities of the machine hosting the code.

Tasks

The core component of the TPL is the Task, which is a single operation that can be executed asynchronously. These tasks are enqueued in a thread pool which manages their execution. The plumbing of maintaining the queue is entirely automated making the job of the application programmer quite simple.

Consider the following example which executes two crude tasks simultaneously.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
/* import the task parallel library */
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            /* create a task that prints a bunch of "A"s */
            Task t1 = new Task(() => {
                    for(int i = 0; i <= 255; i++) {
                        Console.WriteLine("A");
                    }
                }
            );

            /* create a task that prints a bunch of "B"s */
            Task t2 = new Task(() => {
                    for (int i = 0; i <= 255; i++)
                    {
                        Console.WriteLine("B");
                    }
                }
            );

            /* execute each task */
            t1.Start();
            t2.Start();

            /* wait for both tasks to complete */
            t1.Wait();
            t2.Wait();
        }
    }
}

Here's a sample of possible output:

A
A
A
A
A
A
B
B
B
B
A
A
A
A
B
B
B

In that example two tasks run more or less concurrently. The real advantage of the thread pool in the back end is that I could queue up more tasks than my processors could handle and the pool would manage how they're worked through.
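To make that point a little more concrete, here's a rough sketch that queues many more tasks than there are cores and lets the pool sort it out. The ManyTasks class, the RunMany helper and the multiplier of 16 are my own inventions for illustration. It uses Task.Factory.StartNew, which creates and starts a task in one step, and Task.WaitAll, which blocks until every task in the array has finished.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    /* hypothetical class name, purely for illustration */
    public class ManyTasks
    {
        /* queue up taskCount tasks and return how many ran to completion */
        public static int RunMany(int taskCount)
        {
            int completed = 0;
            Task[] tasks = new Task[taskCount];

            for (int i = 0; i < taskCount; i++)
            {
                /* StartNew creates the task and queues it in one step */
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    /* Interlocked keeps the shared counter safe across threads */
                    Interlocked.Increment(ref completed);
                });
            }

            /* block until every queued task has finished */
            Task.WaitAll(tasks);

            return completed;
        }

        static void Main(string[] args)
        {
            /* deliberately queue far more tasks than there are cores;
               the multiplier of 16 is arbitrary */
            int taskCount = Environment.ProcessorCount * 16;

            Console.WriteLine(RunMany(taskCount) + " of " + taskCount + " tasks completed");
        }
    }
}
```

Nothing in that code decides how many threads to use or which task runs where. That's left to the pool.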

Simpler still, you can execute tasks with the static Invoke method of the Parallel class. Note that Invoke blocks until all of the operations passed in have completed.

 Parallel.Invoke(
    () => { /* operation 1 */
        for(int i = 0; i <= 255; i++) {
            Console.WriteLine("A");
        }
    },
    () => /* operation 2 */
    {
        for (int i = 0; i <= 255; i++)
        {
            Console.WriteLine("B");
        }
    }
 );

The Parallel class boasts a few other goodies as we'll see below.

Parallel Looping

With the basics out of the way let's look at a real time-saver. It's common to simply repeat an operation a number of times or execute an operation on each item in a list. The TPL provides a means of accomplishing that in parallel with the For and ForEach static methods of the Parallel class.

Here's a Parallel.For example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Parallel.For(0, 24, i => {
                        Console.WriteLine(i);
                    }
                );
        }
    }
}

Which could produce output similar to:

0
1
2
12
13
3
4
5
6
8
9
10
11
15
16
17
14
18
19
20
21
22
23
7

Parallel.ForEach is similar except it iterates an IEnumerable. For demonstrative purposes I'll create the functional equivalent of the example above except it will iterate a list of integers.

/* Enumerable.Range(0, 24) yields 0 through 23, matching Parallel.For(0, 24, ...) */
Parallel.ForEach<int>(Enumerable.Range(0, 24), n =>
    {
        Console.WriteLine(n);
    }
);
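One consequence of the out-of-order output shown earlier is worth a quick sketch: the loop body runs on several threads at once, so anything the iterations share needs protecting. The example below is only an illustration under that assumption. The ParallelSum class and its Sum helper are names I made up, and Interlocked.Add is just one of several ways to guard a shared total.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    /* hypothetical class name, purely for illustration */
    public class ParallelSum
    {
        /* add up the integers from 0 to count - 1 using Parallel.For */
        public static int Sum(int count)
        {
            int total = 0;

            Parallel.For(0, count, i =>
            {
                /* a plain "total += i" would be a data race here,
                   so the addition is done atomically instead */
                Interlocked.Add(ref total, i);
            });

            return total;
        }

        static void Main(string[] args)
        {
            /* 0 + 1 + ... + 23 = 276, regardless of iteration order */
            Console.WriteLine(Sum(24));
        }
    }
}
```

The iterations can still run in any order, but the total comes out the same every time because addition doesn't care about order and each update is atomic.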

Parallel LINQ

Another useful parallel processing tool available in .Net 4.0 is Parallel LINQ, or PLINQ. There really isn't much to it. Consider the following example, which looks like your typical LINQ operation except that the AsParallel extension method is called on the IEnumerable, returning a ParallelQuery. This allows the LINQ operations that follow to be performed in a parallel fashion.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication2
{
    class Program
    {
        struct Item {
            public int Id;
            public int Value;
        }

        static void Main(string[] args)
        {
            List<Item> items = new List<Item>();

            Random rand = new Random();

            foreach (int i in Enumerable.Range(0, 256)) 
            {
                items.Add(new Item() { Id = i, Value = rand.Next(10) });
            }

            int ct = (from item in items.AsParallel()
                          where item.Value > 5
                          select item).Count();

            Console.WriteLine(ct);
        }
    }
}

An interesting and simple twist on the above sample, one that will look familiar to T-SQL developers who have used MAXDOP, is that it's easy to cap the number of concurrently executing tasks (and so, in effect, the number of cores put to work) with ParallelEnumerable's WithDegreeOfParallelism method.

int ct = (from item in items.AsParallel().WithDegreeOfParallelism(2)
              where item.Value > 5
              select item).Count();
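As a sanity check, here's a small sketch that runs the same filter sequentially and through PLINQ and compares the two counts. It uses method syntax rather than query syntax, and works on plain integers instead of the Item struct to keep it short. The PlinqCheck class, its two helpers and the seed of 42 are assumptions of mine, not anything from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication2
{
    /* hypothetical class name, purely for illustration */
    public class PlinqCheck
    {
        /* ordinary LINQ to Objects: count the values greater than 5 */
        public static int CountSequential(List<int> values)
        {
            return values.Count(v => v > 5);
        }

        /* the same count through PLINQ, capped at "degree" concurrent tasks */
        public static int CountParallel(List<int> values, int degree)
        {
            return values.AsParallel()
                         .WithDegreeOfParallelism(degree)
                         .Count(v => v > 5);
        }

        static void Main(string[] args)
        {
            /* the seed is arbitrary; fixing it makes runs repeatable */
            Random rand = new Random(42);

            List<int> values = new List<int>();
            for (int i = 0; i < 256; i++)
            {
                values.Add(rand.Next(10));
            }

            /* both approaches should agree on the count */
            Console.WriteLine(CountSequential(values) == CountParallel(values, 2));
        }
    }
}
```

Counting doesn't depend on the order in which items are processed, which is why the two results can be compared directly.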

Conclusion

Based on the examples above it's clear that parallel programming has become at least a bit simpler with .Net 4.0. I've only scratched the surface here. There are still many more parallel tools (some good, some bad) available in .Net 4.0 and I recommend checking out the Parallel Programming documentation at MSDN for a more complete story.

Mon Dec 14 2009 22:12:00 GMT+0000 (UTC)
