UPDATE: Microsoft has published a great article on this topic called Patterns of Parallel Programming, be sure to check it out.
Expanding my earlier post on this very topic, I decided to convert my Stack Overflow answer into a full-blown post.
As often happens in the software development industry, our ability to solve problems is limited by our choice of paradigms. Good old familiar tools lock us into ancient paradigms, creating the infamous hammer-and-nail antipattern. This was evident to me in my own attitude towards LINQ: I had been familiar with LINQ and had occasionally used it in my projects for over a year, yet only recently did I realize how many of our usual FOR and FOREACH constructs can be replaced with profoundly more elegant and concise LINQ queries.
Similar, and again LINQ-inspired, was my discovery of the functional programming paradigms: map, fold, and filter. The thought that any possible operation on a single set of items can be expressed via these three, and more importantly, that following functional programming conventions can make my code more standard and therefore easier to understand, was an important step in my personal evolution. Long story short, viva la immutability!
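To make the map, fold, and filter trio a bit more concrete, here is a minimal sketch of how they line up with the LINQ operators Select, Aggregate, and Where. The class name, the helper method, and the sample data are made up for illustration.

```csharp
using System;
using System.Linq;

public static class MapFilterFoldSketch
{
    // Hypothetical helper: total length of all non-empty strings.
    public static int TotalLengthOfNonEmpty(string[] items)
    {
        return items
            .Where(s => !string.IsNullOrEmpty(s))        // filter: drop null and empty items
            .Select(s => s.Length)                       // map: string -> its length
            .Aggregate(0, (sum, length) => sum + length); // fold: add the lengths up
    }

    public static void Main()
    {
        var items = new[] { "alpha", "", "beta", null, "gamma" };
        Console.WriteLine(TotalLengthOfNonEmpty(items)); // prints 14
    }
}
```

None of the three steps mutates the input array, which is where the immutability remark comes from.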
Now, this post is not about those important discoveries. It’s about more mundane matters, specifically, parallelization. But enough ramblings, my faithful reader, and let’s hope you can see how the point I was making with the above examples relates to the rest of the post.
For one reason or another, people often want to execute stuff in parallel. Leaving aside why this may or may not be a good idea, let’s focus on how they do it. Most people know there’s such a magical thing as the Thread Pool and blindly feed it everything but the kitchen sink. Let’s assume we have a bunch of items, say strings, that we need to process, preferably in parallel.
The usual approach then would be to do this:
foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}
public void DoSomething(object data)
{
    // ... do some processing...
}
But how do I wait for all items to complete before I can proceed? Let’s add a counter variable, start at the number of items, and wait until it’s down to zero:
this.counter = arrayStrings.Length; // Global variable
foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}
while (this.counter > 0) Thread.Sleep(100);
// ...proceed with the rest of your logic...
public void DoSomething(object data)
{
    // ... do some processing...
    this.counter--;
}
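Don’t laugh yet, this is exactly how most people solve this at first, before they learn all the quirks of multithreading. Wait, has everybody heard about these nasty race conditions? The decrement is a read-modify-write, so two workers finishing at the same moment can lose an update and leave the counter stuck above zero. Let’s throw some locking into the mix: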
this.counter = arrayStrings.Length;
this.counterLock = new object();
foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}
while (this.counter > 0) Thread.Sleep(100);
// ...proceed with the rest of your logic...
public void DoSomething(object data)
{
    // ... do some processing...
    lock (this.counterLock) {
        this.counter--;
    }
}
Can we get any better than this? Yes, by using the Interlocked class, which makes the lock object unnecessary:
this.counter = arrayStrings.Length;
foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}
while (this.counter > 0) Thread.Sleep(100);
// ...proceed with the rest of your logic...
public void DoSomething(object data)
{
    // ... do some processing...
    Interlocked.Decrement(ref this.counter); // atomic decrement, no lock needed
}
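For anyone who wants to run the counter approach, here is a self-contained sketch. The class name and the "work" (summing string lengths) are placeholders I made up; only the QueueUserWorkItem and Interlocked calls come from the snippets above. The polling loop reads the counter through Interlocked.CompareExchange, which is one way to make sure the main thread sees the workers' updates.

```csharp
using System;
using System.Threading;

public class InterlockedCounterSketch
{
    private int counter;     // work items that have not finished yet
    private int totalLength; // hypothetical result of the "processing"

    public int ProcessAll(string[] arrayStrings)
    {
        counter = arrayStrings.Length;
        totalLength = 0;
        foreach (string someString in arrayStrings)
        {
            ThreadPool.QueueUserWorkItem(DoSomething, someString);
        }
        // CompareExchange(ref x, 0, 0) never changes x; it is used here as an atomic read.
        while (Interlocked.CompareExchange(ref counter, 0, 0) > 0)
        {
            Thread.Sleep(10);
        }
        return totalLength;
    }

    private void DoSomething(object data)
    {
        // Placeholder processing: accumulate the string length atomically.
        Interlocked.Add(ref totalLength, ((string)data).Length);
        // Signal completion last, so the result is final once the counter hits zero.
        Interlocked.Decrement(ref counter);
    }

    public static void Main()
    {
        var sketch = new InterlockedCounterSketch();
        Console.WriteLine(sketch.ProcessAll(new[] { "a", "bb", "ccc" })); // prints 6
    }
}
```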
This looks… almost professional
(You see, you can’t be almost professional, just like your girlfriend can’t be almost pregnant.) This Thread.Sleep looks ugly to me. Chances are, the main thread will sleep for as much as 100ms after all worker threads are done. Let’s see if we can add some ManualResetEvent love:
List<ManualResetEvent> statusFlags = new List<ManualResetEvent>();
foreach (string someString in arrayStrings) {
    var flag = new ManualResetEvent(false);
    statusFlags.Add(flag);
    ThreadPool.QueueUserWorkItem(this.DoSomething, new AWrapperClass(someString, flag));
}
WaitHandle.WaitAll(statusFlags.ToArray());
// ...proceed with the rest of your logic...
public void DoSomething(object data)
{
    // ... do some processing...
    (data as AWrapperClass).flag.Set();
}
Hurray! This is perfect! Except that it got quite complicated with all this techy stuff and the AWrapperClass, and that WaitHandle.WaitAll accepts at most 64 handles, so it throws on larger arrays. You can feel proud showing off your …ummm… masterhood at the code reviews. Now, if you want to become a real master, you’ve got to start thinking like one: how can I make this code trivial? Let’s see what .NET 4.0 holds for us. Let’s start with CountdownEvent, as it is the closest thing there is to our initial counter idea:
this.flagFinish = new CountdownEvent(arrayStrings.Length); // initialize the count
foreach (string someString in arrayStrings) {
    ThreadPool.QueueUserWorkItem(this.DoSomething, someString);
}
this.flagFinish.Wait();
// ...proceed with the rest of your logic...
void DoSomething(object data) {
    // ... do some processing...
    this.flagFinish.Signal(); // worker threads signal when done
}
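Here is a runnable sketch of the CountdownEvent variant, under a couple of assumptions of mine: the class name is made up, the "processing" is just summing string lengths, and the worker is written as a lambda rather than a separate method. Calling Signal in a finally block is a design choice I added so that a failing work item cannot leave the waiting thread blocked forever.

```csharp
using System;
using System.Threading;

public static class CountdownEventSketch
{
    public static int ProcessAll(string[] arrayStrings)
    {
        int totalLength = 0; // hypothetical result of the "processing"
        using (var flagFinish = new CountdownEvent(arrayStrings.Length))
        {
            foreach (string someString in arrayStrings)
            {
                string current = someString; // per-iteration copy for the lambda
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        // Placeholder processing.
                        Interlocked.Add(ref totalLength, current.Length);
                    }
                    finally
                    {
                        flagFinish.Signal(); // count down even if processing throws
                    }
                });
            }
            flagFinish.Wait(); // returns once the count reaches zero
        }
        return totalLength;
    }

    public static void Main()
    {
        Console.WriteLine(ProcessAll(new[] { "a", "bb", "ccc" })); // prints 6
    }
}
```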
Nice and easy! Much cleaner than anything before! The only thing that still worries me is that the waiting logic is separate from the queuing logic. Luckily for us, there’s a new concept in .NET 4.0, called Task, that provides facilities for queuing as well as waiting in a single package:
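var tasks = new List<Task>();
foreach (var someString in arrayStrings)
{
    // Copy the loop variable: before C# 5, all lambdas share a single foreach variable.
    var current = someString;
    tasks.Add(Task.Factory.StartNew(() => DoSomething(current)));
}
Task.WaitAll(tasks.ToArray());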
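One thing the Task version makes easy, and the thread pool versions do not, is getting results back. The sketch below assumes a made-up DoSomething that returns the string length; the class name is hypothetical too. It uses Task<int> so each task carries its own result, and reads Result only after Task.WaitAll has returned.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class TaskResultSketch
{
    // Hypothetical stand-in for the real business logic.
    private static int DoSomething(string someString)
    {
        return someString.Length;
    }

    public static int ProcessAll(string[] arrayStrings)
    {
        // The lambda parameter s is fresh for every element, so capturing it is safe.
        Task<int>[] tasks = arrayStrings
            .Select(s => Task.Factory.StartNew(() => DoSomething(s)))
            .ToArray();

        Task[] toWaitOn = tasks;  // array covariance: Task<int>[] is usable as Task[]
        Task.WaitAll(toWaitOn);   // blocks until every task has finished

        return tasks.Sum(t => t.Result); // tasks are complete, so Result does not block
    }

    public static void Main()
    {
        Console.WriteLine(ProcessAll(new[] { "a", "bb", "ccc" })); // prints 6
    }
}
```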
Finally, there’s no need to show the DoSomething method. There’s nothing in there except business logic, and das is good! An even more concise variation of the same pattern would be to use Parallel.Invoke:
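var actions = new List<Action>();
foreach (var someString in arrayStrings)
{
    // Copy the loop variable: before C# 5, every action would otherwise see the last string.
    var current = someString;
    actions.Add(() => DoSomething(current));
}
Parallel.Invoke(actions.ToArray());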
Wait a second… There’s a little problem with this variant… When we used Tasks, we started each task inside the foreach loop. Here, we only add the actions to the list and start them after the loop. Depending on your problem, this may be better or worse. It matters particularly when preparing each work item takes a long time and you want the tasks already started to use that time to get some of their processing done. But don’t get stuck on these two. Believe it or not, there’s an even better option. Enter our final contestant, Parallel.ForEach:
Parallel.ForEach<string>(arrayStrings, someString =>
{
DoSomething(someString);
});
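Parallel.ForEach blocks until every iteration has completed, so no separate waiting step is needed. As a hedged sketch of how this might look in a complete program, the example below collects per-item results in a ConcurrentBag and caps the parallelism through ParallelOptions. The class name, the cap of 4, and the length-summing "work" are all assumptions of mine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    public static int ProcessAll(string[] arrayStrings)
    {
        var lengths = new ConcurrentBag<int>(); // thread-safe collection for results
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // arbitrary cap

        // Returns only after all items have been processed.
        Parallel.ForEach(arrayStrings, options, someString =>
        {
            lengths.Add(someString.Length); // placeholder processing
        });

        int total = 0;
        foreach (int length in lengths)
        {
            total += length;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(ProcessAll(new[] { "a", "bb", "ccc" })); // prints 6
    }
}
```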
This is as good as it can get in C#, at least in its 4th incarnation. No control logic, no techy stuff, no nonsense to be proud of at code reviews. Just pure expressive power!
And so it is!
