Unit 1: Imperative Data Parallelism

 

 

High-level Performance Concepts/Lessons

 

DAG model of computation: a simple model of sequential/parallel code with widespread applicability (series-parallel graph)

 

Work/Span: analyzing an algorithm for parallelism

·         Work is the total number of operations performed — the running time on a single processor

·         Span is the length of the critical path — the longest chain of dependent operations, i.e. the running time with unlimited processors

 

We want the problem/algorithm to have high Parallelism

·         Parallelism = Work / Span

·         So, minimize Span

·         If a change to an algorithm increases Span, it had better increase Work proportionally, or Parallelism falls (see the sketch below)
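As a small worked sketch (the array-sum example is ours, not from the notes): a divide-and-conquer sum has Work Θ(n) and Span Θ(log n), so its Parallelism is Θ(n / log n).

```csharp
using System.Threading.Tasks;

class WorkSpanDemo
{
    // Divide-and-conquer sum of a[lo..hi).
    // Work:  Theta(n)     -- every element is added exactly once.
    // Span:  Theta(log n) -- the halves run in parallel, so the critical
    //                        path is only the depth of the recursion.
    // Parallelism = Work / Span = Theta(n / log n).
    static long Sum(int[] a, int lo, int hi)
    {
        if (hi - lo < 1024)                  // sequential cutoff (grain size)
        {
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = lo + (hi - lo) / 2;
        long left = 0, right = 0;
        Parallel.Invoke(
            () => left  = Sum(a, lo, mid),
            () => right = Sum(a, mid, hi));
        return left + right;
    }
}
```

The cutoff of 1024 is arbitrary; it exists only to keep task-creation overhead in check (the Coordination vs. Computation point below).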

 

Amdahl's Law: Proportion of Sequential Code Limits Speedup
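In symbols (a standard statement of the law; the notes give only the slogan): if s is the fraction of the computation that is inherently sequential and p is the number of processors, then

```latex
\mathrm{Speedup}(p) \;=\; \frac{1}{\,s + (1-s)/p\,} \;\le\; \frac{1}{s}
```

For example, with s = 0.1 (10% sequential), even infinitely many processors yield at most a 10x speedup.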

 

Gustafson's Law: As problems get very large, the hope is we can greatly reduce the proportion of sequential code
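In symbols (again a standard formulation, not spelled out in the notes): if s is the sequential fraction of the running time observed on the parallel machine, the scaled speedup is

```latex
\mathrm{ScaledSpeedup}(p) \;=\; s + p\,(1-s) \;=\; p - s\,(p-1)
```

so if growing the problem drives s toward zero, the speedup approaches p.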

  

Partitioning: Tasks must be assigned to processors

·         Uneven partitioning of tasks to processors leaves some processors idle while others still have work, so the full parallelism available is not exploited (the load-balancing problem)

·         Dynamic partitioning can alleviate this problem at the expense of additional synchronization overhead
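A minimal sketch of the contrast (the workload and the helper name WorkItem are invented for illustration): a static split into halves is imbalanced when iteration costs vary, while Parallel.For's demand-driven chunking lets idle workers keep taking work.

```csharp
using System.Threading;
using System.Threading.Tasks;

class PartitioningDemo
{
    // Iteration i costs about i units, so the upper half of the index
    // space holds roughly three times the work of the lower half.
    static void WorkItem(int i) => Thread.SpinWait(10 * i);

    static void StaticHalves(int n)
    {
        // Static partitioning: one task per half. The task that gets the
        // cheap half finishes early and then sits idle (load imbalance).
        Parallel.Invoke(
            () => { for (int i = 0;     i < n / 2; i++) WorkItem(i); },
            () => { for (int i = n / 2; i < n;     i++) WorkItem(i); });
    }

    static void DynamicChunks(int n)
    {
        // Dynamic partitioning: Parallel.For hands out index ranges on
        // demand, at the cost of some synchronization per chunk.
        Parallel.For(0, n, WorkItem);
    }
}
```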

 

Coordination vs. Computation: There is an overhead to creating tasks, or running iterations of a parallel loop

·         Programmers need to balance task size: the work performed by each task/iteration must be enough to offset coordination overheads
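One way to control grain size in .NET (a sketch; the Scale example is ours) is to hand Parallel.ForEach a range partitioner, so each delegate invocation covers a whole block of iterations instead of a single cheap one.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class GrainSizeDemo
{
    static void Scale(double[] a, double k)
    {
        // Too fine-grained: one delegate invocation per element, so the
        // coordination overhead can dwarf the single multiply of work.
        Parallel.For(0, a.Length, i => a[i] *= k);

        // Coarser grain: Partitioner.Create splits the index space into
        // ranges; one delegate invocation per range amortizes the overhead.
        Parallel.ForEach(Partitioner.Create(0, a.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                a[i] *= k;
        });
    }
}
```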

 

Task Parallelism vs. Data Parallelism:

·         loop parallelism works well for some applications (e.g., ray tracing)

·         for some problems (irregular matrix algorithms, algorithms over trees, etc.), explicitly created asynchronous tasks are necessary

 

Wavefront Pattern: a diagonal sweep through a matrix of partial results can be parallelized within each anti-diagonal (useful for dynamic programming problems; see the sketch below)
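A sketch of the pattern applied to edit distance (our own example, a plain Levenshtein DP): the anti-diagonals are swept in order, but the cells within one diagonal are independent of each other and can be computed in parallel.

```csharp
using System;
using System.Threading.Tasks;

class WavefrontDemo
{
    // Edit distance via a wavefront: every cell on diagonal i + j = d
    // depends only on diagonals d-1 and d-2, so the cells within one
    // diagonal can run in parallel while the diagonals run in order.
    static int EditDistance(string s, string t)
    {
        int m = s.Length, n = t.Length;
        var dist = new int[m + 1, n + 1];
        for (int i = 0; i <= m; i++) dist[i, 0] = i;   // boundary column
        for (int j = 0; j <= n; j++) dist[0, j] = j;   // boundary row

        for (int d = 2; d <= m + n; d++)               // diagonals, in order
        {
            int iLo = Math.Max(1, d - n), iHi = Math.Min(m, d - 1);
            Parallel.For(iLo, iHi + 1, i =>            // cells of one diagonal
            {
                int j = d - i;
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                dist[i, j] = Math.Min(
                    Math.Min(dist[i - 1, j] + 1, dist[i, j - 1] + 1),
                    dist[i - 1, j - 1] + cost);
            });
        }
        return dist[m, n];
    }
}
```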

 

 

High-level Correctness Concepts/Lessons

 

DAG model of computation: a simple model of sequential/parallel code with widespread applicability (happens-before graph)

 

Interference of parallel delegates: two delegates access the same memory location and at least one of them writes it
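A minimal sketch of interference and one way to remove it (the helper names RacySum and SafeSum are ours): every iteration of the first loop reads and writes the shared sum, so it is a data race; the thread-local overload of Parallel.For confines the writes.

```csharp
using System.Threading;
using System.Threading.Tasks;

class InterferenceDemo
{
    static long RacySum(int[] a)
    {
        long sum = 0;
        // Interference: every iteration reads AND writes `sum`, so the
        // read-modify-write sequences of parallel iterations interleave
        // and updates are lost. The result can vary from run to run.
        Parallel.For(0, a.Length, i => sum += a[i]);
        return sum;
    }

    static long SafeSum(int[] a)
    {
        long sum = 0;
        // Thread-local overload of Parallel.For: each worker folds into
        // its own `local` (no shared writes); only the final combine
        // step touches `sum`, and it does so with an atomic add.
        Parallel.For(0, a.Length,
            () => 0L,                                   // localInit
            (i, state, local) => local + a[i],          // body
            local => Interlocked.Add(ref sum, local));  // localFinally
        return sum;
    }
}
```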

 

Determinism: the order in which events occur shouldn't affect the final result

·         The happens-before graph places constraints on the run-time.

·         The run-time can schedule the actions in any order that respects the graph edges (topological sort).

·         If the actions don't interfere with one another, we get a deterministic result (see the sketch after this list)
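A small sketch (ours, not from the notes): the two actions below write disjoint halves of the array, so whichever order or interleaving the runtime picks, the result is identical.

```csharp
using System;
using System.Threading.Tasks;

class DeterminismDemo
{
    static double[] Tabulate(Func<int, double> f, int n)
    {
        var a = new double[n];
        // The two actions write disjoint index ranges (no interference),
        // so every schedule the runtime may choose -- either order, or
        // fully overlapped -- produces the same array.
        Parallel.Invoke(
            () => { for (int i = 0;     i < n / 2; i++) a[i] = f(i); },
            () => { for (int i = n / 2; i < n;     i++) a[i] = f(i); });
        return a;
    }
}
```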

 

Assertions: Checkable statements which a programmer believes are true at a particular program point

 

Unit Testing: Isolate different parts of a program and show that each part is correct
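A sketch of such a test (the code under test, SeqSquares/ParSquares, is invented for illustration; any test framework's assertion could stand in for the Debug.Assert): run both versions on the same input and assert the outputs agree.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class ParallelVsSequentialTest
{
    // Hypothetical code under test: sequential and parallel versions of
    // the same computation.
    static double[] SeqSquares(double[] a) => a.Select(x => x * x).ToArray();

    static double[] ParSquares(double[] a)
    {
        var r = new double[a.Length];
        Parallel.For(0, a.Length, i => r[i] = a[i] * a[i]);
        return r;
    }

    static void TestParallelMatchesSequential()
    {
        var rng = new Random(42);                 // fixed seed: repeatable test
        double[] input = Enumerable.Range(0, 100000)
                                   .Select(_ => rng.NextDouble())
                                   .ToArray();

        double[] expected = SeqSquares(input);    // sequential oracle
        double[] actual = ParSquares(input);

        // The parallel version must agree element-for-element with the
        // sequential one. (Debug.Assert fires only in DEBUG builds.)
        Debug.Assert(expected.SequenceEqual(actual),
                     "parallel result differs from sequential result");
    }
}
```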

 

 

Code-level Concepts/Lessons

 

·         C# lambdas: a way to write a function without a name

·         Parallel.For: with lambdas and .NET 4 libraries, we can easily parallelize sequential for loops

·         Parallel.Invoke: gives us a simple way to invoke different actions in parallel 

·         Parallel.ForEach: allows loop parallelism with IEnumerable collections

·         Stopping Parallel Loops: a way for one iteration of a parallel loop to signal that the remaining iterations need not run

·         ParallelLoopState.{Stop/Break}: both terminate a parallel loop early; Stop abandons pending iterations as soon as possible, while Break still runs every iteration with a lower index than the one that called it (see the sketch after this list)

·         ParallelLoopResult: returned by Parallel.For and Parallel.ForEach; a way to check how the loop terminated (completed, stopped, or broke)

·         AggregateException: Groups all exceptions thrown during a parallel loop

·         Cancellation: An external way to terminate parallel loops

·         Task: an asynchronous operation
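The sketch below (our own FindIndex helper, a parallel search) pulls several of these together: ParallelLoopState.Stop, ParallelLoopResult, cancellation through ParallelOptions, and AggregateException handling.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LoopControlDemo
{
    // Parallel search: Stop() ends the loop as soon as any iteration
    // finds a match (no ordering guarantee, unlike Break()).
    static int FindIndex(int[] a, int target, CancellationToken token)
    {
        int found = -1;
        var options = new ParallelOptions { CancellationToken = token };
        try
        {
            ParallelLoopResult result = Parallel.For(0, a.Length, options,
                (i, state) =>
                {
                    if (a[i] == target)
                    {
                        Interlocked.CompareExchange(ref found, i, -1);
                        state.Stop();          // abandon remaining iterations
                    }
                });
            // ParallelLoopResult: IsCompleted is false when the loop was
            // stopped or broken before running every iteration.
            Console.WriteLine($"Ran to completion: {result.IsCompleted}");
        }
        catch (OperationCanceledException)
        {
            // Cancellation: signaled externally through the token.
            Console.WriteLine("Search cancelled.");
        }
        catch (AggregateException ae)
        {
            // AggregateException: bundles every exception thrown by
            // iterations of the loop.
            foreach (var e in ae.InnerExceptions)
                Console.WriteLine(e.Message);
        }
        return found;
    }
}
```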

 

 

Sample Learning Outcomes

 

·         Parallelize a for loop with Parallel.For and achieve speedup (or be able to explain why no speedup occurred).

·         Write a test to show that the parallel version of a piece of code computes the same result as the sequential version.

·         Analyze an algorithm in terms of Work/Span and correctly predict the potential speedup.

·         Give tight upper bounds for potential speedups given parallelization scenarios (e.g. Amdahl's law), and explain why those upper bounds exist.

 

 

Assignment Ideas

 

Take a sequential program which contains some computation-intensive for loops (for example, one of the Olden benchmarks) and attempt to speed it up using Parallel.For loops. Some existing for loops cannot be easily parallelized in this manner. For each for loop in the original program, either explain why it was not appropriate to parallelize the loop, or measure the performance improvement from making the loop parallel. Explain why speedup was or was not achieved for each parallelized loop, and include performance test results in your submission.

 

 

Resources

               

Parallel Extensions Samples & Extras (http://code.msdn.microsoft.com/ParExtSamples):

·         GameOfLife

·         BlendImages

·         ImageColorizer

·         Morph

·         MandelbrotFractals

·         ComputePi

·         Strassens

·         EditDistance

·         ParallelExtensionsExtras.Partitioners

·         ParallelExtensionsExtras.Extensions:

·         AggregateExceptionExtensions

·         CancellationTokenExtensions

·         CompletedTask

·         DelegateExtensions

·         TaskCompletionSourceExtensions

 

Parallel Programming with Microsoft .NET book (http://parallelpatterns.codeplex.com):

·         Chapter 1 (Introduction)

·         Parallel.For/ForEach from Chapter 2 (Parallel Loops)

·         Tasks from Chapter 3 (Parallel Tasks)

·         Child Tasks from Chapter 6 (Dynamic Task Parallelism)

·         Appendix B (Debugging and Profiling Parallel Applications)