Unit 1: Imperative Data Parallelism
High-level Performance Concepts/Lessons
DAG model of computation: a simple model of sequential/parallel code with widespread applicability (series-parallel graph)
Work/Span: analyzing an algorithm for parallelism
· Work is the total amount of computation (the overall size of the problem)
· Span measures the sequential portion (the critical path)
We want the problem/algorithm to have high parallelism
· Parallelism = Work / Span
· So, minimize span
· If a change increases span, it should increase work by at least as much, or parallelism drops
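As an illustration (a sketch, not taken from the course materials): a balanced parallel reduction over n elements does roughly 2n − 1 units of work with a span of about log₂ n, so its parallelism grows with n:

```csharp
using System;

class WorkSpanDemo
{
    // Work: total node count of the reduction DAG (n leaves + n - 1 additions).
    public static long Work(long n) =>
        n <= 1 ? 1 : 1 + Work(n / 2) + Work(n - n / 2);

    // Span: longest path through the DAG; the two halves run in parallel,
    // so only the deeper half contributes.
    public static long Span(long n) =>
        n <= 1 ? 1 : 1 + Math.Max(Span(n / 2), Span(n - n / 2));

    static void Main()
    {
        foreach (long n in new long[] { 1 << 10, 1 << 20 })
            Console.WriteLine(
                $"n={n}: work={Work(n)}, span={Span(n)}, " +
                $"parallelism={(double)Work(n) / Span(n):F0}");
    }
}
```

Counting DAG nodes this way makes the "minimize span" advice concrete: halving span doubles parallelism even when work is unchanged.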
Amdahl's Law: the proportion of sequential code limits speedup
Gustafson's Law: as problems get very large, the hope is that we can greatly reduce the proportion of sequential code
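Both laws are easy to tabulate (the formulas are the standard ones; the 5% sequential fraction is only an illustrative number). Amdahl's speedup on p processors with sequential fraction s is 1/(s + (1 − s)/p); Gustafson's scaled speedup is p − s(p − 1):

```csharp
using System;

class SpeedupLaws
{
    // Amdahl: fixed problem size; the sequential fraction s caps speedup at 1/s.
    public static double Amdahl(double s, int p) => 1.0 / (s + (1.0 - s) / p);

    // Gustafson: the problem grows with p, so the parallel part keeps scaling.
    public static double Gustafson(double s, int p) => p - s * (p - 1);

    static void Main()
    {
        double s = 0.05; // 5% sequential, purely illustrative
        foreach (int p in new[] { 4, 16, 64 })
            Console.WriteLine(
                $"p={p}: Amdahl={Amdahl(s, p):F2}, Gustafson={Gustafson(s, p):F2}");
        // Amdahl plateaus near 1/s = 20; Gustafson keeps growing with p.
    }
}
```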
Partitioning: tasks must be assigned to processors
· Uneven partitioning of tasks to processors may not exploit the full parallelism available (load balancing)
· Dynamic partitioning can alleviate this problem at the expense of additional synchronization overhead
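A sketch of the trade-off using the TPL's partitioners (the workload here is a uniform sum, so the two variants differ only in coordination cost): `Partitioner.Create(0, n)` hands each worker a fixed contiguous range, while `Partitioner.Create(list, loadBalance: true)` lets idle workers keep pulling small chunks:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class PartitioningDemo
{
    // Static range partitioning: each worker receives one contiguous chunk.
    // Cheap to coordinate, but skewed per-item cost can leave workers idle.
    public static long SumStaticRanges(int[] items)
    {
        long total = 0;
        Parallel.ForEach(Partitioner.Create(0, items.Length), range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++) local += items[i];
            Interlocked.Add(ref total, local);
        });
        return total;
    }

    // Load-balancing partitioner: idle workers keep pulling small chunks,
    // evening out skew at the cost of extra synchronization.
    public static long SumLoadBalanced(int[] items)
    {
        long total = 0;
        Parallel.ForEach(Partitioner.Create(items, loadBalance: true),
                         x => Interlocked.Add(ref total, x));
        return total;
    }

    static void Main()
    {
        int[] items = Enumerable.Range(0, 1000).ToArray();
        Console.WriteLine($"static ranges: {SumStaticRanges(items)}");
        Console.WriteLine($"load-balanced: {SumLoadBalanced(items)}");
    }
}
```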
Coordination vs. Computation: there is an overhead to creating tasks or running iterations of a parallel loop
· Programmers need to balance task size: the work performed by each task/iteration must be enough to offset coordination overheads
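One standard way to amortize coordination overhead is the `Parallel.For` overload with thread-local state, which synchronizes once per worker instead of once per iteration (a sketch; actual timings vary by machine and core count):

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class GranularityDemo
{
    // Fine-grained: one interlocked operation per iteration; the coordination
    // dwarfs the tiny amount of real work per element.
    public static long SumPerIterationSync(int n)
    {
        long total = 0;
        Parallel.For(0, n, i => Interlocked.Add(ref total, i));
        return total;
    }

    // Coarse-grained: each worker accumulates into thread-local state and
    // synchronizes only once, in localFinally.
    public static long SumPerWorkerSync(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                     // localInit
            (i, state, local) => local + i,               // loop body
            local => Interlocked.Add(ref total, local));  // localFinally
        return total;
    }

    static void Main()
    {
        const int N = 10_000_000;
        var sw = Stopwatch.StartNew();
        long a = SumPerIterationSync(N);
        Console.WriteLine($"per-iteration sync: {a} in {sw.ElapsedMilliseconds} ms");
        sw.Restart();
        long b = SumPerWorkerSync(N);
        Console.WriteLine($"per-worker sync:    {b} in {sw.ElapsedMilliseconds} ms");
    }
}
```

Both compute the same sum; on most machines the second is markedly faster because it pays the synchronization cost once per worker.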
Task Parallelism vs. Data Parallelism:
· loop parallelism works well for some applications (e.g. ray tracing)
· for some problems (irregular matrix algorithms, algorithms over trees, etc.), specific asynchronous tasks are necessary
Wavefront Pattern: parallelizes a diagonal sweep through a matrix of partial results (useful for dynamic programming problems)
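A sketch of the wavefront idea on an illustrative dynamic-programming recurrence (longest common subsequence, not an example from the course materials): every cell on an anti-diagonal depends only on earlier diagonals, so the cells within one diagonal can be computed in parallel:

```csharp
using System;
using System.Threading.Tasks;

class WavefrontDemo
{
    // LCS length table computed diagonal by diagonal.
    public static int Lcs(string a, string b)
    {
        int m = a.Length, n = b.Length;
        var t = new int[m + 1, n + 1];
        // Diagonal d holds all cells (i, j) with i + j == d; each cell depends
        // only on cells in diagonals d-1 and d-2, so within one diagonal the
        // cells are independent and can run in parallel.
        for (int d = 2; d <= m + n; d++)
        {
            int iMin = Math.Max(1, d - n), iMax = Math.Min(m, d - 1);
            Parallel.For(iMin, iMax + 1, i =>
            {
                int j = d - i;
                t[i, j] = a[i - 1] == b[j - 1]
                    ? t[i - 1, j - 1] + 1
                    : Math.Max(t[i - 1, j], t[i, j - 1]);
            });
        }
        return t[m, n];
    }

    static void Main() =>
        Console.WriteLine(Lcs("wavefront", "waterfront")); // prints 8
}
```

Note the per-diagonal barrier: each `Parallel.For` call finishes a whole diagonal before the sweep advances, which is exactly the dependence structure the pattern exploits.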
High-level Correctness Concepts/Lessons
DAG model of computation: a simple model of sequential/parallel code with widespread applicability (happens-before graph)
Interference of parallel delegates: both access the same memory location and at least one writes the location
Determinism: the order in which events occur shouldn't affect the final result
· The happens-before graph places constraints on the run-time.
· The run-time can schedule the actions in any order that respects the graph edges (a topological sort).
· If the actions don't interfere with one another, we will get a deterministic result
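A minimal sketch of interference and its repair (illustrative, not from the course materials): two parallel iterations that both write the same counter race, so the result varies from run to run; making the update atomic removes the interference and restores determinism:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class InterferenceDemo
{
    // Interfering: every iteration reads and writes the same counter, so
    // increments can be lost and the result varies run to run.
    public static int RacyCount(int n)
    {
        int count = 0;
        Parallel.For(0, n, i => count++);
        return count;
    }

    // Non-interfering: the atomic update makes the result deterministic.
    public static int AtomicCount(int n)
    {
        int count = 0;
        Parallel.For(0, n, i => Interlocked.Increment(ref count));
        return count;
    }

    static void Main()
    {
        const int N = 1_000_000;
        Console.WriteLine($"racy:   {RacyCount(N)} (often less than {N})");
        Console.WriteLine($"atomic: {AtomicCount(N)}");
    }
}
```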
Assertions: Checkable statements which a programmer believes are true at a particular program point
Unit Testing: Isolate different parts of a program and show that each part
is correct
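Such a test often just checks the parallel version against the sequential one on the same input (a framework-free sketch; a real course would likely use a unit-testing framework such as NUnit or MSTest):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelVsSequentialTest
{
    public static double[] SquareSequential(double[] xs)
    {
        var r = new double[xs.Length];
        for (int i = 0; i < xs.Length; i++) r[i] = xs[i] * xs[i];
        return r;
    }

    public static double[] SquareParallel(double[] xs)
    {
        var r = new double[xs.Length];
        // Each iteration writes a distinct index, so there is no interference
        // and the parallel result is deterministic.
        Parallel.For(0, xs.Length, i => r[i] = xs[i] * xs[i]);
        return r;
    }

    static void Main()
    {
        var input = Enumerable.Range(0, 10_000).Select(i => (double)i).ToArray();
        bool equal = SquareSequential(input).SequenceEqual(SquareParallel(input));
        Console.WriteLine(equal ? "PASS: parallel matches sequential" : "FAIL");
    }
}
```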
Code-level Concepts/Lessons
· C# lambdas: a way to write a function without a name
· Parallel.For: with lambdas and the .NET 4 libraries, we can easily parallelize sequential for loops
· Parallel.Invoke: gives us a simple way to invoke different actions in parallel
· Parallel.ForEach: allows loop parallelism with IEnumerable collections
· Stopping Parallel Loops: a way for multiple iterations of parallel loops to interact with each other
· ParallelLoopState.{Stop/Break}: both terminate a parallel loop; Break first runs all iterations with a lower index than the breaking iteration
· ParallelLoopResult: returned by Parallel.For and Parallel.ForEach; a way to check how a loop terminated
· AggregateException: groups all exceptions thrown during a parallel loop
· Cancellation: an external way to terminate parallel loops
· Task: an asynchronous operation
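Most of the APIs above fit in one small tour (a sketch with illustrative values; each snippet uses only standard TPL members):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TplTour
{
    public static void Main()
    {
        // Parallel.Invoke: run independent actions in parallel.
        Parallel.Invoke(
            () => Console.WriteLine("action A"),
            () => Console.WriteLine("action B"));

        // Parallel.For + ParallelLoopState.Break: stop once an element is
        // found; Break still completes all lower-indexed iterations.
        int[] data = { 3, 1, 4, 1, 5, 9, 2, 6 };
        ParallelLoopResult result = Parallel.For(0, data.Length, (i, state) =>
        {
            if (data[i] == 5) state.Break();
        });
        Console.WriteLine($"completed: {result.IsCompleted}, " +
                          $"lowest break iteration: {result.LowestBreakIteration}");

        // AggregateException: exceptions from all iterations are collected.
        try
        {
            Parallel.For(0, 4, i => throw new InvalidOperationException($"iter {i}"));
        }
        catch (AggregateException ae)
        {
            Console.WriteLine($"caught {ae.InnerExceptions.Count} exception(s)");
        }

        // Cancellation: an external token can terminate a parallel loop.
        using var cts = new CancellationTokenSource();
        cts.Cancel();
        try
        {
            Parallel.For(0, 1000,
                new ParallelOptions { CancellationToken = cts.Token },
                i => { });
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("loop cancelled");
        }

        // Task: an asynchronous operation with a result.
        Task<int> sum = Task.Run(() => 1 + 2 + 3);
        Console.WriteLine($"task result: {sum.Result}");
    }
}
```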
Sample Learning Outcomes
· Parallelize a for loop with Parallel.For and achieve speedup (or be able to explain why no speedup occurred).
· Write a test to show that the parallel version of a piece of code computes the same result as the sequential version.
· Analyze an algorithm in terms of Work/Span and correctly predict the potential speedup.
· Give tight upper bounds for potential speedups given parallelization scenarios (e.g. Amdahl's law), and explain why those upper bounds exist.
Assignment Ideas
Take a sequential program which contains some computation-intensive for loops (for example, one of the Olden benchmarks) and attempt to speed up the benchmark by using Parallel.For loops. Some existing for loops cannot be easily parallelized in this manner. For each for loop in the original program, either explain why it was not appropriate to parallelize the loop, or else measure the performance improvement from making the loop parallel. Explain why speedup was or was not achieved for each parallelized loop, and include performance test results in your submission.
Resources
Parallel Extensions Samples & Extras (http://code.msdn.microsoft.com/ParExtSamples):
· GameOfLife
· BlendImages
· ImageColorizer
· Morph
· MandlebrotsFractals
· ComputePi
· Strassens
· EditDistance
· ParallelExtensionsExtras.Partitioners
· ParallelExtensionsExtras.Extensions:
· AggregateExceptionExtensions
· CancellationTokenExtensions
· CompletedTask
· DelegateExtensions
· TaskCompletionSourceExtensions
Parallel Programming with Microsoft .NET book (http://parallelpatterns.codeplex.com):
· Chapter 1 (Introduction)
· Parallel.For/ForEach from Chapter 2 (Parallel Loops)
· Tasks from Chapter 3 (Parallel Tasks)
· Child Tasks from Chapter 6 (Dynamic Task Parallelism)
· Appendix B (Debugging and Profiling Parallel Applications)