Unit 2: Shared Memory
High-level Performance Concepts/Lessons
Cache Coherence, False Sharing: independent variables that happen to share a cache line can cause unnecessary coherence traffic and contention
Memory bandwidth: insufficient bandwidth can have a negative impact on performance
Scalability and Locking: Locking has an overhead
· Coarse-grained locking may destroy the parallelism in your application
· Excessive lock acquisition and release can cause too much overhead
Data Partitioning: If parallel operations execute on partitioned data, you may be able to avoid locking entirely
· Striping refactors a computation to execute a sequence of parallel operations, each on disjoint data
High-level Correctness Concepts/Lessons
Atomicity Violations: Sometimes a sequence of statements is conceptually a single operation, and it breaks if other threads interleave between the statements
· Atomic statement sequences appear to execute without interruption, from the perspective of all other threads
Data races: two unsynchronized concurrent accesses to the same memory location, at least one of which is a write
· Can cause strange errors in code because of relaxed memory models
· Difficult to reason about
Data-Race-Free Discipline: writing data-race-free programs helps to avoid bugs
· Makes code robust against weak memory models
· Makes data race detectors more useful, e.g. to find atomicity problems
· Data Race Prevention techniques: Isolation, Immutability, and Synchronization
Isolation: Variables accessed by only one thread are implicitly protected from data races
Immutability: Read-only variables are easy to reason about, even when multiple threads execute concurrently
Synchronization: Protect access to variables with locks
· Need to consistently protect the same variable with the same lock
· Can cause problems (e.g. deadlock)
Deadlock: cyclic lock acquiring can cause all threads to block
· Lock leveling: avoid deadlock by only acquiring locks in a global order
Code-level Concepts/Lessons
Sample Learning Outcomes
· Identify data races in code examples.
· Be able to implement strategies for fixing data races.
· Recognize typical performance issues with naïve parallelization attempts.
· Write code using each of the design patterns, and explain how the design patterns help improve parallelization and correctness.
· Be able to identify false sharing in code.
Assignment Ideas
Take the program parallelized after Unit 1. Analyze it for data races, and fix any that you find. Then create a version of the program that contains a data race, an atomicity violation, and a deadlock. Finally, take the cleaned version (i.e. the one without the introduced concurrency errors) and see whether you can achieve more parallelism through one of the design patterns. Analyze why speedup does or does not occur.
Resources
Parallel Extensions Samples & Extras (http://code.msdn.microsoft.com/ParExtSamples):
· FastBitmap
· ParallelExtensionsExtras.ParallelAlgorithms
Parallel Programming with Microsoft .NET book (http://parallelpatterns.codeplex.com):
· Patterns from Chapter 5 (Futures)
· Chapter 7 (Pipelines)
· Appendix B (Debugging and Profiling Parallel Applications)