Unit 2: Shared Memory

 

 

High-level Performance Concepts/Lessons

 

Cache Coherence, False Sharing: accesses to distinct locations that share a cache line can cause unnecessary contention
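
A minimal illustrative sketch (not from the course materials): two threads increment adjacent array elements. The elements are distinct, but they almost certainly share a cache line, so the coherence protocol bounces that line between cores; padding the counters apart removes the contention. The 8-longs-per-counter spacing assumes a 64-byte cache line.

    using System.Threading.Tasks;

    class FalseSharingSketch
    {
        const int Iterations = 100000000;

        static void Main()
        {
            // counts[0] and counts[1] are adjacent: almost certainly the same cache line.
            var counts = new long[2];
            Parallel.Invoke(
                () => { for (int i = 0; i < Iterations; i++) counts[0]++; },
                () => { for (int i = 0; i < Iterations; i++) counts[1]++; });

            // Fix: space the counters so each sits on its own cache line
            // (64 bytes = 8 longs on typical hardware).
            var padded = new long[16];
            Parallel.Invoke(
                () => { for (int i = 0; i < Iterations; i++) padded[0]++; },
                () => { for (int i = 0; i < Iterations; i++) padded[8]++; });
        }
    }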

 

Memory bandwidth: insufficient memory bandwidth can limit performance; adding threads to a memory-bound computation yields diminishing returns
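
As a rough illustration (a sketch under assumptions, not a benchmark): a parallel sum does one load and one add per element, so it is limited by memory bandwidth rather than compute, and its speedup typically flattens well below the core count.

    using System;
    using System.Threading.Tasks;

    class BandwidthSketch
    {
        static void Main()
        {
            var data = new double[10000000];   // ~80 MB: far larger than any cache
            for (int i = 0; i < data.Length; i++) data[i] = 1.0;

            double total = 0;
            object gate = new object();

            // One load and one add per element: the cores mostly wait on the
            // memory system, so extra cores add little beyond a point.
            Parallel.For(0, data.Length,
                () => 0.0,                                  // per-worker partial sum
                (i, loop, local) => local + data[i],
                local => { lock (gate) total += local; });  // merge once per worker

            Console.WriteLine(total);
        }
    }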

 

Scalability and Locking: Locking has overhead that can limit scalability

·         Coarse-grained locking may destroy the parallelism in your application

·         Excessive lock acquisition and release can itself add significant overhead (see the sketch after this list)
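
A hypothetical sketch of both pitfalls (Compute is a placeholder for real per-item work): version 1 holds the lock around the work itself and serializes everything; version 2 locks only the shared update but still pays N acquisitions; version 3 accumulates per worker and locks once per worker.

    using System.Threading.Tasks;

    class LockGranularitySketch
    {
        static readonly object Gate = new object();
        static long total;

        static long Compute(int i) { return (long)i * i; }  // placeholder work

        static void Main()
        {
            const int N = 1000000;

            // 1. Coarse-grained: the lock covers the work itself; no parallelism left.
            Parallel.For(0, N, i =>
            {
                lock (Gate) { total += Compute(i); }
            });

            // 2. Finer-grained: the work runs in parallel, but the lock is
            //    still acquired and released N times.
            Parallel.For(0, N, i =>
            {
                long r = Compute(i);
                lock (Gate) { total += r; }
            });

            // 3. Per-worker accumulation: the lock is taken once per worker.
            Parallel.For(0, N,
                () => 0L,
                (i, loop, local) => local + Compute(i),
                local => { lock (Gate) total += local; });
        }
    }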

 

Data Partitioning: If parallel operations execute on partitioned (disjoint) data, you may be able to avoid locking entirely

·         Striping refactors a computation into a sequence of parallel operations, each on disjoint data (sketched below)
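
A minimal sketch of the idea (my own example, not the book's): each worker fills a private histogram over its own slice of the input, so the parallel phase touches only disjoint data and needs no locks; a short second phase merges the partial results.

    using System;
    using System.Threading.Tasks;

    class PartitionSketch
    {
        static int[] Histogram(byte[] data, int workers)
        {
            var partial = new int[workers][];
            int chunk = (data.Length + workers - 1) / workers;

            // Phase 1 (parallel): worker w writes only partial[w] and reads
            // only its own slice of data: disjoint data, no locks.
            Parallel.For(0, workers, w =>
            {
                var h = new int[256];
                int start = w * chunk;
                int end = Math.Min(start + chunk, data.Length);
                for (int i = start; i < end; i++) h[data[i]]++;
                partial[w] = h;
            });

            // Phase 2 (sequential): merge the disjoint partial results.
            var result = new int[256];
            foreach (var h in partial)
                for (int b = 0; b < 256; b++) result[b] += h[b];
            return result;
        }
    }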

 

 

High-level Correctness Concepts/Lessons

 

Atomicity Violations: Sometimes, sections of code are conceptually a single operation; errors arise when other threads interleave between them

·         Atomic statement sequences appear to execute without interruption from the perspective of all other threads; a classic violation and its fix are sketched below
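
The classic check-then-act violation, sketched on a plain Dictionary (illustrative): each access is individually locked, yet another thread can insert the key between the check and the Add. The fix holds one lock across the whole conceptual operation.

    using System.Collections.Generic;

    class AtomicitySketch
    {
        static readonly object Gate = new object();
        static readonly Dictionary<string, int> Table = new Dictionary<string, int>();

        // Broken: the check and the insert are each atomic, but not together.
        // Another thread can insert `key` in between, and Add then throws.
        static void AddIfAbsentBroken(string key, int value)
        {
            bool absent;
            lock (Gate) { absent = !Table.ContainsKey(key); }
            if (absent)
                lock (Gate) { Table.Add(key, value); }   // may race with another adder
        }

        // Fixed: one lock spans the whole check-then-act sequence.
        static void AddIfAbsent(string key, int value)
        {
            lock (Gate)
            {
                if (!Table.ContainsKey(key)) Table.Add(key, value);
            }
        }
    }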

 

Data races: two unsynchronized concurrent accesses to the same memory location, at least one of which is a write

·         Can cause surprising errors because of relaxed memory models

·         Difficult to reason about; a minimal example follows this list
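
A minimal data race (illustrative sketch): counter++ is a read-modify-write, so unsynchronized concurrent increments can be lost, and under a relaxed memory model the outcome is even harder to predict. Interlocked.Increment is one standard fix.

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class DataRaceSketch
    {
        static int racy;
        static int safe;

        static void Main()
        {
            Parallel.Invoke(
                () => { for (int i = 0; i < 1000000; i++) racy++; },   // data race
                () => { for (int i = 0; i < 1000000; i++) racy++; });

            Parallel.Invoke(
                () => { for (int i = 0; i < 1000000; i++) Interlocked.Increment(ref safe); },
                () => { for (int i = 0; i < 1000000; i++) Interlocked.Increment(ref safe); });

            // `racy` usually ends up well below 2000000; `safe` is always exactly 2000000.
            Console.WriteLine("{0} vs {1}", racy, safe);
        }
    }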

 

Data-Race-Free Discipline: writing data-race-free programs helps to avoid bugs

·         Makes code robust against weak memory models

·         Makes data race detectors more useful, e.g. to find atomicity problems

·         Data Race Prevention techniques: Isolation, Immutability, and Synchronization

 

Isolation: Variables accessed by only one thread are implicitly protected from data races
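
One way to get isolation (an illustrative sketch): System.Random is not thread-safe, so sharing one instance across threads is a race, but ThreadLocal<T> gives each thread its own instance and confines the mutable state to a single thread.

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class IsolationSketch
    {
        // Each thread lazily creates its own Random: the mutable generator
        // state is only ever touched by that one thread.
        static readonly ThreadLocal<Random> LocalRandom =
            new ThreadLocal<Random>(() => new Random(Thread.CurrentThread.ManagedThreadId));

        static void Main()
        {
            Parallel.For(0, 1000, i =>
            {
                int sample = LocalRandom.Value.Next(100);  // isolated: no synchronization needed
                Console.WriteLine(sample);
            });
        }
    }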

 

Immutability: Read-only variables are easy to reason about, even when multiple threads are executing concurrently
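
A sketch of an immutable type (my own example): all fields are readonly and assigned once in the constructor, and "modification" returns a new object, so any number of threads can read a published instance without synchronization.

    sealed class Point
    {
        public readonly double X;
        public readonly double Y;

        public Point(double x, double y) { X = x; Y = y; }

        // Instead of mutating, return a fresh instance: existing readers
        // are never affected.
        public Point Translate(double dx, double dy)
        {
            return new Point(X + dx, Y + dy);
        }
    }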

 

Synchronization: Protect access to variables with locks

·         Need to consistently protect the same variable with the same lock (see the sketch after this list)

·         Can cause problems of its own (e.g. deadlock)
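
A sketch of the consistency rule (hypothetical Account type): every member that touches balance takes the same designated lock object. If Deposit locked one object and Withdraw another, the two could still interleave and race.

    class Account
    {
        private readonly object balanceLock = new object();  // the one lock for `balance`
        private decimal balance;

        public void Deposit(decimal amount)
        {
            lock (balanceLock) { balance += amount; }
        }

        public void Withdraw(decimal amount)
        {
            lock (balanceLock) { balance -= amount; }  // same lock as Deposit: consistent
        }

        public decimal Balance
        {
            get { lock (balanceLock) { return balance; } }  // reads take the lock too
        }
    }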

 

Deadlock: cyclic lock acquisition can cause threads to block one another forever

·         Lock leveling: avoid deadlock by acquiring locks only in a fixed global order (sketched below)
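
A lock-leveling sketch (hypothetical BankAccount type): if one thread runs Transfer(a, b, ...) while another runs Transfer(b, a, ...), naively locking "from" before "to" can deadlock. Ordering acquisitions by a fixed unique key (here, Id) makes a cyclic wait impossible.

    class BankAccount
    {
        public readonly int Id;                       // unique: defines the global lock order
        public readonly object Gate = new object();
        public decimal Balance;

        public BankAccount(int id) { Id = id; }
    }

    class TransferSketch
    {
        static void Transfer(BankAccount from, BankAccount to, decimal amount)
        {
            // Always acquire the lower-Id lock first, regardless of transfer
            // direction, so two opposite transfers cannot wait on each other cyclically.
            BankAccount first = from.Id < to.Id ? from : to;
            BankAccount second = first == from ? to : from;

            lock (first.Gate)
            {
                lock (second.Gate)
                {
                    from.Balance -= amount;
                    to.Balance += amount;
                }
            }
        }
    }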

 

 

Code-level Concepts/Lessons

 

 

Sample Learning Outcomes

 

·         Identify data races in code examples.

·         Be able to implement strategies for fixing data races.

·         Recognize typical performance issues with naïve parallelization attempts.

·         Write code using each of the design patterns, and explain how the design patterns help improve parallelization and correctness.

·         Be able to identify false sharing in code.

 

 

Assignment Ideas

 

Take the program you parallelized in Unit 1. Analyze it for data races, and fix any that you find. Create a version of the program that contains a data race, an atomicity violation, and a deadlock. Now take your cleaned version (i.e. the one without the introduced concurrency errors) and see whether you can achieve more parallelism through one of the design patterns. Analyze why speedup does or does not occur.

 

Take code from the Parallel Programming with Microsoft® .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures book. Write Alpaca tests for these code samples. See if you can find any hidden concurrency bugs!

 

 

Resources

               

Parallel Extensions Samples & Extras (http://code.msdn.microsoft.com/ParExtSamples):

·         FastBitmap

·         ParallelExtensionsExtras.ParallelAlgorithms

 

Parallel Programming with Microsoft .NET book (http://parallelpatterns.codeplex.com):

·         Patterns from Chapter 5 (Futures)

·         Chapter 7 (Pipelines)

·         Appendix B (Debugging and Profiling Parallel Applications)