Spring 2015: CS/EE 6810 Computer Architecture
More recent versions of this class have moved here.
General Information:
- Venue: WEB L103
- Time: Monday, Wednesday 11:50am - 1:10pm
- Instructor: Rajeev Balasubramonian, email: rajeev@cs, MEB 3414, office hours: by appointment
- Prerequisite: CS 3810 or equivalent
- TAs (email/office hours): Akhila Gundu (akhila.gundu@utah.edu / Fri 12-1pm), Sahil Koladiya (sahilkoladiya@gmail.com / Tue 1-2pm), Shravanthi Manohar (shravantimanohar@gmail.com / Tue 2-3pm). All TA office hours will be held in the Utah Arch Lab (MEB 2180).
- Textbook: Computer Architecture: A Quantitative Approach, 5th Edition, John Hennessy and David Patterson
- Class mailing list: cs6810@list.eng.utah.edu. Visit the mailman system to sign up or modify.
Prerequisites:
You are expected to know introductory computer architecture concepts,
such as those covered in CS 3810 (textbook for 3810: Computer Organization
and Design, Patterson and Hennessy, 5th edition). You will be well-served
to re-visit some of the basic concepts in the 3810 textbook before the
first day of class.
College of Engineering Add/Drop Policy:
Guidelines from the college.
Special Needs:
The University of Utah seeks to provide equal access to its programs,
services and activities for people with disabilities. If you will
need accommodations in the class, reasonable prior notice needs to be
given to the Center for Disability Services, 162 Olpin Union Building,
581-5020 (V/TDD). CDS will work with you and the instructor to make
arrangements for accommodations.
All written information in this course can be made available in
alternative format with prior notification to the Center for
Disability Services.
Grading:
The following is a tentative guideline and may undergo changes.
Two exams will count for 50% of the final grade; one will be in March
and the other during Finals week.
The remaining 50% will be based on homework assignments.
We have zero tolerance for cheating -- if your class rank in the
assignments is significantly different from your class rank in
the exams, only your rank in the exams will count towards your
grade. We know you're juggling multiple activities and the
assignment deadline may not always be favorable. You are
therefore allowed to skip one of the assignments -- use
this freebie prudently. Late submissions will not be graded.
Modified Flipped Classroom Model:
Videos based on every lecture are already posted on YouTube. Students
will learn best if they view the videos before the topic is discussed in
class. The class discussion will not go into every detail and will assume
some familiarity with the topic. This frees up time in class to work out
a few example problems. Assuming students watch videos beforehand, this
should lead to more efficient and effective learning. You'll spend time
outside class watching videos, but you'll spend a lot less time working
on assignments.
Class Schedule:
YouTube Video Playlist
Homework Solutions
- Mo 12th Jan:
Logistics and introduction. Evaluating computer systems.
Reading: review the pre-req textbook for CS 3810: Computer Organization and Design, Patterson and Hennessy.
Reading: 6810 textbook: Sections 1.1-1.5, 1.8-1.10.
No YouTube Video for this class.
Slides:
(powerpoint)
(pdf)
- We 14th Jan:
Performance Metrics.
Reading: Sections 1.5-1.10.
Remember to sign up for the class mailing list!!
YouTube Video 1 (Benchmarks, latency, throughput, sum of exec times, sum of weighted exec times)
YouTube Video 2 (Geometric mean, CPU performance equation)
YouTube Video 3 (AM/HM/GM of IPCs, speedup/improvement)
YouTube Video 21 (Power and energy basics)
YouTube Video 22 (DFS and DVFS examples)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #1 handed out.
- Mo 19th Jan: holiday
- We 21st Jan:
Pipelining basics.
Reading: Sections C.1-C.4.
YouTube Video 4 (Pipelining analogy and basic instruction pipeline)
YouTube Video 5 (The role of clocks and latches, pipelining equations)
YouTube Video 6 (Description of the basic 5-stage pipeline)
YouTube Video 7 (Explains RISC/CISC architectures and Load/Store instructions)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #1 due.
- Mo 26th Jan:
Pipelining Hazards.
Reading: Sections C.4-C.8.
Remember to sign up for the class mailing list!!
YouTube Video 8 (Structural hazards)
YouTube Video 9 (Data dependences, data hazards, pipeline stalls)
YouTube Video 10 (Data forwarding/bypassing, examples)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #2 handed out.
- We 28th Jan:
Pipelining extensions.
Reading: Sections C.4-C.8.
YouTube Video 11 (Control hazards, branch delay slots)
YouTube Video 12 (Multi-cycle instructions, handling out-of-order instruction completions)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- Mo 2nd Feb:
Compiler-based (Static) ILP.
Reading: Sections C.5, 3.2.
YouTube Video 13 (Precise exceptions and the reorder buffer)
YouTube Video 14 (The performance vs. pipeline depth curve)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #2 due.
Homework: #3 handed out.
- We 4th Feb:
Loop optimizations.
Reading: Sections C.5, 3.2.
YouTube Video 15 (Introduction to the compiler-based approach for high ILP)
YouTube Video 16 (Smart compiler scheduling and loop unrolling)
YouTube Video 17 (VLIW and software pipelining)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- Mo 9th Feb:
Branch Prediction.
Reading: Section 3.3.
YouTube Video 23 (Branch predictor introduction and 1-bit bimodal predictor)
YouTube Video 24 (2-bit predictor, indexing into a branch predictor table)
YouTube Video 25 (Global predictor)
YouTube Video 26 (Local predictor, tournament predictor, branch target buffer)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #3 due.
- We 11th Feb:
Compiler-based ILP.
Reading: Sections C.5, 3.2.
YouTube Video 18 (Handling control hazards with predication)
YouTube Video 19 (Hoisting above a branch, handling exceptions)
YouTube Video 20 (Memory dependences and hoisting a load before a store)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #4 handed out.
- Mo 16th Feb: holiday
- We 18th Feb:
Out-of-order processor basics.
Reading: Detailed Notes on Out-of-order execution.
YouTube Video 27 (Out-of-order design 1, with a rename register file, part 1)
YouTube Video 28 (Out-of-order design 1, with a rename register file, part 2)
YouTube Video 29 (Out-of-order design 2, with a physical register file, part 1)
YouTube Video 30 (Out-of-order design 2, with a physical register file, part 2)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #4 due.
- Mo 23rd Feb:
Out-of-order processors.
Reading: Detailed Notes on Out-of-order execution.
YouTube Video 31 (Stalls, issue width, window size, WAW and WAR hazards)
YouTube Video 32 (Handling branch mispredicts, waking up dependents in an issue queue)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #5 handed out.
- We 25th Feb:
LSQ, Out-of-order wrap-up.
Reading: Detailed Notes on Out-of-order execution.
YouTube Video 33 (Handling memory dependences with a load-store queue)
YouTube Video 34 (What is simultaneous multithreading (SMT), what resources are shared/private in SMT)
YouTube Video 35 (Performance impact of SMT)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- Mo 2nd Mar:
SMT, Cache basics.
Reading: Sections B.1-B.3, 2.1.
YouTube Video 36 (Organization of a multi-core cache hierarchy)
YouTube Video 37 (Organization of a single cache -- sets, ways, tags, associativity, direct-mapped, index, offset)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #5 due.
- We 4th Mar:
Cache innovations.
Reading: Sections 2.1-2.2.
YouTube Video 38 (Compulsory, capacity, and conflict misses)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- Mo 9th Mar:
Pre-exam review, large caches.
Reading: Sections 2.4, B.4, B.5.
YouTube Video 39 (Inclusion, handling writes, serial-tag-data access)
YouTube Video 40 (Victim caches, replacement policies, stream prefetchers)
YouTube Video 41 (Large caches, last level cache (LLC), shared and private caches)
YouTube Video 42 (UCA/NUCA caches, a tiled shared LLC)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- We 11th Mar:
Mid-Term Exam (open book, open notes), based on material covered until Spring Break (Chapters 1 and 3, Appendices B and C, Sections 2.1-2.2).
Homework: #6 handed out.
- Mo 16th Mar: Spring Break
- We 18th Mar: Spring Break
- Mo 23rd Mar:
Virtual memory, TLB/Cache.
Reading: Section 2.3.
YouTube Video 43 (Virtual memory, page tables)
YouTube Video 44 (Page tables, TLBs)
YouTube Video 45 (Accessing the TLB and cache, aliasing)
YouTube Video 46 (Virtually Indexed Physically Tagged Cache)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- We 25th Mar:
Memory Systems.
Reading: Section 2.3. Detailed notes on memory systems.
YouTube Video 47 (DRAM cell intro, DRAM chips, DIMMs, memory channel/bus, memory rank)
YouTube Video 48 (DRAM banks, arrays, RAS/CAS, row buffers, DDR)
YouTube Video 49 (More details on rank, bank, array, row buffer)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #6 due.
- Mo 30th Mar:
Main Memory Innovations.
YouTube Video 50 (Row buffer management, row buffer hit/miss/conflict, open/close page policies)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #7 handed out.
- We 1st Apr:
Memory systems wrap-up.
YouTube Video 51 (Handling reads and writes, write buffer, address mapping policies)
YouTube Video 52 (Scheduling policies - FCFS, FR-FCFS, Stall Time Fair, refresh, error correction)
YouTube Video 53 (State-of-the-art memory systems, buffer chips, increasing bandwidth and capacity, 3D stacking)
YouTube Video 54 (Emerging non-volatile cells (phase change memory - PCM), silicon photonics)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- Mo 6th Apr:
Multi-threading, coherence.
Reading: Sections 5.1-5.5.
YouTube Video 55 (Symmetric shared-memory multiprocessors, distributed shared-memory multiprocessors)
YouTube Video 56 (Overview of shared-memory and message-passing programming models)
YouTube Video 57 (Example of the Ocean kernel and its parallelization with shared-memory)
YouTube Video 58 (Example of the Ocean kernel and its parallelization with message-passing)
YouTube Video 59 (Introduction to cache coherence protocols, write propagation, write serialization, snooping-directory, write update-invalidate)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #7 due.
Homework: #8 handed out.
- We 8th Apr:
Coherence protocols, intro to synchronization.
Reading: Sections 5.1-5.5.
YouTube Video 60 (Detailed example of a snooping-based protocol -- part 1)
YouTube Video 61 (Detailed example of a snooping-based protocol -- part 2)
YouTube Video 62 (Detailed example of a directory-based protocol -- part 1)
YouTube Video 63 (Detailed example of a directory-based protocol -- part 2)
YouTube Video 64 (Synchronization primitives, atomic exchange, test and set)
YouTube Video 65 (Effect of caching locks, test and test and set)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- Mo 13th Apr:
Synchronization, Consistency models.
Reading: Sections 5.5-5.7.
YouTube Video 66 (Load-linked and store-conditional for constructing locks)
YouTube Video 67 (Further reducing coherence traffic with ticket and array-based locks)
YouTube Video 68 (Example multi-threaded programs and sequentially consistent results)
YouTube Video 69 (Hardware support for sequential consistency, example of how SC is violated if program order is violated)
YouTube Video 70 (Example on how a coherence protocol may violate write atomicity and sequential consistency, hardware support for sequential consistency, safe optimizations to speed up the hardware)
YouTube Video 71 (A hardware-software approach to improving performance with relaxed consistency models and fences)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #9 handed out.
- We 15th Apr: No class.
Homework: #8 due (email the TAs a pdf).
- Mo 20th Apr:
Consistency models, Transactional Memory.
YouTube Video 72 (Motivation for transactional memory, problems when using locks)
YouTube Video 73 (Overview of hardware transactional memory semantics and implementation)
YouTube Video 74 (Detailed implementation of hardware transactional memory (lazy-lazy technique))
YouTube Video 75 (Handling problems in transactional memory (cache evictions, starvation), using a commit token to handle these problems)
YouTube Video 76 (The transactional memory design space, versioning, conflict detection, lazy-lazy, and eager-eager)
YouTube Video 77 (Detailed description of the eager-eager implementation, handling deadlocks, livelocks, starvation)
Slides for the videos:
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
Homework: #9 due.
Homework: #10 handed out.
- We 22nd Apr:
Interconnection Networks.
Reading: Appendix F.
YouTube Video 78 (Introduction to networks-on-chip, deterministic and adaptive routing, deadlock example in networks and turn model)
YouTube Video 79 (Turn model and deadlock avoidance with adaptive routing, numbering links to prove deadlock freedom)
YouTube Video 80 (Defining messages, packets, flits)
YouTube Video 81 (Flow control, bufferless, circuit switching, store-and-forward, cut-through, wormhole routing)
YouTube Video 82 (Virtual channels)
YouTube Video 83 (Allocating resources (virtual channel, buffers, physical channel) before a hop, buffer management, deadlock avoidance with VCs)
YouTube Video 84 (Router power breakdown, router pipeline stages, RC-VA-SA-ST)
YouTube Video 85 (Speculative router pipelines with 1, 2, and 3 stages)
YouTube Video 86 (Crossbars, multi-stage crossbars (Omega network), bisection bandwidth)
YouTube Video 87 (Performance and cost for different topologies, k-ary d-cubes)
Slides for the videos:
(powerpoint)
(pdf)
(powerpoint)
(pdf)
Slides used in class:
(powerpoint)
(pdf)
- Mo 27th Apr:
Datacenters, GPUs, disks.
Reading: Sections 6.1-6.7.
Slides used in class:
(powerpoint)
(pdf)
Homework: #10 due.
- Mo 4th May: 10:30am-12:30pm
Final Exam (open book, open notes), based primarily on Caches, Memory systems, Multiprocessors, Transactional Memory, Datacenters, Interconnection Networks, Storage.