Most of the reading links are based on DOI. You can find the papers in the ACM DL, IEEE Xplore, or on the Morgan & Claypool website. You will have to log into the Campus VPN to access the papers.
Let us know on Teams if you can’t find the paper or can’t log into the VPN and we can upload a version of it on Teams for you.
NOTE: All chapter/section numbers are inclusive. For example, if the reading is Sections 4-4.2, you should read Sections 4, 4.1, 4.1.1, 4.1.2, and 4.2.
Hint
You’re encouraged to discuss the reading outside of class with your classmates. You are welcome to use Teams to discuss the paper and ask questions. You may also find it useful to form “reading groups” to discuss the paper together.
Reading list
Week 1
Intro to High-performance Computer Architecture

Tuesday, April 4
Intro and technology
Required reading: Watch the 2019 Turing Lecture by Hennessy and Patterson. https://youtu.be/3LVeEjsn8Ts
Thursday, April 6
Required reading: IEEE MICRO papers on AMD’s Zen2 and Intel’s Skylake.
Optional/Reference: WikiChip’s coverage of Zen2 and Skylake.
Week 2
Tuesday, April 11

Cache coherence (intro) and memory consistency
Required reading: Synthesis Lecture: A Primer on Memory Consistency and Cache Coherence, Second Edition
- Chapter 1
- Chapter 2
- Chapter 3 (Skip 3.8-3.11)
- Sections 4.1, 4.2
- Sections 5.1, 5.2-5.2.2
- Optional: Sections 5.4 and 5.9
Thursday, April 13
Choice of papers for presentation on current trends in computer architecture.
- Dark Silicon and the End of Multicore Scaling
- Gables: A Roofline Model for Mobile SoCs
- ACT: designing sustainable computer systems with an architectural carbon modeling tool
- A Systematic Evaluation of Transient Execution Attacks and Defenses
- Ten Lessons From Three Generations Shaped Google’s TPUv4i : Industrial Product
- Attack of the Killer Microseconds
- There’s plenty of room at the Top: What will drive computer performance after Moore’s law?
- Amdahl’s Law in the Multicore Era
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Week 3
Tuesday, April 18
Required reading: Synthesis Lecture: A Primer on Memory Consistency and Cache Coherence, Second Edition
- Chapter 6
- Sections 7-7.2.5
- Sections 8-8.2.6
- Optional: Chapter 11
Thursday, April 20
Paper presentations on memory consistency models. See paper list below.
- PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models
- TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
- Heterogeneous-race-free memory models
- Frightening Small Children and Disconcerting Grown-ups: Concurrency in the Linux Kernel
- Non-Speculative Load-Load Reordering in TSO
- Fast RMWs for TSO: semantics and implementation
- Atomic SC for simple in-order processors
- Efficient sequential consistency via conflict ordering
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Week 4
Tuesday, April 25
NO CLASS!
Thursday, April 27
Paper presentations on cache coherence protocols. See paper list below.
- Token coherence: decoupling performance and correctness
- Heterogeneous system coherence for integrated CPU-GPU systems
- In-Network Snoop Ordering (INSO): Snoopy coherence on unordered interconnects
- Cache coherence for GPU architectures
- HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols
- Spandex: A Flexible Interface for Efficient Heterogeneous Coherence
- DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
- Crossing Guard: Mediating Host-Accelerator Coherence Interactions
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Week 5
Tuesday, May 2

Required reading: Computer Architecture - A Quantitative Approach, 6th Edition, Appendix F (Interconnection Networks)
- Section F.2 and Section F.3 (skim through these sections quickly)
- Section F.4
- Section F.5
- Section F.6
- Section F.8
Thursday, May 4
Project presentations.
This week, you will present a 5-minute “lightning” talk on the problem you are going to work on. See the project page for details.
Week 6
Tuesday, May 9
Paper presentations on on-chip networks (OCNs). See paper list below.
- Flattened Butterfly Topology for On-Chip Networks
- Design tradeoffs for tiled CMP on-chip networks
- An In-Depth Analysis of the Slingshot Interconnect
- Building Many-Core Processor-to-DRAM Networks with Monolithic CMOS Silicon Photonics
- Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees
- Technology-Driven, Highly-Scalable Dragonfly Topology
- Network-on-Chip Microarchitecture-based Covert Channel in GPUs
- Experiences with ML-Driven Design: A NoC Case Study
- CryoWire: wire-driven microarchitecture designs for cryogenic computing
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Thursday, May 11

Hardware support for virtualization
Required reading: Synthesis Lecture: Hardware and Software Support for Virtualization
- Chapter 1
- Sections 2-2.2
- Sections 3.2, 3.3
- Chapter 4
- Chapter 5
Week 7
Tuesday, May 16
Warehouse scale computing
Required reading: The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition
- Chapter 1
- Sections 2-2.3, 2.6.1
- Sections 3-3.2
- Sections 5-5.3.1
Thursday, May 18
Paper presentations on hardware support for virtual machines. See paper list below.
- The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup
- Efficient virtual memory for big memory servers
- Translation caching: skip, don’t walk (the page table)
- CHERI: A Hybrid Capability-System Architecture for Scalable Software Compartmentalization
- Reconfigurable Virtual Memory for FPGA-Driven I/O
- Every walk’s a hit: making page walks single-access cache hits
- Parallel virtualized memory translation with nested elastic cuckoo page tables
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Week 8
Tuesday, May 23
Paper presentations on warehouse-scale computers. See paper list below.
- Software-Defined Far Memory in Warehouse-Scale Computers
- Attack of the killer microseconds
- Cores that don’t count
- Architectural Implications of Function-as-a-Service Computing
- SoftSKU: optimizing server architectures for microservice diversity @scale
- Clearing the clouds: a study of emerging scale-out workloads on modern hardware
- Profiling a warehouse-scale computer
- AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers
Thursday, May 25
Required reading: General-Purpose Graphics Processor Architectures
- Chapter 1
- Sections 2-2.1
- Sections 3-3.1.1
- Sections 4-4.3.3
Week 9
Tuesday, May 30
Required reading: Data Orchestration in Deep Learning Accelerators
- Chapter 1
- Sections 2-2.3
- Sections 3-3.2.1
- Sections 6-6.2.2
Thursday, June 1
Paper presentations on GPUs. See paper list below.
- Thread block compaction for efficient SIMT control flow
- Energy-efficient mechanisms for managing thread context in throughput processors
- Cache conscious wavefront scheduling
- MCM-GPU: Multi-chip-module GPUs for continued performance scalability
- Chimera: Collaborative preemption for multitasking on a shared GPU
- Mitigating GPU Core Partitioning Performance Effects
- Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Week 10
Tuesday, June 6
Paper presentations on DNN accelerators. See paper list below.
- Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration
- From high-level deep neural models to FPGAs
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
- Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture
- Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
- TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
- SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
- Timeloop: A Systematic Approach to DNN Accelerator Evaluation
- User-selected paper. Ask Prof. Jason Lowe-Power at least 1 week in advance for approval.
Thursday, June 8
Project presentations.
On this day, you will give a 10-minute presentation on your project proposal. This is a pitch to see if you can get the class to buy your solution. See the project page for details.