Papers for Supplemental Reading
The papers below serve as supplemental reading for this course. Students are required to submit a paper summary to Gradescope for each of the assigned papers.
Technology/History
Paper #1
Robert P. Colwell, Charles Y. Hitchcock III, E. Douglas Jensen, H. M. Brinkley Sprunt, and Charles P. Kollar, “Instruction Sets and Beyond: Computers, Complexity, and Controversy,” in IEEE Computer, vol. 18, no. 9, pp. 8–19, September 1985.
Paper Optional
Gene M. Amdahl, “Validity of the Single Processor Approach to Achieving Large Computing Capabilities.”
Paper Optional
Mark Bohr, “A 30 Year Retrospective on Dennard’s MOSFET Scaling Paper.”
Branch Prediction
Paper #2
Tse-Yu Yeh and Yale N. Patt, “Two-Level Adaptive Training Branch Prediction,” in International Symposium on Microarchitecture (MICRO), 1991.
Paper Optional
James E. Smith, “A Study of Branch Prediction Strategies,” in International Symposium on Computer Architecture (ISCA), 1981.
Paper Optional
Scott McFarling, “Combining Branch Predictors” (GShare and Tournament Predictors), in Western Research Lab Technical Note, TN-36, 1993.
Instruction Level Parallelism
Paper #3
Eric Rotenberg, Steve Bennett, and James E. Smith, “Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching,” in International Symposium on Microarchitecture (MICRO), 1996.
Paper #4
Michael S. Schlansker and B. Ramakrishna Rau, “EPIC: Explicitly Parallel Instruction Computing,” in IEEE Computer, vol. 33, no. 2, pp. 37–45, February 2000.
Superscalar/Out-of-Order Execution
Paper #5
Subbarao Palacharla, Norman P. Jouppi, and James E. Smith, “Complexity-Effective Superscalar Processors,” in Proceedings of the International Symposium on Computer Architecture (ISCA), June 1997.
Paper #6
Glenn Hinton, Dave Sager, Mike Upton, Darrell Boggs, Doug Carmean, Alan Kyker, and Patrice Roussel, “The Microarchitecture of the Pentium 4 Processor,” in Intel Technology Journal Q1, 2001.
Paper #7
Haitham Akkary, Ravi Rajwar, and Srikanth T. Srinivasan, “Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors,” in Proceedings of the International Symposium on Microarchitecture (MICRO), December 2003.
Caches/Memory Systems
Paper #8
Norman P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” in Proceedings of the International Symposium on Computer Architecture (ISCA), May 1990.
Paper #9
Changkyu Kim, Doug Burger, and Stephen W. Keckler, “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches,” in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2002.
Tutorial Slides Optional
Bruce Jacob and David Wang, “DRAM: Architectures, Interfaces, and Systems (A Tutorial),” held in conjunction with the International Symposium on Computer Architecture (ISCA), June 2002.
Paper Optional
James E. Smith and Ravi Nair, “The Architecture of Virtual Machines,” in IEEE Computer, vol. 38, no. 5, pp. 32–38, May 2005.
Multithreading
Paper #10
Dean M. Tullsen, Susan J. Eggers, and Hank M. Levy, “Simultaneous Multithreading: Maximizing On-Chip Parallelism,” in Proceedings of the International Symposium on Computer Architecture (ISCA), June 1995.
Paper #11
Gurindar S. Sohi, Scott E. Breach, and T.N. Vijaykumar, “Multiscalar Processors,” in Proceedings of the International Symposium on Computer Architecture (ISCA), June 1995.
Multicore/Multiprocessor
Paper #12
Trevor Mudge, “Power: A First-Class Architectural Design Constraint,” in IEEE Computer, vol. 34, no. 4, pp. 52–58, April 2001.
Paper #13
Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas, “Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance,” in Proceedings of the International Symposium on Computer Architecture (ISCA), June 2004.
Memory Consistency/Cache Coherence
Synthesis Lecture Optional
Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David A. Wood, “A Primer on Memory Consistency and Cache Coherence (Second Edition).” Synthesis Lectures on Computer Architecture, 2020.
Warehouse Scale Computers
Paper #14
Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz, Jonathan Bachrach, and Krste Asanovic, “FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud,” in Proceedings of the International Symposium on Computer Architecture (ISCA), June 2018.
Paper Optional
Luiz André Barroso and Urs Hölzle, “The Case for Energy-Proportional Computing,” in IEEE Computer, vol. 40, no. 12, pp. 33–37, December 2007.
GPU
Paper #15
Erik Lindholm, John Nickolls, Stuart Oberman, and John Montrym, “NVIDIA Tesla: A Unified Graphics and Computing Architecture,” in IEEE Micro, vol. 28, no. 2, pp. 39–55, April 2008.
Paper #16
Gabin Schieffer, Jacob Wahlgren, Jie Ren, Jennifer Faj, and Ivy Peng, “Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper,” in Proceedings of the International Conference on Parallel Processing (ICPP), August 2024.
Sustainability
Paper #17
Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Maximilian Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, and Kim Hazelwood, “Sustainable AI: Environmental Implications, Challenges and Opportunities,” in Proceedings of the Conference on Machine Learning and Systems (MLSys), August 2022.