Computer Architecture. A Quantitative Approach. Fifth Edition. John L. Hennessy.
Stanford University. David A. Patterson. University of California, Berkeley.
With contributions by
Krste Asanović, University of California, Berkeley
Jason D. Bakos, University of South Carolina
Robert P. Colwell, R&E Colwell & Assoc. Inc.
Thomas M. Conte, North Carolina State University
José Duato, Universitat Politècnica de València and Simula
Diana Franklin, University of California, Santa Barbara
David Goldberg, The Scripps Research Institute
Norman P. Jouppi, HP Labs
Sheng Li, HP Labs
Naveen Muralimanohar, HP Labs
Gregory D. Peterson, University of Tennessee
Timothy M. Pinkston, University of Southern California
Parthasarathy Ranganathan, HP Labs
David A. Wood, University of Wisconsin-Madison
Amr Zaky, University of Santa Clara
ELSEVIER
Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Contents

Foreword
Preface
Acknowledgments

Chapter 1
Fundamentals of Quantitative Design and Analysis
1.1 Introduction
1.2 Classes of Computers
1.3 Defining Computer Architecture
1.4 Trends in Technology
1.5 Trends in Power and Energy in Integrated Circuits
1.6 Trends in Cost
1.7 Dependability
1.8 Measuring, Reporting, and Summarizing Performance
1.9 Quantitative Principles of Computer Design
1.10 Putting It All Together: Performance, Price, and Power
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspectives and References
Case Studies and Exercises by Diana Franklin

Chapter 2
Memory Hierarchy Design
2.1 Introduction
2.2 Ten Advanced Optimizations of Cache Performance
2.3 Memory Technology and Optimizations
2.4 Protection: Virtual Memory and Virtual Machines
2.5 Crosscutting Issues: The Design of Memory Hierarchies
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A8 and Intel Core i7
2.7 Fallacies and Pitfalls
2.8 Concluding Remarks: Looking Ahead
2.9 Historical Perspective and References
Case Studies and Exercises by Norman P. Jouppi, Naveen Muralimanohar, and Sheng Li

Chapter 3
Instruction-Level Parallelism and Its Exploitation
3.1 Instruction-Level Parallelism: Concepts and Challenges
3.2 Basic Compiler Techniques for Exposing ILP
3.3 Reducing Branch Costs with Advanced Branch Prediction
3.4 Overcoming Data Hazards with Dynamic Scheduling
3.5 Dynamic Scheduling: Examples and the Algorithm
3.6 Hardware-Based Speculation
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
3.9 Advanced Techniques for Instruction Delivery and Speculation
3.10 Studies of the Limitations of ILP
3.11 Cross-Cutting Issues: ILP Approaches and the Memory System
3.12 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
3.13 Putting It All Together: The Intel Core i7 and ARM Cortex-A8
3.14 Fallacies and Pitfalls
3.15 Concluding Remarks: What's Ahead?
3.16 Historical Perspective and References
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell

Chapter 4
Data-Level Parallelism in Vector, SIMD, and GPU Architectures
4.1 Introduction
4.2 Vector Architecture
4.3 SIMD Instruction Set Extensions for Multimedia
4.4 Graphics Processing Units
4.5 Detecting and Enhancing Loop-Level Parallelism
4.6 Crosscutting Issues
4.7 Putting It All Together: Mobile versus Server GPUs and Tesla versus Core i7
4.8 Fallacies and Pitfalls
4.9 Concluding Remarks
4.10 Historical Perspective and References
Case Study and Exercises by Jason D. Bakos

Chapter 5
Thread-Level Parallelism
5.1 Introduction
5.2 Centralized Shared-Memory Architectures
5.3 Performance of Symmetric Shared-Memory Multiprocessors
5.4 Distributed Shared-Memory and Directory-Based Coherence
5.5 Synchronization: The Basics
5.6 Models of Memory Consistency: An Introduction
5.7 Crosscutting Issues
5.8 Putting It All Together: Multicore Processors and Their Performance
5.9 Fallacies and Pitfalls
5.10 Concluding Remarks
5.11 Historical Perspectives and References
Case Studies and Exercises by Amr Zaky and David A. Wood

Chapter 6
Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
6.1 Introduction
6.2 Programming Models and Workloads for Warehouse-Scale Computers
6.3 Computer Architecture of Warehouse-Scale Computers
6.4 Physical Infrastructure and Costs of Warehouse-Scale Computers
6.5 Cloud Computing: The Return of Utility Computing
6.6 Crosscutting Issues
6.7 Putting It All Together: A Google Warehouse-Scale Computer
6.8 Fallacies and Pitfalls
6.9 Concluding Remarks
6.10 Historical Perspectives and References
Case Studies and Exercises by Parthasarathy Ranganathan

Appendix A
Instruction Set Principles
A.1 Introduction
A.2 Classifying Instruction Set Architectures
A.3 Memory Addressing
A.4 Type and Size of Operands
A.5 Operations in the Instruction Set
A.6 Instructions for Control Flow
A.7 Encoding an Instruction Set
A.8 Crosscutting Issues: The Role of Compilers
A.9 Putting It All Together: The MIPS Architecture
A.10 Fallacies and Pitfalls
A.11 Concluding Remarks
A.12 Historical Perspective and References
Exercises by Gregory D. Peterson

Appendix B
Review of Memory Hierarchy
B.1 Introduction
B.2 Cache Performance
B.3 Six Basic Cache Optimizations
B.4 Virtual Memory
B.5 Protection and Examples of Virtual Memory
B.6 Fallacies and Pitfalls
B.7 Concluding Remarks
B.8 Historical Perspective and References
Exercises by Amr Zaky

Appendix C
Pipelining: Basic and Intermediate Concepts
C.1 Introduction
C.2 The Major Hurdle of Pipelining—Pipeline Hazards
C.3 How Is Pipelining Implemented?
C.4 What Makes Pipelining Hard to Implement?
C.5 Extending the MIPS Pipeline to Handle Multicycle Operations
C.6 Putting It All Together: The MIPS R4000 Pipeline
C.7 Crosscutting Issues
C.8 Fallacies and Pitfalls
C.9 Concluding Remarks
C.10 Historical Perspective and References
Updated Exercises by Diana Franklin

Online Appendices

Appendix D
Storage Systems

Appendix E
Embedded Systems
By Thomas M. Conte

Appendix F
Interconnection Networks
Revised by Timothy M. Pinkston and José Duato

Appendix G
Vector Processors in More Depth
Revised by Krste Asanović

Appendix H
Hardware and Software for VLIW and EPIC

Appendix I
Large-Scale Multiprocessors and Scientific Applications

Appendix J
Computer Arithmetic
By David Goldberg