Computer architecture : a quantitative approach

17 downloads 98 Views 212KB Size Report
Computer Architecture. A Quantitative Approach. Fifth Edition. John L. Hennessy. Stanford University. David A. Patterson. University of California, Berkeley.
Computer Architecture A Quantitative

Approach

Fifth Edition

John L. Stanford

Hennessy

University

David A. Patterson University of California, Berkeley With Contributions

by

Krste Asanovid

Norman P. Jouppi HP Lobs

University of California Berkeley

Sheng

Jason D. Bakos

HP Labs

University of South Carolina

Naveen Muralimanohar

Robert P. Colwell

HP Labs

R&E Colwell& Assoc Inc.

Gregory

Thomas M. Conte

Universitat Politecnica de Valencia and Simula

University of Tennessee Timothy M. Pinkston University of Southern California Parthasarathy Ranganathan

Diana Franklin

HP Labs

University of California, Santa Barbara

David A. Wood

David

University of Wisconsin-Madison

The

Amr

North Carolina State University Jose Duato

Goldberg Scripps Research Institute

Li

D. Peterson

Zaky

IMventy of Santa Clam TECHNISCHE INFORMATIONS 81BLIOTHEK

UNIVERSITATSGIBLIOTHEK HANNOVER Amsterdam

ELSEVIER

Boston



New York



San Francisco



Oxford •

Heidelberg •

Paris





London

San Diego

Singapore Sydney Tokyo •



Contents

Foreword

ix

Preface

xv

xxiii

Acknowledgments Chapter

1

Fundamentals of Quantitative Design and 1.1

Introduction

1.2

Classes of Computers

5

1.3

Defining Computer Technology

11

2

Architecture

1.4

Trends in

1.5

Trends in Power and

1.6

Trends in Cost

1.7

Dependability

1.8

17

Energy

Circuits

33 and

Performance

1.11

Fallacies and Pitfalls

1.13

21 27

1.10

1.12

Concluding Historical Perspectives

36 44

Power

52 55

Remarks

59 and References

Case Studies and Exercises

2

Integrated

in

Measuring, Reporting, Summarizing Quantitative Principles of Computer Design Putting It All Together: Performance, Price, and

1.9

Chapter

Analysis

Diana Franklin

by

61 61

Memory Hierarchy Design 72

2.1

Introduction

2.2

Ten Advanced

2.3

Memory Technology and Optimizations

2.4

Protection: Virtual

2.5 2.6

2.7

Optimizations

of Cache Performance

78

96

Memory and Virtual Machines Crosscutting Design of Memory Hierarchies All It Together: Memory Hierachies in the Putting

105

ARM Cortex-A8 and Intel Core i7

113

Fallacies and Pitfalls

125

Issues: The

112

xi

xii

«

Contents

2.8 2.9

Chapter

3

Concluding Remarks: Looking Ahead Historical Perspective and References Case Studies and Exercises by Norman IMaveen Muralimanohar, and Sheng Li

129 131 P.

Jouppi, 131

Instruction-Level Parallelism and Its Exploitation 3.1

Instruction-Level Parallelism: Concepts and

3.2

Basic

Challenges

156

167

3.9

Overcoming Dynamic Scheduling Dynamic Scheduling: Examples and the Algorithm Hardware-Based Speculation Exploiting ILP Using Multiple Issue and Static Scheduling Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation Advanced Techniques for Instruction Delivery and Speculation

202

3.10

Studies of the Limitations of ILP

213

3.3 3.4 3.5 3.6 3.7

3.8

3.11 3.12

Data Hazards with

Issues: ILP

and the

Approaches Memory System Cross-Cutting Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput

3.14

Putting Together: The Intel Core Fallacies and Pitfalls

3.15

Concluding

3.16

Historical

3.13

It All

i7 and ARM Cortex-A8

183 192

197

221

223 233

Remarks: What's Ahead?

245

Perspective and References

247 247

4.1

Introduction

262

4.2

Vector Architecture

264

4.3

SIMD Instruction Set Extensions for Multimedia

282

4.4

Units

288

Graphics Processing Detecting and Enhancing Loop-Level

4.6

Crosscutting

4.7

Putting It All and Tesla

4.8

5

176

Data-Level Parallelism in Vector, SIMD, and GPU Architectures

4.5

Chapter

162

241

Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell

Chapter4

148

Compiler Techniques for Exposing ILP Reducing Branch Costs with Advanced Branch Prediction

Parallelism

Issues

Together:

315 322

Mobile

versus

Server GPUs

versus Core i7

Fallacies and Pitfalls Remarks

4.9

Concluding

4.10

Historical Perspective and References Case Study and Exercises by Jason D. Bakos

323

330 332 334 334

Thread-Level Parallelism 5.1

Introduction

344

5.2

Centralized Shared-Memory Architectures Performance of Symmetric Shared-Memory Multiprocessors

366

5.3

351

Contents

378

5.8

5.9

Fallacies and Pitfalls

405

5.10

Concluding Remarks

409

5.11

Historical Perspectives and References Case Studies and Exercises by Amr Zaky and David A. Wood

412

5.5 5.6 5.7

Distributed

Warehouse-Scale Computers Data-Level Parallelism

to

392 395 400

412

Exploit Request-Level and

Introduction

432

6.2

Programming Models and Workloads for Warehouse-Scale Computers Computer Architecture of Warehouse-Scale Computers Physical Infrastructure and Costs of Warehouse-Scale Computers Cloud Computing: The Return of Utility Computing Crosscutting Issues Putting It All Together: A Google Warehouse-Scale Computer

436

6.8

Fallacies and Pitfalls

471

6.9

Concluding Remarks

6.10

Historical

6.4 6.5

6.6 6.7

Perspectives

Instruction Set

441 446 455 461 464

475 and References

Case Studies and Exercises by Parthasarathy

476

Ranganathan

476

Principles

A.I

Introduction

A-2

A.2

A-3

A.5

Classifying Instruction Set Architectures Memory Addressing Type and Size of Operands Operations in the Instruction Set

A.6

Instructions for Control Flow

A-16

A.7

A-21

A.9

Encoding an Instruction Set Crosscutting Issues: The Role of Compilers Putting It All Together:The MIPS Architecture

A.10

Fallacies and Pitfalls

A-39

A.11

Concluding Remarks Historical Perspective and References Exercises by Gregory D. Peterson

A-45

A.3 A.4

A.8

A. 12

Appendix B

386

6.1

6.3

Appendix A

xiii

Shared-Memory and Directory-Based Coherence Synchronization: The Basics Models of Memory Consistency: An Introduction Crosscutting Issues Putting It All Together: Multicore Processors and Their Performance

5.4

Chapter6

«

Review of Memory

A-7 A-13 A-14

A-24 A-32

A-47 A-47

Hierarchy B-2

B. 1

Introduction

B.2

Cache Performance

B-16

B.3

Six Basic Cache

B-22

Optimizations

xiv



Contents

B.4

Virtual Memory

B-40

B.5

Protection and Examples of Virtual Memory

B-49

B.6

Fallacies and Pitfalls

B-57

B.7

Concluding Remarks Historical Perspective

B-59

B.8

Exercises

B-59

and References

B-60

by AmrZaky

Pipelining: Basic and Intermediate Concepts

Appendix C

C1

C-2

Introduction Hazards

C-11

C.6

Major Hurdle of Pipelining—Pipeline Pipelining Implemented? What Makes Pipelining Hard to Implement? Extending the MIPS Pipeline to Handle Multicycle Operations Putting It All Together: The MIPS R4000 Pipeline

C.7

Crosscutting

C.8

Fallacies and Pitfalls

C-80

C.9

Concluding Remarks Historical Perspective Updated Exercises by

C-81

C.2

The

C3

How Is

C.4 C.5

CIO

C-30 C-43 C-51 C-61 C-70

Issues

and References

C-81

Diana Franklin

C-82

Online Appendices

Appendix

D

Appendix

E

Storage Systems Embedded Systems

By Thomas M.Conte F

Appendix

Interconnection Networks Revised by Timothy M. Pinkston andJose' Duato

Appendix

G

Vector Processors in More

Depth

Revised by Krste Asanovic

Appendix

H

Appendix

I

Appendix

J

Hardware and Software for VLIW and EPIC

Large-Scale Multiprocessors and Scientific Applications Computer Arithmetic by David Goldberg

Appendix

K

Survey of Instruction

Appendix

L

Historical

Perspectives and

References Index

Set Architectures

References

R-1 1-1