UILU-ENG-93-2220
CRHC-93-11

May 1993

Center for Reliable and High-Performance Computing

COMPILER-ASSISTED MULTIPLE INSTRUCTION ROLLBACK RECOVERY USING A READ BUFFER

N. J. Alewine, S.-K. Chen, W. K. Fuchs, and W.-M. Hwu

Coordinated Science Laboratory
College of Engineering
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Approved for Public Release. Distribution Unlimited.

REPORT DOCUMENTATION PAGE

1a. Report security classification: Unclassified
1b. Restrictive markings: None
3. Distribution/availability of report: Approved for public release; distribution unlimited
4. Performing organization report number(s): UILU-ENG-93-2220 (CRHC-93-11)
6a. Name of performing organization: Coordinated Science Laboratory, University of Illinois
6b. Office symbol: N/A
6c. Address: 1308 W. Main St., Urbana, IL 61801
7a. Name of monitoring organization: NASA; International Business Machines; Office of Naval Research
7b. Address: Moffett Field, CA; Boca Raton, FL; Arlington, VA
8a. Name of funding/sponsoring organization: 7a
8c. Address: 7b
11. Title: Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer
12. Personal author(s): Alewine, N. J., Chen, S.-K., Fuchs, W. K., and Hwu, W.-M.
13a. Type of report: Technical
14. Date of report: May 1993
15. Page count: 31
20. Distribution/availability of abstract: Unclassified/unlimited
21. Abstract security classification: Unclassified
DD Form 1473, 84 MAR
Security classification of this page: Unclassified

N. J. Alewine^a, S.-K. Chen, W. K. Fuchs, and W.-M. Hwu

Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
University of Illinois
1308 West Main Street
Urbana, IL 61801

Primary contact: W. Kent Fuchs
Phone: (217) 333-8294  FAX: (217) 244-5686
e-mail: [email protected]

ABSTRACT

Multiple instruction rollback (MIR) is a technique that has been implemented in mainframe computers to provide rapid recovery from transient processor failures. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs have also been developed which remove rollback data hazards directly with data-flow transformations. This paper focuses on compiler-assisted techniques to achieve multiple instruction rollback recovery. We observe that some data hazards resulting from instruction rollback can be resolved efficiently by providing an operand read buffer, while others are resolved more efficiently with compiler transformations. A compiler-assisted multiple instruction rollback scheme is developed which combines hardware-implemented data redundancy with compiler-driven hazard removal transformations. Experimental performance evaluations indicate improved efficiency over previous hardware-based and compiler-based schemes.

Index terms: fault-tolerance, error recovery, instruction retry, compilers, hardware assisted retry.

^a International Business Machines Corporation, Boca Raton, FL. This research was supported in part by the National Aeronautics and Space Administration (NASA) under grant NASA NAG 1-613, in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS), and in part by the Department of the Navy and managed by the Office of the Chief of Naval Research under Contract N00014-91-J-1283.

1 Introduction

Instruction retry is a technique for recovery from transient processor faults [1-6]. Multiple instruction rollback (also referred to as multiple instruction retry, or simply instruction retry) is particularly appropriate when error detection latencies are greater than a single instruction cycle. When transient errors occur in a processing system with error reporting latencies of a few cycles, multiple instruction rollback [2-5], or re-execution of the instructions within a sliding window, can be an effective alternative to system-level checkpointing. Multiple instruction rollback can also be implemented in parallel with algorithm-based or control-flow error detection methods [7] for recovery from transient errors.

1.1 Hardware-Based Instruction Rollback

Hardware-implemented instruction retry schemes belong to one of two groups: 1) full checkpointing and 2) incremental checkpointing. Full checkpointing maintains "snapshots" of the required system state at regular, predetermined intervals; upon error detection, the system state is restored to the appropriate checkpointed state. Incremental checkpointing maintains the system state changes of the last few instructions in a "sliding window", concurrently with processing, for rapid recovery; upon error detection, the system can be rolled back by undoing, or "backing-out", the system state changes up to the instruction in which the error occurred.

The issues associated with instruction retry are similar to the issues encountered with exception handling in an out-of-order instruction execution architecture. If an instruction is to write to a register and N is the maximum error detection latency (or exception latency), two copies of the data must be maintained for N cycles. Hardware schemes such as reorder buffers, history buffers, future files [8], and micro-rollback [2] differ in where the updated and old values reside, circuit complexity, CPU cycle times, and rollback efficiency.
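To make the "two copies for N cycles" requirement concrete, the sketch below models a register file protected by a history buffer in the spirit of [8]. It is a simplified software model under our own assumptions, not any cited hardware design: each retired instruction pushes one entry (the destination register and its old value, or a placeholder if it wrote no register) onto a push-down history of depth N, and rolling back k instructions pops and restores entries newest first. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Entry = Optional[Tuple[int, int]]  # (destination register, old value) or None


class HistoryBufferRegisterFile:
    """Hypothetical model: register file plus an N-deep history buffer."""

    def __init__(self, num_regs: int, n: int) -> None:
        self.regs = [0] * num_regs
        self.n = n  # maximum rollback distance (error detection latency)
        # One entry per retired instruction; entries older than n
        # instructions fall off automatically because of maxlen.
        self.history: Deque[Entry] = deque(maxlen=n)

    def retire(self, dest: Optional[int] = None, value: int = 0) -> None:
        """Retire one instruction, saving the old copy of its destination."""
        if dest is None:
            self.history.append(None)  # instruction wrote no register
        else:
            self.history.append((dest, self.regs[dest]))  # keep old value
            self.regs[dest] = value

    def rollback(self, k: int) -> None:
        """Undo the register effects of the last k retired instructions."""
        if not 0 <= k <= len(self.history):
            raise ValueError("rollback distance outside the history window")
        for _ in range(k):
            entry = self.history.pop()  # newest first
            if entry is not None:
                dest, old = entry
                self.regs[dest] = old
```

For example, with `n=3`, retiring `r1 = 10`, `r2 = 20`, a non-writing instruction, and `r1 = 11`, then calling `rollback(3)` leaves `r1 = 10` and `r2 = 0`: the first write has already left the window, so only the last three instructions can be undone. The old values live in a separate array and are read only during recovery, which mirrors the trade-off described for history buffers: no bypass of new values is needed, but the old value must be read out on every write.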

Table 1 gives a description of various hardware-based methods to restore the general purpose register file contents during single or multiple instruction rollback. In the VAX 8600 and VAX 9000, errors are detected prior to the completion of a faulty instruction. For most VAX instructions, updates to the system state occur at the end of the instruction. If the error is detected prior to the updating of the system state, the instruction can be rolled back and re-executed. If the system state has already changed, rollback cannot be accomplished.

Table 1: Hardware-based rollback schemes.
[Columns: Scheme; Checkpoint Type (full or incremental); Rollback Distance (single instruction, 10-20 instructions, or variable); Location of Data (primary and redundant). Schemes compared: VAX 8600 [10], VAX 9000 [12], IBM 4341 [9], IBM 3081, IBM patent 4,912,707 [6], IBM patent 4,044,337 [11], IBM E/S 9000 [5], micro-rollback [2], history buffer [8], and history file [8].]

The IBM 4341, IBM 3081, and IBM patent 4,044,337 require shadow file structures to maintain redundant data; this data is used to restore the system state during rollback recovery. The IBM 3081 restores this data using its level sensitive scan design [13] circuitry.² The additional scan circuitry can add significant cost and complicates physical design and testability. The VAX 8600 and VAX 9000 avoid shadow files; however, they require an error detection latency of only one instruction. IBM patent 4,912,707 also avoids shadow file structures, using a virtual file to keep old data from being overwritten until the data is known to be fault-free. The micro-rollback scheme uses a delayed write buffer to prevent old data from being overwritten until the error detection latency has expired; the most recent values are contained in the delayed write buffer, and bypass circuitry is required to forward this data on subsequent register file reads. The size of the delayed write buffer is a function of the maximum rollback distance, and the bypass circuitry can add significant overhead [2]. The history buffer scheme maintains redundant data in a separate push-down array and therefore does not require bypass circuitry [8]. The history buffer does, however, require an extra register file port to obtain the old data, and can impact performance by increasing register access times.

²The 126 scan rings of the IBM 3081 contain 35,000 bits of data.

In an effort to increase the physical register file size while maintaining compatibility relative to down-level systems, the IBM E/S 9000 has introduced a virtual register management (VRM) system, which dynamically maps the 16 architectural registers into 32 physical registers [14]. Although the VRM was primarily intended to reduce register pressure, it has recently been extended to provide data redundancy for multiple instruction rollback recovery [5]. In the VRM, when an architectural register is remapped to a new physical register, the data in the previously assigned physical register becomes obsolete; with the rollback extension, release of that physical register for reassignment is postponed until the error detection latency N has been exceeded.

1.2 Compiler-Based Instruction Rollback

Compiler-based multiple instruction rollback reduces the requirement for the data redundancy logic present in hardware-based approaches. Compiler-based MIR uses data-flow manipulations to remove rollback data hazards (or just hazards) [3,4].³ Hazards are identified as antidependencies of length ≤ N, where N represents the maximum rollback distance. Compiler-based techniques to resolve hazards have been investigated at three levels: 1) the pseudo-code level, the code level prior to variables being assigned to physical registers; 2) the machine-code level, the code level in which variables are assigned to physical registers; and 3) the post-pass level, which represents assembler-level code emitted by the compiler.

³For a complete presentation of data-flow properties and manipulation methods, see [15].

1.3 Compiler-Assisted Instruction Rollback

This paper introduces a compiler-assisted multiple instruction rollback scheme which uses dedicated hardware data redundancy to resolve one type of rollback data hazard, while relying on compiler transformations to resolve the remaining hazard types. Experimental results indicate that, by exploiting the unique characteristics of differing hazard types, the new compiler-assisted MIR design can achieve superior performance to either a hardware-only or a compiler-based scheme.
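As a rough illustration of how dedicated hardware can cover one hazard type, the following sketch models the operand read buffer named in the abstract. The mechanism here is our assumption for illustration, not the paper's hardware design: the buffer records the register values read by each of the last N instructions, and on rollback each register read within the rolled-back region is reset to the value of its oldest read there. All names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Read = Tuple[int, int]  # (register index, value read)


class ReadBuffer:
    """Hypothetical read buffer covering the last n instructions."""

    def __init__(self, n: int) -> None:
        # One entry per instruction: the operand reads it performed.
        # Entries older than n instructions are dropped automatically.
        self.window: Deque[List[Read]] = deque(maxlen=n)

    def record(self, reads: List[Read]) -> None:
        self.window.append(list(reads))

    def rollback(self, regs: List[int], k: int) -> None:
        """Restore registers read by the last k instructions.

        Each register read in the region gets the value of its oldest read
        there. A use followed by a redefinition inside the window (an
        on-path hazard) is therefore repaired; a register never read in the
        region (e.g. a branch hazard on the untaken path) is not restored.
        """
        if not 0 <= k <= len(self.window):
            raise ValueError("rollback distance outside the buffered window")
        region = list(self.window)[len(self.window) - k:]
        oldest: Dict[int, int] = {}
        for reads in region:  # oldest instruction first
            for reg, value in reads:
                oldest.setdefault(reg, value)
        for reg, value in oldest.items():
            regs[reg] = value
        for _ in range(k):
            self.window.pop()
```

Usage: with `regs = [0, 5, 7, 0]`, executing `I1: r3 = r1 + r2` (reads r1 = 5, r2 = 7) and then `I2: r1 = r3 + 1` (reads r3 = 12) creates an on-path hazard on r1. Calling `rollback(regs, 2)` restores r1 to 5; r3 is left at 12, which is harmless because re-executing I1 redefines r3 before it is used. A register that is only used on a path not taken is never read, which is why branch hazards need a separate remedy.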

2 Error Model and Rollback Data Hazard Classification

2.1 Error Model

The following four assumptions are used in the general error model: 1) the maximum error detection latency is N instructions, 2) memory and I/O have delayed write buffers and can roll back N cycles, 3) the states of the program counter and program status word (PSW) are preserved by an external recording device or by shadow registers [2], and 4) the CPU state can be restored by loading the correct contents of the register file, program counter, and PSW. Given the above assumptions, any error which does not manifest itself as an illegal path in the control-flow graph (CFG) of the program is allowed, provided that the following two conditions are satisfied: 1) register file contents do not spontaneously change, and 2) data cannot be written to an incorrect register location. There are four targeted error types: 1) CPU errors such as those caused by an ALU failure, 2) incorrect values being read from I/O, memory, the register file, or external functional units such as the floating point unit, 3) correct or incorrect values being read from incorrect locations within the I/O, memory, or register file, and 4) incorrect branch decisions resulting from error types 1, 2, or 3.

2.2 Hazard Classification

The code can be represented as a CFG $G(V, E)$, where V is the set of nodes denoting instructions and E is the set of edges denoting control-flow. If there is a direct control-flow from instruction i, denoted $I_i$, to $I_j$, where $I_i \in V$ and $I_j \in V$, then there is an edge $(I_i, I_j) \in E$. Let $d_{min}(I_i, I_j)$ denote the smallest number of instructions along any path from $I_i$ to $I_j$. The set of hazard registers $H_{regs}$ of the error model is defined as the set of pseudo registers (or machine registers) whose values are inconsistent due to retry. A formal classification of $H_{regs}$ follows.

Property 1: $x \in H_{regs}$ iff there exists a legal walk⁴ of length N in G such that x is live at $I_1$ and x is defined during the walk.

Proof: For the if case, since x is defined during the walk, an error occurring in $I_1$ will be detected by $I_N$. During the retry of $I_1$, x will be in an inconsistent state. Since x is live at $I_1$, there is some path along which x is used prior to its redefinition, and since x is in an inconsistent state, $x \in H_{regs}$. For the only if case, we suppose the contrary. Assume that among all legal walks of length N in G, either x is not live at the beginning, or x is not defined during the walk. It then follows that x either has no use, or x is not changed. (The error model does not allow a write to a wrong location, and the contents of register x cannot spontaneously change.) Therefore there is no inconsistency problem for x, which implies $x \notin H_{regs}$.

⁴A walk is a sequence of edge traversals in a graph where the edges visited can be repeated [16]. A legal walk of length N is a sequence of instructions $I_1, I_2, \ldots, I_N$ which forms a walk in G.

Property 2: Hazards can be classified as one of two types: 1) those that appear as antidependencies of length ≤ N [3], referred to as on-path hazards, and 2) those that appear at branch boundaries, referred to as branch hazards. These two hazard types may overlap.

Proof: Since $x \in H_{regs}$, Property 1 implies that there exists a legal walk $W_1 = I_1, I_2, \ldots, I_N$ in G such that x is live at $I_1$ and $I_i$ defines x, where $i \in \{1, 2, \ldots, N\}$. Let i be the largest index such that $I_i$ defines x. Since x is live at $I_1$, there exists a legal walk $W_2$ in G, beginning at $I_1$, along which x is used prior to its being redefined; let $I_j$ be the first instruction along $W_2$ at which x is a use. Once $I_i$ defines x, x has a different value than it had at $I_1$ (the error model does not allow a write to a wrong location, and x cannot spontaneously change), so after rollback $I_j$ may use the corrupted value. Case 1: if $W_2 \subseteq W_1$, the use at $I_j$ occurs no later than the definition at $I_i$ along $W_1$, so $I_j$ and $I_i$ constitute an antidependency on x; since $d_{min}(I_j, I_i) \le N$, there is an on-path hazard on x. Case 2: if $W_2 \not\subseteq W_1$, $W_2$ leaves $W_1$ at some branch instruction before reaching $I_j$, and the definition at $I_i$ and the use at $I_j$ lie on different paths out of that branch; there is then a branch hazard on x at a branch boundary.

Since nop insertion can be costly to performance, previous compiler transformations removed all hazards possible, leaving only unresolvable hazards to be removed by the post-pass transformation. In Section 3.1.2, a new post-pass transformation was introduced in which nop insertion was replaced by read insertion as the primary hazard removal technique. As illustrated in Figure 6, up to two branch hazards can be removed by a single read instruction. The new post-pass transformation is very efficient and in some cases can resolve branch hazards with less performance impact than pseudo-level transformations. Figures 11 and 13 of Section 4.2 show performance overhead comparisons between compiler-driven data-flow manipulations and the post-pass transformation for the PUZZLE and TBL applications described in Table 3 of Section 4.1. Comp/PP indicates that hazards are resolved by the compiler where possible, with the remaining hazards being resolved at the post-pass level. PP (post-pass) indicates that all hazards are removed at the post-pass level.
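Property 1 of Section 2.2 can be checked mechanically on small examples. The sketch below is a brute-force reading of it, under our own simplifying assumptions: the CFG, def sets, and use sets are plain dictionaries; liveness is computed with the standard iterative backward data-flow equations; and walks are enumerated up to n instructions (a walk that reaches a node with no successors simply stops). Function names are hypothetical, and the exhaustive enumeration is exponential in n, so this is for illustration only.

```python
from typing import Dict, List, Set

Cfg = Dict[str, List[str]]     # node -> successor nodes
RegSets = Dict[str, Set[str]]  # node -> registers defined (or used)


def live_in(succ: Cfg, defs: RegSets, uses: RegSets) -> RegSets:
    """Iterative backward liveness: in(n) = use(n) | (out(n) - def(n))."""
    live: RegSets = {node: set() for node in succ}
    changed = True
    while changed:
        changed = False
        for node in succ:
            out = set().union(*(live[s] for s in succ[node]))
            new_in = uses[node] | (out - defs[node])
            if new_in != live[node]:
                live[node] = new_in
                changed = True
    return live


def hazard_registers(succ: Cfg, defs: RegSets, uses: RegSets, n: int) -> Set[str]:
    """Property 1: x is a hazard register iff some walk of at most n
    instructions starts at a node where x is live and defines x."""
    live = live_in(succ, defs, uses)
    hazards: Set[str] = set()

    def explore(start: str, node: str, length: int, defined: Set[str]) -> None:
        defined = defined | defs[node]
        hazards.update(live[start] & defined)
        if length < n:
            for nxt in succ[node]:
                explore(start, nxt, length + 1, defined)

    for start in succ:
        explore(start, start, 1, set())
    return hazards
```

Usage, on a toy program where r3 holds an input value: `I1: r1 = r2 + 1`, `I2: r2 = r1 * 2`, `I3: branch on r2 to I4 or I5`, `I4: r3 = 0`, `I5: return r3`. With `n = 2` the result is `{"r2", "r3"}`: r2 is an on-path hazard (used at I1, redefined at I2), while r3 is a branch hazard (defined on the I4 path but live because of the use on the I5 path). With `n = 1` no register is a hazard.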

For the PUZZLE application, resolving hazards with compiler transformations where possible produces better performance than removing all hazards at the post-pass level. For the TBL application, the post-pass transformation alone produces slightly better performance, due to the longer instruction path lengths that result from pseudo-level loop protection. Hazard elimination via read insertion introduces a guaranteed, but small, performance impact. Hazard elimination using register renaming can eliminate hazards without impacting performance, but the save/restore operations of loop protection increase the path length; when loop execution is frequent and hazard execution is infrequent, loop protection can result in more performance impact than read insertion, as demonstrated for the TBL application.

3.3.2 Profiling

Figure 7 illustrates the potential effect on performance of two types of hazard removal: 1) hazard removal using loop protection and 2) hazard removal using read insertion. If the protected loop of Figure 7 is executed 20 times and the hazard instruction is executed two times, loop protection would require 40 additional instructions, while read insertion would require only two additional instructions. If the execution frequencies of the loop and the hazard instruction were reversed, then read insertion would produce more performance impact than loop protection. As shown in Figure 7, profile data can be used to aid in loop protection decisions.

Profile data was included in the loop protection decisions of the pseudo-level transformations of Section 3.2. The profile data is used as a supplement to static prediction, combining the effectiveness of both dynamic profile sampling and static prediction. For areas of code that are unexecuted during profile sampling, static prediction assumes that each loop iterates ten times; inner loops, therefore, iterate multiples of 10 times, depending on the depth of loop nesting. All loop header nodes and hazard nodes are assigned weights based on the profile data. For loop l, loop protection due to hazard node nh is required based on the following condition:

if nh_weight > 3 · (hdr_node(l)_weight), then protect loop l.

The constant 3 adjusts the weights to account for both direct and indirect loop protection costs. Direct loop protection costs result from the save/restore instruction pair shown in Figure 7. Indirect loop protection costs result from: 1) an increased number of hazards, which in turn requires more node splitting and more loop protection, and 2) increased register usage due to the save/restore instructions, which can result in additional register spills.

Figure 7: Loop protection versus read insertion.

Figure 8 shows the run-time overhead for the TBL application with rollback distances from 1 to 10. Prof/PP indicates that profile data was used in loop protection decisions. The results show that the use of profile data can improve application performance by postponing some hazard resolutions until the post-pass phase. For the TBL application, however, using profile data to aid in loop protection decisions did not produce performance equal to that of the post-pass transformation. As an extension to this work, profile data can be used to aid in register allocation. As discussed in Section 3.2, hazards that are present after pseudo register renaming are resolved by adding hazard constraints to live range constraints prior to register allocation. These additional constraints can cause increased register spillage and impact performance. Techniques similar to those developed for loop protection can be used to enhance register allocation decisions.
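The profile-driven loop protection rule of Section 3.3.2 (protect loop l if nh_weight > 3 · hdr_node(l)_weight) and the Figure 7 arithmetic can be sketched as follows. The cost model is our assumption, chosen to match the Figure 7 numbers: loop protection adds one save and one restore per loop iteration, read insertion adds one read per execution of the hazard instruction, and the loop header weight equals the number of iterations. The function names are hypothetical.

```python
from typing import Dict


def protect_loop(hazard_weight: float, header_weight: float, k: float = 3.0) -> bool:
    """Protect the loop iff the hazard node is executed more than k times as
    often as the loop header (k = 3 in the text, covering direct and
    indirect loop protection costs)."""
    return hazard_weight > k * header_weight


def added_instructions(loop_iterations: int, hazard_executions: int) -> Dict[str, int]:
    """Direct dynamic cost of each option under the assumed cost model."""
    return {
        "loop_protection": 2 * loop_iterations,  # save + restore per iteration
        "read_insertion": hazard_executions,     # one read per hazard execution
    }
```

For the Figure 7 case, `added_instructions(20, 2)` gives 40 extra instructions for loop protection against 2 for read insertion, and `protect_loop(2, 20)` is `False`, so read insertion is chosen. With the frequencies reversed, `protect_loop(20, 2)` is `True`. The constant 3 rather than 2 biases the decision toward read insertion, reflecting the indirect costs (extra node splitting and register spills) that the direct count ignores.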

Figure 8: TBL: profile data used for loop protection. (Run-time overhead, Time OH (%), versus rollback distance 1 to 10 for PP, Comp/PP, and Prof/PP.)

4 Performance Evaluation

4.1 Implementation

The hazard resolution and removal algorithms have been implemented in the MIPS code generator of the IMPACT C compiler [18]. Transformations resolving pseudo register hazards (loop protection, node splitting, and loop expansion) are called before register allocation. Routines resolving the remaining register hazards are called just before register allocation, after the live range constraints have been generated. The nop insertion algorithm, or the post-pass transformation, is called just before the assembly code output routine.

Table 3 lists the eleven application programs used in the evaluations. Static Size is the number of assembly instructions emitted by the code generator, not including the library routines and other fixed overhead. The applications were cross-compiled on a SPARCserver 490, and the compiled programs were then run on a DECstation 3100.

Table 3: Application programs.

Program     Static Size   Description
QUEEN           148       eight-queen program
WC              181       UNIX utility
QSORT           252       quick sort algorithm
CMP             262       UNIX utility
GREP            907       UNIX utility
PUZZLE          932       simple game
COMPRESS       1826       UNIX utility
LEX            6856       lexical analyzer
YACC           8099       parser-generator
TBL            8197       table formatting preprocessor
CCCP           8775       preprocessor for gnu C compiler

The results are summarized in Figures 9 through 13. Each figure contains two plots: the first plot shows the percent of run-time overhead (Time OH) of the referenced hazard resolution scheme, and the second plot shows the percent of code growth overhead (Size OH), relative to the base values in Table 3. Four hazard resolution schemes were evaluated. Compiler 1 resolves on-path hazards only, using the compiler-driven data-flow manipulations. Compiler 2 extends the compiler transformations to resolve both on-path and branch hazards. PP (post-pass) assumes a read buffer to resolve on-path hazards and relies solely on the post-pass transformation presented in Section 3.1.2 to resolve branch hazards. Comp/PP also uses the read buffer to resolve on-path hazards and disables the compiler on-path hazard transformations; it resolves branch hazards with the compiler techniques described in Section 3.2 and uses the post-pass transformation to remove the remaining branch hazards. Comp/PP represents the compiler-assisted multiple instruction rollback scheme. Due to the excessive compile times of the previous compiler schemes for large applications, the evaluations of the Compiler 1 and Compiler 2 algorithms were restricted to the applications QUEEN, WC, COMPRESS, CMP, PUZZLE, and QSORT. Both Comp/PP and PP were evaluated for all eleven applications.
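For reference, the overhead percentages plotted in Figures 9 through 13 can be read as relative increases over the untransformed program. The notation below is ours, and we assume the usual relative-increase definition, with $T$ denoting run time and $S$ static size in instructions:

```latex
\mathrm{Time\ OH} = \frac{T_{\mathrm{scheme}} - T_{\mathrm{base}}}{T_{\mathrm{base}}} \times 100\%,
\qquad
\mathrm{Size\ OH} = \frac{S_{\mathrm{scheme}} - S_{\mathrm{base}}}{S_{\mathrm{base}}} \times 100\%
```

As a hypothetical example under this definition, if QUEEN (static size 148) grew to 296 instructions after transformation, its Size OH would be $(296 - 148)/148 \times 100\% = 100\%$.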

4.2 Performance Analysis

Compiler transformations used for the removal of data hazards can impact performance in several ways. Loop protection inserts save/restore operations at the head and tail of the loop. This increases the path length and, therefore, the run time. Additional arcs in the dependency graph can cause more spill code to be generated, and the resulting additional memory references can impact performance by increasing both path length and cache misses. Nop insertion in the post-pass transformation can be costly, since up to N nops could be inserted for each unresolved hazard. The insertion of MOV rk, rk instructions also increases path lengths, although typically less than nop insertion does. Finally, the increase in code size, mainly due to loop expansion, may cause more run-time cache misses. The performance results shown in Figures 9 through 13 are for execution of the eleven application programs on a DECstation 3100 after they have been compiled with the transformations described.

4.3 Results: Compiler

As can be seen in Figures 9 through 11, extending the compiler hazard resolution scheme to include branch hazards introduces little incremental performance impact or code growth overhead. Given a rollback distance of 10, resolving both on-path and branch hazards using compiler transformations resulted in a maximum performance impact of 32.6% and an average performance impact of 12.6%. This compares with maximum and average impacts of 35.4% and 15.4%, respectively, for compiler-driven on-path hazard resolution only. The maximum code size overhead measured for the extended compiler-based technique was 328%, with an average overhead of 207%, for a rollback distance of 10. This compares with a maximum and average overhead of 372% and 225%, respectively, for the unextended compiler-based scheme. These results indicate a small incremental run-time performance overhead and a small code size overhead for compiler-based branch hazard removal compared to compiler-based on-path hazard removal alone. Three factors account for these small incremental impacts. First, on-path hazards dominate in frequency of occurrence. Second, resolving an on-path hazard at instruction $I_i$ through renaming can sometimes resolve a branch hazard at instruction $I_i$. Third, resolving on-path hazards with nop insertion may resolve a corresponding branch hazard by increasing the distance between the hazard node and its nearest predecessor branch node.
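The third factor above, and the "up to N nops per unresolved hazard" cost noted in Section 4.2, follow from simple distance arithmetic. A minimal sketch, assuming distances are counted as differences in instruction positions, that a dependence of length d is a hazard when 1 ≤ d ≤ N, and that the nops are placed after the branch and before the hazard (defining) node; the function names are hypothetical:

```python
def nops_to_resolve(distance: int, n: int) -> int:
    """Nops needed so that a use->def distance of `distance` exceeds n.

    Each inserted nop lengthens the distance by one instruction.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(0, n - distance + 1)


def resolves_branch_hazard(branch_to_def: int, nops: int, n: int) -> bool:
    """Nops inserted between the nearest predecessor branch and the hazard
    node also push the definition beyond the rollback window."""
    return branch_to_def + nops > n
```

For N = 10, an antidependency of length 1 needs `nops_to_resolve(1, 10) == 10` nops (the worst case of N nops), one of length 10 needs a single nop, and one of length 11 needs none. If a branch sits two instructions before the definition and N = 3, the three nops inserted for an adjacent on-path hazard also give `resolves_branch_hazard(2, 3, 3) == True`.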

4.4 Results: PP

Figures 9 through 13 show the run-time and code size overheads for each application studied, using the read buffer to resolve on-path hazards and the post-pass transformation described in Section 3 to cover all branch hazards. The results are worst case in that many of the branch hazards could have been resolved with no performance impact using the compiler techniques; instead, they are resolved by the insertion of MOV instructions, which cause a guaranteed, although small, performance impact. Given a rollback distance of 10, the post-pass transformation produced a maximum performance impact of 7.69% with an average performance impact of 2.43%, significantly below the levels produced by the compiler-based techniques. Code growth overhead was correspondingly lower, with a maximum overhead of 13.0% and an average overhead of 8.59%.

4.5 Results: Comp/PP

The run-time results of the compiler-assisted scheme indicate slightly better performance than the post-pass scheme. Given a rollback distance of 10, the compiler-assisted scheme produced a maximum performance impact of 6.57% with an average performance impact of 2.03%. Code growth overheads were higher, with a maximum overhead of 51.2% and an average overhead of 15.5%. The primary advantage of the read buffer is the consistently low performance impact achieved across all applications. Compiler techniques are still useful in further reducing run-time overhead, but they have the disadvantage of requiring recompilation and additional code growth.

Figure 9: Run-time overhead and code size overhead: QUEEN. (Time OH (%) and Size OH (%) versus rollback distance 1 to 10 for Compiler 1, Compiler 2, PP, and Comp/PP.)

Figure 10: Run-time overhead and code size overhead: WC, COMPRESS, and CMP. (Time OH (%) and Size OH (%) versus rollback distance 1 to 10 for Compiler 1, Compiler 2, PP, and Comp/PP.)

Figure 11: Run-time overhead and code size overhead: PUZZLE, QSORT, and GREP. (Panels: Time OH and Size OH (%) versus rollback distance, 1 to 10, for each program; curves for Compiler 1, Compiler 2, PP, and Comp/PP, with only PP and Comp/PP shown for GREP.)

Figure 12: Run-time overhead and code size overhead: LEX, YACC, and CCCP. (Panels: Time OH and Size OH (%) versus rollback distance, 1 to 10, for each program; curves for PP and Comp/PP.)

Figure 13: Run-time overhead and code size overhead: TBL. (Panels: Time OH and Size OH (%) versus rollback distance, 1 to 10; curves for PP and Comp/PP.)

5 Read Buffer Size Requirement

This section establishes a lower bound on the read buffer size requirement. It does so by modifying the read buffer design and measuring how performance changes as the read buffer size varies.

Consider a read buffer made up of dual first-in-first-out (FIFO) buffers, one for each source operand bus. Register values are copied into the read buffer as they are read from the general purpose register file (GPRF). If the depth of each FIFO is N, the read buffer holds copies of all register values read during the last N cycles. Redundant copies of the appropriate register values are therefore available for a rollback of ≤ N. Rollback is accomplished by flushing the read buffer values back to the GPRF in the reverse order in which they were saved. Because up to two source registers are read per cycle, the worst-case read buffer size requirement is 2N.
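A minimal Python sketch of the depth-N read buffer may help make the reverse-order restore concrete. It assumes a hypothetical three-operand add instruction and models the dual FIFOs as one history entry per instruction. `ReadBuffer`, `record`, `rollback`, and `execute` are illustrative names, not taken from the report. The sketch only restores registers that were read inside the rollback window, so the example program reads every register before overwriting it. Hazards on registers that are written without a prior read are outside the scope of this sketch.

```python
from collections import deque


class ReadBuffer:
    """Sketch of a read buffer holding the source-operand values read by the
    last `depth` instructions (dual FIFOs of depth N: 2N entries worst case)."""

    def __init__(self, depth):
        self.depth = depth
        # One entry per instruction: the (register, value) pairs it read.
        # maxlen makes the oldest instruction's entry fall out automatically.
        self.history = deque(maxlen=depth)

    def record(self, reads):
        """Save the (register, value) pairs read from the register file."""
        self.history.append(list(reads))

    def rollback(self, regfile, distance):
        """Undo the last `distance` instructions by writing saved values back
        to the register file in the reverse order in which they were saved."""
        if distance > len(self.history):
            raise ValueError("rollback distance exceeds the buffered history")
        for _ in range(distance):
            for reg, value in reversed(self.history.pop()):
                regfile[reg] = value


def execute(regfile, rb, instr):
    """Hypothetical 3-operand add, dst = src1 + src2, saving both reads."""
    dst, src1, src2 = instr
    reads = [(src1, regfile[src1]), (src2, regfile[src2])]
    rb.record(reads)
    regfile[dst] = reads[0][1] + reads[1][1]


regs = {"r1": 1, "r2": 2, "r3": 3}
checkpoint = dict(regs)
rb = ReadBuffer(depth=10)           # N = 10, the maximum rollback distance
program = [("r1", "r1", "r2"),      # each overwritten register is read first
           ("r2", "r2", "r3"),
           ("r1", "r1", "r1")]
for instr in program:
    execute(regs, rb, instr)
rb.rollback(regs, 3)                # undoing all three restores the checkpoint
print(regs == checkpoint)           # True
```

Restoring the newest entry first means that when a register was read more than once in the window, its earliest saved value is written last. That earliest value is the one the register held at the rollback point.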

A read buffer that saves every source operand may also save data that is not required for rollback. The register reads that require saving may be known at compile time. In that case, this information can be passed to the hardware through an extra bit field for each source operand in the instruction encoding, and the read buffer can be designed to save only the required values. Figure 14 illustrates a case in which only the required register values (denoted value(ri)) are saved. Register reads that require saving in the read buffer are marked with an "*". Because only the required values are saved, the depth of each FIFO can now potentially be less than N. Each saved value must still be held for at least N cycles, however, so its instruction count must also be saved. If the read buffer overflows, the oldest value in the buffer must be pushed to memory, and a record must be kept so that the value can be retrieved from memory during rollback. Given a dual FIFO depth of M, memory serves the function of the remaining N - M entries of the two FIFOs.

Figure 14: Read buffer of size < 2N. (Diagram labels: GPR, read buffer, overflow.)
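The overflow scheme can be sketched in the same style. This is a hypothetical model built on several assumptions:

- Each source operand carries a compiler-set save bit. In the example it is always (True, False), because the accumulator-style instructions only need their destination's old value.
- Each saved entry is tagged with an instruction count so that it can be discarded once it is more than N instructions old.
- A Python deque stands in for the memory area that receives the oldest entry on overflow.

`SpillingReadBuffer`, `step`, and `run` are illustrative names, not taken from the report.

```python
from collections import deque


class SpillingReadBuffer:
    """Hypothetical read buffer with M on-chip entries (M may be < 2N).
    Only operands whose save bit is set are recorded; on overflow the oldest
    entry is pushed to a memory area and recovered during rollback."""

    def __init__(self, capacity, window):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity   # M: on-chip entries
        self.window = window       # N: maximum rollback distance
        self.onchip = deque()      # (instr_count, reg, value), oldest first
        self.memory = deque()      # spilled entries, oldest first
        self.count = 0             # instructions executed so far
        self.spills = 0

    def _retire(self):
        # Entries from instructions older than the last N can never be restored.
        limit = self.count - self.window
        for q in (self.memory, self.onchip):
            while q and q[0][0] < limit:
                q.popleft()

    def step(self, reads, save_bits):
        """Record one instruction's source reads whose save bit is set."""
        index = self.count
        self.count += 1
        self._retire()
        for (reg, value), save in zip(reads, save_bits):
            if not save:
                continue
            if len(self.onchip) == self.capacity:
                self.memory.append(self.onchip.popleft())  # overflow: spill oldest
                self.spills += 1
            self.onchip.append((index, reg, value))

    def rollback(self, regfile, distance):
        """Restore saved values of the last `distance` instructions, newest first."""
        if distance > min(self.window, self.count):
            raise ValueError("rollback distance out of range")
        first = self.count - distance
        for index, reg, value in reversed(list(self.memory) + list(self.onchip)):
            if index >= first:
                regfile[reg] = value
        self.memory = deque(e for e in self.memory if e[0] < first)
        self.onchip = deque(e for e in self.onchip if e[0] < first)
        self.count = first


def run(program, regfile, rb):
    """Hypothetical accumulator ops dst = dst + src; only dst's old value is marked."""
    for dst, src in program:
        reads = [(dst, regfile[dst]), (src, regfile[src])]
        rb.step(reads, save_bits=(True, False))
        regfile[dst] = reads[0][1] + reads[1][1]


regs = {"r1": 1, "r2": 2, "r3": 3}
start = dict(regs)
rb = SpillingReadBuffer(capacity=2, window=4)   # M = 2 on-chip entries, N = 4
run([("r1", "r2"), ("r2", "r3"), ("r1", "r3"), ("r3", "r1")], regs, rb)
print(regs, rb.spills)     # {'r1': 6, 'r2': 5, 'r3': 9} 2
rb.rollback(regs, 4)       # two of the four restored values come from "memory"
print(regs == start)       # True
```

Retiring entries by instruction count matters as much as the spill path. With a smaller window, old entries leave the buffer before it fills, so fewer values ever have to go to memory.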

5.1 Read Buffer Designs

Six read buffer configurations were studied:

- Configuration A1, shown in Figure 15, has a separate FIFO for each source bus.
- Configuration A2 allows access to either FIFO from either source bus.
- Configuration B1 contains a single FIFO and assumes that both source operands can be written into the FIFO within the same cycle. This assumption is consistent with a split-cycle-save register file design that writes during the first half of the cycle and reads during the second half [19].
- Configuration B2 assumes no split-cycle-save capability.
- Configuration C contains a single FIFO with a single level dual queue to absorb a simultaneous save.
- Configuration D extends configuration C to allow access to either queue from either operand source bus.

Figure 15: Read buffer configurations. (Diagram: configurations A1, A2, B1, B2, C, and D, each fed by source buses S1 and S2.)

Evaluation Methodology

The read buffer was simulated at the instruction level. The IMPACT C compiler [18] was instrumented to emit programs containing procedure calls to a simulation program with models for the six read buffer configurations. Branch hazards were removed by the compiler for a rollback distance of 10. Parameters such as which operands require saving in the read buffer were determined at the post-pass level, and the compiler was adjusted to pass this information to the simulation program. Table 3 lists the ten⁶ application programs used in the evaluations. The programs were cross-compiled on a SPARCserver 490 and run on a DECstation 3100 with read buffer sizes ranging from 0 to 20. A size of 20 is the maximum read buffer size of 2N.

5.2 Results

5.2.1 Detailed analysis: QUEEN

Figure 16 shows changes in performance overhead (Cycles OH) for various read buffer sizes and configurations running the QUEEN application. The configuration A1 curve in Figure 16 shows that a significant performance impact is incurred even with a modest reduction in read buffer size. Configuration A1 was consistently the least efficient⁷ of the six configurations across the ten applications studied. The reason is that each of the dual FIFOs is dedicated to a single source bus. In many cases, saving S1 causes an overflow because the S1 FIFO is full, even though there is room in the S2 FIFO. Configuration A1 does allow simultaneous saves of S1 and S2 when each FIFO has room, but this feature does not compensate for the inefficiency.

⁶The TBL application was not included in the read buffer size evaluation.
⁷An efficient configuration is one with a low performance overhead given a small read buffer size.

Figure 16: Cycle overhead: QUEEN. (Panels: Cycle OH (%) versus read buffer size, 0 to 20, for configurations A1, A2, and B1, and for configurations B2, C, and D.)

Configuration A2 demonstrates the improvement gained by allowing either source bus to access either FIFO. Configuration B1 was the most efficient of the six configurations for the QUEEN application. In this configuration, a total read buffer size of 13 would produce zero performance impact, which is a 35% reduction in read buffer size. Note that configuration B1 assumes that simultaneous saves of S1 and S2 can be handled within the same cycle. If this assumption is invalid, the configuration B2 curve in Figure 16 shows a performance impact of no less than 9.4% regardless of the read buffer size. The "leveling off" of B2 is due to the bottleneck between the source buses and the single FIFO. The flat part of the curve shows the percent of instructions in the QUEEN application that require simultaneous saves of S1 and S2.

The configuration C curve in Figure 16 shows how a single level dual queue placed between the source buses and the FIFO can alleviate some of the bottleneck effects. The dual queue can absorb a single simultaneous save of S1 and S2, distributing the saves over multiple cycles. A nonzero minimum performance overhead is still present, due to cases in which the dual queue has not emptied before the next simultaneous save occurs. The configuration D curve in Figure 16 shows the results of an improved queue structure, which permits saves from either source bus into either queue. This configuration avoids stalls in some cases where the queue dedicated to S2 in configuration C is full and the other queue is empty, for example when S2 must be saved while the S1 queue is empty. Configuration D also has a nonzero minimum performance overhead, but it gives better performance than configuration C.
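A small cycle-count sketch can illustrate two effects: the bottleneck behind the leveling off of B2, and the partial relief that the dual queue gives C and D. This is a hypothetical port model, not the report's simulator. FIFO capacity is treated as unbounded so that only write-port conflicts cause stalls. B1 accepts two FIFO writes per cycle, while B2, C, and D accept one, and each queue slot holds one pending save. Configurations A1 and A2 differ in how capacity is shared rather than in write ports, so they are not modeled. The function name `stalls` is illustrative.

```python
def stalls(requests, config):
    """Count stall cycles for a sequence of per-instruction save requests.

    Each request is (save_s1, save_s2). Hypothetical port model with the
    FIFO's capacity treated as unbounded:
      "B1"     single FIFO, two writes per cycle (split-cycle-save)
      "B2"     single FIFO, one write per cycle, no queue
      "C", "D" one write per cycle plus a single-level dual queue; in C each
               source bus owns one queue slot, in D either slot may be used
    """
    queue = []                       # pending saves, tagged by source bus
    total = 0
    for s1, s2 in requests:
        remaining = [bus for bus, flag in (("S1", s1), ("S2", s2)) if flag]
        while True:                  # one loop iteration per cycle
            ports = 2 if config == "B1" else 1
            if queue:                # a queued save drains into the FIFO first
                queue.pop(0)
                ports -= 1
            while remaining and ports:
                remaining.pop(0)     # write straight into the FIFO
                ports -= 1
            if config in ("C", "D"):
                for bus in list(remaining):
                    if config == "C":
                        free = bus not in queue
                    else:
                        free = len(queue) < 2
                    if free:
                        queue.append(bus)
                        remaining.remove(bus)
            if not remaining:
                break
            total += 1               # saves left over: the instruction stalls
    return total


demo = [(True, True), (True, True), (False, True)]
for cfg in ("B1", "B2", "C", "D"):
    print(cfg, stalls(demo, cfg))    # B1 0, B2 2, C 1, D 0
```

In this model, every B2 instruction that saves both operands costs exactly one extra cycle no matter how large the FIFO is. B2's stall overhead is therefore fixed by the fraction of such instructions. The third request in `demo` is the case where C's S2 slot is occupied while D can use the free slot.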

Table 4: Read buffer size evaluation summary (OH_level in %).

Program     RB_size A2   RB_size B1   OH_level A2   OH_level B1
QUEEN           14           12          1.66          1.36
WC              10            8          0.00          2.54
QSORT           16           15          2.28          0.94
CMP             12           11          0.00          0.00
GREP            10           10          0.18          0.18
PUZZLE          10            9          2.87          0.32
COMPRESS        12           12          2.87          1.12
LEX             12           12          2.73          1.55
YACC            16           15          1.07          0.00
CCCP            12           12          2.34          1.74

The simulation results for QUEEN show that configuration A1 is the least efficient and that configuration B1 is the most efficient, given the ability to do split-cycle-saves. Without the split-cycle-save capability, the best designs are as follows:

- Configuration D is the best of the single FIFO designs, with a minimum performance overhead of 4.5%.
- Configuration A2 is the best of the dual FIFO designs, resulting in a 1.7% performance overhead with a read buffer size of 14.

For configurations C and D, a total read buffer size of 13⁸ is sufficient to reach the point at which their curves level off.

5.2.2 Evaluation of all application programs

Results for the other nine application programs are similar to those for QUEEN [17]. The applications differ in three respects:

- the buffer size at which the curves for configurations B2 through D "level off" (i.e., stabilize);
- the level at which the performance overhead stabilizes;
- for configurations A2 and B1, the read buffer size values that produce low values of performance overhead.

Configuration A2 does not level off like configurations B2, C, and D, but it also does not rapidly approach zero like configuration B1.

Table 4 summarizes the results for the two most efficient configurations, A2 and B1. This study assumes that a minimal performance overhead can be tolerated as a result of read buffer size reduction. To allow a better comparison between configurations A2 and B1, Table 4 gives the read buffer size at which the performance overhead drops below 3%. This value is referred to as RB_size, and the performance overhead at that size is referred to as OH_level. Table 4 shows that the read buffer size requirement is application dependent, ranging from 8 for WC to 15 for QSORT and YACC (configuration B1). Given the split-cycle-save assumption (configuration B1), read buffer size reductions of 25% (minimum), 60% (maximum), and 42% (average) were achieved. For configuration A2 with no split-cycle-save assumption, the reductions were 20% (minimum), 50% (maximum), and 38.0% (average).

⁸Two must be added to each read buffer size value in C and D to account for the queues.
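Table 4's size reductions are measured against the full 2N = 20 entry read buffer (N = 10). The short sketch below recomputes the minimum, maximum, and average reductions for configurations A2 and B1 directly from the table. It also adds a hypothetical helper, `rb_size`, that picks RB_size from an overhead curve. This sketch reads "drops below 3%" as "falls below 3% and stays below for all larger sizes", which is an assumption, and the curve passed to it would come from simulation data such as Figure 16.

```python
# Table 4: program -> (RB_size A2, RB_size B1, OH_level A2 %, OH_level B1 %)
TABLE4 = {
    "QUEEN": (14, 12, 1.66, 1.36),
    "WC": (10, 8, 0.00, 2.54),
    "QSORT": (16, 15, 2.28, 0.94),
    "CMP": (12, 11, 0.00, 0.00),
    "GREP": (10, 10, 0.18, 0.18),
    "PUZZLE": (10, 9, 2.87, 0.32),
    "COMPRESS": (12, 12, 2.87, 1.12),
    "LEX": (12, 12, 2.73, 1.55),
    "YACC": (16, 15, 1.07, 0.00),
    "CCCP": (12, 12, 2.34, 1.74),
}
FULL_SIZE = 20  # 2N for a maximum rollback distance N = 10


def reduction(size, full=FULL_SIZE):
    """Percent reduction relative to the full 2N-entry read buffer."""
    return 100.0 * (full - size) / full


def summarize(column):
    """(minimum, maximum, average) reduction for one RB_size column."""
    values = [reduction(row[column]) for row in TABLE4.values()]
    return min(values), max(values), sum(values) / len(values)


def rb_size(curve, threshold=3.0):
    """Hypothetical helper: smallest size in `curve` (size -> overhead %)
    whose overhead, and that of every larger size, is below `threshold`."""
    best = None
    for size in sorted(curve, reverse=True):
        if curve[size] >= threshold:
            break
        best = size
    return best


print("A2:", summarize(0))   # A2: (20.0, 50.0, 38.0)
print("B1:", summarize(1))   # B1: (25.0, 60.0, 42.0)
```

Every RB_size in Table 4 is a multiple of 1 out of 20, so each reduction is a whole multiple of 5% and the averages come out exactly.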

5.2.3 Summary of Results

The measurements show that two read buffer configurations consistently out-performed the other four:

- a dual FIFO with source bus access to either FIFO (configuration A2);
- a single FIFO with the split-cycle-save assumption (configuration B1).

Our results indicate that up to a 55% read buffer size reduction is achievable, with an average reduction of 39.5%. There were moderate variances in the read buffer size requirement between the ten applications. It was also found that small changes in read buffer size can cause significant changes in performance overhead. Given the steepness of the B1 curve around the RB_size value, care should be taken in the final selection of read buffer size for any given design.

6 Concluding Remarks

This paper has presented a compiler-assisted multiple instruction rollback scheme. The scheme combines compiler-driven data-flow manipulations with dedicated data redundancy hardware to remove the data hazards that result from multiple instruction rollback. Experimental evaluation with a maximum rollback distance of ten showed performance impacts of no more than 6.57% and an average impact of 1.80% over the eleven application programs studied. These performance penalties are lower than those of previous compiler-only approaches or comparable hardware-only approaches. Six read buffer configurations were studied to determine the minimum size requirement for general applications. A 55% read buffer size reduction was found to be achievable, with an average reduction of 39.5%. However, the additional control logic needed to handle read buffer overflows may limit the overall hardware savings. Future research includes applying compiler-assisted multiple instruction rollback recovery to super-scalar, VLIW, and parallel processing architectures. Evaluating compiler-assisted rollback recovery for speculative execution repair would include modifying the compiler transformations to operate in a super-scalar and VLIW environment.

Acknowledgements

The authors wish to thank C.-C. Jim Li for his help with the compiler aspects of this paper, and Scott Mahlke and William Chen for their invaluable assistance with the IMPACT compiler. We also express our thanks to Janak Patel for his contributions to this research.

References

[1] M. S. Pittler, D. M. Powers, and D. L. Schnabel, "System Development and Technology Aspects of the IBM 3081 Processor Complex," IBM J. Res. Dev., vol. 26, pp. 2-11, Jan. 1982.

[2] Y. Tamir and M. Tremblay, "High-Performance Fault-Tolerant VLSI Systems Using Micro Rollback," IEEE Trans. Comput., vol. 39, pp. 548-554, Apr. 1990.

[3] C.-C. J. Li, S.-K. Chen, W. K. Fuchs, and W.-M. W. Hwu, "Compiler-Assisted Multiple Instruction Retry," Tech. Rep. CRHC-91-31, Coordinated Science Laboratory, University of Illinois, May 1991.

[4] N. J. Alewine, S.-K. Chen, C.-C. J. Li, W. K. Fuchs, and W.-M. W. Hwu, "Branch Recovery with Compiler-Assisted Multiple Instruction Retry," in Proc. 22nd Int. Symp. Fault-Tolerant Comput., pp. 66-73, July 1992.

[5] L. Spainhower, J. Isenberg, R. Chillarege, and J. Berding, "Design for Fault-Tolerance in System ES/9000 Model 900," in Proc. 22nd Int. Symp. Fault-Tolerant Comput., pp. 38-47, July 1992.

[6] P. M. Kogge, K. T. Truong, D. A. Richard, and R. L. Schoenike, "Checkpoint Retry Mechanism," United States Patent, no. 4912707, Mar. 1990. Assignee: International Business Machines Corporation, Armonk, N.Y.

[7] Y. Tamir, M. Liang, T. Lai, and M. Tremblay, "The UCLA Mirror Processor: A Building Block for Self-Checking Self-Repairing Computing Nodes," in Proc. 21st Int. Symp. Fault-Tolerant Comput., pp. 178-185, June 1991.

[8] J. E. Smith and A. R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors," IEEE Trans. Comput., vol. 37, pp. 562-573, May 1988.

[9] M. L. Ciacelli, "Fault Handling on the IBM 4341 Processor," in Proc. 11th Int. Symp. Fault-Tolerant Comput., pp. 9-12, June 1981.

[10] W. F. Bruckert and R. E. Josephson, "Designing Reliability into the VAX 8600 System," Digital Tech. J. Digital Equip. Corp., vol. 1, no. 1, pp. 71-77, Aug. 1985.

[11] G. L. Hicks, D. Howe, Jr., and A. Zurla, Jr., "Instruction Retry Mechanism for a Data Processing System," United States Patent, no. 4044337, Aug. 1977. Assignee: International Business Machines Corporation, Armonk, N.Y.

[12] D. B. Fite, T. Fossum, and D. Manley, "Design Strategy for the VAX 9000 System," Digital Tech. J. Digital Equip. Corp., vol. 2, no. 4, pp. 13-24, Fall 1990.

[13] E. B. Eichelberger and T. W. Williams, "A Logic Design Structure for LSI Testability," in Proc. 14th Design Autom. Conf., pp. 462-468, 1977.

[14] J. S. Liptay, "The ES/9000 High End Processor Design," IBM J. Res. Dev., vol. 36, no. 3, May 1992.

[15] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley, 1986.

[16] J. A. Bondy and U. Murty, Graph Theory with Applications. London, England: Macmillan Press Ltd., 1979.

[17] N. J. Alewine, Compiler-assisted Multiple Instruction Rollback Recovery using a Read Buffer. PhD thesis, University of Illinois at Urbana-Champaign, 1993. Tech. Rep. CRHC-93-06.

[18] P. Chang, W. Chen, N. Warter, and W.-M. W. Hwu, "IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors," in Proc. 18th Annu. Symp. Comput. Architecture, pp. 266-275, May 1991.

[19] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1990.
