UILU-ENG-93-2220
CRHC-93-11
May 1993

Center for Reliable and High-Performance Computing

COMPILER-ASSISTED MULTIPLE INSTRUCTION ROLLBACK RECOVERY USING A READ BUFFER

N. J. Alewine, S.-K. Chen, W. K. Fuchs, and W.-M. Hwu

Coordinated Science Laboratory
College of Engineering
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Approved for Public Release. Distribution Unlimited.
REPORT DOCUMENTATION PAGE (DD Form 1473, 84 MAR)
1a. Report security classification: Unclassified
1b. Restrictive markings: None
3. Distribution/availability of report: Approved for public release; distribution unlimited
4. Performing organization report number(s): UILU-ENG-93-2220 (CRHC-93-11)
6a. Name of performing organization: Coordinated Science Laboratory, University of Illinois
6b. Office symbol: N/A
6c. Address: 1308 W. Main St., Urbana, IL 61801
7a. Name of monitoring organization: Intl Business Machines; NASA; Office of Naval Research
7b. Address: Boca Raton, FL; Moffett Field, CA; Arlington, VA
8a. Name of funding/sponsoring organization: 7a
8c. Address: 7b
11. Title: Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer
12. Personal author(s): Alewine, N. J., Chen, S.-K., Fuchs, W. K., and Hwu, W.-M.
13a. Type of report: Technical
14. Date of report: May 1993
15. Page count: 31
18. Subject terms: fault-tolerance, error recovery, instruction retry, compilers, hardware-assisted retry
20. Distribution/availability of abstract: Unclassified/Unlimited
21. Abstract security classification: Unclassified
Security classification of this page: Unclassified
COMPILER-ASSISTED MULTIPLE INSTRUCTION ROLLBACK RECOVERY USING A READ BUFFER

N. J. Alewine^a, S.-K. Chen, W. K. Fuchs, and W.-M. Hwu

Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
1308 West Main Street
University of Illinois
Urbana, IL 61801

Primary contact: W. Kent Fuchs
Phone: (217) 333-8294
FAX: (217) 244-5686
e-mail: [email protected]

May, 1993
ABSTRACT

Multiple instruction rollback (MIR) is a technique that has been implemented in mainframe computers to provide rapid recovery from transient processor failures. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs have also been developed which remove rollback data hazards directly with data-flow transformations. This paper focuses on compiler-assisted techniques to achieve multiple instruction rollback recovery. We observe that some data hazards resulting from instruction rollback can be resolved efficiently by providing an operand read buffer while others are resolved more efficiently with compiler transformations. A compiler-assisted multiple instruction rollback scheme is developed which combines hardware-implemented data redundancy with compiler-driven hazard removal transformations. Experimental performance evaluations indicate improved efficiency over previous hardware-based and compiler-based schemes.
Index terms: fault-tolerance, error recovery, instruction retry, compilers, hardware-assisted retry.
^a International Business Machines Corporation, Boca Raton, FL. This research was supported in part by the National Aeronautics and Space Administration (NASA) under grant NASA NAG 1-613, in cooperation with the Illinois Computer Laboratory for Aerospace Systems and Software (ICLASS), and in part by the Department of the Navy and managed by the Office of the Chief of Naval Research under Contract N00014-91-J-1283.
1 Introduction

Instruction retry is a technique for recovery from transient faults in a processing system. Multiple instruction rollback [1-6] (also referred to as multiple instruction retry [2-5], or simply instruction retry) is particularly appropriate when error detection latencies are greater than a single instruction cycle, as can occur with algorithm-based or control-flow error detection. When transient errors occur, rollback of a few instructions within a sliding window can be an effective alternative to system-level checkpointing or re-execution [7] for rapid recovery.
1.1 Hardware-Based Instruction Rollback

Hardware-implemented instruction retry schemes belong to one of two groups: 1) full checkpointing and 2) incremental checkpointing. Full checkpointing maintains "snapshots" of the required system state at regular, or predetermined, intervals. Upon error detection, the system state is restored to the appropriate checkpointed system state. Incremental checkpointing maintains the system state changes of the last few instructions in a "sliding window". Upon error detection, the system can be rolled back for rapid recovery by undoing, or "backing-out", the system state changes up to the instruction in which the error occurred.

The issues associated with instruction retry are similar to the issues encountered with exception handling in an out-of-order instruction execution architecture. If an instruction is to write to a register and N is the maximum error detection latency (or exception latency), two copies of the data must be maintained for N cycles. Hardware schemes such as reorder buffers, history buffers, future files [8], and micro-rollback [2] differ in where the updated and old values reside, circuit complexity, CPU cycle times, and rollback efficiency.
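To make incremental checkpointing concrete, the following minimal Python sketch models a register file backed by a history buffer that keeps the old value of each register write for the last N instructions, in the spirit of the history buffer of [8]. The class and method names are hypothetical, one buffer entry is assumed per retired instruction, and entries older than N are assumed safe to discard because the error detection latency has expired.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class HistoryBufferRegisterFile:
    """Hypothetical register file with a history buffer (sketch, not a real design)."""

    def __init__(self, num_regs: int, max_rollback: int) -> None:
        self.regs: List[int] = [0] * num_regs
        # One entry per retired instruction: (register, old value), or None if the
        # instruction wrote no register. maxlen drops entries older than N.
        self.history: Deque[Optional[Tuple[int, int]]] = deque(maxlen=max_rollback)

    def retire(self, dest: Optional[int], value: int = 0) -> None:
        """Retire one instruction that writes `value` to `dest` (or writes nothing)."""
        if dest is None:
            self.history.append(None)
        else:
            self.history.append((dest, self.regs[dest]))  # save the old value
            self.regs[dest] = value                      # update in place

    def rollback(self, distance: int) -> None:
        """Undo the last `distance` instructions, newest first."""
        if distance > len(self.history):
            raise ValueError("rollback distance exceeds the history window")
        for _ in range(distance):
            entry = self.history.pop()
            if entry is not None:
                reg, old_value = entry
                self.regs[reg] = old_value


rf = HistoryBufferRegisterFile(num_regs=4, max_rollback=3)
rf.retire(1, 10)
rf.retire(2, 20)
rf.retire(1, 11)
rf.rollback(2)   # undo the writes of 11 to r1 and 20 to r2
print(rf.regs)   # [0, 10, 0, 0]
```

Because the register file is updated in place, reads need no bypass path; the cost is the extra state saved on every write.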
Table 1 gives a description of various hardware-based methods to restore the general purpose register file contents during single or multiple instruction rollback. In the VAX 8600 and VAX 9000, errors are detected prior to the completion of a faulty instruction. For most VAX instructions, updates to the system state occur at the end of the instruction. If the error is detected prior to the updating of the system state, the instruction can be rolled back and re-executed. If the system state has changed prior to detection of the error, a flag is set to indicate that rollback cannot be accomplished. Redundant data storage is therefore not required for the VAX 8600 and VAX 9000, although rollback is limited to a distance of only one instruction.

Table 1: Hardware-based rollback schemes.
Columns: Scheme; Checkpoint Type; Rollback Distance; Location of Primary Data; Location of Redundant Data.
Schemes: VAX 8600 [10], VAX 9000 [12], IBM 4341 [9], IBM 3081, IBM ES/9000 [5], IBM patent 4,912,707 [6], IBM patent 4,044,337 [11], micro-rollback [2], history buffer [8], and history file [8].
Checkpoint types are full or incremental; rollback distances are a single instruction, 10-20 instructions, or variable; data is located in the register file, a shadow file, a write buffer, a history buffer, or a history file, and for some schemes redundant data is not required.

The IBM 4341, IBM 3081, and IBM patent 4,044,337 require shadow file structures to maintain redundant data. Shadow file structures require additional circuitry, and in the IBM 3081 the level-sensitive scan circuitry used during rollback recovery complicates testability.² Micro-rollback [2] and IBM patent 4,912,707 avoid shadow file structures by using a delayed write buffer. In a delayed write scheme, the write buffer prevents old data from being overwritten until the error detection latency has expired, ensuring that the data in the register file is fault-free. Bypass circuitry is required to forward the most recent values contained in the write buffer on subsequent register file reads. The size of the delayed write buffer is a function of the maximum rollback distance, and the bypass circuitry can add significant circuit cost [2]. The history buffer scheme [8] maintains old values in a separate push-down array and therefore does not require bypass circuitry; during rollback recovery, this data is used to restore the register file. The history buffer does, however, require an extra register file port and can impact performance by increasing register access times.
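The delayed write buffer scheme can be sketched in Python as follows, assuming one buffer entry per instruction and a detection latency of N instructions: new values are held in the write buffer until they are N instructions old, reads are bypassed from the buffer, and rollback simply discards pending writes. The names are hypothetical, and the model ignores timing and port constraints.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class DelayedWriteRegisterFile:
    """Hypothetical delayed-write register file with bypass (sketch)."""

    def __init__(self, num_regs: int, latency: int) -> None:
        self.regs: List[int] = [0] * num_regs  # holds only committed (fault-free) values
        self.latency = latency
        # Pending writes, oldest first; dest is None for instructions without a write.
        self.pending: Deque[Tuple[Optional[int], int]] = deque()

    def write(self, dest: Optional[int], value: int = 0) -> None:
        """Place one instruction's register write into the delayed write buffer."""
        self.pending.append((dest, value))
        if len(self.pending) > self.latency:
            # The oldest write is now beyond the detection latency: commit it.
            old_dest, old_value = self.pending.popleft()
            if old_dest is not None:
                self.regs[old_dest] = old_value

    def read(self, reg: int) -> int:
        """Bypass: the most recent pending write to `reg` takes priority."""
        for dest, value in reversed(self.pending):
            if dest == reg:
                return value
        return self.regs[reg]

    def rollback(self, distance: int) -> None:
        """Discard the writes of the last `distance` instructions."""
        if distance > len(self.pending):
            raise ValueError("rollback distance exceeds the buffered writes")
        for _ in range(distance):
            self.pending.pop()


rf = DelayedWriteRegisterFile(num_regs=4, latency=2)
rf.write(1, 5)
rf.write(1, 6)
rf.write(2, 7)                 # commits the write of 5 to r1
print(rf.read(1), rf.regs[1])  # 6 5
rf.rollback(2)
print(rf.read(1), rf.read(2))  # 5 0
```

The linear search in `read` stands in for the bypass circuitry; its cost grows with the buffer depth, which mirrors why bypass cost depends on the maximum rollback distance.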
²The 126 scan rings of the IBM 3081 contain 35,000 bits of data.
The IBM ES/9000 has introduced a virtual register management (VRM) system which dynamically maps the 16 architectural registers into 32 physical registers. Although the VRM was primarily intended to increase performance by increasing the register file size while maintaining compatibility with the 16 architectural registers, it also provides data redundancy for multiple instruction rollback. In the VRM, each result is dynamically assigned to a new physical register, and when the data in a physical register becomes obsolete, the release of that register for reassignment is postponed until the error detection latency has been exceeded.

1.2 Compiler-Based Instruction Rollback

Compiler-based multiple instruction rollback reduces the requirement for the data redundancy logic present in hardware-based approaches [3,4]. It resolves rollback data hazards (or just hazards) that are identified by antidependencies of length ≤ N, where N represents the maximum rollback distance. Compiler-based MIR uses data-flow manipulations³ to remove rollback data hazards, and has been investigated at three levels [14]: 1) the pseudo-code level, which represents the code level prior to variables being assigned to physical registers, 2) the machine-code level, or the code level in which variables are assigned to physical registers, and 3) the post-pass level, which represents the assembler-level code emitted by the code generator.

1.3 Compiler-Assisted Instruction Rollback

This paper introduces a compiler-assisted multiple instruction rollback scheme which uses dedicated data redundancy hardware to resolve one type of rollback data hazard while relying on compiler transformations to resolve the remaining hazards. Experimental results indicate that by exploiting the unique characteristics of differing hazard types, the new compiler-assisted MIR design can achieve superior performance to either a hardware-only or a compiler-based scheme.

³For a complete presentation of data-flow properties and manipulation methods, see [15].
2 Error Model and Rollback Data Hazard Classification

2.1 Error Model
The following four assumptions are used in the general error model: 1) the maximum error detection latency is N instructions, 2) memory and I/O have delayed write buffers and can roll back N cycles, 3) the states of the program counter and program status word (PSW) are preserved by an external recording device or by shadow registers [2], and 4) the CPU state can be restored by loading the correct contents of the register file, program counter, and PSW. Given the above assumptions, any error which does not manifest itself as an illegal path in the control-flow graph (CFG) of the program is allowed, provided that the following two conditions are satisfied: 1) register file contents do not spontaneously change, and 2) data can not be written to an incorrect register location. There are four targeted error types: 1) CPU errors such as those caused by an ALU failure, 2) incorrect values being read from I/O, memory, the register file, or external functional units such as the floating point unit, 3) correct or incorrect values being read from incorrect locations within the I/O, memory, or register file, and 4) incorrect branch decisions resulting from error types 1, 2, or 3.
2.2 Hazard Classification

The code can be represented as a CFG $G(V, E)$, where $V$ is the set of nodes denoting instructions and $E$ is the set of edges denoting control-flow. If there is a direct control-flow from instruction $i$, denoted $I_i$, to $I_j$, where $I_i \in V$ and $I_j \in V$, then there is an edge $(I_i, I_j) \in E$. Let $d_{min}(I_i, I_j)$ denote the smallest number of instructions along any path from $I_i$ to $I_j$. The hazard set $H_{regs}$ of the error model is defined as the set of registers whose values are inconsistent during different executions (or pseudo executions) of an instruction sequence $I_1, I_2, \ldots, I_N$, due to an error occurring in $I_1$ that is detected by $I_N$ and causes the sequence to be retried. A formal classification of $H_{regs}$ follows.

Property 1: $z \in H_{regs}$ iff there exists a legal walk⁴ of length $N$ in $G$ such that $z$ is live at $I_1$ and $z$ is defined during the walk.

Proof: For the if case, let $I_1, I_2, \ldots, I_N$ be such a legal walk. An error occurring in $I_1$ will be detected by $I_N$. During the retry of $I_1$, $z$ will be in an inconsistent state, since it was defined during the walk. Since $z$ is live at $I_1$, there is some path along which $z$ is used prior to its redefinition, and since $z$ is in an inconsistent state, $z \in H_{regs}$. For the only if case, we suppose the contrary. Assume that among all legal walks of length $N$ in $G$, either $z$ is not live at the beginning, or $z$ is not defined during the walk. It then follows that $z$ either has no use, or $z$ is not changed. (The error model does not allow a write to a wrong location, and the contents of register $z$ can not spontaneously change.) Therefore there is no inconsistency problem for $z$, which implies $z \notin H_{regs}$. □

⁴A walk is a sequence of edge traversals in a graph where the edges visited can be repeated [16].

Property 2: Hazards can be classified as one of two types: 1) those which appear as antidependencies of length $\le N$, referred to as on-path hazards, and 2) those which appear at branch boundaries, referred to as branch hazards. These two hazard types may overlap.

Proof: Since $z \in H_{regs}$, Property 1 implies that there exists a legal walk $W_1 = I_1, I_2, \ldots, I_N$ in $G$ such that $z$ is live at $I_1$ and $z$ is defined during the walk. Let $i$ be the largest index such that $I_i$ defines $z$, where $i \in \{1, 2, \ldots, N\}$. Since $z$ is live at $I_1$, there exists a legal walk $W_2$ in $G$, beginning with $I_1$, along which an instruction $I_j$ uses $z$ prior to its being redefined. Case 1: if $W_2 \subset W_1$, then $I_j$ precedes $I_i$ in $W_1$ and $d_{min}(I_j, I_i) \le N$, so instructions $I_j$ and $I_i$ constitute an on-path hazard (an antidependency of length $\le N$) on $z$. Case 2: if $W_2 \not\subset W_1$, there is at least one branch instruction $I_k$ between $I_1$ and $I_j$ along $W_2$ at which $W_2$ departs from $W_1$. When $I_i$ defines $z$ and, after rollback, execution follows $W_2$, $I_j$ uses a $z$ value different from the value that was live at $I_1$. Since the error model does not allow a write to a wrong location and $z$ can not spontaneously change, the difference is caused by $I_i$, and $I_i$ and $I_j$ constitute a branch hazard on $z$ at a branch boundary. □
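On-path hazards in straight-line code can be found with a short scan. The minimal Python sketch below assumes that an antidependency has length at most N when the use and the redefining instruction fall within a window of N consecutive instructions (counting both ends), so an instruction that reads and writes the same register is a hazard of length 1. Branch hazards are ignored because they require the CFG. The `Instr` type and the function name are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read


def on_path_hazards(block: List[Instr], n: int) -> List[Tuple[int, int, str]]:
    """Return (use index, redefinition index, register) for antidependencies of length <= n."""
    hazards = []
    for u, instr in enumerate(block):
        for reg in dict.fromkeys(instr.srcs):  # unique sources, order preserved
            # Examine the use itself and the following n - 1 instructions.
            for d in range(u, min(u + n, len(block))):
                if block[d].dest == reg:
                    hazards.append((u, d, reg))
                    break  # the first redefinition is enough to flag the hazard
    return hazards


block = [
    Instr("r1", ("r2",)),       # 0: r1 = f(r2)
    Instr("r2", ("r3",)),       # 1: r2 = f(r3)
    Instr("r3", ("r1", "r4")),  # 2: r3 = f(r1, r4)
    Instr("r4", ("r4",)),       # 3: r4 = f(r4)
]
print(on_path_hazards(block, 2))  # [(0, 1, 'r2'), (1, 2, 'r3'), (2, 3, 'r4'), (3, 3, 'r4')]
print(on_path_hazards(block, 1))  # [(3, 3, 'r4')]
```

Increasing the rollback distance N can only add hazards, which is why hazard removal overhead tends to grow with N.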
N [3]. Since nop insertion can be costly to performance, previous compiler transformations removed all hazards possible, leaving only unresolvable hazards to be removed by the post-pass transformation. In Section 3.1.2, a new post-pass transformation was introduced in which nop insertion was replaced by read insertion as the primary hazard removal technique. As illustrated in Figure 6, up to two branch hazards can be removed by a single read instruction. The new post-pass transformation is very efficient, and in some cases it can resolve branch hazards with less performance impact than pseudo-level transformations. Figures 11 and 13 of Section 4.2 show performance overhead comparisons between compiler-driven data-flow manipulations and the post-pass transformation for the PUZZLE and TBL applications described in Table 3 of Section 4.1. Comp/PP indicates that hazards are resolved by the compiler where possible, with the remaining hazards being resolved at the post-pass level. PP (post-pass) indicates that all hazards are removed by the post-pass transformation.

For the PUZZLE application, and slightly for the TBL application, removing all hazards with the post-pass transformation produced better performance than resolving them with compiler transformations where possible. The save/restore operations of loop protection introduce a guaranteed, but small, performance impact by increasing the path length, whereas read insertion affects performance only when the hazard node is executed. The better choice therefore depends on the relative execution frequencies of the loop and the hazard node.

3.3.2 Profiling

Profile data can be used to aid in loop protection decisions. Figure 7 illustrates the potential effect on performance of two types of hazard removal: 1) hazard removal using loop protection and 2) hazard removal using read insertion. If the protected loop of Figure 7 is executed 20 times and the hazard node is executed two times, loop protection would require 40 additional instructions (one save/restore pair per loop execution), whereas read insertion would require only two additional instructions. If the loop and hazard node execution frequencies were reversed, read insertion would require 20 additional instructions, compared with only four for loop protection.

Figure 7: Loop protection versus read insertion. (Panels: Read Insertion; Loop Protection.)

Profile data was included in the pseudo-level loop protection decisions of Section 3.2. For areas of code that are unexecuted during profile sampling, static prediction is used as a supplement: each loop is assumed to iterate ten times, and all loop header nodes are assigned weights that are multiples of 10, depending on the depth of loop nesting. Given these weights, protection of loop $l$ due to hazard node $n_h$ is required if the following condition holds:

$$nh\_weight > 3 \cdot hdr\_node(l)\_weight$$

The constant 3 adjusts the weights to account for both direct and indirect loop protection costs. Direct loop protection costs result from the save/restore instruction pair shown in Figure 7. Indirect loop protection costs result from: 1) an increased number of hazards, which in turn requires more node splitting and more loop protection, and 2) increased register usage due to the save/restore instructions, which can result in additional register spills. Figure 8 shows the run-time overhead for the TBL application with rollback distances from 1 to 10. Prof/PP indicates that profiling data was used in loop protection decisions. The results show that the use of profile data can improve application performance by postponing some hazard resolutions until the post-pass phase. However, for the TBL application, using profile data to aid in loop protection decisions did not produce performance equal to that of the post-pass transformation. As an extension to this work, profile data can be used to aid in register allocation. As discussed in Section 3.2, hazards that are present after pseudo register renaming are resolved by adding hazard constraints to live range constraints prior to register allocation. These additional constraints can cause increased register spillage and impact performance. Techniques similar to those developed for loop protection can be used to enhance register allocation decisions.
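The profile-based loop protection rule (protect loop l for hazard node n_h when nh_weight > 3 · hdr_node(l)_weight) and the direct-cost arithmetic of the Figure 7 example can be sketched in Python as follows. This is a minimal illustration only: it assumes that the direct cost of loop protection is one save/restore pair per loop execution and that read insertion costs one read per hazard node execution. The function names are hypothetical, and indirect costs enter only through the constant 3.

```python
def direct_costs(loop_executions: int, hazard_executions: int) -> tuple:
    """Dynamic instructions added by (loop protection, read insertion)."""
    loop_protection = 2 * loop_executions  # one save + one restore per loop execution
    read_insertion = hazard_executions     # one inserted read per hazard node execution
    return loop_protection, read_insertion


def should_protect_loop(nh_weight: float, hdr_weight: float, factor: float = 3.0) -> bool:
    """Protect loop l for hazard node n_h only if nh_weight > factor * hdr_node(l)_weight."""
    return nh_weight > factor * hdr_weight


print(direct_costs(20, 2))         # (40, 2): read insertion is cheaper
print(should_protect_loop(2, 20))  # False
print(direct_costs(2, 20))         # (4, 20): loop protection is cheaper
print(should_protect_loop(20, 2))  # True
```

Using a factor of 3 rather than 2 biases the rule toward read insertion, reflecting the indirect costs (extra hazards and register spills) that loop protection can introduce.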
Figure 8: TBL: profile data used for loop protection. (Run-time overhead, Time OH (%), versus rollback distance 1 to 10, for PP, Comp/PP, and Prof/PP.)
4 Performance Evaluation

4.1 Implementation

The hazard removal transformations have been implemented in the MIPS code generator of the IMPACT C compiler [18]. Transformations resolving pseudo register hazards (loop protection, node splitting, and loop expansion) are called before register allocation. Physical register hazards are resolved in the register allocation algorithm, after the live range constraints have been generated. The nop insertion algorithm or post-pass transformation is called just before the assembly code output routine. The applications were cross-compiled on a SPARCserver 490, and the compiled programs were then run on a DECstation 3100. Table 3 lists the eleven application programs used in the evaluations. Static Size is the number of assembly instructions emitted by the code generator, not including the library routines and other fixed overhead.

Table 3: Application programs.

Program    Static Size   Description
QUEEN              148   eight-queen program
WC                 181   UNIX utility
QSORT              252   quick sort algorithm
CMP                262   UNIX utility
GREP               907   UNIX utility
PUZZLE             932   simple game
COMPRESS          1826   UNIX utility
LEX               6856   lexical analyzer
YACC              8099   parser-generator
TBL               8197   table formatting preprocessor
CCCP              8775   preprocessor for gnu C compiler

The results are summarized in Figures 9 through 13. Each figure contains two plots: the first plot shows the percent of run-time overhead (Time OH) of the referenced hazard resolution scheme, and the second plot shows the percent of code growth overhead (Size OH), relative to the base values in Table 3. Four hazard resolution schemes were evaluated. Compiler 1 resolves on-path hazards only, using the compiler-driven data-flow manipulations. Compiler 2 extends the compiler transformations to resolve both on-path and branch hazards. PP (post-pass) disables the compiler transformations and relies solely on a read buffer to resolve on-path hazards and on the post-pass transformation presented in Section 3.1.2 to resolve branch hazards. Comp/PP represents the compiler-assisted multiple instruction rollback scheme: it uses a read buffer to resolve on-path hazards, the compiler transformations with the techniques described in Section 3.2 to resolve branch hazards where possible, and the post-pass transformation to remove the remaining branch hazards. Due to the excessively large compile times of the previous schemes for large applications, the evaluations of the Compiler 1 and Compiler 2 algorithms were restricted to the applications QUEEN, WC, COMPRESS, CMP, PUZZLE, and QSORT. Both Comp/PP and PP were evaluated for all eleven applications.
4.2 Performance Analysis

Compiler transformations used for the removal of data hazards can impact performance in several ways. Loop protection inserts save/restore operations at the head and tail of the loop. This increases the path length and, therefore, the run time. The additional hazard constraints used to cover on-path hazards create more arcs in the dependency graph, which can cause more spill code to be generated; the additional memory references can impact performance through increased path length and cache misses. Nop insertion in the post-pass transformation can be costly, since up to N nops could be inserted for each unresolved hazard. The insertion of MOV rk, rk instructions in the post-pass transformation also increases path lengths, although typically less than with nop insertion. Finally, the increase in code size, mainly due to loop expansion, may cause more run-time cache misses.

4.3 Results: Compiler

The performance results shown in Figures 9 through 13 are for execution of the eleven application programs on a DECstation 3100 after they have been compiled with the transforms described.
As can be seen in Figures 9 through 11, extending the compiler hazard resolution scheme to include branch hazards introduces little incremental performance impact or code growth overhead. Given a rollback distance of 10, resolving both on-path and branch hazards using compiler transformations resulted in a maximum performance impact of 32.6% and an average performance impact of 12.6%. This compares with maximum and average impacts of 35.4% and 15.4%, respectively, for compiler-driven on-path hazard resolution only. The maximum code size overhead measured for the extended compiler-based technique was 328%, with an average overhead of 207%, for a rollback distance of 10. This compares with a maximum and average overhead of 372% and 225%, respectively, for the unextended compiler-based scheme. These results indicate a small incremental run-time performance overhead and a small code size overhead given compiler-based branch hazard removal compared to compiler-based on-path hazard removal alone. Three factors account for these small incremental impacts. First, on-path hazards dominate in frequency of occurrence. Second, resolving an on-path hazard at instruction $I_i$ through renaming can sometimes resolve a branch hazard at instruction $I_i$. Third, resolving on-path hazards with nop insertion may resolve a corresponding branch hazard by increasing the distance between the hazard node and its nearest predecessor branch node.
4.4 Results: PP

Figures 9 through 13 show the run-time and code size overheads for each application studied using the read buffer to resolve on-path hazards and the post-pass transformation described in Section 3 to cover all branch hazards. The results are worst case in that many of the branch hazards could have been resolved with no performance impact using the compiler techniques; instead, they are resolved by the insertion of MOV instructions, which cause a guaranteed, although small, performance impact. Given a rollback distance of 10, the post-pass transformation produced a
maximum
performance
impact
below the levels produced correspondingly
4.5
lower
Results:
The
by the compiler-baaed
with a maximum
tions and slightly
scheme
with an average
better
performance
performance
Code growth
of 2.43%,
overhead
of 13.0% and an average
overhead
dicate
techniques
compiler
compiler
techniques,
growth.
The primary buffer
than
scheme
produced
of 2.03%,
and
significantly
measurements
overhead
The run
frequent
were
of 8.59%.
20
Comp/PP:
+
across
only.
performance code
growth
of PUZZLE,
run-time
of requiring
Given
schemes
a rollback of 6.57%
overhead
of 51.2%
YACC,
re, compilation
all appUca.
impact
performance
mad post-pans
on-path
and
CCCP
penalties.
and additional are their
p /:
in-
These code
utilization
hazards.
time results
of the compiler-aasisted
a maximum
axe still useful in reducing
the more
overheads
transformation
a maximum
have the disadvantage
advantage
Compiler pP. Compiler
low performance
with the post-pass
of 15.5%.
however,
to resolve
consistently
impact
with and an average
of the read
overhead
achieved
of 10, the compiler-aasisted
that
scheme.
impact
Comp/PP
compiler-assisted
distance
of 7.695{ with an average performance
Figure 9: Run-time overhead and code size overhead: QUEEN. (Panels: Time OH: QUEEN (%) and Size OH: QUEEN (%) versus rollback distance 1 to 10, for Compiler 1, Compiler 2, PP, and Comp/PP.)
Figure 10: Run-time overhead and code size overhead: WC, COMPRESS, and CMP. (Panels: Time OH (%) and Size OH (%) versus rollback distance 1 to 10 for each application, for Compiler 1, Compiler 2, PP, and Comp/PP.)
Figure 11: Run-time overhead and code size overhead: PUZZLE, QSORT, and GREP. (Panels: Time OH (%) and Size OH (%) versus rollback distance 1 to 10; PUZZLE and QSORT show Compiler 1, Compiler 2, PP, and Comp/PP, and GREP shows PP and Comp/PP.)
Figure 12: Run-time overhead and code size overhead: LEX, YACC, and CCCP. (Panels: Time OH (%) and Size OH (%) versus rollback distance 1 to 10 for each application, for PP and Comp/PP.)
Figure 13: Run-time overhead and code size overhead: TBL. (Panels plot Time OH (%) and Size OH (%) against Rollback Distance, 1 to 10, for PP and Comp/PP.)
5 Read Buffer Size Requirement

This section establishes a practical lower bound on the read buffer size requirement by modifying the read buffer to save only the register values required for rollback. The effect on performance of varying the read buffer size is measured for six read buffer configurations, averaged over ten application programs. Two alternative configurations, one based on dual first-in-first-out (FIFO) buffers and one based on a single FIFO, were found to be the most efficient.

The read buffer maintains copies of the general purpose register file (GPRF) reads for N cycles, assuring that redundant copies of the appropriate register values are available for a rollback of distance at most N. With a dual FIFO design, a read buffer size of 2N is the worst case. Rollback is accomplished by first flushing the read buffer back to the GPRF in the reverse order in which the values were saved.

As long as the required values are maintained for N cycles, the read buffer can be designed to save only those values. If it can be determined at compile time which register reads require saving, an extra bit field for source register reads is added to the instruction encoding. Figure 14 illustrates a case in which only the register reads that require saving (denoted value(r_i)) are marked with an "*"; register reads that do not have to be placed in the read buffer are not saved. Since only the required values are saved, the total read buffer size can now potentially be less than 2N. In this case, however, the instruction count must also be saved so that each value can be maintained for at least N cycles. In the event that the read buffer overflows, the oldest value in the buffer must be pushed to memory, and a record kept so that during rollback the value can be retrieved from memory. Given a dual FIFO depth of M, memory would serve the function of the remaining N - M entries of the two FIFOs.

Figure 14: Read buffer of size < 2N.
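The save, overflow, and reverse-order rollback behavior described above can be illustrated with a small Python model. This is a hypothetical toy under stated assumptions, not the authors' hardware: one queue of `depth` entries stands in for the FIFOs, each entry records the cycle in which it was saved (playing the role of the saved instruction count), overflowed entries go to a `spilled` list that stands in for memory, and a rollback of distance d is assumed to restore the values saved during the last d cycles. The names `ReadBuffer`, `save`, and `rollback` are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

# Hypothetical toy model of a read buffer; not the hardware described in the paper.
Entry = Tuple[int, str, int]  # (cycle when saved, register name, saved value)


@dataclass
class ReadBuffer:
    depth: int  # hardware entries M (may be smaller than the worst case 2N)
    n: int      # maximum rollback distance N, in cycles
    entries: Deque[Entry] = field(default_factory=deque)
    spilled: List[Entry] = field(default_factory=list)  # stands in for memory

    def __post_init__(self) -> None:
        assert self.depth >= 1 and self.n >= 1

    def _expire(self, now: int) -> None:
        # A value older than N cycles can no longer be needed for rollback.
        while self.entries and now - self.entries[0][0] >= self.n:
            self.entries.popleft()
        self.spilled = [e for e in self.spilled if now - e[0] < self.n]

    def save(self, now: int, reg: str, value: int) -> None:
        """Save a register read that the compiler marked as required."""
        self._expire(now)
        if len(self.entries) == self.depth:
            # Overflow: push the oldest value to memory and keep a record of it.
            self.spilled.append(self.entries.popleft())
        self.entries.append((now, reg, value))

    def rollback(self, now: int, distance: int, gprf: Dict[str, int]) -> None:
        """Flush values saved in the last `distance` cycles back to the GPRF."""
        assert 0 < distance <= self.n

        def in_window(e: Entry) -> bool:
            return now - e[0] < distance

        window = [e for e in self.spilled + list(self.entries) if in_window(e)]
        # Restore in reverse order of saving, so the oldest saved copy of a
        # register is written last and determines the restored value.
        for _cycle, reg, value in sorted(window, key=lambda e: e[0], reverse=True):
            gprf[reg] = value
        # Flushed entries belong to rolled-back instructions and are discarded.
        self.entries = deque(e for e in self.entries if not in_window(e))
        self.spilled = [e for e in self.spilled if not in_window(e)]
```

For example, with `depth=2` and `n=4`, three saves in cycles 0 to 2 spill the oldest entry to `spilled`, yet `rollback(2, 3, gprf)` still recovers it, because the restore walks both the spilled record and the on-chip entries.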
5.1 Read Buffer Designs
Six read buffer configurations were studied. Configuration A1, shown in Figure 15, has a separate FIFO for each source bus. Configuration A2 allows access to either FIFO from either source bus. Configuration B1 contains a single FIFO and assumes that both source operands can be written into the single FIFO within the same cycle. This latter assumption is consistent with the split-cycle-save register file design that writes during the first half of the cycle and reads during the second half of the cycle [19]. Configuration B2 assumes no split-cycle-save capability. Configuration C contains a single FIFO with a single-level dual queue to absorb a simultaneous save, and configuration D extends this design to allow access to either queue from either source bus.

Figure 15: Read buffer configurations. (Diagrams of Configs. A1, A2, B1, B2, C, and D, each fed by source buses S1 and S2.)

Evaluation Methodology

The read buffer was simulated at the instruction level. The s-code emitted by the IMPACT C compiler [18] was instrumented at the post-pass level with procedure calls to a simulation program containing models of the six read buffer configurations. Branch hazards were removed by the compiler for a rollback distance of 10. Parameters such as which operands require saving in the read buffer were determined by the compiler at the post-pass level, and the code segments were adjusted to pass this information to the simulation and instrumentation program. Table 3 lists the ten⁶ application programs used in the evaluations. The applications were cross-compiled on a SPARCserver 490 and run on a DECstation 3100 with read buffer sizes ranging from 0 to 20 (note that 20 represents the maximum read buffer size of 2N).

5.2 Results
5.2.1 Detailed analysis: QUEEN

Figure 16 shows changes in performance overhead (Cycles OH) for various read buffer sizes and configurations running the QUEEN application. Figure 16, configuration A1, shows that a significant performance impact is incurred even with a modest reduction in read buffer size. Configuration A1 was consistently the least efficient of the six configurations across the ten applications studied.⁷ This is because each of the dual FIFOs is dedicated to a single source bus. In many cases, saving S1 will cause an overflow because the S1 FIFO is full, even though there is room in the S2 FIFO. Configuration A1 does allow simultaneous saves of S1 and S2, given sufficient room in each FIFO, but this feature does not compensate for the inefficiency.

⁶ The TBL application was not included in the read buffer size evaluation.
⁷ An efficient configuration is one with a low performance overhead given a small read buffer size.
Figure 16: Cycle overhead: QUEEN. (Two panels plot Cycle OH (%) against Read Buffer Size, 0 to 20; left panel: Conf. A1, A2, and B1; right panel: Conf. C, D, and B2.)

Configuration A2 demonstrates the improvement gained by allowing either source bus access to either FIFO. Configuration B1 was the most efficient of the six configurations for the QUEEN application. In this configuration, a total read buffer size of 13 would produce zero performance impact, a 35% reduction in read buffer size. It should be noted that configuration B1 assumes that simultaneous saves of S1 and S2 can be handled within the same cycle. If this latter assumption is invalid, Figure 16, configuration B2, shows that a performance impact of no less than 9.4% is incurred regardless of the read buffer size. The "leveling off" of B2 is due to the bottleneck at the single FIFO entry point and not the depth of the FIFO. The flat part of the curve shows the percentage of instructions requiring simultaneous saves of S1 and S2 in the QUEEN application.

Figure 16, configuration C, shows how a single-level dual queue placed between the source buses and the single FIFO can alleviate some of the bottleneck effects. The dual queue can absorb a single simultaneous save of S1 and S2, distributing the saves over multiple cycles. A nonzero minimum performance overhead is still present due to cases in which the dual queue has not emptied before the next simultaneous save occurs.

Figure 16, configuration D, shows the results of an improved queue structure which permits saves from either source bus into either queue. This configuration avoids stalls in some cases (e.g., when S2 must be saved while the queue dedicated to S2 in configuration C is full and the other queue is empty). Configuration D also has a nonzero minimum performance overhead, but gives better performance than configuration C.
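To make the save-port bottleneck concrete, here is a deliberately simplified, hypothetical cycle model of configurations B1, B2, C, and D; it is not the instruction-level simulator used in this study. Assumptions: the FIFO accepts `port_width` values per cycle (2 models split-cycle-save), the dual queue has one entry per source bus in C and two shared entries in D, older queued values are written first, and the read buffer depth is unlimited, so only save-bandwidth stalls are counted. All names and the example trace are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical simplified model of the save path into a single read-buffer FIFO.
# Each trace entry lists the source buses ("S1", "S2") whose values must be saved.
Trace = Sequence[Tuple[str, ...]]

# Assumed parameters for the single-FIFO configurations of Figure 15.
CONFIGS = {
    "B1": {"port_width": 2, "slots": 0, "shared": False},  # split-cycle-save, no queue
    "B2": {"port_width": 1, "slots": 0, "shared": False},  # no split-cycle-save, no queue
    "C": {"port_width": 1, "slots": 1, "shared": False},   # one queue entry per bus
    "D": {"port_width": 1, "slots": 2, "shared": True},    # either bus into either entry
}


def can_stage(staged: List[str], bus: str, slots: int, shared: bool) -> bool:
    if shared:
        return len(staged) < slots       # config D: any free entry
    return staged.count(bus) < slots     # config C: entry dedicated to this bus


def count_stalls(trace: Trace, port_width: int, slots: int, shared: bool) -> int:
    """Count stall cycles caused by limited FIFO write bandwidth (no overflow modeled)."""
    assert port_width >= 1
    staged: List[str] = []  # values waiting in the dual queue, oldest first
    stalls = 0
    for saves in trace:
        pending = list(saves)
        while True:
            order = staged + pending       # older queued values are written first
            rest = order[port_width:]      # values the FIFO port could not take
            staged, overflow = [], []
            for bus in rest:
                if can_stage(staged, bus, slots, shared):
                    staged.append(bus)
                else:
                    overflow.append(bus)
            if not overflow:
                break                      # the instruction proceeds
            pending = overflow             # stall one cycle to finish its saves
            stalls += 1
    return stalls


trace = [("S1", "S2"), ("S1", "S2"), ("S2",), ()]
for name, cfg in CONFIGS.items():
    print(name, count_stalls(trace, **cfg))  # prints B1 0, B2 2, C 1, D 0 (one per line)
```

In this made-up trace, B2 stalls once per simultaneous save, C stalls once because its S2 entry is still occupied while the S1 entry is free, and D avoids that stall by placing the second S2 value in the free entry.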
Table 4: Read buffer size evaluation summary.

                 RB_size          OH_level (%)
Program        A2     B1         A2       B1
QUEEN          14     12        1.66     1.36
WC             10      8        0.00     2.54
QSORT          16     15        2.28     0.94
CMP            12     11        0.00     0.00
GREP           10     10        0.18     0.18
PUZZLE         10      9        2.87     0.32
COMPRESS       12     12        2.87     1.12
LEX            12     12        2.73     1.55
YACC           16     15        1.07     0.00
CCCP           12     12        2.34     1.74
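As an arithmetic check on the size reductions quoted for Table 4, the RB_size columns can be converted into percent reductions relative to the maximum read buffer size of 2N = 20 (rollback distance N = 10). The values below are transcribed from Table 4; the names `RB_SIZE` and `reduction_stats` are illustrative.

```python
# RB_size values transcribed from Table 4 (entries); the maximum size is 2N = 20.
RB_SIZE = {
    "A2": {"QUEEN": 14, "WC": 10, "QSORT": 16, "CMP": 12, "GREP": 10,
           "PUZZLE": 10, "COMPRESS": 12, "LEX": 12, "YACC": 16, "CCCP": 12},
    "B1": {"QUEEN": 12, "WC": 8, "QSORT": 15, "CMP": 11, "GREP": 10,
           "PUZZLE": 9, "COMPRESS": 12, "LEX": 12, "YACC": 15, "CCCP": 12},
}
MAX_SIZE = 20


def reduction_stats(sizes):
    """Return (min, max, average) percent reduction relative to MAX_SIZE."""
    reductions = [100.0 * (MAX_SIZE - s) / MAX_SIZE for s in sizes.values()]
    return min(reductions), max(reductions), sum(reductions) / len(reductions)


for config, sizes in RB_SIZE.items():
    lo, hi, avg = reduction_stats(sizes)
    print(f"{config}: min {lo:.1f}%, max {hi:.1f}%, avg {avg:.1f}%")
# A2: min 20.0%, max 50.0%, avg 38.0%
# B1: min 25.0%, max 60.0%, avg 42.0%
```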
The simulation results for QUEEN show that configuration B1 is the most efficient. Without the ability to do split-cycle-saves, configuration D is the best of the single FIFO designs, resulting in a minimum performance overhead of 4.5%, and configuration A2 is the best of the dual FIFO designs, resulting in a 1.7% performance overhead with a read buffer size of 14. For configurations B1 and D, a total read buffer size of 13 is sufficient to maximize performance.

5.2.2 Evaluation of all applications

Results for the other nine application programs are similar to those for QUEEN [17]. The differences between the application programs are the points at which the curves "level off" (i.e., stabilize) and, in the case of configurations B1, B2, C, and D,⁸ the level at which the performance overhead stabilizes.

⁸ Two must be added to each read buffer size value in C and D to account for the queues.

Table 4 summarizes the measurements for the ten applications for the two most efficient configurations, A2 and B1. It is assumed for this study that a minimal performance overhead can be tolerated. Configuration A2 does not level off like configuration B1, and it does not rapidly approach zero like configuration B1. For this reason, and for a better comparison of read buffer size reduction, the comparisons are made at read buffer size values which produce low values of performance overhead. For each application and configuration, Table 4 gives the read buffer size value at which the performance overhead drops below 3%. This read buffer size value is referred to as RB_size, and the corresponding performance overhead value is referred to as OH_level.
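The RB_size/OH_level selection rule can be expressed as a small helper. This is a sketch under two assumptions: overhead curves are sampled at discrete read buffer sizes, and "drops below 3%" is taken to mean that the overhead stays below the threshold for every larger size measured. The function name `rb_size` and the example curve are hypothetical, not data from this study.

```python
from typing import Dict, Optional, Tuple


def rb_size(curve: Dict[int, float], threshold: float = 3.0) -> Optional[Tuple[int, float]]:
    """Return (RB_size, OH_level) for an overhead curve {buffer size: overhead %}.

    RB_size is the smallest measured size from which the overhead stays below
    `threshold` for all larger measured sizes; None if no such size exists.
    """
    best: Optional[Tuple[int, float]] = None
    # Walk from the largest size downward and stop at the first size that fails.
    for size in sorted(curve, reverse=True):
        if curve[size] >= threshold:
            break
        best = (size, curve[size])
    return best


example = {8: 12.0, 10: 6.5, 12: 2.1, 14: 0.4, 16: 0.0, 20: 0.0}  # made-up curve
print(rb_size(example))  # (12, 2.1)
```

Scanning downward from the largest size keeps a momentary dip below the threshold at a small size from being reported as RB_size when a larger size still exceeds it.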
5.2.3 Read buffer size requirement summary

It can be seen from Table 4 that the read buffer size requirement is application dependent: for configuration B1, RB_size ranges from 8 for WC to 15 for QSORT and YACC. For configuration A2, with no split-cycle-save assumption, a minimum of 20%, a maximum of 50%, and an average of 38.0% reduction in read buffer size was achieved. For configuration B1, given the split-cycle-save assumption, a minimum of 25%, a maximum of 60%, and an average of 42% reduction was achieved. Given the steepness of the B1 curve around the RB_size value, small changes in read buffer size can produce large changes in performance overhead, so care should be taken in the final selection of read buffer size in any given design.

The measurements show that two read buffer configurations were the most efficient: a dual FIFO with source bus access to either FIFO (configuration A2) and a single FIFO with the split-cycle-save capability (configuration B1). These two consistently out-performed the other four configurations. Our results indicate that a considerable reduction in read buffer size is achievable: up to a 55% reduction, with an average reduction of 39.5%.

6 Concluding Remarks
This paper has presented a compiler-assisted multiple instruction rollback scheme which combines compiler-driven data-flow manipulations with dedicated data redundancy hardware to remove data hazards that result from multiple instruction rollback. Experimental evaluation of the proposed compiler-assisted scheme with a maximum rollback distance of ten showed performance impacts of no more than 6.57% and an average impact of 1.80% over the eleven application programs studied. The performance evaluation indicates lower performance penalties than previous compiler-only approaches or comparable hardware-only approaches. Six read buffer configurations were studied to determine the minimum size requirement for general applications. It was found that a 55% read buffer size reduction is achievable, with an average reduction of 39.5%, but that the additional control logic needed to handle read buffer overflows may limit the overall hardware savings. Future research includes the application of compiler-assisted multiple instruction rollback recovery to super-scalar, VLIW, and parallel processing architectures. Evaluating compiler-assisted rollback recovery applied to speculative execution repair would include modifying the compiler transformations to operate in a super-scalar and VLIW environment.
Acknowledgements

The authors wish to thank C.-C. Jim Li for his help with the compiler aspects of this paper, and Scott Mahlke and William Chen for their invaluable assistance with the IMPACT compiler. We also express our thanks to Janak Patel for his contributions to this research.
References

[1] M. S. Pittler, D. M. Powers, and D. L. Schnabel, "System Development and Technology Aspects of the IBM 3081 Processor Complex," IBM J. Res. Dev., vol. 26, pp. 2-11, Jan. 1982.

[2] Y. Tamir and M. Tremblay, "High-Performance Fault-Tolerant VLSI Systems Using Micro Rollback," IEEE Trans. Comput., vol. 39, pp. 548-554, Apr. 1990.

[3] C.-C. J. Li, S.-K. Chen, W. K. Fuchs, and W.-M. W. Hwu, "Compiler-Assisted Multiple Instruction Retry," Tech. Rep. CRHC-91-31, Coordinated Science Laboratory, University of Illinois, May 1991.

[4] N. J. Alewine, S.-K. Chen, C.-C. J. Li, W. K. Fuchs, and W.-M. W. Hwu, "Branch Recovery with Compiler-Assisted Multiple Instruction Retry," in Proc. 22nd Int. Symp. Fault-Tolerant Comput., pp. 66-73, July 1992.

[5] L. Spainhower, J. Isenberg, R. Chillarege, and J. Berding, "Design for Fault-Tolerance in System ES/9000 Model 900," in Proc. 22nd Int. Symp. Fault-Tolerant Comput., pp. 38-47, July 1992.
[6] P. M. Kogge, K. T. Truong, D. A. Richard, and R. L. Schoenike, "Checkpoint Retry Mechanism." United States Patent, no. 4912707, Mar. 1990. Assignee: International Business Machines Corporation, Armonk, N.Y.

[7] Y. Tamir, M. Liang, T. Lai, and M. Tremblay, "The UCLA Mirror Processor: A Building Block for Self-Checking Self-Repairing Computing Nodes," in Proc. 21st Int. Symp. Fault-Tolerant Comput., pp. 178-185, June 1991.

[8] J. E. Smith and A. R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors," IEEE Trans. Comput., vol. 37, pp. 562-573, May 1988.

[9] M. L. Ciacelli, "Fault Handling on the IBM 4341 Processor," in Proc. 11th Int. Symp. Fault-Tolerant Comput., pp. 9-12, June 1981.

[10] W. F. Bruckert and R. E. Josephson, "Designing Reliability into the VAX 8600 System," Digital Tech. J. Digital Equip. Corp., vol. 1, no. 1, pp. 71-77, Aug. 1985.

[11] G. L. Hicks, D. Howe, Jr., and A. Zurla, Jr., "Instruction Retry Mechanism for a Data Processing System." United States Patent, no. 4044337, Aug. 1977. Assignee: International Business Machines Corporation, Armonk, N.Y.

[12] D. B. Fite, T. Fossum, and D. Manley, "Design Strategy for the VAX 9000 System," Digital Tech. J. Digital Equip. Corp., vol. 2, no. 4, pp. 13-24, Fall 1990.

[13] E. B. Eichelberger and T. W. Williams, "A Logic Design Structure for LSI Testability," in Proc. 14th Design Automation Conf., pp. 462-468, 1977.

[14] J. S. Liptay, "The ES/9000 High End Processor Design," IBM J. Res. Dev., vol. 36, no. 3, May 1992.

[15] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley, 1986.

[16] J. A. Bondy and U. Murty, Graph Theory with Applications. London, England: Macmillan Press Ltd., 1979.

[17] N. J. Alewine, Compiler-assisted Multiple Instruction Rollback Recovery using a Read Buffer. PhD thesis, Tech. Rep. CRHC-93-06, University of Illinois at Urbana-Champaign, 1993.

[18] P. Chang, W. Chen, N. Warter, and W.-M. W. Hwu, "IMPACT: An Architectural Framework for Multiple-Instruction-Issue Processors," in Proc. 18th Annu. Int. Symp. Comput. Architecture, pp. 266-275, May 1991.

[19] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann Publishers, Inc., 1990.