NASA Conference Publication 2130

Validation Methods Research for Fault-Tolerant Avionics and Control Systems
Working Group Meeting II

James W. Gault, Kishor S. Trivedi, and James B. Clary, Editors
Research Triangle Institute

Proceedings of a working group meeting held at Langley Research Center, Hampton, Virginia, October 3-4, 1979

National Aeronautics and Space Administration
Scientific and Technical Information Office
1980
PREFACE
To effectively address the problems of fault-tolerant avionics and control system validation, NASA-Langley Research Center has conceived and sponsored a series of working group meetings with the objective of identifying and addressing critical issues related to the validation process. The first working group meeting in this series, Working Group I, was held in March 1979. Working Group I provided a forum for the exchange of ideas on fault-tolerant avionics and control system validation. The state of the art in fault-tolerant computer validation was examined in order to begin the establishment of a framework for future discussions of validation research for fault-tolerant avionics and flight control systems.

The results of Working Group I and the evolution of the Avionics Integrated Research Laboratory (AIRLAB) by NASA-Langley Research Center provided impetus for a second working group meeting. The objective of Working Group II was to identify, beginning with the ideas provided by Working Group I, specific validation tasks which could benefit substantially from the existence of AIRLAB.

To provide an initial focus, validation issues specifically related to two fault-tolerant computers currently being designed and developed under the sponsorship of NASA-Langley Research Center, namely, SIFT and FTMP, were considered. Particular validation tasks for these computers were identified at a preliminary Working Group II meeting held at the Research Triangle Institute (RTI) in September. The tasks generated at this meeting served as a starting point for the larger Working Group II meeting held at NASA-Langley Research Center in October.

The activities of Working Group II during the two-day session in October centered around the previously defined validation tasks. These tasks were partitioned into three major categories:

1. Confirmation of System Reliability
2. Fault Processing Verification
3. Fault Processing Characterization

Working Group II attendees evaluated the preliminary proposed tasks in each of these areas and proposed additional tasks. The Working Group II meeting was conceived and sponsored by personnel at NASA-Langley Research Center, in particular Billy L. Dove and A. O. Lupton.
TABLE OF CONTENTS

PREFACE
1.0 INTRODUCTION AND OVERVIEW
    1.1 Motivation for the Problem
    1.2 Current Status of the Validation Process
    1.3 Fault-Tolerant Systems Technology Development
    1.4 Fault-Tolerant Systems Validation Technology Development
        1.4.1 Logical Proofs
        1.4.2 Analytical Models
        1.4.3 Experimental Testing
        1.4.4 Fault-Tolerant Systems Validation Technology Development Summary
    1.5 Report Scope and Organization
2.0 TOWARDS A VALIDATION METHODOLOGY FOR FAULT-TOLERANT AVIONICS COMPUTERS
    2.1 Traditional Methods
        2.1.1 Reliability Validation of a Simplex System
        2.1.2 Reliability Validation of Redundant Systems
        2.1.3 Inadequacy of Traditional Methods
    2.2 Proposed Validation Methodology
        2.2.1 Discussion Leading to the Proposed Methodology
        2.2.2 Details of the Proposed Methodology
    2.3 Future Work
3.0 PRELIMINARY TASKS FOR SIFT/FTMP RELIABILITY VALIDATION
    3.1 Confirmation of System Reliability
    3.2
    3.3
    3.4 Other Tasks
4.0 SUMMARY
5.0 REFERENCES
APPENDIX I. DEFINITIONS AND REFERENCE CODE
APPENDIX II. WORKING GROUP II TASK DESCRIPTIONS
APPENDIX III. TASK RATING RESULTS FROM WORKING GROUP II
APPENDIX IV. WORKING GROUP II ATTENDEES
TABLES
FIGURES
1.0 INTRODUCTION AND OVERVIEW

1.1 Motivation for the Problem
In 1979 commercial air carriers in the United States paid, on the average, an increase of 70% in their annual fuel bill over the previous year. This statistic highlights the importance of developing energy efficient aircraft. Under the sponsorship of NASA-Langley, the ACEE Energy Efficient Transport Technology effort has been established to develop the technology required to support this effort.

New low-drag aerodynamic structures show great promise for lowering fuel consumption. However, as a consequence of their structure, wing and tail designs show significant increases in loading and decreases in stability that must be alleviated by the introduction of sensors, actuators, and digital electronics to dynamically alter control surfaces in flight. In certain future designs this active control system will be crucial to flight; its use will not be optional and no manual override or backup will be employed. Thus, the development of high integrity, active control systems technology is an integral part of the present direction of the Energy Efficient Transport Technology effort. The validation process must, of necessity, be a part of this effort.

1.2 Current Status of the Validation Process
While a great deal of work has gone on in the area of fault-tolerant system design, the problem of validating ultra-reliable systems is just beginning to be addressed. The current state of the art is given in summary form in Sections 1.4.1 through 1.4.3 of this report.

1.3 Fault-Tolerant Systems Technology Development
The development of active control systems has many facets, and the scope of this report is limited to the development of the technology required to support the fault-tolerant systems aspect of active controls. Much of the work reported takes the even narrower view of technology development for fault-tolerant computer systems. This is a reasonable limitation initially, and it is anticipated that many useful extensions can readily be made later.

The development of fault-tolerant systems technology involves design, assessment, validation, and maintenance of an experience base for candidate systems. NASA-Langley has, for several years, supported the design of two representative fault-tolerant computer systems - SIFT and FTMP. These systems represent distinct approaches to the implementation of a system which must "provide flight crucial functions with a failure probability of less than 10^-9 at 10 hours."

A prototype version of each of these systems will be delivered to NASA-Langley in 1980 for assessment and validation. The knowledge gained from these systems will be used as an experience base to be retained and disseminated for understanding in the development of the next generation of systems. It is envisioned that the assessment, validation, and experience base activities will be supported by a specialized facility, the Avionics Integrated Research Laboratory, which has been defined by NASA-Langley and will provide the capability to:

1) develop the technology and methodology required to integrate avionic and control functions for aircraft of the 1990's and beyond,

2) evaluate and study candidate system architectures,

3) validate implementation technologies, and

4) establish a data base of performance, reliability, and experiment statistics.

1.4 Fault-Tolerant Systems Validation Technology Development
One of the most important and challenging aspects of fault-tolerant systems development is the validation process. The validation process comprises the activities required to ensure the agreement of the system realization with the system specification. This effort is significant and requires the development of technology in its own right.

Validation is not a new problem. Validation techniques exist and have been applied to many digital electronic avionics and control systems presently in use, such as the F-111 redundant Mark II avionics, the B-1 redundant avionics, the F-18 quad redundant digital flight control, the F-8 triplex digital fly-by-wire system, and the Space Shuttle quad redundant avionics and control backup system. Validation activities have traditionally been a part of the digital system life cycle shown in Figure 1.1. Applied to avionics systems (as shown in Figure 1.2), this process includes some very familiar, concrete, and trusted tasks, such as bench tests, "hot-bench" tests, ground tests in the aircraft, and flight tests in experimental configurations.

None of the tasks of the past can be summarily dismissed as inappropriate to the problem at hand, but rather a more precise, systematic, disciplined, and extensive approach must be evolved to incorporate and augment existing techniques. The properties that distinguish ultra-reliable systems validation are summarized in Table 1.1.

The state of the art of validation was discussed as a portion of the agenda of Working Group I (ref. 1), an earlier NASA-Langley-sponsored effort. The results of that discussion are summarized here as preliminary to the framework of validation presented in Section 2.0 of this report.

This recap is presented in a form consistent with the models of Section 2.0 and, in particular, the discussion which follows is keyed to Figure 2.5 (The Proposed Validation Taxonomy). Table 1.2 shows the three primary categories used in Working Group I to discuss the state of the art of validation and relates them to the three categories used in this report.

1.4.1 Logical Proofs
The theory of proving is being adequately addressed and is not the present limitation. Accurate and formal statements of specifications and environmental assumptions suitable for proof techniques may take weeks to write. Proof techniques are most often applied to the validation of software at the upper level of its hierarchical description. However, some use of proving has been made with hardware when appropriate formal language descriptions exist.

Automatic proof generating methods exist for certain restricted classes of problems. More powerful interactive proving techniques are also available but require highly skilled personnel and a large commitment of computer resources. Work on proof techniques is expected to continue and will be useful as a tool in the validation process.

1.4.2 Analytical Models
The term "analytical models" is used here to define that category of activities in the validation process concerned with analyzing or predicting system performance or reliability. Proof techniques and simulation/emulation may rightfully fit within this definition, but are categorized separately and are specifically excluded from this discussion. Of interest here is the definition of faults (a fault model) and the analysis or estimation of a system's response (a system model) to this fault model.

Work is progressing to unify performance and reliability considerations into a single model, creating what is called a "performability" model. The objective is to estimate a system's ability to perform in the presence of faults. The major enhancement provided by this model is the definition of "ability to perform" in terms of acceptable levels of degradation, rather than the more common, restrictive, and by now unreasonable, pass and fail levels.

The CARE III reliability estimation program takes a more traditional point of view:

"The object of CARE III is the estimation of reliability for fault-tolerant avionic systems with failure probabilities of less than 10^-9 at 10 hours."
In general, the problem of estimating system failure rate and confidence interval (stated as: the system has a failure rate of no more than X with a confidence interval of Y) from component failure rates and confidence intervals is unsolved. In addition, the failure probability of less than 10^-9 is so stringent that everything is important. Assumptions, model approximations, and computational round-off errors all seriously impact the credibility of the results obtained.
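The effect of round-off at these probability levels can be illustrated with a minimal Python sketch. It assumes a constant failure rate `lam`, an assumption made here only for illustration, and compares two algebraically equivalent ways of computing the failure probability 1 - exp(-lam * t). The rate used in the example is hypothetical and stands for one very small term in a larger model.

```python
import math

def failure_prob_naive(lam, t):
    """Failure probability 1 - exp(-lam * t), computed by direct subtraction."""
    return 1.0 - math.exp(-lam * t)

def failure_prob_stable(lam, t):
    """Same quantity via expm1, which avoids cancellation when lam * t is tiny."""
    return -math.expm1(-lam * t)

if __name__ == "__main__":
    lam = 1e-18   # hypothetical per-hour rate of one rare fault mode
    t = 10.0      # mission time in hours
    print(failure_prob_naive(lam, t))    # the subtraction loses the term entirely
    print(failure_prob_stable(lam, t))   # the small probability is retained
```

When many such terms are summed to reach a total near 10^-9, the choice of formulation can decide whether the rare contributions are counted at all.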
CARE III proposes to accommodate a much more realistic fault model than previous programs, including:

1) time-dependent failure rates,
2) design errors, and
3) intermittent and transient faults.

The system model also accommodates a more meaningful set of fault handling mechanisms, including the discrimination of:

1) time from fault occurrence to error occurrence,
2) time from error occurrence to error detection,
3) time from error detection to fault isolation, and
4) time from fault isolation to system recovery.
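The four intervals distinguished above can be pictured as consecutive segments on a single timeline. The following Python sketch is illustrative only: the class name `FaultTimeline`, its field names, and the sample times are hypothetical and are not taken from CARE III.

```python
from dataclasses import dataclass

@dataclass
class FaultTimeline:
    """Timestamps (in seconds) of the stages in handling one fault."""
    fault_occurrence: float
    error_occurrence: float
    error_detection: float
    fault_isolation: float
    system_recovery: float

    def intervals(self):
        """Return the four intervals discriminated by the system model."""
        return {
            "fault_to_error": self.error_occurrence - self.fault_occurrence,
            "error_to_detection": self.error_detection - self.error_occurrence,
            "detection_to_isolation": self.fault_isolation - self.error_detection,
            "isolation_to_recovery": self.system_recovery - self.fault_isolation,
        }

    def total_exposure(self):
        """Time from fault occurrence until recovery completes."""
        return self.system_recovery - self.fault_occurrence

if __name__ == "__main__":
    timeline = FaultTimeline(0.0, 2.0, 3.0, 7.0, 8.0)
    print(timeline.intervals())
    print(timeline.total_exposure())
```

Keeping the intervals separate, rather than recording only the total, is what allows a model to treat latent faults and slow recoveries differently.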
While recent improvements in system modeling are significant, many difficulties remain. Some of the most important issues in the modeling of fault-tolerant systems with ultra-reliability requirements are:

1. Handling large state spaces which result from the complex systems and fault models.

2. Fault latency - this is an important issue since combinations of faults may be much more damaging than any fault alone. A system's response to faults occurring before recovery from the last fault is complete must be studied.

3. Coverage - it is important to be able to assess the degree to which the fault model is in agreement with observed reality.

4. Unexpected events - some strategy is required for dealing with events not predicted. In these systems unpredicted events are not insignificant.

5. Failure statistics - a continued effort is required to obtain statistics concerning actual component and system failure mechanisms and rates.

6. Fault descriptions - the current understanding and description of faults is at the circuit and logic level. It is important to develop and understand how these low-level fault mechanisms can be faithfully modeled in terms of higher level system behavior.
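The fault latency issue (item 2) can be made concrete with a small calculation. The sketch assumes, purely for illustration, that faults arrive as a Poisson process with constant rate `lam` and that recovery takes a fixed time `recovery_time`; neither assumption comes from this report, and the numbers in the example are hypothetical.

```python
import math

def prob_fault_during_recovery(lam, recovery_time):
    """Probability that at least one further fault arrives before recovery ends.

    Assumes Poisson fault arrivals with rate lam (per hour) and a fixed
    recovery time (in hours).
    """
    return -math.expm1(-lam * recovery_time)

if __name__ == "__main__":
    lam = 1e-4                        # hypothetical fault rate per hour
    for seconds in (0.1, 1.0, 10.0):  # hypothetical recovery times
        hours = seconds / 3600.0
        print(seconds, prob_fault_during_recovery(lam, hours))
```

Under these assumptions the probability grows almost linearly with recovery time, so a slow recovery directly enlarges the window in which a damaging combination of faults can occur.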
1.4.3 Experimental Testing
In this subsection, the testing of a physical device and the testing of a simulated or emulated version of the same device are considered. Testing is the single most frequently applied tool of validation for both hardware and software. It is defined as the process of applying a set of inputs (selected to reveal faults belonging to a predefined fault model) to a unit and then comparing the results produced to a reference of the good response. There is no coherent theory of testing. That is, in general, we cannot, for an arbitrarily defined unit and fault model, define precisely how to generate or evaluate test data to ensure a fault-free unit. We are left with a great many ad hoc approaches for treating restricted cases. Even when a unit passes a test, we can only make weak inferences concerning the unit's true condition.

The cost of testing depends upon the cost of: 1) generating input test patterns, 2) storing these patterns as references, and 3) the length of time required to run the test. The practical limitation of cost is one of the primary deficiencies of testing, since even relatively simple networks may require enormous numbers of test patterns. "Standard tests for commercial LSI devices rarely result in greater than 95% coverage (with coverage here meaning the percentage of nodes in the circuit that truly change state during the course of the test), because the cost of higher coverage is prohibitive. Similar statements can be made with respect to software testing procedures." Current industrial practice is to acceptance-test components at 3% acceptance quality level (AQL) at 95% confidence for stuck-at faults.

Another serious limitation to testing is our inability to describe faults at an abstract level, while retaining a proper abstraction of their true physical behavior. At the present time, the most frequently applied fault model is the permanent stuck node model. This is not always a useful or accurate model and there is a need to consider a more realistic fault model. This includes the need to identify equivalent classes of faults in order to reduce the large number of cases which need to be considered. Limiting consideration to solid failures is no longer practical. There is a pressing need for a meaningful and tractable model for intermittent and transient behavior.

Physical fault insertion has long been used to provide information and calibration for input test pattern and diagnostic program coverage evaluation. If the trend continues toward higher levels of integration and if physical fault insertion is to continue as an applicable technique, then understanding low-level fault mechanisms at the interface of higher level models is imperative. Simulation/emulation have long been used as tools to attack the problem of evaluating fault coverage capability. Simulation/emulation models typically take much longer to run than similar testing on physical units and, therefore, suffer more severely from the practical limitation of long execution times than other forms of testing.

The most potentially fruitful work is in the area of design for testability. The idea is that the testing problem, in the light of increased complexity and integration, cannot be solved without incorporating testing features into the original design. There are no theoretical results which significantly impact design and the same is true for testability design. However, in the literature there are a fair number of detailed suggestions and "tricks" that can ease problems if they are conscientiously applied.
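The permanent stuck node model and the evaluation of test patterns by simulated fault insertion can be sketched in a few lines of Python. The three-input circuit, the node names, and the pattern sets below are hypothetical and chosen only for illustration. Coverage is counted here as the fraction of modeled single stuck-at faults that a pattern set detects, which is a common usage but differs from the node-toggle definition quoted above.

```python
from itertools import product

NODES = ("a", "b", "c", "n1", "out")

def simulate(a, b, c, fault=None):
    """Evaluate out = (a AND b) OR c; fault is (node, stuck_value) or None."""
    def force(name, value):
        # A stuck node ignores its computed value.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value
    a = force("a", a)
    b = force("b", b)
    c = force("c", c)
    n1 = force("n1", a & b)
    return force("out", n1 | c)

def fault_coverage(patterns):
    """Fraction of single stuck-at faults detected by the given patterns."""
    faults = [(node, value) for node in NODES for value in (0, 1)]
    detected = [
        f for f in faults
        # A fault is detected when some pattern makes the faulty output
        # differ from the reference (fault-free) output.
        if any(simulate(*p, fault=f) != simulate(*p) for p in patterns)
    ]
    return len(detected) / len(faults)

if __name__ == "__main__":
    exhaustive = list(product((0, 1), repeat=3))   # 2**3 input patterns
    reduced = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
    print(fault_coverage([(1, 1, 0)]))
    print(fault_coverage(reduced))
    print(fault_coverage(exhaustive))
```

For this tiny circuit a well-chosen half of the exhaustive patterns detects every modeled fault. The exhaustive set doubles with each added input, which is one way to see why test cost becomes the practical limit for larger networks.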
In summary, there are some useful techniques for finding test input patterns for reasonably small networks when the fault model used is the solid stuck node type. Testing is widely used, primarily because we have some intuition in its use and because it is a very concrete activity. Testing alone, however, provides a very weak basis for systems validation when ultra-reliability is required.

1.4.4 Fault-Tolerant Systems Validation Technology Development Summary
The preceding discussion presents more questions than it answers, and rightly so, since many of the real issues are being clarified or defined for the very first time. There is little work available that deals with defining faults in a meaningful way. No treatment is offered for verifying that specifications are fault-free, and very little help is available for dealing with design faults. In addition, it is clear how very important software failures are, and yet there is only the smallest beginning being made to develop a model for these failures. Designing with verification in mind may be one of the most fruitful avenues for work, and results have begun to appear in the literature. Effort will be required on many different fronts if one is to field the systems foreseen.

The presentation in Section 2.0 creates a model that can be used to plan future developments. This model unifies many of the disjoint concerns briefly discussed here and gives a framework for better understanding.

If the validation process for the next generation of systems is not different in its basic definition or intent, then how will it be different? The answer can be framed at two levels. At the most primary level, the distinction is a consequence of the significant increase in the reliability requirement specification.

A typical reliability requirement for a present generation system is:

    a probability of catastrophic failure of 10^-6 at 90 minutes

A typical reliability requirement for the next generation of systems is:

    a probability of catastrophic failure of 10^-9 at 10 hours
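To compare a requirement of 10^-6 at 90 minutes with one of 10^-9 at 10 hours on a common scale, each can be converted to an equivalent constant failure rate. The Python sketch below assumes an exponential failure law, which is an assumption made here for illustration and not a statement from this report.

```python
import math

def equivalent_rate_per_hour(failure_prob, mission_hours):
    """Constant rate lam such that 1 - exp(-lam * t) equals failure_prob."""
    return -math.log1p(-failure_prob) / mission_hours

if __name__ == "__main__":
    present = equivalent_rate_per_hour(1e-6, 1.5)    # 10^-6 at 90 minutes
    future = equivalent_rate_per_hour(1e-9, 10.0)    # 10^-9 at 10 hours
    print(present, future, present / future)
```

Under this assumption the next generation requirement corresponds to a failure rate roughly 6,700 times lower per hour than the present generation requirement, even though the probabilities themselves differ by only a factor of 1,000.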
A second level of distinction in the complexity of the validation process is a consequence of this increase in the reliability requirement; that is, the realizations of the next generation of systems will be significantly more complex than existing systems. Present systems typically employ from one to four processors in a basically static configuration. The next generation systems will employ many more processors and dynamic reconfiguration strategies, allowing a wide variety of operational configurations for normal, as well as faulty, conditions.

These two attributes, a demanding reliability specification and the attendant system realization complexity, have a tremendous impact and compound the importance of the validation process. The use of life testing is out of the question, since any real failure is an extremely rare event. The use of testing for induced failures as a strategy is also inadequate because no test can be designed for events that are unforeseen. In addition, the criticality of the intended application, passenger carrying commercial aviation, creates a strong desire for system validation at a very high level of confidence. This desire, coupled with the difficulty of the validation task, illuminates the importance of developing technology for fault-tolerant flight crucial digital electronic systems.
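A rough calculation indicates why life testing is out of the question. Assuming a constant failure rate (an illustrative assumption), a test that accumulates T unit-hours with no failures supports the claim that the rate is below -ln(1 - C) / T at confidence C. Inverting this relation gives the failure-free test time needed for a given rate bound. The fleet size in the example is hypothetical.

```python
import math

HOURS_PER_YEAR = 8766  # average year length, including leap years

def zero_failure_test_hours(rate_bound, confidence):
    """Unit-hours of failure-free testing needed to claim rate <= rate_bound."""
    return -math.log(1.0 - confidence) / rate_bound

if __name__ == "__main__":
    # A rate of 1e-10 per hour roughly matches 10^-9 at 10 hours.
    hours = zero_failure_test_hours(1e-10, 0.95)
    years_with_1000_units = hours / (1000 * HOURS_PER_YEAR)
    print(hours, years_with_1000_units)
```

Under these assumptions about 3 x 10^10 failure-free unit-hours are needed, which would occupy 1,000 units running continuously for more than three thousand years.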
1.5 Report Scope and Organization
Working Group I identified general fault-tolerant avionics and control systems validation issues. Working Group II focused on specific validation tasks and established a framework for further research. This document draws upon the raw information produced at these working groups and sets as objectives the presentation of:

1. a general framework for the validation of ultra-reliable fault-tolerant digital electronic systems,

2. a set of specific tasks for the validation of the first representative ultra-reliable fault-tolerant computer systems - SIFT and FTMP, and

3. a set of research tasks to support an ongoing effort in the development of technology for fault-tolerant systems validation.
S e c t i o n 2.0 r e v i e w st h ee v a l u a t i o no ft h ev a l i d a t i o np r o c e s s and p r e s e n t s a generalframework f o r Val i d a t i o n t e c h n o l o g y r e s u l t i n g f r o m t h e W o r k i n g Group I 1 meeting.Thisgeneral model i s t h e framework w i t h i nw h i c ht h es p e c i f i c t a s k sf o r S I F T and FTMP v a l i d a t i o na r e summarized i n S e c t i o n 3.0. Throughout t h ep r o c e s so fd e f i n i n g and o r d e r i n g t h e Val i d a t i o n t a s k s , it was c l e a r t h a t a number o f i m p o r t a n t r e s e a r c h p r o j e c t s a r e r e q u i r e d t o s u p p o r t t h i s t e c h n o l o g y . S e c t i o n 4.0 i d e n t i f i e s r e s e a r c h t a s k s i n t w o m a j o r c a t e g o r i e s - t h o s e i n supp o r t o f v a l i d a t i o n and t h o s e i n s u p p o r t o f f a u l t - t o l e r a n t c o m p u t i n g , and recommends s p e c i f i c e f f o r t s f o r t h e f u t u r e d e v e l o p m e n t o f t e c h n o l o g y r e q u i r e d t o s u p p o r t Val i d a t i on o f f a u l t - t o 1 e r a n t systems which w i 11be p a r t o f t h e a c t i v ec o n t r o ls t r u c t u r eo fe n e r g ye f f i c i e n ta i r c r a f to ft h ef u t u r e . The Appendices o ft h i sr e p o r ti n c l u d e :( I )d e f i n i t i o n so ft e r m s ,( 1 1 )t a s kd e s c r i p t i o n s g e n e r a t e d byWorkingGroup I 1 , ( I 11) t h e r e s u l t s o f t a s k r a t i n g s made by t h ew o r k i n gg r o u pp a r t i c i p a n t s , and ( I V ) a l i s t o f WorkingGroup 11 attendees.
2.0 TOWARDS A VALIDATION METHODOLOGY FOR FAULT-TOLERANT AVIONICS COMPUTERS
2.1 Traditional Methods
A traditional approach to reliability validation is the life testing method in which one takes n statistically identical copies of the system under test (SUT) and terminates the test after r (1 ≤ r ≤ n) systems have failed. Using the accumulated time on test T_r, one can derive a point estimate and confidence intervals for the mean life, or MTTF, of the system. These statistical techniques also allow one to calculate confidence intervals for system reliability for any given mission time. For details of the statistical techniques used in life testing, see ref. 2. It should be clear that the accumulated time on test T_r increases as the reliability, or MTTF, of the SUT increases. For fixed r and n, this implies that the width of the confidence intervals for MTTF, or the reliability of SUT, increases with T_r. In other words, if one desires a fixed width of the confidence interval, one has to increase the number n of systems under test. It follows that the number of systems required to be put under test increases monotonically with the reliability of the system being tested. Furthermore, the validation problem is compounded because the cost of an individual copy of the system also increases remarkably with its reliability. Thus, the cost of validation increases more than proportionately with the reliability of the system under test (see Figure 2.1).
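The life test estimates described above can be illustrated numerically. The following is a minimal sketch, not the procedure of ref. 2: it assumes exponentially distributed lifetimes and a test stopped at the r-th failure, in which case 2*T_r/MTTF is chi-square distributed with 2r degrees of freedom. The function names and the sample data are assumptions introduced for illustration only.

```python
import math


def chi2_cdf_even(x, dof):
    """CDF of a chi-square variable with an even number of degrees of freedom.

    For dof = 2k the CDF has the closed form
    1 - exp(-x/2) * sum_{j<k} (x/2)^j / j!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, dof):
    """Invert chi2_cdf_even by bisection (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def life_test_estimate(failure_times, n, confidence=0.90):
    """Point estimate and two-sided confidence interval for the MTTF.

    failure_times holds the r observed failure times out of n units;
    the test is assumed to stop at the r-th failure (exponential lifetimes).
    """
    times = sorted(failure_times)
    r = len(times)
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    # Failed units contribute their lifetimes; each of the n - r survivors
    # contributes the time at which the test was stopped.
    t_r = sum(times) + (n - r) * times[-1]
    alpha = 1.0 - confidence
    lower = 2.0 * t_r / chi2_quantile_even(1.0 - alpha / 2.0, 2 * r)
    upper = 2.0 * t_r / chi2_quantile_even(alpha / 2.0, 2 * r)
    return t_r, t_r / r, (lower, upper)


# Hypothetical data: 5 units on test, stopped after the third failure (hours).
T_R, MTTF_HAT, MTTF_INTERVAL = life_test_estimate([100.0, 200.0, 300.0], n=5)
```

Both interval bounds are proportional to T_r, so for fixed r and n the interval widens as the systems under test become more reliable, which is the effect noted in the text.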
2.1.1 Reliability Validation of a Simplex System
With this information in mind, let us consider the reliability validation of conventional simplex (nonredundant) systems. Such systems are characterized by relatively low levels of reliability and, hence, the cost of validation is within reason. If one assumes that the time to failure of the system is exponentially distributed with the failure rate λ (or MTTF = 1/λ), the system can be modeled as a two-state Markov chain as shown in Figure 2.2. Note that the state labeled 1 implies that the system is working properly, while state 0 indicates the system has malfunctioned. The system is initialized to state 1 and, after a random interval of time, ends up in the failure state 0. For such a conventional simplex system, the validation process consists of only two steps:
1) obtaining a point estimate and confidence intervals for the failure rate λ (or the MTTF, or the system reliability for a specified duration), and
2) testing the assumption of exponentially distributed lifetimes.
Both of these steps are well within the realm of statistical life testing techniques.
2.1.2 Reliability Validation of Redundant Systems
Next, let us consider a redundant system designed to provide greater levels of reliability, e.g., a two-unit standby sparing system. One unit is placed into operation, while the other unit is kept in a standby status. For the purpose of reliability analysis, the system can be modeled by a three-state Markov chain as shown in Figure 2.3. State i implies that i (= 0, 1, 2) units are in proper working order. λ is the failure rate of an individual unit and c is the coverage parameter associated with the reconfiguration mechanism. The starting state of the system is 2 and the failure state is 0. If a fault occurs in state 2 and it is covered, then the system goes to state 1. If a failure occurs in state 1, then the system as a whole ultimately fails. It should be noted that the failure of a single unit does not necessarily imply system failure, and that for a covered fault, the system goes through two states before reaching the failure state. This standby unit feature of a redundant system increases system reliability over a simplex system consisting of only one unit.
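A three-state chain of this kind can be solved numerically. The sketch below is illustrative only: since Figure 2.3 is not reproduced here, the arc labels are an assumption (a standby unit that cannot fail while idle, so that state 2 goes to state 1 at rate c*λ and to state 0 at rate (1 - c)*λ, and state 1 goes to state 0 at rate λ). The function names and the numerical values are likewise hypothetical.

```python
import math


def standby_state_probabilities(lam, c, t, steps=1000):
    """Integrate the state equations of the three-state chain with RK4.

    Assumed transition rates (not taken from Figure 2.3):
      2 -> 1 at c * lam        (covered fault)
      2 -> 0 at (1 - c) * lam  (uncovered fault)
      1 -> 0 at lam
    Returns (P2, P1, P0) at time t, starting from state 2.
    """
    def deriv(p):
        p2, p1, _ = p
        return (-lam * p2,
                c * lam * p2 - lam * p1,
                (1.0 - c) * lam * p2 + lam * p1)

    p = (1.0, 0.0, 0.0)
    h = t / steps
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv(tuple(x + 0.5 * h * k for x, k in zip(p, k1)))
        k3 = deriv(tuple(x + 0.5 * h * k for x, k in zip(p, k2)))
        k4 = deriv(tuple(x + h * k for x, k in zip(p, k3)))
        p = tuple(x + h / 6.0 * (a + 2.0 * b + 2.0 * d + e)
                  for x, a, b, d, e in zip(p, k1, k2, k3, k4))
    return p


def standby_reliability(lam, c, t):
    """Probability that the system is in a working state (2 or 1) at time t."""
    p2, p1, _ = standby_state_probabilities(lam, c, t)
    return p2 + p1


def simplex_reliability(lam, t):
    """Two-state model of a single unit: R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


# Hypothetical values: 1e-4 failures per hour, 99% coverage, 10-hour mission.
R_STANDBY = standby_reliability(1e-4, 0.99, 10.0)
R_SIMPLEX = simplex_reliability(1e-4, 10.0)
```

Under these assumptions, setting c = 0 reduces the standby model to the simplex model, which gives a simple consistency check on the state equations.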
Two methods for reliability estimation (or validation) are presently available. One method uses conventional life testing techniques which treat the system as a black box, disregarding its internal structure. In particular, the reliability model of Figure 2.3 is not used. Because of the increased reliability of the system, the cost of validation increases (see Figure 2.1). The second validation method utilizes the reliability model of Figure 2.3. To evaluate system reliability using this model, one needs the values of the failure rate λ of an individual unit and coverage c of the reconfiguration mechanism. These parameters (and their confidence intervals) are to be estimated by conducting a life test on these units. (Note the distinction between the life test of a unit and the life test of a standby sparing system composed of these units. In particular, the cost of the former is likely to be much less than that of the latter.) To estimate the coverage parameter, fault-insertion experiments are used.
2.1.3 Inadequacy of Traditional Methods
To achieve ultra-high reliability, extensive use of standby sparing and automatic reconfiguration is made to increase the probability that a system will pass through many "good" states before finally reaching the failure state. Because of the high level of redundancy, system reliability is pushed to hitherto unachievable levels. However, applying traditional life testing techniques implies unreasonably high validation costs (see Figure 2.1). Due to the high cost of system design and construction, it is usually difficult to have available the number of copies needed to obtain statistically significant results from life testing. For example, only one copy of each of the SIFT and FTMP systems is under construction. If one were to use a life test for reliability validation, then the sample size is n = 1. Since the expected time until the first (and only) system failure would be rather long, on the order of 10^8 hours, it is clearly infeasible to conduct such a test from the standpoint of time needed. Furthermore, even if one completed such a test hypothetically, statistical confidence in the results of the test would be negligible due to the small sample size, n = r = 1.
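Both points of this argument can be checked with simple arithmetic. The sketch below assumes exponentially distributed lifetimes, so that with a single failure the quantity 2*T/MTTF is chi-square distributed with 2 degrees of freedom, whose quantile function is -2*ln(1 - p). The function names and the 90% confidence level are illustrative assumptions.

```python
import math

HOURS_PER_YEAR = 24 * 365.25


def years_on_test(mttf_hours):
    """Expected duration, in years, of a test that waits for one failure."""
    return mttf_hours / HOURS_PER_YEAR


def single_failure_interval_ratio(confidence=0.90):
    """Ratio of upper to lower confidence bound on the MTTF when n = r = 1.

    The interval is (2T / q_hi, 2T / q_lo), where q_hi and q_lo are the
    chi-square(2) quantiles at 1 - alpha/2 and alpha/2, so the ratio of the
    bounds does not depend on the observed time T.
    """
    alpha = 1.0 - confidence
    q_hi = -2.0 * math.log(alpha / 2.0)
    q_lo = -2.0 * math.log(1.0 - alpha / 2.0)
    return q_hi / q_lo


YEARS = years_on_test(1e8)               # roughly 11,400 years
RATIO = single_failure_interval_ratio()  # bounds differ by a factor of about 58
```

Under these assumptions, a single observed failure fixes the MTTF only to within a factor of several tens at 90% confidence, which is consistent with the negligible confidence noted in the text.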
2.2 Proposed Validation Methodology
2.2.1 Discussion Leading to the Proposed Methodology
The above discussion clearly implies that the traditional life testing approach must be abandoned when validating ultra-high reliability systems. The problem of validating such systems is relatively unexplored, as pointed out by Hillier and Lieberman (ref. 3): "Statistical estimation of component [or subsystem] reliability is well in hand, but estimation of system reliability from component data is virtually an unsolved problem." In searching for an alternative technique to traditional life testing, we note that one shortcoming of the traditional method is its "black box" approach, which ignores the internal structure of the SUT and is based on experimental testing followed by statistical analysis. With ultra-high reliability systems, one cannot afford to ignore the internal structure of the system. It is believed that a validation methodology for such systems must be based on a judicious combination of experimental testing, analytic modeling, and logical proofs.
Having decided that the internal structure of the SUT must be an integral part of the validation process, one has to determine further the level at which the system structure will be considered. Green and Bourne (ref. 4) suggest that the system should be broken down hierarchically until a level is reached where all the necessary measurement data is either available or can be collected. It is felt that an analytical reliability model (e.g., a Markov model) provides just enough detail of the system structure for our purposes. Such models enable one to abstract the states and the state transitions of a complex system into a relatively small and, hence, manageable graph structure. It is also believed that data can be collected to estimate the parameters characterizing such a model. Thus, the selection of a Markov reliability model to drive the validation process is consistent with the suggestions by Green and Bourne. It should be noted that Markov models have been used for system reliability prediction for a long time; however, using such models for reliability validation is believed to be new.
For the purpose of this exposition, let us assume that the Markov model of Figure 2.3 is a proper abstraction of the reliability behavior of the system under test. This model can be characterized by two parameters: 1) the failure rate λ of an individual unit, and 2) the coverage parameter c of the dynamic reconfiguration mechanism. In order to predict system reliability, these two parameters must be known. Therefore, experimental data (presumably from extensive life testing) on the failures of the individual unit must be obtained. Applying statistical techniques, a point estimate and confidence intervals on the failure rate λ can then be obtained. Similarly, fault injection simulation experiments must be conducted to make inferences on the coverage parameter c.
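As an illustration of the kind of inference involved, a fault injection campaign can be treated as a sequence of independent trials, each of which is either recovered (covered) or not. The sketch below makes that assumption and uses the Wilson score interval, which is one common choice for a binomial proportion and is not prescribed by this report. The function name and the counts are hypothetical.

```python
import math


def coverage_interval(injected, recovered, z=1.959964):
    """Point estimate and Wilson score interval for the coverage c.

    injected  : number of faults injected (independent trials, an assumption)
    recovered : number of injected faults that were handled correctly
    z         : standard normal quantile (default gives about 95% confidence)
    """
    if injected <= 0 or not 0 <= recovered <= injected:
        raise ValueError("need 0 <= recovered <= injected and injected > 0")
    p = recovered / injected
    z2 = z * z
    denom = 1.0 + z2 / injected
    center = (p + z2 / (2.0 * injected)) / denom
    half = z * math.sqrt(p * (1.0 - p) / injected
                         + z2 / (4.0 * injected ** 2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical campaign: 1000 injected faults, 990 recovered.
C_HAT, C_LOW, C_HIGH = coverage_interval(1000, 990)
```

With these hypothetical counts the interval spans roughly 0.982 to 0.995, which suggests how many injections may be needed before coverage values close to 1 can be distinguished from one another.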
More complex systems (and hence more complex models) require more parameters to be estimated. These parameter types can be placed into two classes:
1) Parameters associated with the fault-occurrence behavior of the subsystems, e.g., the failure rate λ in Figure 2.3.
2) Parameters associated with the fault-handling (or fault-processing) behavior of the system, e.g., the coverage parameter c in Figure 2.3.
If one examines a complex fault-tolerant system such as SIFT (ref. 5) or FTMP (ref. 6) (see Figure 2.4 for a reliability model), it appears that fault-handling of the system is characterized by more than one parameter. Fault-handling is composed of phases such as fault-detection, fault-location, and system reconfiguration. To predict system reliability, mean detection time, mean location time, and mean reconfiguration time must first be estimated from experimental data. In addition, coverage associated with each of these phases must also be inferred from experimental data. Once the above parameters have been estimated within the desired confidence limits, the reliability of the SUT can be estimated using an available package such as CAST, ARIES, CARSRA, SURF, CARE, CARE II, or CARE III. This, in essence, provides us with a three-step reliability estimation procedure as follows:
1. Estimate parameters (i.e., failure rates) associated with the fault-occurrence behavior of the subsystems.
2. Estimate parameters (e.g., coverage) associated with the fault-handling behavior of the system.
3. Estimate system reliability using the analytical reliability model.
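The three steps above can be sketched end to end for a small model. The example below is a hedged illustration only: it uses point estimates without confidence limits, it stands in a two-unit standby model for the much larger SIFT or FTMP models, and it assumes that the standby unit cannot fail while idle, which gives the closed form R(t) = exp(-λt)(1 + cλt). It does not reproduce any of the packages named above; all function names and data are hypothetical.

```python
import math


def estimate_failure_rate(total_unit_hours, failures):
    """Step 1: failure rate of a unit from a unit-level life test.

    Assumes a constant failure rate, so the estimate is failures per
    accumulated unit-hour.
    """
    return failures / total_unit_hours


def estimate_coverage(faults_injected, faults_recovered):
    """Step 2: coverage as the fraction of injected faults recovered."""
    return faults_recovered / faults_injected


def predict_reliability(lam, c, mission_hours):
    """Step 3: evaluate the analytical model at the mission time.

    Closed form for an assumed two-unit standby chain with rates
    2 -> 1: c*lam, 2 -> 0: (1 - c)*lam, 1 -> 0: lam.
    """
    x = lam * mission_hours
    return math.exp(-x) * (1.0 + c * x)


# Hypothetical data: 2 failures in 40,000 unit-hours; 990 of 1000 injected
# faults recovered; 10-hour mission.
LAM_HAT = estimate_failure_rate(40_000.0, 2)
C_HAT = estimate_coverage(1000, 990)
UNRELIABILITY = 1.0 - predict_reliability(LAM_HAT, C_HAT, 10.0)
```

With these hypothetical numbers the predicted unreliability is close to (1 - c)λt, so it is dominated by the uncovered faults rather than by exhaustion of spares.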
Further examination of the reliability estimation procedure above, however, reveals a major weakness. It is assumed that the analytical reliability model is a proper abstraction of system behavior. This may not necessarily be true. Any validation methodology for ultra-high reliability systems should identify and critically examine all assumptions in the formulation and the solution of the reliability model. These assumptions should either be verified by a logical proof (if possible), or simulation/emulation/physical experiments must be defined to test the validity of the assumptions. In the event that some of these assumptions do not hold to be true in light of experimental evidence, preparations must be made to modify appropriately the reliability model.
In general, three classes of assumptions are made in the formulation and the solution of the reliability model.
1) Structural Assumptions - A reliability model is an abstraction of either a lower level model or the physical system itself. Such a model structure is usually a directed graph consisting of a set of nodes and a set of arcs. (See Figures 2.2-2.4.) Each node in the model represents a set of states in a lower level model (i.e., a projection). Similarly, each arc in the model represents a set of state transitions in the lower level model. It must be proved that these abstractions are done correctly. Wensley et al. (ref. 5) have outlined a proof procedure to show that a given abstract model is a correct structural abstraction of another lower level model. This proof procedure is expected to become standard and available for use by the computing community as a whole. In addition, one may look toward automata theory for help. Should the answer of this proof procedure be negative, however, then the reliability model must be modified, resulting perhaps in a model with a larger number of states and state transitions.
2) Assumptions Regarding the Fault-Occurrence Behavior - A fault model consists of a postulated class of faults, the corresponding failure rates (which occur as labels of certain arcs in the reliability model), and the description of the stochastic process of each failure class. An attempt must be made not to miss any critical fault types in the fault model. This assumption appears to be neither testable (in a finite fashion) nor provable. More discussion is needed on this issue. Of related importance is whether two or more distinct fault types have been combined into a single fault class in the model. This assumption should be provable by structural methods described under (1) above. The result of expanding one fault class into several is an expansion in the state space and the number of state transitions. The reason for a finer classification may be a significant difference in detection, location, and reconfiguration times associated with the faults in question.
The second type of assumption in a fault model is the average failure rate of each fault class. Such an assumption can be tested by standard reliability life tests of the associated subsystem. However, it is presumed that this effort relies on MIL-STD-217B-type expressions and assumptions for failure rate computations and the experience that has gone into the formulation of such a standard. If a check reveals that the computation of failure rates is in error, the change can be easily absorbed into a reliability model since the failure rates appear as parameters.
The third type of assumption in a fault model is the nature of the stochastic failure process. This is usually assumed to be a Poisson process (or equivalently, the assumption of an exponential time-to-failure distribution). Since the failure here refers to the failure of a subsystem (such as a processor), the assumption may be statistically tested if results of a life test on the subsystem are available.
3) Assumptions on the Fault-Handling Behavior - As mentioned earlier, mean time and coverage of each phase of the fault-handling process must be estimated. This can be done by conducting fault-injection experiments on the simulation/emulation/physical version of the SUT. Since the value of the coverage parameter is known to have a significant effect on system reliability (ref. 7), coverage must be measured carefully and estimated within a small confidence interval.
Various phases of the recovery process (detection, location, reconfiguration, etc.) have associated distributions since the corresponding times are random variables. The usual assumption is that of an exponential distribution. This assumption can be verified by means of measurements conducted on the prototype or on fault-injection-type simulation followed by statistical tests. If the above verification results in the unfortunate conclusion of a nonexponential distribution, the reliability model becomes a non-Markovian model.
In summation, the results of validation tests could be a more complex reliability model due to either a growth in the state space or a non-Markovian model. Research efforts must, therefore, be focused on how to deal with these two possibilities. A possible analytical approach to the state growth problem is to look into the method of state aggregation (ref. 8), or the technique of near-complete decomposition (ref. 9). A possible approach to the handling of nonexponential distributions is to use the Coxian method of stages (ref. 10). The three techniques suggested here are used routinely in queueing theoretic models for computer system performance analysis (refs. 11, 12, 13, 14).
2.2.2 Details of the Proposed Methodology
As discussed in the previous section, the ultra-high reliability requirements of fault-tolerant computers for digital flight control applications preclude the use of traditional life tests for the purposes of validation. A validation methodology for such systems must be based on a judicious combination of logical proofs, analytical modeling, and experimental testing.
Analytical models enable us to abstract the states and state transitions of a complex system into a relatively small manageable graph structure. Such graph models can be used to predict the aspects of the behavior of the system under study. A logical proof may be used to show that the analytical model is indeed a proper abstraction of the real system. Both the logical proof and the tractable analytical model are based on certain atomic assumptions regarding system behavior. These assumptions must be tested by exercising a simulation/emulation of the system (or a physical prototype, if available).
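One necessary condition for such an abstraction can be checked mechanically. Section 2.2.1 describes each node of the model as a set of lower level states (a projection) and each arc as a set of lower level transitions. The sketch below checks only that every lower level transition either stays inside one abstract node or follows an arc of the abstract graph. It is a simplified illustration, not the proof procedure of ref. 5, and the state names are hypothetical.

```python
def check_structural_abstraction(low_arcs, projection, abstract_arcs):
    """Return the lower level transitions the abstract graph cannot explain.

    low_arcs      : iterable of (source, destination) lower level transitions
    projection    : dict mapping each lower level state to an abstract node
    abstract_arcs : set of (source_node, destination_node) pairs
    An empty result means this (necessary, not sufficient) check passes.
    """
    violations = []
    for src, dst in low_arcs:
        if src not in projection or dst not in projection:
            violations.append((src, dst))  # unmapped lower level state
            continue
        a, b = projection[src], projection[dst]
        if a != b and (a, b) not in abstract_arcs:
            violations.append((src, dst))  # no matching abstract arc
    return violations


# Hypothetical lower level model of a two-unit standby system, projected
# onto a three-node graph whose nodes count the working units.
PROJECTION = {"both_ok": 2, "detecting": 2, "reconfiguring": 2,
              "one_ok": 1, "failed": 0}
ABSTRACT_ARCS = {(2, 1), (2, 0), (1, 0)}
LOW_ARCS = [("both_ok", "detecting"), ("detecting", "reconfiguring"),
            ("detecting", "failed"), ("reconfiguring", "one_ok"),
            ("one_ok", "failed")]

VIOLATIONS = check_structural_abstraction(LOW_ARCS, PROJECTION, ABSTRACT_ARCS)
```

If a repair transition such as ("one_ok", "both_ok") were added to the lower level model, the check would report it, indicating that the abstract graph needs an additional arc.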
Thus, logical proofs, analytical models, and experimental testing are three categories of activities that are integral parts of a validation methodology (see Figure 2.5). These three categories apply not only in the validation of reliability, but also in the validation of other system attributes, such as its performance, safety, etc. However, most of our discussion here is restricted to the validation of system reliability.
The logical proof procedures have been collected together under Task I-2 in the list of validation tasks given in table 3.1 of this report. As discussed in the previous section, it needs to be shown that the chosen analytical reliability model is a proper abstraction of the system under consideration. It is possible to give a logical proof of this fact as demonstrated in reference 5. This is described herein as Task I-2. Besides this proof, it is also proposed that a proof of correctness of system design (hardware/software), as well as the proof that the system scheduler performs according to its design, be presented.
Within the activity labeled analytical models, the development, refinement, and solution of reliability models is included and identified as Task I-1 in Section 3.0. Currently, Markov models are in use, but alternative model types, such as Petri nets, should be investigated.
The third major category of activity refers to experimental testing on the simulation/emulation/physical version of the fault-tolerant system to be validated. This category is further divided into four subcategories. The first category refers to the validation of the fault-occurrence behavior of the subsystems comprising the system under test. Examples of such subsystems are the individual processors, memories, devices, etc. If adequate life tests can be conducted, then the failure rate and the distribution of the times to failure of a subsystem may be inferred using statistical techniques. Equivalently, reliable standards such as MIL-STD-217B may be used. No task has been defined along these lines in Section 3.0.
The next subcategory under experimental testing refers to the validation of the fault-handling behavior of the system. This has been adequately covered by Tasks II-7 to II-13 in Section 3.0 of this report.
Since such validation experiments require statistical methods in designing such experiments (that is, preprocessing) and in analyzing data collected from the experiment (that is, postprocessing), a separate Task I-3 has been defined to support such statistical activities. Task I-3 provides a bridge between the analytical reliability model (Task I-2) and experimental testing (Task Group II).
The third subcategory under experimental testing refers to validation of the fault-free behavior of the system (Tasks II-1 to II-6). These tests, together with the proof of design correctness (Task I-2), increase our confidence that the system does not suffer from any design/documentation/implementation/realization flaws.
Since we are considering ultra-high reliability systems regarding which almost no practical experience exists, we should expect many "surprises." Exploratory testing (Task Group III) is, therefore, proposed to uncover future surprises.
The set of steps needed in the validation of system reliability is presented in flowchart form in Figure 2.6.
As pointed out earlier, the validation procedure is driven by the reliability model. The reliability model structure is obtained from the system description, which itself is the output of the system design process. Fault occurrence behavior of the subsystems is the second set of inputs needed for the solution of the reliability model. A characterization of this fault occurrence behavior may be inferred from life tests conducted on the subsystems. Alternatively, standards, such as MIL-STD-217B, or past experience may be used to characterize the fault occurrence behavior. A third set of inputs needed to exercise the reliability model is the characterization of the fault handling behavior of the system. This involves conducting fault-injection type experiments and analysis of resulting data using statistical techniques. The characterization of the fault-handling behavior encompasses estimation of coverage and mean times associated with various phases of the fault-handling process (e.g., detection, isolation, reconfiguration, etc.).
Since the reliability model is an abstraction of the dynamic behavior of the system, it is extremely desirable to present a proof that the model is a proper abstraction of the system behavior.
The postulated fault model may not have covered all possible fault types. Exploratory testing is needed to uncover any surprises and to observe system response to inputs outside the design envelope.
Once the three sets of inputs are available for the reliability model, it can be evaluated numerically using available reliability analysis packages, such as CARE III, when it becomes available. The resulting reliability prediction needs to be checked against the design requirements. If the predicted reliability is found wanting, a system redesign needs to be undertaken.
2.3 Future Work
From the discussion in Section 2.2, three topics on which future work is needed are clearly evident.
(A) Investigation of reliability models other than the classical Markovian models.
(B) Solution of reliability models with a large state space.
(C) Dealing with nonexponential distribution of times to failure and similarly nonexponential distributions in the phases of the fault handling process (e.g., detection time, location time, etc.).
In addition, the following topics need further research.
(D) Most Markovian models assume that successive events are independent. Methods of testing this assumption and methods of modifying the reliability model, in case of dependence, are needed.
(E) Quantitative methods of software reliability estimation and prediction are needed.
(F) A validation methodology that not only addresses the question of reliability validation, but also encompasses the validation of safety, performance, and economics of the system is needed (refs. 15, 16).
Further discussion of these topics is presented in Section 4.0 of this report.
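Topic (C) can be made concrete with the simplest case of the method of stages mentioned in Section 2.2.1. The sketch below is an illustrative assumption rather than the general Coxian construction of ref. 10: it replaces a nonexponential time (for example, a detection time) whose coefficient of variation is at most 1 by k identical exponential stages in series (an Erlang-k distribution, for which the squared coefficient of variation is 1/k). The function names are hypothetical.

```python
import math


def erlang_stage_fit(mean, cv):
    """Choose k identical exponential stages matching a mean and, roughly, a cv.

    An Erlang-k time has mean k / stage_rate and squared coefficient of
    variation 1 / k, so k is taken as the nearest integer to 1 / cv**2.
    Only 0 < cv <= 1 is handled by this special case.
    """
    if mean <= 0 or not 0 < cv <= 1:
        raise ValueError("need mean > 0 and 0 < cv <= 1")
    k = max(1, round(1.0 / (cv * cv)))
    stage_rate = k / mean
    return k, stage_rate


def erlang_cdf(t, k, stage_rate):
    """P(T <= t) for the sum of k exponential stages with the same rate."""
    if t <= 0:
        return 0.0
    x = stage_rate * t
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


# Hypothetical measured phase: mean 2.0 time units, coefficient of variation 0.5.
K_STAGES, STAGE_RATE = erlang_stage_fit(2.0, 0.5)
```

Each stage becomes an additional state in the Markov chain, so this way of handling topic (C) trades a non-Markovian model for a larger state space, which in turn bears on topic (B).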
3.0 PRELIMINARY TASKS FOR SIFT/FTMP RELIABILITY VALIDATION
A primary objective of Working Group II was to identify specific tasks which should be conducted in support of fault-tolerant computer validation research. In order to provide a focus for these efforts, SIFT and FTMP were used as example fault-tolerant computers (refs. 17, 18). This section describes the set of specific tasks which were identified by working group participants. No claims are made as to the sufficiency of these tasks. However, the need for these tasks was affirmed by one or more participants of Working Group II.
The proposed tasks have been classified into four major categories. These are:
1. Confirmation of System Reliability
2. Fault Processing Verification
3. Fault Processing Characterization
4. Other Tasks
The following sections of this report describe the particular objectives of the proposed tasks in these categories and briefly describe the tasks proposed. These tasks are related to the validation techniques discussed in Section 2.0, Figure 2.5.
3.1 Confirmation of System Reliability
The ultra-high reliability requirements of fault-tolerant computers for digital flight control applications preclude the use of traditional life tests for the purposes of validation. The method of validation must be based on a critical examination of models for reliability prediction. In particular, an attempt should be made to examine all the assumptions made in the formulation and the solution of the model, experiments must be designed to test the validity of the assumptions wherever possible (or a proof must be given, if possible), and finally, preparations must be made to change the reliability model if the initial assumptions are negated by experimental results.
Three major aspects of a reliability model are identified and treated separately:
1) Reliability Model Structure
2) Fault Model (types of failure, associated failure rates and distributions)
3) Fault Handling Behavior (coverage, detection, location, reconfiguration times, etc.)
Verification that the reliability model structure is a proper abstraction of the actual system may be performed using a combination of logical proof and experiment proof. The fault processing verification subgroup concentrates on experiments to verify and parameterize the assumptions in the fault handling behavior for a fixed (given) fault model. The fault characterization group attempts to identify new fault classes and, hence, extend the fault model. Thus, all three classes of tasks (represented by three groups) are strongly related. Figure 3.1 summarizes the proposed verification process.
As shown in Figure 3.1, the reliability prediction is based on a reliability model (e.g., a Markov model). The formulation and solution of this model is clearly the first step, identified as Task I-1 in Table 3.1. A reliability theorist for model formulation and a numerical analyst for model solution should be available. Since model predictions are based on assumptions regarding system behavior, these assumptions must be critically examined for validation. The structure of the reliability model will be confirmed by a proof which will show that the actual system is a proper refinement of the reliability model (Task I-2). The complexity of the system under consideration will dictate that the above proof be structured in the form of a hierarchy of proofs. The assumptions regarding the fault-handling behavior will be validated by conducting experiments on either the actual system or on an emulation of the system. Statistical design of the experiments and analysis of the resulting experimental data will be performed in Task I-3. The resulting data may require a change in the reliability model and subsequently a new prediction of system reliability.
The tasks summarized in Table 3.1 have been recommended by Working Group II attendees for validating ultra-reliable fault-tolerant computers, such as SIFT and FTMP. These particular tasks address confirmation of system reliability through: 1) reliability modeling structures, 2) fault modeling, and 3) fault handling behavior analysis.
3.2 Fault Processing Verification
This section describes the experiments designed for fault processing verification (table 3.2). The experiments identified are based on the following system classifications:
- application software
- executive software
- multiprocessor functionality
- machine operation (ISP)*
- hardware (logic, power supply, etc.)
- application software on uniprocessor
- executive software on uniprocessor
* Instruction Set Processor
The correctness of each of the sections mentioned above (e.g., application software, executive, ...) can be investigated by testing and formal procedures. An ideal combination of testing and formal procedure is to prove the design and test the implementation. At the present time, proof of correctness of a design is in its infancy; testing for design and implementation verification and also specification measures is suggested.
There are thirteen experiments which have been identified to test the system. These thirteen experiments fall into two classes:
1. Functional Testing: The idea behind this set of tests is to exercise each integral part of the system (e.g., single processor, single processor executive routine, etc.). This will provide a certain degree of confidence that the system is performing its basic functional operations correctly. This set of tests is structured to reveal design errors (e.g., wrong specification), physical faults (e.g., short pins), etc.
2. Fault Processing Tests: This set of tests is designed to exercise the fault-handling capability of the system within its design objectives. There are two sources that can expose the system (or different parts of the system) to error:
a. hardware faults - It is proposed that faults be injected at the following hardware levels:
- single stuck pins
- stuck logic
- whole chip
- common mode power supply
- clock
The types of faults injected in the first three levels are:
- solid
- intermittent
- transient
- design error
b. software design errors - These types of errors include:
- stress
- time
- inconsistent data
- wrong data.
The first six experiments explained in this section belong to the functionality testing and the next experiments belong to the fault processing verification area.
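The temporal fault types named above for hardware injection (solid, intermittent, transient) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the injection mechanism of any facility described in this report: the function name inject_stuck_fault, its parameters (onset, duration, period), and the discrete time-step view of a one-bit signal are all hypothetical.

```python
def inject_stuck_fault(signal, stuck_value, kind, onset=0, duration=1, period=4):
    """Return a copy of a one-bit signal with a stuck-at fault applied.

    Hypothetical model, one list element per time step:
      solid        - the pin is stuck from `onset` onward
      transient    - the pin is stuck for `duration` steps, then recovers for good
      intermittent - the pin is stuck for `duration` steps out of every `period`
    """
    faulted = []
    for t, bit in enumerate(signal):
        if t < onset:
            active = False
        elif kind == "solid":
            active = True
        elif kind == "transient":
            active = t < onset + duration
        elif kind == "intermittent":
            active = (t - onset) % period < duration
        else:
            raise ValueError("unknown fault kind: " + kind)
        faulted.append(stuck_value if active else bit)
    return faulted


if __name__ == "__main__":
    clean = [1] * 8
    for kind in ("solid", "transient", "intermittent"):
        print(kind, inject_stuck_fault(clean, 0, kind, onset=2, duration=1, period=3))
```

Comparing the faulted trace with the clean one at each step gives the instants at which the unit under test is actually exposed to the fault, which is the quantity a fault processing experiment would correlate with detection and reconfiguration events.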
3.3
Fault Processing Characterization
Exploratory testing is required in order to establish the bounds within which the system can possibly operate. It is not expected that the system will tolerate all of the test conditions of this class. On the contrary, the system will be frequently driven into anomalous conditions. It is characteristic of the tasks enumerated in this section (table 3.3) that provision must be made for the machine or system under test to be reinitialized after a nonsurvivable event, and for a data file to be off-loaded from its memory into a facility data bank. The data files would contain sequences of syndromes perceived by the system during the course of this testing, and the concomitant configuration changes. Their value would be to exhibit the consequences of the test conditions as seen at the various abstract levels at which syndrome data is available.
Another characteristic of these tasks is the ability to modulate the intensity or severity of the tests in some respect. This is desirable in order to permit some quantification of the vulnerability of the unit under test. Metaphorically, this would yield a logical shmoo plot for the probable operating region.
The test results must be carefully scrutinized in order to detect lapses in the design, either because the design may be too fragile, or because the design intent is not met. The latter situation corresponds to what has been referred to as a "surprise." It is easier to find surprises when the system does not otherwise fail. Here instead, the system consistently fails, and surprises must be searched for by winnowing postmortem data. Excessive fragility, meanwhile, may feed back to the design of future systems.
It may be noted that most of the tasks of this class closely resemble other tasks in which the system recovery hypothesis is being substantiated. It is believed, however, that the diversity in objectives between such other tasks and these tasks sufficiently warrants the separate categories. The probing nature of tasks in this category allows degrees of freedom not useful in the other, and which might therefore be overlooked if the two categories were to be combined for convenience. It is also true that these tests need not be as exhaustive or as accurate as the others, for a rough characterization of the shmoo is quite adequate.
3.4
Other Tasks
During the course of Working Group II, several tasks were proposed which do not clearly fit into any of the first three categories. In some cases, these tasks spanned more than one of the previously defined categories. In other cases, the recommended tasks deal with indirectly related, but highly relevant, areas such as instrumentation. These tasks are summarized in Table 3.4.
4.0
SUMMARY
The preceding chapters of this report establish the importance, breadth, and difficulty of the systems validation process. For several years, NASA-Langley Research Center has provided leadership and funding in an ongoing effort to develop the technology required to support this process. The working group meetings reported here are the most recent steps in this development effort.
The challenge of validating the next generation of ultra-reliable systems can best be met if those with relevant experience and understanding of validation of present systems can be used as resources to address the questions:
1) What is the state of the art in systems validation?
2) What are the important unanswered questions in systems validation?
3) How can we most effectively proceed?
While it is clear that this diverse community of present experts is a powerful asset, it is not clear how they can best be directed to produce a useful product. The working group format was selected as the vehicle most likely to utilize this asset effectively.
The working group activity, along with individual follow-up effort, has produced significant contributions to the development of systems validation technology. The most important of these contributions are reported in this document. They are:
1. The identification, criticism, review, and refinement of a validation framework and reliability validation procedure (Section 2.0).
2. The identification of tasks and facilities required to validate the ultra-reliable specimen fault-tolerant computers, SIFT and FTMP (Section 3.0 and Appendix II). A significant number of these tasks were reviewed by the working group attendees (Appendix III).
The taxonomy of the validation process with its three primary classes of activity (proof methods, analytic methods, and experimental methods; Figure 2.5), along with the comprehensive view of the reliability validation process (Figure 2.6), are important contributions. These models are a basis for both present understanding and future planning efforts. It is significant that a diverse group of professionals has had the opportunity to see and react to this model, thus providing an important form of peer review. This review process is one of many required to substantiate the validity of these models.
In addition, a candidate set of specific experiments (23 in number) has been reviewed and evaluated; and while no formal vote of confidence was taken, the credibility and importance of these experiments is elevated by their exposure to review and informal assessment. The candidate experiments stimulated additional tasks identified by the participants. The total collection of tasks provides a substantial and reasonable basis for future planning. They are the primary basis for the recommendations made in the next subsection.
The development of a validation technology is an ongoing, never-ending activity, and concepts requiring future development surfaced in abundance during the working group. It became apparent that even the most mundane and well-understood validation tasks often depend upon assumptions and estimations that need further research effort. Special attention was given by many people to the identification of specific research projects required to support future validation technology development. The proposed efforts vary a great deal in their importance and level of effort. Many of the tasks focus on fault-tolerant computing technology, while others address validation technology; still others are a mixture of the two.
Any editing activity by necessity applies the bias of the editor. The original, unedited task recommendations are, therefore, given in Appendix II so that they may be preserved beyond the abstractions necessary for this report.
5.0
REFERENCES
1. "Validation Methods for Fault-Tolerant Avionics and Control Systems," Working Group Meeting No. 1, 12-14 March 1979, Sponsored by NASA-Langley Research Center, Hampton, VA.
2. Bain, L. J., Statistical Analysis of Reliability and Life-Testing Models. New York, NY: Marcel Dekker, Inc., 1978.
3. Hillier, F. S. and G. J. Lieberman, Operations Research. San Francisco, CA: Holden-Day, Inc., 1974.
4. Green, A. E. and A. J. Bourne, Reliability Technology. New York, NY: Wiley-Interscience, 1972.
5. Wensley, J. et al., "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proceedings of the IEEE, October 1978, pp. 1240-1255.
6. Hopkins, A. L. et al., "FTMP - A Highly Reliable Fault-Tolerant Multiprocessor for Aircraft," Proceedings of the IEEE, October 1978, pp. 1221-1239.
7. Arnold, T. F., "The Concept of Coverage and Its Effect on the Reliability Model of a Repairable System," IEEE Transactions on Computers, Vol. C-22, March 1973, pp. 251-254.
8. Chandy, K. M. and C. H. Sauer, "Approximate Methods for Analyzing Queueing Network Models of Computer Systems," ACM Computing Surveys, Vol. 10, No. 3 (September 1978), pp. 281-318.
9. Courtois, P. J., Decomposability: Queueing and Computer System Applications. New York, NY: Academic Press, 1977.
10. Cox, D. R., "The Use of Complex Probabilities in the Theory of Stochastic Processes," Proc. Cambridge Philosophical Society, Vol. 51 (1955), pp. 313-319.
11. Chandy, K. M. and R. T. Yeh, Current Trends in Programming Methodology, Vol. III: Software Modeling. Englewood Cliffs, New Jersey: Prentice Hall, 1978.
12. Kleinrock, L., Queueing Systems, 2 volumes. New York, NY: Wiley-Interscience, 1975, 1976.
13. Kobayashi, H., Modeling and Analysis. Reading, MA: Addison-Wesley, 1978.
14. Trivedi, K. S., "Analytic Modeling of Computer Systems," IEEE Computer, October 1978.
15. Trivedi, K. S., "Designing Linear Storage Hierarchies So As To Maximize Reliability Subject to Cost and Performance Constraints," Proceedings 1980 International Symposium on Computer Architecture, La Baule, France.
16. Meyer, J., "Computation-Based Reliability Analysis," IEEE Transactions on Computers, Vol. C-25, No. 6, June 1976, pp. 578-584.
17. Hopkins, A. L., et al., "FTMP - A Highly Reliable Fault-Tolerant Multiprocessor for Aircraft," Proceedings of the IEEE, pp. 1221-1239, October 1978.
18. Wensley, J., et al., "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proceedings of the IEEE, pp. 1240-1255, October 1978.
APPENDIX I - DEFINITIONS AND REFERENCE CODE
Note: The following is a preliminary, nonexhaustive collection of terms which relate to fault-tolerant computer validation. Both hardware-oriented and software-oriented definitions are included for the present time.
REFERENCE CODE
IEEE Std. 100-1972
IEEE Standard Dictionary of Electrical and Electronic Terms.
IEEE/FTC
Interim IEEE Technical Committee on Fault-Tolerant Computing Dictionary of Terms.
SET
"Software Engineering Terminology" Draft, 23 March 1978 by R. Poston and H. Hecht - Terminology Task Group, Subcommittee on Software Engineering Standards - Technical Committee on Software Engineering, IEEE Computer Society.
DACS
"Data and Analysis Center for Software (DACS) Glossary - A Bibliography of Software Engineering Terms," Compiled by Ms. Shirley Gloss-Soler, Rome Air Development Center/ISISI, Griffiss AFB, N.Y.
SPS
Structured Programming Series, Vol. 15, Validation and Verification Study, R. L. Smith, May 1975, RADC-TR-74-300.
DEFINITIONS
Correctness Proof
Technique of proving mathematically that a given program is correct with a given set of specifications. The process can be accomplished by manual methods or by program verifiers requiring manual interaction. (SPS)
Error (Hardware Genesis)
Any discrepancy between a computed, observed, or measured quantity and the true, specified, or theoretically correct value or condition. (IEEE Std. 100-1972)
Error (Software Genesis)
An error is an action which results in software containing a fault. The act of making an error includes omission or misinterpretation of user requirements in the software subsystem specification, incorrect translation or omission of a requirement in the design specification, and programming errors. (SET)
Failure
The termination of the ability of an item to perform its required function. (IEEE Std. 100-1972)
Fault (Hardware Genesis)
A physical condition that causes a device, component, or element to fail to perform in a required manner; for example, a short-circuit or a broken wire. (IEEE Std. 100-1972)
Fault (Software Genesis)
A fault is a manifestation of an error in program code. The fault is evident when entry of some input data results in the program failing to perform the required function. Note: fault and bug are the same thing. (SET)
Fault Tolerance
The capacity of a computer, subsystem, or program to withstand the effects of internal faults; the number of error-producing faults a computer, subsystem, or program can endure before normal functional capability is impaired. (IEEE/FTC)
Intermittent Fault
A temporary fault. (IEEE Std. 100-1972)
Stuck Fault/Stuck Failure
A failure in which a digital signal is permanently held in one of its binary states. (IEEE Std. 100-1972)
Validation
The process of determining whether executing the system (i.e., software, hardware, user procedures, personnel) in a user environment causes any operational difficulties. The process includes ensuring that specific program functions meet their requirements and specifications. Validation also includes the prevention, detection, diagnosis, recovery and correction of errors. Editorial Comment: Validation is more difficult than the verification process since it involves questions of the completeness of the specification and environment information. There are both manual- and computer-based validation techniques. (SET)
Verification
Computer program verification is the iterative process of determining whether or not the product of each step of the computer program acquisition process fulfills all requirements levied by the previous step. These steps are system specification verification, requirements verification, specification verification and code verification. (SET)
NOTE TO READER: There is a lack of consistency in interpretations of the terms "failure," "fault," and "error." Consider the following:
(1) that a failure is an event,
(2) that a fault is a condition (or state), and
(3) that an error is a datum.
Then the following statements apply to both hardware and software:
A failure is the event when something causes a device, component, system, algorithm, etc. to change its state from one in which it performs its intended function to one in which it does not. The something which causes the change may or may not be known. After the failure, the device, component, etc. is called a failed or faulty device, component, etc. Any higher level system of devices, etc., which cannot perform its function because a subdevice, etc. is failed, is also called failed or faulty. A fault is the particular condition or flaw in a failed device, etc. which differentiates it from its unfailed state.
When the function or output of a device, etc. differs from its intended function or output, that difference is called the error. In data processing systems, error means bad or wrong data. An error is all that can be detected internally to a computing system. A higher level system, which contains a failed device, etc. emitting errors yet continues to perform its function, is said to be fault tolerant. An accumulation of errors may well be the cause of a failure of a higher level system.
Thus, a physical device fails when it "breaks down." Thereafter it contains a fault. A system designer or software programmer can create a design or software containing a fault. In this sense the designer or programmer, not the design or software, failed. A fault may or may not be active; when it is, one or more errors result. A fault is latent, transient, intermittent or permanent dependent upon the manner in which it generates errors. A software bug may not surface until some time after a system has been in operation; i.e., it may be latent. A bug may cause a data error only occasionally in response to specific, infrequent input data patterns and, thus, may appear intermittent. Customarily, a software bug is regarded as a permanent fault, remaining in the system even after the moment of its creation by a programmer. However, it is possible for a bug, having given rise to a data error, to disappear from an operational system, in which case it appears as a transient. The resulting bad data may or may not be attenuated in further processing.
APPENDIX II - WORKING GROUP II TASK DESCRIPTIONS
CONFIRMATION OF SYSTEM RELIABILITY
Tasks 1-1 through 1-3 Proposed by Preliminary Working Group II Participants
J. Goldberg (Chairman)
W. Carter
K. Trivedi
R. Alberts
Tasks 1-4 through 1-11 Proposed by Working Group II Participants As Indicated
TASK 1-1
Title: Reliability Analysis
Objectives:
1. Develop and refine mathematical models of system reliability.
2. Estimate reliability characteristics using assumed and experimentally-derived failure statistics.
Procedure: A mathematical model of system reliability for a subject computer will be acquired or developed. The model should comprehend all intended fault-handling behavior at an appropriate level of abstraction, and it should be tractable for numerical evaluation. The faults assumed should be modeled by type (e.g., permanent, transient, intermittent), distribution and rate. The fault-handling behavior should include fault detection time, fault location time and reconfiguration time, and other appropriate characteristics, such as the coverages associated with the three phases. A set of system models and fault models may be needed to reflect varying trade-offs of fidelity and computational tractability.
The primary purpose of the model is to predict the reliability of the given system for a range of assumed configurations (e.g., initial number of processors) and for a range of assumed fault levels, in other words, to support a claim for mission reliability. As a means to this end, the model will be used for two secondary purposes:
1. specification of data to be obtained from experiments on the actual computer (or an emulation of it) which will be used to improve the quality of assumptions on fault occurrences and fault-handling characteristics of the system,
2. definition of requirements on the design of the computer, in a form that allows proof that the design is consistent with the reliability model.
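As a rough illustration of the kind of model this task calls for, the following sketch evaluates a very small Markov reliability model in which the initial number of processors and the coverage are parameters. It is a minimal sketch under assumptions that are not taken from this report: permanent faults only, a constant failure rate lam per working processor, instantaneous reconfiguration, and a single coverage probability that lumps together detection, location, and reconfiguration. It is not the model used by CARE III or by any tool named here; the function name reliability and all numerical values are hypothetical.

```python
import math


def reliability(n, lam, coverage, t, steps=2000):
    """Probability that the system is still operating at time t.

    Assumed model (illustrative only): n processors at the start, each
    failing at constant rate `lam`; a fault is successfully handled with
    probability `coverage`, otherwise the whole system fails; the system
    survives as long as at least one processor is working.
    """
    # p[k] = probability that k processors are working and the system is up.
    p = [0.0] * (n + 1)
    p[n] = 1.0

    def deriv(p):
        d = [0.0] * (n + 1)
        for k in range(1, n + 1):
            d[k] -= k * lam * p[k]                      # any fault leaves state k
            if k > 1:
                d[k - 1] += k * lam * coverage * p[k]   # covered fault: continue with k-1
        return d

    h = t / steps
    for _ in range(steps):  # classical fourth-order Runge-Kutta integration
        k1 = deriv(p)
        k2 = deriv([a + 0.5 * h * b for a, b in zip(p, k1)])
        k3 = deriv([a + 0.5 * h * b for a, b in zip(p, k2)])
        k4 = deriv([a + h * b for a, b in zip(p, k3)])
        p = [a + h * (b + 2 * c + 2 * d + e) / 6
             for a, b, c, d, e in zip(p, k1, k2, k3, k4)]
    return sum(p[1:])


if __name__ == "__main__":
    # Sanity check against the closed form exp(-lam * t) for one processor.
    print(reliability(1, 1e-4, 1.0, 10.0), math.exp(-1e-3))
    # Sensitivity of a three-processor configuration to the assumed coverage.
    for c in (1.0, 0.99, 0.9):
        print(c, 1.0 - reliability(3, 1e-4, c, 10.0))
```

Running the model over a range of coverage values shows which parameters the predicted mission unreliability is most sensitive to, and therefore which data the experiments on the actual computer most need to supply.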
The results of the tasks on Proof of Correctness and Experimental Validation of System Fault Handling will be analyzed to determine if the structure of the model is a true representation of the system. If it is not, then the model may need to be revised. The results of the experimental tests will be analyzed by the Data Analysis Task (Task 1-3) in order to obtain more realistic characterizations of faults and system responses. New reliability predictions will be generated on the basis of these characterizations.
Facilities:
1. Computer Support
a. Interactive computer system to support model development, including specification and programming (e.g., TOPS20).
b. Powerful scientific computer to support model evaluation (with high speed and high accuracy) (e.g., CYBER, DEC 10).
2. Software Support
a. Existing reliability analysis packages, e.g., CARE III.
b. Program development environment.
c. Statistical analysis/design.
d. Logical analysis (proof checker, theorem proving).
e. Logic design analysis (simulator, test generator).
Personnel:
Reliability Theorist
Numerical Analyst - Programmer
Level of Effort:
1 @ 1/2-time continuing
1 @ full-time continuing
Priority: Highest
TASK 1-2
Title: Confirmation and Use of Design Proofs
Objective:
1. Maintain the integrity of design-correctness proofs through system modifications and proof revision.
2. Use results of experimental test on the subject computer to confirm assumptions made in the proof.
3. Use the proof to guide experimental tests.
Procedure: A formal proof of the correctness of the design of the subject computer will be acquired. This proof will likely have the form of a hierarchy, the top level of the hierarchy being the validation of the reliability model structure. Efforts will be applied, to the extent practical, to complete the proof within its intended scope. The proof will be revised to reflect changes in the system design.
Test results will be analyzed to confirm assumptions made in the proof, such as the functions performed at the lowest level of the proof (e.g., machine instructions) and timing characteristics of scheduling, synchronizations and reconfiguration.
The proof will be analyzed to help plan experimental tests. For example, classes of faults will be distinguished that are equivalent with respect to some system state, in order to economize on the number of tests needed to cover system fault behavior.
Facilities:
1. Computer Support
a. A powerful symbol-manipulation computer (e.g., DEC 10).
2. Software Support
a. Proof-checking tools (high priority).
b. Proof-of-correctness tools (lower priority).
Personnel: Logician - Program-Correctness Theorists - One Senior - One Junior
Level of Effort: Full-time - two years each
Priority: Highest
TASK 1-3
Title: Design of Experiments and Analysis of Experimental Data
Objective: Estimate parameters and distributions of various aspects of the fault-handling behavior. The parameters of interest are:
1. Estimate (for those faults that are covered)
a. mean fault detection time,
b. mean fault location time,
c. mean reconfiguration time.
2. Distributions of the above.
3. Estimate
a. fault detection coverage,
b. fault location coverage,
c. reconfiguration coverage.
4. Suggest changes to the structure or the parameterization of the reliability model according to the results of the data analysis.
Procedure:
1. Supply experimental designs to the fault processing verification group, including the number and type (factorial, block, etc.) of experiments needed, and the set of data to be obtained from each experiment.
2. After the data is received from the fault processing verification group, use statistical procedures to perform the above three objectives. Objective 1 is a mean-value estimation problem. Confidence intervals should be obtained using standard statistical methods. Objective 3 is the problem of estimating proportions. Once again, standard statistical methods are available for this purpose. Objective 2 is the problem of functional estimation. The usual hypothesis that the respective distributions are exponential can be tested using statistical methods such as the Kolmogorov-Smirnov test.
3. Evaluate validity of assumptions made in the construction of the model, based on the results of the statistical tests.
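The three estimation problems in step 2 can be sketched with standard textbook methods. The sketch below is illustrative only and rests on choices that are not specified in this report: a normal-approximation confidence interval for the mean, the Wilson score interval for a coverage proportion, and a Kolmogorov-Smirnov statistic computed against an exponential distribution whose mean is estimated from the same sample. The function names are hypothetical. Because the mean is estimated from the data, the usual Kolmogorov-Smirnov critical values are only approximate for that test, and for small samples a Student-t interval would be more appropriate than the normal approximation.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples, level=0.95):
    """Objective 1: sample mean with a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(samples)
    half = z * stdev(samples) / math.sqrt(len(samples))
    return m, (m - half, m + half)


def coverage_with_ci(covered, injected, level=0.95):
    """Objective 3: coverage proportion with a Wilson score interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = covered / injected
    denom = 1 + z * z / injected
    centre = (p + z * z / (2 * injected)) / denom
    half = z * math.sqrt(p * (1 - p) / injected
                         + z * z / (4 * injected ** 2)) / denom
    return p, (centre - half, centre + half)


def ks_statistic_exponential(samples):
    """Objective 2: Kolmogorov-Smirnov distance between the empirical
    distribution and an exponential distribution fitted by its sample mean."""
    xs = sorted(samples)
    n = len(xs)
    rate = 1.0 / mean(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


if __name__ == "__main__":
    detection_times_ms = [1.2, 0.8, 2.5, 1.9, 0.4, 3.1, 1.1, 0.9]  # made-up data
    print(mean_with_ci(detection_times_ms))
    print(coverage_with_ci(covered=98, injected=100))
    print(ks_statistic_exponential(detection_times_ms))
```

The width of the coverage interval makes the experimental cost visible: with 100 injected faults the interval is several percentage points wide, so demonstrating a coverage very close to one by injection alone requires a far larger number of experiments.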
Facilities:
1. Hardware - A mainframe computer is needed for carrying out the data analysis.
2. Software - Standard statistical packages are needed:
- Statistical tables of chi-square, normal, student-t and other distributions are needed.
Personnel:
One Statistician
One Team Member of the fault-processing verification group
Level of Effort:
Statistician at 1/2-time for one year
Fault-tolerant System Designer at full-time for one to two years
Priority: Highest
TASK DESCRIPTION WORK SHEET
Participant's Name: John M. Myers
Task Number: 1-4
Task Title: Alternative Modeling Techniques
Problem: Present modeling of computer reliability relies primarily on state models (Markov for stochastic or combination for deterministic behavior). State models have the following shortcomings:
a) No provision for concurrent operation of spatially separated subsystems; and hence, an artificially large increase in the "state space."
b) Lack of clear exposition of the tie between a link between states and identifiable structural features.
Discussion: Petri-net models can portray concurrent operation and relate function to structure. Until recently, they have not been mathematically tractable. Recent advances in tractability are demonstrated for deterministic modeling in the analysis of the FTMP clock network. The use of nets as a structure on which to calculate probabilities also appears promising.
Proposal: Survey participants for candidate alternatives to (and views on) Markov models; develop Petri-net-based modeling for the analysis of computer reliability.
Personnel: Senior Analyst
Level of Effort: 1 man-year
Priority: TBD
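The point about concurrency in this task can be made concrete with a minimal place/transition net. The sketch below is an illustration only, not the net used in the FTMP clock network analysis mentioned above; the dictionary representation, the function names enabled and fire, and the two-processor example are all hypothetical. Each subsystem keeps its own places, so n independent two-state subsystems need 2n places, whereas a flat state model of the same system enumerates 2^n global states.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= count
               for place, count in transition["inputs"].items())


def fire(marking, transition):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, count in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - count
    for place, count in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + count
    return new_marking


# Two spatially separate processors, each with its own local failure transition.
fail_a = {"inputs": {"A_up": 1}, "outputs": {"A_down": 1}}
fail_b = {"inputs": {"B_up": 1}, "outputs": {"B_down": 1}}
initial = {"A_up": 1, "B_up": 1}

if __name__ == "__main__":
    after_a = fire(initial, fail_a)
    print(after_a)
    # The failure of A does not disturb B: fail_b is still enabled.
    print(enabled(after_a, fail_b))
```

Attaching firing rates to the transitions would be one way to use such a net as a structure on which to calculate probabilities, which is the direction the discussion above describes as promising.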

TASK DEFINITION WORK SHEET
Participant's Name: J. F. Meyer
Tasks which you feel are important and not included in the preliminary report may be proposed on this worksheet.
Task Number: I-5
Task Title: Model Solution Methods
Category (check): Reliability confirmation; Fault processing verification; Fault processing characterization; Other (Specify)
X
Objective: Determine methods for solving stochastic performance, reliability, and performability models (e.g., state lumping, decomposition of solutions in time and space, approximate solutions, etc.).
Procedure:
1) Identify problem areas.
2) Classification of solution techniques.
3) Investigation of particular methods.
Facilities: Interactive computer system. Program development environment.
Personnel: At least 2 reliability theorists (interaction among personnel is necessary here).
Level of Effort: 2 @ 1/4 - 1/2 time
Priority: High

TASK DEFINITION WORK SHEET
Participant's Name: Herb Hecht
Tasks which you feel are important and not included in the preliminary report may be proposed on this worksheet.
Task Number: I-6
Task Title: Evaluation of Reliability Requirements
Category (check): Reliability confirmation; Fault processing verification; Fault processing characterization; Other (Specify)
X
Objective: To keep the objective of the reliability confirmation in line with (1) current technology, (2) current regulations, (3) observed incidents (aircraft accidents and component failure patterns).
Procedure: Establish a function for keeping track of (1) - (3) above; translate these into AIRLAB tasks or modifications of these.
Facilities: Office and literature.
Personnel: 1 Senior System Engineer
Level of Effort: Full-time (this individual may also be able to make formal evaluations of AIRLAB against current requirements).
Priority: Highest

TASK DEFINITION WORK SHEET
Participant's Name: Nicholas D. Murray
Tasks which you feel are important and not included in the preliminary report may be proposed on this worksheet.
Task Number: I-7
Task Title: Performance Confirmation
Category (check): Reliability confirmation; Fault processing verification; Fault processing characterization; Other (Specify): Performance confirmation
X
Objective: Using a state model for reliability analysis, a definition must be made that distinguishes "good" states from "failed" states. There are two drivers for this definition:
1) The operational behavior of the system under fault conditions (i.e., voting, comparing, etc.).
2) Sufficient resources available to service the flight-critical applications.
The reliability model needs to be augmented to reflect sufficient performance (or lack of performance). For instance, the SIFT has a model of the schedule/allocation routine that supports the reliability model. Also, it would appear that AIRLAB could be a tool to find the performance threshold.
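The two drivers above can be combined into a single predicate over model states. The sketch below is only an illustration: it assumes a hypothetical state made up of nothing but the number of healthy processors, a voting rule that needs at least three of them, and an application workload that needs at least four. None of these figures comes from the worksheet or from SIFT.

```python
def classify_state(healthy, min_for_voting=3, min_for_workload=4):
    """Return "good" or "failed" for a state with `healthy` working processors.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    # Driver 1: fault handling (voting, comparing) is still possible.
    fault_handling_ok = healthy >= min_for_voting
    # Driver 2: enough resources remain to service the flight-critical applications.
    performance_ok = healthy >= min_for_workload
    return "good" if fault_handling_ok and performance_ok else "failed"


labels = {n: classify_state(n) for n in range(7)}
print(labels)
```

With these assumed thresholds, a state with three healthy processors can still vote but is classed as failed on performance grounds, which is the kind of augmentation the worksheet asks for.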

TASK DEFINITION WORK SHEET
Participant's Name: A. Hopkins
Tasks which you feel are important and not included in the preliminary report may be proposed on this worksheet.
Task Number: I-8
Task Title: Performance Analysis
Category (check): Reliability confirmation; Fault processing verification; Fault processing characterization; Other (Specify): Performance
X
Objective: Develop a structural model of performance capability in context with applications requirements.
Procedure: Analysis of sample application and the fault-tolerant computer's scheduler.
Facilities: Computation
Personnel: Computer Scientist; Flight Control Engineer
Level of Effort: Computer Scientist - 6 man-months; Flight Control Engineer - 1 man-month
Priority: High

TASK DEFINITION WORK SHEET
Participant's Name: Melliar-Smith
Tasks which you feel are important and not included in the preliminary report may be proposed on this worksheet.
Task Number: I-9
Task Title: Executive Implementation Proof
Category (check): Reliability confirmation; Fault processing verification; Fault processing characterization; Other (Specify)
X
Objective: To establish at AIRLAB the capability to formally verify the implementation of the executive (and associated) programs against their specification. Proof of high-level language and machine instruction will be required.
Procedure: Collaboration with academic and research laboratories to obtain a proof system and to become familiar with its use. Development of guidelines for industrial implementers to facilitate proof.
Facilities: Large multi-access computer, preferably DEC system 20.
Personnel: 1 plus Computer Scientist
Level of Effort: Continuing
Priority: Urgent because of very limited current capability of NASA and because of difficulty of recruiting staff.

TASK DEFINITION WORK SHEET
Participant's Name: Melliar-Smith
Tasks which you feel are important and not included in the preliminary report may be proposed on this worksheet.
Task Number: I-10
Task Title: Application Requirements Analysis
Category (check): Reliability confirmation; Fault processing verification; Fault processing characterization; Other (Specify)
X
Objective: To develop methods for formally verifying the specifications of the application tasks against the underlying aerodynamic and structural requirements.
Procedure: Collaboration with academic and research labs and with NASA and industrial design teams to develop capability within a few years.
Facilities: Large multi-access computer.
Personnel: 1 plus Computer Scientist; 1 plus Mathematician with background in aerodynamics and structures.
Level of Effort: Continuing
Priority: Unless already underway elsewhere (at required level of formality), urgent because of long lead time, political delays, and difficulty of recruiting staff.

TASK DEFINITION WORK SHEET
Participant's Name: Melliar-Smith
Tasks which you feel are important and not included in the preliminary report may be proposed on this worksheet.
Task Number: I-11
Task Title: Application Program Proof
Category (check): Reliability confirmation; Fault processing verification; Fault processing characterization; Other (Specify)
X
Objective: To develop, or support the development of, or to become familiar with research programs aiming at verification of the correctness of application programs by mathematical analysis.
Procedure: Collaboration with academic and research labs to develop capability within a few years. Develop programming standards to allow proof of production flight control programs.
Facilities: Large multi-access computer, preferably DEC system 20
Personnel: 2 plus Computer Scientist
Level of Effort: Continuing
Priority: Urgent because of long lead time and because of difficulty of recruiting staff.

FAULT PROCESSING VERIFICATION

Tasks II-1 through II-13
Proposed by Preliminary Working Group II Participants
D. Siewiorek (Chairman)
J. Abraham
J. Clary
R. Joobbani

Tasks II-14 through II-21
Proposed by Working Group II Participants As Indicated

TASK II-1
Title: Initial Check-Up (Diagnostic)
Objective: Verify that the single processor performs the basic functions.
Procedure: Run the standard diagnostic supplied by the manufacturer.
Facilities:
1. Hardware - single subject processor.
2. Software - single subject processor diagnostic programs.
Personnel: Software Technician
Level of Effort: Assume the diagnostic to be run 5 times a day for 15 days, 1/2 man-month
Priority: High

TASK II-2
Title: Programmer's Manual Validation
Objective:
1. To look for design errors. To make sure the machine performs the functions according to its specification as documented in the programmer's manual.
2. To investigate incompletely described machine features and fully characterize those features (e.g., I/O, Interface, Interrupts).
Procedure:
1. Perform the functions documented in the programmer's manual and validate their correctness.
2. Investigate the system response to situations not completely specified in the programmer's manual and record the responses.
Facilities:
1. Hardware
- single subject processor (e.g., BDX-930)
- any peripheral or devices interfaced to the system that might be used later (e.g., sensors, actuators)
- test/monitor computer
2. Software
- programmer's manual
- program development environment
- on-line debugger
- experimental results recording and assessment software
Personnel: Chief Experimenter; Software Technician
Level of Effort: Assume 100 instructions, each instruction checked with 10 different test cases or values (e.g., for different address mode valid and/or invalid). If there are about 10 instructions per test and 20 instructions per day can be executed, then a total of 2 tests per day are done.
The total manpower for instruction testing is

(100 instr. * 10 test cases) / (2 tests/day) = 500 man-days ≈ 1.5 man-years.

About 6 man-months are also required for investigating the system response to situations not completely specified in the Programmer's Manual. The total required manpower then will be about 2 man-years.
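The instruction-testing estimate can be reproduced with a few lines of arithmetic. The sketch below uses the worksheet's own figures for the man-day count; the conversion to man-years depends on a days-per-man-year factor that the worksheet does not state, so the two factors tried here (250 and 365) are assumptions.

```python
def instruction_test_effort(instructions=100, cases_per_instruction=10, tests_per_day=2):
    """Man-days for instruction testing, following the worksheet's formula."""
    return instructions * cases_per_instruction / tests_per_day


man_days = instruction_test_effort()
print(man_days)  # 500.0

# The days-per-man-year factor is an assumption; the worksheet does not give one.
for days_per_year in (250, 365):
    print(days_per_year, round(man_days / days_per_year, 2))
```

The worksheet's figure of roughly 1.5 man-years lies between the two conversions shown.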
Priority: High

TASK II-3
Title: Executive Routine, Including Error Management (Reconfiguration) Routines, Validation (Design Errors)
Objective:
1. To validate that executive routines respond as specified.
2. Search for design errors (or lack of specification) in executive routines.
Procedure:
1. Treat executive routines as black boxes with only inputs and outputs. Check response to
a. expected data,
b. out-of-bounds data, and
c. inconsistent data
in individual routines.
Data may be generated from examining the software, from specification of the software module, or randomly. Make sure every path through individual routines is exercised at least once, including error return paths. Validate consistency of response by multiple experiments.
2. Check executive routine interaction (pipeline). Feed a "string" of executive routines with
a. expected data,
b. out-of-bounds data,
c. inconsistent data, valid routine sequences.
3. Determine (measure) routine response time.
4. Check system error return-reporting paths (i.e., through system hierarchy).
Facilities:
1. Hardware
- single subject processor
- test monitor computer
2. Software
- program development environment
a. on-line debugger like ODT, 6-12
b. experimental results recorder and assessment software
c. automated executive routine exercises that only need to know input/output area
Personnel: Chief Experimenter; Software Technician
Level of Effort: Assume 5 inputs, 5 outputs per module - each module 100 assembly language instr., 20 PASCAL lines
1000 lines PASCAL / 20 - 50 routines
250 inputs x 20 experiments each - 5000 experiments
4 routines/day - 3 man-weeks
cross product - each routine talks to 2 others
100 routine combinations - 6 man-weeks
Total: 3 man-months
Priority: High
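The counts in the Task II-3 estimate follow from one another, as the sketch below shows. The parameter names are hypothetical, and applying the 4-per-day rate to routine combinations as well as to individual routines is an assumption. The worksheet does not state how days convert to its man-week figures, so the sketch stops at days.

```python
def executive_test_counts(pascal_lines=1000, lines_per_routine=20,
                          inputs_per_routine=5, experiments_per_input=20,
                          partners_per_routine=2, routines_per_day=4):
    """Derive the experiment counts used in the Task II-3 level-of-effort estimate."""
    routines = pascal_lines // lines_per_routine
    inputs = routines * inputs_per_routine
    experiments = inputs * experiments_per_input
    combinations = routines * partners_per_routine
    return {
        "routines": routines,
        "inputs": inputs,
        "experiments": experiments,
        "combinations": combinations,
        "days_individual": routines / routines_per_day,
        # Assumes combinations are exercised at the same daily rate as routines.
        "days_combinations": combinations / routines_per_day,
    }


counts = executive_test_counts()
print(counts)
```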

TASK II-4
Title: Multiprocessor Interconnection Validation
Objective:
1. To validate the interconnections between the processors.
2. To validate the functionality of the system with respect to the interconnection (i.e., communication, protocol handling and related processor effects).
Procedure:
1. Design and run diagnostic routines that check the interconnections and the protocol (obviously, there is no need to duplicate diagnostics provided by the manufacturer).
2. Run single processor diagnostic on each and/or all of the processors and observe the effects on other processors.
3. Make all the processors talk to each other and observe the behavior (such as bus contention, missing messages, protocol handling, priority conflicts, etc.).
4. Determine (measure) response time to communication setup, messages and interrupts for each processor.
Facilities:
1. Hardware
- subject multiprocessor
- bus monitor hardware
- test monitor computer
2. Software
- program development environment
- on-line debugger
- experimental results recorder and assessment software
Personnel: Chief Experimenter; Software Technician; Hardware Technician
Level of Effort: Assume there are six processors connected to each other (fully connected); running the multiprocessor diagnostic on one and observing the interconnection process takes half a man-month, so six processors and cross product takes

1/2 * 6 = 3 man-months.

TASK II-5
Title: Multiprocessor Executive Routine, Including Error Management (Reconfiguration) Routines, Validation
Objective:
1. To validate that the executive routines (multiprocessor system executive and single processor executive) respond as specified.
2. To search for design errors (or lack of specification) in executive routines.
Procedure:
1. Treat executive routines as black boxes with only inputs and outputs. Check response to:
a. expected data,
b. out-of-bound data, and
c. inconsistent data.
2. Make sure every path through individual routines is exercised at least once, including error return paths. Validate consistency of response by multiple experiments.
3. Check executive routines interaction.
a. feed a string of executive subroutines with
- expected data
- out-of-bound data
- inconsistent data
4. Check the side effects of executive subroutines when running in different processors at the same time.
5. Check the scheduling and task assignment.
6. Determine (measure) routine response time.
7. Check system error return-reporting paths (i.e., through system hierarchy).
Facilities:
1. Hardware
- subject multiprocessor
- test monitor computer
2. Software
- software development environment
- on-line debugger
- experimental results recorder and assessment software
Personnel: Chief Experimenter; Software Technician
Level of Effort: Same as Task II-3 except that since the executive runs on different processors (assume 6 processors) and the system executive is added, then we need 6 * 3 man-months = 18 man-months.
Priority: High

TASK II-6
Title: Application Program Validation (Design Errors) and Performance Baseline
Objective:
1. To verify that application software responses are as specified.
2. Search for design errors.
3. Measure system response parameters (time constants) with time varying inputs.
Procedure:
1. Treat system as a black box with only inputs and outputs. Check response to
a. expected data,
b. out-of-bounds data,
c. inconsistent data,
d. random data at boundaries, i.e., on the edge of control, where flight transitions (or transitions to other software),
e. sequences of data,
- expected
- out-of-bounds
- inconsistent
- random.
2. Make sure every path through software is exercised (whole system).
3. Check system error return and reporting.
Facilities:
1. Hardware
- total system under test
2. Software
- same as Task II-3
- executive error reporting
Personnel: Chief Experimenter; Software Technician
Level of Effort: Approximately 4 times Task II-3, assuming application described in Task II-13 is 4 times more complex than executive. 1 man-year
Priority: High

TASK II-7
Title: Simulation of Inaccessible Physical Failures
Objectives:
1. To enhance understanding of the relationship between physical faults and their error manifestations.
2. To provide a data base which may be used to supplement/support physical fault injection experiments.
Procedure:
Levels of simulation, from inaccessible to accessible and from single processor to multiprocessor:
Device - Gate - R-T - ISP - PMS* - Interface - User
*uniprocessor and/or multiprocessor
R-T (Register-Transfer), ISP (Instruction Set Processor), PMS (Processor Memory Switch)
1. Develop/obtain required simulation software with fault-injection capability at each of above levels.
2. Provide interfaces between the packages so that parts of system can be simulated at different levels.
3. Simulate fault-free system at varying (appropriate) levels of complexity to validate the simulation software.
4. Simulate system with injected faults.
5. Observe relationship between injected faults, input data and fault manifestation from Step 4.
6. Characterize relationships between injected faults and their manifestation and attempt to abstract information into equivalence classes.
7. Repeat above for multiple processor system.
Facilities:
1. Hardware
- 1 or more computers (dependent on whether simulations are specially developed) - (e.g., Nanodata QM-1 plus DEC-10)
2. Software
- simulation software
- description of target machine at appropriate level
- diagnostic software for target machine to validate the simulation software
- simulation executive with evaluation software
- monitor and application software of target machine
Personnel: 1 Engineer, 3 Programmers; 4-8 man-years (1-2 years)*
Priority: High
*NOTE: Once required simulation software exists, additional experiments will require much less time.
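Step 6 of the Task II-7 procedure, abstracting fault/manifestation pairs into equivalence classes, can be sketched as a simple grouping. The record format, fault labels, and manifestation labels below are made up for illustration and are not results from any experiment; the sketch also ignores the input data that Step 5 records alongside each fault.

```python
from collections import defaultdict


def equivalence_classes(observations):
    """Group injected faults by the manifestation they produced.

    `observations` is an iterable of (fault_id, manifestation) pairs;
    both are hypothetical labels chosen for illustration.
    """
    classes = defaultdict(list)
    for fault_id, manifestation in observations:
        classes[manifestation].append(fault_id)
    return dict(classes)


# Made-up example data, not taken from the source.
observations = [
    ("gate17_stuck_at_0", "wrong ALU result"),
    ("gate42_stuck_at_1", "wrong ALU result"),
    ("bus_line3_open", "memory read error"),
]
print(equivalence_classes(observations))
```

Faults that fall into the same class could then be represented by a single injection at higher system levels, which is the reduction the task objectives aim for.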

TASK II-8
Title: Physical Fault Insertion: Single Processor Manifestation Understanding and Preliminary Characterization Histograms
Objective:
1. Establish fault classes (e.g., manifestation number) to cut down complexity of fault injection at higher system levels.
2. Generate "representative" system level histogram of detection, isolation and reconfiguration times.
Procedure:
1. Physically inject faults on a single processor implementation: "solid", "intermittent", "transient"; power.
2. Use diagnostics to see what portion of the machine (as defined by the diagnostic) does not work.
3. Map physical faults into "memory" or higher level manifestation wherever possible.
4. Automatically log each experiment and its results for single processor.
Facilities:
1. Hardware
- monitoring computer
- mag tape for records
- high-speed data logger (if desperate)
- test jig for inserting physical faults
2. Software
- diagnostics
- instruction to executive software that processor is available
- broken diagnostic analyzer plus state dump
- modify executive (if necessary) to report errors to monitor computer
- some support from executive to coordinate fault injection with system state
Personnel: 1-2 Engineers - 3 shifts of technicians
10000/50/day = 200 days (≈ 1 year)
4-5 man-years
worst case - 2 times longer
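The Task II-8 staffing figures can be checked with a short calculation. The sketch below reads "10000/50/day" as 10000 fault injections at 50 per day, and assumes one technician per shift with everyone engaged for the whole campaign; that reading and the parameter names are assumptions, not statements from the worksheet.

```python
def injection_campaign(total_injections=10000, injections_per_day=50,
                       engineers=(1, 2), technician_shifts=3, years=1):
    """Return (campaign days, man-years for the low and high engineer counts)."""
    days = total_injections / injections_per_day
    # Assumes one technician per shift and full-time staffing for `years` years.
    man_years = tuple((e + technician_shifts) * years for e in engineers)
    return days, man_years


print(injection_campaign())  # (200.0, (4, 5))
```

Under these assumptions the result matches the worksheet's 200 days and 4-5 man-years; passing `years=2` gives the doubled worst case.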
Priority: High

TASK II-9
Title: Physical Fault Insertion (Multiprocessor)
Objective: Establish fault classes (map physical faults into manifestations). This will reduce the number of fault injections required at higher system levels.
Procedure:
1. Use the memory manifestation of physical faults classified in II-8 as a basis for injecting faults at the multiprocessor level. Insert physical faults for other cases in each of the single processors (i.e., unclassified faults).
2. Insert physical faults at the interconnection between each two processors: pins, gate, power; manifestation "solid", "intermittent", "transient".
3. Map the manifestation of the fault insertion in steps 1. and 2. into the "memory" manifestation whenever possible.
4. Automatically log each experiment and its results (verify that the other processors are not affected by the fault propagation).
5. Repeat the experiment a small number of times (e.g., ...) to obtain histograms for:
- fault propagation time
- detection time
- isolation time
- reconfiguration time.
Facilities:
1. Hardware
2. Software
(Murray)   Performance Confirmation
       To develop a state model for reliability analysis which can distinguish between "good" state and "failed" state.

I-8 (Hopkins)   (*)   Performance Analysis
       To develop a structural model of performance capability.

I-9 (Melliar-Smith)   (*)   Executive Implementation Proof
       To establish at AIRLAB the capability to formally verify the implementation of the executive (and associated programs) against their specification.

I-10 (Melliar-Smith)   (*)   Application Requirements Analysis
       To develop methods for formally verifying the specifications of the application tasks against the underlying aerodynamic and structural requirements.

I-11 (Melliar-Smith)   (*)   Application Program Proof
       Develop methods for verifying the correctness of application programs by mathematical analysis.

(*) Indicates task not rated by Working Group II.

Table 3.2 Proposed Validation Tasks Summary - Category II: Fault Processing Verification

Task No.   WG-II Rating   Task Title
       Task Objective

II-1   8.2   Initial System Check-Out (diagnostic)
       Verify that the single processor performs basic functions.

II-2   8.0   Programmer's Manual Validation
       1- To identify design errors. To ensure the machine performs functions according to specifications in programmer's manual.
       2- To identify and fully characterize incompletely described machine features.

II-3   8.3   Single Processor Executive Validation
       1- To determine if executive routines respond as specified.
       2- To identify design errors and incompletely specified executive routines.

II-4   8.7   Multiprocessor Interconnection Validation
       1- To validate the interconnections between processors.
       2- To validate multiprocessor functionality.

II-5   8.7   Multiprocessor Executive and Error Management Validation
       1- To determine if multiprocessor and single processor executives respond as specified.
       2- To identify design errors or incomplete specification.

II-6   7.6   Application Program Validation
       1- To verify that application software responds as specified.
       2- To identify design errors.
       3- Measure system parameters with time varying inputs.

Table 3.2 Proposed Validation Tasks Summary - Category II: Fault Processing Verification (continued)

Task No.   WG-II Rating   Task Title
       Task Objective

II-7   7.1   Simulation of Inaccessible Physical Failures
       1- To enhance understanding of relationship between physical faults and their error manifestations.
       2- To provide a data base to supplement/support physical fault injection experiments.

II-8   8.0   Physical Fault Insertion: Single Processor Manifestation Understanding and Preliminary Characterization Histograms
       1- To establish functional fault classes to reduce number of faults injected at higher system levels.
       2- Generate "representative" system level histograms of detection and isolation times.

II-9   8.4   Physical Fault Insertion: Multiprocessor
       Establish fault classes for multiprocessors.

II-10   5.7   Executive Routine Response Characterization Under Single Physical Failure Conditions
       1- To classify executive routine responses to hardware failures.
       2- To abstract failure manifestations for higher system levels in order to reduce number of fault insertion experiments.

II-11   8.2   Multiprocessor Executive Fault Handling Capabilities
       1- To classify multiprocessing response to hardware failures.
       2- To measure reconfiguration scheduling algorithm time constant.

Table 3.2 Proposed Validation Tasks Summary - Category II: Fault Processing Verification (continued)

Task No.   WG-II Rating   Task Title
       Task Objective

II-12   7.3   Multiprocessor Application Program Fault Handling
       1- To classify application software response to hardware failures.
       2- To measure system response parameters for failure situations.

II-13   7.1   Multiprocessor Multiapplication Program Fault Handling
       1- To characterize system scheduler in failure situations.
       2- To classify application programs and system software to hardware failures.

II-14 (Hecht)   (*)   Software Reliability Research
       1- To identify severity of manifestations of software failures.
       2- To measure software failures as a function of stress.

II-15   Measurement of Synchronization of Clocks
       1- To make an experimental determination of the behavior of the clock synchronizing function.
       2- To develop instrumentation capable of distinguishing rare clock failures from instrumentation failures.

II-16 (Melliar-Smith)   (*)   Failure Severity Analysis
       To obtain or develop a categorization of failure syndromes against the severity of their consequences.

II-17 (Moses)   (*)   Application Program Validation and Performance Baseline for a System
       To validate application software for the system. (Note: Task II-6 addresses computer-only validation.)

Table 3.2 Proposed Validation Tasks Summary - Category II: Fault Processing Verification (concluded)

Task No.   WG-II Rating   Task Title
       Task Objective

II-18 (B. Smith)   (*)   Software Fault-Containment Verification
       To stress system's ability to contain software faults to affected application programs.

II-19 (Melliar-Smith)   (*)   Logical Analysis of Design to Reduce Size of Test Data Sets
       To reduce, by logical analysis of the design, the large number of test cases necessary for coverage verification.

II-20 (Mulcare)   (*)   Definition and Use of Application Benchmark Programs
       To define, evaluate and employ benchmark application software for integrated avionics and flight control.

II-21 (Abraham)   (*)   Verification of Operational System Diagnostic Coverage
       To verify that faults are detected with periodic diagnostics.

(*) Indicates task not rated by Working Group II.

Table 3.3 Proposed Validation Task Summary - Category III: Fault Processing Characterization

Task No.   WG-II Rating   Task Title
       Task Objective

III-1   7.3   False Input Information
       To determine the fault-tolerant computer system's response to input information which is plausible (e.g., passes bound checks) but incorrect.

III-2   6.7   Memory Alteration
       Selective memory alteration can be used to characterize the robustness of a fault-tolerant system when confronted with a) software and design errors, b) latent faults, c) correlated faults, and d) confusion by divergence.

III-3   8.0   Configuration Control Manipulation
       To find the limits of the system when responses are arbitrarily removed.

III-4   6.6   Cold and Warm Start Manipulation
       To explore the behavior of the system when its "instinctive" initial start capabilities are stressed.

III-5   7.9   Common Mode Noise and Margin Tests
       Characterize the responses of interconnection buses and input/output buses to common mode noise and signals approaching and/or exceeding their boundaries in either the time or amplitude domain.

III-6   7.5   Clock Modification Tests
       To determine the clock frequency, shape and synchronization points at which the system fails.
Table 3.3
Task No.
I 11-7
Proposed V a l i d a t i o n Task Summary
W G I I Rating
Mu1 7.6
-
Category 111:
Task T i t l e
F a uP l t r o c e s s i nC gharacterization
Task O b j e c t i v e
t i pF laeIunl jte c t i o n s
To c h a r a c t e r"tibhzreee a kpi nogionf t " f a u l t - t o 1 e r a n tsystem under mu1ti p l e faults.
a
I 11-8 (Masson)
(*I
MultipleS , e q u e n t i a Il n j e c tionofFaults a t Potentially Vu1 nerabl e Times
To d e t e r m i n e t h e v u l n e r a b i l i t y of faultt o l e r a n t systems t os e q u e n t i a lf a u l t sa t c r i t i c a lt i m e s , such as duringreconf ig u r a t i on.
111-9 (Myers)'
(*>
S u r p r i se Assessment
To a v o i d some c a t a s t r o p h i e bs ey a r l y a p p r e c i a t i o no fs u r p r i s e s and t o promote new approachesby a c t i v e ' ' h a r v e s t i n g " o f s u r p r i ses.
Methodology
I 11-10 (*> (Li ndl er)
E v a l u a t i ooSnft r a t e gf oy r Hand1 ing T r a n s i e natn d / o r I n t e r m i t t e n tF a u l t s
(* 1
ToleranceofSensitivityto HardwareParameter V a r i a t ion
I 11-11 (Seacord)
1- V e r i fpyr e d i c t e d system response. 2- Characterize system performance.
Determinethechange i n performanceor (more l i k e l y ) f a u l t r e a c t i o n t h a t w i l accompany d i f f e r e n c e s i n hardware c h a r a c t e r i s t i c s due t o e i t h e r p i e c e - p a r t to1 erance or environment.
Table 3.3
Task No.
I 11-12
Proposed V a l i d a t i o n Task Summary - Category 111: (concluded)
WG- I I Rating
(*I
er) ndl (Li
F a u l tP r o c e s s i n gC h a r a c t e r i z a t i o n
Task T i t l e S e n s i t i v iot fy ePhase r a nt ot 1L a g
Task O b j e c t i v e System t o
phase 1ag t h e system w i l (i.e., gwi vhphase ei cnah t l a g f u n c t i o ni sn o tp e r f o r m e ds a t i s f a c t o r il y )
Determi maximum ne
.
I 11-13 cracked
(*>
E f f eM cotafss s iFv ae i l u r e s
To a n a l yezfef em cotafss s ifvaei l u r e s t r i k e s , l i g h t n i nPC g boards, etc.
(De Feo)
(*) I n d i c a t e s t a s k s n o t r a t e d b y N o r k i n g
Group 11.
1i k e
a
Table 3.4 Proposed Validation Task Summary - Category IV: Other Tasks

Task No.: IV-1
WG II Rating: (*)
Task Title: Theoretical Limits of Fault Tolerance
Task Objective: Demonstrate theoretical limits to what faults can be tolerated by a fault-tolerant computer.

Task No.: IV-2
Task Title: Proving Timing Correctness
Task Objective: Provide programming and checking disciplines to guarantee correct timing of program execution.

Task No.: IV-3 (Lindler)
Task Title: Trade-off of Diagnostic/Maintenance/Failure Low-Level Isolation Versus Reliability
Task Objective: To provide a general idea of system reliability degradation resulting from increased visibility of details of failures.

Task No.: IV-4 (Hopkins)
Task Title: Instrumentation for Timing Tests
Task Objective: Develop methodology to observe timing relationships on a non-interfering basis.

Task No.: IV-5 (Masson)
Task Title: Sensitivity Analysis/Measurement of a System to Faults
Task Objective: To determine if there are measures of fault manifestation that can be used to describe how sensitive a system is to faults.

Task No.: IV-6 (Masson)
Task Title: Fault-Test Interrelationship Characterization
Task Objective: To understand complex interrelationships among faults and tests. To determine what faults tests cover/detect and what faults invalidate tests.

Task No.: IV-7 (Masson)
Task Title: Composite Validation
Task Objective: To identify and assess validation techniques which when collected together provide strong (irrefutable) data/evidence regarding reliability.

Task No.: IV-8 (Masson)
Task Title: Error Pattern Classification of Generic Analog/Physical Fault Mechanisms
Task Objective: To determine what various subsystems actually do when disturbed. To determine if certain modes of faulty operation are consistent.

Task No.: IV-9 (Masson)
Task Title: Establishing Generic Fault Classes
Task Objective: To determine the level to which we collapse faults operation. To describe this faults operation.

Task No.: IV-10 (Myers)
Task Title: Instrumenting Fault-Tolerant Computers
Task Objective: To determine specifications and instrumentation requirements so that fault-tolerant computers can be effectively monitored.

Task No.: IV-11 (Masson)
Task Title: Error Patterns on Buses as a High-Level Fault Manifestation
Task Objective: To determine high-level fault manifestations, such as error patterns on buses.

(*) Indicates task not rated by Working Group II.
Life-cycle stages and validation activities (all steps are iterated until frozen):

Goals and Requirements
Functional Specifications: Does functional specification meet the goals and requirements?
Design Specifications: Does the design meet the functional specifications?
Implementation (Prototype System): Does the implementation meet the design specifications?
Production System: Does the production system agree with the prototype?
Fielded System: Does the fielded system's behavior agree with the production system behavior?
Remainder of Life Cycle

Figure 1.1.- Digital system life cycle.
[Figure: life-cycle stages (Goals and Requirements, Design Specifications, Implementation, System, Fielded System, Remainder of Life Cycle) annotated with the corresponding aircraft activities: Engineering; Bench Test, Simulation, Model Analysis; "Hot-Bench" Tests, Ground Tests in the Aircraft, Experimental Flight Tests; Certification Procedures; Well-Defined Maintenance, Trouble Reporting, and Logging Procedures.]

Figure 1.2.- Digital system life cycle applied to aircraft systems.
[Figure: plot of COST OF VALIDATION (vertical axis) against SYSTEM RELIABILITY (horizontal axis), with the ULTRA-HIGH RELIABILITY region marked.]

Figure 2.1.- Cost of validation as a function of system reliability assuming a conventional lifetesting approach.
[Figure: two states, "Good" State and "Failure" State, joined by a single transition.]

Figure 2.2.- A two-state Markov model of reliability.
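
The two-state model of Figure 2.2 can be sketched numerically. This is a minimal illustration only, assuming a single constant failure rate from the "Good" state to the "Failure" state and no repair; the name lam and the function names are hypothetical and do not come from the report. Under that assumption the probability of remaining in the "Good" state is R(t) = exp(-lam * t), and the second function checks this by integrating the state equation directly.

```python
import math


def reliability_closed_form(lam: float, t: float) -> float:
    """Probability of still being in the "Good" state at time t.

    Assumes a constant failure rate lam (hypothetical symbol) and no repair.
    """
    return math.exp(-lam * t)


def reliability_euler(lam: float, t: float, steps: int = 100_000) -> float:
    """Integrate dP_good/dt = -lam * P_good by forward Euler, with P_good(0) = 1."""
    dt = t / steps
    p_good = 1.0
    for _ in range(steps):
        p_good += -lam * p_good * dt
    return p_good


if __name__ == "__main__":
    # Illustrative values only: 1e-4 failures per hour over a 10-hour mission.
    print(reliability_closed_form(1e-4, 10.0))
    print(reliability_euler(1e-4, 10.0))
```

The closed form is the natural choice for a two-state chain; the numerical integration is shown only because larger models with more states generally have to be solved that way.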
Figure 2.3.- Markov model of a two-unit standby sparing system.
[Figure: state-transition diagram including a "double fault" state. Transitions: ft --- fault occurrence; fd --- fault detection; fh --- fault handling.]

Figure 2.4.- A Markov reliability model of SIFT system.
[Figure: tree of VALIDATION TECHNIQUES. Node labels, with associated task numbers*:
LOGICAL PROOFS: proof that model is a proper abstraction of the system (I-2); correctness of design (hardware/software) (I-2); scheduler correctness (I-2).
ANALYTICAL: reliability; Markov model (I-1); alternative models (research).
EXPERIMENTAL TESTING (simulation/emulation/physical): validation of fault occurrence behavior (lifetesting of subsystems, e.g., processors, memories); validation of fault handling behavior (II-7, II-13, I-3); validation of fault-free behavior (II-1, II-6, I-2); exploratory testing (III).]

*Notation refers to validation tasks summarized in Section 3.0 of this report.

Figure 2.5.- The proposed validation taxonomy: Tree form.
[Figure: flow diagram. Block labels, with associated task numbers*: LIFETEST ON SUBSYSTEMS, EXPERIENCE, STANDARDS OF USE; FAULT OCCURRENCE BEHAVIOR; SYSTEM DESCRIPTION; FAULT HANDLING; EXPLORATORY TESTING (III); RELIABILITY MODEL (I-1); MODEL SOLUTION (I-1); EXPERIMENTAL VERIFICATION (II-7 to II-13); PROOF OF STRUCTURE (I-2); ANALYSIS; RELIABILITY PREDICTION.]

*Notation refers to validation tasks summarized in Section 3.0 of this report.

Figure 2.6.- Reliability validation procedure.
KEY TASKS
1- Construct and Refine Reliability Model
2- Validate Consistency of Model and Design of Computer
3- Observe Recovery Behavior
4- Predict Reliability

[Figure: block diagram. Block labels: MODEL; PREDICTIONS; FAULT CHARACTERIZATION; ASSUMPTIONS ON FAULT OCCURRENCE AND RECOVERY BEHAVIOR; LOGICAL PROOF THAT COMPUTER IS CONSISTENT WITH THE MODEL; EXPERIMENTAL PROOF THAT COMPUTER IS CONSISTENT WITH MODEL; CONFIRM ASSUMPTIONS; OBSERVE RECOVERY BEHAVIOR; OBSERVATION.]

Figure 3.1.- General scheme for confirmation of fault-tolerant computer reliability.
[Figure: chart with axes labeled IMPORTANCE and TOTAL.]

Figure D.1.- Working Group II assessment of the preliminary validation tasks.
1. Report No.: NASA CP-2130
2. Government Accession No.
3. Recipient's Catalog No.