The Reliability of Programming Systems

H. Gerstmann, H. Diel, and W. Witzel, IBM Germany, Boeblingen

1. ABSTRACT

The reliability of a programming system is not only determined by the number of errors to be expected, but also by its behaviour in error situations. An error must be kept local to identify its origin and annul its effects at a tolerable expense. This paper discusses a uniform approach to the limitation of error propagation, the identification of the process in error, and the provision for error recovery.

2. THE MODEL

The concepts of reliability are described for a model of a programming system which consists of three basic types of objects:

(1) procedures, which may be nested as in higher level languages

(2) a state space, represented by the variables declared within the procedures

(3) processes, which are the units of asynchronous operations.

The notions used are taken from reference [1]. The resources of the system, also called objects, are represented by variables. All variables constitute the state variable set

$$R = \{x_1, x_2, \ldots, x_n\}.$$

An assignment of values to all the variables in the state variable set defines a state of the system. The set of possible states is the state space. With each variable $x_i$ a type is associated which defines the set $V_i$ of values it may assume. In these terms the state space can be written as

$$S = V_1 \times V_2 \times \cdots \times V_n.$$
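The definition of the state space as a Cartesian product can be made concrete with a small sketch. The three variables and their value sets below are hypothetical and serve only to illustrate the construction; they do not come from the paper.

```python
from itertools import product

# Hypothetical value sets V1..V3 for three state variables x1..x3.
value_sets = {
    "x1": (0, 1),
    "x2": ("idle", "busy"),
    "x3": (False, True),
}


def state_space(value_sets):
    """Enumerate S = V1 x V2 x ... x Vn as dicts mapping variable -> value."""
    names = list(value_sets)
    return [dict(zip(names, values)) for values in product(*value_sets.values())]


states = state_space(value_sets)
```

Each element of `states` is one assignment of values to all variables, that is, one state of the system in the sense defined above.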

The set of processes {P1, P2, ..., Pn} is partially ordered by a precedence relation Pi < Pk, which can be illustrated in the form of a diagram (figure 1). All processes Pi such that Pi < Pk must have completed before Pk can be initiated.
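The rule that all predecessors must have completed before a process can be initiated can be sketched as follows. The four processes and the relation between them are assumptions made for illustration; the helper names `predecessors`, `can_initiate` and `ready` are likewise hypothetical.

```python
# Hypothetical precedence relation: each pair (a, b) means a < b,
# i.e. a must have completed before b can be initiated.
PRECEDES = {("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4")}
PROCESSES = {"P1", "P2", "P3", "P4"}


def predecessors(process, precedes=PRECEDES):
    """All processes Pi with Pi < process, following the relation transitively."""
    found = set()
    frontier = [process]
    while frontier:
        current = frontier.pop()
        for a, b in precedes:
            if b == current and a not in found:
                found.add(a)
                frontier.append(a)
    return found


def can_initiate(process, completed, precedes=PRECEDES):
    """True if every predecessor of the process has completed."""
    return predecessors(process, precedes) <= set(completed)


def ready(completed, processes=PROCESSES, precedes=PRECEDES):
    """Processes not yet completed whose predecessors have all completed."""
    done = set(completed)
    return {p for p in processes - done if can_initiate(p, done, precedes)}
```

With nothing completed only P1 is ready; once P1 has completed, P2 and P3 may run in parallel, and P4 has to wait for both.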

Each process P uses a subset Rp of the set R of resources. These objects define the subspace of the system state space in which the actions of the process take place. Resources utilized by more than one process are shared resources. Input resources Rip to P are shared resources set by other processes which are referenced by P, output resources Rop those set by P and referenced by other processes. The input state Si of process P is defined by the state of each input resource at process initiation; correspondingly, the state of the output resources at process termination describes its output state So. With each process a set of input states {Si} is associated for which a mapping Fp to output states {So} is defined (figure 2). From a functional point of view Fp is a partial function. No action is defined in case P is initiated in a state S ∉ {Si}. The next section is devoted to this exceptional situation.
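The classification of resources into shared, input and output resources follows mechanically once it is known which resources each process sets and references. The sketch below assumes a hypothetical table `USES` of three processes; the resource names are invented for the example.

```python
# Hypothetical processes: the resources each one sets (writes) and references (reads).
USES = {
    "P1": {"sets": {"a", "r"}, "refs": {"a"}},
    "P2": {"sets": {"b", "s"}, "refs": {"r", "b"}},
    "P3": {"sets": {"c"}, "refs": {"s", "c"}},
}


def resources(p, uses=USES):
    """Rp: every resource that process p sets or references."""
    return uses[p]["sets"] | uses[p]["refs"]


def shared(uses=USES):
    """Resources utilized by more than one process."""
    names = list(uses)
    result = set()
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            result |= resources(p, uses) & resources(q, uses)
    return result


def input_resources(p, uses=USES):
    """Rip: resources set by other processes and referenced by p."""
    set_by_others = set().union(*(u["sets"] for q, u in uses.items() if q != p))
    return uses[p]["refs"] & set_by_others


def output_resources(p, uses=USES):
    """Rop: resources set by p and referenced by other processes."""
    refs_by_others = set().union(*(u["refs"] for q, u in uses.items() if q != p))
    return uses[p]["sets"] & refs_by_others
```

In this example P2 receives its input through `r` and delivers its output through `s`, while `a`, `b` and `c` stay private to one process each.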

3. EXCEPTION HANDLING

It has long been recognized by engineers that instructions perform partial functions. To cope with them, exceptions have been introduced; the ZERODIVIDE and OVERFLOW conditions are typical examples. Higher level languages either ignore this property or just support exceptions on the instruction level as in the case of PL/I. There is, however, no consistent treatment of exceptions at the level of procedures. The argument that such a feature is not needed goes as follows: Exceptions at the procedure level either can be programmed or reduced to hardware exceptions.

Although this is a true statement, it expresses a narrow attitude with respect to the purpose of a language. The semantic distinction between functional and exceptional actions should also be reflected in the syntax of the language. As indicated in figure 3 the functional action Fp expresses the function of the procedure as long as its arguments are in {Si}. If this is not the case the exceptional action Ep maps the invalid state into an exception description.

To support this property in a programming language such as PL/I, extensions of the following kind are required:

(1) The values a scalar variable may assume can be constrained by appending a range to its data attribute.

(2) The values of structured variables (including arrays) can be constrained by imposing relations between their subcomponents.

(3) A built-in function RANGE which returns '1'B or '0'B depending on whether the argument lies in its range or not.

(4) A built-in function ON_ERROR_DESCR which returns an error description in the form of a structure as follows:

    1 ERROR_DESCR
      2 ERROR_TYPE
        3 ERROR_MAIN_TYPE
        3 ERROR_SUB_TYPE
      2 STATEMENT_NO
      2 STATEMENT_LABEL
      2 AFFECTED_VARIABLES
        3 VARIABLE1
        ...

Figure 4 shows the use of these language constructs to define exceptional actions. The language elements introduced should not be considered as a proposal to extend PL/I. Their purpose is to indicate the direction in which extensions are needed to separate the functional part of a procedure from its exceptional part. A consistent solution cannot neglect type attributes as provided in PASCAL [2,3].
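The separation of functional and exceptional action can be sketched outside PL/I as well. The following Python fragment is only an analogy under stated assumptions: `Ranged` stands in for a scalar with an appended range, `in_range` for the RANGE built-in function, and `ErrorDescr` loosely mirrors a few fields of the ERROR_DESCR structure. The procedure `divide` and all field values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorDescr:
    """Loosely mirrors the ERROR_DESCR structure; the field values are illustrative."""
    error_main_type: str
    error_sub_type: str
    statement_label: str
    affected_variables: list = field(default_factory=list)


@dataclass
class Ranged:
    """A scalar variable with a range appended to its data attribute."""
    name: str
    value: float
    low: float
    high: float


def in_range(var):
    """Counterpart of the RANGE built-in: True ('1'B) if the value lies in its range."""
    return var.low <= var.value <= var.high


def divide(num, den):
    """Procedure with a functional action Fp and an exceptional action Ep."""
    invalid = [v.name for v in (num, den) if not in_range(v)]
    if invalid:
        # Ep: map the invalid input state to an exception description.
        return None, ErrorDescr("INPUT", "RANGE", "DIVIDE", invalid)
    # Fp: the function proper, defined only on admissible input states.
    return num.value / den.value, None
```

Because the divisor carries a range that excludes zero, the ZERODIVIDE situation is caught at the procedure level as an input error rather than surfacing as an instruction-level exception.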

4. ERROR ISOLATION

The capability to isolate errors in a system does, however, not only depend on the realization of the partial function concept. Additional system properties are required to attribute an error unequivocally to a certain process: At each point of time only one process may update a shared resource. To this end the use of shared resources must be restricted.

Two different cases are to be considered:

Case 1

The resource is shared by processes which lie on a path through the system (figure 5). Since P1 < P2 it is always possible to allocate the resource R in such a way that

deallocate (R,P1) < allocate (R,P2).

During the execution of processes at any point of time R is allocated to at most one process.

Case 2

The resource is shared by processes which do not lie on a path through the system (figure 6).

In this situation the direct access to the resource is prevented by establishing an interface between the processes and the resource. Assuming that the processes either want to read or to update (read and write) the resource, they have to initiate separate atomic processes READ (R, Pi) or UPD (R, Pj) which are associated with R and obey the constraints indicated in figure 7. An empty circle represents any other process including read and update. Dependent on the intended use of the resource R the processes Pi are decomposed into subprocesses. According to above constraints, disregarding symmetry, this results in one of the three types of diagrams shown in figure 8. By means of this device case 2 is reduced to case 1. The process administering the resource R and the associated set of processes {READ(R, Pi)} U {UPD(R, Pj)} in accordance with above constraints is called a resource manager. There is an interesting parallel to the concept of monitors introduced by Brinch Hansen [4].
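A resource manager in this sense can be approximated by an object that owns the resource and offers only atomic read and update operations. The sketch below is a simplification under stated assumptions: a single lock serializes all operations, which may be stricter than the constraints of figure 7 require, and the `invariant` parameter and the `log` attribute are hypothetical additions for illustration.

```python
import threading


class ResourceManager:
    """Administers one shared resource R; processes never touch R directly."""

    def __init__(self, initial, invariant=lambda value: True):
        self._value = initial
        self._invariant = invariant    # assumed system specific constraint on R
        self._lock = threading.Lock()  # at most one READ or UPD runs at a time
        self.log = []                  # serial path of operations applied to R

    def read(self, process):
        """READ(R, Pi): atomic read on behalf of process Pi."""
        with self._lock:
            self.log.append(("READ", process))
            return self._value

    def upd(self, process, update):
        """UPD(R, Pj): atomic read-and-write on behalf of process Pj."""
        with self._lock:
            new_value = update(self._value)
            if not self._invariant(new_value):
                raise ValueError(f"{process} violated the constraint on R")
            self._value = new_value
            self.log.append(("UPD", process))
            return new_value
```

Since every access passes through the manager, the entries of `log` form one serial sequence of operations on R even when the calling processes run in parallel.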

Due to the use of resource managers a unique path of serial processes can be associated with the state changes of each shared resource (figure 9). For the purpose of error isolation each process including those controlled by resource managers is requested to check its input states.

Whenever an error is detected by a process Pk it must have been caused by some process Pi < Pk on the path for the resource concerned. In any case, Pk will accuse its immediate predecessor Pk-1 of having made an error based on the following consideration: Either Pk-1 caused the error during its execution or made an error in accepting an erroneous input state.

As indicated in figure 10, going the path backwards in this way, the process originating the error can be identified. The process Pk may wrongly accuse Pk-1 of having supplied faulty input. To settle this case, it is necessary that obligatory specifications detailing the interfaces between processes have been established before the implementation.

The language features described for input checking were introduced in the previous section. Their use is now described. Constraints imposed on the state space are either process or system specific.

Process specific constraints define the admissible input states of a process. Formal parameters are to be specified with ranges. Dependencies between global variables and/or formal parameters are checked as indicated.

System specific constraints are properties of shared resources represented by global variables. To maintain their integrity ranges are appended. Since the sequence in which a shared resource will be used by the processes is undetermined, the ranges must express invariant properties, i.e., the conditions imposed on its state before and after process execution must be the same (figure 11).

The embedding of on-units in process hierarchies is the subject of the next section.
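The backward search along the path of a resource, as described earlier in this section, can be sketched as a short loop. The sketch assumes that the outcome of each process's input check is available as a table `input_ok`; this table and the function name `locate_origin` are hypothetical.

```python
def locate_origin(path, input_ok):
    """Walk the path backwards from the detecting process Pk.

    path     -- process names in precedence order, ending with the detector Pk
    input_ok -- maps a process name to True if its input state was admissible
    Returns the process held responsible for originating the error.
    """
    k = len(path) - 1
    if input_ok[path[k]]:
        raise ValueError("the detecting process reports no input error")
    while k > 0:
        accused = path[k - 1]
        if input_ok[accused]:
            # Valid input but faulty output: the error arose during its execution.
            return accused
        # The accused accepted an erroneous input state; it accuses its predecessor.
        k -= 1
    # The first process on the path already received an inadmissible input state.
    return path[0]
```

For the path P1, P2, P3, P4 with inadmissible input at P3 and P4, the search passes the accusation from P4 to P3 and from P3 to P2, where it stops because P2 started from an admissible input state.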

5. ERROR RECOVERY

The concepts developed for error isolation are not sufficient for the purpose of recovery. This can be shown by the following example: Process P1 uses resource R to provide input to process P2.

Case 1

The resource R is used by the serial processes P1 and P2 (figure 12). After providing input to P2 the process P1 terminates. P2 detects an input error. Recovery must comprise process P1 which is no longer in existence.

Case 2

The resource R is used by the parallel processes P1 and P2 (figure 13). After providing input to P2, process P1 continues to exist and discovers an error affecting R. Process P1 has to return to a previous state and recall the data supplied to P2. Therefore recovery must also include process P2.

Although in this situation the resource manager is responsible for the input check and errors violating the constraints imposed on R are detected before P2 is initiated, the subprocess P12 may consider the values supplied to R as inconsistent according to the internal semantics of the program. Thus the concept of input validation as described for the purpose of error isolation must be extended for the purpose of recovery. Two strategies which supplement each other are discussed:

- a discipline with respect to data communications (commitment discipline)

- a hierarchical structure of processes with respect to recovery.

The intent of the commitment discipline is to enforce that no data is committed outside a process unless either it is ensured that there will never be a need to recall the data or there is a mechanism available to do it. As described in section 2, a process makes use of input and output resources. For the purpose of commitment values submitted to output resources are classified as:

uncommitted - Data the receiving process cannot rely on. It will not be recalled in case the sending process fails.

committed - Data the sending process commits as consistent to the receiving process. Consequently it will not be recalled by the sending process.

precommitted - Data that can exist in one of three states: 'OPEN', 'RECALLED' or 'COMMITTED'. The initial state is 'OPEN'. At recall or commitment the state is changed to 'RECALLED' or 'COMMITTED'.

This distinction is introduced in reference [5], however, applying different terminology and semantics.

Output is committed by the supplying process when it is considered consistent. The term 'consistent' remains undefined. It is up to the individual process to establish appropriate criteria. They should, however, at least guarantee valid output. The commitment implies for committed data the release to other processes and for precommitted data a state transition from 'OPEN' to 'COMMITTED'. The data must be committed before the termination of the highest level process to which the output variables are non-local. Committed and uncommitted data do not introduce dependencies between processes which have to be considered for recovery. The situation is different for precommitted data. This notion allows the scope of in-process recovery, which is based on the fact that the process to be recovered is still in existence, to be extended. Figure 14 shows the state diagram for precommitted data. The sending process sets the data in the initial state 'OPEN'. The associated resource manager guarantees that the data are not changed by any receiving process as long as they are in the state 'OPEN'. Only the sending process is entitled to change the state to 'COMMITTED' or 'RECALLED'. The receiving processes are not permitted to terminate before the state 'COMMITTED' is entered. In addition they must not commit output which depends on input not yet committed. This mechanism ensures that all dependent parallel processes are still in existence in case data have to be recalled.
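The rules for precommitted data amount to a small state machine guarded by the resource manager. The sketch below is one possible reading of those rules, not the authors' implementation: the class name `Precommitted` and its method names are hypothetical, and the behaviour of updates by the sending process is an assumption the text does not settle.

```python
class Precommitted:
    """Precommitted data held by a resource manager (names are illustrative)."""

    def __init__(self, sender, value):
        self.sender = sender
        self.value = value
        self.state = "OPEN"  # initial state set by the sending process

    def _close(self, process, new_state):
        if process != self.sender:
            raise PermissionError("only the sending process may change the state")
        if self.state != "OPEN":
            raise RuntimeError(f"data already {self.state}")
        self.state = new_state

    def commit(self, process):
        """Transition 'OPEN' -> 'COMMITTED', reserved for the sender."""
        self._close(process, "COMMITTED")

    def recall(self, process):
        """Transition 'OPEN' -> 'RECALLED', reserved for the sender."""
        self._close(process, "RECALLED")

    def update(self, process, value):
        """Receivers may change the data only once it is committed.

        Assumption: the sender itself is not restricted here.
        """
        if process != self.sender and self.state != "COMMITTED":
            raise PermissionError("receivers may not change data that is not committed")
        self.value = value

    def may_terminate(self, process):
        """Receivers must not terminate before the state 'COMMITTED' is entered."""
        return process == self.sender or self.state == "COMMITTED"
```

A receiving process would poll or wait on `may_terminate` before ending, which keeps it in existence for as long as the sender can still recall the data.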
The mechanism therefore allows in-process recovery to be applied to several processes. For back out each of them can be reset to the initial state which was kept at process initiation. Figure 15 illustrates the commitment discipline. Process P1 precommits data to the resource R and sets its state to 'OPEN'. The data can be read but not updated until the end of P21. Before P2 is permitted to update R and/or terminate its execution it must wait for the commitment of R by P1. The commitment discipline cannot be applied to serial processes. To guarantee the existence of a process that can perform the recovery, the system should be designed as a hierarchy of processes. In cases where this hierarchy cannot be predefined, measures for post-process recovery have to be introduced in the direction described in reference [6]. Figure 16 shows an idealized system meeting above design constraints. The system is structured in three processes P1, P2 and P3. Each process Pi consists of subprocesses Pij. Recovery situations affecting only parallel processes such as P22 and P23 or P32 and P33 can be handled by means of the commitment discipline. In any other situation the process detecting the error has to escalate it to the next level in the process hierarchy. To achieve this, on-units providing for recovery must also be ordered hierarchically. Figure 17 indicates a way in which errors can be escalated to higher level on-units for recovery.
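Escalation through hierarchically ordered on-units can be sketched as a walk from the detecting process towards the root of the process hierarchy. The class `Process`, the convention that an on-unit returns True when it has recovered, and the error labels used in the usage note are all assumptions made for illustration.

```python
class Process:
    """A node in the process hierarchy with an optional recovery on-unit."""

    def __init__(self, name, parent=None, on_unit=None):
        self.name = name
        self.parent = parent
        self.on_unit = on_unit  # callable(error) -> True if recovery succeeded

    def escalate(self, error):
        """Offer the error to this level's on-unit, then to each higher level in turn.

        Returns the name of the process whose on-unit recovered, or None.
        """
        level = self
        while level is not None:
            if level.on_unit is not None and level.on_unit(error):
                return level.name
            level = level.parent
        return None
```

As a usage example, a subprocess P22 without an on-unit of its own, created as `Process("P22", parent=p2)`, hands every error to P2; whatever P2 cannot recover from moves on to the next higher level.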

6. DISCUSSION

The concepts of reliability described in the preceding sections require a system partitioned into modules. Following Parnas [7] a system is considered well structured in case the interfaces between modules contain little information, where interfaces are the assumptions modules make about each other. To minimize the information being transferred, interfaces must be raised to a higher level of abstraction. In this context the features discussed in the paper offer tools to enforce abstractions. As abstractions represent design decisions independent from the program flow, the features can only partially be provided automatically by a compiler. A compiler handles one external procedure at a time, whereas module interfaces comprise more than one external procedure. Also, not every external procedure declaration constitutes an abstract interface and an abstract interface may contain assumptions which cannot be expressed in terms of parameters.

The enforcement of interfaces requires additional effort at execution time. Sometimes performance reasons are put forward to reject an approach of this kind. There are at least two reasons which show that the argument is not compelling. PASCAL [8] has demonstrated that dynamic range checking can be implemented efficiently. Ranges, as proposed here, are more complex since they can be defined by any relation. On the other hand, they need not be checked at every reference but just at the interface. This leads to the second reason: It is the designer's responsibility to minimize the information passed across interfaces.

7. SUMMARY

The preceding sections presented an attempt to handle errors systematically. Error isolation and recovery were treated under one aspect. Error isolation led to the realization of the partial function concept and the provision of resource managers. Error recovery in addition necessitated the introduction of a commitment discipline in conjunction with a hierarchical structure of processes.

REFERENCES

1. J.J. Horning and B. Randell: Process Structuring, Computing Surveys, Vol. 5, No 1, March 1973

2. N. Wirth: The Programming Language Pascal, Acta Informatica 1, 35-63 (1971)

3. N. Habermann: Critical Comments on the Programming Language Pascal, Acta Informatica 3, 47-57 (1973)

4. P. Brinch Hansen: Concurrent Programming Concepts, Computing Surveys, Vol. 5, No 4, December 1973

5. Ch. T. Davies, Jr.: Recovery Semantics for a DB/DC System, 1973 Proceedings of the ACM

6. L.A. Bjork: Recovery Scenario for a DB/DC System, 1973 Proceedings of the ACM

7. D. L. Parnas: Software Engineering or Methods for the Multi-Person Construction of Multi-Version Programs, published in these proceedings

8. N. Wirth: The Design of a PASCAL Compiler, Software, Vol 1, 309-333 (1971)

Fig. 1. Precedence Relation between Processes

Fig. 2. Mapping of Input States to Output States defined by a Process (Rp = {X1, X2})

Fig. 3. Distinction between Functional and Exceptional Actions of a Process

Use of Ranges and Error Descriptions

Example 1: 1 X(IXl