System-Level Fault Diagnosis - IEEE Computer Society

Large multiprocessor networks require system-level fault diagnosis. Researchers have established and extended a model for such diagnosis.

Arthur D. Friedman, The George Washington University
Luca Simoncini, Italian National Council of Research

With the advent of inexpensive microprocessor elements, the design of large multiprocessor computing networks becomes feasible. However, many interesting and important problems must be solved before such systems can be utilized in a meaningful way. These involve many aspects of computation, system structure, and reliability, and necessitate trade-offs among

* parallel algorithms which allow the full exploitation of the intrinsic level of parallelism present in large computing networks,
* new architectures which address the cost of the intercommunication network (one of the main parameters in overall system cost), and
* efficient tools which allow an early detection of faults, an automatic diagnosis of faulty units with consequent automatic system reconfiguration, and a recovery of the programs, providing safe operation and thus enhancing system availability.

Because of the potentially large number of interconnected units in a system, solutions to the above problems have tended to emphasize a system-level, rather than a logic-circuit-level, approach. A formal model of system-level fault diagnosis has been developed (and subsequently generalized to analyze desirable design characteristics for a system-level self-diagnosis). This model is helpful for automatic, user-transparent reconfiguration and recovery in multiprocessor systems. Here, we will examine the model and the large body of research founded on it and discuss trends in system-level fault diagnosis.

COMPUTER, March 1980. 0018-9162/80/0300-0047$00.75 © 1980 IEEE

A model for self-diagnosis

The fundamental model for distributed system-level diagnosis, developed by Preparata, Metze, and Chien [1], is based on the following assumptions:

* A large system is partitioned into units, each of which can individually test another unit.
* On the basis of the test responses, the test outcome is classified as "pass" or "fail," i.e., the testing unit evaluates the tested unit as either fault-free or faulty.
* The test evaluation is always accurate if the testing unit is fault-free; otherwise, it is unreliable.

These assumptions allow a diagnostic system to be modeled as a directed graph (called the diagnostic graph), in which each node $v_i$ corresponds to a unit in the system, and an arc from $v_i$ to $v_j$ (denoted by $v_{ij}$) corresponds to the existence of a test by which unit $v_i$ evaluates unit $v_j$. The test outcome associated with $v_{ij}$ is $a_{ij} \in \{0, 1\}$, where $a_{ij} = 0$ if both $v_i$ and $v_j$ are fault-free, and $a_{ij} = 1$ if $v_i$ is fault-free and $v_j$ is faulty. If the testing node $v_i$ is faulty, the test outcome is unreliable and $a_{ij}$ can assume either of the values 0 or 1 (denoted by X) regardless of the status of $v_j$. The possible situations are depicted in Figure 1.

Figure 1. Assumed test outcomes in the Preparata-Metze-Chien model.

The model also assumes that only solid (i.e., permanent rather than intermittent) faults are considered, and that the tests which unit $v_i$ applies to $v_j$ detect all possible faults in $v_j$. In this model the faulty units can be identified by decoding the syndrome, the set of test outcomes $a_{ij}$ in the system. Faults in units $v_i$ and $v_j$ are distinguishable if the syndromes associated with them must be different. The two faults are indistinguishable if the syndromes associated with them could be the same. These definitions may be directly extended to distinguishable and indistinguishable sets of faults, called fault patterns.

Figure 2 depicts a system with five units. If $v_1$ is faulty, the test syndrome will be as shown in line a. If $v_2$ is faulty, the syndrome is as shown in line b. They are distinguishable since the value $a_{51}$ is different. The multiple fault pattern $\{v_1, v_2\}$ has the syndrome shown in line c, and since it may be the same as the syndrome for the fault $\{v_1\}$ (depending on the unpredictable values of $a_{12}$ and $a_{23}$), $\{v_1\}$ and $\{v_1, v_2\}$ are potentially indistinguishable.

Figure 2. A system and associated test outcomes.

Two measures of diagnosability are one-step diagnosability and sequential diagnosability. A system of $n$ units is one-step $t$-fault diagnosable if all faulty units within the system can be identified without replacement, provided the number of faulty units present does not exceed $t$. A system of $n$ units is sequentially $t$-fault diagnosable if at least one faulty unit can be identified without replacement, provided the number of faulty units present does not exceed $t$. In a sequentially diagnosable system, the identification of all the faulty units may involve a multistep procedure. In the first iteration, at least one faulty unit is identified; after the replacement of this unit, the test is rerun, and additional faulty units may be identified. This process is reiterated until all faulty units have been identified and replaced. Sequential diagnosability requires the identification of the exact set of faulty units in the system.

For this model, it has been shown that the number of units $n$ must be at least $2t + 1$ for a system to be one-step $t$-fault diagnosable. Furthermore, in a one-step $t$-fault diagnosable system, a unit has to be tested by at least $t$ other units. From these necessary conditions an optimal one-step $t$-fault diagnosable system can be defined for which $n = 2t + 1$, with each unit tested by exactly $t$ other units (i.e., the system has the minimal number $m = nt$ of tests). A system having a testing link from $v_i$ to $v_j$ if and only if $j - i \equiv \delta m \pmod{n}$, where $m$ assumes the values $1, 2, \ldots, t$, is defined as a $D_{\delta t}$ system. In Figure 3, a $D_{22}$ connection is shown for $n = 5$. For one-step $t$-fault diagnosability, the connections $D_{\delta t}$ are optimal, provided that $\delta$ and $n$ are relatively prime.

Sequential diagnosability is more difficult to characterize. For $t = 2b + \lambda$ ($\lambda \in \{0, 1\}$), a single-loop system with $n$ units is sequentially $t$-fault diagnosable if

$$n \ge 1 + (b+1)^2 + \lambda(b+1).$$

For general systems, the existence of a class of designs with $m = n + 2t - 2$ tests that is sequentially $t$-fault diagnosable has been shown. Figure 4 shows such a system for $n = 14$ and $t = 6$.

The study of the Preparata-Metze-Chien model developed along two main lines: the analytical derivation of necessary and sufficient conditions for a system being either one-step or sequentially $t$-fault diagnosable, and the synthesis of sequentially $t$-fault diagnosable systems [2-6, 27]. This simplified initial model could not be directly applied to actual systems. However, it laid the groundwork for subsequent research which generalized the model and attempted to add more realistic constraints associated with actual systems. This research can be categorized along the following lines:

* generalization of the graph model;
* generalization of the possible test outcomes that units can yield when testing each other;
* generalization of the measures of diagnosability;
* generalization to intermittent faults; and
* generalization to a model of concurrent computation and diagnosis, in which some of the units in the system are not involved in diagnostic action.

These lines of development will be surveyed below.
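The Preparata-Metze-Chien rules described in this section are small enough to check by exhaustive search. The following Python sketch is illustrative only and is not taken from the cited papers: the function names are hypothetical, units are numbered from 0 rather than 1, and faulty testers are assumed to answer with a fixed value when a sample syndrome is generated. It builds the testing links of a $D_{\delta t}$ system, decodes a syndrome by listing every fault set of at most $t$ units that is consistent with it, and tests one-step $t$-fault diagnosability by comparing all pairs of fault sets.

```python
from itertools import combinations


def d_delta_t(n, delta, t):
    """Testing links (i, j) with j - i = delta * m (mod n) for m = 1..t."""
    return sorted({(i, (i + delta * m) % n)
                   for i in range(n) for m in range(1, t + 1)})


def make_syndrome(links, faulty, faulty_answer=0):
    """One syndrome the fault set could produce (faulty testers answer arbitrarily)."""
    return {(i, j): faulty_answer if i in faulty else int(j in faulty)
            for (i, j) in links}


def consistent(links, syndrome, faulty):
    """True if `faulty` could have produced `syndrome` under the model's rules."""
    return all(syndrome[(i, j)] == int(j in faulty)
               for (i, j) in links if i not in faulty)


def candidates(n, links, syndrome, t):
    """Every fault set of size <= t that is consistent with the syndrome."""
    return [set(c) for k in range(t + 1)
            for c in combinations(range(n), k)
            if consistent(links, syndrome, set(c))]


def indistinguishable(links, f1, f2):
    """Two fault sets can share a syndrome iff every tester that is fault-free
    in both of them sees the same status for the unit it tests."""
    both = f1 | f2
    return all((j in f1) == (j in f2) for (i, j) in links if i not in both)


def is_one_step_t_diagnosable(n, links, t):
    """Brute force: no two distinct fault sets of size <= t share a syndrome."""
    sets = [frozenset(c) for k in range(t + 1)
            for c in combinations(range(n), k)]
    return not any(indistinguishable(links, a, b)
                   for a, b in combinations(sets, 2))


links = d_delta_t(5, 1, 2)   # five units, each tested by exactly t = 2 others
loop = d_delta_t(5, 1, 1)    # single loop: each unit tested by one other unit
syndrome = make_syndrome(links, {1, 2})

print(candidates(5, links, syndrome, 2))        # [{1, 2}]
print(is_one_step_t_diagnosable(5, links, 2))   # True
print(is_one_step_t_diagnosable(5, loop, 2))    # False
print(indistinguishable(loop, {0}, {0, 1}))     # True
```

The last line mirrors the five-unit example in the text: in a single loop, a fault in one unit and a fault in that unit plus its successor can produce the same syndrome. The exhaustive search grows exponentially with the number of units, so it is suitable only for small illustrations.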

Generalization of the graph model

The Preparata-Metze-Chien model is limited to systems in which each unit alone can test some other unit, and in which different failure rates for the units in the system are not characterized. Russell and Kime [7, 8] generalize this model: rather than viewing certain subsystems as testing others, their model formalizes faults and tests and the relationships between them. It broadly interprets the concepts of fault and test; what constitutes a fault or a test depends on the level of the diagnostic analysis being done. In this way two situations can be modeled: one in which complete testing of a unit requires the combined operation of more than one unit, and one in which a unit (being tested in some other way) is known to be fault-free at the beginning of the diagnosis.

The description of the diagnosis of the IBM System/360 Model 50 [9], for example, can be outlined with a generalized diagnostic graph, or GDG (Figure 5). The faults $f_1, \ldots, f_5$ are associated with the units $v_1, \ldots, v_5$, where $v_1$ is the main storage, $v_2$ is the ROM control, $v_3$ is the ALU, $v_4$ is the local storage, and $v_5$ is the channel. The test $t_1$ for the fault $f_1$ is not dependent for its validity on the absence of other faulty units, and therefore the GDG has no edge labeled $t_1$. The two edges labeled $t_3$ indicate that the presence of either fault $f_1$ or fault $f_2$, or both, is sufficient to invalidate $t_3$.

An algebraic approach to digital system fault diagnosis is presented by Adham and Friedman [10]. Here, a set of fault patterns is described by a boolean expression in the variables $\{f_1, f_2, \ldots, f_n\}$, where $f_i$ is 1 if $v_i$ is faulty, or 0 if $v_i$ is not faulty. Any test $t_k$ is associated with two boolean functions, the invalidation function $I(t_k)$ and the detection function $D(t_k)$. The invalidation function of a test $t_k$, $I(t_k)$, is a completely specified boolean function in the variables $\{f_1, f_2, \ldots, f_n\}$ that has the value 1 for every fault pattern in which the test outcome is unpredictable. The detection function of a test $t_k$, $D(t_k)$, is an incompletely specified boolean function in the variables $\{f_1, f_2, \ldots, f_n\}$ such that for each fault pattern in which the test is not invalidated, the detection function has value 1 if the test should fail and value 0 if the test should pass. For patterns under which the test is invalidated, the detection function of the test is unspecified.

With the use of boolean expressions, it is possible to define a generalized model in which a system's diagnostic structure, with $n$ faults and $p$ tests, is represented by $p$ invalidation expressions and $p$ detection expressions. Since $D(t_k)$ and $I(t_k)$ can be any boolean expressions, there are fewer restrictions on the specification of test-fault relations. This model can be applied to many situations that cannot be handled by graph-based models. For example, the failing of a test $t_i$ may indicate the presence of the fault $f_i$, while the passing of the same test does not necessarily imply the absence of $f_i$. In this case, the boolean expressions associated with $t_i$ are $D(t_i) = 0$ and $I(t_i) = f_i$. To be applied to large systems, this approach requires the development of tools for efficiently manipulating the simple expressions derived for a large number of variables.
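As a rough illustration of the invalidation and detection functions, the following Python sketch represents each test as a pair of predicates over a fault pattern and lists the fault patterns that remain possible after a set of test outcomes has been observed. It is a sketch under stated assumptions rather than the procedure of Adham and Friedman: the helper names are hypothetical, fault patterns are tuples of 0/1 values indexed from 0, and the search is exhaustive.

```python
from itertools import product


def possible_outcomes(invalidation, detection, pattern):
    """Outcomes (0 = pass, 1 = fail) a test may produce under a fault pattern."""
    if invalidation(pattern):
        return {0, 1}                      # invalidated test: unpredictable
    return {1 if detection(pattern) else 0}


def consistent_patterns(tests, outcomes, n):
    """Fault patterns over n faults that could yield the observed outcomes."""
    return [f for f in product((0, 1), repeat=n)
            if all(o in possible_outcomes(inv, det, f)
                   for (inv, det), o in zip(tests, outcomes))]


def pmc_test(i, j):
    """A Preparata-Metze-Chien test of unit j by unit i in this form:
    invalidated when the tester is faulty, detecting when the tested unit is."""
    return (lambda f: f[i] == 1, lambda f: f[j] == 1)


# The example from the text: D(t_i) = 0 and I(t_i) = f_i, with a single fault.
one_sided = (lambda f: f[0] == 1, lambda f: False)

print(consistent_patterns([one_sided], [1], 1))   # [(1,)]
print(consistent_patterns([one_sided], [0], 1))   # [(0,), (1,)]
```

A failing outcome leaves only the pattern in which the fault is present, while a passing outcome leaves both patterns possible, which is the one-sided behavior described in the text.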
Another generalization of the Preparata-Metze-Chien model has been developed by Maheshwari and Hakimi [6]. They attempt to take into account the probabilistic nature of fault occurrence. Considering that a given test outcome can be generated by several fault patterns with different probabilities, a probabilistically one-step t-diagnosable (p-t-diagnosable) system can be defined as one for which, for any given set of test outcomes, there exists at most one fault pattern $F_j$ consistent with those outcomes such that $P(F_j) > t$. In this sense, $F_j$ is the fault pattern most likely to have generated the given set of test outcomes. Maheshwari and Hakimi [6] give a necessary and sufficient condition for one-step p-t-diagnosable systems.

Fujiwara and Kinoshita [11], working on the generalized model introduced by Maheshwari and Hakimi, present the problem of finding a connection for a system S, such that S is one-step or sequentially p-t-diagnosable. They first give necessary and sufficient conditions for the existence of such a system, given the failure probabilities of the units in the system. They then consider the design of p-t-diagnosable systems, provided that such conditions hold. Two design classes are identified: one-step p-t-diagnosable systems and sequentially p-t-diagnosable systems. In general, the problem is that of identifying a subset of the set of nodes in the graph (base set) which satisfies the condition for the existence of a p-t-diagnosable system. Fujiwara and Kinoshita complete the study of the Maheshwari-Hakimi generalized model by giving a necessary and sufficient condition for sequentially p-t-diagnosable systems. A more easily tested condition, also given by Fujiwara and Kinoshita [15], is useful for designing various p-t-diagnosable systems.
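The probabilistic definition can be illustrated with a small exhaustive search. The Python sketch below rests on assumptions that are not stated in the text: units are taken to fail independently, so that the probability of a fault pattern is the product of the individual failure and survival probabilities; test outcomes follow the Preparata-Metze-Chien rules; and the function names, the three-unit loop, and the numerical values are hypothetical. For one observed syndrome it lists the consistent fault patterns whose probability exceeds the threshold; the definition asks that there never be more than one.

```python
from itertools import combinations
from math import isclose


def pattern_probability(p, faulty):
    """P(F), assuming unit i fails independently with probability p[i]."""
    prob = 1.0
    for unit, p_unit in enumerate(p):
        prob *= p_unit if unit in faulty else 1.0 - p_unit
    return prob


def consistent(links, syndrome, faulty):
    """True if `faulty` could have produced `syndrome` (fault-free testers are accurate)."""
    return all(syndrome[(i, j)] == int(j in faulty)
               for (i, j) in links if i not in faulty)


def likely_patterns(p, links, syndrome, threshold):
    """Consistent fault patterns whose probability exceeds the threshold."""
    n = len(p)
    found = []
    for k in range(n + 1):
        for c in combinations(range(n), k):
            f = frozenset(c)
            if (consistent(links, syndrome, f)
                    and pattern_probability(p, f) > threshold):
                found.append(f)
    return found


# Hypothetical three-unit loop 0 -> 1 -> 2 -> 0 with equal failure probabilities.
p = [0.1, 0.1, 0.1]
links = [(0, 1), (1, 2), (2, 0)]
syndrome = {(0, 1): 1, (1, 2): 0, (2, 0): 0}

result = likely_patterns(p, links, syndrome, threshold=0.05)
print(result)   # [frozenset({1})]
```

In this example three fault patterns are consistent with the syndrome, but only the single fault in unit 1 has a probability above the threshold, so the syndrome is decoded unambiguously.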

Figure 3. An optimal two-fault diagnosable system.

Figure 4. A sequentially six-fault diagnosable system.

Figure 5. Diagnostic graph of IBM System/360 Model 50.

Generalization of the possible test outcomes

Another limitation of the Preparata-Metze-Chien model is that it considers only one set of possible test outcomes (Figure 1). The modeling of possible test outcomes may indicate the usefulness of fault-tolerant techniques, both in module design and in system diagnosability. Kavianpour presents a complete tabulation of possible test outcomes under various models [12]. An outline of all the nonequivalent models is shown in Table 1. The different models may be interpreted as follows: A,. is a "perfect tester," since the test outcomes correspond to a perfect diagnosis of faulty units. A,, and A are a "1-fail-safe tester" and a "0-fail-safe tester," since these models never have an incorrect 0 (1). $A_R$ is a model in which (due to testing unit and test complexity) a faulty unit will never incorrectly diagnose another faulty unit [13]. A,a and A A are included primarily for completeness. $A_P$ is the Preparata-Metze-Chien model. $A_{PT}$ is the "partial tester," meaning that there is a possibility that a fault-free unit cannot correctly diagnose a faulty unit, due to the complexity of the units and the fact that tests in reality may not be complete. Finally, $A_0$ is the "0-information tester," which provides no reliable test outcome. Kavianpour [12] compares these models and discusses some of their more interesting properties.

The model $A_{PT}$, the "partial tester," is examined by Simoncini and Friedman [16]. They consider the problem that the system tests may not be complete, i.e., that a fault-free unit may be able to detect only a percentage p ( t and n > s. A measure of the diagnosability of the system is the average, or expected, value of s-f, over all fault patterns with < t faulty modules.

Karunanithi and Friedman [21] completely characterize single-loop systems for one-step t/s diagnosability and determine a one-step t/s repair procedure for single-loop systems. This procedure is based on an iterative scanning of the set of test outcomes to identify particular subsequences pointing to the (most probable) faulty modules. Sequential diagnosis, enabling a more accurate diagnosis and requiring fewer replacements, is also considered. Finally, the researchers give results for one-step and sequential diagnosability of DOA systems, which are a generalization of $D_{\delta t}$ systems [1].

Kavianpour and Friedman [22] study the case of one-step t/s diagnosability in which s = t. Here, all t-fault patterns are exactly diagnosed, but patterns consisting of f < t faults may be inexactly diagnosed. Such systems can be designed with n = 2t + 1 modules, where each module is tested by fewer than t modules. This reduces the number of testing links required for t-fault diagnosability [1]. If n >> t, each module must only be tested by [(t+1)/2] other modules, a saving of almost half the testing links needed by the Preparata-Metze-Chien model [22]. Optimal designs for one-step t/t diagnosability are identified. For any pattern of f < t faults, at most one nonfaulty module is ever replaced for one-step repair.
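The saving in testing links quoted for t/t diagnosability can be made concrete with a short calculation. The block below is a worked illustration and not a result from the cited papers: it assumes that every module is tested by exactly the stated number of modules, and it takes $t$ odd so that $(t+1)/2$ is an integer and the bracket in the text needs no rounding. The values $n = 100$ and $t = 9$ are arbitrary.

```latex
% Testing-link counts for n modules, with t odd (illustrative assumptions).
\begin{align*}
  m_{\mathrm{PMC}} &= n\,t,
  & m_{t/t} &= n\,\frac{t+1}{2},
  & \frac{m_{t/t}}{m_{\mathrm{PMC}}} &= \frac{t+1}{2t}. \\[4pt]
  \text{For } n = 100,\ t = 9:\quad
  m_{\mathrm{PMC}} &= 900,
  & m_{t/t} &= 500,
  & \frac{m_{t/t}}{m_{\mathrm{PMC}}} &= \frac{5}{9} \approx 0.56.
\end{align*}
```

As $t$ grows, the ratio $(t+1)/(2t)$ approaches one half, which matches the "almost half" saving stated in the text.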

Generalization to intermittent faults

The major part of the work on self-diagnosability of systems has assumed that only solid (permanent) faults can be present; all the results so far described are limited to this assumption. Consideration of intermittent faults is generally difficult, since it requires a modeling of the behavior of these faults in a system and also requires iterative testing strategies to detect the faults.

Mallela and Masson [23] consider the effects of intermittent faults in diagnosable systems. The existence of both permanent and intermittent faults in a system, for example, affects the test outcome which is received after repeated applications of the test routines. This outcome may generate an incomplete diagnosis of faulty units, since not all the faulty units in the system may be detected. This points out a major problem: if an incomplete diagnosis cannot be avoided, an incorrect diagnosis of a fault-free unit as faulty must be avoided in the case of intermittent faults. Therefore, a $t_i$-fault diagnosable system is defined as a system in which, if no more than $t_i$ units are intermittently faulty, a fault-free unit will never be diagnosed as faulty, and the diagnosis will be at worst incomplete. In general, the fact that a system is t-fault diagnosable does not necessarily imply that it is also $t_i$-fault diagnosable with $t_p = t_i$ (the subscript p stands for permanent). Mallela and Masson also give a necessary and sufficient condition for one-step $t_i$-fault diagnosability. Each unit must be tested by at least $t_i + 1$ other units. They give bounds on $t_i$ as a function of $t_p$ for systems which are one-step $t_p$-fault diagnosable, and show that in systems in which no two units test each other, $t_i < t_p$, while $t_i$ may be equal to $t_p$ only in systems in which bidirectional testing links are present. Finally, they present a procedure for the determination of $t_i$.

Kavianpour [12] addresses the same problem using the concept of a t/s diagnosability measure. The t/s measure of diagnosability reduces the effect of incorrect diagnosis, since fault-free units as well as faulty units can be replaced. Two classes of systems, D1L systems and single-loop systems, are considered. Strategies for repairing such systems, which may have intermittent faults, are given, and worst case fault patterns are identified for both classes. Using k'-step t/s'-fault diagnosability, the parameters k' and s' derived for these strategies are shown to be larger, generally, than the k and s parameters derived by using repair strategies in the presence of permanent faults alone. Hence, diagnosis in the presence of intermittent faults generally requires additional repetitions of the system test routines. Furthermore, the number of units which may be replaced to restore fault-free operation is larger for intermittent faults than for permanent ones.

Concurrent fault diagnosis

The Preparata-Metze-Chien model studied the problem of system-level fault diagnosis, assuming that normal operation is interrupted, and diagnosis started, on a systemwide basis. When the number of modules in the system is large, some of them will be idle at a given moment. Hence, the system may be able to utilize this "slack" by having the nonbusy modules perform diagnostics. In this way, computation and diagnosis are performed concurrently in real time, system diagnosis can be transparent to the user, and automatic reconfiguration procedures are feasible.

Nair, Metze, and Abraham [24] introduce the concept of "roving diagnosis" (Figure 6). In this approach, one part of the system diagnoses a second part, while the remainder of the system continues normal operation. The part most recently diagnosed as fault-free then takes its turn in diagnosing other parts. Thus, there appears to be a subsystem of diagnosing and diagnosed units which "roves" through the system until no part of it remains undiagnosed.

Figure 6. A system with roving diagnosis.

However, roving diagnosis must ensure that the first diagnosis will produce unique, identifiable results. A system is designed, therefore, in which a specific initial good module can always be diagnosed. If, at most, t faults are present in the system, 2t units test unit $v_{2t+1}$. Also, all 2t units form a complete graph (i.e., there is a testing link between every two modules). In this case, at least one good module from the set of 2t + 1 modules can be identified. The good modules are then used for diagnosing other parts. In this approach, the complete set of 2t + 1 modules is involved in the first testing iteration. The remainder of the system can continue computation. The same fixed set of modules participates in testing for the initial step. This approach requires an irregular system of high complexity.

Saheban and Friedman [29] address concurrent fault diagnosis by generalizing the Preparata-Metze-Chien model through the introduction of the idea of unit status: a unit is busy if it has been assigned to computation, and nonbusy if it has been assigned to the diagnostic part of the system. This assignment is dynamic, in the sense that every module will go into a nonbusy status to provide a complete system diagnosis. The model of the busy/nonbusy module is made by removing the busy module and its associated arcs from the directed graph representing the diagnostic characteristics of the whole system. The requirements are:

* The diagnostic subgraph obtained by removing the busy module must maintain diagnostic capabilities.
* The complexity of the scheduling algorithm (i.e., the system component which dynamically assigns modules to be busy or not busy) has to be considered.
* Optimal D1L systems must be derived (i.e., ones which have the maximum number of busy modules and which maintain the maximum diagnostic capabilities in the rest of the system, using the measure of one-step t-fault diagnosability) [25].

Saheban and Friedman [26] consider concurrent fault diagnosis using a sequential t-fault diagnosability measure. A system S is concurrently sequentially diagnosable if, after the removal of B busy modules from the directed graph, it is possible by the application of the test to identify the status (faulty or fault-free) of at least one of the remaining n - B modules, without interrupting any of the busy modules. For D1L systems, the researchers give necessary and sufficient conditions for the identification of at least one fault-free module, without the interruption of any busy module and using a random scheduling algorithm for the assignment of busy modules. By optimally assigning busy modules, they show that the degree of parallelism is only slightly affected by increasing the number of interconnections of the directed graph, and is bounded by the value $p < [(n - 2t - 1)/n] \times 100$ percent.

Technological advances have demanded a higher level of abstraction in the modeling of many problems associated with system design. Fault diagnosis is one of these problems. Through generalization of the original Preparata-Metze-Chien model, constraints encountered in real systems have been introduced. However, future extensions and modifications may be necessary to make the model more applicable to such systems.

The assumption that a unit can test other units requires complete access to the tested modules. This is difficult to obtain, unless a complex interconnection network is present. Also, testing links represent logical connections, and the directed graph is a logical representation of the diagnostic capabilities of a system. Consequently, a directed graph is not necessarily a simplified representation of a system's data flow structure. It must still be determined what types of internal system organizations can most efficiently support the diagnostic procedures implied by a directed graph.

Another problem requiring attention is that of modeling, not only of the diagnostic phase in a system, but also of normal system operation. An integrated approach to the modeling of fault-tolerant computing systems must consider both normal and diagnostic operations as well as reconfiguration.

Acknowledgment

This work has been supported by the convention between Selenia S.p.A. and the Italian National Council of Research, and by the National Science Foundation under Grant MCS77-21569.

References

1. F. P. Preparata, G. Metze, and R. T. Chien, "On the Connection Assignment Problem of Diagnosable Systems," IEEE Trans. Electronic Computers, Vol. EC-16, No. 6, Dec. 1967, pp. 848-854.
2. S. L. Hakimi and A. T. Amin, "Characterization of Connection Assignment of Diagnosable Systems," IEEE Trans. Computers, Vol. C-23, No. 1, Jan. 1974, pp. 86-88.
3. F. J. Allan, T. Kameda, and S. Toida, "An Approach to the Diagnosability Analysis of a System," IEEE Trans. Computers, Vol. C-24, No. 10, Oct. 1975, pp. 1040-1042.
4. P. Ciompi and L. Simoncini, "Analysis and Optimal Design of Self-Diagnosable Systems With Repair," IEEE Trans. Computers, Vol. C-28, No. 5, May 1979, pp. 362-365.
5. P. Ciompi and L. Simoncini, "On the Diagnosability with Repair of Digital Systems," IEEE Computer Society Repository.
6. S. N. Maheshwari and S. L. Hakimi, "On Models for Diagnosable Systems and Probabilistic Fault Diagnosis," IEEE Trans. Computers, Vol. C-25, No. 3, Mar. 1976, pp. 228-236.
7. J. D. Russell and C. R. Kime, "System Fault Diagnosis: Masking, Exposure, and Diagnosability without Repair," IEEE Trans. Computers, Vol. C-24, No. 12, Dec. 1975, pp. 1155-1161.
8. J. D. Russell and C. R. Kime, "System Fault Diagnosis: Closure and Diagnosability with Repair," IEEE Trans. Computers, Vol. C-24, No. 11, Nov. 1975, pp. 1078-1088.
9. F. J. Hackl and R. W. Shirk, "An Integrated Approach to Automated Computer Maintenance," Conf. Record-1965 IEEE Conf. Switching Theory and Logical Design, Oct. 1965, pp. 298-302.
10. M. Adham and A. D. Friedman, "Digital System Fault Diagnosis," J. Design Automation & Fault Tolerant Computing, Vol. 1, No. 2, Feb. 1977, pp. 115-132.
11. H. Fujiwara and K. Kinoshita, "Connection Assignment for Probabilistically Diagnosable Systems," IEEE Trans. Computers, Vol. C-27, No. 3, Mar. 1978, pp. 280-283.
12. A. Kavianpour, "Diagnosis of Digital System Using t/s Measure," PhD Thesis, University of Southern California, June 1978.
13. F. Barsi, F. Grandoni, and P. Maestrini, "A Theory of Diagnosability of Digital Systems," IEEE Trans. Computers, Vol. C-25, No. 6, June 1976, pp. 585-593.
14. C. Berge, "Graphes et Hypergraphes," Paris, Dunod, 1970.
15. H. Fujiwara and K. Kinoshita, "Some Existence Theorems for Probabilistically Diagnosable Systems," IEEE Trans. Computers, Vol. C-27, No. 4, Apr. 1978, pp. 379-384.
16. L. Simoncini and A. D. Friedman, "Incomplete Fault Coverage in Modular Multiprocessor Systems," Proc. ACM Ann. Conf., Washington, D.C., Dec. 4-6, 1978, pp. 210-216.
17. M. L. Blount, "Probabilistic Treatment of Diagnosis in Digital Systems," Proc. Seventh Ann. Int'l Conf. Fault-Tolerant Computing, Los Angeles, Calif., June 28-30, 1977, pp. 72-77.*
18. M. B. Baskin et al., "PRIME-A Modular Architecture for Terminal Oriented Systems," AFIPS Conf. Proc., Vol. 40, 1972 SJCC, pp. 431-437.
19. M. L. Blount, "Modeling of Diagnosis in Fail Softly Computer Systems," Technical Note No. 123, Center for Reliable Computing, Stanford University, Sept. 1978.
20. A. D. Friedman, "A New Measure of Digital System Diagnosis," Digest of Papers-1975 Int'l Symp. Fault-Tolerant Computing, Paris, France, June 18-20, 1975, pp. 167-169.*
21. S. Karunanithi and A. D. Friedman, "System Diagnosis with t/s Diagnosability," Proc. Seventh Ann. Int'l Conf. Fault-Tolerant Computing, Los Angeles, Calif., June 28-30, 1977, pp. 65-71.*
22. A. Kavianpour and A. D. Friedman, "Efficient Design of Easily Diagnosable Systems," 3rd USA-Japan Computer Conf. Proc., San Francisco, Calif., 1978, pp. 14-1 to 14-17.
23. S. Mallela and G. M. Masson, "Diagnosable Systems for Intermittent Faults," IEEE Trans. Computers, Vol. C-27, No. 6, June 1978, pp. 560-566.
24. R. Nair, G. Metze, and J. Abraham, "Design Considerations for Fault-Tolerant Distributed Digital Systems," unpublished manuscript.
25. L. Simoncini and A. D. Friedman, "Concurrent Diagnosis in Parallel Systems," Proc. 1979 Int'l Conf. Parallel Processing, Aug. 1979, pp. 279-286.*
26. F. Saheban, L. Simoncini, and A. D. Friedman, "Concurrent Computation and Diagnosis in Multiprocessor Systems," Digest of Papers-Ninth Ann. Int'l Symp. Fault-Tolerant Computing, Madison, Wisc., June 20-22, 1979, pp. 149-156.*
27. H. Fujiwara and K. Kinoshita, "On the Complexity of System Diagnosis," IEEE Trans. Computers, Vol. C-27, No. 10, Oct. 1978, pp. 881-885.
28. J. P. Hayes, "A Graph Model for Fault Tolerant Computing Systems," IEEE Trans. Computers, Vol. C-25, No. 9, Sept. 1976, pp. 875-883.
29. F. Saheban and A. D. Friedman, "Diagnostic and Computational Reconfiguration in Multiprocessor Systems," Proc. ACM Ann. Conf., Washington, D.C., Dec. 4-6, 1978, pp. 68-78.

*This digest or proceedings is available from the IEEE Computer Society Publications Office, 5855 Naples Plaza, Suite 301, Long Beach, CA 90803.

Arthur D. Friedman is a professor in the Department of Electrical Engineering and Computer Science at George Washington University. His research interests include multiprocessor systems, fault diagnosis and fault-tolerant systems, design automation, and computer architecture. He was previously a member of the technical staff of Bell Telephone Laboratories and an associate professor at the University of Southern California. He is a co-author of Theory and Design of Switching Circuits (Computer Science Press) and Diagnosis and Reliable Design of Digital Systems (Computer Science Press). Friedman received his PhD in electrical engineering from Columbia University in 1965.

Luca Simoncini is a researcher with the Italian National Council of Research at the Istituto di Elaborazione dell'Informazione in Pisa, Italy. His current interests include fault-tolerant computing, design automation, and computer architecture. He received his PhD degree in electrical engineering from the University of Pisa in 1970. As a NATO Fellow, he was a visiting research scholar at the Department of Electrical Engineering at the University of Southern California and at George Washington University in the Department of Electrical Engineering and Computer Science.
