Software Dependability: A Personal View. Brian Randell, Department of Computing Science, University of Newcastle upon Tyne, Newcastle upon Tyne, UK. In Proceedings of the 25th International Symposium on Fault-Tolerant Computing (FTCS-25), Pasadena, California, USA, 27-30 June 1995, pp. 35-41, IEEE Computer Society Press, 1995.

© 1995 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Software Dependability: A Personal View Brian Randell Department of Computing Science University of Newcastle upon Tyne, Newcastle upon Tyne, UK

Abstract This paper attempts to stand back and consider how the field of software dependability research has progressed over the last twenty-five or so years. It provides a personal perspective on early developments such as the recovery block and the N-version programming scheme, and on more recent research in which the author has been involved aimed at unifying and extending these schemes. It then discusses first the present state of the art and then the way that the industry is likely to develop in future and the consequences this will have on the dependability field. This discussion draws on a summary of some of the ideas that were put forward at a recent ICL/ESPRIT-sponsored workshop that the author helped to organize. This workshop was in fact on The Future of the Software Industry. However, a number of the ideas discussed, in particular those relating to mega-systems and to system structuring, are of particular relevance to software dependability research.

1: In The Beginning The Silver Jubilee of the FTCS Series provides an excellent excuse for trying to stand back and consider how our research field has developed, and how it might progress in the future. However, despite the fact that what I might term my second research interest is the history of computing, this paper has no pretensions to historical respectability. Rather, it provides what is at best an anecdotal account of some developments with which I am familiar – an account that will serve me as a springboard for the comments I wish to make about the present and possible future state of research into software dependability. My own involvement with the FTCS series does not I am afraid go back twenty-five years – the first seminar with which I was involved was FTCS-8 in Toulouse. Nevertheless, this paper takes as its starting point an event that preceded even FTCS-1, namely the 1968 NATO Software Engineering Conference. In part in the interests of symmetry, I will end with comments that have their origins in a workshop that had some interesting
similarities to the 1968 NATO conference, even though it was held only last year. The NATO Conference had a deliberately provocative title – it was meant to imply the need for a subject that could justifiably be termed “software engineering.” Thus the choice of title was not meant as a claim that such a subject already existed. The discussions at the conference of the major problems of many then-current large software projects were a great spur to research in subsequent years aimed at producing bug-free software. But I, on the other hand, was motivated to wonder whether it might be possible to find means of achieving reasonably reliable service from complex software systems despite the bugs that they probably still contained. I brought to this task an active interest in issues of system structuring, and the power of recursive structures – dating back to my early involvement in Algol 60, and nurtured by work that I had recently been doing at IBM Yorktown Heights on system design methodology. This then was the background to a proposal for a research project on system (and especially software) fault tolerance that I and colleagues at Newcastle, which I had joined in 1969 from IBM, put to the UK Science Research Council. The initial funding that we received enabled us to undertake a fairly detailed study of several very large systems, including the British Airways airline reservation system, and the Barclays Bank overnight cheque reconciliation system. The lengthy report we produced on our findings enabled us to get the major project grant that we were seeking – and also was used in part as the basis for the text of my invited talk on operating systems at IFIP71[8]. In fact the main lesson that we learnt from the systems we studied was that much of their cost and complexity was due to the large amount of code that they contained for tolerating operational hardware faults and mistakes by users and operators. However, the mechanisms that had been developed had also proved to be capable of providing a degree of protection from some software faults. This was just as well because such faults had turned out to be one of the most significant causes of system failure.

In examining the strategies used for fault tolerance in these systems we found that the recovery provisions in particular seemed rather ad hoc and lacking in generality. The error recovery strategies were complex, and often less reliable than the normal application code, and in general not able to cope with further errors that were detected before an already-started recovery had completed successfully. It was for this reason that in my IFIP paper I stated: “Indeed one might suggest that error recovery should be amongst the first problems that are treated during the system design process rather than, as so often happens, the last.”

2: Early Work at Newcastle The recovery block scheme was our attempt to address this issue. Its first linguistic formulation was due to Jim Horning, visiting Newcastle from Toronto during the Summer of 1973, and Mike Melliar-Smith. The scheme required a suitable mechanism for providing automatic backward error recovery. I produced the first such “recovery cache” mechanism, a description of which was included in the first paper on recovery blocks [3], together with a discussion of “recoverable procedures” – a rather complex mechanism that Hugh Lauer and I had proposed as a means of extending the recovery block scheme to deal with programmer-defined data types. This part of the paper would undoubtedly have been much clearer if we had expressed our ideas in object-oriented terms. But instead, in the paper and in our immediately subsequent work we concentrated on processes as we enlarged the domain for which we explored means of software fault tolerance. Our initial work had simply concerned sequential programs. But by 1975 we had moved on to consider the problems of providing structuring for error recovery among sets of co-operating processes. Having identified the dangers of what we christened the “domino effect,” we had come up with the notion of a “conversation” – this was described in the paper that is now, perhaps a little unfairly, the one that is usually cited when reference is made to recovery blocks [9]. In effect, what we had started to do, and in fact continued to do for some years, was gradually extend the range of systems for which we tried to provide well-structured error recovery; as a long term research project, as opposed to an urgently-needed real application project, we had the luxury of gradually trying to add complexity to reliability, as opposed to striving to add reliability to immense complexity. The latest stage of this research is represented by another paper here at FTCS-25 [12]. This introduces a framework for fault tolerance that integrates conversations (for coordinated error recovery between concurrent interacting activities), transactions (for maintaining the consistency of shared resources in the presence of
concurrent access) and exception handling. It thus supports the use of both forward and backward error recovery techniques, in order to tolerate hardware and software design faults, and also environmental faults (i.e. faults that exist in or have affected the environment of the computing system). Turning back to much earlier work for a moment, when I and colleagues at Newcastle first became interested in the possibility of achieving useful degrees of design fault tolerance, we found that one of the problems facing us was the inadequacy for our purposes of the concepts and terminology that hardware reliability engineers were then using. In the 1970s hardware engineers took various particular types of fault (stuck-at-zero, stuck-at-one, etc.) that might occur within a system as the starting point for their definitions of terms such as system reliability and system availability. But given not just the absence of any useful categorization of design faults, but also the realization that in many cases the actual identification of some particular aspect of a complex system design as being the fault might well be quite subjective, we felt in need of a more general set of concepts and definitions. And of course we wanted these definitions to be properly recursive, so that we could adequately discuss problems that might occur either within or between system components at any level of a system. The alternative approach that we developed took as its starting point the notion of failure, whether of a system or a system component, to provide its intended services. Depending on circumstances, the failures of interest could concern differing aspects of the services – e.g. the average real-time response achieved, the likelihood of producing the required results, the ability to avoid causing failures that could be catastrophic to the system's environment, the degree to which deliberate security intrusions could be prevented, etc. The ensuing generality of our definitions of terms thus led us to start using the term “reliability” in a much broader sense than was perhaps desirable, or acceptable to others [7]. It was our colleague, Jean-Claude Laprie of LAAS-CNRS, who came to our linguistic and conceptual rescue by proposing the use of the term “dependability” instead, and going on to lead a very successful effort to produce a full set of well-argued definitions. Many members of IFIP WG10.4 (Dependability and Fault Tolerance) and of the ESPRIT Basic Research Projects on Predictably Dependable Computing Systems have contributed to this work, one of whose major manifestations is the five-language book on Dependability Concepts and Terminology [6].
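
To make the recovery block idea described above a little more concrete, here is a minimal sketch in Python; the names (recovery_block, RecoveryBlockError) are invented for illustration, and the original proposal was of course a language construct backed by a recovery cache, not a library routine. The sketch shows only the essential control structure: checkpoint the state, run the primary alternate, apply the acceptance test, and on failure fall back to the saved state and try the next alternate.

```python
import copy

class RecoveryBlockError(Exception):
    """Raised when every alternate has failed the acceptance test."""

def recovery_block(state, alternates, acceptance_test):
    """Run the alternates in turn against a checkpointed copy of `state`.

    `alternates` is an ordered list of functions (primary first), each taking
    and returning a state object; `acceptance_test` judges the result.
    Backward error recovery is modelled here simply by deep-copying the state
    before each attempt, as a stand-in for the recovery cache.
    """
    for alternate in alternates:
        checkpoint = copy.deepcopy(state)   # establish a recovery point
        try:
            result = alternate(checkpoint)  # run this alternate
            if acceptance_test(result):     # error detection
                return result               # success: discard the recovery point
        except Exception:
            pass                            # an exception counts as a failed attempt
        # fall through: the original state is untouched, try the next alternate
    raise RecoveryBlockError("all alternates failed the acceptance test")

def buggy_sort(xs):
    return xs  # a stand-in for a primary module containing a residual design fault

# The primary fails its acceptance test, so the simpler alternate is used instead.
result = recovery_block(
    state=[3, 1, 2],
    alternates=[buggy_sort, sorted],
    acceptance_test=lambda xs: all(a <= b for a, b in zip(xs, xs[1:])),
)
assert result == [1, 2, 3]
```

The conversation scheme mentioned above extends this structure to sets of interacting processes, which establish and discard their recovery points together so as to avoid the domino effect.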

3: The Present Much else of course has happened relating to system, and in particular, software, fault tolerance during the period since 1975. The initial Newcastle work was based on concepts of dynamic redundancy, and “stand-by sparing.” Within a year or so, Al Avizienis and his colleagues had introduced the idea of N-Version Programming, an approach that can be seen as a corresponding extension of static redundancy, in particular N-Modular Redundancy [1]. Since these two initial developments, many variants and extensions to these schemes have been developed. However, we now prefer to regard virtually all of these schemes as special cases of a single general structuring scheme [13]. This scheme is based on the concept of an idealized fault-tolerant component [7], with any such component being constructed from, in general, a number of diversely-designed variants, an adjudicator and a control algorithm. All of these in turn can, in principle, themselves be idealized fault-tolerant components, so that the structuring scheme is properly recursive. A disciplined exception handling scheme is used to handle situations in which errors cannot be masked, and to provide safe means of using forward and backward error recovery in combination as appropriate. This scheme takes advantage of object-oriented programming ideas, so as to facilitate re-use of the code expressing standard fault tolerance strategies (such as recovery blocks and N-version programming) and adjudication mechanisms (such as majority voting). This also facilitates re-use where appropriate of application code modules – though such reuse may need to be limited in order to maintain desired levels of design diversity. The result is a very flexible scheme allowing the combined use in a single application of different fault tolerance strategies, as directed by the application programmer. The paper describing this scheme [13] does not consider issues of concurrency – however it is our intention to combine the scheme with the work that is reported in [12]; we expect by this means to provide ourselves with a very powerful and general method of structuring even quite complex fault-tolerant computing systems. So much for recent work, especially at Newcastle, on software fault tolerance. During all this time, various other research groups have of course been putting much effort into schemes for software fault avoidance, such as methods and tools for formal verification. The respective merits of all these approaches to achieving dependable software have been the subject of much debate. The proponents of software fault tolerance have perhaps been more diligent in attempting to provide quantitative data about the
cost/effectiveness of their schemes – though the analysis of such data has itself led to further controversy. However, while these debates have been going on, there has been an increase in industrial take-up of the results of the research that has been done on software fault tolerance as well as that on software fault avoidance. There has also been a gradual expansion of the class of system and type of environment for which these results are employed [11]. Nevertheless, a great deal of present software is developed by techniques that owe little to fault tolerance and even less to formal methods, as far as I can tell. But in making such a statement, it is important to distinguish between bespoke software projects for critical environments where such techniques are gaining acceptance, and package software that typically is produced in such quantity that program testing (possibly by tens of thousands of willing or unwilling volunteers) becomes feasible and economical. In the former case, one can – or at least ought to be able to – assume that great thought will go into selecting and attempting to make good use of the most effective methods that are available for achieving the high levels of dependability that are required. In practice the great variability that has been demonstrated to exist in individual and institutional competence regarding system design and software engineering [4] almost certainly exists even within the somewhat specialized world of safety-critical system design. The nature of such bespoke projects is often such that their design can rely little on past experience, let alone pre-existing code. One is therefore very much dependent on the ability of organizations that develop such bespoke software to produce a very high-quality product ab initio. Needless to say, it is very difficult to do this, even with the best fault tolerance and fault avoidance facilities, never mind provide believable evidence that it has been done. In contrast, in much of the software package field there is such intensive competition and product evolution that a Darwinian “survival of the fittest” natural selection process works, and often works very well indeed. The fact that products evolve through many versions, employing high levels of code reuse, and that a huge amount of testing is involved, means that at least the better products can be very good, and deliver great functionality with a surprisingly high dependability, relative to their internal complexity. It is a sobering thought that to date most research on software dependability issues has been (often implicitly) aimed at the bespoke safety-critical software world – yet most of the practical successes in obtaining highly reliable software of very sophisticated functionality have been achieved in the software package world.
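
The general structuring scheme described at the start of this section can be suggested in outline as follows. This is an editorial sketch with invented names (FaultTolerantComponent, majority_vote, and so on), not the interface of [13]; it simply indicates how diversely-designed variants, an adjudicator and a control algorithm might be composed into one component, with N-version execution under majority voting as one configuration. A recovery-block configuration would instead supply a controller that runs the variants one at a time under an acceptance test.

```python
from collections import Counter

class AdjudicationError(Exception):
    """Raised when the adjudicator cannot produce an acceptable result."""

class FaultTolerantComponent:
    """Idealized fault-tolerant component (illustrative names only).

    Built from diversely-designed variants, an adjudicator and a control
    algorithm; any of these could themselves be instances of this class,
    which is what makes the structuring scheme recursive.
    """
    def __init__(self, variants, adjudicator, controller):
        self.variants = variants        # diversely-designed implementations
        self.adjudicator = adjudicator  # e.g. majority vote or acceptance test
        self.controller = controller    # e.g. run all (NVP) or one at a time (RB)

    def service(self, request):
        return self.controller(self.variants, self.adjudicator, request)

def n_version_controller(variants, adjudicator, request):
    """N-version programming: run every variant, then adjudicate the results."""
    return adjudicator([v(request) for v in variants])

def majority_vote(results):
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise AdjudicationError("no majority among variant results")
    return value

# One configuration: three "diverse" implementations of the same simple service.
component = FaultTolerantComponent(
    variants=[lambda x: x + x, lambda x: 2 * x, lambda x: x << 1],
    adjudicator=majority_vote,
    controller=n_version_controller,
)
assert component.service(21) == 42
```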

4: The Future On an occasion like this it is appropriate to try to predict what the future holds for our field. Many dramatic predictions have of course been made about various aspects of the future of computing as a whole. For many years now the computer industry has grown extremely rapidly, yet steadily, and shows every sign of continuing to do so. Its impact on society, particularly in the developed world, has been immense. This impact is sometimes direct, but often indirect. However, it has rarely been closely in line with the many dramatic predictions, whether positive (e.g. relating to personal domestic robots) or negative (e.g. concerning the advent of a computerized Big Brother society) that have frequently been made. At least some of these predictions seem to have arisen from a misunderstanding of the nature, and hence likely future path, of the various sorts of development that fuel the growth of the industry. The principal factor in this growth has of course been progress, often military funded, in first electronic and then microelectronic (and perhaps in the future opto-electronic) technologies, and also in various storage technologies. The direct consequences on the computer industry of this growth have ranged from enabling (i) ever more powerful computers to be provided, for basically the same cost, to (ii) roughly constant levels of computing power to be produced ever more cheaply. An increasingly wide market sector has therefore been opened up within which many sub-species of computer, and indeed sub-industries, have been created. Moreover, both trends (increased power and decreased cost) have spurred work on applying computers to an ever wider range of types of application. The speed with which hardware developments are occurring is often, but misleadingly, contrasted with the more modest but nevertheless very significant rate of improvement that is occurring in software productivity. However it is hardware technology that is developing so dramatically. The problems of hardware logic design, and the rate of productivity improvement being achieved, are essentially similar to those in software. This is not surprising because, especially with the advent of Very Large Scale Integration (VLSI), the similarities between software design and hardware logic design are much greater than their differences. Both types of design are dominated by the problems of mastering logical complexity. Perhaps the most interesting difference, at least conceptually, is that to a first approximation hardware designers, particularly processor designers, are continually designing essentially the same thing, each time from a different starting point as technology improves. In contrast, many software designers start from roughly the same point each time, and design
something different as new applications are invented. Nevertheless, both hardware and software design methods are improving, and indeed converging, as more adequate theoretical foundations and effective higher-level notations and structuring techniques are created, and improved methods of fault avoidance and fault tolerance are developed. The other main factors in the growth of the computing industry are in fact a consequence of the cost reductions that hardware technology improvements have permitted. These have made it practical to apply computers ever more widely, and to involve ever more people in the design of applications and systems software. With so many more people engaged in the creation of software, there is rapid development of many new and innovative applications. On the other hand, customers’ very practical desires for continuity and compatibility are tending to inhibit progress. Taking all these various factors into account in predicting how the world of computers as a whole, and our part of it, will change is not easy. It is one thing to estimate how processing speeds and costs will change, and perhaps how our ability to design and implement comparatively well-understood applications will improve. It is quite another to predict what new, and perhaps revolutionary, application programs will be thought up. Equally difficult is the prediction of when and how various existing limits to our knowledge of how to solve various very challenging design problems will be breached, and various long term goals, for example in achieving ultrahigh dependability from complex software systems, reached. What I do predict with some confidence is that our particular field will remain at the cutting edge, so to speak. I say this because it seems that human nature is such that dependability always breeds dependence – indeed overdependence. As we improve our ability to achieve the currently required level of dependability, from systems of the currently required level of functionality, so individuals and societies – often unthinkingly – start needing (or at least wanting) yet higher levels of functionality and/or dependability. The future of our research field therefore seems very dependable! But “predictions” of such generality do not get us very far in predicting the future of the computer dependability field, even if they are near the limit of what I feel confident in making personally. In order to go further, I shall seek safety in numbers, so to speak, and draw on the combined wisdom and experience that was exhibited at a workshop that I helped to organize recently, and that had much of the flavour of the NATO Conference that I referred to at the start of this talk.

5: The Software 2000 Workshop Last Autumn Bill Wulf and I assisted Gill Ringland of ICL’s Finance and Business Strategy Group in setting up a workshop on the future of the software industry. The motivation for such a workshop was “the belief that a set of discontinuities will be hitting software – technology, techniques, skills, applications – in the next few years, and the fact there did appear not to be any coherent external body of analysis or thought on these issues, at least in any form that might help strategic business planning.” We managed to get a very interesting set of people together for the workshop, from a number of industry, government and academic organizations in Europe, the USA and Japan. It was held in April 1994, in the very congenial surroundings of Hedsor Park, a splendid country mansion in the Thames Valley, not very far from Heathrow Airport. The workshop was extremely stimulating, and although many of the topics discussed were not directly related to the topic of dependability, a number of the most important ones were. (Incidentally, a report of the workshop [10] is available from ESPRIT, who cosponsored the workshop.) The single most memorable characteristic of the Software 2000 Workshop was the degree of unanimity amongst a set of widely-drawn participants as to the dramatic extent to which networking developments, and a global data infrastructure, will change the entire computing scene and in particular what software is, how it is delivered, who writes it and who uses it. Present levels of hype about “data superhighways” notwithstanding, I was very surprised that there was such a level of agreement, amongst industrialists as much as academics, about the unique importance of current networking developments, and the Internet in particular. One likely result, it was agreed, is that in the future there will be a large number of what can justifiably be termed “mega-systems”, each containing vast numbers of computers, and spread across the face of the globe. (The present-day systems that come closest to what I have in mind when I use the term “mega-system” are nation-wide telephone systems.) Many if not most of these future mega-systems will be constructed by treating existing, already very complex systems, as components and linking them together via the global network. The component systems – many of which will be what are these days disarmingly termed “legacy systems” – will not usually have been designed to be so linked. They will therefore have to be “graunched” together – the verb “to graunch” is old Royal Air Force slang for “to make to fit by the use of excessive force.” (My introduction of this term – which I learnt from my father – to the workshop seems to be giving it a new lease of life, since several accounts of the
workshop by others have made use of the term!) Compared to the problems of building such mega-systems, those of building systems from scratch on a green field site, so to speak, without the need to provide continuous service to dependent users meanwhile, will I suggest come to be seen as relatively minor. (I presume similar remarks can be and are made by the designers of airports and road systems!) The resulting mega-systems are certain to be extremely fragile unless something very special is done to prevent this. The problem of how to make such systems adequately dependable will I believe be one of the major challenges that the dependability research community faces. It is a vastly more difficult task than that of achieving high dependability from a computer system composed out of a homogeneous set of specially designed and interfaced component systems. The problems will range from those deriving from the inconsistent views of the world that are usually embodied in separately designed systems (problems that were very thoroughly analyzed years ago in the book “Data and Reality” by Bill Kent [5]) to the problems of providing continuous service (i) despite the presence of large numbers of design faults, and also (ii) while performing the system integration task and the component re-engineering that this task may well involve. Another highly relevant topic that was extensively discussed at Software 2000 was that of the structuring of complex software systems. This discussion was motivated in part by some startling statistics concerning the growth, in size and numbers, of embedded software systems – for example, it was stated that a top-of-the-line TV set now contains 500 kilobytes of code, even a shaver contains 2 kilobytes, and the amount of code in consumer products is doubling every year – as well as by the more evident huge and rapidly growing size of many current major application programs. There was general agreement about the importance in this regard of object-oriented techniques – but also about the fact that they are by no means a complete solution, in particular to the problem of achieving high levels of software re-use. (To use a favourite analogy of mine, object-oriented languages and systems provide a very convenient and powerful form of mortar but do not help much with the difficult task of settling on a suitable set of types of brick.) One interesting thread to this discussion concerned the fact that object-oriented techniques are just one of a long line of useful abstraction mechanisms, and that what is needed are powerful specialization mechanisms allied to the various abstraction mechanisms. The point is that such specialization mechanisms can help solve the performance problems that often inhibit the use, or at least retention into operational systems, of abstraction mechanisms.

A programming technique that I and my colleagues have become very interested in during these last few years, namely “reflection,” can be seen as providing such a specialization mechanism. This technique allows various language mechanisms that normally would be regarded as basic and unchangeable parts of the language (such as the means by which objects are created, and the means by which their methods are invoked) to be modified in a well-controlled and well-structured manner by an application programmer, possibly even at run time. For example, one can use reflection to arrange that when an application program calls for an object to be created, several replicas are also created – and that when operations are performed on the object, they are also performed on the replicas, and the results checked against each other. Thus NMR-like schemes can be introduced selectively, in order to ensure that critical objects are held and processed very dependably, without making any changes to – and in particular without introducing any additional complexity into – the application program. (A joint LAAS/Newcastle paper on this topic is being given by my colleagues at this Symposium [2].) This approach is of course just another means for structuring and building dependable computing systems out of less dependable parts – a system structuring strategy that is much more fully accepted in the reliability community than it is in the safety and the security communities. People and organizations who do not accept or fully understand this viewpoint are, needless to say, the ones who build large nuclear reactor software systems without isolating the safety-critical aspects into as small and well-confined an area as possible, or huge multi-level secure office automation systems in which even the word processing software has to be specially implemented and validated at great expense as trusted (and one hopes trustworthy) code. The two topics of evolving mega-systems and extensions to current ideas of object-oriented programming were for me, at least, the major themes that emerged from the workshop – though of course many other topics were also covered. But the two topics were not just treated separately. Instead they were brought together by Mario Tokoro, with his notions of “autonomous agents,” which he described in the following terms: “We need an evolved notion of objects, which we call autonomous agents, to describe open and distributed systems. An autonomous agent is the unit of individual software that interfaces with humans, other agents, and the real world in real time. Each autonomous agent has its own goal, and reacts to stimuli, based on its situation. It behaves so as to survive. A collection of such autonomous agents shows emergent properties which cannot be ascribed to individuals, eventually forming a society.”

This notion of a society of autonomous agents, each cooperating with others in an ever-changing world while defending itself from accidental or deliberate threats to its own survival, in my view captures very nicely much of what will be needed in the future. However the challenge of implementing such “societies of agents,” and at the same time coping adequately with the need to make continued use of an ever-growing population of burdensome legacy systems, is a formidable challenge indeed, but one that I trust the FTCS community will rise to, and triumph over, in the years to come.
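
Before concluding, the reflective replication idea mentioned earlier in this section can be illustrated with a small sketch. Python's attribute-interception hook is used here as a modest stand-in for a full metaobject protocol of the kind used in the work reported in [2], and the names (Replicated, ReplicaDivergenceError, Account) are invented; the point is only that replica creation and result cross-checking can be added without touching the application's own code.

```python
class ReplicaDivergenceError(Exception):
    """Raised when the replicas' results disagree."""

class Replicated:
    """Transparently replicate an object and cross-check every method call.

    The application manipulates what it believes is a single object; the
    interception of attribute access creates the replicas and compares their
    results, giving an NMR-like effect without changing the application code.
    This sketch only handles method calls, not plain attribute reads.
    """
    def __init__(self, factory, n=3):
        # "Creation" is intercepted: n replicas are made instead of one object.
        self._replicas = [factory() for _ in range(n)]

    def __getattr__(self, name):
        def invoke(*args, **kwargs):
            # Perform the operation on every replica and compare the results.
            results = [getattr(r, name)(*args, **kwargs) for r in self._replicas]
            if any(r != results[0] for r in results[1:]):
                raise ReplicaDivergenceError(f"replicas disagree on {name}()")
            return results[0]
        return invoke

# Usage: a critical application object, replicated three ways.
class Account:
    def __init__(self):
        self.balance = 0
    def deposit(self, amount):
        self.balance += amount
        return self.balance

critical_account = Replicated(Account, n=3)
assert critical_account.deposit(10) == 10
```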

6: Concluding Remarks As I have already pointed out, we can as a technical community feel confident that our chosen subject, although it will need continued development, will continue to be in demand due to society’s increasing dependence on sophisticated computer-based systems, including what I have termed “mega-systems.” I could stop here, so as to end on somewhat of a high note, albeit a somewhat cynical one. But this would in fact be a false note, since I feel I must go on to admit that I fear that general recognition of this dependence will arise only at the price of occasional catastrophic software-induced system failures. Our researches will, I hope, help to reduce the frequency and criticality of such failures, but they will still occur, since many of the new systems will involve attempts to automate processes that previously relied in part on human intelligence. My own view is that when people try to automate a complex system that previously was partly or wholly manual, although there will be a great opportunity to reduce the incidence of minor errors, unplanned-for circumstances may occasionally occur, and that when they do very large errors will often result and lead to major failures – errors that would probably have been caught and compensated for by human beings exercising common sense. In other words, with such automation projects, the best that can possibly be achieved is fewer but bigger failures. This, I fear, is a more realistic note on which to end this attempt to step back a little from our field, and to assess where it has come from, and where the future is leading it. However, I am sure that it is a future in which the FTCS series will continue to play an important and highly beneficial role, so it has been a great privilege to be asked to play a part in this celebration of FTCS’s first twenty-five years.

Acknowledgements The work at Newcastle has been supported during recent years by the Commission of the European Community in the framework of ESPRIT Basic Research Projects 3092 and 6362 “Predictably Dependable Computing Systems,” and my thinking about the topics I have attempted to cover has benefitted greatly from discussions with a number of colleagues in these projects, at Newcastle and elsewhere, in particular Jie Xu and Robert Stroud.

References [1] L. Chen and A. Avizienis, “N-Version Programming: A fault-tolerance approach to reliability of software operation”, in Proc. 8th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-8), (Toulouse, France), pp.3-9, 1978. [2] J. C. Fabre, V. Nicomette, T. Pérennou, R. J. Stroud and Z. Wu, “Implementing Fault-Tolerant Applications using Reflective Object-Oriented Programming”, in Proc. 25th Int. Symp. on Fault-Tolerant Computing (FTCS-25), (Pasadena, CA, USA), IEEE Computer Society Press, 1995. [3] J. J. Horning, H. C. Lauer, P. M. Melliar-Smith and B. Randell, “A Program Structure for Error Detection and Recovery”, Lecture Notes in Computer Science, 16, pp.177-93, 1974. [4] W. S. Humphrey, Managing the Software Process, 492p., Addison-Wesley, Reading MA, 1989. [5] W. Kent, Data and Reality: Basic assumptions in data processing reconsidered, 211p., North-Holland, Amsterdam, 1978. [6] J. C. Laprie (Ed.), Dependability: Basic concepts and terminology — in English, French, German, Italian and Japanese, Dependable Computing and Fault Tolerance, 5, 265p., Springer-Verlag, Vienna, Austria, 1992. [7] P. M. Melliar-Smith and B. Randell, “Software Reliability: The role of programmed exception handling”, in Proc. Conf. on Language Design For Reliable Software (ACM SIGPLAN Notices, vol. 12, no. 3, March 1977), (Raleigh), pp.95-100, ACM, 1977. [8] B. Randell, “Operating Systems: The problems of performance and reliability”, in Proc. IFIP Congress 71 (vol. 1), (Ljubljana, Yugoslavia), pp.281-90, North-Holland, 1971. [9] B. Randell, “System Structure for Software Fault Tolerance”, IEEE Trans. on Software Engineering, SE-1 (2), pp.220-32, 1975.

[10] B. Randell, G. Ringland and B. Wulf (Eds.), Software 2000: A view of the future, 118p., ICL and the Commission of the European Communities, Brussels, 1994. [11] U. Voges (Ed.), Software Diversity in Computerized Control Systems, 2, Springer-Verlag, Vienna, 1988. [12] J. Xu, B. Randell, A. Romanovsky, C. M. F. Rubira, R. J. Stroud and Z. Wu, “Fault Tolerance in Concurrent Object-Oriented Software through Coordinated Error Recovery”, in Proc. 25th Int. Symp. Fault-Tolerant Computing (FTCS-25), (Los Angeles), IEEE Computer Society Press, 1995. [13] J. Xu, B. Randell, C. M. F. Rubira and R. J. Stroud, “Toward an Object-Oriented Approach to Software Fault Tolerance”, in Fault-Tolerant Parallel and Distributed Systems (D. R. Avresky, Ed.), IEEE Computer Society Press, 1994.
