IEEE TRANSACTIONS ON COMPUTERS, VOL. 51, NO. 8, AUGUST 2002, p. 881
Guest Editors’ Introduction to Special Section on Asynchronous Real-Time Distributed Systems
E. Douglas Jensen and Binoy Ravindran
Asynchronous real-time distributed systems are emerging in many domains, including defense, space, financial markets, autonomy and artificial intelligence, telecommunications, and industrial automation for real-time control above the device level. Such systems are fundamentally distinguished by the significant runtime uncertainties inherent in their application environment and system resource states. Another source of nondeterminism is that some events and state changes appear spontaneous to the computer system itself, because their causes lie outside the system. Consequently, it is difficult to postulate upper bounds on application workloads, or distributions of failure occurrences, that will always be respected at runtime. Such systems thus violate the deterministic foundations of hard real-time theory, which ensures that all timing constraints are always satisfied under deterministic assumptions about application workloads, execution environment characteristics, and failure distributions. Asynchronous real-time distributed systems therefore raise a fundamental and apparently contradictory question: “How can we build timely systems that operate in the presence of uncertain timeliness?”
This special section presents papers that answer this question by focusing on different but fundamental problems in asynchronous real-time distributed computing systems. The section presents four papers that address fundamental problems, including uniform agreement, group communication, and group priority inversion. Furthermore, the section presents a paper that describes a generic architectural construct for asynchronous real-time distributed systems. From these papers, we find that two divergent, mutually contradictory schools of thought are emerging. The first school of thought is the “measure-compare-adapt” approach.
Mishra and Fetzer, and Wang, Anceaume, Brasileiro, Greve, and Hurfin, show, respectively, how group communication services can be constructed and how the group priority inversion problem can be solved in asynchronous real-time distributed systems using the Timed Asynchronous (TA) model. Furthermore, Verissimo and Casimiro show how asynchronous real-time distributed systems can be built using the Timely Computing Base (TCB) architectural construct. The TA model and the TCB construct are based on the principle that uncertainty in asynchronous real-time distributed systems can be countered by postulating upper bounds on timing variables such as clock drift rates and end-to-end interprocess communication delays. Based on such postulates, and thus assuming a partially synchronous model, one can construct higher level services, such as group communication services or a network of TCB modules, that can detect when timing failures occur at runtime. Fundamental to this approach is the belief that such postulated upper bounds on timing variables are respected most of the time in asynchronous real-time distributed systems, though clearly not always. Upon detecting a timing failure, one may employ some form of adaptation scheme to counter it. While Mishra and Fetzer are silent on how adaptation can be achieved, as their focus is on how the group communication service itself can be constructed, Verissimo and Casimiro propose the notion of coverage stability, which provides a framework for runtime adaptation.
Wang, Anceaume, Brasileiro, Greve, and Hurfin present a protocol for solving the group priority inversion problem that occurs in real-time distributed systems that perform actively replicated processing based on static priorities. Group priority inversion is an extension of the priority inversion problem originally studied in the context of single-processor systems. Their protocol assumes the TA model equipped with failure detectors.
E.D. Jensen is with the MITRE Corporation, Center for Air Force C2 Systems, PO Box 1089, Sherborn, MA 01770. E-mail: [email protected].
B. Ravindran is with the Real-Time Systems Laboratory, the Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061. E-mail: [email protected].
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 116307.
The second school of thought is the “no runtime adaptation, but guaranteed safety” paradigm.
Hermant and Le Lann subscribe to this second philosophy. They argue that the “measure-compare-adapt” school of thought cannot improve timeliness guarantees, for two reasons: 1) runtime uncertainties will cause the postulated upper bounds on timing variables to be violated, so the fail-aware property (which is used to detect timing failures) is itself lost; and 2) accurate schedulability analysis is difficult, and it is made harder still by the need to account for the overhead of the measure-compare-adapt techniques used for runtime adaptation.
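The fail-aware property at the center of this disagreement can be illustrated with a minimal sketch. It assumes, hypothetically, that processes have clocks synchronized to within a postulated bound `EPSILON` and that a message is expected to arrive within a postulated timeout `DELTA`. The receiver adds `EPSILON` to the measured delay to obtain an upper bound on the real delay, then classifies the message as timely (“fast”) or possibly late (“slow”). The names, values, and labels are illustrative assumptions, not the protocols of the papers in this section.

```python
from dataclasses import dataclass

# Hypothetical bounds, chosen only for illustration (not taken from any paper in this section).
DELTA = 0.010    # postulated timeout: a message is timely if its delay is at most DELTA seconds
EPSILON = 0.001  # postulated maximum deviation between any two synchronized clocks, in seconds


@dataclass
class Message:
    payload: str
    send_ts: float  # sender's synchronized-clock reading at send time


def delay_upper_bound(msg: Message, recv_ts: float, epsilon: float = EPSILON) -> float:
    # The measured delay underestimates the real delay by at most epsilon,
    # so adding epsilon yields an upper bound, but only while the clock postulate holds.
    return (recv_ts - msg.send_ts) + epsilon


def classify(msg: Message, recv_ts: float,
             delta: float = DELTA, epsilon: float = EPSILON) -> str:
    # "fast": provably timely under the postulates; "slow": a possible timing failure.
    return "fast" if delay_upper_bound(msg, recv_ts, epsilon) <= delta else "slow"


if __name__ == "__main__":
    on_time = Message("update", send_ts=100.000)
    print(classify(on_time, recv_ts=100.005))  # fast (bound about 6 ms)
    print(classify(on_time, recv_ts=100.012))  # slow (bound about 13 ms)

    # Postulate violated: the sender's clock runs 8 ms ahead (more than EPSILON),
    # so a message whose real delay was 12 ms is stamped 100.008.
    skewed = Message("update", send_ts=100.008)
    print(classify(skewed, recv_ts=100.012))   # fast, although the message was late
```

The last case mirrors the first objection above: once the clock postulate no longer holds, the computed value is no longer an upper bound, and a late message is reported as timely without any indication that detection has failed.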
0018-9340/02/$17.00 © 2002 IEEE
Hermant and Le Lann present an asynchronous solution to the uniform consensus problem that makes no assumptions about timing variables, such as upper bounds on interprocess message delays. First, considering an asynchronous model, they prove the safety of the solution, i.e., that all processes correctly agree on some value. Then, considering a partially synchronous model, they analytically derive computable functions for the timing variables that can be instantiated with a known workload and failure hypothesis. The philosophy of their approach is thus that partial synchrony should be considered only for proving timeliness. They discuss the merits of “late binding” a design to some partially synchronous model, in terms of solution coverage and performance. Given an application, one would use these analytically established computable functions to conduct a worst-case schedulability analysis under the known workload and failure hypotheses of that particular application, and thus obtain a “yes” or “no” answer on timeliness. If the answer is yes, timeliness is guaranteed in addition to safety. Of course, if the workload or failure hypothesis is violated at runtime, the timeliness guarantees no longer hold, but safety is still assured because the solution is “time-free.”
Since the papers take divergent views on reconciling the apparent timeliness paradox in asynchronous real-time distributed systems, we, as Guest Editors, asked the authors of the papers by Mishra and Fetzer, Verissimo and Casimiro, and Hermant and Le Lann to comment on the opposing viewpoint. We believe the resulting comments give each of these papers an interesting section that compares and contrasts the two divergent schools of thought.
We warmly thank Professor Jean-Luc Gaudiot and the TC Editorial Board for allowing us to organize this special section. We thank all the reviewers of the special section for their diligent reviews.
Finally, we thank all the authors of the special section for submitting their work, especially the aforementioned authors who went “the extra mile” to comment on the opposing viewpoint.
E. Douglas Jensen, The MITRE Corporation
Binoy Ravindran, Virginia Tech
E. Douglas Jensen is the chief scientist of the Information Technologies Directorate at the MITRE Corporation. His principal focus is on time-critical resource management in dynamic distributed object systems, particularly for combat platform and battle management applications. He directs and conducts research, performs technology transition, and consults on DoD programs. He joined MITRE from previous positions at Hewlett-Packard, Digital Equipment, and other computer companies. From 1979 to 1987, he was on the faculty of the Computer Science Department at Carnegie Mellon University, where he created and directed the largest academic real-time research group of its time. Prior to that, he was employed in the real-time computer industry, where he engaged in research and advanced technology development of distributed real-time computer systems, hardware, and software. He is considered one of the original pioneers and leading visionaries of distributed real-time computer systems and is widely sought throughout the world as a speaker and consultant.
Binoy Ravindran received the Master’s degree from the New Jersey Institute of Technology (NJIT) in 1994 and the PhD degree from the University of Texas at Arlington (UT Arlington) in 1998, both in computer science. He is an assistant professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech. His current research focuses on real-time distributed systems with application-level, end-to-end quality of service (e.g., timeliness, survivability, security). Through DARPA and ONR-funded research programs, Dr. Ravindran has developed key real-time resource management technology that enables the engineering of future real-time combat systems of the US Navy. He received two “Student Achievement Awards” from NJIT for outstanding research by a graduate student (1994 and 1995) and a “Doctoral Dissertation Award” from UT Arlington for outstanding doctoral research (1998). During 1999, Dr. Ravindran was an invited speaker at INRIA, France, and at the ARTES (Swedish National Strategic Research Initiative in Real-Time Systems) Real-Time Week, Sweden. He also served as the program chair of the 1999 IEEE International Workshop on Parallel and Distributed Real-Time Systems.