A New Benchmark for Distributed Real-Time Systems: Some Experimental Results

Dahai Guo (University of Central Florida, Computer Engineering Program, Orlando, FL 32816, USA), Jan van Katwijk (TU Delft, Faculty of ITS, PO Box 5031, 2600GA Delft, The Netherlands), Janusz Zalewski (Florida Gulf Coast University, Computer Science Program, Fort Myers, FL 33965, USA)
Abstract

In a real-time distributed system, timeliness is a critical issue. Whether or not the system meets its deadlines has a great impact on the system's behaviour, its performance and therefore its reliability. Understanding system behaviour under high loads is an important aspect of predicting system performance. To evaluate the timing behaviour of distributed real-time systems, we propose and evaluate a simple benchmark aimed especially at understanding patterns of behaviour under varying loads. Our benchmark supports two metrics and we are defining a third one. The first is the overall time by which deadlines are missed for a given number of tasks; the second is the number of times deadlines are missed by a given time interval; these lead to a third metric, the sensitivity of the system with respect to a varying load. The practical evaluation reported in this paper consists of two parts. First, a simple distributed software model was implemented using software tools such as Java sockets, VxWorks with sockets, VxWorks with shared memory, VisiBroker CORBA, TAO CORBA and IIOP. The performance of all implementations of the model was evaluated to validate the benchmark. Next, the benchmark was extended to evaluate the performance of various parts of a sophisticated Air Traffic Control System. Using this benchmark, we found appropriate lower bounds on the deadlines for message transfers in the CORBA-based Air Traffic Control System, as well as the practical limits on message transfers and the sensitivity of the system under various aircraft loads.
1. Introduction

A real-time system is a computer system with a defined and bounded response time, whether it is distributed or not. Dealing with bounded response times in stand-alone systems is fairly well understood; dealing with bounded responses in a networking environment is more complicated. Many network-related activities have inherently non-deterministic behaviour, involving packet loss, traffic congestion, node failures, etc. It would be helpful if a generally applicable theory existed for predicting bounds on the response time, and it is our long-term objective to contribute to the development of such a theory. At this point in time, however, our focus is on understanding the impact of nondeterminism on the behaviour of real-time systems by using benchmarks. We are particularly interested in studying patterns that show how distributed systems meet or miss their deadlines, and in studying the sensitivity of their behaviour as a function of the load parameters.

An implementation of the CORBA standard, as middleware, can glue different modules of a distributed application together, regardless of which operating system and programming language those modules are built on. However, complex distributed real-time applications, such as our Air Traffic Control Simulation System (described in detail in [6]), require some degree of assurance that the timing requirements of the entire system are met. Timing properties of these systems can
be evaluated in advance or during the system's operation, to make sure that the possibility of catastrophic events is minimized, thus saving money, property and human lives. Evaluation of performance can be done with carefully designed benchmarks. Taking this into account, the problem addressed in this research is two-fold:
• to develop a benchmark, or benchmark approach, for distributed real-time systems that allows a better characterization of realistic applications, in particular with respect to degradation as a function of the load (sensitivity), and
• to validate this benchmark, or benchmark approach, on a variety of platforms and apply it to evaluate the performance of a complex Air Traffic Control Simulation System (ATCS).
The structure of the paper is as follows. In Section 2 we briefly discuss some existing real-time benchmarks. In Sections 3 and 4, we introduce our own benchmarking approach and discuss the application of the benchmarking principle to our ATCS. In Section 5, we discuss a new parameter called sensitivity, and in Section 6 we draw some conclusions and discuss future work.
2. Real-Time Benchmarks

A variety of benchmarks has been developed, or is under development, for distributed real-time systems. Most benchmarking approaches we have looked into so far are concerned with average response time rather than with the problem we are facing, which concerns bounded response. Traditional benchmarks often focus on component-level benchmarking, thereby not necessarily dealing with end-to-end constraints. Typical benchmarks we looked at are:
• The RhealStone benchmark [2];
• The Hartstone Benchmark [5];
• The SW benchmark and SWSL language [3];
• DynBench [4].
The Hartstone Distributed Benchmark is designed to give figures of merit for the end-to-end scheduling and timing behaviour of a system. Using this benchmark, the Channel Access protocol can be tested; Channel Access indicates the ability of a distributed system to successfully share the communication channel among the transmitting nodes. There are two versions of the benchmark defined: the "Master-Slave" (MS) version and the "Equals" (EQ) version. Our experience shows that neither version of the benchmark turned out to be very useful for evaluating systems such as our ATCS. The explanation is simple: in this kind of system, different modules communicate through different types of communication at the same time. In general, Air Traffic Control System modules perform master-slave and peer-to-peer functions simultaneously, and cannot be categorized as only one of those considered by Hartstone. The SW Benchmark provides distributed, communicating, periodic and aperiodic tasks, driven by a synthetic workload, but was felt to be not completely adequate for our purposes, for reasons similar to those for the Hartstone Distributed Benchmark. DynBench is a very advanced benchmark developed in the context of the DeSiDeRaTa project; it provides a development and evaluation environment for real-time distributed systems and includes a set of performance metrics for the evaluation of QoS in distributed real-time systems. Currently, we are investigating to what extent the DynBench approach can be translated into the approach we are aiming at.
3. Basics of a New Benchmark

3.1 Architecture for real-time systems and a basic benchmarking system

In [1] we discuss a high-level architecture for real-time systems in detail. We are convinced that the approach taken there is an excellent starting point for a more structured development of real-time systems. Our architecture for real-time systems involves five task types, cooperating towards a common goal. The relevant components in our architecture are related to:
• communications,
• sensors,
• computations,
• database, and
• user interface.
Our benchmarking system includes tasks that perform the above-mentioned functions: sensor readout and sensor simulation, computation, database access, and user interface, each on a separate node communicating with other selected nodes. We expect the benchmark to help in understanding the behaviour of the system under a varying load and, especially, to help in gathering some data on the sensitivity of the system to a varying load. The overall configuration is presented in Fig. 1:
• Task A1 simulates a sensor. It generates a random integer once every 20 milliseconds and sends the data to Task B (a code sketch of this task is given after this list).
• Task A2 simulates another sensor. It periodically generates a random integer and sends the data to Task B for processing. The length of the period is randomly selected between 10 and 1000 milliseconds.
• Task B is a computational task, which accepts all incoming data and performs some calculation (in this case, the average of all the data over some interval, which can be varied between a minimum and a maximum value). After the computation, the task sends its results to Task C for storage. In the benchmark, Task B sends at least 100 result values to Task C.
• Task C accepts the data from Task B and stores it in a file.
• Task D provides an interface for the user. In our benchmark, we have chosen a simple interface, with commands to get the most recent average from Task C and display it on the screen, and commands to cause Task D to terminate all the tasks, including itself.
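As an illustration of the model, the sketch below shows how a task like A1 could be structured in Java, the language of our first implementation. The host name, the port number and the wire format (one raw integer per reading) are illustrative assumptions, not prescribed by the benchmark.

```java
import java.io.DataOutputStream;
import java.net.Socket;
import java.util.Random;

// Sketch of Task A1: emit one random integer every 20 milliseconds to Task B.
public class TaskA1 {
    public static void main(String[] args) throws Exception {
        Random rng = new Random();
        // "task-b-host" and port 5000 are placeholders for the node running Task B.
        try (Socket socket = new Socket("task-b-host", 5000);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            while (true) {
                out.writeInt(rng.nextInt()); // simulated sensor reading
                out.flush();
                Thread.sleep(20);            // 20 ms period; clock drift ignored here
            }
        }
    }
}
```

Task A2 would look the same, except that the sleep time is redrawn from the 10-1000 millisecond range on every iteration.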
The configuration in Fig. 1 imitates a real-time distributed system whose performance we want to evaluate. Under the assumption that a real-time system demonstrates satisfactory performance if it meets its timing constraints (deadlines), we propose the following two measures:
• the overall time by which the deadlines are missed;
• the number of times the deadlines are missed,
both of which can be evaluated for a particular software module on a particular node.
Fig. 1: Five-task benchmark architecture.
In the following sections, both measures are applied to several different implementations of the five-task benchmark.
3.2 Basic implementations

A first implementation was built using Java, with Java sockets for interprocess communication. In this implementation, Task B receives data from Tasks A1 and A2 through sockets and sends data to Task C through sockets as well. All experiments were run on a network of Sun SparcStations under Solaris 2.6. The results of the experiments are presented in Fig. 2.

A second and a third implementation were done on a local VME board with Motorola 68040 processors, running VxWorks. The second implementation used sockets as the primary means of intertask communication, while the third implementation used shared memory. In these experiments, Task B and Task C were located on the VME boards, while Tasks A1, A2 and D resided on a network-connected Unix machine. As expected, it was found that in the socket-based implementations the first deadline is normally missed, which relates to the time spent in establishing the connection. Therefore, the first missed deadline is excluded from each graph for the VxWorks-related experiments (a sketch of the measurement at Task C is given below).

The experiments with shared memory show, unsurprisingly, that communication is much faster than when sockets are used, because all data between Task B and Task C are transferred directly over the VMEbus. The combined results for the three cases are shown in Figures 2 and 3. Both graphs show the clear advantage of using VxWorks shared memory, which validates the applicability of our model. An interesting observation is that the implementation language for socket communication (i.e., Java or C) does not seem to make much difference.
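The sketch below indicates how the two measures could be collected at Task C in the Java socket implementation: each arrival is timestamped, the lateness beyond the chosen deadline is accumulated, the misses are counted, and the first sample is discarded because it reflects connection setup. The port number, the message format and the deadline value are assumptions for illustration.

```java
import java.io.DataInputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of the measurement harness at Task C.
public class TaskC {
    public static void main(String[] args) throws Exception {
        final long deadlineMs = 1000;   // chosen deadline length (illustrative)
        long overallMissTime = 0;       // measure 1: overall time deadlines are missed
        int missCount = 0;              // measure 2: number of missed deadlines
        try (ServerSocket server = new ServerSocket(5001);
             Socket socket = server.accept();
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            long previous = System.currentTimeMillis();
            for (int i = 0; i < 100; i++) {          // 100 result values from Task B
                double average = in.readDouble();    // storing it to a file is omitted
                long now = System.currentTimeMillis();
                long elapsed = now - previous;
                previous = now;
                if (i == 0) continue;                // discard the connection-setup sample
                if (elapsed > deadlineMs) {
                    missCount++;
                    overallMissTime += elapsed - deadlineMs;
                }
            }
        }
        System.out.println("misses=" + missCount
                + ", overall miss time=" + overallMissTime + " ms");
    }
}
```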
Fig. 2: Overall time the deadlines are missed for 100 experiments.

Fig. 3: The number of times the deadlines are missed by 2% for 100 experiments.
3.3 CORBA-based implementations

Benchmarking distributed real-time systems should also take into account the use of middleware. At this point in time, CORBA can be viewed as a de facto standard for distributed real-time systems [10], [11]. In this section, we therefore discuss the use of CORBA in our benchmark. In the next series of experiments, the five-task benchmark model was implemented using TAO [7] and VisiBroker [8], respectively. In each case, the deadlines are measured at Task C, and all five tasks run on five different nodes. Because TAO and VisiBroker are both IIOP-compliant ORBs, they can communicate with each other through the IIOP protocol. Therefore, two additional cases were considered (a sketch of a CORBA client follows this list):
• Tasks A1, A2 and B run under TAO, while Tasks C and D run under VisiBroker. Task B sends 100 average values to Task C.
• Tasks A1, A2 and B run under VisiBroker, while Tasks C and D run under TAO.
In all experiments, deadlines are measured at Task C.
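For concreteness, the sketch below shows what a client-side call could look like with the standard Java ORB API, which is how the VisiBroker side can be programmed; the TAO implementation is written in C++, but the call structure is analogous. The interface name ResultSink, its push operation and the naming-service entry are hypothetical; ResultSink and ResultSinkHelper stand for the stubs an IDL compiler would generate from roughly: interface ResultSink { void push(in double average); };

```java
import org.omg.CORBA.ORB;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;

// Sketch of Task B as a CORBA client pushing one average value to Task C.
// ResultSink and ResultSinkHelper are hypothetical IDL-generated stubs.
public class TaskBCorbaClient {
    public static void main(String[] args) throws Exception {
        // The concrete ORB is selected through ORB properties and the classpath,
        // not in the code; since both ORBs speak IIOP, this client can invoke a
        // Task C servant running under either TAO or VisiBroker.
        ORB orb = ORB.init(args, null);
        NamingContextExt naming = NamingContextExtHelper.narrow(
                orb.resolve_initial_references("NameService"));
        ResultSink taskC = ResultSinkHelper.narrow(naming.resolve_str("TaskC"));
        taskC.push(42.0); // one of the (at least) 100 averages sent to Task C
        orb.shutdown(true);
    }
}
```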
The combined results for the four cases, using CORBA, are shown in Fig. 4 and Fig. 5. The minimum deadlines that could be achieved in some cases were 12 and 14 milliseconds, and this is reflected in Figures 4 and 5.
Fig. 4: The number of times the deadlines are missed by 2%, in the case of CORBA.

Fig. 5: Overall time the deadlines are missed for 100 experiments, in the case of CORBA.
The graphs show that TAO calling itself, or TAO calling VisiBroker, is more efficient than any configuration in which VisiBroker initiates the calls. Furthermore, the graphs clearly show that beyond a certain point, extending the deadline has hardly any effect on the performance of the system (measured in terms of missed deadlines).
4. Benchmarking Air Traffic Control System Software

In [6] we discuss an ATCS as an exercise in software development, based on our high-level real-time software architecture. We have been using this system for a number of experiments, and we regard it as an excellent vehicle for benchmarking. In Fig. 6 we show the high-level structure of the simulator, which is implemented on top of CORBA (VisiBroker and TAO), using different database engines (Oracle and MySQL). The communication among modules goes through the IIOP protocol. The functions of the respective modules are as follows:
• The Time Server provides the time via the NTP service.
• The Weather Server provides weather information from http://www.weather.com.
• The Radar Server simulates the aircraft paths, based on the flight plans stored in the database.
• The Collision Detection module monitors the aircraft paths and sends advisories to the GUI module in case a collision may occur.
• The Database Server provides functions used to operate on flight plans.
• The GUI module exchanges information with the other modules and packetizes it for the Display; one GUI module is in charge of one sector.
• The Display receives the packetized information at regular intervals.
• The Communication module hands off an aircraft from the source GUI Information Acquisition Server to the destination GUI Information Server when the aircraft flies across sector boundaries.
• The Data Recording Server provides functions used to save event descriptions.
Fig. 6: Architecture of the ATCS implementation in VisiBroker CORBA.
For the purpose of this benchmark, the controller's Display, or GUI, shows the relevant attributes of the air traffic. The modules shown in Fig. 6 provide their services and send their computation results to the GUI, acting as servers. A module that acts as a client to the GUI is the collision detection software, which normally collects information from the flight plan database, the radars and the weather server, and sends advisories and alarms to the GUI, the latter acting as a server.
In this ATCS, the Display module receives information in the form of packets from the GUI module at regular intervals. When a hand-off between different sectors is in progress, the Communication module informs the GUI module of the hand-off status. Their relationship is shown in Fig. 7.
Fig. 7: Principle of GUI communication with the Display and the Communication Server.
The relationships between the GUI Information Acquisition Server and the Communication Server or the Display are similar to the relationship between Tasks B and C in the five-task benchmark. As in the five-task benchmark, we measured the overall time the deadlines are missed, in milliseconds, as a function of the length of the chosen deadline, and the number of times the deadlines are missed by 2%, again as a function of the length of the chosen deadline. For each case, 100 measurements were taken, and the results were collected for two cases: 4 and 20 aircraft. In these experiments, it was more meaningful to look at deadlines missed by 20%, rather than by 2%, a direct consequence of the high load on the system (the counting rule is sketched below). Combined results for all three configurations in the case of 20 aircraft are presented in Figures 8 and 9. Figures 10 and 11 show a comparison between simulations involving 4 and 20 aircraft for the Collision Detection module cooperating with the GUI.
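As a concrete illustration of the counting rule, the sketch below assumes that a sample counts as "missed by p%" when the measured interval exceeds the deadline by at least p percent of the deadline length; this reading is an assumption, since the rule is not spelled out formally above.

```java
// Counting rule assumed here: a sample is "missed by p%" when the measured
// interval exceeds the deadline by at least p percent of the deadline length.
public class MissCounter {
    static int countMissedByPercent(long[] elapsedMs, long deadlineMs, double p) {
        long threshold = deadlineMs + Math.round(deadlineMs * p / 100.0);
        int count = 0;
        for (long e : elapsedMs) {
            if (e >= threshold) count++;
        }
        return count;
    }
}
```

With p = 2 this gives the by-2% counts used for the five-task model, and with p = 20 the by-20% counts used here.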
Fig. 8: Overall time the deadlines are missed (milliseconds) in 100 experiments, for 20 aircraft.
Fig. 9: The number of times the deadlines are missed by 20%, in 100 experiments, for 20 aircraft.
Fig. 10: Overall time the deadlines are missed (milliseconds) in 100 experiments, for the Collision Detection module cooperating with the GUI (4 and 20 aircraft).
Fig. 11: The number of times the deadlines are missed by 20% in 100 experiments, for the Collision Detection module cooperating with the GUI (4 and 20 aircraft).
5. Measuring Sensitivity

The graphs presented in the previous sections show that some curves descend smoothly, while others descend very sharply. As an example, Fig. 2 shows clearly that the performance of the VxWorks implementation begins to worsen rapidly when the deadline is shortened below 1000 milliseconds. Therefore, we may want to study in more detail what happens when the deadline is around 1000 milliseconds. On the other hand, Fig. 5 shows that some curves, including the one for the TAO implementation, are relatively indifferent to the predefined deadlines: they remain fairly flat and none of the points exhibits anything special. This means that the system is not very sensitive to changes of deadlines within the range studied. Based on these observations, we define a new parameter as a metric for system performance, called sensitivity:

Sensitivity of a real-time system is a measure of how fast the system's responses change when deadlines are increased or decreased.

The practical interpretation of sensitivity is that it shows whether performance degradation occurs sharply or gracefully. Sensitivity should be represented quantitatively by the ratio of the change in results to the range of deadline lengths over the changed interval. A first-order approximation is to take the slope coefficient a of the line y = ax + b drawn through the endpoints of the graph. For example, for the graph representing CORBA/TAO (in Fig. 5) this results in a value of a of 0.024925. For the graph representing the overall time the deadlines are missed for the Communication module cooperating with the GUI (in the 4-aircraft case, given in Fig. 10), the value would be 0.084141.

These raw numbers, however, are counterintuitive: looking at the graphs, it is clear that the system the first graph relates to (Fig. 5) is more sensitive than the second one (Fig. 10), while the numbers suggest otherwise. Therefore, we need to adjust the ratio and obtain relative numbers by taking the average result into account. The average value for the relevant graph in Fig. 5 is 32.7, while the average value for the relevant graph in Fig. 10 is 2003.429. Calculating relative values by dividing the slopes by the average values, the sensitivities obtained are 0.000762 and 0.000042, respectively. Furthermore, one can obtain absolute and comparable sensitivity values by using the formula (y2 - y1) / [(y1 + y2) / 2], which normalizes the result. Using this method, it can be seen that the five-task model implementations with VxWorks shared memory and VxWorks sockets are the most sensitive ones. The least sensitive implementation is the communication between the Communication module and the GUI module with 4 aircraft. Overall, the sensitivity parameter tells us how fast the system degrades when the deadlines are shortened, that is, how fast it gets saturated. A detailed description of computing sensitivity is given in [9]. The sketch below reproduces the relative-sensitivity computation.
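The following sketch reproduces the computations from the values quoted above; the endpoint coordinates themselves come from the graphs and are not repeated here.

```java
// Sketch of the sensitivity computations described in this section.
public final class Sensitivity {
    // First-order approximation: slope a of the line y = ax + b through the
    // endpoints (x1, y1) and (x2, y2) of a graph.
    static double slope(double x1, double y1, double x2, double y2) {
        return (y2 - y1) / (x2 - x1);
    }

    // Relative sensitivity: the slope divided by the average result level.
    static double relative(double slope, double average) {
        return slope / average;
    }

    // Normalized endpoint change: (y2 - y1) / [(y1 + y2) / 2].
    static double normalized(double y1, double y2) {
        return (y2 - y1) / ((y1 + y2) / 2.0);
    }

    public static void main(String[] args) {
        System.out.println(relative(0.024925, 32.7));      // Fig. 5 (TAO): ~0.000762
        System.out.println(relative(0.084141, 2003.429));  // Fig. 10: ~0.000042
    }
}
```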
6. Conclusion

In this paper, we investigated issues related to real-time performance in systems operating in a networking environment. To study patterns showing how distributed systems meet or miss their deadlines, we developed a simple benchmark as a model of operation. Through experiments with the five-task model, and through exercising a more elaborate case, the Air Traffic Control System, it was confirmed that the overall time by which the deadlines are missed, and the number of times the deadlines are missed by a certain time length, largely depend on the type of workload and the length of the deadline. Our experiments give some quantitative data, which we expressed as the sensitivity factor.

First, to validate the five-task model, we implemented it with Java sockets, VxWorks sockets, VxWorks shared memory, VisiBroker CORBA, TAO CORBA and IIOP. The benchmarking tests were separated into two groups. The first group, for Java sockets, VxWorks sockets and VxWorks shared memory, indicated roughly the same performance for both types of sockets, with a large improvement in performance in the case of shared-memory communication. In the second group, for CORBA, TAO CORBA has much better performance than VisiBroker CORBA; in the IIOP implementations, TAO CORBA calling VisiBroker CORBA performs better, nearly as well as the TAO-only implementation. VisiBroker CORBA calling TAO CORBA has worse performance than TAO CORBA calling VisiBroker CORBA, but better than the VisiBroker-only implementation. The CORBA experiments additionally validated the concept of this benchmark.

The benchmarking results also confirm the relation between the length of the deadlines and performance. In most cases, performance degradation occurs when the deadlines are shortened, ultimately leading to a crash of the system. On the other hand, the overall performance of the systems we looked at is roughly independent of the length of the deadlines, once these lengths exceed a certain threshold.

In the measurements for the Air Traffic Control Simulation System, it was confirmed that the number of simulated aircraft has a significant impact on performance. A five-fold increase in the number of aircraft causes a degradation in the performance of the whole system ranging from a factor of two for lightly loaded modules to two orders of magnitude for heavily loaded modules, in terms of the overall time the deadlines are missed. If the workload remains the same, the performance improves as the deadline length is increased. In the experiments with the GUI module for the higher number of aircraft, almost every deadline was missed by 20%, regardless of the length of the deadline. The reason is that the time needed for data transmission with a higher number of aircraft is much greater than the predefined deadline. Each time the GUI module receives a request to display information, it calls other modules for specifics, which effectively increases the computation time more than linearly. As a result, applying the benchmark to the Air Traffic Control Simulation System quantitatively confirms the intuition that the computational bottleneck of the entire system is in the GUI module.

The new benchmark also shows the system's sensitivity to shortening the deadlines, as well as to increasing them. However, in some of the experiments, the benchmarking results did not vary much in response to changing the deadlines and workloads. For example, in the experiments with VisiBroker, the overall time by which the deadlines are missed does not depend much on the deadline length. This behaviour may require further studies based on our sensitivity metric, as well as the development of new metrics to more accurately measure the performance of distributed real-time systems.
References

[1] J. Zalewski, "Real-Time Software Architectures and Design Patterns: Fundamental Concepts and Their Consequences", Annual Reviews in Control, Vol. 25, pp. 133-146, 2001.
[2] R.P. Kar and K. Porter, "Rhealstone - A Real-Time Benchmarking Proposal", Dr. Dobb's Journal, Vol. 14, No. 2, pp. 4-24, February 1989.
[3] D.L. Kiskis and K.G. Shin, "SWSL: A Synthetic Workload Specification Language for Real-Time Systems", IEEE Transactions on Software Engineering, Vol. 20, No. 10, pp. 798-811, October 1994.
[4] B. Shirazi, L. Welch et al., "DynBench: A Dynamic Benchmark Suite for Distributed Real-Time Systems", Parallel and Distributed Computing Practices, Vol. 3, No. 1, March 2000.
[5] B.G. Ujvary and N.I. Kamenoff, "Implementation of the Hartstone Distributed Benchmark for Real-Time Distributed Systems: Results and Conclusions", Proceedings of the 5th International Workshop on Parallel and Distributed Real-Time Systems, pp. 98-103, IEEE Computer Society Press, Los Alamitos, Calif., 1997.
[6] J. van Katwijk, J.-J. Schwarz and J. Zalewski, "Practice of Real-Time Software Architectures", Proceedings of the IFAC Conference on New Technologies for Computer Control, Hong Kong, P.R. of China, November 19-21, 2001.
[7] TAO, http://www.cs.wustl.edu/~schmidt/TAO.html
[8] VisiBroker, http://www.borland.com/visibroker
[9] D. Guo, "Real-Time Computing in a Networking Environment: An Air Traffic Control System Case Study", MSc Thesis, Department of Electrical and Computer Engineering, University of Central Florida, Orlando, Fla., 2001.
[10] A. Polze, K. Wallnau, D. Plakosh and M. Malek, "Real-Time Computing with Off-the-Shelf Components: The Case for CORBA", Parallel and Distributed Computing Practices, Vol. 2, No. 1, pp. 1-14, 1999.
[11] W. Jiang, "Experience of Applying CORBA Middleware to Air Traffic Control Automation Systems", Proceedings of the 16th AIAA/IEEE Digital Avionics Systems Conference (DASC), Vol. 1, pp. 1.2-8-15, 1997.