A Novel Framework for the Evaluation, Verification and Validation of Distributed Applications and Services

ANTONIO LIOTTA, CARMELO RAGUSA, MARCO BALLETTE
Centre for Communication Systems Research – School of Electronics, Computing and Mathematics
University of Surrey
Guildford, Surrey, GU2 7XH, UK

Abstract: - A recurrent problem encountered by distributed system designers is that of verifying, validating and evaluating the performance of complex software systems. The behavior of these systems generally depends on how the various software entities inter-relate and on the status of the underlying inter-network. Although many aspects of the distributed software engineering process have been addressed, there is still a need for simulation tools and methodologies that support all seven OSI layers and are geared towards modeling distributed applications over simulated inter-networks. In this paper we present such a simulation framework, illustrating its use and applicability through a number of example applications, including application-level networking, mobile agent systems, GRID computing, and mobile services for 3G. Drawing on the lessons learned from implementing those applications, we propose a general methodology for the assessment of distributed systems and applications.

Key-Words: - Application-level simulation; simulation tools; parallel and distributed computing.
1 Introduction
The importance and the complexity of distributed systems and applications have grown pari passu with the level of distribution, dynamics, and scale of inter-networks. Stand-alone applications have become the exception, while most existing applications rely on increasingly sophisticated software paradigms. These range from the popular Client-Server (CS) model used in many web-based applications to the Code on Demand (CoD) model realized, for instance, through Java applets. An alternative to CoD is offered by the Remote Evaluation (REV) model [1], often referred to as ‘code pushing’. In this case, the client application can dynamically extend the server's capabilities, for instance through Jini [2]. Modern inter-networked systems adopt even more complex communication schemes, such as the Peer-to-Peer (P2P) model [3], which overcomes the bottlenecks typically associated with the CS approach. In the past decade we have also seen a plethora of applications realized through mobile software components, or Mobile Agents (MAs) [14]. In this case the location of the application components may vary at run time, depending on the user and network context. Because of their complexity, network dependence, and unpredictable run-time behavior, distributed applications pose serious hurdles to the designer. A recurrent problem is that of verifying, validating and evaluating the performance of complex software systems before their full-scale development and deployment.
Two alternative approaches may be pursued towards the realization and evaluation of distributed systems: prototyping and simulation. Each approach unveils different, complementary aspects of the run-time system behavior. Prototyping allows the real overheads introduced by the system to be measured, but it is limited by the scale of the experimental testbed. In other words, distributed, network-bound applications (i.e. applications whose behavior depends on the network state) cannot be fully assessed until large-scale deployment. Per contra, a simulation-based approach provides better means for evaluating performance and scalability, and has the additional advantage of allowing the assessment of other important aspects such as correctness, validity, robustness, and stability. Therefore, simulation is often chosen as a first step towards the development of distributed algorithms, applications and services. The most common practice, however, is to create a simplified model of the system under scrutiny and develop ad hoc simulation software that embeds both the application and the network behavior. This approach is time- and resource-consuming and is only justified under specific circumstances, e.g. in relatively large and long projects.
The alternatives to ad hoc simulators are simulation environments that are either network- or application-specific. The former provide fine-grained models for the assessment of the OSI physical, data link, network and transport layers. A plethora of network simulators has been developed for this purpose, including NS [4], OPNET [5], JavaSim [6] and PARSEC [7]. Network simulators, however, tend to offer only limited support for application-layer simulation. Application-specific simulators, on the other hand, suffer from the opposite problem: they model applications well but rely on fairly abstract, high-level network models. An example can be seen in GRID computing, for which many simulators have been developed, including Bricks [8], GridSim [9] and SimGrid [10]. In this case the focus is on evaluating the performance of application-level functions such as resource scheduling or load balancing. We introduce herein a novel simulation framework that addresses the above-mentioned requirements of distributed, network-dependent applications. Figure 1 illustrates our approach, which can be seen as an extension of network-level simulation with additional application-level support. This support provides specific primitives for modeling a variety of applications, including services, mobile services, agent systems, mobile code, and GRID computing, as detailed in Section 3.
Figure 1. Proposed simulation framework (application-level models for services, applications, agent systems, etc., built on top of a network-level simulation environment).
By providing support for all seven OSI layers (rather than for just a subset of them), our framework offers a new way of performing application-level simulation. The system is particularly geared towards easy and rapid application prototyping. Through a simple scripting language, it is possible to define application scenarios (e.g. number of users and software components, software interaction paradigms, etc.) and network scenarios (e.g. network topology, congestion, faults, etc.). We can therefore capture the effects of network changes on the application, and vice versa, in order to study the performance, scalability, and stability of the application while also verifying its correctness and validity.
Section 2 describes the network simulator used as the basis of our framework. The building blocks for application support are described in Section 3. Section 4 illustrates the use of our framework through a number of example applications, including application-level networking, mobile agent systems, GRID computing and mobile services for 3G. Finally, drawing on the lessons learned from implementing those applications, we propose a general methodology for the assessment of distributed applications (Section 5).
2 Network-level Simulation
The simulation framework of Figure 1 has been designed incrementally, within an existing open-source network simulator, in order to re-use a validated and verified set of networking simulation tools, protocols, and functions. As a starting base, we chose the NS network simulator [4], which is widely used by the networking community, as documented by the extensive information available at the NS web site [11]. Below we summarize some background information on NS; our extensions supporting distributed applications are detailed in Section 3.
NS is a discrete-event network simulator that runs in simulated (non-real) time and supports the most common physical, link, network, transport and application layer protocols for fixed and mobile communication networks. Over the last few years, its extensive use in the networking research community has made it a valuable tool for studying, improving, and introducing new protocols. In addition to fixed networking, NS supports wireless and satellite networking, unicast and multicast routing, centralized and hierarchical routing, and static and dynamic routing. Researchers worldwide have contributed extensions to the core NS, including support for UMTS, GPRS, Mobile IPv6, Bluetooth, RSVP, Differentiated Services, MPLS, active networking, IEEE 802.11 WLANs, multi-hop wireless ad hoc networking, and Cellular IP (the interested reader may refer to [4] for software and documentation). NS also has the advantage of being open-source software with an extensible architecture (Figure 2). The user specifies simulation scenarios as OTcl scripts, which are interpreted by the NS kernel. The simulator's functionality can be extended by creating new C++ classes, adding new functions to the NS library, and mirroring the new code onto OTcl classes.
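The core idea behind a discrete-event simulator such as NS is that pending events sit in a queue ordered by timestamp, and the simulation clock jumps from one event to the next instead of advancing in real time. The following minimal sketch illustrates that idea in Python; it is not NS's scheduler (which is implemented in C++), and the class, method names and link delay are hypothetical.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Scheduler:
    """Minimal discrete-event scheduler (illustrative; not NS's C++ scheduler)."""

    def __init__(self) -> None:
        self.now = 0.0  # current simulation time, in simulated seconds
        # Heap of (time, sequence number, callback); the sequence number breaks
        # ties so that events with equal timestamps run in insertion order.
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = itertools.count()

    def at(self, time: float, callback: Callable[[], None]) -> None:
        """Schedule `callback` at absolute simulation time `time`."""
        if time < self.now:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._queue, (time, next(self._seq), callback))

    def run(self, until: float = float("inf")) -> None:
        """Pop events in timestamp order; the clock jumps from event to event."""
        while self._queue and self._queue[0][0] <= until:
            time, _, callback = heapq.heappop(self._queue)
            self.now = time
            callback()


# Usage: a packet sent at t = 1.0 s over a link with an assumed 0.25 s delay.
sim = Scheduler()
log: List[Tuple[str, float]] = []


def send() -> None:
    log.append(("send", sim.now))
    sim.at(sim.now + 0.25, receive)  # delivery is just another future event


def receive() -> None:
    log.append(("recv", sim.now))


sim.at(1.0, send)
sim.run()
print(log)  # [('send', 1.0), ('recv', 1.25)]
```

No wall-clock time passes between the two events: the simulator simply advances `now` to the next timestamp, which is why such simulations can run faster or slower than real time depending only on the event count.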
Figure 2. The NS network simulation environment (OTcl scripts and interpreter; C++ core with scheduler, topology, and physical, link, network, transport and application layers; trace files used for NAM visualization and for statistical analysis and interpretation).
Two important tools come with NS: the Network Animator (NAM) [11] and the GT-ITM topology generator [12]. NAM produces graphical animations from the trace files generated by NS simulations and is used to verify the correctness of the simulations. GT-ITM automates the generation of networks for evaluation purposes, producing realistic Internet-like topologies according to a well-established methodology [13]. Although well suited to networking, NS has not been developed for end-user application simulation. Its application layer includes only simple OSI layer 7 functions (i.e., network applications). Examples include FTP, Telnet and other traffic generators such as HTTP and Web traffic generators, aimed at experimenting with networking and communication protocols. The extensions described below can be seen as part of the NS application layer, although they provide generic support for user-level distributed applications, algorithms, services, and systems.
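To give a feel for what a topology generator does, GT-ITM offers several graph models, among them Waxman's random graph model: nodes are placed at random on a plane, and each pair is linked with probability α·exp(−d/(βL)), where d is their Euclidean distance and L the largest possible distance. The sketch below reproduces that model in plain Python; it is not GT-ITM's code, its output format differs from GT-ITM's, and the parameter values in the usage line are arbitrary assumptions.

```python
import math
import random
from typing import List, Tuple


def waxman_graph(n: int, alpha: float, beta: float, side: float = 100.0,
                 seed: int = 0) -> Tuple[List[Tuple[float, float]], List[Tuple[int, int]]]:
    """Waxman random graph: n nodes placed uniformly on a side x side square.

    Each pair (u, v) is linked with probability alpha * exp(-d / (beta * L)),
    where d is their Euclidean distance and L the largest possible distance.
    """
    rng = random.Random(seed)
    coords = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    max_dist = math.sqrt(2) * side  # L: the diagonal of the square
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(coords[u], coords[v])
            if rng.random() < alpha * math.exp(-d / (beta * max_dist)):
                edges.append((u, v))
    return coords, edges


coords, edges = waxman_graph(50, alpha=0.4, beta=0.2)
print(len(coords), len(edges))
```

A larger α raises the overall edge density, while a smaller β penalizes long edges more strongly. Nothing guarantees that the result is connected, so a scenario built on it should check connectivity (or regenerate) before use.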
3 Application-level Simulation
User applications are considerably more complex than network applications in terms of software distribution, communication paradigm, use of resources, etc. Thus, when it comes to modeling user applications for simulation purposes, the simulator must provide appropriate support. For this purpose, we have introduced into NS a number of building blocks aimed at facilitating the modeling of complex distributed applications, as depicted in Figure 3. We can model four different dimensions of an application: the software development paradigm (CS, REV, CoD, and MA); the type of software entities (objects, components, and agents); the consumption of resources by the various software entities (hardware, software, and hybrid); and the application monitoring paradigm (centralized, hierarchical, and distributed).
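Why the interaction paradigm deserves to be a modeling dimension of its own can be seen from a deliberately simplified traffic model, an assumption made here for illustration and not the model used inside the framework. A client task needs k request/reply exchanges with a server (requests of q bytes, replies of r bytes). Under CS every exchange crosses the network; under REV the client ships code of c bytes once and gets back a result of a bytes; under MA an agent with code c and state s migrates to the server and returns carrying the result.

```python
from dataclasses import dataclass


@dataclass
class Task:
    k: int  # request/reply exchanges the task needs with the server
    q: int  # request size (bytes)
    r: int  # reply size (bytes)
    c: int  # code size (bytes), shipped under REV and MA
    s: int  # agent state size (bytes), carried by a migrating agent
    a: int  # size of the final result (bytes)


def network_bytes(paradigm: str, t: Task) -> int:
    """Bytes crossing the client-server path under a toy model (an assumption)."""
    if paradigm == "CS":   # every exchange crosses the network
        return t.k * (t.q + t.r)
    if paradigm == "REV":  # the client pushes code once; the server returns the result
        return t.c + t.a
    if paradigm == "MA":   # the agent migrates out and back, bringing the result home
        return 2 * (t.c + t.s) + t.a
    raise ValueError(f"unsupported paradigm: {paradigm}")


for k in (10, 100):
    task = Task(k=k, q=200, r=1000, c=20_000, s=2_000, a=5_000)
    print(k, {p: network_bytes(p, task) for p in ("CS", "REV", "MA")})
# 10 {'CS': 12000, 'REV': 25000, 'MA': 49000}
# 100 {'CS': 120000, 'REV': 25000, 'MA': 49000}
```

With these hypothetical sizes, MA generates less traffic than CS from k = 41 exchanges onward. The model ignores delay, congestion and routing entirely; in a simulation over NS such effects would emerge from the simulated network rather than from a closed-form formula.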
Figure 3. End-user application support in NS.
Complex applications such as MA systems or GRID applications also need substantial management support for resources and services. We have therefore introduced specific models for service/resource discovery, load balancing, reservation and allocation. Finally, we have introduced models for specifying the workload for benchmarking and evaluation purposes. The following section illustrates the use and applicability of our simulation framework through a number of case studies that we have implemented over the past three years.
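A workload model mostly has to say when requests arrive. A common choice, assumed here purely for illustration (the text does not state which arrival process the framework's workload models use), is a Poisson process, whose inter-arrival times are exponentially distributed. Under this assumption, load levels such as the light, medium and heavy service request conditions mentioned in Section 4.5 differ only in the rate parameter; the rates below are arbitrary.

```python
import random
from typing import Dict, List

# Hypothetical request rates (requests per second) for three load levels.
LOAD_LEVELS: Dict[str, float] = {"light": 0.5, "medium": 2.0, "heavy": 8.0}


def poisson_arrivals(rate: float, horizon: float, seed: int = 0) -> List[float]:
    """Arrival times in [0, horizon) of a Poisson process with the given rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    rng = random.Random(seed)
    times: List[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # exponential inter-arrival, mean 1 / rate
        if t >= horizon:
            return times
        times.append(t)


for level, rate in LOAD_LEVELS.items():
    arrivals = poisson_arrivals(rate, horizon=1000.0)
    print(level, len(arrivals))  # roughly rate * horizon requests
```

Fixing the seed makes each workload reproducible, so the same request trace can be replayed against different algorithms; independent replications simply use different seeds.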
4 Running Distributed Applications over Simulated Networks
4.1 Mobile Software Agent Systems
Mobile Agents (MAs) are computational entities that act on behalf of some other software entity, exhibit some degree of autonomy and, in particular, are capable of migration. Other properties include reactiveness, proactiveness, adaptability and cloning. Because of these properties, the behavior, performance and effectiveness of MA systems cannot always be anticipated at design time. Our framework proved particularly useful in this context for assessing a novel agent deployment algorithm for inter-networks. A detailed description of the algorithm is beyond the scope of this paper and can be found in [14]. We used MAs to solve the following network partitioning problem: given a large-scale, dynamic inter-network, find a scalable algorithm that creates p partitions and identifies the center of each of them. By modeling our MA-based algorithm with the primitives introduced in Figure 3, we could carry out a methodical simulation-based analysis for different types of networks. We could also assess the computational complexity of our algorithm (found to be linear in network size), its optimality (found to be near-optimal by comparison with an existing, provably near-optimal algorithm), and its resilience to network faults and congestion (assessed using different network scenarios). The MA simulation support functions, including agent creation, migration, cloning and termination, have also been used for the MA systems described in Sections 4.2, 4.3 and 4.5. We have thus found a general way to model MA systems on top of the NS simulator.
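The MA-based partitioning algorithm is described in [14] and is not reproduced here. To make the problem statement concrete, the sketch below solves a version of it centrally on a hop-count graph, using the classic farthest-first traversal (Gonzalez's greedy 2-approximation for the k-center problem): each new center is the node farthest from all centers chosen so far, and every node joins the partition of its nearest center. Treating the task as k-center, and all names in the code, are assumptions; this is the kind of centralized baseline a distributed solution can be compared against, not the authors' algorithm.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def bfs_hops(graph: Graph, source: int) -> Dict[int, int]:
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist


def partition(graph: Graph, p: int, first: int) -> Tuple[List[int], Dict[int, int]]:
    """Choose p centers by farthest-first traversal; map each node to its center.

    Assumes p does not exceed the number of nodes; nodes unreachable from
    `first` are left unassigned.
    """
    centers = [first]
    nearest = bfs_hops(graph, first)     # hop distance to the closest center
    owner = {v: first for v in nearest}  # partition each node belongs to
    while len(centers) < p:
        nxt = max(nearest, key=lambda v: nearest[v])  # farthest node so far
        centers.append(nxt)
        for v, d in bfs_hops(graph, nxt).items():
            if d < nearest[v]:           # the new center is closer: reassign
                nearest[v], owner[v] = d, nxt
    return centers, owner


# Usage: a 6-node path 0-1-2-3-4-5 split into p = 2 partitions.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(partition(path, p=2, first=0))
# ([0, 5], {0: 0, 1: 0, 2: 0, 3: 5, 4: 5, 5: 5})
```

The sketch runs one breadth-first search per center, so for a fixed p its cost grows linearly with the number of nodes and links. A mobile-agent version would instead have to obtain distances from the network itself, e.g. from routing state, which is precisely what makes simulation over a realistic network layer necessary.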
4.2 Distributed Network Monitoring
In this case, we wanted to assess the adaptability of an MA-based network monitoring system in the face of changing network conditions. We designed a distributed monitoring system composed of a number of area monitors implemented as MAs. As such, when network conditions changed, the MAs could relocate themselves, aiming to introduce the minimum possible overhead into the system [15]. Our solution depended on MAs constantly reading routing tables from local routers in order to build cost estimates, which were in turn used to compute the best location within the inter-network. Thus, as network conditions changed (e.g. because of congestion or network faults), the routing protocols updated the routing tables, and the MAs could indirectly sense those changes and migrate accordingly. This case study highlights another situation in which the application behavior is tightly intertwined with the network behavior (i.e. routing protocols, network traffic profile, etc.). Because of that, our framework was the natural way to assess the MA system. We could assess a number of interesting properties, such as the sensitivity of the monitoring system to the routing protocol, network scale, and network dynamics, as well as its resilience to network faults and congestion.
4.3 Application-level Multicast
In [16] we introduced a novel MA-based, application-level multicast solution. In this case, an MA-based multicast overlay is built for networks lacking layer 3 multicast support. As in the previous cases, MAs sense the network environment and autonomously decide ‘where’ and ‘when’ to migrate in order to maintain a logical overlay network in the face of network changes.
Having modeled our solution over NS, we could then compare it directly with an existing layer 3 multicast protocol, the Distance Vector Multicast Routing Protocol (DVMRP). We assessed the performance degradation introduced by our protocol in terms of average multicast tree build-up time (see Figure 4) and average stress ratio. As expected, the application-layer protocol performs worse than native layer 3 multicast. However, we could show that our approach is feasible in terms of scalability and performance, in addition to being a viable application-layer solution.
Figure 4. Performance of MA-based application-layer multicast: average tree build time (s) versus topology size (number of nodes) for DVMRP and the MA-based approach.
4.4 Resource Management Algorithms in GRID Computing
A completely different application domain for our simulation framework falls under the umbrella of GRID computing [17]. In this case, we were interested in assessing various resource management algorithms, including resource reservation and allocation mechanisms. More specifically, we considered the Globus Resource Allocation Manager (GRAM) [17], implementing some of its core functions within the service/resource management layer depicted in Figure 3. GRID nodes have been realized in NS as a new NS agent (Figure 5).
Figure 5. The GRID simulation model (a GRAM domain of nodes 1 to N, each running local jobs; the GRAM receives incoming GRID job requests, applies a GRID resource management strategy based on node status information and the MDS, and dispatches RSL jobs to the nodes).
The GramAgent can receive incoming requests from resource brokers following the CS paradigm. It can also assign incoming jobs to appropriate GRID entities, depending on different resource management policies (e.g. co-allocation, reservation, etc.). The agent is also responsible for local resource balancing and for GRID monitoring through interaction with the MDS (Metacomputing Directory Service) [17].
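The allocation policy is the part of such an agent that one typically wants to vary between experiments. As a hypothetical illustration (the GramAgent's actual policies are not given in the text, and none of the names below belong to Globus or NS), the sketch places each job on the node with the most free processors, rejects it if no single node fits, and supports a simple form of reservation that holds processors aside.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GridNode:
    name: str
    cpus: int          # total processors on the node
    busy: int = 0      # processors running local or GRID jobs
    reserved: int = 0  # processors held by reservations

    @property
    def free(self) -> int:
        return self.cpus - self.busy - self.reserved


def assign_least_loaded(nodes: List[GridNode], cpus_needed: int) -> Optional[GridNode]:
    """Place a job on the node with the most free CPUs; None if nothing fits."""
    candidates = [n for n in nodes if n.free >= cpus_needed]
    if not candidates:
        return None  # a richer policy might queue the job or co-allocate it
    best = max(candidates, key=lambda n: n.free)
    best.busy += cpus_needed
    return best


def reserve(node: GridNode, cpus: int) -> bool:
    """Hold CPUs on a node for a later job, if they are currently free."""
    if node.free < cpus:
        return False
    node.reserved += cpus
    return True


grid = [GridNode("node1", cpus=4, busy=3), GridNode("node2", cpus=8, busy=2)]
chosen = assign_least_loaded(grid, 4)
print(chosen.name if chosen else "rejected")  # node2
```

Keeping the policy a plain function over node state makes it easy to swap in alternatives (e.g. best fit, or co-allocation across several nodes) while the surrounding simulated network stays the same.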
4.5 Resource Discovery for M-services
The last case study concerns the evaluation of Mobile Services in fixed and mobile networks. We were interested in evaluating different types of service management and service/resource discovery protocols under a variety of network scenarios and conditions. To start with, we enhanced the NS node model in order to capture not only network but also computational resources (Figure 6). We also modeled our services as software components (both static and mobile) that consume node resources (CPU, memory, disk) and interact with each other (thereby consuming network resources).
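A computational node model of this kind can be reduced to a resource ledger plus an admission test. The sketch below is an assumption-laden illustration: a job request carries a data size, a memory size and a CPU time (mirroring the DSize, MSize and CPU_time labels of Figure 6), and the node admits it only if free memory and disk can hold it and its CPU work, queued behind already-admitted work, finishes by a deadline. The units, the deadline rule and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    dsize: float     # data to store on disk (MB)
    msize: float     # memory footprint (MB)
    cpu_time: float  # CPU seconds needed on this node


@dataclass
class ComputeNode:
    memory: float             # free memory (MB)
    disk: float               # free disk space (MB)
    cpu_backlog: float = 0.0  # CPU seconds already committed to admitted jobs

    def admit(self, job: JobRequest, deadline: float) -> bool:
        """Accept the job if memory and disk fit and it can finish by `deadline`."""
        if job.msize > self.memory or job.dsize > self.disk:
            return False
        if self.cpu_backlog + job.cpu_time > deadline:  # assumes FIFO on one CPU
            return False
        self.memory -= job.msize
        self.disk -= job.dsize
        self.cpu_backlog += job.cpu_time
        return True

    def release(self, job: JobRequest) -> None:
        """Return a completed job's memory, disk and committed CPU time."""
        self.memory += job.msize
        self.disk += job.dsize
        self.cpu_backlog -= job.cpu_time


node = ComputeNode(memory=512.0, disk=10_000.0)
job = JobRequest(dsize=2_000.0, msize=256.0, cpu_time=30.0)
print([node.admit(job, deadline=60.0) for _ in range(3)])  # [True, True, False]
```

The third request is refused because memory is exhausted. A discovery protocol can then be evaluated by how quickly it finds a node whose `admit` succeeds, under increasing request load.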
Figure 6. Computational node modeling (a resource agent managing CPU, memory and disk, with admission control, a decision maker and resource discovery, handling job requests characterized by DSize, MSize and CPU_time).
Finally, we re-used a subset of the GRID resource management model presented in Section 4.4 in order to simulate flexible service allocation policies. We then realized an MA-based resource discovery protocol and assessed its performance under light, medium and heavy service request conditions.
5 Lessons Learned and Use of the Simulator
The variety and complexity of the case studies presented in Section 4 indicate the potential and range of applicability of our simulation framework. In choosing those case studies we placed particular focus on MA-based applications, which can be considered a superset of distributed applications and services, having the extra dimension of code mobility. We can therefore conclude that any application that can be modeled using the building blocks of Figure 3 can also be assessed within our framework. Clearly, some applications will require further support; an example is multi-agent systems, for which we have not provided any specific support. Remaining within the realm of applications that can easily be mapped onto Figure 3, we have distilled the lessons learned from the prototype implementation into the flow chart depicted in Figure 7. It identifies the main steps that lead from problem definition (i.e. the conception of a new distributed application or service) to its simulation-based assessment. The reader interested in the ‘Art of Computer Systems Performance Analysis’ may also refer to [18], which provides insight into how to design tests, scenarios, and simulations, and how to carry out the statistical analysis and interpretation of the results.
Figure 7. The process of evaluating distributed applications through simulation (problem definition; high-level scenarios and requirements; modeling, creating new C++ classes and mirrored OTcl primitives when the available OTcl primitives do not suffice; model refinement; design of tests and simulation scenarios; simulation runs; trace-file scripts and NAM visual verification; statistical analysis and data interpretation).
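For the statistical-analysis step, a standard practice (see [18]) is to repeat each simulation scenario with independent random seeds and report the mean of a metric together with a confidence interval based on the Student t distribution. The minimal sketch below avoids SciPy by hard-coding two-sided 95% t quantiles for small numbers of runs; the sample values in the usage line are made up.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% Student-t quantiles t(0.975, df) for df = n - 1 degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_ci95(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and 95% confidence interval from independent replications."""
    n = len(samples)
    if n - 1 not in T95:
        raise ValueError("table covers 2 to 11 runs; use scipy.stats.t.ppf otherwise")
    mean = statistics.mean(samples)
    half = T95[n - 1] * statistics.stdev(samples) / math.sqrt(n)  # t * s / sqrt(n)
    return mean, mean - half, mean + half


runs = [1.2, 1.4, 1.3, 1.5, 1.1]  # made-up metric values from five seeded runs
print(mean_ci95(runs))  # mean 1.3, interval about (1.104, 1.496)
```

If two configurations' intervals overlap heavily, more replications are needed before claiming that one outperforms the other.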
6 Conclusions
In this paper we have addressed the issues related to simulating complex distributed applications over simulated networks. We have designed a suitable framework for this purpose, following an incremental approach. Having started from an open-source network simulator, we could rely on a range of existing functions (already validated by others) at the physical, data-link, network and transport layers. We concentrated on the application layer, developing a model that allows the simulation of user-level applications. We then used the building blocks depicted in Figure 3 to experiment with a range of applications, as described in Section 4. Finally, we carried out a generalization exercise in order to illustrate the process of evaluating distributed applications through simulation (Figure 7).
The outcome of our work is more than just another simulator. The effort required to design it has taught us a number of lessons and led to several conclusions. First, it is possible to assess distributed applications over simulated networks, gaining insight into the interdependencies between application behavior and network state. It is also possible to monitor the effects of the application on the network in terms of traffic, overheads, delays, etc. Second, a careful design of the simulator can result in a viable simulation environment in terms of ease of application modeling, simulation time, and significance of data. Our results are encouraging as an alternative to the widely adopted practice of testing applications via ad hoc simulators. Having tested a number of complex applications over our framework, we conclude that our approach combines several advantages: it allows rapid development of the application model; it relies on a fully validated networking layer; it facilitates software reuse; and it provides information that is difficult to obtain via experimental prototypes (e.g. scalability and network state dependency). On the other hand, there is ample scope for improvement. First, because the case studies were developed by the same team that built the simulator, it is hard to judge how easily other people could use the system. Second, our development was aimed at a feasibility and viability study rather than at completeness, which leaves room for further work.
The GRID modules, for instance, need to be further developed in order to take into account the other layers of the Globus Resource Management architecture [17]. The same applies to multi-agent systems, whereas the MA functions are relatively elaborate in the current prototype.
Acknowledgments: The technical work presented in this paper has been developed in the context of the POLYMICS project, funded by the UK Engineering and Physical Sciences Research Council (EPSRC), Grant GR/S09371/01. We also gratefully acknowledge the feedback received from the ANWIRE consortium (www.anwire.org), which is funded by the European Community under contract IST-2001-38835.
References:
[1] J. Stamos et al., Remote Evaluation, ACM TOPLAS, 12(4), 537-565, 1990.
[2] J. Newmarch, A Programmer's Guide to Jini Technology, Apress, 2000.
[3] D. Liben-Nowell et al., Observations on the Dynamic Evolution of Peer-to-peer Networks, Proc. of IPTPS'02.
[4] www.isi.edu/nsnam/ns/
[5] www.opnet.com
[6] www.javasim.org
[7] R. Bagrodia et al., Parsec: A Parallel Simulation Environment for Complex Systems, IEEE Computer, 31(10), 77-85, Oct. 1998.
[8] A. Takefusa et al., Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms, Proc. of HPDC-8, 1999.
[9] R. Buyya et al., GridSim: A Toolkit for the Modeling and Simulation, Concurrency and Computation: Practice and Experience, May 2002.
[10] H. Casanova, SimGrid: a Toolkit for the Simulation of Application Scheduling, Proc. of the 1st IEEE/ACM Int. Symposium on Cluster Computing and the Grid, May 2001.
[11] www.isi.edu/nsnam/nam/
[12] E. Zegura, The GT-ITM Topology Generator, www.cc.gatech.edu/fac/Ellen.Zegura/
[13] K.L. Calvert et al., Modeling Internet Topology, IEEE Comm. Mag., June 1997.
[14] A. Liotta, Towards Flexible and Scalable Distributed Monitoring with Mobile Agents, PhD Thesis, Dept. of Computer Science, University College London, London, UK, 2001.
[15] A. Liotta et al., Exploiting Agent Mobility for Large Scale Network Monitoring, IEEE Network, special issue on Applicability of Mobile Agents to Telecommunications, 16(3), May/June 2002.
[16] C. Ragusa et al., A Scalable Application-level Multicast Approach based on Mobile Agents, Proc. of ICON'03, Sydney.
[17] I. Foster and C. Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.
[18] R. Jain, The Art of Computer Systems Performance Analysis, Wiley, 1991.