Resource-Aware Deployment and Configuration of QoS-enabled Middleware

Jaiganesh Balasubramanian†, Friedhelm Wolf†, Abhishek Dubey†, Aniruddha Gokhale†, Chenyang Lu‡, Priya Narasimhan§, and Douglas C. Schmidt†

† Department of EECS, Vanderbilt University, Nashville, TN, USA
‡ Department of CSE, Washington University in St. Louis, St. Louis, USA
§ Department of ECE, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

Ad hoc deployment and configuration (D&C) of fault-tolerance mechanisms (e.g., replica-host mapping and failover ordering of replicas) can lead to unacceptable response times, overloads, and low availability for soft real-time applications. This paper describes how our quality-of-service (QoS)-enabled middleware called DeCoRAM (Deployment and Configuration Reasoning via Analysis and Modeling) provides a holistic and automated solution to fault-tolerance and real-time D&C through two novel capabilities. First, DeCoRAM provides a deployment-time allocation and scheduling algorithm that maps passively replicated application components to appropriate hosts subject to their soft real-time requirements and determines the failover order of application replicas based on their worst-case state synchronization delays. Second, DeCoRAM's model-driven D&C engine deploys and configures replicas for each application and provides resource-aware failover and delay-bounded state synchronization between backup and primary replicas. Empirical results on a Linux cluster demonstrate that DeCoRAM can automatically configure fault-tolerance middleware capabilities at deployment time and deliver the desired soft real-time performance for clients operating in the presence of failures at runtime.
1 Introduction

Emerging trends and challenges. Distributed real-time and embedded (DRE) systems, such as shipboard computing environments; intelligence, surveillance, and reconnaissance systems; and smart buildings, are increasingly developed using quality-of-service (QoS)-enabled middleware, such as Real-time CORBA and Distributed Real-time Java. DRE systems operate in resource-constrained environments and often consist of soft real-time applications whose availability and timeliness requirements must be satisfied simultaneously. For example, in SCADA systems for power grid monitoring, remote terminal units must continue to process updates from sensors monitoring power grid failures in a timely manner, even when processor failures occur.

ACTIVE and PASSIVE replication are common approaches for building fault-tolerant applications for DRE systems. Due to its low resource consumption, PASSIVE replication is appealing for soft real-time applications that cannot afford the cost of maintaining active replicas and
need not assure hard real-time performance. In PASSIVE replication [15] only one replica, called the primary, handles all client requests; backup replicas incur no runtime overhead except for receiving state updates from the primary. If the primary fails, a failover is triggered and one of the backups becomes the new primary.

The advent of middleware that supports application-transparent passive replication [7, 5, 15] appears to simplify the development of fault-tolerant DRE systems. In practice, however, simultaneously meeting real-time and fault-tolerance requirements is hard due to the need to support fault-tolerance in a resource-aware manner that satisfies soft real-time application requirements [13]. In particular, the following problems must be addressed to deploy and configure (D&C) DRE systems:

• Application developers must determine how to configure middleware fault-tolerance properties (e.g., replica-host mapping and client failover order) to ensure that DRE system availability and performance requirements are met. Ad hoc fault-tolerance configurations can lead to unacceptable response times, overloads, and low-availability applications.

• Even after values for fault-tolerance configurations (e.g., client failover order) are determined, application developers are ultimately responsible for deploying and configuring the underlying QoS-enabled middleware [3, 14] so that runtime passive replication support is provided. Manual configuration can yield tedious and error-prone source code changes whenever the real-time and/or fault-tolerance requirements of applications change.

Solution approach → Deployment and Configuration Reasoning via Analysis and Modeling (DeCoRAM). To address the problems described above, we have designed a QoS-enabled middleware approach called DeCoRAM. DeCoRAM can automatically deploy and configure passively replicated DRE systems and provide soft real-time assurance for applications while tolerating a user-specified number of processor failures. This paper describes and empirically evaluates how DeCoRAM provides the following contributions to developing passively replicated soft real-time applications for DRE systems:

• DeCoRAM's deployment-time allocation and scheduling (DAS) algorithm, which allocates and schedules passively replicated application components onto appropriate processors while satisfying both their real-time
and fault-tolerance requirements. This algorithm also schedules backup replicas for each application to ensure delay-bounded state synchronization between primary and backup replicas. Moreover, the DAS algorithm determines the failover order of backup replicas based on their increasing state synchronization delays.

• DeCoRAM's model-driven D&C engine, which uses declarative domain-specific techniques [4] to specify application real-time and fault-tolerance requirements. Given a set of inputs, DeCoRAM's DAS algorithm (outlined above) determines the fault-tolerance configurations the system needs to meet the specified QoS requirements. DeCoRAM's model-driven D&C engine automatically configures the underlying QoS-enabled middleware used by applications with the decisions made by the DAS algorithm to simultaneously achieve soft real-time and high-availability assurance in the presence of N processor failures.

We evaluate DeCoRAM empirically in the ISISlab testbed (www.dre.vanderbilt.edu/ISISlab). The results demonstrate how DeCoRAM can dynamically maintain both system availability and the desired soft real-time performance for clients, while incurring negligible runtime overhead.

Paper organization. The remainder of this paper is organized as follows: Section 2 describes the fault model and technology infrastructure that underlie our work on DeCoRAM; Section 3 describes the structure and functionality of DeCoRAM; Section 4 empirically evaluates DeCoRAM in the context of distributed soft real-time applications with varying real-time and fault-tolerance deployment and configuration requirements; Section 5 compares DeCoRAM with related research; and Section 6 presents concluding remarks.
2 DeCoRAM's Fault Model and Technology Infrastructure

This section describes the system and fault models that form the basis for DeCoRAM. We also describe the structure of the QoS-enabled middleware infrastructure that DeCoRAM is built upon.

2.1 System Fault Model

DeCoRAM supports DRE systems where application servers provide multiple long-running services on a cluster of computing nodes. The services in this system are invoked by clients periodically via remote operation requests. Each service Ti is represented by the pair (Ei, Pi), where Ei is Ti's worst-case execution time (WCET) and Pi is Ti's request period. Clients require both soft real-time performance and system availability despite processor failures.

We consider fail-stop processor failures; multiple processors can fail concurrently. Although these failures are often transient, the mean time to recover from such failures in DRE systems is usually much longer than the periods of the services. Processor failures are therefore assumed permanent.

To provide fault-tolerance, DeCoRAM employs PASSIVE replication [15], where services are replicated and deployed across multiple processors. Its goal is to tolerate any N processor failures in a system of M processors; hence, for each service Ti we deploy N backup copies. Each backup replica of Ti is represented by the tuple (Si, Ei, Pi), where Si is the backup replica's worst-case state synchronization time (WCSST) with the state of the primary replica Ti. The WCET Ei and the request period Pi are the same as the respective WCET and request period of the primary replica Ti. The primary replica Ti is always executed, and each of the N backup copies can become a primary copy, depending on the failover order configured in the underlying QoS-enabled middleware [15]. When a backup copy of a service is not playing the role of a primary replica, it only consumes time equal to the WCSST required for synchronizing its state with the state of the primary replica; otherwise it consumes time equal to the WCET required for processing periodic client requests. All services allocated to a processor (including the backup replicas) are scheduled using the Rate-Monotonic algorithm [12].

We assume that networks provide bounded communication latencies and do not fail or partition. This assumption is reasonable for many soft real-time systems, such as SCADA or shipboard computing systems, where nodes are connected by highly redundant high-speed networks. Relaxing this assumption by integrating DeCoRAM with network-level fault-tolerance and QoS-enabled middleware management techniques [2] is an area of future work.

2.2 Overview of the Fault-tolerant Load-aware and Adaptive middlewaRe (FLARe)

FLARe [3] is a middleware framework that achieves fault-tolerance through PASSIVE replication of distributed objects, as shown in Figure 1.

[Figure 1: The FLARe Middleware Architecture]

FLARe's middleware replication manager (label A in Figure 1) manages the replication needs of applications hosted in the system and tracks their group membership via a monitor deployed on each processor. FLARe's client failover manager (label B in Figure 1) contains a redirection agent that is updated with failover and redirection targets by the middleware replication manager as it tracks group membership changes. FLARe's client request interceptor (label C in Figure 1) catches failure exceptions and provides application-transparent failover for clients. FLARe's state transfer agent (label D in Figure 1) allows server objects to inform it about changes to application state. State transfer in FLARe is accomplished by remote invocations from the primary replica to the backup replicas through the state transfer agent, which obtains the references of the backup replicas from the middleware replication manager. FLARe schedules state update propagations from the primary replica to the backup replicas after the primary sends its response to the client, which significantly reduces client response time.
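To make this task model concrete, the following Python sketch (our own illustration with hypothetical names, not part of DeCoRAM's implementation) encodes a service (Ei, Pi), its replicas with their WCSST Si, and the rate-monotonic priority rule; later sketches in this paper build on it.

```python
from dataclasses import dataclass

@dataclass
class Service:
    """A periodic service T_i = (E_i, P_i) with WCSST S_i for its backups."""
    name: str
    wcet: float    # E_i: worst-case execution time
    period: float  # P_i: request period (deadline = period)
    wcsst: float   # S_i: worst-case state synchronization time

@dataclass
class Replica:
    """A primary always consumes E_i; a backup consumes only S_i until it
    is promoted to primary, after which it consumes E_i."""
    service: Service
    is_primary: bool

    def cost(self, promoted: bool = False) -> float:
        s = self.service
        return s.wcet if (self.is_primary or promoted) else s.wcsst

def rm_priority_order(services):
    """Rate-monotonic priority assignment [12]: shorter period, higher priority."""
    return sorted(services, key=lambda s: s.period)
```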
Previous work on FLARe [3] addressed the issue of adapting the failover targets of applications to handle dynamic load fluctuations in DRE systems. For systems with relatively stable loads, however, addressing the passive replication D&C problem described in Section 1 is crucial, since deployment-time configurations can significantly improve runtime performance. This paper therefore significantly extends our prior work by showing how DeCoRAM can (1) automatically deploy and configure passively replicated soft real-time applications for DRE systems and (2) determine resource-aware configuration values for fault-tolerance properties (e.g., replica-host mapping) and provision real-time and fault-tolerance QoS properties at deployment-time.

3 Structure and Functionality of DeCoRAM

This section describes the structure and functionality of DeCoRAM.

3.1 The DeCoRAM Middleware Architecture

DeCoRAM addresses limitations with conventional D&C techniques for passively replicated soft real-time applications via two architectural elements: (1) a deployment-time allocation and scheduling (DAS) algorithm and (2) a model-driven D&C engine, as shown in Figure 2.

[Figure 2: The DeCoRAM QoS-enabled Middleware Architecture]

The DeCoRAM DAS algorithm determines values for configuring the fault-tolerance mechanisms (e.g., replica-host mapping) of QoS-enabled middleware based on per-application resource, real-time, and fault-tolerance requirements. Section 3.2 describes the DAS algorithm in detail.

The DeCoRAM model-driven D&C engine consists of two pieces: the deployment analysis modeling language and the deployment and configuration middleware. DeCoRAM's deployment analysis modeling language is a domain-specific modeling language that supports design-time specification of per-application real-time requirements, such as WCET, as well as fault-tolerance requirements, such as the number of failures to tolerate. The DAS algorithm determines appropriate fault-tolerance configurations based on these requirements. DeCoRAM's deployment and configuration middleware extends the FLARe framework described in Section 2.2. DeCoRAM uses the configurations determined by the DAS algorithm as a template and provides automatic QoS-enabled configuration of soft real-time applications hosted atop FLARe. Section 3.3 describes DeCoRAM's model-driven D&C engine in detail.

3.2 Deployment-time Allocation and Scheduling of Replicas

We now describe the Deployment-time Allocation and Scheduling (DAS) algorithm.

3.2.1 Algorithm Overview

The DAS algorithm receives the following inputs: (1) the WCET and WCSST of each service, including its primary and backup copies, (2) the number M of available processors in the system, and (3) the number N of processor failures to tolerate. Given these inputs, the DAS algorithm allocates the primary and N backup copies of each service to the available processors based on a set of real-time and fault-tolerance criteria.
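The staged, first-fit structure that Sections 3.2.2 through 3.2.4 elaborate can be outlined as follows. This sketch reuses the Service/Replica classes above; the `fits` parameter stands in for the real-time checks described next, so this is an outline under stated assumptions rather than DeCoRAM's actual implementation.

```python
def das_allocate(services, num_processors, num_failures, fits):
    """Allocate services in RM priority order, one service (primary plus its
    N backups) per stage, first-fit over processors checked in decreasing
    utilization order [8]. `fits(replica, allocated, num_failures)` embodies
    the WCRT checks of Sections 3.2.2 and 3.2.4."""
    processors = [[] for _ in range(num_processors)]

    def utilization(proc):
        return sum(r.cost() / r.service.period for r in proc)

    def first_fit(replica):
        for proc in sorted(processors, key=utilization, reverse=True):
            # Fault-tolerance criterion: replicas of one service must be
            # placed on distinct processors.
            if any(r.service is replica.service for r in proc):
                continue
            if fits(replica, proc, num_failures):
                proc.append(replica)
                return True
        return False

    for svc in rm_priority_order(services):
        stage = [Replica(svc, True)] + \
                [Replica(svc, False) for _ in range(num_failures)]
        for replica in stage:  # primary is allocated before its backups
            if not first_fit(replica):
                raise RuntimeError(f"no feasible processor for {svc.name}")
    return processors
```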
3.2.2 Real-time and Fault-tolerance Criteria

We now describe the real-time and fault-tolerance criteria employed by the DAS algorithm to determine whether a primary or a backup copy of a service can be allocated to a processor.

Fault-tolerance criteria. The fault-tolerance criteria check whether the fault-tolerance requirements are satisfied for each attempted allocation. The criterion used by the DAS algorithm requires that the primary and the N backup copies of a service not be allocated to the same processor.

Real-time criteria. The real-time criteria perform the following checks: (1) a primary replica must meet the service deadlines assuming no failures, and (2) the kth backup replica must meet the service deadline after any k processor failures. When a primary or a backup copy of a service is allocated to a processor, the real-time criteria determine whether the attempted allocation satisfies the soft real-time requirements of (1) the service being allocated and (2) the services previously allocated to that processor (both primary and backup copies of different services). Since multiple services are hosted and scheduled using the rate-monotonic scheduling algorithm [12], as described in Section 2.1, higher-priority services preempt lower-priority services; hence, a lower-priority service may take much longer to complete than its WCET. The time taken to execute a service in the presence of such preemption is defined as the worst-case response time (WCRT) of the service: its WCET plus the waiting time caused by preemption from higher-priority services. If the WCRT of every service allocated to a processor is less than its respective deadline, then the processor allocation is valid. Note that we focus on achieving the desired delays on the server in this work. The end-to-end service delay experienced by the client also includes the client-side and network delay, which may be controlled by appropriate configuration of QoS-aware networks [2].

The DAS algorithm determines the WCRT Ri of each service Ti allocated to a processor using the well-known response-time analysis technique from the real-time scheduling literature [9]. Sections 3.2.5 and 3.2.6 describe how the DAS algorithm uses these real-time and fault-tolerance criteria to allocate both primary and backup replicas to a processor.
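Response-time analysis [9] computes the WCRT as the least fixed point of R_i = E_i + sum over j in hp(i) of ceil(R_i / P_j) * E_j, where hp(i) ranges over the higher-priority services on the same processor. A minimal sketch of that iteration, usable as the core of the `fits` check above:

```python
import math

def wcrt(cost, period, interferers):
    """Iterate R = C + sum(ceil(R / P_j) * C_j) to a fixed point [9].
    `interferers` is a list of (C_j, P_j) pairs for the higher-priority
    (shorter-period) replicas on the same processor; returns math.inf
    if the response time exceeds the deadline (= period)."""
    r = cost
    while True:
        nxt = cost + sum(math.ceil(r / p_j) * c_j for c_j, p_j in interferers)
        if nxt > period:
            return math.inf  # deadline miss: this allocation is invalid
        if nxt == r:
            return r
        r = nxt
```

An allocation is valid only if this check passes both for the candidate replica and for every replica already on the processor.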
3.2.3 Replica Allocation Order

We now describe the order in which services are allocated and candidate processors are chosen when attempting a replica allocation. In WCRT analysis the schedulability of lower-priority services is affected by the presence of higher-priority services on the processor, as described in Section 3.2.2. The computational complexity of WCRT analysis can be reduced if services are allocated in priority order, i.e., higher-priority services are allocated before lower-priority services on the same processor. This approach ensures that the allocation of new services does not affect the schedulability of services already allocated to the same processor, i.e., backtracking is not required. If WCRT analysis determines that a service cannot be allocated to a processor, its allocation can be attempted on another available processor.

The DAS algorithm leverages the WCRT analysis outlined above and orders the services in decreasing order of their rate-monotonic priority [12] (i.e., in increasing order of the periods of all services that must be allocated) and allocates them in that order to the available processors. The N backup replicas of a service have the same period, and hence the same priority, as described in Section 2.1. The N backup replicas are therefore allocated along with the primary replica, with the primary replica allocated before any of its N backup replicas. The algorithm thus allocates services in stages, with one service (including its primary and backup copies) allocated per stage. To minimize the number of processors used to allocate all the services, the DAS algorithm uses a first-fit bin-packing technique [8] after determining the service allocation order, where heavily utilized processors are checked for new allocations before lightly utilized processors. Sections 3.2.5 and 3.2.6 describe how the DAS algorithm uses this staged allocation process to allocate the primary and N backup replicas of a service in a stage.

3.2.4 Failure-aware Look-ahead Analysis

Determining the WCRT of a primary or backup copy of a service that is being allocated involves determining (1) which higher-priority services have been assigned to the same processor and (2) the blocking time those services impose on the service being allocated. A processor can host both primary and backup replicas of different services, any of which can preempt the primary or backup copy being allocated, as described in Section 2.1. Moreover, a backup replica only consumes time equal to its WCSST for synchronizing its state with the state of the primary replica; when it is promoted to primary, however, it consumes time equal to its WCET for processing client requests. As up to N processor failures occur, different backup replicas on a processor are promoted to primary replicas. Blocking times can therefore vary between faulty and fault-free conditions. The DAS algorithm considers the faulty case in a failure-aware look-ahead analysis before allocating a primary or backup replica to a processor.

In the failure-aware look-ahead analysis, the DAS algorithm considers the scenario in which the system reacts to N processor failures. In this scenario, the WCRT of the primary or backup replica being allocated is determined from (1) all primary replicas on the processor, which execute their WCET, (2) all backup replicas, which execute their WCSST, and (3) all backup replicas on this processor that are promoted to primary replicas as N processor failures occur. Among the (M - 1) available processors (M minus the processor being allocated), any N processor failures can occur in C(M - 1, N) combinations. For each combination, the DAS algorithm determines the backup replicas that become primary replicas on this processor, and for each case it computes the WCRT of the primary or backup replica being allocated. If the WCRT is less than the deadline in every case, the primary or backup replica is allocated to this processor. If not, another processor is chosen and the allocation procedure is repeated. Sections 3.2.5 and 3.2.6 describe how the failure-aware look-ahead analysis is applied by the DAS algorithm to allocate primary and backup replicas that satisfy soft real-time requirements even in the presence of N processor failures.
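A sketch of this enumeration, reusing `wcrt` and the replica model above. Here `promoted_here` is an assumed helper (not from the paper) that applies each service's failover order to decide which backups on this processor act as primaries for a given failure combination, and for simplicity the candidate is evaluated at its fault-free cost; Sections 3.2.5 and 3.2.6 refine the candidate's role per scenario.

```python
from itertools import combinations

def look_ahead_ok(candidate, allocated, other_procs, n_failures, promoted_here):
    """Failure-aware look-ahead: check the candidate's WCRT under every
    C(M-1, N) combination of processor failures (Section 3.2.4).
    `promoted_here(allocated, failed)` returns the subset (as a list) of
    `allocated` backups that act as primaries when `failed` are down."""
    deadline = candidate.service.period
    for failed in combinations(other_procs, n_failures):
        promoted = promoted_here(allocated, failed)
        interferers = [(r.cost(promoted=r in promoted), r.service.period)
                       for r in allocated
                       if r.service.period < deadline]  # higher RM priority
        if wcrt(candidate.cost(), deadline, interferers) > deadline:
            return False  # some failure combination causes a deadline miss
    return True
```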
3.2.5 Primary Replica Allocation

Failure-aware look-ahead analysis is essential, but expensive. A primary replica that is being allocated to a processor must therefore first satisfy soft real-time assurances at deployment-time, i.e., before N processor failures occur, before planning for faulty conditions.

• Scenario P1: fault-free conditions. This scenario occurs when the system is bootstrapped. WCRT is therefore determined from (1) all primary replicas, which execute their WCET, and (2) all backup replicas, which execute their WCSST. The DAS algorithm checks whether the WCRT is less than the service's deadline. If so, it proceeds to scenario P2, described next. If not, the DAS algorithm attempts the allocation on another processor, following the first-fit bin-packing technique described in Section 3.2.3.

• Scenario P2: faulty conditions. This scenario occurs when the system reacts to N processor failures, so the DAS algorithm applies the failure-aware look-ahead analysis. For each combination of N processor failures checked as described in Section 3.2.4, the DAS algorithm also applies the intuition that backup replicas are promoted to primary in a failover order (Section 3.2.6 describes how the failover order is determined when the N backup replicas of a service are allocated), as described in Section 2.1. This intuition allows the DAS algorithm to avoid assuming the promotion of a backup replica to a primary replica when the failover order rules that promotion out while determining the WCRT of the primary replica being allocated. We call this a ranked-order primary replica allocation. The DAS algorithm computes the WCRT of the allocated primary replica for each possible combination of failures. If the WCRT is less than the deadline in each case, the primary replica is allocated to this processor. If not, another processor is chosen and the allocation procedure is repeated from scenario P1.

The DAS algorithm thus allocates a primary replica so that it satisfies its real-time requirements even when N processor failures occur. Section 3.2.6 describes how the N backup replicas of a primary replica are allocated.

3.2.6 Backup Replica Allocation

Similar to the primary replica allocation procedure described in Section 3.2.5, the DAS algorithm allocates the N backup replicas to appropriate processors after checking each backup replica's WCRT under both faulty and fault-free conditions. After allocating all N backups to processors, the DAS algorithm also determines the order in which each backup replica is promoted to a primary replica.

Determining the failover order of the backup replicas serves two purposes. First, the failover order can be used by QoS-enabled middleware [3, 14] to support application-transparent passive replication for clients. Second, the DAS algorithm can use the failover order to allocate primary replicas of lower-priority services. Specifically, in scenario P2 of the primary allocation (Section 3.2.5) the DAS algorithm can use the failover order to determine when a backup replica will be promoted to the primary replica after a designated number of failures of a particular service.

The DAS algorithm allocates a backup replica to a processor after determining its WCRT under the following scenarios:

• Scenario B1: fault-free conditions. This scenario occurs when the system is bootstrapped. WCRT is therefore determined as in scenario P1 of Section 3.2.5, with the backup replica being allocated assumed to consume its WCSST. The DAS algorithm checks whether the WCRT is less than the service's deadline. If so, it proceeds to scenario B2, described next. If not, the DAS algorithm attempts the allocation on another processor, following the first-fit bin-packing technique described in Section 3.2.3.

• Scenario B2: faulty conditions. This scenario occurs when the system reacts to N processor failures. WCRT is therefore determined as in scenario P2 of Section 3.2.5. The DAS algorithm applies the failure-aware look-ahead analysis in the following two cases:

Case 1: the primary replica's processor fails. In this case, (N - 1) failures are considered among (M - 2) processors, since the primary replica's processor has failed, and the backup replica is assumed to consume its WCET, as it is promoted to the primary replica. WCRT is determined for each combination. If the WCRT is less than the deadline in all cases, the backup replica is allocated to this processor; otherwise the algorithm attempts the allocation on another processor.

Case 2: the primary replica's processor does not fail. Among the (M - 2) available processors (M minus the processor being allocated minus the processor hosting the primary replica), any N processor failures can occur in C(M - 2, N) combinations. For each combination, the DAS algorithm determines the backup replicas that become primary replicas on this processor, and for each case it computes the WCRT of the backup replica being allocated. The DAS algorithm checks whether the WCRT is less than the service's deadline. If so, it proceeds to scenario B3, described next. If not, the DAS algorithm attempts the allocation on another processor.

After allocating the N backup replicas to appropriate processors, the DAS algorithm determines the failover order of the backup replicas based on the increasing WCRT values obtained for each of the N allocated backup replicas in scenario B1. If all N backup replicas have been allocated (i.e., an allocation satisfies all the conditions in scenarios B1, B2, and B3), every backup replica can be promoted to a primary replica. The DAS algorithm ranks the backup replicas based on the staleness of their states compared to the primary replica, where the lower the WCRT, the less stale the state. The DAS algorithm therefore determines the failover order of the backup replicas based on the increasing WCRT values in scenario B1.

While allocating backup replicas to a processor, state synchronization at a backup replica is treated as a periodic service (see scenario B1). State synchronization occurs at the backup replica after the request completes at the primary replica, as described in Section 2.2. The interval between two successive state synchronizations at a backup replica is therefore determined by the response time of the primary replica, which is not periodic. To provide periodic, and hence time-bounded, state synchronization that can be analyzed to determine the failover order of the backup replicas, the DAS algorithm utilizes the release guard protocol [19]. In this scheme, the primary replica sends an asynchronous state update to each of the backup replicas as described in Section 2.2; the state update is executed at the backup replica, however, only at the start of each period. At time 0 the state update service at each backup replica does nothing; in the same time interval, the primary replica processes a client request and sends a state update to each of those backup replicas. The backup replicas execute those state updates periodically from the next period onwards. This decision helps the DAS algorithm provide time-bounded state synchronization for all backup replicas.
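The two decisions above lend themselves to compact sketches: failover ordering is just a sort on the scenario-B1 WCRTs, and a release-guard-style buffer (a simplified reading of the protocol cited as [19], with hypothetical method names) defers state updates to period boundaries so that synchronization at a backup is strictly periodic.

```python
def failover_order(backups, wcrt_b1):
    """Rank backups by increasing fault-free (scenario B1) WCRT: the lower
    the WCRT, the less stale the backup's state, so it fails over earlier.
    `wcrt_b1` maps a backup replica to its scenario-B1 WCRT."""
    return sorted(backups, key=wcrt_b1)

class ReleaseGuard:
    """Buffer the primary's asynchronous state updates and apply them only
    at period boundaries, so a backup's state synchronization behaves as
    the periodic (S_i, P_i) task analyzed in scenario B1."""
    def __init__(self):
        self._pending = None

    def on_state_update(self, state):
        # Called whenever a state update arrives from the primary.
        self._pending = state

    def on_period_start(self, apply_state):
        # Called by the periodic dispatcher; consumes at most WCSST.
        if self._pending is not None:
            apply_state(self._pending)
            self._pending = None
```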
3.3 Automated Real-time Fault-tolerance Configuration

Effectively leveraging the DAS algorithm capabilities described in Section 3.2 requires resolving two problems: (1) the requirements specification problem, i.e., how application developers can use the DAS algorithm to determine the placement of their applications, and (2) the middleware configuration problem, i.e., how the middleware hosting the primary and backup replicas can be configured with the right set of options after deployment decisions are made. Without automated solutions to these problems, application developers resort to ad hoc approaches to specify real-time and fault-tolerance requirements and manually write code to configure the middleware. These approaches, in turn, can yield suboptimal, and often incorrect, replica deployments and middleware configurations.

Declarative mechanisms based on model-driven engineering (MDE) and generative programming can help address these challenges. DeCoRAM therefore provides an MDE tool interface to the DAS algorithm, named the deployment analysis modeling language, as shown in Figure 2. This MDE tool provides intuitive visual modeling capabilities that application developers can use to specify their real-time and fault-tolerance requirements. In particular, DeCoRAM's MDE tool offers developers mechanisms to (1) model applications and their interactions, (2) specify their real-time requirements, including WCET, periods, and deadlines, and (3) specify fault-tolerance requirements, including the WCSST and the number of processor failures the application must tolerate. The generative tools associated with DeCoRAM's MDE tool collect the requirements modeled by application developers and synthesize the inputs necessary for the DAS algorithm. DeCoRAM then computes deployments using the DAS algorithm, addressing the first problem mentioned above.

To address the second problem of automating the middleware configuration, DeCoRAM captures the output of the DAS algorithm in the form of metadata that can be processed by DeCoRAM's deployment and configuration middleware, as shown in Figure 3. The outputs of the DAS algorithm are (1) the processors with their primary and backup replica assignments and (2) the priorities at which the primary and backup replicas must operate on each processor.

[Figure 3: DeCoRAM Deployment and Configuration Middleware]

To realize a real-time fault-tolerant system, DeCoRAM's deployment and configuration middleware provides the following capabilities:

• Deployment. DeCoRAM's deployment and configuration middleware provides automatic deployment of applications as well as of the real-time fault-tolerant middleware architectural elements that provide runtime fault-tolerance support. For example, in the context of FLARe, DeCoRAM deploys the middleware replication manager, a per-process state transfer agent, and a per-process monitor. This capability enables DRE applications to be completely agnostic of the environment in which they are deployed and operated.

• Registration. After the architectural elements of the underlying fault-tolerant middleware are deployed, applications need to register with them. For example, in the context of FLARe, applications need to register with the middleware replication manager, which manages group membership. DeCoRAM's component server manages a specialized container that is defined by the Lightweight CORBA Component Model (LwCCM) specification. This container encodes a fault-tolerance policy (labeled FT policy in Figure 3) to associate fault-tolerance-specific registration requests with each component hosted by the container. DeCoRAM's containers register the created object references with the middleware replication manager. This capability enables application developers to write source code that can be reused across different DRE system deployment contexts; no special development is required for registration with external QoS elements.

• Underlying middleware configuration. DeCoRAM provides automatic configuration of the underlying middleware. For example, FLARe's client request interceptor must be initialized with the underlying middleware to catch failure exceptions and provide application-transparent failover for client applications. With existing QoS-enabled middleware [5, 13], application developers must modify source code to incorporate such middleware features. DeCoRAM's component server instead initializes the interceptor when the container creates the underlying middleware. Moreover, to obtain real-time assurance from the underlying middleware (including the release guard support required for time-bounded state synchronization), DeCoRAM's container incorporates our previous work described in [18, 19], which provides auto-configuration of the underlying QoS-enabled middleware.
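As an illustration only (the field names and layout are hypothetical, not DeCoRAM's actual metadata schema), the DAS output consumed by the D&C middleware can be pictured as a structure like the following, using the replica names that appear in the evaluation of Section 4.2:

```python
# Hypothetical DAS output metadata: replica-host mapping, per-processor
# priorities, and the failover order for each service.
das_output = {
    "processors": {
        "LINDY": [
            {"replica": "T_0", "role": "primary", "rm_priority": 1},
            {"replica": "T_2", "role": "primary", "rm_priority": 2},
            {"replica": "T_4", "role": "primary", "rm_priority": 3},
        ],
        # ... one entry per processor, including backup placements ...
    },
    "failover_order": {
        "T_0": ["T_0_1", "T_0_2", "T_0_3"],
        "T_2": ["T_2_1", "T_2_2", "T_2_3"],
        "T_4": ["T_4_1", "T_4_2", "T_4_3"],
    },
}
```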
4 Empirical Evaluation of DeCoRAM

We empirically evaluated DeCoRAM at ISISlab (www.dre.vanderbilt.edu/ISISlab) on a testbed of 14 blades. Each blade has two 2.8 GHz CPUs, 1 GB memory, and a 40 GB disk, and runs the Fedora Core 4 Linux distribution. Our experiments used one CPU per blade, and the blades were connected via a CISCO 3750G switch to a 1 Gbps LAN.

We first validate that DeCoRAM's DAS algorithm can allocate the primary and backup replicas of multiple services while minimizing the number of processors used. We then show that the resource-aware fault-tolerance configuration decisions made by the DAS algorithm can assure soft real-time performance using the underlying FLARe middleware. Finally, we show how DeCoRAM's model-driven D&C engine significantly reduces application development effort compared with conventional approaches.

4.1 Evaluating the DAS Algorithm

To evaluate DAS's PASSIVE replication placement strategy, we implemented the following placement strategies in DeCoRAM, using the DAS algorithm's real-time and fault-tolerance criteria to allocate replicas to a processor:

• No fault-tolerance strategy, where the DAS algorithm deployed only the primary replicas.

• Active backups strategy, where the DAS algorithm deployed both the primary and the N backup replicas of all the services. The backup replicas always consumed WCET, since they received periodic client requests just like the primary replica.

• Backups with only state synchronization strategy, where the DAS algorithm deployed both the primary and N backup replicas of all the services. The backup replicas always consumed WCSST, however, and never became primary replicas.

• Unranked backups strategy, where the DAS algorithm deployed both the primary and N backup replicas of all services. The DAS algorithm's failure-aware look-ahead analysis, however, considers all the backups as equal, i.e., every backup is assumed to become a primary replica whenever the processor hosting the primary replica fails.

• Ranked backups strategy, which is the DAS algorithm's preferred strategy for deploying passively replicated DRE systems. In this strategy, the DAS algorithm's failure-aware look-ahead analysis considers the failover order of the backup replicas while determining which failure combinations can promote a backup replica on a processor to a primary replica.

Methodology. We measure the performance of the strategies outlined above in terms of the number of processors used while varying (1) the number of tasks to allocate and (2) the maximum utilization load of each task. The task utilization load is defined as the ratio of a task's WCET to its period. For these experiments, we randomly chose task periods from a uniform distribution with a minimum period of 1 ms and a maximum period of 1000 ms. After the task period was obtained, the task load was drawn uniformly between 0% and the maximum task load, which determines the WCET. We followed a similar methodology to pick the WCSST for all tasks, drawing uniformly from [10%, 15%] of the period. All placement strategies tolerated three processor failures, except for the no fault-tolerance strategy, which does not tolerate any processor failure. A sketch of this task-set generation appears after the analysis below.

Analysis of results. Figures 4a, 4b, and 4c show the number of processors used when each of the placement strategies outlined above attempts to allocate a varying number of tasks with varying maximum per-task CPU utilization. As the number of tasks increased, the number of processors used also increased exponentially in the case of the active backups strategy. When DeCoRAM uses any of the other passive replication placement strategies, however, the rate of increase in the number of processors used is less steep. The DAS algorithm's preferred ranked backups placement strategy also consistently outperforms the other passive replication placement strategies. This behavior occurs whenever there are many tasks or when there is a higher load, as shown in Figures 4a, 4b, and 4c. Moreover, the performance of the ranked backups placement strategy is comparable to the performance of the no fault-tolerance and the backups with only state synchronization placement strategies. This result shows that the DAS algorithm can allocate applications while tolerating up to N processor failures and minimizing the number of processors used.

[Figure 4: Performance of the DAS Algorithm with Varying Tasks and Backups. (a) Varying number of tasks with 10% max load; (b) varying number of tasks with 30% max load; (c) varying number of tasks with 50% max load; (d) varying number of backups with 30% max load.]

Benefit of Using Ranked Backups. Figure 4d shows the performance of each placement strategy when the number of processor failures to tolerate is varied. The DAS algorithm deployed a different number of backups for each experimental run. The ranked backups strategy consistently outperforms the active backups and unranked backups strategies, particularly as the number of replicas increases. This result occurs because the ranked backups strategy takes advantage of the failover order to determine precisely when a backup replica can become a primary replica as the number of replicas increases. This heuristic reduces the number of alternatives that must be searched in the failure-aware look-ahead analysis. A reduction in the number of alternatives means a faster decision and also a lower WCRT for the replica currently being allocated on the processor.
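For concreteness, here is a minimal sketch of the task-set generation just described, reusing the Service class from the Section 2.1 sketch; the seed handling and parameter names are our own.

```python
import random

def generate_task_set(n_tasks, max_load, seed=None):
    """Random task sets per the Section 4.1 methodology: periods uniform in
    [1, 1000] ms; per-task load uniform in [0, max_load], so WCET = load *
    period; WCSST uniform in [10%, 15%] of the period."""
    rng = random.Random(seed)
    tasks = []
    for i in range(n_tasks):
        period = rng.uniform(1.0, 1000.0)
        wcet = rng.uniform(0.0, max_load) * period
        wcsst = rng.uniform(0.10, 0.15) * period
        tasks.append(Service(f"T_{i}", wcet, period, wcsst))
    return tasks

# e.g., 50 tasks with a 30% maximum load, as in Figure 4b:
tasks = generate_task_set(50, 0.30, seed=42)
```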
4.2 Evaluating Resource-aware Fault-tolerance Configurations

This experiment evaluates the resource-aware fault-tolerance decisions made by the DAS algorithm. We evaluate the DAS algorithm's capability to determine the failover order of backup replicas based on the state synchronization delays at the backup replicas. We also evaluate the DAS algorithm's capability to tolerate N processor failures while maintaining end-to-end QoS assurance for soft real-time applications.

Experiment setup. We used DeCoRAM's deployment and analysis modeling language to specify the real-time and fault-tolerance requirements of 10 services. The maximum load for each task was set to 30%, and three processor failures were to be tolerated. The DAS algorithm generated allocations for the primary and backup replicas of each service using its ranked backups placement strategy, as described in Section 4.1. The DeCoRAM D&C middleware deployed all the primary and backup replicas on their respective processors and configured FLARe.

[Figure 5: DeCoRAM Evaluation]

Figure 5 shows how the DAS algorithm allocated the primary and backup replicas of the services to appropriate processors. We focus on processor LINDY, which hosts three primary replicas: T_4, T_0, and T_2. The DAS algorithm selected failover targets for these primary replicas at deployment-time as follows: if T_4 fails, contact T_4_1, followed by T_4_2 and T_4_3; if T_2 fails, contact T_2_1, followed by T_2_2 and T_2_3; if T_0 fails, contact T_0_1, followed by T_0_2 and T_0_3. We emulated three processor failures at 100, 200, and 300 seconds as clients of these three primary replicas invoked remote operations on those processors.

Analysis of results. Figure 6 shows the response times perceived by the three clients of the primary replicas T_4, T_0, and T_2 as the three processor failures occurred. At 100, 200, and 300 seconds, the client of T_0 shows an increase in response times because of client failover delays. This delay is not as marked for the clients of T_4 and T_2 because of their differing state synchronization delays. The results show that the clients maintain satisfactory real-time performance as processor failures occur. The DAS algorithm's failure-aware look-ahead analysis made allocation decisions by predicting the performance of the applications both in the presence and in the absence of failures.

[Figure 6: End-to-end response times for clients]

4.3 Evaluating DeCoRAM's Automation Capabilities

We now evaluate DeCoRAM's automated deployment and configuration capabilities in terms of the savings in developer effort to deploy and configure the applications of a DRE system on the underlying middleware compared to conventional approaches. We demonstrate these savings in the context of two scenarios: (1) at system bootstrap-time, when the DRE system and all its applications are initialized for the first time, and (2) at runtime, when the real-time and/or fault-tolerance requirements of one or more applications in the DRE system change, which invariably requires redeployment and reconfiguration of the applications.

We have defined a metric to measure the effort expended, wherein we count the number of steps a developer needs to D&C an application in each approach we compare. The advantage of this metric is that it does not depend on any particular modeling tool or middleware API. The downside is that the metric does not capture the time it takes to perform each step. Despite this, we believe the metric is powerful enough to provide effective comparisons between DeCoRAM and conventional approaches.

Case I: Bootstrap-time, where we calculate the number of steps required by DeCoRAM and conventional approaches for the initial D&C of an application (including its primary and backup replicas) along the following artifacts:

1. Specifying real-time and fault-tolerance requirements. Conventional approaches tend not to use any formal technique to capture these requirements and hence require zero steps for this artifact. In DeCoRAM, application developers use the deployment and analysis modeling language to model the application and its real-time requirements (e.g., WCET, periods) and fault-tolerance requirements (e.g., WCSST, number of processor failures). We count this as requiring one step per application using our
metric, so it requires N steps for N applications. All inputs are maintained in a repository inside the tool, and the rest of the stages are automated via a workflow.

2. Running the DAS algorithm. Conventional approaches must manually execute the DAS algorithm. They must therefore first create the input that the DAS algorithm understands for each application, collect all these inputs, and then invoke the DAS algorithm. This accounts for N + 2 steps, where N is the number of applications.

3. Parsing the DAS algorithm's output. Conventional approaches must write code to parse the output generated by the DAS algorithm, requiring one step.

4. Deploying middleware artifacts. The middleware replication manager must be deployed in both approaches. Moreover, the state transfer agent and health monitor must be deployed for every process, i.e., for each primary and backup of every application. Conventional approaches involve handcrafted code, which must be compiled and linked with the middleware to deploy each capability. Even if we count this as one step, a total of N * R * 2 steps is needed in conventional approaches to deploy the state transfer agents and health monitors, where R is the total number of replicas of each application.

5. Registration. Each application must register with the middleware replication manager. In conventional approaches this accounts for N steps for N applications.

6. Middleware configuration. Conventional approaches require programmatic effort to initialize the request interceptor and the release guards for the state transfer agent. If we count this as one step, the step must be repeated for every replica of every application, requiring N * R steps.

Overall, in comparison to the N modeling steps required by DeCoRAM, conventional approaches require a total of 3NR + 2N + 3 steps.

Case II: Runtime, when real-time and fault-tolerance requirements change. Assume that the requirements of M applications change, where M ≤ N. DeCoRAM requires modeling the requirements of the M applications. It is possible that changes in the requirements of M applications will impact the D&C of M′ applications, where M ≤ M′ ≤ N, which is revealed by executing the DAS algorithm on the new set of requirements. All impacted applications (including replicas) must be stopped and uninstalled prior to the new deployment. In DeCoRAM, this process is entirely automated. Conventional approaches require M′ * R steps, where each step is the teardown of an individual replica. With M′ affected applications, using the calculations from the bootstrap-time case, the overall number of steps in conventional approaches is 4M′R + 2M′ + 3, compared to M modeling steps in DeCoRAM.
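The step counts above reduce to simple arithmetic; a small sketch of the bootstrap-time total (the example values N = 10 and R = 4, i.e., one primary plus three backups, are our own illustration):

```python
def conventional_steps(n, r):
    """Case I step count for conventional approaches: 0 (specification) +
    (n + 2) (running DAS) + 1 (parsing) + 2*n*r (agents and monitors) +
    n (registration) + n*r (configuration) = 3nr + 2n + 3."""
    return (n + 2) + 1 + 2 * n * r + n + n * r

# e.g., 10 applications with 4 replicas each (1 primary + 3 backups):
assert conventional_steps(10, 4) == 3 * 10 * 4 + 2 * 10 + 3 == 143
# versus 10 modeling steps in DeCoRAM.
```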
5 Related Work

Fault-tolerance in real-time systems based on active replication. Prior research has focused on developing fault-tolerant real-time systems using ACTIVE replication. AQuA [11] dynamically adapts the number of replicas receiving a client request in an ACTIVE replication scheme so that slower replicas do not affect the response times observed by clients. Eternal [10] dynamically changes the locations of active replicas by migrating soft real-time objects from heavily loaded processors to lightly loaded processors, thereby providing better response times for clients. In contrast, DeCoRAM focuses on PASSIVE replication, which is more suitable for resource-constrained systems. DeCoRAM also provides resource-aware replica placement for passively replicated applications, and uses the placement knowledge to predict the failover order of the backup replicas and to assure soft real-time application QoS even in the presence of failures.

Fault-tolerance in real-time systems based on passive replication. MEAD [16] reduces fault detection and client failover time by determining the possibility of a primary replica failure using simple failure prediction mechanisms and redirecting clients to alternate servers before failures occur. [20] presents a real-time primary-backup replication scheme that uses scheduling algorithms, such as the rate-monotonic scheduling algorithm, to provide temporal consistency guarantees for operations as well as update transmissions. The key contribution of DeCoRAM is its DAS algorithm, which can provide resource-aware allocation and configuration decisions for deploying applications atop such middleware, as well as atop our own QoS-enabled middleware [3].

Deployment-time scheduling and allocation algorithms for real-time fault-tolerance. RBSA [1] focuses on a bi-criteria scheduling heuristic that allocates actively replicated applications to processors while minimizing schedule length and increasing reliability. [17] focuses on allocating precedence-constrained, passively replicated applications to heterogeneous processors and assures soft real-time performance while considering a single processor failure. [6] provides fault-tolerance for task sets using both ACTIVE and PASSIVE replication, allocating tasks and their replicas to processors using first-fit bin-packing and rate-monotonic scheduling algorithms. DeCoRAM differs from the above work as follows: (1) it allocates and deploys passively replicated applications to appropriate processors while maintaining soft real-time performance and tolerating N processor failures, and (2) it assists in configuring QoS-enabled middleware at runtime by determining appropriate failover targets for each primary replica using its novel failure-aware look-ahead analysis.
6 Concluding Remarks

Effective deployment and configuration (D&C) of primary and backup passive replicas is a challenging problem when applications require both soft real-time and fault-tolerance properties. This paper presents the Deployment and Configuration Reasoning via Analysis and Modeling (DeCoRAM) approach, which provides a Deployment-time Allocation and Scheduling (DAS) algorithm whose novel failure-aware look-ahead analysis can (1) forecast the excess resource consumption on processors due to processor failures and (2) use this forecast to allocate and configure applications on processors at deployment-time. We also describe how DeCoRAM's novel model-driven D&C engine automatically configures the underlying QoS-enabled middleware based on the allocation and configuration decisions made by the DAS algorithm. Our experimental results show that, apart from minimizing the number of processors used, DeCoRAM can make resource-aware configuration decisions that assure both real-time and fault-tolerant performance for soft real-time applications. DeCoRAM is available in open-source form at www.dre.vanderbilt.edu/~jai/FLARe/DeCoRAM.
References

[1] I. Assayad, A. Girault, and H. Kalla. A bi-criteria scheduling heuristic for distributed embedded systems under reliability and real-time constraints. In DSN '04, page 347, Florence, Italy, 2004.
[2] J. Balasubramanian, S. Tambe, B. Dasarathy, S. Gadgil, F. Porter, A. Gokhale, and D. C. Schmidt. NetQoPE: A model-driven network QoS provisioning engine for distributed real-time and embedded systems. In RTAS '08: Proceedings of the 14th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 113–122, Los Alamitos, CA, USA, 2008. IEEE Computer Society.
[3] J. Balasubramanian, S. Tambe, C. Lu, A. Gokhale, C. Gill, and D. C. Schmidt. Adaptive failover for real-time middleware with passive replication. In Proceedings of the 15th Real-Time and Embedded Technology and Applications Symposium (RTAS), San Francisco, CA, Apr. 2009.
[4] K. Balasubramanian, J. Balasubramanian, J. Parsons, A. Gokhale, and D. C. Schmidt. A platform-independent component modeling language for distributed real-time and embedded systems. Journal of Computer Systems Science, 73(2):171–185, 2007.
[5] T. Bennani, L. Blain, L. Courtes, J.-C. Fabre, M.-O. Killijian, E. Marsden, and F. Taiani. Implementing simple replication protocols using CORBA portable interceptors and Java serialization. In DSN '04, pages 549–554, Florence, Italy, 2004.
[6] A. A. Bertossi, L. V. Mancini, and F. Rossini. Fault-tolerant rate-monotonic first-fit scheduling in hard real-time systems. IEEE Transactions on Parallel and Distributed Systems, 10(9):934–945, 1999.
[7] Z. Cai, V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan, and R. E. Strom. Utility-driven proactive management of availability in enterprise-scale information flows. In Proceedings of ACM/USENIX/IFIP Middleware, pages 382–403, 2006.
[8] S. Dhall and C. Liu. On a real-time scheduling problem. Operations Research, 26(1):127–140, 1978.
[9] M. Joseph and P. Pandya. Finding response times in a real-time system. The Computer Journal, 29(5):390–395, 1986.
[10] V. Kalogeraki, P. M. Melliar-Smith, L. E. Moser, and Y. Drougas. Resource management using multiple feedback loops in soft real-time distributed systems. Journal of Systems and Software, 2007.
[11] S. Krishnamurthy, W. H. Sanders, and M. Cukier. An adaptive quality of service aware middleware for replicated services. IEEE Transactions on Parallel and Distributed Systems, 14(11):1112–1125, 2003.
[12] J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling algorithm: Exact characterization and average case behavior. In RTSS '89: Proceedings of the IEEE Real-Time Systems Symposium, pages 166–171, Washington, DC, USA, 1989. IEEE Computer Society.
[13] P. Narasimhan. Trade-offs between real-time and fault tolerance for middleware applications. In Workshop on Foundations of Middleware Technologies, Nov. 2002.
[14] P. Narasimhan, T. Dumitras, A. M. Paulos, S. M. Pertet, C. F. Reverte, J. G. Slember, and D. Srivastava. MEAD: Support for real-time fault-tolerant CORBA. Concurrency and Computation: Practice and Experience, 17(12):1527–1545, 2005.
[15] P. Felber and P. Narasimhan. Experiences, approaches and challenges in building fault-tolerant CORBA systems. IEEE Transactions on Computers, 54(5):497–511, May 2004.
[16] S. Pertet and P. Narasimhan. Proactive recovery in distributed CORBA applications. In DSN '04: Proceedings of the 2004 International Conference on Dependable Systems and Networks, page 357, Washington, DC, USA, 2004. IEEE Computer Society.
[17] X. Qin, H. Jiang, and D. R. Swanson. An efficient fault-tolerant scheduling algorithm for real-time tasks with precedence constraints in heterogeneous systems. In ICPP '02: Proceedings of the 2002 International Conference on Parallel Processing, page 360, Washington, DC, USA, 2002. IEEE Computer Society.
[18] N. Wang, C. Gill, D. C. Schmidt, and V. Subramonian. Configuring real-time aspects in component middleware. In Proceedings of the International Symposium on Distributed Objects and Applications (DOA), volume 3291, pages 1520–1537, Agia Napa, Cyprus, Oct. 2004. Springer-Verlag.
[19] Y. Zhang, B. Thrall, S. Torri, C. Gill, and C. Lu. A real-time performance comparison of distributable threads and event channels. In Proceedings of the 11th Real-Time Technology and Applications Symposium (RTAS '05), San Francisco, CA, Mar. 2005. IEEE.
[20] H. Zou and F. Jahanian. A real-time primary-backup replication service. IEEE Transactions on Parallel and Distributed Systems, 10(6):533–548, 1999.