An Adaptive N-variant Software Architecture for

2 downloads 0 Views 2MB Size Report
monitoring [6]. While the proposed benefits of adaptive software is promising, developing adaptive software also raises some challenging questions. First, in.
An Adaptive N-variant Software Architecture for Multi-Core Platforms: Models and Performance Analysis Li Tan1 and Axel Krings2 1

School of Electrical Engineering and Computer Science Washington State University [email protected] 2 Department of Computer Science University of Idaho [email protected]

Abstract. This paper discusses the models and performance analysis for an adaptive software architecture, which supports multiple levels of fault detection, masking, and recovery through reconfiguration. The architecture starts with a formal requirement model defining multiple levels of functional capability and information assurance. The architecture includes a multi-layer design to implement the requirements using Nvariant techniques. It also integrates a reconfiguration mechanism that uses lower layers to monitor higher layers, and if a fault is detected, it reconfigures a system to maintain essential services. We first provide a general reliability model (based on generalized stochastic Petri nets) for such a system with cross-monitoring for reconfiguration. Next, we define a probabilistic automaton-based model for behavioral modeling of the system. This model is especially suitable for modeling security problems induced by value faults. Whereas the Petri net allows for reliability modeling and reconfiguration, the performance analysis of the system is given via probabilistic model checking. The models are experimentally evaluated and compared. With the current widespread deployment of multi-core processors, one question in software engineering is how to effectively harness the parallel computing power provided by these processors. The architecture presented here allows us to explore the parallel computing power that otherwise may be wasted, and uses it to improve the dependability and survivability of a system, which is validated by our performance analysis.

1

Introduction

With the introduction of multi-core processors and their wide-spread deployment ranging from laptops to cloud computing, a question has been raised on how we can effectively exploit the parallel resources. It is projected that the level of application-exploitable parallelism will limb behind the number of cores in multi-core processors and GPU (graphics processing units) since to date most

applications only allow limited parallelism. One way of harnessing these otherwise unused or underutilized resources is to use them for the purpose of increasing dependability, security, and survivability of a system, but doing so requires the support from adaptive software. Two key features of an adaptive software are (1) the ability to monitor its own execution and (2) the ability to reconfigure itself based on the result of runtime monitoring [6]. While the proposed benefits of adaptive software is promising, developing adaptive software also raises some challenging questions. First, in many cases self-adaptation adds one more dimension of complexity to often already complicated dependable system designs. A question is how to specify requirements and implement them in a way that facilitates orderly and verifiable system reconfiguration. Second, a system may be subject to a variety of faults. So a challenge is how one could compartmentalize and diversify system design so the system can be resilient to different types of faults. This may be especially relevant in safety-critical applications. Finally, runtime monitoring requires additional computation power. Thus, it is important that our design can make efficient use of the underlying hardware architecture to minimize overhead. To address the first challenge, we use a formal model to specify requirements for self-adaptation and then propose a multi-layered assured architecture to realize requirements expressed in the formal model. The Adaptive Functional Capability Model (AFCM) introduced in [10] defines levels of capabilities for each system functionality. The AFCM specifies how a system shall reconfigure itself and scale down its functional capabilities while still providing essential services and information assurance. Each level of functional capability in the AFCM will then be implemented as a layer in a Multi-layered Assured Architecture Design. The architecture design embeds a Monitoring and Reconfiguration Module (MRM) that uses lower-layer functionalities as reference to monitor highlayer functionalities. In case a fault is detected, the system reconfigures itself by disabling affected layers, while lower layers still maintain essential services. To further improve system resilience, we use a diversified layered design based on N-variant techniques in each layer [3, 4]. The N-variant techniques use redundant executions to reduce system vulnerability to common-mode faults. The expectation is that redundant but dissimilar implementations reduce or eliminate common-mode faults. Dissimilarity is typically discussed in the context of Nversion programming [1] dating back to the late 70s. In N-version programming it is assumed that several software development groups independently derive programs from the same specification. The concept of N-variant software is inspired by N-version software, but in N-variant software different variants are generated in a more automated fashion. In both cases a fault is detected if a difference is detected between outputs generated by two versions or variants. Redundant executions exercised by multiple variants and extra work of runtime monitoring requires additional computational power. To reduce overhead, our N-variant-based implementation takes advantage of multi-core hardware. Most new general-purpose computers incorporate dual or quad-core processors and higher numbers of cores are already used widely in GPUs. Whereas in theory the computational capabilities increase with the number of cores, it becomes difficult to exploit sufficient parallelism to keep all cores utilized. Most common applications still allow little parallelism and it is likely that cores may be un-

derutilized or running idle. In our approach, unused or underutilized cores are exploited to increase reliability, security, and survivability. Specifically, multiple variants execute on different cores, and if they can execute on idle cores, this overhead can be largely absorbed. This was also shown in [8] where multivariants executed in multi-core systems. Our approach extends this by making extensive use of N-variant implementation at each layer of functional capability. In general, the lower a layer, the more variants it may have in order to provide a higher degree of resilience and information assurance. Nevertheless, the exact number of variants and their configuration required at each layer depends on the type and number of faults that are to be detected or masked. The contributions of this paper are three-fold. First the hierarchical formal framework for adaptive N-variant programs outlined in [10] is applied to a system with multiple functionalities. Second, a reliability model is derived for such systems using Generalized Stochastic Petri Nets (GSPN). This model specifies how N-variant executions with different levels of capabilities are managed, and it models cross-monitoring as a mechanism to initiate survivability measures. Third, we develop a stochastic behavior model and apply probabilistic model checking to it for analyzing the performance of our adaptive software architecture. Whereas the GSPN model allows to determine the reliability of the system, the stochastic behavior model incorporates the consequences of data-dependent false negatives, and it aids in the formal verification towards an implementation. We will show how one can use the reliability model to derive the theoretically achievable reliability that is independent of the data, and how one can use the independently derived behavioral model to verify the results of the reliability analysis, quantify the impact of data dependent coincidental faults, and aid in the derivation of implementations. The rest of the paper will be organized as follows: Section 2 introduces a formal model for specifying levels of functional capabilities, a layered adaptive architecture design to realize the adaptive functional capability model, and an Nvariant-based approach to implement the architecture design. Section 3 discusses the features and benefits of our adaptive software architecture in terms of its adaptivity and survivability. Section 4 gives a Generalized Stochastic Petri Net (GSPN) model that we use for reliability analysis. Section 5 develops a stochastic behavior model that is used in conjunction with probabilistic model checking for analyzing performance of our software architecture and also validating the result of reliability analysis. Section 6 discusses the outcome of our computational experiments that are designed for assessing the reliability and performance of our software architecture. Finally Section 7 concludes the paper.

2 2.1

Specification Model & Architecture Design Adaptive Functional Capability Model

The Adaptive Functional Capability Model (AFCM) specifies multiple functionalities with adjustable levels of capability. The model attaches each functionality to layers of capability. During requirement elicitation, a development team works with stake holders of a project to identify not just functionalities, but also

F13

F12

F11

F22

F21

Fig. 1. AFCM for functionality F1 and F2

capability levels for each functionality. These capability levels specify graceful degradation in case of faults or when under attack. Assume that the system is comprised of functionalities F1 · · · Fm . Figure 1 shows the AFCM for two sample functionalities F1 and F2 . The requirements for F1 define three levels of capabilities: F11 defines the set of core operations that are mission-critical, F12 includes F11 and some non-critical but value-added operations, and F13 adds some more value-added operations. We write F11  F12  F13 , where  is a preorder on the capability levels. The exact semantics of  is defined and interpreted based on application context. In [10] we described a transaction-based asynchronous system that contained mission-critical data and non-mission-critical but value-added data to demonstrate the AFCM. The purpose of the AFCM is to specify not only functional requirements, but also requirements for adaptiveness. It has two features to serve its purpose: First, the model associates each functionality with capability levels, which specify reconfiguration requirements for the functionality. It states that, in the event of a fault, e.g., the system has been compromised, a system shall scale back its services in an orderly manner by following the capability levels defined in the AFCM, e.g., recovering to a lower level implemented in the next layer down; Second, the definition of capability levels also facilitates reconfigurable design. It requires that the system behavior at a higher capability level shall be an extension of the behavior at the lower level. Hence, we can use the behavior at a lower capability level as a reference for monitoring the behavior at a higher level, and an implementation for capability levels provides a path for a system to scale itself back. It shall be noted that each functionality in an AFCM may have its own hierarchy of capability levels reflecting its requirement for adaptiveness. For example, F1 and F2 in Figure 1 have different levels of capabilities. 2.2

Layered N-variant Architecture

To implement the reconfiguration requirements specified in the AFCM using different levels, a layered adaptive architecture design is used. Each layer realizes its corresponding AFCM level using N-variant techniques such as shown in [3, 8]. Figure 2 shows an example of the adaptive N-variant architecture for two functionalities F1 and F2 . The architecture has a layered structure. Each vertical layer realizes its respective capability level, i.e., it implements sequences of operations specified for its capability level. A layer may be disabled if it is found not functioning correctly, i.e., if a fault is detected. Fault detection is the result of redundancy management at the specific layer, or the layer beneath it, which

has the monitoring abilities of its capabilities at the layer above it. The capability level of the entire functionality is decided by its highest enabled layer. Each layer itself is a collection of variants implementing its capability level. Variants are systematically diversified so that it is unlikely that a common-mode fault can occur [3, 8], e.g., for a given fault model, an attacker can not compromise all the variants without being detected and/or the fault being masked. One could argue that a lower layer should have more variants to improve resilience of the functionality. Each functionality may have a different layered structure that reflects its adaptability requirement. For example, in Figure 2 F1 and F2 are implemented by (L11 , L21 , L31 ) and (L12 , L22 ), respectively.

F1

2 2 L V 1,1

2 V1,2

1 1 L V 1,1

1 V1,2

1 V1,3

Capability

Capability

3 3 L V1,1

N-variant

F2

2 2 L V 2,1 1 1 L V 2,1

1 V2,2

N-variant

Monitoring and Reconfiguration Module (MRM)

Fig. 2. A layered adaptive N-variant architecture design for the AFCM of Figure 1

3

Adaptability and Survivability

The layered adaptive N-variant architecture improves system resilience by supporting 1) real-time fault detection though redundancy management and crosslayer monitoring, 2) fault masking, and 3) system reconfiguration. The architecture design in Figure 2 includes the Monitoring and Reconfiguration Module (MRM). Critical or sensitive functionalities are implemented using the layered N-variant architecture and the MRM acts as a sentry for layered N-variant components. The MRM monitors and sanctions the communication in and out of the N-variant components. Together, the N-variant-based layers and the MRM provide runtime monitoring and real-time fault tolerance with reconfiguration, essential for an adaptive system. Runtime monitoring by the MRM: The MRM uses observable behavior of a lower layer to decide whether the layer above is compromised or not. If a fault is detected, it reconfigures the system by disabling affected layers while essential functional capabilities are still provided by the lower layer(s). In section 2.1

capability levels in the AFCM are defined in such a way that a sequence of operations specified at a higher level is an extension of some sequence of operations specified at a lower level. In our layered adaptive N-variant architecture, all the layers process incoming requests concurrently. Since a layer Li is an implementation of an AFCM capability level F i , a sequence of operations executed by layer Li shall be included in a sequence of operations executed by layer Li+1 . Should this not be the case it indicates problems (i.e., a fault) in Li+1 . The lower layer Li is realized using N-variants of simpler implementations and potentially a higher degree of redundancy. It is argued that lower complexity implementations together with more stringent analysis/testing at Li is assumed to make variants in Li more reliable than in Li+1 . A larger degree of N-variants also increases reliability, as it implements a k-of-N configuration. Real-time reconfiguration: The AFCM provides a reconfiguration plan in which a functionality can scale back its services in an orderly manner, and thus provide graceful degradation. A layer Li serves as the backup for layer Li+1 above it. A lower layer forgoes some functional capability in lieu of improved dependability. If the MRM detects a fault in layer Li+1 , it disables Li+1 and the system automatically scales its capability to the level implemented by Li . For the sake of completeness it should be noted that capabilities can not only be decreased, but also extended should the need arise, e.g., after recovery or repair. The layered adaptive N-variant architecture is designed for improving system survivability for mission-critical applications: the capability of shifting to a lower layer (upon detection of a fault) provides a contingency plan that allows a system to scale back its services towards essential services as the result of the fault.

4

Reliability Analysis

In this section we discuss the impact of the layered model from a reliability analysis point of view. Before establishing a formal model for the analysis we need to establish the link to previous research. Multi-variant approaches have been previously described in [3, 4, 8, 12]. However, these models are described within a single layer, and their N-variant models constitute special cases of the layered architecture presented here. Thus, the cited approaches can be adopted at any layer within our architecture. It should be noted that they all have specifications and implementation at the same level and layer respectively. This means that the approaches deal with fault detection and possible treatment dependent on the degree of redundancy. However, adaptability and graceful degradation as described above is not supported. For example, the multi-variant scheme described in [8] uses two variants of memory referencing. Both variants implement 1 1 . The the same functional capability, e.g., at layer L1 with variants V1,1 and V1,2 model in [3] has the similar limitation. The application of N-variant approaches is not simply another way of generating dissimilarity similarly to N-version programming, but the specific derivation of variants that are designed to minimize common-mode faults, to the point of predictable elimination. For example, the memory management in [8] practically eliminates the potential for buffer overflows, since the two variants use reverse

[2] M.H L down [2] M.H. Azadmanesh, and R.M. K Om Figure 5. Cross-level monitoring Omissive Faults in Synchronous men ment, IEEE Trans. Computers 104 1042, Oct. 2000. Figure 5 generalizes monitoring between two adjacent Figure 5. Cross-level monitoring [3] B. layers. The left side of the figure shows [3] the relationship B. Cox, et.beal., N-VariantFram Sy tween two levels of requirements for functionality F , i.e., Framework for Security thro USE i i+1 between F and Ftwo . adjacent Note that F i ≺ F i+1 . The respecFigure 5 generalizes monitoring between USENIX Security Symp., Vanc i i+1 N-variant Execution A Hierarchical Formal tive implementations are in belayers and Lfor . Note that layers. The left side the figure shows the relationship AofHierarchical Formal Model forLModel N-variant Executions in [4] Mu C.M i+1 A Hierarchical Formal Model for N-variant Executions in Multi-core Syst [4] C.M. Jeffery, and J.O. Figueire layer Lfor functionality consists of theFimplementations of the operations tween two levels of requirements , i.e., tine i i i+1 based on F i as well as the Fault Tolerance in Many-c thethe between F and F i+1 . Noteof that Flower ≺ Flayer . The respecmemory allocation (one uses forward and other reversed allocation). In or- tinevalue-added form i+1 i i i+1 Li Tan AxelInternationa Krings,Rob forms, IEEE specified F variants \ F . Thus is limder for a buffer overflow attack to only would both have monitoring to Axel13th tive implementations aresucceed in operations layersnot L Tan and L .by Note that Li Krings,Robert Rinker cific LitheTan Krings,Robert Clinton Jeffer i cific Rim Rinker, Dependable Computin be attackedlayer at the same time,ofbut, more buffer overflow Washington State by University Univ ited importantly, to operations specified FAxel . would Li+1 consists implementations of the operations Washington State University University of I Abased Hierarchical Formal Model for N-variant Executions in Multi-core Washington State of Idaho have to have meaning in two directions. The latter implies that the [5] 8384 A. K of the lower layer on University F iThe as well as the value-added Stochastic Activity Network (SNA)University of inter-layer Richland, WA 99354-1641 Moscow A Hierarchical Formal Model for N-variant Executions inoverflow Multi-core Systems Richland, WA 99354-1641 Moscow, ID [5]Moscow, A. Systems Krings, et.al., Resilient Multi A would Hierarchical Model for N-variant Executions in Multi-core AFormal Hierarchical Formal Model for N-variant Executions in Multi-core Systems i+1 i Richland, WA 99354-1641 ID 83844-1010 haveoperations to “flow” into reverse memory in the two variants, and only a buffer Across-monitoring Hierarchical Formal for N-variant Executions in Mult erar specified by F Formal \[email protected] F . Thus monitoring is Model limisfor shown on the right side of the figure. A Hierarchical Model N-variant Executions in Multi-core [email protected] {krings,rinker, {krings,rinker,jeffery}@ Formal Model forSy N i overflow that as a palindrome could to succeed. The ap- erarchical [email protected] {krings,rinker,jeffery}@cs.uidaho.edu Pro itedacts to operations specified by . the potential TheFhave transition is activated when operations specified by F i Proc. CSIIRW09, ORNL, April Li Tan Axel Krings,Robert Rinker, Clinton proach in [12] isLisimilar in nature in differ that memory isand partitioned in Fsuch way i Rinker, i Clinton in layer iAxel i+1, i.e., (Lia) �= F (Li+1Jeffery ),Clinton where Jeffery The Tan Stochastic Activity Network (SNA) ofKrings,Robert inter-layer Li Tan AxelifRinker, Krings,Robert Rinker, Washington State University University of Idaho that Washington a valid Li access in one variant is invalid in the other. Thus, given the schemes Tan Axel Krings,Robert Clinton Jeffery i j [6] R. Li Tan Axel Krings,Robert Rinker, State University University of Idaho F (L ) indicates the functional specification with respect cross-monitoring is shown on the right side of the figure. Li Tan Axel Krings,Robert Rinker, Clinton JefLc A Hierarchical Formal Model for N-variant Executions in Multi-core Systems [6] R. Laddaga, P. Robertson, and Washington State University University of Idaho described in [8] and [12] it is statistically very unlikely that two modules produce Abstract reduce orcommon eliminate Richland, WA 99354-1641 Moscow, ID 83844-1010 Abstract reduce or eliminate mo j i+1 i tion i Washington State University University of Idaho Abstract orthe eliminate mode faults. Dissim Richland, WAWashington 99354-1641 IDMoscow, Washington State University University of Id to layer L . Since F Moscow, includes F83844-1010 MRM indicates The transition is activated when operations specified by F reduce tion tocommon Self-adaptive Software: State University University of Idaho Richland, 99354-1641 ID 83844-1010 the same fault as the result ofWA code injection. typically discussed i typically discussed in the conte [email protected] {krings,rinker,jeffery}@cs.uidah i i i i+1 2nd i i i i+1 typically discussed in the context of N-version Richland, WA 99354-1641 Moscow, 83844-1010 [email protected] {krings,rinker,jeffery}@cs.uidaho.edu fault-free behavior if F),(L )ID =Multi-core F (L ). Let Ci ID be the differ in layer i [email protected] andModel i+1, i.e., if F (L ) = � F (L where Richland, WA 99354-1641 Moscow, ID 83844 2nd Workshop on Self Adapti A Hierarchical Formal for N-variant Executions in Systems Richland, WA 99354-1641 Moscow, 83844-1010 {krings,rinker,jeffery}@cs.uidaho.edu ming dating back the asdf ming dating back the late 70s asdf A Hierarchical Formal Model for in [1 ming dating backrespect theN-variant late [1]Executions and N-variant pM asdf i i 261 Li the Tan Axel Krings,Robert Rinker, Clinton Jeffery {krings,rinker,jeffery}@cs.uidaho.edu condition that F i (Lwith ) �= respect F (Li+1 ). {krings,rinker,jeffery}@cs.uidaho.ed This with to70s the F i (Ljfor ) indicates functional specification 2614, pp 275-283, May, 2001. [email protected] {krings,rinker,jeffery}@c [email protected] In N-version program In N-version programming itsev is A Hierarchical Formal [email protected] Model N-variant Executions in Multi-core Systems In N-version programming it is assumed that j i+1 i Washington State University University of Idaho figure we have and indicates C2for . N-variant to layer L . Since includes FFormal the C MRM AFHierarchical Executions inderive Multi-core 1Model cross-layer monitor ware programs development ware development groups deriv [7]Syst A.RingN wareMoscow, development groups base Ai (L Hierarchical Model for N-variant Executions in Multi-core Systems Abstract reduce or eliminate common mode faults. i i Formal i+1 [7] A. Nguyen-Tuong, et. al., Secu Lithe Tan Axel Krings,Robert Richland, WAif99354-1641 ID 83844-1010 Li Tan Axel Krings,Robert Rinker, Clinton Jeffery Abstract reduce or eliminate common mode faults. Dissimilarity isspecification fault-free behavior F ) = F (L ). Let C be C Abstract eliminate common mode faults.same Dissimilarity isdan 1 reduce or 2 same specification in isolation. in same specification in isolation. The expectation iso Washington State University 1 Axel 1 Introduction typically discussed in the context of N-v 1 State Introduction iIntroduction dant DataofDiversity, DSN, Anch typically discussed in University the context N-version [email protected] {krings,rinker,jeffery}@cs.uidaho.edu Washington University University of Idaho Li Tan Clinton Jeffery typically discussed in of theor context N-version programcondition that F i (L ) �=Krings,Robert F i (Li+1 ) Rinker, helps to reduce or eliminate com helps to reduce or el helps to reduce eliminate common mode faults Abstract reduce or eliminate common mode faults. Dissimilarity is Richland, WA 99354-1641 Moscow, ID 8 Abstract reduce or eliminate common mod ming back the late 70s [1] and N-v reduce or eliminate common mode faults. Diss Li Tan Axel Krings,Robert Rinker, Clinton Jeffer Li Tan ming Axel Krings,Robert Rinker, Clinton Jeffery dating back the late 70s [1] and N-variant programs. asdf Richland, WA layerasdf controlAbstract mingID dating back the latedating 70s [1] and N-variant programs. asdf 99354-1641 Washington State University University of Idaho Moscow, 83844-1010 proach inspired by N-version sof proach inspired by N-version software is N-variant proach inspired by N [8] B. [email protected] {krings,rinker,jeffery discussed in the context of N-version programUniversity University of Idaho 1 Washington 3 1State 1 1typically 2 3 1 2 3 1 2 3 Washington 2 2 3 State 2 3 [8] B. Salamat et. al. Multi-Varia 2 3 1 2 3 1 2 3 typically discussed in the contex In N-version programming it is assumed University of Idaho typically discussed in the context of N-versio InLN-version programming it L isdown assumed that several soft5 L)upλConclusion m(λ ) m(λ ) λ Lup) m(λ LMoscow, L L LL m(λ L L L m(λ )L83844-1010 m(λ )λ L L variant Ldown Ldown N-version programming itUniversity is assumed that several softup up99354-1641 up up down down down down up LIn updown Richland, WA asdf 99354-1641 [email protected] ID down {krings,rinker,jeffery}@cs.uidaho.edu Richland, Moscow, ID 83844-1010 software, where different variants are gene variant software, where different variant software, whe tion ming datingdevelopment back the lateup 70s [1]derive and N-variant programs. ware development groups derive program ming dating back the late 70s [1] asdf Richland, WAWA 99354-1641 Moscow, ID70s 83844-1010 tion: Using Multi-Core System ming dating back the late [1] and N-varian asdf ware groups programs based on the ware development groups derive programs based on the 5 Conclusion [email protected] {krings,rinker,jeffery}@cs.uidaho.edu [email protected] {krings,rinker,jeffery}@cs.uidaho.edu Abstract reduce or eliminate common mode faults. Dissimilarity more automated fashion. Again, the expectation more automated fashion. Again more automated fash same specification in isolation. The expec In N-version programming it is assumed that several softOve 2 [email protected] {krings,rinker,jeffery}@cs.uidaho.edu In N-version programming it is asi same specification in isolation. The expectation is that this In N-version programming it is assumed that same specification in isolation. The expectation is that this Overflow Vulnerabilities, CISIS 1 Introduction 12 Introduction Abstract reduce or affect eliminate common outline 12 Introduction typically discussed in the context of N-version program outline This fault affecting one variant will not another. 2 outline fault affecting one variant will research defined a hierarchical formal model for Nfault affecting one v helps to reduce ormode eliminate common mod warehelps development groups derivedevelopment based on the to reducehelps or eliminate common mode faults. An apware development groups derive to reduce orprograms eliminate common faults. An apware groups derive programs bac typically discussed in the ming dating back the late 70s [1] and N-variant program asdf cases ifby afaults. difference isbased detected between outputs cases if amode difference is detected Abstract reduce or eliminate common faults. Dissimilarity is especially suitable for systems on [9] P.is Th Tg This research defined formal Nproach inspired by N-version N cases ifinsoftware aor difference [9] P. and Y.-K. inspired by software isThambidurai, N-variant or multilayer proach inspired N-version software is N-variant multiming dating back the Park late 70 asdf Abstract reduce or eliminate common mode Dissimilarity is specification in isolation. The expectation is that this same specification isolation. 1 22a hierarchical 2variant 31 1 executions 2same 3proach 1 model 2 for 3N-version same specification in isolation. The expectation 1 2 3 1 2 3 1 3 1 2 3 1 3 2 3 m(λ )L )Lλdown Lup Ldown Lup Ldown Ldown 1Abstract Introduction 1down m(λ ) m(λ ) 1λ L L)up L Introduction m(λ m(λ λmodels Lup Lm(λ LIntroduction Lmodels L typically discussed in the of N-version program-sof InLN-version programming it iscontext assumed that several up by two versions or variants aInfault is detected. reduce or eliminate common mode faults. Dissimilarity is reduce intro fault background toL related work down up) L upfault up down down down N-version programming by two versions or variants a fau 1) intro background to related work implementation variant software, where different variants a variant software, where different variants are generated in a variant software, where different variants are generated in a Abstract or eliminate common mode faults. Dissim multi-core architectures. The model has two dimensions tenc variant1)up executions especially suitable for systems based on typically discussed in the context of N-version programby two versions or va 1) intro fault models background to related work tency with Multiple Failure Mo helps to reduce or eliminate common mode faults. An apto[1] reduce or eliminate comm helps reduce orhelps eliminate mode fau mingto dating back the late 70s andcommon N-variant programs. asdf wareprogramdevelopment groups derive programs based on thd typically discussed in2.1 the given context of N-version groups 2) Architecture system view by collectypically discussed in context ofof In N-version programming it isthe assumed that several softmore automated fashion. Again, the expectation isN-version that asoftw 2) 2.1 Architecture system view given by collecThe availability of unused ordevelopment more automated fashion. Again, the expectation isware that aunderutilized more automated fashion. Again, the exppco ming dating back the late 70s [1] and N-variant programs. asdf 2) Architecture 2.1 system view given by collecmulti-core architectures. The model has two dimensions The availability unused proach inspired by N-version software is N-varia proach inspired by N-version proach inspired by N-version software is N-variant or multion Reliable Distributed System The availability same specification isolatio 3dating 1 back 3 late 170s13[1] 2and 3 3 same 12 the 2model 1 N-variant 1programs. 2 wareming 3 dating 32 1outline 32.2 1ming specification in isolation. The expectation isboth that thi asdf groups programs based onin the tion of updated capability independence back thederive latereliability, 70s [1] and N-variant pro asdf )2Fm(λ )down m(λ )2FViup V3 up Vup Vdown V V2programming 2 1 )outline m(λ )updated m(λ m(λ ) down VIntroduction Vup Vupfault Vdown Vitdown Vdevelopment m(λ m(λ2 )1m(λIntroduction ) m(λ Vup V1up Vi up V2.2 22Vtion outline affecting one variant will not affect another. In fault affecting one variant will not affect another. In both been exploited tobeen increase security an fault affecting one variant will not affect ofVdown model independence up down down N-version is assumed that several softdown helps to reduce tion of)variant FIncapability 2.2 updated capability model independence exploited to increase relia same specification in isolation. The expectation is or thateliminate this variant software, where different variants are ge to support fault detection and real-time Mulvariant software, where different v software, where different variants are generated in abeen iit adaptation. pp. 93-100, Oct. 1988. In N-version programming is assumed that several softhelps to reduce or eliminate common mode faults. An ap 1 dependencies Introduction exploited to in In N-version programming it is assumed that seve of Fi s (later consider (spatial[sharing], or 2 cases ifis1ability. aderive difference is detected between outputs generated cases if a difference detected between outputs generated toor eliminate common mode faults. An ap-syso Redundant ininspired multi-core cases ifability. afashion. difference is detected between proach by N-version ware development groups programs based onautomated the ofware Fi s development (later are consider dependencies (spatial[sharing], 1 2 3 2helps 3automated 1reduce 2or 3executions of F s (later consider dependencies (spatial[sharing], or groups derive programs based on the Redundant executions more Again, the expectati more fashion. Again, m(λ ) m(λ ) m(λ ) V V V V V V more automated fashion. Again, the expectation is that a tiple levels of functionality implemented in layers. At proach inspired by N-version software is N-variant or mult i ware development groups derive programs based up up uporinspired ability. Redundant down down down issoftware 11) intro 2fault3 models 1to2.3 2background 1architecture 2 1work 3 two proach by N-version is N-variant orwhere multi-diffe layered describe using 1 2to related 3 2by 3 1 versions 2 byor 3 variants two versions variants ainfault detected. variant software, aconsidered fault istwo detected. intro fault m(λ models background related work )outline m(λ ) λ time[sequential]) L1) L L3up L Lup been inor the context m(λ )outline m(λ ) 2.3 m(λ ) Vupsame V V Vdown byspecification versions variants agenerated fault istransie detec intro models toV2.3 related work specification inVdown isolation. The expectation is[4] that this layered architecture describe using up up up down down down down 2time[sequential]) same specification inLbackground isolation. The expectation isarchitecture that this 2fault variant software, where different variants are in a in same in The expectation 1 1) Introduction been considered in [4]offashion. in the fault affecting one variant will not affect anoth time[sequential]) layered describe using variant software, whereanother. different variants are generated fault affecting one variant will no faultby affecting one will not affect Inisolation. both 1 Introduction 2 outline more automated Ac 112) Introduction been considered inis 2)FArchitecture system view given collece.g., Fview of2.1 Figure Fig-two-2-Dim-Capability.pdf i s, 2) Architecturetwo 2.1 system given by collecThe availability of unused or underutilized cores hasdegre where the number of replicas determined the helps view to reduce or eliminate common mode fashion. faults. An ap-the expectation is that more automated Again, a Architecture 2.1 system given by collec-

Fi

Li

L iup

availability of unused has helps reduce common mode faults. An ap- helps two Fi s,toe.g., F1ortwo ofeliminate Figure Fig-two-2-Dim-Capability.pdf tounderutilized reduce or is eliminate common mode faults. 2 if outline The availability ofexpectation unused or underu fault affecting one variant more automated fashion. Again, the is that cases ifaffecting aor difference between the number of Fi s,cases e.g., FThe Figure Fig-two-2-Dim-Capability.pdf cases ifdetected acores difference isreplicas detected bw difference is detected between outputs generated outline 1a of tion3)ofAdaptation Fi 2.2tion updated capability model independence fault onewhere variant will not affect another. In output both det where the number of and2Fiinspired Reconfiguration describe how one tion of 13Fi 2.2 updated capability model independence been exploited to increase reliability, security and survivtolerance. Idle resources have also been used too proach inspired by N-version software isdifference N-variant orN-version multiof 2.2 capability model proach by N-version or multiproach inspired by software is N-variant been exploited to increase reliability, security and survivcases if a difference is detec 2 outline 2 3models 1 Adaptation 2 2 updated 3 and 2 1 2 1 2 1)3 intro 31 fault 3) Reconfiguration describe how one 1 background 3 to 1 2 software 3work 1 is N-variant 2independence 3 been exploited to increase reliability, sec fault affecting one variant will not affect another. In bot cases if a is detected between outputs generated by two versions or variants a fault is detected. related tolerance. Idle resources have m(λ ) m(λ ) m(λ ) V1) VFup V m(λ m(λ ) Vm(λ )sV Vdown Vupto Vup Vm(λ )Vm(λ )Vm(λ ) or Vup Vup Vup Vdown VReconfiguration Vdown awork by two versions or variants a fault 1) intro fault models background to related 3) Adaptation and describe how one of F (later consider dependencies (spatial[sharing], or by two versions or variants fault is detected. fault models background related work up up) V up down down down down down down i would move intuitively between levels etc (similar to 2.3. tolerance. Idle resou ofintro s (later consider dependencies (spatial[sharing], ability. Redundant inin multi-core systems has variant software, where variants genera variant software, different are generated inby a two i two versions orare variants 1)variants intro fault models background to related variant software, where different variants are generated indifferent aby security. For example, [3] sethas of automatically ofwould F consider dependencies (spatial[sharing], versions or variants asystems fault isaexecutions detected. 1) intro fault where models background to related work ability. Redundant executions inexecutions multi-core i s (later move intuitively between levels etc (similar to 2.3. ability. Redundant in multi-c cases if a or difference iswork detected between outputs generate Architecture 2.1 system view given by collecFor example, in [3] aors time[sequential]) 2.3 layered describe using 2)architecture Architecture 2.1intuitively system view given by collecwould between levels etc (similar to 2.3. 2)time[sequential]) Architecture 2.1 system view given collec2)by Architecture 2.1 move system view given by collecin2) CSIIRW paper) 2) Architecture 2.1 view given collecThe availability ofthat unused or underutilized more automated fashion. Again, the expectation 2.3 layered architecture describe using been considered in [4]execute inbysecurity. the context transient faults, more automated fashion. Again, the expectation issystem thatin aor The availability ofFor unused security. exampl The availability of unused orhas underutilized cores hasdiff The availability of unus more automated fashion. Again, the expectation is aof The availability of unused underutilized cores fied variants on the same inputs. Any time[sequential]) 2.3 layered architecture describe using been considered in [4] the context of transient faults, been considered in [4] in the context of in CSIIRW paper) by two versions or variants a fault is detected. 1) intro fault models background to related work tion tion ofcapability Fiof2.2F updated capability model independence 2 outline two F s, e.g., F of Figure Fig-two-2-Dim-Capability.pdf tion of F 2.2 updated model independence tion of F 2.2 updated capability model independence fied variants execute on the sam 2 outline been exploited to increase reliability, security and survivi 1 fault affecting one variant will not affect another. i 2.2 updated capability model independence i in CSIIRW paper) been exploited increase fault affecting one variant willwhere not affect another. Inreferencing both where the number of security replicas determined the degree oftosecurity fault 4) Model Analysis a)F use RBD simple k-of-N compoe.g.,updated F1 of Figure Fig-two-2-Dim-Capability.pdf i of 2 Foutline been tothe increase reliability, tion two of capability model independence iis,2.2 been exploited to increase reliabi fault affecting one variant will notexploited affect another. In both fied variants execute memory of different variants can betr the number of determined degree of fault two F s,Model of Figure Fig-two-2-Dim-Capability.pdf been exploited to increase reliability, and survivofe.g., (later consider dependencies (spatial[sharing], orreplicas isystem 1view is F ability. Redundant executions in multi-core systems has where the number of replicas determined 2)of Architecture 2.1 given by collecof F s (later consider dependencies (spatial[sharing], or 4) Analysis a) use RBD of simple k-of-N compoi and how one describe cases if of a difference is detected between outputs g ability. Redundant executi The availability unused orbeen underutilized ha F3)iReconfiguration sAdaptation (later consider (spatial[sharing], or referencing memory of cores differen casesReconfiguration ifdependencies a difference isdescribe detected between generated tolerance. Idle resources have also used tomulti-core increase of Fi show (later consider dependencies (spatial[sharing], Adaptation and describe one sitions 4) Model Analysis a)outputs use RBD of simple k-of-N compotime[sequential]) 2.3 layered architecture ability. Redundant of Fi s 3) (later consider dependencies (spatial[sharing], orGSPN cases if Redundant a difference isusing detected between outputs generated ability. Redundant executions ins been in [4] inexecutions the context ofin transient faults,cod tolerance. Idle resources have also been used to increase and used toorversions detect the execution of injected referencing memory 3) Adaptation and Reconfiguration describe how one time[sequential]) 2.3executions layered architecture describe using Fig. Complete Model ability. inbyconsidered multi-core systems has tion time[sequential]) of Fi move 2.2 updated capability independence tolerance. Idle resources have also been been considered in [4] in t two or variants adetect fault is the detected. 1)3. intro fault models background to work would intuitively between levels etc (similar toisrelated 2.3. sitions two F Fmodel of Figure Fig-two-2-Dim-Capability.pdf been exploited to increase reliability, security and surviv by two versions or variants a fault detected. 1) intro fault models background to related work i s, e.g., 1to 2.3 layered architecture describe using security. For example, in [3] a set of automatically diversiwhere the number of replicas determined the degree of fault and used to executio would move intuitively between levels etc (similar 2.3. time[sequential]) 2.3 layered architecture describe using into similar to CSIIRW sitions been considered in [4] in the context of trans two F s, e.g., F of Figure Fig-two-2-Dim-Capability.pdf by two versions or variants a fault is detected. 1) intro fault models background to related work time[sequential])of2.3Filayered architecture describe using For(similar example, in tolerance. [3] amulti-variate set of automatically diversii given 1in been considered innumber [4] in the co program execution asof used would move intuitively levels etc toilarely, 2.3. where replicas and toincrease detect t 3) Adaptation andbetween Reconfiguration describe how one 2) Architecture 2.1 system view by collecbeensecurity. considered [4] in the context of transient faults, s by (later consider dependencies (spatial[sharing], or security. Forone example, inthe [3] atosystems set ofin autom Idle resources have also been used in CSIIRW paper) The availability ofmulti-variate orused underutilized co 2) Architecture 2.1 system viewpaper) given collecsimilar to CSIIRW ability. Redundant executions inunused multi-core ha fied variants execute on the same inputs. Any difference two FFig-two-2-Dim-Capability.pdf e.g., Finview ofinto Figure Fig-two-2-Dim-Capability.pdf The availability ofF1unused or underutilized cores has in CSIIRW 3) Adaptation and Reconfiguration describe how ilarely, program e i s,system 1tion Fi collecs, e.g., of Figure Fig-two-2-Dim-Capability.pdf would move intuitively between levels etc (similar to 2.3. tolerance. Idle resources where thefied number of replicas determined thedete deg 2) 2.1 given by into similar to CSIIRW fied variants execute on the same inputs. Any difference in security. For example, in increase [3] a has set ofilarely, automatically diversidefuse buffer-overflow vulnerabilities. Variants usha two capability Fi s, e.g.,Architecture Fmodel of Figure CSIIRW where the number of replicas of Ftwo 2.2 updated capability model independence The availability of unused or underutilized cores 1 time[sequential]) multi-variate ipaper) 2.3 layered architecture describe using 4) Model Analysis a) use RBD of simple k-of-N compobeen exploited to reliability, security and where the number of replicas determined the degree of fault variants execute on the same inputs. A tion of Fi 2.2 updated independence been considered inof [4] inon context ofAny transient fault would move intuitively between levels etc (similar tothe 2.3. referencing memory different can be observed inexploited CSIIRW paper) been to increase reliability, security and surviv4) Model Analysis use RBD of simple k-of-N composecurity. For example, 3)a)Introduction Adaptation and Reconfiguration describe how one fied variants execute thevariants same inputs. difference inin [3] defuse buffer-overflow vulnerab 3)consider Adaptation and Reconfiguration describe how one of Fi(spatial[sharing], 2.2 updated capability model independence referencing memory of different variants can beallocation observed of4) FModel (later (spatial[sharing], or tolerance. Idle resources have also been used ent directions inused memory somulti-core that buffer i sFig-two-2-Dim-Capability.pdf Analysis a)dependencies use RBD of simple k-of-N compo3) tion Adaptation andF describe how one 3iReconfiguration tolerance. Idle resources have alo been exploited to increase reliability, security and survivability. Redundant executions incan syst two s, e.g., F1orof Figure 4) Redundant Model Analysis a) use RBD of simple k-of-N compositions defuse buffer-overflo of Fi s (later consider dependencies in CSIIRW paper) referencing memory of different variants tolerance. Idle resources have also been to increase referencing memory of different variants be observed where the number of replicas determined the degree of fau and used to detect the execution of injected code. Simfied variants execute on the ability. executions in multi-core systems has sitions would move intuitively between etc (similar to 2.3. 2.3levels layered architecture describe using ent directions in memory and used tolevels detect the execution of injected code. ofmove FThe consider (spatial[sharing], or sitions 3time[sequential]) Introduction would intuitively between etc (similar to 2.3. security. For example, inhas [3] aSimset of automatica i s (later considered indetect [4] inreferencing the context of allocat transien “crashed” into different neighboring memory, whi analysis ofdependencies thelevels multi-level and multi-layer approach isilarely, described using sitions 4) Model Analysis a) RBD of simple k-of-N compoused to detect the execution of injected code. Simability. Redundant executions inbeen multi-core systems 3) Adaptation and Reconfiguration describe one time[sequential]) 2.3 would layered architecture describe using into similar to CSIIRW security. For set intuitively between etc (similar to move 2.3. memory diff and used toexecution the execution of injec ent directions in been inCSIIRW [4] in thehow context of transient faults, tolerance. Idle resources have also been used to increas multi-variate program asexample, used in in [7][3] tooftoamem 3Figure Introduction security. For example, inuse[3] aand set of automatically diversiinto similar to into similar to in CSIIRW two Fi s,considered e.g., F1 ofusing Fig-two-2-Dim-Capability.pdf 1 2 ilarely, multi-variate program execution as used in [7] CSIIRW paper) sitions ilarely, multi-variate program execution as used in [7] to time[sequential]) 2.3 layered architecture describe “crashed” into different neighbo where the number of replicas determined the degree in CSIIRW paper) fied variants execute on the same inputs. Any be detected. The concept of N-variance can also bda and used used to on detect the exec into similar to CSIIRW two Fi s, e.g., F1 of Figure Fig-two-2-Dim-Capability.pdf the example ofmove functionality F shown in Figure 2. F has three levels, F , F been considered in [4] in the context of transient faults, between levels etc (similar to 2.3. 1 1 fied variants execute the same in CSIIRW paper)would where the number of replicas determined the degree of fault Mostintuitively new general purpose computers incorporate dual 1 1 ilarely, multi-variate program execution defuse buffer-overflow vulnerabilities. Variants differsecurity. For example, in [3] a set of automatically divers “crashed” into differ defuse buffer-overflow vulnerabilities. Variants used differ3) Adaptation and describe how onesame fied3 variants execute on the inputs. Any difference in into similar to CSIIRW tolerance. Idle resources have also been used topb 1Model 2 ofReconfiguration 4)how Model a) new use RBD simple k-of-N compodefuse buffer-overflow vulnerabilities. Variants used differilarely, multi-variate progra two FiFs, 3e.g., FCSIIRW Figure Fig-two-2-Dim-Capability.pdf be detected. The concept ofcan N-v 3) Adaptation and Reconfiguration describe oneofAnalysis 1 of 4) Analysis a) use RBD of simple k-of-N compotoreferencing the representation ofallocation data, rather than only the memory of different variants where the number of replicas determined the degree of fault entin directions in memory so that buffer overflows in paper) their respective layers L , Lresources and Lhave use 3-variant, 2-variant, and 3move Introduction Most general purpose computers incorporate dual tolerance. Idle also been used to increase referencing memory of different ent directions memory allocation so that buffer overflows oruse quad-core processors and higher numbers of cores are on defuse buffer-overflow vulnerabilities. Var 4) and Model Analysis a) RBD simple k-of-N compo1 , and fied variants execute on the same inputs. Any difference i 3 Introduction would intuitively between levels etc (similar to 2.3. be detected. The con security. For example, in [3] a set ofwhich automatically referencing memory of computers different variants can be observed defuse buffer-overflow Most new general incorporate dual “crashed” intoso different neighboring memory, couldvulne ent directions in memory allocation that buffer overflows 3) Adaptation and Reconfiguration describe how one sitions would move intuitively between levels etc 4) (similar toAnalysis 2.3. respectively. 3simplex Introduction to the representation ofbe data, rath as was shown in [6]. sitions Model use RBD of simple k-of-N compotolerance. Idlepurpose resources have also been used to increase security. For example, in and [3] ahigher set of capabiliautomatically diversiand used toThe detect the execution of injected c implementations If we consider each layer individually then, or quad-core processors numbers of cores are on ina) CSIIRW paper) “crashed” into different neighboring memory, which could the horizon. Whereas in theory the computational ent directions in memory allocation so that referencing memory of different variants can observe and used to detect the execution sitions ent directions in memory allo 3 Introduction fied variants execute on the same inputs. Any diffe be detected. concept of N-variance can also be applied 3 Introduction to the representation Most newtogeneral purpose computers incorporate and used to detect the execution ofautomatically injected code. Sim-could “crashed” into different memory, which in CSIIRW paper) or2.3. quad-core processors and higher numbers of cores are on would move intuitively between levels etc (similar 1dual into similarwith to CSIIRW variants execute on the same inputs. Any difference in was shown inalso [6]. sitions security. For example, in [3]ilarely, aneighboring set of diversi4) Model Analysis a) use RBD of simple k-of-N compointo similar to CSIIRW multi-variate program execution as Sim use to the representation of data, rather than only theresearch programs, detected. The concept N-variance can be applied into different neig given the levels redundancy, one can mask one value fault atbe L ,on one can only thefied horizon. Whereas in theory the computational capabiliand used to detect theofas execution of“crashed” injected code. tiesof increase the of cores, it becomes difficult The common threat in the previous is o referencing memory of different variants can be “crashed” into different neighboring ornumber quad-core processors and higher numbers of cores are ilarely, multi-variate program exe Most new general purpose computers incorporate dual intoin similar topaper) CSIIRW as was shown inmem [6]. 4) Model Analysis a) use RBD of new simple k-of-N compoilarely, multi-variate program execution as used in [7] to be detected. The concept of N-variance can also be applied CSIIRW as was shownAny in [6]. the Whereas in theory the computational capabilireferencing memory of different variants can be observed 2horizon. sitions detected. The concept oft Most general purpose computers incorporate dual the horizon. Whereas in theory the computational capabiliinto similar to CSIIRW fied variants execute on the same inputs. difference inbe to the representation ofdetected. data, rather than only the programs, and used to detect the execution of injected cod defuse buffer-overflow vulnerabilities. Variants Most newneither general purpose computers incorporate dual detect (but not mask) one value fault at L , and there is detection nor ilarely, multi-variate program execution as used in [7] to exploit sufficient parallelism to keep cores utilized. Most tiple variants execute in order to allow detection or be The concept of N-variance ca ties increase with the number of cores, it becomes difficult The common threat in the pre or quad-core processors and higher numbers of cores are on defuse buffer-overflow vulnerabili sitions Most new purpose computers incorporate dual ties increase with the number ofor cores, it injected becomes difficult The common threat in on the previous is that multowith theofthe representation of data,in rather than only theprogram programs, to theresearch representation of data,i 4) Model processors Analysis a) use RBD simple k-of-N compoand togeneral detect the execution code. Simdefuse buffer-overflow vulnerabilities. Variants used differintoused similar to CSIIRW 3of numbers orcorrection quad-core and higher of cores are on quad-core processors and higher numbers of cores are ties increase number of cores, it becomes difficult The common thre ilarely, multi-variate execution as used referencing memory of different variants can be observed as was shown [6]. ent directions in memory allocation so that buffe defuse buffer-overflow vulnerabilities. Variants used diffe potential atapplications L . 3the Introduction horizon. Whereas in theory the computational capabilicommon still allow little parallelism and it is of faults. The multiple executions introduce of co to the representation of data, rather than on to exploit sufficient parallelism to keep cores utilized. Most tiple variants execute in order to allow detection or masking to exploit sufficient parallelism to keep cores utilized. Most tiple variants execute in order to into similar to CSIIRW sitions ent directions in memory allocatio as was shown in [6]. 3multi-variate Introduction or quad-core processors and higher numbers ofused cores are asexecution was in [6]. ilarely, program as in [7]on tofaults. the horizon. Whereas init to theory the computational capabilidefuse buffer-overflow vulnerabilities. Variants ent directions in memory allocation so that buffer overflows horizon. Whereas in theory the computational capabili3 the Introduction 1 idle. and used toshown detect the execution of injected code. Simto exploit sufficient parallelism keep utilized. Most tiple variants execute common still little parallelism and is of cores The executions introduce course sig- use ent directions in memory allocation so that buffer overflow ties increase with the number ofapplications cores, it theory becomes difficult The common threat inmultiple the previous research isofthat mul“crashed” into different neighboring memory, w 3 Introduction likely that cores may bebuffer-overflow underutilized or allow running nificant overhead. In fact, N-folding the amount The reliability and survivability of functionality F is modeled in the generas was shown in [6]. common applications still allow little parallelism and it is of faults. The multiple executio “crashed” into different neighbori ties increase with the number of cores, it becomes difficult The common threat in the the horizon. Whereas in the computational capabilidefuse vulnerabilities. Variants used differthat cores may be underutilized or running idle.threat overhead. In fact, N-folding the amount of buffer work o ent directions inwhich memory so that similar to CSIIRW 3 parallelism Introduction ties into increase with the number of cores, itlikely becomes difficult The common in nificant the previous research iscould that mul“crashed” into different neighboring memory, ilarely, multi-variate program execution as used inallow [7] toallocation common applications still allow little parallelism and it is of faults. The multip “crashed” into different neighboring memory, which coul to Most exploit sufficient to keep cores utilized. Most tiple variants execute in order to detection or masking be detected. The concept of N-variance can also to be done is even augmented with the overhead Redundant executions to benefit reliability has been exto exploit sufficient parallelism to keep cores utilized. Most tiple variants execute in orde alized stochastic Petri net shown in Figure 3. The net is drawn as three subnets: to be done is even augmented with the overhead of manRedundant executions to benefit reliability has been exlikely that cores may be underutilized ordual running idle. nificant overhead. Inprevious fact, N-fo ent directions inthe memory allocation soitthat buffer overflows newtogeneral purpose computers incorporate 3 Introduction be detected. The concept of N-var ties increase with number of cores, becomes difficult Theinto common in the rese “crashed” different neighboring memory, whic Most new general purpose computers incorporate dual to exploit sufficientcommon parallelism keep cores utilized. Most tiple variants execute inN-variance order to allow detection orthreat masking defuse buffer-overflow vulnerabilities. Variants used bethe detected. The concept of can also be applied tensively used in fault-tolerant systems design where the aging the different variants. However, given the availabilcommon applications still allow little parallelism and it of isdifferofHowever, faults. The multiple exec applications still allow little parallelism and it memory, is of faults. multiple executions introduce of course bewhich detected. The concept of can also besigapplie likely that cores may be underutilized or running idle. overhead. Iw Most generalMost purpose incorporate dual tensively used in fault-tolerant systems design where the aging the different variants. given the toThe the representation data, rather than only th new computers general purpose computers incorporate dual “crashed” into different neighboring could thenew cross-layer monitor, the layer control and layer implementation. The be detected. The concept of N-variance can also be to beN-variance done isnificant even augmented Redundant executions to benefit reliability has been exor quad-core processors and higher numbers of cores are on to exploit sufficient parallelism to keep cores utilized. Most tiple variants execute in order to allow dete to the representation of data, rathe Most new general purpose computers incorporate dual ity of slack-time in cores this overhead has to potential to evolutions of redundancy schemes has gone from homocommon applications still allow little parallelism and it is of faults. The multiple executions introduce of course siglikely that cores may be underutilized or running idle. nificant overhead. In fact, or quad-core processors and higher numbers of cores are on ent directions in memory allocation so that buffer overflows 3three Introduction likely that cores may be underutilized or running idle. nificant overhead. In fact, N-folding the amount of has work to the representation of data, rather than only the programs, tolayer the representation of data, rather than only the program to be done is even a Redundant executions to benefit reliability has been exbe detected. The concept of N-variance can also be applied to theof representation ofvariants data, rather than only theHo pr ity of slack-time in cores this overhead to po evolutions of redundancy schemes has gone from homoas was shown in [6]. or quad-core processors and higher numbers of cores are on orincorporate quad-core processors and higher numbers cores are on N-variant layers are modeled in the three subnets of the implemenbe reasonably absorbed. If can execute on idle geneous redundancy to of heterogeneous systems. Heterogeor quad-core processors and higher numbers of cores are on Most new general purpose computers dual tensively used in fault-tolerant systems design where the aging the different variants. common applications still allow little parallelism and it is faults. The multiple executions introdu the horizon. Whereas in theory the computational capabilito be augment Redundant executions to benefit reliability hasthe been ex-could asamount was shown indone [6].isofeven likely that cores may Redundant be underutilized or running idle. nificant overhead. In fact, N-folding of work “crashed” into different neighboring memory, which horizon. Whereas in theory the computational capabilito be done is even augmented with overhead manexecutions tothe benefit reliability has been exas was shown in [6]. cores, then thiswhere overhead can the be absorbed. What rests is v neous redundancy implies that dissimilar redundant func1 2 as was shown in [6]. as was shown in [6]. to the representation of data, rather than only the programs, be reasonably absorbed. If variants can execute geneous redundancy to heterogeneous systems. Heterogetensively used in fault-tolerant systems design the aging the different the horizon. Whereas in theory the computational capabilithe horizon. Whereas in theory the computational capabilior quad-core processorsthe and higher numbers of cores are on horizon. Whereas inand theory computational capabilitensively used ingone fault-tolerant systems designthe where theslack-time aging theindifferent variants. tation subnet represent TMR, duplex, simplex models ofwillfrom layers L common ,with L likely that cores may beand underutilized running nificant overhead. In manfact, N-folding theo ity ofThe cores this of redundancy schemes has homoties increase with the number ofexecuted cores, itthe becomes difficult The threat incommon the previous to be or done is even augmented overhead of Redundant executions tothe benefit reliability has exthe overhead induced by the frameworks that implement retionalities are under the assumption that thisidle. be detected. The concept of N-variance can also be applied tensively used inevolutions fault-tolerant systems design where aging the different variants. However, given the availabilties increase with number of cores, itThe becomes difficult threat inresearch the prev as increase was shown inbeen [6]. Most general purpose computers incorporate dual 3newties ties with the number ofThe cores, itthe becomes difficult common threat in the previous research isthtc ity of slack-time inthat cores evolutions offuncredundancy schemes hasThe gone from homoincrease with the number of cores, itkeep becomes difficult common threat in the previous research is mu cores, then this overhead can be absorbed. Wha neous redundancy implies that dissimilar redundant the horizon. Whereas in theLcomputational capabiliity of slack-time in evolutions of redundancy schemes has gone from homotiestheory increase with the number of cores, it becomes difficult common threat in the previous research is that muland respectively. Each of these three simple nets only model their associated to be done is even augmented with the o Redundant executions to benefit reliability has been exbe reasonably absorbed. If geneous redundancy to heterogeneous systems. Heterogeexploit sufficient parallelism to cores utilized. Most tiple variants execute in order to allow detection tensively used into fault-tolerant systems design where the aging the different variants. However, given the availability slack-time in cores this overhead potential to tova evolutions of redundancy schemes has gone from homoto representation of data, rathertiple than only the programs, behas reasonably absorbed. If geneous redundancy toofMost heterogeneous systems. Heterogeto exploit sufficient parallelism toutilized. keep cores utilized. Most tiple variants execute indetection order to exploit sufficient parallelism tothe keep cores variants execute in order totoallow oral or quad-core and higher numbers of cores are on ties increase with the number of cores, ittoprocessors becomes difficult The common threat inutilized. the previous research is that mulexploit sufficient parallelism to keep cores Most tiple variants execute in order to allow detection or maskin the overhead induced by the frameworks that imple tionalities are executed under the assumption that this will be reasonably absor geneous redundancy to heterogeneous systems. Heterogelayer in isolation, i.e., they constitute 2-of-3, 2-of-2, and 1-of-1 systems. Note to exploit sufficient parallelism to keep cores utilized. Most tiple variants execute in order to allow detection or masking tensively used in fault-tolerant systems design where the aging the different variants. However, giv cores, then this overhead can b neous redundancy implies that dissimilar redundant funccommon applications still allow little parallelism and it is of faults. The multiple executions introduce of ity of slack-time in cores this overhead has to potential to evolutions of redundancy schemes has gone from homocores, then this overhead ca neous redundancy implies that dissimilar redundant funcbe reasonably absorbed. If variants can execute on idle geneous redundancy to heterogeneous systems. Heterogeas was shown in [6]. common applications still allow little parallelism and it is of faults. The multiple executions introduce of cou common applications still allow little parallelism and it is of faults. The multiple executions to exploit sufficient parallelism keep cores utilized. Most the computational tiple allow variants execute in order to allow detection or masking the to horizon. Whereas inapplications theory capabilicommon still little parallelism and it is of faults. The multiple executions introduce of course sig cores, then this neous redundancy implies thatexecutions dissimilar redundant funcity of slack-time insigcores this overhead h evolutions of redundancy schemes has gone from homothe overhead induced byover the tionalities are executed under the assumption that this will thatapplications only thestill timed transitions the and duplex depend on the markcommon little parallelism and itTMR isbe underutilized ofrunning faults. The multiple introduce of course be reasonably absorbed. If variants can execute on idle redundancy toallow heterogeneous systems. Heterogethe induced the fram tionalities are executed under assumption this will that may beoflikely underutilized or idle. nificant overhead. In fact, N-folding the cores, then this overhead can be absorbed. What rests isamou neous redundancy implies that redundant funclikely that cores may or running idle. nificant overhead. In fact, N-folding the amount common applications still geneous allow little parallelism and it cores isof Thedissimilar multiple executions introduce of sigties increase withlikely the cores, itfaults. becomes difficult Thethe common threat inthat the previous research is overhead that multhat cores may be underutilized orcourse running idle. nificant overhead. In by fact, likely thatnumber cores may beofunderutilized or running idle. nificant overhead. In fact, N-folding theoverhead amount ofN-fold wor be reasonably absorbed. If variants can geneous redundancy to heterogeneous systems. Heterogethe induced tionalities are executed under the assumption that this will to be done is even augmented with the overhead Redundant executions to benefit reliability has been excores, then this overhead can be absorbed. What rests is neous redundancy implies that dissimilar redundant functhe overhead induced by the frameworks that implement retionalities are executed under the assumption that this will likely that cores may be underutilized or running idle. nificant overhead. In fact, N-folding the amount of work to be done is even augmented with the overhe Redundant executions to benefit reliability has been exlikely that cores may be underutilized or running idle. executions nificant overhead. In executions fact, N-folding the amount ofdone work to exploit sufficient parallelism to keep cores utilized. Most tiple variants execute to is allow detection or be masking to in beorder even augmented with the overhead of the man Redundant to benefit has been to done is However, even augmented wa Redundant toexbenefit reliability has been extensively used inreliability fault-tolerant systems design where the aging the different variants. given cores, then this overhead absorbe redundancy implies dissimilar redundant functhe overhead induced by thewith frameworks thatvariants. implement re-can be are executed under theneous assumption that this will that toof be done is even augmented the overhead of sigman- However, executions benefit has been extensively used in systems design where the aging the different given th tofault-tolerant be done isand even augmented with the overhead of manRedundant executions Redundant totionalities benefit reliability hastostill been excommon applications allow little parallelism it is faults. The multiple executions introduce of course tensively used in reliability fault-tolerant systems design where the aging the different variants. However, given thehas availabi tensively used in fault-tolerant systems design where the aging the different variants. How ity of slack-time in cores this overhead to pot evolutions of redundancy schemes has gone from homothe overhead induced by the frameworks th tionalities are executed under the assumption that this will tensively used in fault-tolerant systems design where the aging the different variants. However, given the availabiltensively used in fault-tolerant systems design where the aging the different variants. However, given the availability of slack-time in cores this overhead has to evolutions of redundancy schemes has gone from homolikely that cores may be underutilized or running idle.has toof nificant overhead. In fact, N-folding thecores amount of work ity of slack-time in this has potential evolutions of redundancy schemes gone from homobehomoreasonably absorbed. If variants can execute geneous redundancy heterogeneous systems. has Heterogeity ofoverhead slack-time in to cores this ovet evolutions redundancy schemes gone from ity of slack-time in cores this overhead has to potential to evolutions of redundancy schemes has gone from homobe overhead reasonably variants canonexec geneous to heterogeneous systems. HeterogeitytoHeterogeof inbe cores this hasabsorbed. tobeIf potential to evolutions of redundancy schemes hasneous gone from beslack-time done isredundant even augmented with thethen overhead of man-If Redundant executions toredundancy benefit reliability has homobeenimplies ex- that cores, this overhead cancan be execute absorbed. redundancy dissimilar funcreasonably absorbed. variants idl geneous redundancy to heterogeneous systems. reasonably absorbed. If What vari geneous redundancy heterogeneous systems. Heterogebe reasonably absorbed. If to variants can execute on idle geneous redundancy to heterogeneous systems. Heterogecores, thisexecute overhead canframeworks be absorbed. W neous redundancy implies that dissimilar redundant functhethen overhead induced byabsorbed. the thatrests imple tionalities are executed under the assumption thatcores, this will be reasonably absorbed. If variants can on idle geneous redundancy to heterogeneous systems. Heterogetensively used in fault-tolerant systems design where the aging the different variants. However, given the availabilthen this overhead can be What neous redundancy implies that dissimilar redundant funccores, then this overhead can be i neous implies that dissimilar cores, then thisredundancy overhead can be absorbed. What redundant rests is funcneous redundancy implies that dissimilar redundant functhereoverhead by the that im tionalities are executed under the assumption that thisthat will ityframeworks of slack-time inimplement cores this overhead hasinduced potential to isframeworks evolutions oftionalities redundancy has gone from homothe overhead induced bytothe frameworks that implement re areschemes executed under the assumption that this will cores, then this overhead can be absorbed. What rests redundancy implies that dissimilar redundant functhe overhead induced by the tionalities are executedneous under the assumption that this will the overhead induced by the frame tionalities are executed under the assumption that this will reasonably absorbed. variants canthat execute on idleregeneous to heterogeneous systems. Heterogethebeoverhead induced by theIfframeworks implement tionalities are redundancy executed under the assumption that this will cores, then this overhead can be absorbed. What rests is neous redundancy implies that dissimilar redundant functhe overhead induced by the frameworks that implement retionalities are executed under the assumption that this will 1

2

3

ings of their input places, reflected by the marking functions m(λ1 ) and m(λ2 ) respectively. The simplex at layer 3 has a fail rate of simply λ3 . The layer control subnet models the interaction between layers, which are assumed to be either up or down. Initially all three layers are up as indicated by the single tokens in L1up , L2up and L3up . If a layer fails, i.e., if the threshold of tokens is reached in one of the subnets of the layer implementation subnet, the layer control subnet automatically disables the corresponding layer and all higher layers, i.e., failure at layer i will shut down all layers ≥ i. For example, a failure of one of the variants in layer 2 results in a token being absorbed in 2 2 Vup . This in turn disables the inhibitory arc at Vup (which needs two tokes to inhibit) and the associated transition will “move” the token from L2up to L2down in the layer control subnet. The lack of a token in L2up will cause the transition between L3up and L3down to fire. The result is that both L2down and L3down have a single token. The probability of a token in Lidown is thus the unreliability of layer Li . Alternatively, the probability of a token in Liup is the reliability of Li .

Li+1

F i+1

Li+1

Fi

Li

Fi

Li

Fi

F i+1 F i

Fig. 4. Cross-layer monitoring scope

Monitoring across layers is modeled with the cross-layer monitoring subnet of Figure 3. To describe the exact approach for cross-layer monitoring consider Figure 4, which shows the relationship between two levels of requirements for functionality F , i.e., between F i and F i+1 . Cross-monitoring of layer Li on Li+1 is achieved by comparing results from Li+1 to those computed by the more reliable layer Li . However, the functional capabilities of the layers are not the same, e.g., as was shown in Figure 1. This makes realistic cross-layer monitoring dependent on the computations of F i+1 and F i that can be effectively compared. The resulting extreme cases are shown in Figure 4. To the left we have the scenario in which layer Li can monitor all functionalities of layer Li+1 . This is different in the right scenario in which layer Li can only monitor a subset of the functionality at layer Li+1 . Note that F i ≺ F i+1 . The respective implementations are in layers Li and Li+1 . Note that layer Li+1 consists of the implementations of the operations of the lower layer based on F i as well as the value-added operations specified by F i+1 \F i . Thus cross-layer monitoring is limited to operations specified by F i . Realistic cross-layer monitoring is anywhere in-between the two extreme scenarios and is application dependent. The Stochastic Activity Network (SAN) of cross-layer monitoring is shown in Figure 5. The transition is activated when operations specified by F i differ in

Li+1 up L iup

i

F i (L ) ≠ F i (Li+1) Li+1 down

Fig. 5. SAN for cross-layer monitoring

layer Li and Li+1 , i.e., if F i (Li ) 6= F i (Li+1 ), where notation F i (Lj ) indicates the functional specification with respect to layer Lj . Since F i+1 includes F i the MRM indicates fault-free behavior if F i (Li ) = F i (Li+1 ). Cross-layer monitoring is modeled in the cross-layer monitoring subnet of Figure 3 by translating the condition F i (Li ) 6= F i (Li+1 ) of the SAN in Figure 5 into cross-monitoring detection rates Ci in the GSPN. It is simple to establish the exact reliability of functionality F1 when the fail-rates are known. However, in the presence of malicious faults, e.g., hacking attacks or exploits, the assumption of constant fail-rates does not hold anymore. In that case, the GSPN stays the same, whereas the formal analysis of the net becomes much more complicated. The reason is that the constant fail rates of the timed transitions have to be replaced by time-dependent hazard functions. The GSPN of Figure 3 will be the basis for the reliability simulations in Section 6.

5

Stochastic Models

To evaluate the performance of our architecture, we model its stochastic behavior using probabilistic automaton. We then apply probabilistic model checking to study the performance of the model. We are particularly interested in two metrics: service availability and information security. To help illustrate our approach, we will use an example based on layer L1 of functionality F1 in Figure 2. To model the component-based design in Section 2.2, our stochastic model is a parallel composition of layers and the Monitoring and Reconfiguration Module (MRM), which in turn is a parallel composition of Monitoring and Reconfiguration Sub-Modules (MRSM). Each MRSM is associated with a layer. The MRSM monitors and reconfigures its associated layer by using the output from the layer below. This is consistent with the cross-layer monitoring SAN shown in Figure 5. Preliminary: We model our software architecture and its components as probabilistic finite automata [7]. A probabilistic finite automaton extends transitions in a finite automaton with probabilities. Formally a probabilistic automata is a tuple hQ, Θ, δ, Q0 , F, Pδ , P0 i, where Q is a set of states, Θ is a set of input symbols, δ ⊆ Q × Θ × Q is a set of transitions, Q0 ⊆ Q is a set of start states, F ⊆ Q is a set of accepting states, Pδ : δ → (0, 1] assigns each transition a probability, and P0 : Q0 → (0, 1] assigns each start state a probability. In addition, Σq∈Q0 P0 (q) = 1 and for each (p, a, p0 ) ∈ δ, Σq∈{q | (p,a,q)∈δ} Pδ ((p, a, q)) = 1.

A successful run of the probabilistic automaton B = hQ, Θ, δ, Q0 , F, Pδ , P0 i an a qn such that q0 ∈ Q0 , (qi , ai+1 , qi+1 ) ∈ δ for is a sequence ρ = q0 →1 q1 · · · → 0 ≤ i < n, and qn ∈ F . Given a sequence of inputs a1 · · · an , the probability that the automaton B runs ρ is P0 (q0 ) × Π0≤i