Safe Allocation of Avionics Shared Resources

Laurent SAGASPE, Gérard BEL, Pierre BIEBER, Frédéric BONIOL, Charles CASTEL
Office National d'Études et de Recherches Aérospatiales
Centre d'Études et de Recherches de Toulouse
2, avenue E. Belin, 31055 Toulouse cedex, France
{name}@onera.fr
Abstract
We propose an approach to analyse the safety of avionic systems that takes into account the impact of computation and communication resource sharing. The approach is made of three main steps: use a formal notation to describe how failures propagate in the system under study, use model-checking tools to verify safety requirements and to derive allocation constraints, and use a constraint solver to generate safe allocations. This approach is illustrated by the study of the Terrain Following/Terrain Avoidance (TF/TA) system of a fighter aircraft.
1. The impact of IMA on System Safety
The implementation of avionics systems of modern civil (B787, A380) and military (F22, Rafale, Gripen, A400M, ...) aircraft tends to rely on an Integrated Modular Avionics (IMA) architecture instead of the more classical federated architecture. In a federated architecture, each system has private avionics resources, whereas in an IMA architecture avionics resources can be shared by several systems. The types of avionics resources that are generally considered are computers with real-time operating systems or Local Area Network communication switches. Several benefits of IMA are discussed in [5, 17]. The IMA architecture is supposed to help reduce the weight of the aircraft, as fewer pieces of equipment and less wiring are needed. It could also help reduce the cost of maintenance, as a unique type of spare equipment could be used to replace degraded equipment used by several systems. But the IMA architecture has an important impact on system development, as it is no longer possible to develop a system or a sub-system without considering its dependencies with other systems.

Of course, in the current process direct dependencies between systems are identified and taken into account. For instance, several aircraft systems such as flight controls depend on the navigation system or on the radioaltimeter system for Altitude data. Hence the impact of the loss of navigation or radioaltimeter data is taken into account when developing the flight control system. But resource sharing adds new indirect dependencies between systems or sub-systems. With respect to system safety, shared resources might cause common cause failures. These failures could break an important qualitative safety requirement stating that no single failure shall lead to a catastrophic situation. Let us consider two scenarios that show typical cases of indirect dependencies.

• Scenario 1: Suppose that both navigation and radioaltimeter Altitude data are allocated to the same communication bus. Suppose now that this bus is damaged; then all Altitude data would be lost. As this loss could lead to the loss of flight controls, this would certainly violate the previous qualitative safety requirement. This allocation causes an indirect dependency between the Navigation and Radioaltimeter systems.

• Scenario 2: One way to correct this situation could be to use at least two independent communication buses: one to transmit Navigation data and the other one to transmit Radioaltimeter data. Suppose now that navigation data are allocated to a communication bus that is also in charge of transmitting video signals. Then it could be the case that video signals cause a permanent overload of the communication resource, which would lead to the loss of the transmission of navigation data. Again, a single failure of the bus transmitting the radioaltimeter data could lead to the loss of the flight controls. This allocation causes an indirect dependency between the Navigation, Radioaltimeter and Video systems.
[Figure 1. The 3 steps of the proposed approach: failure propagation modelling, safety requirement validation and safe allocation generation, taking as inputs the failure propagations, the safety requirements and the architecture, and producing an allocation.]
We think that new methods are needed to help the analyst assess the safety of a system under design while taking into account the impact of resource allocation. One solution we want to avoid is to include all systems that are indirectly dependent when assessing the safety of a system. For instance, we do not want to have to consider the Navigation, Radioaltimeter and Video systems when assessing the safety of the flight controls. Another option we want to avoid is to describe all the details of the actual communication and computation resources when studying the impact of the allocation on the safety of the system under design. This is because, as the system could be in an early design process, all the details of the architecture might not be known at this stage of the development. The approach we propose to analyse the safety of systems with shared avionics resources is made of three main steps:

1. A formal notation is used to describe how failures propagate in the system under study. At the end of this stage, we obtain a failure propagation model that does not include details about resources and allocations.

2. The failure propagation model is used to study under which assumptions safety requirements are enforced. Temporal Logic is used to formalize safety requirements and assumptions. Verification is performed with a model-checking tool. At the end of this stage, a set of independence assumptions is identified.

3. The architecture is described in terms of virtual resources. A virtual resource has attributes such as a failure status, a list of connections with other resources and, for instance, a maximal load. Independence assumptions identified during the previous stage are used to derive allocation constraints. We use a constraint solver to generate allocations of tasks and data flows of the system to virtual resources. At the end of this stage, we have obtained safe allocations, i.e. allocations that preserve the safety requirements of the system under design.

In the following section we describe the Terrain Following/Terrain Avoidance (TF/TA) system of a fighter aircraft that is used to illustrate the proposed approach. Section 3 describes the first step of the approach: we explain what failure modes we consider, how we model the faulty behaviour of a component and how components are combined to create a failure propagation model. Section 4 describes the second step: we explain how to formalize safety requirements and how we use the model-checker SMV to identify independence assumptions. Section 5 describes the third step: we explain how we model the allocation constraints and what tools we use to generate safe allocations. Finally, section 6 deals with the iterative application of the approach to tackle virtual resource refinement.
2. Case study: Terrain Following/Terrain Avoidance
The Terrain Following/Terrain Avoidance (TF/TA) system provides the flight controls or pilot of an aircraft with climb or dive signals such that the aircraft maintains as closely as possible a selected height above the ground. Figure 2 shows the main tasks and data flows of the TF/TA system. Before using the TF/TA system, the pilot enters through a dedicated panel (task TF/TA Panel) the selected height of the aircraft. Using this information, as well as Terrain information provided by the Radar and Speed data provided by the navigation system, a vertical acceleration is computed and sent to the flight control system. In parallel, an emergency climb alarm is computed based on the vertical speed data provided by the navigation system and on Altitude data provided by the radioaltimeter. This alarm is sent to the flight controls so that the aircraft can climb quickly and reach a safe altitude. The TF/TA system also computes a consolidated roll angle based on two Roll data sources; the flight controls use this information in order to set the aircraft roll angle to zero before initiating an emergency climb.
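Before turning to failure propagation, it may help to fix this task and data-flow structure as data. The sketch below (Python, ours) lists the data flows with their (origin, destination) task pairs; the endpoints are our reading of figure 2 and of the flow names used later in the paper, so treat them as illustrative rather than authoritative:

```python
# Partial TF/TA task graph, reconstructed from figure 2 and from the
# flow and task names used in sections 4 and 5. Endpoints are our
# interpretation, not stated explicitly in the text.
FLOWS = {
    "TerrainInfo": ("Radar", "VerAccComp"),
    "SHeight":     ("TFTAPanel", "VerAccComp"),
    "Speed":       ("Navigation", "VerAccComp"),
    "VSpeed":      ("Navigation", "ClAlarmComp"),
    "Alt":         ("RadioAltimeter", "ClAlarmComp"),
    "R1":          ("Roll1", "ConsRollComp"),
    "R2":          ("Roll2", "ConsRollComp"),
    "VerAcc":      ("VerAccComp", "FlightControl"),
    "Alarm":       ("ClAlarmComp", "FlightControl"),
    "ConsRoll":    ("ConsRollComp", "FlightControl"),
}

def orid(d):   # origin task of data flow d (used in section 5)
    return FLOWS[d][0]

def destd(d):  # destination task of data flow d (used in section 5)
    return FLOWS[d][1]
```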
3. Failure Propagation Modelling
[Figure 2. TF/TA tasks and data flows (Radar, TF/TA Panel, Navigation, Radioaltimeter, acceleration computation, TF/TA alarm computation, roll computation, flight control, aircraft monitoring display).]

To build failure propagation models we have developed a library of components using the Altarica language (see [4]). Each task or data flow from the functional description of a system is associated with a component from the library. A component model has one state variable that stores the current failure status of the component, and it has input and output variables used to describe the effect of failure modes on its internal behaviour and the propagation of the failures to its environment. Although we used the Altarica formal language in our experiments, we will not detail in this paper how to describe failure propagation formally. The interested reader can refer to [1, 13] for more details on this topic. In this paper we are interested in two types of failure modes: provision of an erroneous result (failure fail_error) and no provision of results (failure fail_lost). Furthermore, in this paper, we suppose that these failures are permanent, i.e. once failure fail_error is applied to a component it always produces erroneous outputs. We can represent the failure modes of a component by the following automaton, where the initial status correct means that, initially, the component is working properly:
[Figure 3. Failure mode automaton: from the initial state correct, event fail_error leads to state erroneous and event fail_lost leads to state lost.]
Our approach is not restricted to permanent failures. For instance, if we wanted to consider that the provision of erroneous outputs is transient rather than permanent, we would modify the previous automaton: we would add an arrow from the erroneous state back to the correct state, labelled with the update event that models a one-time-step increment.

For each value of the component status we have to model how outputs are computed. We use boolean values for alarm signals. For other data flows, we have chosen not to deal with the value actually produced by a task. Instead, we associate an abstract value that denotes the occurrence of a failure in the task producing the value or in its inputs. Hence, component inputs and outputs have three possible values: correct when the actual value of the variable is correct, erroneous when the actual value of the variable is different from the correct value, and lost when no actual value is computed for this variable. In the failure propagation model of the TF/TA (figure 2), we use several types of components:

1. Sensor: this component is associated with tasks Radar and TF/TA Panel. It has one output that is always equal to the status of the component.

2. SensorWithInput: this component is associated with tasks Navigation and Radioaltimeter. It has one output and one input. If the status is correct then the output value is equal to the input value; if the status is lost then the output is lost; if the status is erroneous then the output is correct when the input is erroneous and erroneous when the input is correct.

3. Function: this component is associated with the Vertical Acceleration Computation and the flight control tasks. It has several inputs and one output Out. If the status is correct, then the output is lost if at least one input is lost, erroneous if at least one input is erroneous, and correct otherwise; if the status is lost then the output is lost; if the status is erroneous then the output is erroneous.

4. Alarm: this component is associated with the Emergency Climb Alarm task. It has several inputs and one boolean alarm output Out. If the status is correct, then the output is false when all inputs are correct and true otherwise; if the status is lost then the output is false; if the status is erroneous then the output is true.

5. PerfectSwitch: this component is associated with the Consolidated Roll task. It has two inputs and one output Out. In this analysis we have not associated any failure mode with this component, so its status is always correct. If both inputs are erroneous then the output is erroneous; if only one input is erroneous or both inputs are lost then the output is lost; otherwise the output is correct.

6. Bus: this component is associated with all the data flows. It has one input and one output. If the status is correct then the output is equal to the input; if the status is lost then the output is lost; if the status is erroneous then the output is erroneous.

Once all components are created they can be plugged together to form a complete model. We have used the graphical model editor from the OCAS Altarica toolset developed by Dassault Aviation to build the TF/TA failure propagation model described in figure 4.
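The propagation rules of these six component types translate almost directly into executable form. The following sketch is ours, not the actual Altarica library; it encodes the three-valued semantics described above (one corner case the text leaves unspecified is flagged in a comment):

```python
# Three-valued failure propagation, following the component semantics
# of section 3. Values: "correct", "erroneous", "lost".

def sensor(status):
    # Sensor: the output is always equal to the component status.
    return status

def sensor_with_input(status, inp):
    # SensorWithInput: an erroneous sensor reading an erroneous input
    # happens to produce a correct value, and vice versa.
    if status == "correct":
        return inp
    if status == "lost":
        return "lost"
    # status == "erroneous"; the text leaves the (erroneous, lost-input)
    # case unspecified, we map it to "erroneous" here.
    return "correct" if inp == "erroneous" else "erroneous"

def function(status, inputs):
    # Function: loss dominates, then error, otherwise correct.
    if status != "correct":
        return status
    if "lost" in inputs:
        return "lost"
    if "erroneous" in inputs:
        return "erroneous"
    return "correct"

def alarm(status, inputs):
    # Alarm: boolean output; raised as soon as any input deviates.
    if status == "lost":
        return False
    if status == "erroneous":
        return True
    return any(v != "correct" for v in inputs)

def perfect_switch(in1, in2):
    # PerfectSwitch: no failure mode of its own.
    if in1 == "erroneous" and in2 == "erroneous":
        return "erroneous"
    if "erroneous" in (in1, in2) or (in1 == "lost" and in2 == "lost"):
        return "lost"
    return "correct"

def bus(status, inp):
    # Bus: transparent when correct, otherwise imposes its own status.
    return inp if status == "correct" else status
```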
[Figure 4. TF/TA Altarica model.]

When compared with the functional description of the TF/TA of figure 2, figure 4 contains extra connections that go from the flight controls back to the navigation and radioaltimeter components. These connections model the propagation of an erroneous altitude from the flight controls back to these sensors through the environment of the TF/TA. We suppose that there is a one-time-step delay between the production of an erroneous Altitude by the flight controls and the arrival of an erroneous Altitude input on the navigation and radioaltimeter components.

4. Safety Requirement Validation
We consider two situations to be catastrophic. The provision of an erroneous vertical acceleration to the flight controls without an emergency climb alarm is catastrophic, because this situation could lead the aircraft to crash. We also consider the provision of an erroneous consolidated roll to the flight controls to be catastrophic, because in the case of an emergency climb the roll angle has to be set to zero. So we have to check that no single failure can lead to either of these situations. The formula

Non_Detect == (VerAccComp.Out = erroneous ∧ ClAlarmComp.Out = false)

models the instantaneous non-detection of an erroneous vertical acceleration. But, as we explained in the previous section, there is a one-time-step delay between the provision of an erroneous acceleration and the detection of an erroneous Altitude by the Emergency Climb Alarm task, so the failure condition we are interested in is an erroneous acceleration that remains undetected during two consecutive states. We use the Linear Temporal Logic operator X (such that Xφ means that φ is true in the next state) to model that the erroneous acceleration is undetected during two consecutive states: Non_Detect ∧ X Non_Detect. We use the Cadence Labs SMV model-checker [15] to prove that the failure propagation model enforces the following formula:

F(Non_Detect ∧ X(Non_Detect)) → F(failure_count ≥ 2)

where the temporal operator F is such that Fφ means that φ will be true in a future state and failure_count counts the number of independent failures that have occurred. We first prove the formula:

F(Non_Detect ∧ X(Non_Detect)) →
((F(VerAccComp.fail_error) ∧ F(ClAlarmComp.fail_lost))
∨ (F(Radar.fail_error) ∧ F(ClAlarmComp.fail_lost))
∨ (F(TFTAPanel.fail_error) ∧ F(ClAlarmComp.fail_lost))
∨ (F(Navigation.fail_error) ∧ F(ClAlarmComp.fail_lost))
∨ (F(Roll1.fail_error) ∧ F(Roll2.fail_error))
∨ (F(RadioAltimeter.fail_error) ∧ F(Navigation.fail_error)))
On the right-hand side of this formula we have the set of necessary pairs of failure events that are associated with the failure condition of interest. To be able to prove the safety requirement we need to be sure that each pair of failure events is independent. So we derive independence assumptions of the form

F(event_i) ∧ F(event_j) → F(failure_count ≥ 2)

where (event_i, event_j) is a pair of failure events that appears on the right-hand side of the previous formula. Using these assumptions and an extra assumption stating that only task failures occur, we are able to prove that the safety requirements are met.
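The proof obligation can be illustrated on a single execution trace. The sketch below (ours; SMV of course checks all traces exhaustively rather than one) evaluates F(Non_Detect ∧ X Non_Detect) → F(failure_count ≥ 2) on a finite trace:

```python
# Check one finite trace against the safety requirement
# F(Non_Detect ∧ X Non_Detect) → F(failure_count ≥ 2).
# A state holds the two observed outputs plus the number of
# independent failures that have occurred so far.

def non_detect(state):
    # Instantaneous non-detection of an erroneous vertical acceleration.
    return (state["VerAccComp.Out"] == "erroneous"
            and not state["ClAlarmComp.Out"])

def requirement_holds(trace):
    two_step_non_detect = any(
        non_detect(s) and non_detect(t) for s, t in zip(trace, trace[1:]))
    double_failure = any(s["failure_count"] >= 2 for s in trace)
    return (not two_step_non_detect) or double_failure

# Example: a single fail_error of VerAccComp is caught one step later by
# the alarm, so the premise never holds and the requirement is met.
trace = [
    {"VerAccComp.Out": "correct",   "ClAlarmComp.Out": False, "failure_count": 0},
    {"VerAccComp.Out": "erroneous", "ClAlarmComp.Out": False, "failure_count": 1},
    {"VerAccComp.Out": "erroneous", "ClAlarmComp.Out": True,  "failure_count": 1},
]
assert requirement_holds(trace)
```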
The formula above only contains pairs of task failures. We have also proved a similar formula related to data flow failures:

F(Non_Detect ∧ X(Non_Detect)) →
((F(Alarm.fail_error ∨ Alarm.fail_lost) ∧ F(ConsRoll.fail_error))
∨ (F(Alarm.fail_error ∨ Alarm.fail_lost) ∧ F(VerAcc.fail_error))
∨ (F(Alarm.fail_error ∨ Alarm.fail_lost) ∧ F(Speed.fail_error))
∨ (F(Alarm.fail_error ∨ Alarm.fail_lost) ∧ F(newSpeed.fail_error))
∨ (F(Alarm.fail_error ∨ Alarm.fail_lost) ∧ F(TerrainInfo.fail_error))
∨ (F(Alarm.fail_error ∨ Alarm.fail_lost) ∧ F(SHeight.fail_error))
∨ (F(R1.fail_error) ∧ F(R2.fail_error)))
Using this formula, the independence assumptions associated with each pair of data flow failures and an extra assumption stating that only data flow failures can occur, we can prove the safety requirement. In this section we have shown how to use a model checker to identify necessary independence assumptions. The identification is not mechanized, but it can be supported by recognizing typical patterns of safety architectures, such as an architecture made of a command channel and a monitoring channel. We have defined a library of safety patterns (see [14]) that associates with each pattern a set of generic independence requirements. These generic independence requirements could be instantiated with the names of the components implementing the command channel (i.e. Radar, TFTAPanel, VerAccComp) and of the components implementing the monitoring channel (i.e. RadioAltimeter and ClAlarmComp). An alternative approach to mechanize the identification of independence requirements could be to use the fault tree generator included in the OCAS tool, which produces a boolean formula whose atoms are failure events of the Altarica model. The failure condition holds if and only if this formula is true. The fault tree tool can be used to compute the minimal cut sets of this boolean formula; this gives the list of combinations of failure events that lead to the failure condition. To prove the safety requirement, it is sufficient that all combinations contain at least two failures and that these two failures can be assumed to be independent.
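This alternative can be prototyped in a few lines: given the failure condition as a boolean function over sets of failure events, the minimal cut sets are the minimal sets that make it true. A brute-force sketch (ours; the OCAS fault tree tool uses far more efficient algorithms) with a toy condition:

```python
from itertools import combinations

def minimal_cut_sets(events, condition, max_order=3):
    # Enumerate minimal sets of failure events that trigger `condition`,
    # a predicate over sets of occurred events. Brute force: only usable
    # for small models, unlike the OCAS fault tree tool.
    cuts = []
    for k in range(1, max_order + 1):
        for combo in combinations(events, k):
            s = set(combo)
            # Keep only sets that contain no already-found cut set.
            if condition(s) and not any(c <= s for c in cuts):
                cuts.append(s)
    return cuts

# Toy condition: the monitoring channel fails together with some part
# of the command channel.
events = ["VerAccComp.fail_error", "ClAlarmComp.fail_lost", "Radar.fail_error"]
condition = lambda s: "ClAlarmComp.fail_lost" in s and (
    "VerAccComp.fail_error" in s or "Radar.fail_error" in s)
print(minimal_cut_sets(events, condition))  # two cut sets of order 2
```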
5. Safe Allocation Generation

At this stage, we know that if the independence assumptions are met then the failure propagation model satisfies the safety requirements. But this model does not include details about the architecture and resource sharing. We want to study architectures made of interconnected communication and computation resources, and resource allocations that preserve the safety requirements. We could first design an architecture, allocate the data flows and tasks to the resources, and then verify whether the safety requirements are met. But we prefer to use constraint satisfaction tools to directly generate safe allocations that preserve the safety requirements. The constraint satisfaction problem we want to solve involves the following variables:
• variable allot(t, r) is equal to 1 if task t is allocated to computation resource r and is equal to 0 otherwise.

• variable allod(d, b) is equal to 1 if data flow d is allocated to communication resource b and is equal to 0 otherwise.

• variable connected(b, r) is equal to 1 if computation resource r is connected to bus b and is equal to 0 otherwise.

To describe the allocation constraints we use several constants:

• indept(ti, tj) is equal to 1 whenever tasks ti and tj are independent and is equal to 0 otherwise. From the independence assumptions between tasks identified in the previous section we derive the following independence relation:

indept(ClAlarmComp, VerAccComp) = 1, indept(ClAlarmComp, Radar) = 1, indept(ClAlarmComp, TFTAPanel) = 1, indept(ClAlarmComp, Navigation) = 1, indept(Roll1, Roll2) = 1, indept(RA, Navigation) = 1.

We add another constraint that is not necessary to enforce the safety requirement of the TF/TA system: the FlightControl task is independent from all other tasks, i.e. ∀t ≠ FlightControl, indept(FlightControl, t) = 1.

• indepd(di, dj) is equal to 1 whenever data flows di and dj are independent and is equal to 0 otherwise. From the independence assumptions between data flows identified in the previous section we derive the following independence relation:

indepd(Alarm, ConsRoll) = 1, indepd(Alarm, VerAcc) = 1, indepd(Alarm, Speed) = 1, indepd(Alarm, newSpeed) = 1, indepd(Alarm, TerrainInfo) = 1, indepd(Alarm, SHeight) = 1, indepd(R1, R2) = 1.
Furthermore, two functions are needed to describe the connection of data flows with tasks in the functional description of the TF/TA system: orid(d) is equal to the task t that is the origin of data flow d, and destd(d) is equal to the task t that is the destination of data flow d. A safe allocation should enforce the following constraints:

• Unique Allocation: a task (or a data flow) should be allocated to one and only one resource.

∀ti, Σrj allot(ti, rj) = 1
∀di, Σcj allod(di, cj) = 1

• Independence Preservation: two tasks (or data flows) that are independent should not be allocated to the same resource.

∀ti, tj, rk: allot(ti, rk) + allot(tj, rk) + indept(ti, tj) ≤ 2
∀di, dj, ck: allod(di, ck) + allod(dj, ck) + indepd(di, dj) ≤ 2

• Connection Compatibility: a data flow that connects two tasks should be allocated to a communication resource connected to the computation resources allocated to these tasks.

∀di, cj, rk: allod(di, cj) + allot(orid(di), rk) − connected(cj, rk) ≤ 1
∀di, cj, rk: allod(di, cj) + allot(destd(di), rk) − connected(cj, rk) ≤ 1
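These constraints are small enough to prototype without a dedicated solver. The sketch below (ours; the paper's experiments use OPL Studio) enumerates the allocations of a reduced instance and keeps the safe ones:

```python
from itertools import product

# Toy instance of the allocation CSP (names from the paper, sizes reduced).
TASKS = ["Radar", "VerAccComp", "ClAlarmComp"]
FLOWS = {"TerrainInfo": ("Radar", "VerAccComp")}   # orid(d), destd(d)
CPUS, BUSES = ["CPU_1", "CPU_2"], ["Bus_1"]
INDEPT = {("ClAlarmComp", "VerAccComp"), ("ClAlarmComp", "Radar")}
CONNECTED = {("Bus_1", "CPU_1"), ("Bus_1", "CPU_2")}

def safe(allot, allod):
    # Unique allocation holds by construction: each dict maps a task
    # (or flow) to exactly one resource.
    # Independence preservation: independent tasks on distinct CPUs.
    if any(allot[ti] == allot[tj] for ti, tj in INDEPT):
        return False
    # Connection compatibility: a flow's bus must reach both endpoint CPUs.
    for d, (src, dst) in FLOWS.items():
        b = allod[d]
        if (b, allot[src]) not in CONNECTED or (b, allot[dst]) not in CONNECTED:
            return False
    return True

# Exhaustive search; OPL adds an objective (minimize the used resources).
for cpu_choice in product(CPUS, repeat=len(TASKS)):
    allot = dict(zip(TASKS, cpu_choice))
    for bus_choice in product(BUSES, repeat=len(FLOWS)):
        allod = dict(zip(FLOWS, bus_choice))
        if safe(allot, allod):
            print(allot, allod)
```

On this toy instance the search prints the two allocations that separate ClAlarmComp from the command-channel tasks, which is exactly the independence-preservation constraint at work.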
We used the OPL Studio tool from ILOG to model this set of constraints and to look for safe allocations; several allocations exist. To produce a preferred allocation a criterion is needed; we chose to minimize the number of used resources. A possible solution is an architecture made of three computation resources (CPU_1, CPU_2, CPU_3) and two communication resources (Bus_1 and Bus_2). All computation resources are connected to all communication resources.

Allocation of tasks:
CPU_1 −→ FlightControls
CPU_2 −→ RA, Roll1, ConsRollComp, ClAlarmComp
CPU_3 −→ Navigation, VerAccComp, Radar, Roll2, TFTAPanel

Allocation of data flows:
Bus_1 −→ TerrainInfo, Speed, Alt, R1, VerAcc, ConsRoll, SHeight
Bus_2 −→ R2, VSpeed, Alarm

6. Incremental Safe Allocation Generation
In this section we describe three scenarios where the allocation is generated progressively: the first scenario is related to the refinement of virtual resources, the second to the composition of sub-systems and, finally, the third to the integration of various viewpoints.
6.1. Refinement

The safe allocation generated in the previous section is related to virtual resources and not to actual resources of an IMA architecture. We propose to apply the three steps of the approach iteratively in order to generate safe allocations for more detailed descriptions of the shared resources. At each iteration we progressively refine the architecture definition by detailing the resources used, and we generate a new safe allocation.

[Figure 5. Iterative generation of safe allocations: each iteration chains failure propagation modelling, safety requirement validation and safe allocation generation, taking as inputs new failure propagations, new safety requirements and new architecture constraints, and producing a new allocation.]
After a first application of the proposed method, a failure propagation model, a set of independence assumptions and a safe allocation are available. We extend the failure propagation model to include details about the allocation. For each resource we add a component that contains two failure events, fail_error and fail_lost. Then we use the synchronisation construct of Altarica to glue together the failure events of a resource with the failure events of all tasks or data flows that were allocated to this resource. This gluing of failure events is such that all glued events are fired simultaneously. With respect to the safety requirements and independence assumptions, we can replace in these formulae the failure events of tasks and data flows by the failure events of the resources. For instance, we would replace any instance of RA.fail_error, Roll1.fail_error, ConsRollComp.fail_error or ClAlarmComp.fail_error by CPU_2.fail_error. The two formulae we described in section 4 become, after replacement of task and data flow names:

F(Non_Detect ∧ X(Non_Detect)) →
((F(CPU_3.fail_error) ∧ F(CPU_2.fail_lost))
∨ (F(CPU_3.fail_error) ∧ F(CPU_2.fail_error)))

and

F(Non_Detect ∧ X(Non_Detect)) →
((F(Bus_1.fail_lost) ∧ F(Bus_2.fail_lost))
∨ (F(Bus_1.fail_error) ∧ F(Bus_2.fail_error)))
So now the right-hand side of each formula only contains necessary pairs of failure events of resources. Similarly, the independence assumptions can be simplified: we now just need to assume that CPU_3 and CPU_2 fail independently and that Bus_1 and Bus_2 are also independent. The resulting model and safety assumptions can be used as the starting point of a new iteration of the approach. We can introduce new types of failure propagation, new safety requirements or new architecture constraints. We have studied how to allocate the virtual resources on a communication architecture made of switches, gateways, buses and end systems. We have defined new allocation constraints that deal with the routing of the virtual communication resources on links that are made of sequences of switches. At this level, we have considered that independence is preserved as long as independent virtual communication resources are not allocated to links that share a common switch. The main benefit of this approach is that we have limited the number of variables to be taken into account when trying to find an allocation. In the previous example, we have to route 2 resources (Bus_1 and Bus_2) on the switched network instead of routing 10 data flows on this network. The associated limitation of this iterative approach is that the resulting allocation of data flows on the network could be less optimal than an allocation found by trying to route the 10 data flows on the network directly.
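The formula rewriting performed at the start of this subsection is a plain renaming of events driven by the allocation. A sketch (ours) using the section 5 allocation of tasks to CPUs:

```python
import re

# Section 5 allocation: every task glued to a CPU fails with it, because
# the Altarica synchronisation fires the glued events together.
ALLOT = {
    "FlightControls": "CPU_1",
    "RA": "CPU_2", "Roll1": "CPU_2", "ConsRollComp": "CPU_2",
    "ClAlarmComp": "CPU_2",
    "Navigation": "CPU_3", "VerAccComp": "CPU_3", "Radar": "CPU_3",
    "Roll2": "CPU_3", "TFTAPanel": "CPU_3",
}

def lift_to_resources(formula):
    # Replace each task-level event "Task.fail_x" by "CPU.fail_x".
    return re.sub(
        r"(\w+)\.(fail_\w+)",
        lambda m: f"{ALLOT.get(m.group(1), m.group(1))}.{m.group(2)}",
        formula,
    )

pair = "F(VerAccComp.fail_error) & F(ClAlarmComp.fail_lost)"
print(lift_to_resources(pair))
# -> F(CPU_3.fail_error) & F(CPU_2.fail_lost)
```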
6.2. Composition of sub-systems

It is also possible to combine several safe allocations in order to find a safe allocation for a system made of several sub-systems.

[Figure 6. Safe allocation composition: two sub-systems are first analysed separately, each producing a model, independence assumptions and an allocation; a third application of the approach combines failure propagations, safety requirements and architecture constraints to produce a common allocation.]

Figure 6 shows a situation where two sub-systems are first analysed separately. So, for each sub-system, a model, a set of independence assumptions and a safe allocation are available. Then we want to study a system made of these two sub-systems and try to find a common safe allocation. To illustrate this idea, we suppose that we want to build a safer TF/TA by duplicating the TF/TA system we have studied so far. The vertical accelerations and alarms computed by both copies of the TF/TA are merged and sent to the flight controls. The new safety requirement that should be enforced by this safer TF/TA is that neither a single failure nor a double failure shall lead to an undetected erroneous vertical acceleration. At the end of the first iteration we have obtained two safe allocations: one using resources CPU_11, CPU_12, CPU_13, Bus_11 and Bus_12, and the other using resources CPU_21, CPU_22, CPU_23, Bus_21 and Bus_22. We would like to know if there is an allocation that preserves the new safety requirement and where some resources are shared between the two copies of the TF/TA system. So we have to find the independence assumptions needed to satisfy the new safety requirement:

F(Non_Detect_1 ∧ X(Non_Detect_1) ∧ Non_Detect_2 ∧ X(Non_Detect_2)) → F(failure_count ≥ 3)
During the previous iteration, several formulae were already proved. For computation resource failures, we have:

F(Non_Detect_1 ∧ X(Non_Detect_1)) →
((F(CPU_13.fail_error) ∧ F(CPU_12.fail_lost))
∨ (F(CPU_13.fail_error) ∧ F(CPU_12.fail_error)))

and

F(Non_Detect_2 ∧ X(Non_Detect_2)) →
((F(CPU_23.fail_error) ∧ F(CPU_22.fail_lost))
∨ (F(CPU_23.fail_error) ∧ F(CPU_22.fail_error)))

So it is easy to see that we can prove:

F(Non_Detect_1 ∧ X(Non_Detect_1) ∧ Non_Detect_2 ∧ X(Non_Detect_2)) →
((F(CPU_13.fail_error) ∧ F(CPU_12.fail_lost) ∧ F(CPU_23.fail_error) ∧ F(CPU_22.fail_lost))
∨ (F(CPU_13.fail_error) ∧ F(CPU_12.fail_lost) ∧ F(CPU_23.fail_error) ∧ F(CPU_22.fail_error))
∨ (F(CPU_13.fail_error) ∧ F(CPU_12.fail_error) ∧ F(CPU_23.fail_error) ∧ F(CPU_22.fail_lost))
∨ (F(CPU_13.fail_error) ∧ F(CPU_12.fail_error) ∧ F(CPU_23.fail_error) ∧ F(CPU_22.fail_error)))
The right-hand side of the previous formula contains combinations of four failure events; it is sufficient that three out of these four failures are independent to enforce the new safety requirement. Using this new independence constraint we can generate a safe allocation of the private resources CPU_11, CPU_12, CPU_13, Bus_11, Bus_12, CPU_21, CPU_22, CPU_23, Bus_21 and Bus_22 on common resources. An architecture made of 3 communication resources (Bus_c1, Bus_c2 and Bus_c3) and 4 computation resources (CPU_c1, CPU_c2, CPU_c3, CPU_c4) preserves the safety requirement. The safe allocation is:

Allocation of tasks:
CPU_c1 −→ CPU_11, CPU_12
CPU_c2 −→ CPU_12, CPU_22
CPU_c3 −→ CPU_13
CPU_c4 −→ CPU_23

Allocation of data flows:
Bus_c1 −→ Bus_11
Bus_c2 −→ Bus_12, Bus_22
Bus_c3 −→ Bus_21
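The combined right-hand side above is mechanical to build: it is the cross product of the necessary pairs proved for each copy. A sketch (ours):

```python
from itertools import product

# Necessary failure pairs of each TF/TA copy, from the formulae above.
pairs_1 = [("CPU_13.fail_error", "CPU_12.fail_lost"),
           ("CPU_13.fail_error", "CPU_12.fail_error")]
pairs_2 = [("CPU_23.fail_error", "CPU_22.fail_lost"),
           ("CPU_23.fail_error", "CPU_22.fail_error")]

# Each combination of one pair per copy is a necessary quadruple for the
# combined failure condition; three independent failures among the four
# suffice to enforce failure_count >= 3.
for p1, p2 in product(pairs_1, pairs_2):
    print(p1 + p2)
```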
6.3. Viewpoint Integration

So far, the allocations we have proposed do not take into account the load of the resources. We have developed techniques similar to the ones presented in this paper to schedule periodic tasks and data flows. A set of allocation constraints is built on the basis of classical schedulability conditions that are related to task periods and worst-case execution times (see [12]). We have tried to first generate separate allocations, using independence constraints on one hand and schedulability constraints on the other hand, and then to combine the resulting allocations. The experiments we performed showed that allocations that are optimal with respect to independence are seldom optimal with respect to performance. For instance, the allocation that was proposed in the previous section seems to load resource Bus_1 more heavily (7 data flows, including TerrainInfo, which is a high-volume data flow) than Bus_2 (3 data flows). Similarly, the Radar task is a big resource consumer that, according to the allocation found by the schedulability analysis, cannot share its computing resource with other tasks. Conversely, we studied allocations that fulfilled the performance constraints using a criterion trying to load the resources as evenly as possible. The resulting architecture is made of 3 computation resources and 2 communication resources. But the proposed allocation was not safe, because the independent data flows R1 and R2 shared the same communication resource. So we had to go back to the constraint solver and use both sets of independence and schedulability constraints to find new allocations. The resulting safe allocation is made of 2 communication resources and 4 computation resources. This allocation is less optimal than the allocation found in section 5, but it is compatible with the performance constraints.
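The load viewpoint can reuse the constraint style of section 5. A minimal sketch (ours), assuming a simple utilization bound (sum of WCET/period at most 1 per CPU) as the schedulability condition, with invented task parameters; the conditions actually used in [12] are richer:

```python
# Hypothetical task parameters: (worst-case execution time, period) in ms.
WCET_PERIOD = {"Radar": (45, 50), "VerAccComp": (5, 50),
               "ClAlarmComp": (5, 100), "ConsRollComp": (2, 100)}

def schedulable(allot, cpus):
    # Utilization bound: the tasks allocated to each CPU must fit on it.
    for cpu in cpus:
        load = sum(c / t for task, (c, t) in WCET_PERIOD.items()
                   if allot[task] == cpu)
        if load > 1.0:
            return False
    return True

# With these (invented) figures Radar alone uses 0.9 of a CPU, so packing
# everything on one CPU fails the load viewpoint even if it satisfied the
# independence viewpoint.
one_cpu = {task: "CPU_1" for task in WCET_PERIOD}
print(schedulable(one_cpu, ["CPU_1"]))  # False
```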
7. Related Work
The basic ingredients of our approach are well-established methods. For instance, the concept of a failure propagation model is inspired by the work on the FPTN notation at York University (see [10]). Using model-checkers for the verification of safety requirements or for the generation of failure sequences is documented in [6, 13, 3]. The use of constraint solvers to optimize the reliability of systems is also well established [2]. The original problem consisted in finding the best combination of redundancy and component reliability to meet or exceed the performance goals (reliability) at the lowest cost. In this case an analytical model represents the entire system; for example, a reliability block diagram represents the reliability relationships of the components in the system. The optimization parameters are the number of redundant components or individual reliability figures.

More recently, these methods have been improved [16, 9, 8, 7] with respect to the performance of the algorithms and the dimension of the problems involved. Various criteria, such as cost and weight, were used. In the previous references the studied problems were only concerned with safety characteristics (reliability, dependability). In the following ones, the authors deal simultaneously with at least two complementary aspects of design: safety and real-time performance. In [19] the authors focus on providing efficient communication among distributed computation nodes; the efficiency is defined by the reliability, and the constraints concern the limited capacity of the links between computers. In [18], Nicholson deals with the design of safety-critical real-time control systems, which have to take into account both technical constraints (functionality, dependability and timing) and reduced resource elements. The design space is defined by the set of functions to implement, the set of logical implementations of these functions and the set of physical resources supporting these implementations. The authors propose an algorithm which produces a topology (a relation between the three previously mentioned sets) that maximises reliability while fulfilling cost, size, timing and reliability constraints. The work by Sorel and Girault [11] also combines functional, safety and real-time aspects, but they propose to use specialized heuristics instead of a general-purpose constraint solver in order to cope with the combinatorial explosion of constraint solvers. Although our approach is based on well-established techniques, we think that the tight combination of all these ingredients into one process for the safety assessment of avionics systems is original.

References
[1] A. Arnold, A. Griffault, G. Point, and A. Rauzy. Altarica: Manuel méthodologique, version 2.0. Technical report, LaBRI, Université Bordeaux I et CNRS (UMR 5800), 2000.
[2] K. Aggarwal. Redundancy optimization in general systems. IEEE Transactions on Reliability, R-25, pages 330–332, 1976.
[3] O. Akerlund, S. Nadjm-Tehrani, and G. Staalmarck. Integration of formal methods into system safety and reliability analysis. In 17th International System Safety Conference.
[4] A. Arnold, A. Griffault, G. Point, and A. Rauzy. The Altarica formalism for describing concurrent systems. Fundamenta Informaticae, 40:109–124, 2000.
[5] N. Blackwell, S. Leinster-Evans, and S. Dawkins. Developing safety cases for integrated flight systems. In IEEE Aerospace Conference. IEEE, 1999.
[6] M. Bozzano et al. ESACS: an integrated methodology for design and safety analysis of complex systems. In ESREL 2003 European Safety and Reliability Conference, 2003.
[7] A. Cabarbaye and R. Laulheret. De l'évaluation à l'optimisation en sûreté de fonctionnement. In Qualita 2005 - Congrès international pluridisciplinaire qualité et sûreté de fonctionnement, pages 125–131, 2005.
[8] D. Coit and A. Smith. Reliability optimization of series-parallel systems using a genetic algorithm. IEEE Transactions on Reliability, 45, 1996.
[9] D. W. Coit and J. Liu. System reliability optimization with k-out-of-n subsystems. International Journal of Reliability, Quality and Safety Engineering, 7:129–142, 2000.
[10] P. Fenelon, J. McDermid, M. Nicholson, and D. Pumfrey. Towards integrated safety analysis and design. ACM Computing Reviews, 2(1), 1994.
[11] A. Girault, H. Kalla, and Y. Sorel. An active replication scheme that tolerates failures in distributed embedded systems - processors and communication links failures. In 2004 IFIP World Computing Conference, 2004.
[12] S. Gopalakrishnan. Managing communication in integrated modular avionics. In 18th International Parallel and Distributed Processing Symposium (IPDPS'04), 2004.
[13] C. Kehren, C. Seguin, P. Bieber, C. Castel, C. Bougnol, J.-P. Heckmann, and S. Metge. Advanced simulation capabilities for multi-systems with Altarica. In International System Safety Conference, 2004.
[14] C. Kehren, C. Seguin, P. Bieber, C. Castel, C. Bougnol, J.-P. Heckmann, and S. Metge. Architecture patterns for safe design. In Proceedings of Complex and Safe Systems Engineering, 2004.
[15] K. L. McMillan. The SMV language, March 1998. http://wwwcad.eecs.berkeley.edu/∼kenmcmil/psdoc.html.
[16] M. Mukuda, Y. Yun, and M. Gen. Reliability optimization problems using adaptive hybrid genetic algorithms. Journal of Advanced Computational Intelligence and Intelligent Informatics, pages 437–441, 2004.
[17] J. Moore. The Avionics Handbook, chapter Advanced Distributed Architectures. CRC Press, 2001.
[18] M. Nicholson and A. Burns. Structuring architectural topologies for real-time safety-critical systems, 1997.
[19] Y.-S. Yeh, C.-C. Chiu, and R.-S. Chen. A genetic algorithm for k-node set reliability optimization with capacity constraint of a distributed system. Proc. Natl. Sci. Counc. ROC(A), 25:27–34, 2001.