A Tool for Examining the Behaviour of Faults and Errors in Software

MARTIN HILLER Technical Report No. 00-19 Revision 0.8

Department of Computer Engineering

CHALMERS UNIVERSITY OF TECHNOLOGY Göteborg, Sweden, 2000

REPORT NO. 00-19

A Tool for Examining the Behaviour of Faults and Errors in Software

MARTIN HILLER

Department of Computer Engineering CHALMERS UNIVERSITY OF TECHNOLOGY Göteborg, Sweden 2000

A Tool for Examining the Behaviour of Faults and Errors in Software © Martin Hiller, 2000

Technical report No. 00-19 Department of Computer Engineering Chalmers University of Technology SE-412 96 Göteborg Sweden Phone: +46 (0)31-772 10 00

Author contact information: Martin Hiller Department of Computer Engineering Chalmers University of Technology Hörsalsvägen 11 SE-412 96 Göteborg Sweden Phone: +46 (0)31-772 52 28 Fax: +46 (0)31-772 36 63 Email: [email protected] URL: http://www.ce.chalmers.se/staff/hiller/

Chalmers Göteborg, Sweden 2000

Abstract

This report describes the Propagation Analysis Environment (PROPANE), a desktop environment for conducting experiments with error injection and fault injection in order to analyse the propagation and effects of errors and faults in software systems. PROPANE supports the injection of a variety of error types into variables of a software system, as well as controlled injection of faults (by mutation of the source code). PROPANE also has support for various types of probes that can be used to log the values of variables and the occurrences of events during software execution. PROPANE is mainly aimed at, and was specifically developed for, the analysis and evaluation of software for single node embedded control systems, although due to its general nature it may be used in many other areas.

Keywords: software implemented fault injection, software dependability, error propagation, embedded control systems

Contents

1. Introduction
2. Target System Model
3. PROPANE Overview
   3.1 Basic System Structure
   3.2 Working with PROPANE
4. PROPANE Setup and Target System Instrumentation
   4.1 Faults and Errors
   4.2 Probes
   4.3 Test Cases
   4.4 Target System Instrumentation
      4.4.1 Instrumenting for probes
      4.4.2 Instrumenting for fault injection
      4.4.3 Instrumenting for error injection
   4.5 PROPANE and Environment Simulators
   4.6 File Formats
      4.6.1 Database Description
      4.6.2 Campaign Description
      4.6.3 Experiment Description
5. Executing Experiments
   5.1 Man-Machine Interface
   5.2 Log Files
   5.3 Readout Files
6. Analysing Data
   6.1 Golden Run Comparisons
      6.1.1 Virtual samples
      6.1.2 Error margins
   6.2 Channel Logs
   6.3 Injection Information
7. PROPANE Architecture
   7.1 The PROPANE Campaign Driver
   7.2 The PROPANE Library
8. Summary
References

1. Introduction

In the area of embedded control systems, as well as in other areas, software has an increasing influence on the level of dependability of a system. Many techniques and mechanisms for constructing dependable software have been devised during the past decades. A state-of-the-art method for verifying and evaluating embedded software is fault and error injection, i.e., injecting an abnormal state into the system with the intention of testing the capability of the system to cope with that state and of verifying the design of error detection and recovery mechanisms. This is illustrated in Figure 1.1 below.

[Figure 1.1 is a state diagram with the states Fault, Undetected error, Detected error, and Failure, connected by the transitions 1. Fault activation, 2. Error activation, 3. Error overwritten, 4. Error detection, 5. Error recovery successful, and 6. Error recovery failed or incomplete. Fault injection and error injection are drawn as arrows pointing at the Fault and Undetected error states, respectively.]

Figure 1.1. The states of a system from a dependability point-of-view.

In Figure 1.1 we can see the process of going from a fault in the system to a system failure (adapted from [Laprie92]). When a fault is activated, e.g. a defective piece of code is executed, the system state may end up containing an undetected error. That is, one or more internal variables do not have the same value they would have had if the fault had not been activated. The undetected error may affect the system in such a way that a failure is unavoidable. The definition of failure is often one of the following three:

1. the system output no longer fulfils some predefined specification;
2. the system output differs from the output of an error-free system; or
3. the system output is not acceptable in the environment.

The first definition is the tightest and the last is the loosest. Under the first definition, an output may still be acceptable in the environment and yet the system has failed, since the output violates the specification. The second definition takes no regard to either the specifications or the environment. The third definition is the most liberal one, since it allows the system output to vary within some boundaries.

If a system contains mechanisms for detecting errors and recovering from them, the system may recognise that its state is in some way incorrect. The error recovery process will then attempt to bring the system back to the original error-free state. If the recovery fails or is incomplete in some way, the system will still contain erroneous values.

Fault injection is the process of introducing artificial faults into the system (artificial in the sense that they are explicitly placed at known locations). This is indicated in Figure 1.1 by the flash pointing at the Fault state. Error injection is the process of inserting undetected errors into the system. This is indicated by the flash pointing at the Undetected error state in Figure 1.1.

Injection experiments, as a means of dependability validation, address the actions of fault removal and fault forecasting. In the case of fault removal, injection experiments can be used for detecting defects in the detection and recovery mechanisms. In the case of fault forecasting, injection experiments can be used for obtaining a probabilistic view of the system behaviour under the influence of faults and errors, e.g. various coverage values.


There are a number of different ways to artificially inject faults and errors into a system, and the injected faults and errors may be of different kinds, e.g. physical faults, design faults, etc. [Arlat90][Arlat93][Chillarege89][Iyer95][Karlsson91]. Many tools for conducting injection experiments have been presented over the years. Some examples of injection tools are XCEPTION [Carreira95], FERRARI [Kanawati95], DOCTOR [Han95], MEFISTO [Jenn94], FIMBUL [Folkesson98], FIAT [Segall88], HYBRID [Young92], FAUST [Hudak93], FINE [Kao94], FTAPE [Tsai96] and FIC3 [Christmansson97].

The Propagation Analysis Environment (PROPANE) is a logging and injection tool for use in dependability validation of software in embedded control systems. PROPANE supports various ways of probing a system, i.e. logging internal variables and events during system operation, as well as ways of injecting software faults and data errors. Although the tool is designed with software for embedded systems in mind, the fact that it is target system independent makes it useful for a range of applications. Examples of other studies with injection of software faults are found in [Kao94], [Hudak93] and [Christmansson98].

Fault injections are performed by instrumenting the source code with both the correct code and the defect that is to be injected. With every fault, there is a fault activation check, which decides whether the faulty code or the correct code is to be executed. All inserted faults are inactive by default; the experiment descriptions used for setting up the PROPANE system then specify which faults are to be activated. Error injections are performed using predefined error types (defined individually for each target system) and predefined locations in the software (equivalent to software traps); the experiment descriptions then specify combinations of locations and error types. When the specified locations are reached during system operation, the specified errors are injected.

PROPANE is mainly designed for use with target system simulations in which the real-time requirements are lifted, i.e. where actual (real-world) execution times are of no concern. This means that, from the simulated target system's point of view, all injections conducted by PROPANE are instantaneous. However, given the general nature of PROPANE, it may be used for a broader range of applications; in practice, the limit is largely the imagination of the user.

The rest of this report is structured as follows: Chapter 2 contains a brief description of the target system model intended for PROPANE. Chapter 3 gives an overview of the PROPANE tool and the basic work process involved in using it. Chapter 4 describes in detail the various steps required to successfully set up the PROPANE tool and instrument the target system software. Chapter 5 describes how injection campaigns are executed and how the logs and readouts generated during experiment execution are formatted. Chapter 6 describes how the obtained data is analysed. Chapter 7 describes the internal architecture of the PROPANE tool. A summary is provided in chapter 8.

2. Target system model

This section describes the main types of target system that the PROPANE tool was designed for. However, since the tool is in part a function library, the range of systems and applications it may be used on is likely broader than is described here.

PROPANE is designed with dependability validation of software for single-node embedded control systems in mind. However, this single node may be just one node in a set of nodes in a distributed system. A software module performs computations using the provided inputs to generate the outputs. At the lowest level, such a black-box module may be a procedure or a function, but could also conceptually be a basic block or a particular code fragment within a procedure or function (at a finer level of software abstraction). A number of such modules constitute a system, and they are linked to each other via signals, much like signal pathways between hardware components on a circuit board. Of course, this system may be seen as a larger component or module in an even larger system. Signals can originate internally from a module, e.g., as a calculation result, or externally from the hardware itself, e.g., a sensor reading from a register in an A/D-converter. The destination of a signal may also be internal, being part of the input set of a module, or external, as for example the value placed in a hardware register for physical transmission or D/A-conversion.

Software constructed according to the above is found in numerous embedded systems. For example, most control applications controlling physical processes, such as the systems found in automobiles, are traditionally built up in this way. Our studies mainly focus on software developed for embedded systems in consumer products, i.e., high-volume and low-production-cost systems.

The target systems that PROPANE is aimed at are mainly scheduled using so-called slot-based scheduling. That is, the application consists of a number of software modules/functions that are executed in a slot-based schedule. A typical slot-based system is constructed around, for example, 1 ms slots. In each slot, a number of modules/functions are called which perform parts of the service the system is to provide. This type of scheduling prohibits pre-emption between tasks, but on the other hand, there is no need for a real-time kernel.
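As an illustration of this execution model, the sketch below shows what a minimal slot-based main loop might look like. It is not part of PROPANE; the slot assignment, the module functions and the timer primitive wait_for_next_slot are invented for the example.

/* Minimal sketch of a slot-based schedule (hypothetical module names). */
#define NO_OF_SLOTS 4

/* Hypothetical application modules; each must finish well within one slot. */
extern void read_sensors(void);
extern void compute_control(void);
extern void write_actuators(void);
extern void background_tasks(void);

/* Hypothetical timer primitive that blocks until the next 1 ms slot starts. */
extern void wait_for_next_slot(void);

void scheduler_loop(void)
{
    unsigned int slot = 0;

    for (;;) {
        wait_for_next_slot();

        /* Each slot calls a fixed set of modules; there is no pre-emption,
         * so no real-time kernel is needed. */
        switch (slot) {
        case 0: read_sensors();     break;
        case 1: compute_control();  break;
        case 2: write_actuators();  break;
        case 3: background_tasks(); break;
        }

        slot = (slot + 1) % NO_OF_SLOTS;
    }
}

Because each slot simply runs its modules to completion, a probe or injection call placed inside one of these modules executes inline, which fits the assumption that injections are instantaneous from the target system's point of view.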

3. PROPANE overview

This section provides an overview of how the PROPANE tool is structured and how it is used.

3.1 Basic system structure

The PROPANE tool is a desktop tool (currently only available in a Windows implementation) and consists of a Campaign Driver, a Function Library (as illustrated in Figure 3.1), and a Data Extractor. The Function Library is used by the target system to gain access to the probing and injection functionality of PROPANE and is written in the C programming language. The Campaign Driver is responsible for handling the actual execution of experiments and is in a sense the main administrator of the PROPANE tool. It has a man-machine interface by which the user can control and follow the experiments. The Data Extractor may be used during analysis to extract specific data from the experiment readout files. The Campaign Driver and the Function Library are highly integrated with each other, whereas the Data Extractor is a separate tool. Of course, other tools may be constructed that are tailor-fit for the investigation at hand. The Campaign Driver calls the target executable to run experiments.

[Figure 3.1 shows the PROPANE Campaign Driver reading setup files and writing log files, and the target system (linked with the PROPANE Library and an environment simulator) producing readout files.]

Figure 3.1. An overview of PROPANE together with a target system and environment simulator. The Data Extractor is a separate tool and is not shown.

The basic structure of how the tool is set up and what results it generates closely resembles that of the FIC3 injection tool previously developed at our department [Christmansson97], but a description is provided here nonetheless.

The tool is set up with a number of description files. There are three types of description files: database description files, campaign description files, and experiment description files. A database is a set of campaigns, and a campaign is in turn a set of experiments. More on the description files is found in section 4 below. During the execution of the experiments, log files and readout files will be created. The log files contain information regarding the execution of the experiments, and the readout files contain the data obtained by the inserted probes and the performed injections. More information on the log files and on the readout files is found in section 5. Since the environment simulator is designed by the user of the PROPANE tool, it may or may not use description files and may or may not create log files and/or readout files.

For each experiment specified in the description files, the Campaign Driver spawns a new process running an executable file containing everything necessary for conducting one experiment. This executable contains the PROPANE library performing the actual injection of errors and logging of variables. The executable also has to contain everything necessary to run the target system and the environment simulator. The reason for running experiments in individual processes is twofold:

• Parallel execution may shorten the total time for executing the experiments. This is especially true for systems that have so much computing capacity that there is a lot of idle time during the execution of a single experiment, primarily multi-processor systems.

• Every experiment begins execution from the same surrounding conditions. This ensures that there are no residual artefacts that make one experiment affect another experiment.

3.2 Working with PROPANE

The typical work process when using PROPANE can be divided into three phases (as illustrated in Figure 3.2): 1) the setup phase, 2) the injection phase, and 3) the analysis phase.

[Figure 3.2 shows the SETUP phase taking the original target system, fault and error data, and usage profile data and producing an instrumented target system and description files; the INJECTION phase producing log files and readout files; and these feeding the ANALYSIS phase.]

Figure 3.2. The basic working process when using PROPANE

In the setup phase, the input data is used to generate description files and an instrumented target system. The input data includes the original source code of the target system, information on how faults and/or errors are distributed and what they typically look like, and information on how the target system is used. The fault and error information is used to decide which faults and errors to inject during the experiments, and the usage information is used to generate test cases that exercise the target system with a realistic operational profile. During the setup phase it is also decided which probes are to be inserted into the target system, i.e. which variables and events are to be traced and recorded during the experiments. The output of this phase is a set of description files and the instrumented target system. The description files contain information on which faults are to be injected, which errors are to be injected at which locations, and which test cases are to be used by the environment simulator during the execution. The instrumented target system contains injection locations (which may be regarded as high-level software traps) and probes logging the desired variables and events.

During the injection phase, the PROPANE Campaign Driver is set up with the description files generated in the setup phase. During the actual execution of the experiments, the Campaign Driver calls the instrumented target system and generates readout files containing detailed information on the results of the experiments. During the experiments, the specified faults and/or errors are injected and the specified variables and events are logged. Log files are also generated, recording the actions of the PROPANE tool itself. The readout files generated in the injection phase are then analysed in the analysis phase to generate measures for the target system. These measures may include coverage values, propagation information, etc. One part of the analysis may be to compare traces from two different runs with each other (e.g., compare a golden run with an injection run). PROPANE contains the Data Extractor, which generates a set of data files containing specific extracted data, such as Golden Run comparisons and injection information.
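As a hedged illustration of the golden-run comparison idea (not the actual PROPANE Data Extractor), the sketch below compares an injection-run trace with a golden-run trace sample by sample and reports the first point of divergence. It assumes both traces have already been parsed from the readout files into arrays of equal length; the Sample structure is invented.

#include <stdio.h>

/* Hypothetical in-memory representation of one logged sample. */
typedef struct {
    unsigned long time;   /* PROPANE timer value when the sample was logged */
    long          value;  /* logged value of the probed variable            */
} Sample;

/* Compare an injection-run trace against a golden-run trace.
 * Returns the number of differing samples and reports the first mismatch. */
unsigned int compare_traces(const Sample *golden, const Sample *run, unsigned int n)
{
    unsigned int diffs = 0;
    int first_reported = 0;

    for (unsigned int i = 0; i < n; i++) {
        if (golden[i].value != run[i].value) {
            if (!first_reported) {
                printf("First divergence at sample %u (time %lu): %ld vs %ld\n",
                       i, run[i].time, golden[i].value, run[i].value);
                first_reported = 1;
            }
            diffs++;
        }
    }
    return diffs;
}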

4. PROPANE setup and target system instrumentation

Setting up an experiment in PROPANE includes the following steps:

1. selecting faults and/or errors to inject (f/e);
2. selecting variables and events to log (p);
3. selecting test cases for the environment simulator (u);
4. instrumenting target system source code; and
5. generating description files.

Figure 4.1 illustrates the necessary data that must be specified for an experiment. One experiment requires faults (f) and/or errors (e), test cases (u) and probes (p).

[Figure 4.1 shows fault data (f), error data (e), usage profile data (u), probes (p) and the experiment objectives being combined into a campaign description and a set of experiment descriptions.]

Figure 4.1. Setup phase, inputs and outputs

4.1 Faults and errors

Before any faults can be selected for injection, one must know which faults can possibly occur in the target system. Producing a set of possible faults may be done in several ways. The quality of the results obtained from the experiments may depend on how well this set of possible faults, Fp, represents the real-world faults that may occur in the target system. The set of possible faults is actually a subset of the entire set of faults F, i.e. Fp ⊂ F. From the set of possible faults, one must select a fault set F* ⊆ Fp ⊂ F for injection in the target system. Each fault f ∈ F* is then manually inserted into the target system together with an activation clause. That is, every fault f is inserted into the same executable of the target system, and the experiment descriptions then specify which of these faults are to be activated during the execution of the experiment.

For errors, the situation is similar. There is an abstract set E containing all errors. Then there is a set Ep ⊂ E containing all the errors that the experimental environment is capable of reproducing. From this set, one must select an error set E* ⊆ Ep ⊂ E for use in the injection experiments.

The selection of errors in E* depends on what the objective of the error injections is. If the goal is to mimic the effects of faults, the errors must be selected using the set of selected faults F*. From the description of the faults in F*, parameters for the errors that are to be injected are obtained. Other goals may produce other errors. When the set of selected errors has been obtained, each error e ∈ E* is analysed for type and location. The error types are specified in the description files, and the locations are specified in the target system by means of indicators (high-level software traps) which tell the PROPANE library when the execution has reached a certain location. Upon reaching an indicated location, PROPANE injects any errors specified for injection at that location.

Faults may be selected in a variety of ways; the PROPANE tool does not impose any particular method. In our previous work, we have developed a tool called the C Fault Locator (CFL). Given a C source file, CFL locates all lines that may be modified to contain a fault according to some fault classification. We have chosen to let CFL look for a subset of faults from the Orthogonal Defect Classification (ODC) [Chillarege92]. The fault classes we have chosen to look for are those of Assignment (A), Interface (I), and Checking (C). More details on CFL and on how faults are selected are found in [Christmansson97].

Error selection can also be made in a variety of ways. One way is to let errors mimic the effects of faults. This eliminates the task of physically instrumenting the code, since the system behaviour may be the same as for faults. However, it may be difficult to mimic every type of fault by injecting errors. For example, faults directly affecting the control flow of a program may be hard to mimic using only error injection into variables (as was the case in [Christmansson98]). Errors may also be selected in such a way that they resemble intermittent hardware faults or stuck-at faults.
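As a hypothetical illustration of what faults from these ODC classes may look like at the source level, the fragment below contains an Assignment fault (wrong constant) and a Checking fault (wrong relational operator). The function and variable names are invented and the fragment is not taken from any real target system.

/* Hypothetical fragment illustrating two ODC fault classes.
 * The names (pressure, pressure_limit, close_valve) are invented. */
extern void close_valve(void);

void check_pressure(int pressure)
{
    int pressure_limit;

    /* Assignment fault: correct code would be pressure_limit = 120; */
    pressure_limit = 20;

    /* Checking fault: correct code would use '>' instead of '>='. */
    if( pressure >= pressure_limit )
    {
        close_valve();
    }
}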

4.2 Probes

The probes to be used in the experiments are dictated by the objectives of the experiments. That is, the probes must be selected so that the data necessary for obtaining the desired measures in the analysis phase is collected. The target system may be instrumented with all sorts of probes; the description files can then be used to activate only those probes that are relevant for each individual experiment. PROPANE supports two different kinds of probes: variable probes and event probes. The variable probes are used for logging the value of a variable, and the event probes are used for logging the occurrences of events. The target system must be instrumented with the probes.

4.3 Test cases

The test cases that are to be used for the experiments are highly application specific and depend on the target system as well as the intended operational environment. Since the environment simulator is developed separately from the PROPANE tool, the parameters included in a test case may differ between target systems. Generally, it is important to obtain a set of test cases that closely resembles the intended usage profile of the target system. Using test cases that are not representative of how the target system is used in reality may decrease the value of the obtained results.

4.4 Target system instrumentation

In order to make use of the support for probes and injections provided by the PROPANE tool, the target system must be instrumented. Instrumenting a target system includes: 1) inserting probes to log variables and events, 2) inserting faults and fault activation checks for fault injection, and 3) inserting injection locations (a form of high-level software traps) for error injection. In addition, the PROPANE Library must be linked together with the target system software.


[Figure 4.2 shows the PROPANE configuration source file and the instrumented target source files being compiled into object files, which are then linked with the PROPANE Library to form the instrumented target system executable.]

Figure 4.2. The basic work flow of target system instrumentation

An illustration of the work required for constructing an instrumented target system executable is found in Figure 4.2. The PROPANE configuration source file contains information needed by the PROPANE Library, such as probes, faults, locations, etc. This information is constant and will remain the same between different experiments. The PROPANE configuration source file and the instrumented target source files are compiled and linked together with the PROPANE Library to form the instrumented target system executable. This executable is then used by the PROPANE Campaign Driver when conducting experiments. This section describes how to construct the PROPANE configuration file and how to instrument the target system source code.

4.4.1 Instrumenting for probes

In order to be able to log variables and events, probes must be inserted into the target system. Probes are inserted in two steps. First, the required probes must be set up in the PROPANE Library configuration source file. Then, special calls to the library must be made from the application to actually log the variable or event. If variable probes are used, the configuration source file must contain the data items shown in Figure 4.3 below.

/* This variable contains the number of defined probes */
unsigned int propane_no_of_log_vars = n;

/* This array contains information about the defined probes */
PROPANELogVarInfo propane_log_var_info[n] = {
    /* Each probe must be defined with a line as shown below. */
    { handle, type, name, size, channel },
    { handle, type, name, size, channel },
    ...
};

/* This array is used by the library during run-time */
PROPANELogVarData propane_log_var_data[n];

Figure 4.3. Constants and structures for variable probes.

The variable propane_no_of_log_vars holds the number of defined probes in the variable probe information and data arrays. It is very important that the arrays contain exactly the number of entries as specified in this variable. The array contains information about the defined probes that will not change over time. For every probe that is to be inserted into the system, there must be one entry containing the following information:

• handle
• probe type
• name
• size (size of logged data in bytes – only used for probes of type PROPANE_AREA)
• channel

The channel will be created automatically during the PROPANE Library setup phase and does not need to be entered in the configuration source file. The handle is an integer value and is used in the call made from the target system. The handle must be equal to the index in the array, i.e. the first probe must have handle 0, the second handle 1, and so on; the nth probe must have handle (n-1). It is a good idea to create a pre-processor constant for each handle and then use that constant in the function calls in the target system. The type of the probe indicates the type of the variable that is logged by that probe. The type may be one of the following:

• PROPANE_CHAR – the variable is of type char.
• PROPANE_UCHAR – the variable is of type unsigned char.
• PROPANE_SHORT – the variable is of type short.
• PROPANE_USHORT – the variable is of type unsigned short.
• PROPANE_INT – the variable is of type int.
• PROPANE_UINT – the variable is of type unsigned int.
• PROPANE_LONG – the variable is of type long.
• PROPANE_ULONG – the variable is of type unsigned long.
• PROPANE_FLOAT – the variable is of type float.
• PROPANE_DOUBLE – the variable is of type double.
• PROPANE_AREA – the variable is a pointer to a memory area. This is a special type which, for instance, may be used for injecting errors into random locations in the memory areas of the target software.

The name of the probe may be any normal C-string not containing white space. This name must then be used in the PROPANE setup files (described below) to activate a probe during experiment execution. The size value is only required for probes of type PROPANE_AREA. This value indicates the size (in bytes) of the area. For all other probe types, the size will be calculated automatically since they are standard data types. Figure 4.4 shows an example of how variable probes are defined in the configuration source file.

/* We have three probes. This is entered in the
 * configuration source file. */
#define PROBE_SETVALUE (0)
#define PROBE_ISVALUE  (1)
#define PROBE_T_ARRAY  (2)

unsigned int propane_no_of_log_vars = 3;

PROPANELogVarInfo propane_log_var_info[3] = {
    { PROBE_SETVALUE, PROPANE_INT,  "SetValue", 0,                   "" },
    { PROBE_ISVALUE,  PROPANE_INT,  "InValue",  0,                   "" },
    { PROBE_T_ARRAY,  PROPANE_AREA, "t[]",      sizeof(Type_t) * 10, "" },
};

PROPANELogVarData propane_log_var_data[3];

Figure 4.4. Example of variable probes in the PROPANE configuration source.

When all the desired probes have been defined in the configuration source file, they must be inserted into the target system. The prototype for the function that must be called from the target system is shown in Figure 4.5 below:

PROPANEReturnCode propane_log_var( PROPANESignalID handle, void * value );

Figure 4.5. Function call for variable probes to be inserted into target source code.

The first parameter is the handle defined in the information structure above, and the second parameter is a pointer to the variable that is to be logged. The same function is used for all variable types. The function returns either PROPANE_OK if everything went fine, or PROPANE_FAILURE if something went wrong. Errors during this function call can only occur if the content of the probe information structure has been corrupted. Figure 4.6 below shows an example of how the instrumentation for variable probes may look in the source code of the target system.

/* This is how the function calls are made in the
 * target system source code. */
int SetValue;
PROPANEReturnCode probe_rc;
...
probe_rc = propane_log_var( PROBE_SETVALUE, &SetValue );
if( PROPANE_OK == probe_rc )
{
    /* Everything went fine! */
}
else
{
    /* Something went wrong! */
}

Figure 4.6. Example of target system instrumentation for variable probes.
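A probe of type PROPANE_AREA can presumably be logged with the same call by passing a pointer to the start of the memory area. The sketch below assumes the PROBE_T_ARRAY probe from Figure 4.4 and is an illustration rather than an excerpt from the PROPANE documentation.

/* Hedged sketch: logging an area probe (PROBE_T_ARRAY from Figure 4.4).
 * The second argument is assumed to be the start address of the logged area. */
Type_t t[10];
PROPANEReturnCode probe_rc;
...
probe_rc = propane_log_var( PROBE_T_ARRAY, t );
if( PROPANE_OK != probe_rc )
{
    /* The probe information structure has been corrupted. */
}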

Note that the steps above are not sufficient for getting a working variable probe during experiment execution; the probe must also be activated in the Experiment Description (see below).

The actions required for event probes are similar to those required for variable probes. Structures for information and for data have to be declared in the configuration source file, as illustrated in Figure 4.7 below.

/* This variable contains the number of defined probes */
unsigned int propane_no_of_log_events = n;

/* This array contains information about the defined probes */
PROPANELogEventInfo propane_log_event_info[n] = {
    /* Each probe must be defined with a line as shown below. */
    { handle, name, channel },
    { handle, name, channel },
    ...
};

/* This array is used by the library during run-time */
PROPANELogEventData propane_log_event_data[n];

Figure 4.7. Constants and structures for event probes.

The variable propane_no_of_log_events holds the number of defined probes in the event probe information and data arrays. It is very important that the arrays contain exactly the number of entries as specified in this variable. The array contains information about the defined probes that will not change over time. For every event probe that is to be inserted into the system, there must be one entry containing the following information: a handle, a name, and a channel. The only difference between variable probes and event probes from this point of view is that event probes do not have a type. That is, all events are considered to be type-less. The remaining information works as described for variable probes above.

Figure 4.8 shows an example of how event probes are defined in the configuration source file.

/* We have two probes. This is entered in the
 * configuration source file. */
#define PROBE_EA1_DETECT (0)
#define PROBE_EA2_DETECT (1)

unsigned int propane_no_of_log_events = 2;

PROPANELogEventInfo propane_log_event_info[2] = {
    { PROBE_EA1_DETECT, "EA1_detection", "" },
    { PROBE_EA2_DETECT, "EA2_detection", "" },
};

PROPANELogEventData propane_log_event_data[2];

Figure 4.8. Example of event probes in the PROPANE configuration source file.

When all the desired event probes have been defined in the configuration source file, they must be inserted into the target system. The prototype for the function that must be called from the target system is shown in Figure 4.9 below:

PROPANEReturnCode propane_log_event( PROPANEEventID handle );

Figure 4.9. Function call for event probes to be inserted into target source code.

The parameter is the handle defined in the information structure above. The function returns either PROPANE_OK if everything went fine, or PROPANE_FAILURE if something went wrong. Figure 4.10 below shows an example of how the instrumentation for event probes may look in the source code of the target system.

/* This is how the function calls are made in the
 * target system source code. */
PROPANEReturnCode probe_rc;
...
if( assertC( SetValue ) == ERROR_DETECTED )
{
    probe_rc = propane_log_event( PROBE_EA1_DETECT );
    if( PROPANE_OK == probe_rc )
    {
        /* Everything went fine! */
    }
    else
    {
        /* Something went wrong! */
    }
}

Figure 4.10. Example of target system instrumentation for event probes.

Note that the steps above are not sufficient for getting a working event probe during experiment execution; the probe must also be activated in the Experiment Description (see below).

4.4.2 Instrumenting for fault injection

Fault injection with PROPANE requires three tasks: 1) defining the faults in the PROPANE configuration source file, 2) inserting the faults and fault activation checks into the target source code, and 3) activating faults in the experiment description files. Since the injected faults are modifications of the actual source code of the target system, the only limitation on what a fault can actually be is set by the imagination of the experimenter. All faults are always present in the system at run-time in an inactive state, and the faults that are to be injected in an experiment are activated in the experiment description file. If fault injection is used, each fault must be specified in the PROPANE configuration source file in the data items shown in Figure 4.11 below.

/* This variable contains the number of defined faults */
unsigned int propane_no_of_faults = n;

/* This array contains info about the defined faults */
PROPANEFaultInfo propane_fault_info[n] = {
    /* Each fault must be defined with a line as shown below. */
    { handle, name, channel },
    { handle, name, channel },
    ...
};

Figure 4.11. Constants and structures for faults.

The variable propane_no_of_faults holds the number of defined faults in the fault information array. It is very important that the array contains exactly the number of entries as specified in this variable. The array propane_fault_info contains information about the defined faults that will not change over time. For every fault, there must be one entry containing the following information: a handle, a name, and a channel. The channel will be created automatically during the PROPANE Library setup phase and does not need to be entered in the configuration source file.

The handle is an integer value and is used in the call made from the target system for the fault activation check. The handle must be equal to the index in the array, i.e. the first fault must have handle 0, the second handle 1, and so on; the nth fault must have handle (n-1). It is a good idea to create a pre-processor constant for each handle and then use that constant in the function calls in the target system. The name of a fault may be any normal C-string not containing white space. This name must then be used in the PROPANE setup files (described below) when specifying the faults that shall be activated in an experiment. Figure 4.12 shows an example of how faults are defined in the configuration source file.

/* We have two faults. This is entered in the
 * configuration source file. */
#define FAULT_F001 (0)
#define FAULT_F002 (1)

unsigned int propane_no_of_faults = 2;

PROPANEFaultInfo propane_fault_info[2] = {
    { FAULT_F001, "Fault_f001", "" },
    { FAULT_F002, "Fault_f002", "" },
};

Figure 4.12. Example of fault declaration in the PROPANE configuration source file.

When all the desired faults have been defined in the configuration source file, the corresponding faulty code segments must be inserted into the target system. Each segment of code implementing a fault must be surrounded by a fault activation check. In order to check from the target system if a particular fault is activated, the function shown in Figure 4.13 is called.


PROPANEReturnCode propane_fault_is_active( PROPANEFaultID handle );

Figure 4.13. Function call for the fault activation check.

The parameter is the handle defined in the information structure above. The function returns (1u) if the specified fault is activated and (0u) if it is not (or if the specified fault does not exist). Figure 4.14 shows an example of how a fault is inserted into the target system and how the fault activation check is implemented.

/* This is how the function calls are made in the
 * target system source code. */
int SetValue;
PROPANEReturnCode probe_rc;
...
if( propane_fault_is_active( FAULT_F001 ) )
{
    /* Fault F001 is activated. Execute faulty code. */
    ...
}
else
{
    /* Fault F001 is not activated. Execute correct code. */
}

Figure 4.14. Example of target system instrumentation for fault injections.

Note that the steps above are not sufficient for injecting faults. The faults that are to be injected must also be activated in the Experiment Description (see below).

4.4.3 Instrumenting for error injection

Error injection with PROPANE requires the specification of three things: error type, error target, and injection location. Error types are specified in the Experiment Descriptions (see below). The error target is the variable (or rather the memory location) where the error is to be injected, and the injection location is where the injection itself is to be performed. By dividing an error injection into these three parts, the number of items that have to be specified can be reduced and the experimental freedom is increased. For example, if an error type is very common for many different error targets, it is sufficient to specify that error type once and use it for all of those targets. Error targets and injection locations are specified by instrumenting the source code of the target system. If error injection is used, the configuration source file must contain the data items shown in Figure 4.15 below.

/* This variable contains the number of defined locations */
unsigned int propane_no_of_locations = n;

/* This array contains info about the defined locations */
PROPANELocationInfo propane_location_info[n] = {
    /* Each location must be defined with a line as shown below. */
    { handle, name, filename, file pointer },
    { handle, name, filename, file pointer },
    ...
};

Figure 4.15. Constants and structures for injection locations.

The variable propane_no_of_locations holds the number of defined locations in the location information array. It is very important that the array contains exactly the number of entries as specified in this variable.

The array propane_location_info contains information about the defined locations that will not change over time. For every location, there must be one entry containing the following information: a handle, a name, a filename, and a file pointer. The filename and file pointer will be created automatically during the PROPANE Library setup phase and do not need to be entered in the configuration source file. The handle is an integer value and is used in the call made from the target system for error injection. The handle must be equal to the index in the array, i.e. the first location must have handle 0, the second handle 1, and so on; the nth location must have handle (n-1). It is a good idea to create a pre-processor constant for each handle and then use that constant in the function calls in the target system. The name of a location may be any normal C-string not containing white space. This name must then be used in the PROPANE setup files (described below) when specifying injections to be made during an experiment. Figure 4.16 shows an example of how injection locations are defined in the configuration source file.

/* We have two locations. This is entered in the
 * configuration source file. */
#define LOCATION_CALC    (0)
#define LOCATION_VALVREG (1)

unsigned int propane_no_of_locations = 2;

PROPANELocationInfo propane_location_info[2] = {
    { LOCATION_CALC,    "Location_CALC",    "", NULL },
    { LOCATION_VALVREG, "Location_VALVREG", "", NULL },
};

Figure 4.16. Example of locations in the PROPANE configuration source file.

When all the desired locations for error injection have been defined in the configuration source file, the corresponding high-level software traps must be inserted into the target system. The prototype for the error injection function is shown in Figure 4.17 below:

PROPANEReturnCode propane_inject( PROPANELocationID handle, void * value, PROPANEValueType type );

Figure 4.17. Function call for error injection.

The first parameter is the handle defined in the information structure above, the second parameter is a pointer to the error target, and the third parameter indicates the type of the error target. The type can take on the same values as the variable probes (see above). The function returns either PROPANE_OK if everything went fine, or PROPANE_FAILURE if something went wrong. The error injected at the location is specified in the Experiment Descriptions as described below. If several errors are defined for the same location, all errors will be injected with the same call to the injection function. Also, the same location may be used for several error targets. Figure 4.18 below shows an example of how target system instrumentation may look for error injection.


/* This is how the function calls are made in the
 * target system source code. */
int SetValue;
PROPANEReturnCode probe_rc;
...
probe_rc = propane_inject( LOCATION_CALC, &SetValue, PROPANE_INT );
if( PROPANE_OK == probe_rc )
{
    /* Everything went fine! */
}
else
{
    /* Something went wrong! */
}

Figure 4.18. Example of target system instrumentation for error injections.

Note that the steps above are not sufficient for injecting errors; the combinations of locations and error types (called injections) must also be specified in the Experiment Description (see below).

4.5 PROPANE and environment simulators

The environment simulator used in the experiments has to provide a function that the PROPANE Library calls during its setup phase to initialise the simulator. This function must have the following interface:

PROPANEReturnCode propane_setup_simulator( char * filename )

Figure 4.19. The interface of the external simulator setup function

The parameter passed to the function will be the name of the simulator setup file specified in the Experiment Description (see below for details). The function must return either PROPANE_OK if the simulators were successfully set up, or PROPANE_FAILURE if an error occurred during setup. All necessary types and constants are provided in the header file propane.h included in the PROPANE tool. There are no restrictions as to how the simulators are designed or how the simulator setup file is formatted. PROPANE only calls the provided function with a file name and expects a return code as described above.
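A minimal sketch of such a setup function is shown below. The file layout it reads (two numerical parameters) and the simulator state variables are invented for the example; only the function signature and the return codes follow the interface above.

#include <stdio.h>
#include "propane.h"   /* provides PROPANEReturnCode, PROPANE_OK, PROPANE_FAILURE */

/* Hypothetical simulator state set up from the file named in the
 * Experiment Description. */
static double sim_initial_speed;
static double sim_initial_pressure;

PROPANEReturnCode propane_setup_simulator( char * filename )
{
    FILE *fp = fopen( filename, "r" );

    if( fp == NULL )
    {
        return PROPANE_FAILURE;
    }

    /* Invented file layout: two numerical parameters separated by whitespace. */
    if( fscanf( fp, "%lf %lf", &sim_initial_speed, &sim_initial_pressure ) != 2 )
    {
        fclose( fp );
        return PROPANE_FAILURE;
    }

    fclose( fp );
    return PROPANE_OK;
}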

4.6 File formats

The files used when setting up PROPANE for injection campaigns are organised as illustrated in Figure 4.20 below. At the top level, we have the Database Description containing information on where the remaining setup files are found and where the obtained readouts are to be stored. The Database Description also contains a listing of the campaigns that make up the database. Each campaign has a Campaign Description containing information regarding the execution of the experiments, such as the name of the executable file that shall be used and a listing of the experiments that make up the campaign. For each experiment, there is an Experiment Description containing details for the experiment, such as which probes shall be activated, which injections shall be performed, and which setup file is to be used for the configuration of the environment simulator. The simulator setup files may be different for each environment simulator and are specified by the designer of the simulator; they are not part of the PROPANE tool and are therefore not described here.


[Figure 4.20 shows a Database Description referring to one or more Campaign Descriptions, each Campaign Description referring to one or more Experiment Descriptions, and each Experiment Description referring to a Simulator setup file.]

Figure 4.20. Organisation of PROPANE setup files

The description files are ASCII text files and contain a set of parameters, one parameter per line. If several parameters are specified on the same line in the file, only the first one will be recognised. All commands start with a ‘>’ in the leftmost position of the line followed by the parameter name. All lines not starting with a ‘>’ are ignored.

4.6.1 Database Description

The database description file has the format shown in Figure 4.21 below.

>database <database id>
>work directory <directory>
>readout directory <directory>
>campaign <campaign description file>



Figure 4.21. The file format of the Database Description.

The database parameter is a numerical value used for giving different databases individual identifications. The work directory parameter tells PROPANE where all the remaining setup files can be found. This is also where all log files will be stored. The readout directory tells PROPANE where to store all obtained readouts. These three parameters must be specified before any campaigns can be specified. A campaign that is to be included in the database is specified with the parameter campaign. After the parameter name comes the filename of the campaign description file for that campaign. PROPANE will look for the specified file in the directory specified by the parameter work directory. The file extension used for Database Descriptions is .pdd.
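For illustration, a small Database Description might look as follows; the directory paths and campaign file names are invented:

This line does not start with '>' and is therefore ignored.
>database 1
>work directory C:\propane\setup
>readout directory C:\propane\readouts
>campaign campaign_01.pcd
>campaign campaign_02.pcd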

4.6.2 Campaign Description

The campaign description file has the format shown in Figure 4.22 below.

>campaign <campaign id>
>executable <executable file>
>execution width <number of parallel processes>
>experiment <experiment description file>
>use <campaign description file>



Figure 4.22. The file format of the Campaign Description.

The campaign parameter is a numerical value used for giving different campaigns individual identifications. The executable parameter tells PROPANE which file to use as the executable file for the experiments. The execution width tells PROPANE how many individual processes for experiment execution to spawn at the same time, i.e. this number is how many experiments will be started in parallel.

On fast computers, this will reduce the total amount of time required for executing the entire campaign. These three parameters must be specified before any experiments can be specified. An experiment that is to be included in the campaign is specified with the parameter experiment. After the parameter name comes the filename of the experiment description file for that experiment. PROPANE will look for the specified file in the work directory specified in the database description file. The use parameter tells PROPANE to use the specified file in the campaign setup process, i.e. it is similar to the #include pre-processor directive in C. This file must have the same format as described here. The file extension used for Campaign Descriptions is .pcd.
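A corresponding hypothetical Campaign Description could look as follows (the executable and experiment file names are invented):

>campaign 1
>executable target_experiment.exe
>execution width 4
>experiment exp_0001.pxd
>experiment exp_0002.pxd
>use common_experiments.pcd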

4.6.3 Experiment Description

The experiment description file has the format shown in Figure 4.23 below.

>experiment <experiment id>
>simulator <simulator setup file>
>probe (<probe name>|ALL)
>error <name> <error type> <param 1> <param 2>
>error injection <location> <error name> <trigger> <param>
>fault injection <fault name>
>use <experiment description file>

Figure 4.23. The file format of the Experiment Description.

The experiment parameter is a numerical value used for giving different experiments individual identifications. The simulator parameter tells PROPANE which file to use as the setup file for the environment simulators. PROPANE uses an externally provided simulator setup function, which is called during experiment setup. The name of the simulator setup file is passed to this function as a parameter.

A probe in the target system is activated with the parameter probe. Each probe may be activated individually with the command probe <probe name>, or all probes may be activated with a single command, probe ALL. The probes must be inserted in the target system manually and entered in the PROPANE configuration source file (as described earlier). Although variable probes and event probes look slightly different in the configuration source file, the activation looks identical. The probe name must be the name specified in the PROPANE configuration source file.

An error type is specified with the parameter error. An error type is specified with a number of information fields. The <name> is the alphanumeric name of the error type and follows the specifications of C-strings. The <error type> can be one of the following:

• E_SETMIN – set the error target to the minimum value that the type of the target in question can hold. This error type does not take any parameters.
• E_SETMAX – set the error target to the maximum value that the type of the target in question can hold. This error type does not take any parameters.
• E_SETVALUE – set the error target to a constant value. The constant value is specified in <param 1>. <param 2> is not used.
• E_FACTOR – multiply the value of the error target with a factor specified in <param 1>. <param 2> is not used.
• E_OFFSET – add an offset to the error target. The offset is specified in <param 1>. <param 2> is not used.
• E_FACTOR_AND_OFFSET – first multiply the value of the error target with a factor and then add an offset. The factor is specified in <param 1> and the offset is specified in <param 2>.
• E_OFFSET_AND_FACTOR – first add an offset to the error target and then multiply with a factor. The offset is specified in <param 1> and the factor is specified in <param 2>.
• E_BITFLIP – flip bits in the bit-vector representation of the error target. The bits to flip are specified as a bit-mask in <param 1>. <param 2> is not used.
• E_BITSET – set bits in the bit-vector representation of the error target. The bits to set are specified as a bit-mask in <param 1>. <param 2> is not used.
• E_BITCLEAR – clear bits in the bit-vector representation of the error target. The bits to clear are specified as a bit-mask in <param 1>. <param 2> is not used.
• E_SETVALUE_A – the equivalent of E_SETVALUE but for variables of type PROPANE_AREA. The value specified in <param 1> is the offset from the start of the area, and <param 2> is the value to which the memory location shall be set. The start of the area is specified in the call to the injection function (as described in section 4.4).
• E_BITFLIP_A – the equivalent of E_BITFLIP but for variables of type PROPANE_AREA. The value specified in <param 1> is the offset from the start of the area, and <param 2> contains a bit-mask indicating which bits to flip. The start of the area is specified in the call to the injection function (as described in section 4.4).
• E_BITSET_A – the equivalent of E_BITSET but for variables of type PROPANE_AREA. The value specified in <param 1> is the offset from the start of the area, and <param 2> contains a bit-mask indicating which bits to set. The start of the area is specified in the call to the injection function (as described in section 4.4).
• E_BITCLEAR_A – the equivalent of E_BITCLEAR but for variables of type PROPANE_AREA. The value specified in <param 1> is the offset from the start of the area, and <param 2> contains a bit-mask indicating which bits to clear. The start of the area is specified in the call to the injection function (as described in section 4.4).
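As a worked illustration of the parameters (the error names and the exact field layout are invented; the actual syntax may differ), the following error definitions would, for a target value v, produce 2*v + 10, (v + 10)*2, and a flip of bit 2, respectively:

>error DoubleThenShift E_FACTOR_AND_OFFSET 2 10
>error ShiftThenDouble E_OFFSET_AND_FACTOR 10 2
>error FlipBit2 E_BITFLIP 0x0004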

The parameter error injection specifies an actual injection to be performed during the execution of an experiment. An injection is specified with a number of information fields. The specifies where the error is to be injected. The location must be entered in the PROPANE configuration source file (as described earlier). The name used here is the name string specified in the source file. The is the name of an error type specified as described above. The can be one of the following: •

• I_ALWAYS – the error is injected every time the specified location is reached. The information field is not used.

• I_ONCE_TIME – the error will be injected once at the specified location when the PROPANE timer reaches the value specified in the information field.

• I_ONCE_CYCLE – the error will be injected once at the specified location when that location has been reached a certain number of times. This number is specified in the information field.

• I_PERIOD_TIME – the error will be injected periodically at the specified location with a period specified in the information field. The period is counted on the PROPANE timer. The first injection is made the first time the PROPANE timer reaches the period value.

• I_PERIOD_CYCLE – the error will be injected periodically at the specified location with a period specified in the information field. The period is counted as the number of times the location is reached. The first injection is made when the location has been reached as many times as the period value specifies.



• I_PROBABILITY – the error will be injected at the specified location with the probability specified in the information field. The PROPANE library uses the built-in random number generator of the C Standard Library (srand() and rand()) to calculate a probability value for comparison with the specified probability (a sketch of such a trigger check is given below).

The parameter fault injection specifies a fault that shall be activated. The code that makes up the fault will be executed during the experiment when the corresponding fault activation check is reached. The fault name must be a name specified in the PROPANE configuration source file. Of course, several faults may be activated simultaneously in one experiment.

The use parameter tells PROPANE to use the specified file in the experiment setup process, i.e. it is similar to the #include pre-processor directive in C. The included file must have the same format as described here. This way of including experiment description files is useful if there are many experiments which have, for example, the same probes activated. In that case, only one file containing all the necessary >probe parameters is needed; this file is then included with the >use parameter in all experiments which have the same active probes. The same can be done with error types and injections. The file extension used for Experiment Descriptions is .pxd.
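As a hedged illustration of the cycle- and probability-based triggers described above, the C sketch below shows one way such decisions could be evaluated. It is not the actual PROPANE implementation; all names are invented for the example, and only rand()/RAND_MAX come from the C Standard Library as mentioned in the text.

#include <stdlib.h>

/* One possible evaluation of an I_PERIOD_CYCLE trigger: inject every
   'period' times the instrumented location is reached. */
static int period_cycle_trigger(unsigned long *reach_count, unsigned long period)
{
    ++*reach_count;
    return (*reach_count % period) == 0;
}

/* One possible evaluation of an I_PROBABILITY trigger: draw a value in
   [0, 1) with rand() and compare it to the configured probability. */
static int probability_trigger(double probability)
{
    double draw = (double)rand() / ((double)RAND_MAX + 1.0);
    return draw < probability;
}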

5. Executing experiments

When all preparations for running an experiment have been completed it is time to enter the injection phase. During the execution of the experiments, a number of log files and readout files will be generated. For every description file – Database Descriptions, Campaign Descriptions, and Experiment Descriptions – there is a corresponding log file and a corresponding readout file. The log files contain records of the actions performed by the PROPANE tool, including error messages if anything should go wrong during the execution of the experiments. The readout files contain data obtained from probes, injections and simulators, and form the input for the analysis phase.

5.1 Man-Machine Interface

The interface of the Campaign Driver is a simple console-based interface. The interface contains a number of menus and information displays, which will guide the user through the injection phase. The main menu, which is displayed upon starting the Campaign Driver, is shown in Figure 5.1.

==============================================================================
PROPANE Campaign Driver / MAIN MENU
==============================================================================

L) Load database
R) Run campaign(s)
A) About PROPANE
E) Exit

Current database: (n campaign(s))

Enter your choice:

Figure 5.1. The main menu of the Campaign Driver

The menu provides four options the user may choose:
• Load a Database Description;
• Run one or more campaigns;
• View an information display about the PROPANE tool; and
• Exit the PROPANE Campaign Driver.

The main menu also displays the Database Description currently loaded in the PROPANE tool. If no Database Description is loaded, the line will not be seen on the display. When a Database Description has been loaded, it is time to proceed to the menu for controlling campaigns (see Figure 5.2) by choosing Run campaign(s).

==============================================================================
PROPANE Campaign Driver / RUN CAMPAIGNS
==============================================================================

A) All campaigns
S) Select campaigns
B) Back to main menu

Enter your choice:

Figure 5.2. The menu for running campaigns

This menu is the main starting point for actually executing the experiments in the campaigns. The options here are:
• run all campaigns in the database;
• select some of the campaigns in the database and run them; and
• go back to the main menu.

If only a selection of the available campaigns is to be run, the campaign selection menu is displayed (see Figure 5.3).

==============================================================================
PROPANE Campaign Driver / SELECT CAMPAIGNS
==============================================================================
 1. [ ]
 2. [ ]
 3. [ ]
 4. [ ]
 5. [ ]
 6. [ ]
 7. [ ]
 8. [ ]
 9. [ ]
10. [ ]
==============================================================================
N = Next | P = Prev | (-) = Toggle single/range | C = Cancel | R = Run
==============================================================================
Enter your choice:

Figure 5.3. Selecting campaigns to run.

In the campaign selection display, 10 campaigns are shown at once. If there are more campaigns than can be displayed on the screen, it is possible to browse through the campaign list. Browsing forward is done by choosing n or N, and browsing backward is done by choosing p or P. The selection status of campaigns can be toggled by entering their numbers in the list shown on the screen. A range of campaigns can be toggled by entering the range, e.g., for toggling the selection of campaigns 12 through 36 the range 12-36 is entered. If a campaign is selected, an asterisk (*) is shown in the brackets located between the number and the campaign name. The entire selection process can be cancelled by choosing c or C. When the selection process is completed, the selected campaigns are run by choosing r or R. When the actual execution of campaigns has started a combined information and menu display is shown on the screen (see Figure 5.4).


==============================================================================
PROPANE Campaign Driver / EXECUTION INFORMATION
==============================================================================
Database................: target.pdd (5 campaign(s))
Current campaign........: camp_1.pcd (1 of 5, 5 exp., w = 3)
Execution status........: RUNNING
Experiments.............: started 3, completed 0 (successful 0, failed 0)
Total elapsed time......: 0d 00:00:21  (avg 7 s/exp, 85 s/camp)
Campaign elapsed time...: 0d 00:00:21  (avg 7 s/exp)
Estimated time remaining: 0d 00:12:28
==============================================================================
U = Update info | P = Pause | C = Continue | A = Abort all | S = Skip campaign
==============================================================================

Figure 5.4. Running and controlling campaigns.

During the execution of the campaigns, information is displayed showing the progress of the experiments. The information is updated automatically every 5 seconds or when one of the options shown in the lower part of the screen is chosen. The execution of experiments can be paused and continued. There are also two levels of abortion: abort all or skip campaign. The former aborts all the campaigns selected for execution and the latter skips the current campaign and starts executing the next (if there is any). Since every experiment is executed in a separate process, an abortion will not be effective until the on-going experiments have been completed, i.e., until the individual processes have terminated.

5.2 Log files

The log files contain records of the actions performed by the PROPANE tool during the execution of experiments. These files can be used, for example, to get more information if any errors occurred during the execution of experiments. Recall that every description file used in the execution of experiments generates a log file. The name of the log file will be the same as that of the description file, but with another extension. A Database Description named database.pdd will generate a log file named database.dlog, a Campaign Description named campaign.pcd will generate a log file named campaign.clog, and an Experiment Description named experiment.pxd will generate a log file named experiment.xlog. All log files will be placed in the work directory specified in the Database Description.

There are two types of entries in a log file: normal entries describing completed actions, and error entries describing an error that occurred. The formats of the two entry types are shown in Figure 5.5.

Normal entry: @ {Date and time} >>> {OBJECT/ACTIVITY}: {Text}
Error entry:  @ {Date and time} >>> ERROR: {Text}

Figure 5.5. The formats of log entries.

In a log entry, the object/activity indicates the origin of the entry, i.e. either which internal object or which activity the entry comes from. An example of a campaign log file is shown in Figure 5.6.


#info: Campaign Log File. Created Wed Aug 02 10:02:49 2000
@Wed Aug 02 10:02:49 2000 >>> SETUP: Campaign ID set to [2003].
@Wed Aug 02 10:02:49 2000 >>> SETUP: Executable set to [target_bc10.exe].
@Wed Aug 02 10:02:49 2000 >>> ROFILE: Initialisation of campaign readout file complete.
@Wed Aug 02 10:02:49 2000 >>> SETUP: Added experiment description [exp_11.pxd].
@Wed Aug 02 10:02:49 2000 >>> SETUP: Added experiment description [exp_12.pxd].
...
@Wed Aug 02 10:04:12 2000 >>> CONTROLLER: Process for experiment [exp_14.pxd] completed successfully.
@Wed Aug 02 10:04:12 2000 >>> CONTROLLER: Process for experiment [exp_15.pxd] completed successfully.
@Wed Aug 02 10:04:12 2000 >>> CONTROLLER: Started 5 experiments. Successfully completed 5, failed 0.
@Wed Aug 02 10:04:12 2000 >>> CONTROLLER: Campaign elapsed time: 0d 00:01:23.
@Wed Aug 02 10:04:12 2000 >>> ROFILE: Shut down of campaign readout file complete.
#info: Campaign Log File closed Wed Aug 02 10:04:12 2000

Figure 5.6. Example of a campaign log file.

5.3 Readout files

Readout files are hierarchical, in the same manner as description files are. All generated readout files will be placed in the readout directory specified in the Database Description (see above). The organisation of readout files is illustrated in Figure 5.7 below.

Database readouts -> Campaign readouts (one per campaign) -> Experiment readouts (one per experiment)

Figure 5.7. Organisation of readout files

A database readout file corresponding to the description file database.pdd is named database.pdr. This file contains references to the campaign readout files of the campaigns in that database. A campaign readout file corresponding to the description file campaign.pcd is named campaign.pcr. This file in turn contains references to experiment readout files of the experiments in the campaign. A reference in these files has the format: >file {filename}. There is one reference per line.

The experiment readout files are named experiment.pxr and contain the actual data obtained from the probes, fault injections, error injections and possibly simulators (depending on the environment simulator). The data stored in the experiment readout files is structured in channels. Each channel contains data from a probe, an injection or the environment simulator. The names of the channels follow the format: channel-type.name. Here, channel-type is a variable probe (vp), an event probe (ep), a fault injection (fi), or an error injection (ei). The name is the name of the probe, the injected fault or the location of the error injection. This name is taken from the PROPANE configuration source file.

The environment simulators may also add channels to the experiment readout files. The channels should follow the naming convention used by the PROPANE library. The functions used by the environment simulator to add channels to the experiment readout files are shown in Figure 5.8.

21

PROPANEReturnCode propane_open_channel ( char * channel, PROPANEChannelType type );
PROPANEReturnCode propane_channel_info ( char * channel, char * info );
PROPANEReturnCode propane_channel_entry( char * channel, char * entry );

Figure 5.8. The functions for handling channels in the experiment readout files.

The function propane_open_channel() is used to open a channel in the experiment readout file. The parameters of this function are the name of the channel and the channel type. The environment simulator shall use the type CT_SIMULATOR – all other types are only used internally in the PROPANE library. The function propane_channel_info() can be used to add information about a channel to the readout file. The PROPANE library uses this function to add information regarding the formatting of the various channels it opens. This function takes the channel name and a string containing the channel information as parameters. The propane_channel_entry() function is used for adding readout data to the readout file. The parameters to this function are the channel name and the readout data that is to be stored.

Note: The functions shown in Figure 5.8 are only used during initialisation of PROPANE, i.e., during experiment setup. Thus, the only references to these functions that are not internal to the Function Library are those used by the environment simulator when setting up readouts of the simulation (if the simulators do not have readout files of their own).

The structures of the channels opened by the PROPANE library are shown in Figure 5.9.

// The file begins with a section containing information
// regarding the various channels, namely channel names and
// entry structures.
#info: Experiment Readout File. Created Wed Dec 06 11:36:14 2000

// Channels created by variable probes
#channel vp.<name> VARIABLE_PROBE
#info vp.<name> [time, value]

// Channels created by event probes
#channel ep.<name> EVENT_PROBE
#info ep.<name> [time]

// Channels created by error injections
#channel ei.<name> ERROR_INJECTION
#info ei.<name> [time, value before, error, value after]

// Channels created by fault injections
#channel fi.<name> FAULT_INJECTION
#info fi.<name> [time]

// The entries always come after the channel information section

Figure 5.9. The format of the channels in the experiment readout files.
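As an illustration of how an environment simulator might use the functions in Figure 5.8 to add a channel of its own, consider the following C sketch. Only the three function signatures and the CT_SIMULATOR type are taken from the report; the channel name, info string and entry data are made up for the example, and the PROPANE header that declares the functions is assumed to be included.

#include <stdio.h>
/* Assumes the PROPANE header declaring the functions in Figure 5.8. */

void simulator_setup_readout(void)
{
    /* Open a simulator channel (name chosen here for the example). */
    propane_open_channel("sim.throttle", CT_SIMULATOR);
    /* Describe the entry format of the channel. */
    propane_channel_info("sim.throttle", "[time, throttle value]");
}

void simulator_log_sample(unsigned long time, int throttle)
{
    char entry[64];
    /* Add one readout entry to the channel. */
    sprintf(entry, "%lu, %d", time, throttle);
    propane_channel_entry("sim.throttle", entry);
}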

The data stored in a variable-probe channel contains the global PROPANE timer when the value was captured, and the value. Each time a variable probe is called, there is a check to see whether the value has changed since the last call. If the value has changed, an entry is recorded in the variable-probe channel, otherwise nothing will be recorded. The data stored in an event-probe channel is only the PROPANE timer when the event probe is called. For event probes, every call will be recorded. Only the probes that have been activated in the experiment description file have a channel in the readout file.

The data stored in a channel for fault injection contains the global PROPANE timer when the fault was activated, i.e. when the fault activation check for that fault was called. Only the faults that have been activated in the experiment description have a channel in the readout file. The channel entries for error injections contain a number of data fields: the PROPANE global time, the value of the variable before the injection, the name of the injected error, and the value after the injection. An entry will be stored in the channel for each injected error.
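The change-detection behaviour of variable probes described above can be illustrated with a small C sketch. This is conceptual only; the types and names are chosen here, and the real probe implementation is part of the PROPANE Library.

/* Conceptual sketch: record a variable-probe entry only when the probed
   value differs from the value seen at the previous call. */
typedef struct {
    double last_value;   /* value recorded at the previous call        */
    int    has_last;     /* 0 until the first sample has been recorded */
} probe_state_t;

void variable_probe(probe_state_t *state, unsigned long propane_time, double value)
{
    if (state->has_last && state->last_value == value)
        return;                 /* unchanged: nothing is logged */
    /* changed (or first call): log [time, value] to the vp channel;
       the actual channel-writing call is omitted in this sketch. */
    state->last_value = value;
    state->has_last   = 1;
}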


6. Analysing data

The analysis phase of an injection experiment contains such things as estimating coverage values, extracting failure and error data, establishing propagation paths of the errors, etc. To facilitate some of the data required for these analyses, PROPANE contains a tool called the Data Extractor. With this tool, one is able to perform golden run comparisons, extract injection information, and create trace files which can easily be imported into spreadsheet programs, such as Microsoft Excel™. The tool is invoked in the directory containing the raw data in the readout files created during the execution of the experiments. Before the extraction begins, the tool will ask a number of questions, as illustrated in Figure 6.1. Here we give a short overview of the actions that can be performed by the tool; the details are described in the subsequent sections.

==============================================================================
PROPANE Data Extractor & Golden Run Comparator
==============================================================================
Database readout file: .pdr
Perform Golden Run Comparison?
Use error margins?
Error margin file:
Reading error margin data in [].
Write channel logs?
Write injection information?

Figure 6.1. Answering the questions asked by the Data Extractor

The database readout file is the overall readout file created during the execution of the experiments. This file contains all references necessary for the Data Extractor to find the remaining readout files.

When answering YES to the question regarding golden run comparison, the first campaign referenced in the database readout file will be treated as the golden run and all subsequent campaigns will be compared to that. The golden run comparison is performed at the probe level and only variable probes are considered. The trace of a variable probe in an experiment is compared to the trace of the corresponding variable probe in the golden run. The basic concept of golden run comparison is that the golden run is considered as a reference to which other runs are compared. Any mismatch between the golden run and the injection run is flagged as an error. However, in the Data Extractor one may choose to utilise error margins, which will flag an error during comparison only when the injection run value is too far off the golden run value. The details of golden run comparison are described in section 6.1.

Answering YES to the question regarding channel logs makes the Data Extractor create one file for each probe of each experiment. These files will be in a semicolon-delimited file format which is easily imported in most spreadsheet programs. These individual traces can be useful for illustrating certain events which may be of special interest. Bear in mind that, since each probe in each experiment generates one such file, this may generate a lot (really – a lot!) of files and will be rather time consuming.

When answering YES to the question regarding injection information, the Data Extractor will generate injection information files containing information on which errors were injected and when they were injected.

6.1 Golden Run Comparisons

A Golden Run Comparison (GRC) is performed by comparing the data logged for a so-called Golden Run (GR), i.e. a reference run, with the data logged in an injection run (IR), i.e. a run in which errors were actually injected. In the readout files created by PROPANE, all data are logged in channels – one channel for each probe or injection. Only the variable probes are used in the GRC.

The Data Extractor reads the references to campaign readout files specified in the database readout file and will treat the first referenced campaign as the GR campaign and all subsequent references as IR campaigns. It is important that the number of channels in each experiment in the GR campaign is the same as the number of channels in the IR campaigns. If this is not the case, the GRC will not be completed and error messages will be displayed.

During the comparison, each GR channel is compared to the corresponding IR channel. The comparison is performed sample by sample, and the first mismatch between the GR and the IR is flagged as an error. Only the first mismatch will be taken into account, and the GRC for that channel is aborted once the first error is detected. The comparison can either require identical values between GR channels and IR channels or make use of error margins (see section 6.1.2).

The results of the GRC will be written in extraction result files – one such file for each individual campaign (except for the GR campaign). The filename will be the same as for the campaign readout file, but the extension will be changed from .pcr to .per. So, a campaign readout file with the name ei_001_flip_b0.pcr will generate an extraction result file with the name ei_001_flip_b0.per.

The extraction result file will contain one line of information for each experiment in the campaign. This information includes, for each probe, the time stamp of the sample that was found not to match the golden run, the index (only applicable to probes of type PROPANE_AREA, where the index shows the number of bytes from the base address of the area where the error was found), the golden run value and the injection run value. The file format of extraction result files is shown in Figure 6.2.

// The file begins with a line containing information regarding the
// various fields that are found in the file.
//
// The first column contains the name of the experiment from which
// the data on that row were generated. Then follow a number of columns
// with data for each channel. There are four columns per channel. The
// names of the channels are used in the headings of the columns. In
// this example the channels are called C1 and C2.
//
// If no error is detected in a channel, the corresponding columns for
// that experiment will be set to -1.
experiment name;vp.C1.time;vp.C1.index;vp.C1.gr;vp.C1.inj;vp.C2.time;vp.C2.index;vp.C2.gr;vp.C2.inj;
ei_001_flip_b0_01;-1;-1;-1;-1;5075;0;44426;44551;
ei_001_flip_b0_02;-1;-1;-1;-1;6045;0;50067;50344;
ei_001_flip_b0_03;-1;-1;-1;-1;7102;0;320;321;

Figure 6.2. The format of the extraction result files

The following two subsections contain more details on how the GRC is performed by the PROPANE Data Extractor.

6.1.1 Virtual samples

The logging performed by the PROPANE variable probes is such that only changes in the variables that are being probed are actually entered into the readout files. This reduces the amount of data stored for each experiment, but requires some extra effort during analysis in order to recreate the "ignored" samples, since the kth sample of the GR channel may not correspond in time to the kth sample of the IR channel.

Consider the example in Figure 6.3; here we have the trace of a GR channel and the traces of two corresponding IR channels. There are two different scenarios that can arise when comparing the samples of two corresponding channels: I) two samples correspond temporally, and II) two samples do not correspond temporally. Scenario I is illustrated in Figure 6.3, where sample #4 of the GR channel (at time t3) corresponds temporally with sample #4 (also at time t3) of IR channel 2. In the same figure, scenario II is illustrated by sample #4 of the GR channel not corresponding temporally with sample #4 of IR channel 1.


(Figure: traces of the GR channel and IR channels 1 and 2 over times t0–t7, with a virtual sample marked and scenarios I and II indicated.)

Figure 6.3. An example of GRC performed by the PROPANE Data Extractor

In scenario I, the Golden Run Comparison is simply performed as a comparison between the values of the two samples. Scenario II, however, requires a little more work before a comparison can be made. Here a virtual sample is created using the time stamp of sample #4 of the GR channel and the value of the previous IR sample, in this case sample #3 of IR channel 1. This can be done because we know that the samples only show the changes of their channels, so at time t3, IR channel 1 must have had the same value it had at the previous sample point. After the comparison between the virtual sample and sample #4 of the GR channel is complete, sample #4 of IR channel 1 will be compared to sample #5 of the GR channel. Virtual samples would of course also be created if the GR channel "missed" a sample as compared to the sequence of samples in the IR channel.
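A minimal C sketch of this merge-and-compare procedure is given below, assuming one GR channel and one IR channel held in arrays. It steps through the union of the time stamps of the two channels, holding the last recorded value of a channel (a virtual sample) whenever that channel has no entry at the current time stamp. All types and names are chosen for the illustration; this is not the Data Extractor's actual code.

#include <stddef.h>

typedef struct { unsigned long time; double value; } sample_t;

/* Returns the time stamp of the first mismatching sample, or -1 if the
   GR and IR channels agree at every (real or virtual) sample point. */
long first_mismatch_time(const sample_t *gr, size_t gr_n,
                         const sample_t *ir, size_t ir_n)
{
    size_t i = 0, j = 0;
    double gr_val = 0.0, ir_val = 0.0;
    int gr_seen = 0, ir_seen = 0;

    while (i < gr_n || j < ir_n) {
        unsigned long t;
        /* pick the next time stamp present in either channel */
        if (j >= ir_n || (i < gr_n && gr[i].time <= ir[j].time))
            t = gr[i].time;
        else
            t = ir[j].time;

        /* take real samples where they exist; otherwise the previous
           value is kept, which acts as the virtual sample */
        if (i < gr_n && gr[i].time == t) { gr_val = gr[i].value; gr_seen = 1; ++i; }
        if (j < ir_n && ir[j].time == t) { ir_val = ir[j].value; ir_seen = 1; ++j; }

        if (gr_seen && ir_seen && gr_val != ir_val)
            return (long)t;   /* time stamp of the first mismatch */
    }
    return -1;                /* no mismatch found */
}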

6.1.2 Error margins

The Data Extractor is capable of applying error margins to the comparison performed during a GRC. With error margins, the two values that are compared do not have to be identical. Each channel can be given individual error margins, and the margins can be either absolute or relative. The Data Extractor reads the various error margins from an error margin configuration file. Each channel that is to be compared using an error margin must have an entry in the file, and this entry has to comply with the format shown in Figure 6.4.

>margin <channel name> <margin type> <up-value> <down-value>

Figure 6.4. The file format of error margin configuration files.

The name of a channel is the name given to the channel by PROPANE and is always vp.<probe name>, where the probe name is the name chosen for the probe by the user. The margin type can be either ABSOLUTE or RELATIVE. The up-value will be the margin upper bound from the golden run value, and the down-value will be the lower bound. The two values may be different, i.e. the up-value may be different from the down-value. Golden Run Comparison can be performed for all variable probe types except those of type PROPANE_AREA.

If the margin type is ABSOLUTE, then the up- and down-values will be considered as constant offsets within which an injection run value may deviate from the corresponding golden run value and still be treated as "correct". For example, if a channel has an absolute error margin of 5 up and 10 down, and a golden run sample of that channel has the value 100, then the injection run sample of that channel will be considered correct as long as it is within the range 100-10 to 100+5, i.e. within 90 and 105. If the golden run sample were instead 200, the range would be between 190 and 205.

For RELATIVE error margins, the up- and down-values will be used as factors on the golden run value to get the upper and lower bounds of the approved range. For example, if we have a relative error margin for a channel with an up-value of 0.05 and a down-value of 0.10, and a golden run sample of that channel has the value 100, then the injection run sample of that channel will be considered correct as long as it is within the range 100·(1.0 - 0.10) to 100·(1.0 + 0.05), i.e. within 90 and 105. If the golden run sample were instead 200, the range would be between 180 and 210.
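The two margin types can be summarised in a short C sketch that reproduces the numerical examples above. The function names are chosen here for the illustration; this is not the Data Extractor's internal code.

/* Returns non-zero if the injection run value lies within the error
   margin of the golden run value. For ABSOLUTE margins, up and down
   are constant offsets; for RELATIVE margins, they are factors applied
   to the golden run value. */
int within_absolute_margin(double gr, double ir, double up, double down)
{
    return ir >= gr - down && ir <= gr + up;
}

int within_relative_margin(double gr, double ir, double up, double down)
{
    double lower = gr * (1.0 - down);
    double upper = gr * (1.0 + up);
    return ir >= lower && ir <= upper;
}

/* Example from the text: gr = 100 with absolute up = 5, down = 10 gives
   the range [90, 105]; relative up = 0.05, down = 0.10 gives the same. */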


(Figure: the same GR and IR traces over t0–t7 compared twice, once with absolute margins and once with relative margins; in each case the first mismatch outside the error margin is marked.)

Figure 6.5. Two comparisons of the same GR and IR trace using ABSOLUTE margins (left) and RELATIVE margins (right).

In Figure 6.5, the difference between the two types of error margins is illustrated. Here we have a Golden Run channel and a corresponding Injection Run channel. The left figure illustrates the GRC using absolute error margins, and the right figure illustrates the GRC using relative margins. In this example, with absolute error margins an error will be detected at sample #5, whereas with relative error margins, an error is not detected until sample #6.

6.2 Channel logs

Channel logs may be useful for doing a more detailed analysis of different channels than the Data Extractor can provide. However, one should bear in mind that one file will be generated for each individual channel of each individual experiment, i.e. for 10 experiments with 10 channels each, the Data Extractor will generate 100 channel logs containing the samples of the channels. The files are delimited text files that are easily imported into a spreadsheet tool, such as Microsoft Excel, where further analysis or graphical representation may be performed.

6.3 Injection information

When injection information files are generated, the Data Extractor creates one file for each campaign (except the GR campaign). The injection information files will have one line of information for each experiment in the corresponding campaign, containing the time stamps of the injections performed in that experiment. This information is sometimes useful for filtering out values and events that are logged before the actual injection, since these may not be of any interest. The filename for each injection information file will be the same as for the campaign readout file, but the extension will be changed from .pcr to .pii. So, a campaign readout file with the name ei_001_flip_b0.pcr will generate an injection information file with the name ei_001_flip_b0.pii.

7. PROPANE architecture

This chapter describes the internal architecture of the PROPANE tool. The tool consists of two separate parts: the PROPANE Campaign Driver and the PROPANE Library. We will first describe the Campaign Driver and then the Library.

7.1 The PROPANE Campaign Driver

The PROPANE Campaign Driver is the main desktop part of the PROPANE tool and consists of six objects (see Figure 7.1):
• the Menu Handler;
• the Database Manager;
• the Executor;
• the Controller;
• the Log Unit; and
• the Readout Unit.

(Figure: data-flow diagram connecting the Menu Handler, Database Manager, Executor, Controller, Log Unit and Readout Unit with the Database and Campaign Descriptions, the setup files, the log and readout files, and the spawned experiment processes.)

Figure 7.1. The internal architecture of the PROPANE Campaign Driver

In Figure 7.1, the objects belonging to the PROPANE Campaign Driver are found inside the dotted box. The other objects are external.

The Menu Handler is in charge of the menus presented to the user. From here, the user can load Database Descriptions, select campaigns, and initiate campaign execution. When a Database Description is to be loaded, the filename specified by the user is passed to the Database Manager, which reads the file and sets up the internal database. If the database is successfully set up, a list of the available campaigns is returned to the Menu Handler. From this campaign list, the user may choose to select a subset of campaigns to execute or to execute all campaigns.

When a set of campaigns has been selected for execution, a list containing information about these campaigns is passed to the Executor, which then starts the actual execution. During the execution it displays an information screen and allows the user to control the campaigns. For each campaign the Executor sets up the Controller and starts a separate thread for the Controller. The Controller reads information about the campaigns from the Database Manager and uses this information when executing the experiments. For each Experiment Description in a campaign, the Controller will spawn a new process in which the target system executable file is executed. The Controller passes the Experiment Description and the Readout Directory on to the PROPANE Library, which is a part of the executable. Executing each experiment in its own process guarantees that the target system is reset for each experiment, so that the starting conditions are the same for all experiments. Several processes may be started simultaneously, depending on the execution width specified in the description files.

During the execution of campaigns, the user may choose to send control commands via the Executor to the Controller in order to manipulate the execution of campaigns. The user may choose to pause and continue execution, or to skip the current campaign or abort all campaigns.

The Database Manager, the Executor and the Controller all use two support units: the Log Unit and the Readout Unit. The Log Unit handles the database log files and campaign log files, and the Readout Unit handles the database readout file and campaign readout files. The other units send entries to the Log Unit and the Readout Unit, which are then stored in the log files and the readout files correspondingly.


7.2 The PROPANE Library

The PROPANE Library is a function library enabling the PROPANE Campaign Driver to communicate with the experiment processes. It also contains everything necessary for the user to instrument a target system for variable and event logging, fault and error injection, and environment simulator control. The library is to be linked together with the target system and is mainly a passive component. The experiment executable may be executed manually outside of the control of the Campaign Driver, in which case it asks on the console for the information it would otherwise receive from the Campaign Driver. The PROPANE Library consists of five objects (see Figure 7.2):
• the Experiment Handler;
• the Probe Manager;
• the Injector;
• the Log Unit; and
• the Readout Unit.

(Figure: data-flow diagram of the PROPANE Library. The Experiment Handler receives the Experiment Description and readout directory from the Campaign Driver, passes error and injection information to the Injector, probe information to the Probe Manager, and the simulator setup file to the environment simulator; injection calls and probe calls arrive from the target system, readout data flows to the Readout Unit and readout files, and all units send log entries to the Log Unit and its log files.)

Figure 7.2. The internal architecture of the PROPANE Library

In Figure 7.2, the objects belonging to the PROPANE Library are found inside the dotted box. The other objects are external.

The Experiment Handler is the main interface unit. It receives information from the Campaign Driver on which experiment to run and where to put the generated readouts. The Experiment Handler reads the specified Experiment Description and extracts the information needed for the experiment. Information regarding faults and errors is passed to the Injector, information regarding activated probes is passed to the Probe Manager, and the name of the simulator setup file is passed to the external environment simulator. The Experiment Handler also initiates the Log Unit and the Readout Unit so that the experiment log file and experiment readout file are generated. Note that the experiment readout file is not the file in which the actual readout data is stored. This data is stored in a number of files, one for each readout collection point (i.e. variable probe, event probe, decision point, or injection location).

The Injector receives fault and error information from the Experiment Handler and uses this information to set up the injections that are to be performed during the experiment. Once it is activated, it will wait for the target system to call either the fault activation check routine or the error injection routine. When the fault activation check routine is called, it will decide which path the execution shall take, based on the fault activation information in the setup of the experiment. When the error injection routine is called, the errors specified for the location from which the routine is called will be injected. Whenever an injection is performed, an entry with readout data is made in the readout file for the fault or the error location.

The Log Unit and the Readout Unit are support units and work in much the same way as their equivalents in the PROPANE Campaign Driver do, i.e. they handle the experiment log file and the experiment readout file respectively. These two units are used by the internal PROPANE units but can also be used by the external environment simulator if it is programmed to do so.
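The decision made by the Injector at a fault activation check can be sketched conceptually in C as follows. This is not the PROPANE instrumentation API (which is described in section 4.4); the function name, the fault name and the list of activated faults are all hypothetical and serve only to illustrate the behaviour described above.

#include <string.h>

/* Hypothetical list of fault names activated in the current experiment
   setup (in PROPANE this information comes from the Experiment
   Description via the Experiment Handler). */
static const char *active_faults[] = { "fault_overflow_check", 0 };

/* Conceptual fault activation check: returns non-zero if the named
   fault is activated, so the instrumented code takes the faulty path. */
static int fault_is_active(const char *name)
{
    for (const char **f = active_faults; *f != 0; ++f)
        if (strcmp(*f, name) == 0)
            return 1;
    return 0;
}

/* Instrumented target code would then branch on the result, e.g.:
   if (fault_is_active("fault_overflow_check")) { faulty code }
   else                                         { original code }     */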

8. Summary

This report describes the Propagation Analysis Environment (PROPANE). This tool is developed for analysing the propagation and effects of errors in embedded software systems. The tool is a desktop environment and contains support for conducting fault and error injections in a simulation of a target system. The tool also provides support for inserting probes into the target system, enabling the logging of variables and events during injection experiments.

PROPANE is totally target system independent, i.e. it may be used on any target system provided that one is able to simulate it in a desktop environment. PROPANE requires that the target system used in the experiment be such that one may construct a single executable containing the target system and any environment simulators needed to feed the target system with test cases during the experiments. The tool is mainly designed for use with simulations of single-node embedded control systems. The tool is intended for software that has a modularised structure. Furthermore, communication between the modules is assumed to be conducted using signals of some kind. That is, a module passes values to another module by means of, e.g., shared memory, a mailbox system or parameter passing in function calls.

PROPANE is capable of controlling fault injections as well as error injections. Fault injections are performed by instrumenting the source code of the target system with segments of faulty code and with fault activation checks. All faults are present in the target system in an inactive mode. For each experiment, the desired faults are then activated in the experiment setup. Error injections are performed by manipulating the contents of memory locations in a number of different ways. Error injection requires target system instrumentation in the form of high-level software traps. The actual errors that are injected are defined in the experiment setup.

There is also support for two types of probes: variable probes and event probes. Variable probes are used to record the values of a variable during the execution of the experiment. Event probes are used to record the occurrence of different events in the target system.

For analysis, the toolkit contains the PROPANE Data Extractor, which can perform Golden Run Comparisons for each channel created by a variable probe in the readout files. The results will be stored in a text file with a spreadsheet format that is easily imported into other tools for further analysis. The tool can also extract injection information from the readout files and store this in separate files, and create channel logs for each individual channel of each individual experiment if a more detailed analysis or graphical representation is desired.

