A source-to-source compiler for generating dependable software


Maurizio Rebaudengo, Matteo Sonza Reorda, Massimo Violante
Dip. Automatica e Informatica, Politecnico di Torino
C.so Duca degli Abruzzi 24, I-10129 Torino, Italy
{reba, sonza, violante}@polito.it

Marco Torchiano
Dept. Computer Science (IDI), NTNU
O.S.Bragstads Plass 2B, N-7491 Trondheim, Norway
[email protected]

Published in IEEE Int. Workshop on Source Code Analysis and Manipulation (SCAM2001), Florence, Italy, November 10th, 2001, pp. 33-42.

Abstract

Over the last years, an increasing number of safety-critical tasks have been entrusted to computer systems. In particular, safety-critical computer-based applications are entering market areas where cost is a major issue, and solutions are therefore required that combine fault tolerance with low cost. In this paper, a source-to-source compiler supporting a Software-Implemented Hardware Fault Tolerance approach is proposed, based on a set of source code transformation rules. The proposed approach hardens a program against transient memory errors by introducing software redundancy: every computation is performed twice and the results are compared, and control-flow invariants are checked explicitly. By exploiting the tool's capabilities, several benchmark applications have been hardened against transient errors. Fault Injection campaigns have been performed to evaluate the fault detection capability of the hardened applications. In addition, we analyzed the proposed approach in terms of space and time overheads.

This work has been partially supported by ASI (Agenzia Spaziale Italiana): Ricerca Fondamentale 1999.

1. Introduction

The adoption of computer-based systems in new areas (such as automotive or medical devices) where safety is a serious constraint calls for new methods for developing dependable systems. In particular, in some of the new areas where computer-based dependable systems are currently being introduced, cost (and hence design and development time) is often a major concern, and the adoption of commercial hardware components is a common practice. As a result, Software-Implemented Hardware Fault Tolerance (SIHFT) is often adopted.
This technique addresses transient hardware faults caused by the physical environment of the system, in particular those caused by charged particles hitting the circuit [1] [2]. Fault tolerance is achieved by modifying the original software into a functionally equivalent hardened version. Notice that we do not consider the issue of eliminating software bugs: we assume that the code is correct, and that faulty behavior is only due to transient faults affecting the system. The adoption of SIHFT techniques allows the implementation of dependable systems without incurring the high costs of designing custom hardware or using hardware redundancy. On the other hand, relying on software techniques for obtaining dependability often means accepting some overhead in terms of increased code size and reduced performance. Finally, when adopting a software approach for building a dependable system, designers need some simple way of writing the code so that the required level of dependability is guaranteed.

1.1. Background

In the following we assume that the system under consideration is composed of a general-purpose microprocessor executing a program whose source code is available to the designer. Moreover, we concentrate on the definition and analysis of a SIHFT technique to detect program misbehaviors induced by the environment. In particular, due to its practical relevance, we focused on the fault model called upset or bit-flip, which results in the modification of the content of a storage cell during program execution. This perturbation is the result of ionization provoked either by incident charged particles or by daughter particles issued from the interaction of energetic particles (e.g., neutrons) with atoms in the silicon substrate. This fault model is also known as Single Event Upset (SEU). Despite its relative simplicity, the SEU model is widely used in the Fault Tolerance community to model real faults, since it closely matches real faulty behavior.

Several approaches to SIHFT have been proposed in the past. Some of them rely on the concept of design diversity, i.e., the independent generation of two or more different software modules to satisfy a given requirement [3]. An example of design diversity is N-Version Programming [4], where different designers develop independent versions of the same program in order to avoid common design errors, and the outputs coming from the different programs are compared to identify unsafe behavior. Other approaches to SIHFT have been presented in [5] and [6]. The former paper proposes Recovery Blocks to save the state of a system before the execution of any software module; a roll-back operation to the saved state is then performed in the case of errors affecting the module. Conversely, the latter approach is based on error detection and correction codes to protect the data and code segments of applications: a software task is periodically run to verify the content of the system memory and, if required, to correct the effects of soft errors.

Recently, some new approaches to SIHFT have been proposed based on the idea of introducing local changes to the source code of un-hardened programs in order to guarantee fault tolerance. For example, ABFT [7] allows hardening programs operating on regular structures such as matrices, while methods based on control flow checking [8] and assertions [9] are used for hardening programs against errors modifying the correct instruction flow. Another approach combining the ideas of design diversity and source code modification has been recently proposed in [10], where both diverse data and duplicated instructions are used to detect hardware errors.

When considering methods based on design diversity, it must be noted that it is up to the programmer to decide which procedures to duplicate and what to compare. These approaches require that programmers explicitly introduce the code duplication, as well as the proper checks on the results. These code modifications are performed manually and may thus introduce errors. The same limitation holds for the methods based on source code modification. ABFT is a very effective approach but lacks generality: it mainly focuses on faults affecting the data variables, and it is well suited for applications using regular structures; therefore, its applicability is limited to a restricted set of problems only. The use of assertions, i.e., logic statements inserted at different points in the program that reflect invariant relationships between the variables of the program, can lead to other problems, since assertions are not transparent to the programmer and their effectiveness largely depends on the nature of the application and on the programmer's skill. The basic idea of Control Flow Checking is to partition the application program into basic blocks, i.e., branch-free parts of code.

For each block a deterministic signature is computed, and faults can be detected by comparing the run-time signature with a pre-computed one. In most control flow checking techniques, a main problem is tuning the test granularity to be adopted. An important factor in selecting the technique that best fits the designer's needs is the availability of automatic tools. Such a feature enables the application of the considered technique to existing systems with reduced effort. Furthermore, during system design, developers can focus on the functional requirements, leaving to the tool the task of satisfying the dependability-related requirements.

1.2. Contribution of the paper

The approach presented in this paper exploits data and control flow duplication; in particular, it is based on introducing data and code redundancy according to a set of transformation rules, originally presented in [11], applied to high-level code to detect errors affecting both variables and code. The main advantages of the proposed approach are:
1. It is based on transformation rules that can be automatically applied to high-level source code, thus freeing the programmer from the burden of guaranteeing its robustness against errors.
2. It is completely independent of the underlying hardware, and the hardened code is thus portable over different hardware platforms (e.g., from one processor to another), as already shown in [12].
3. It can complement other already existing error detection mechanisms.
4. It is able to detect a wide range of faults and is not limited to a specific fault model (e.g., faults in the data, or faults affecting the control flow, only); therefore, it concurrently deals with faults affecting the data as well as the program flow, for example faults induced in the system by highly energetic particles such as those produced by radioactive sources [13].

The results we gathered to analyze the method's effectiveness show that programs hardened according to our rules attain high fault coverage figures, at the cost of an increase in code size and a slow-down in performance. During our experiments we adopted the Single Event Upset fault model, where transient errors are injected in memory locations storing either data or instructions. Even if the method we propose is expensive in terms of both memory occupation and speed, several applications (e.g., space or biomedical ones) exist where fault detection capability prevails over performance and memory constraints, and where our method can thus be effectively exploited.
To overcome the drawback stemming from the introduced overhead, an approach based on similar ideas has been proposed in [14], aiming at reducing the performance degradation. The method presented in [14] shows reduced fault detection capabilities, and it considers neither faults affecting the memory area storing the code, nor the variables stored in registers. As a result, it is limited to faults in the data memory, while faults in the program memory and in the processor are neglected. The remainder of the paper is organized as follows. Section 2 describes the proposed transformation rules, while Section 3 presents the tool which automatically applies them to high-level source code. Section 4 evaluates the effects of the rules and discusses the level of fault detection they guarantee. Finally, Section 5 draws some conclusions.

2. Transformation Rules

In this section we show how the high-level source code of a program can be automatically transformed to enable error detection. We do not make any strict assumption either on the cause or on the type of the fault: without any loss of generality, we can assume that an error corresponds to one or more bits whose value is erroneously changed while they are stored in memory, cache, or a register, or transmitted on a bus. Our method, although devised for transient faults, is also able to detect most permanent faults possibly existing in the system. All the transformations we propose, being performed on the high-level code, are independent of the host processor that executes the program, as well as of the system organization (e.g., presence of caches, disks, memory size, etc.). Nevertheless, the optimization flags of the compiler used to produce the executable code have to be disabled, in order to preserve the introduced data and code redundancy through the compilation process.

2.1. Assumptions

For the purpose of this paper, we consider programs written in C, and propose rules to transform the basic constructs of a C program: the extension to the whole language, as well as to other high-level languages, is mostly straightforward. The proposed approach is mainly directed towards embedded systems. Typically, such systems are designed and developed by means of hardware-software codesign tools. The code synthesized by these tools usually uses only variables of primitive types and arrays, avoiding pointer variables and dynamic memory allocation. In addition, such programs do not use recursion.

Based on these considerations, we do not address programs containing pointer variables or recursive functions. The problem with pointers is checking the equality of the memory areas they reference. New deep equality operators (or functions) must be introduced to solve this issue. In addition, each structured data type containing pointer fields (such as a linked list) requires a custom equality operator. A suitable tool can use the knowledge of the data types to automatically generate the set of required equality operators.
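To illustrate the kind of operator such a tool would have to generate, the following sketch is our own illustration, not part of the paper: the list type and the function name are hypothetical, and only show what a deep equality function for a singly linked list could look like.

    #include <stddef.h>

    /* Hypothetical singly linked list of integers. */
    struct node {
        int value;
        struct node *next;
    };

    /* Deep equality: two lists are considered equal when they have the same
       length and pairwise equal values; the pointer values themselves are
       never compared, only the memory they reference. */
    int list_equal(const struct node *a, const struct node *b)
    {
        while (a != NULL && b != NULL) {
            if (a->value != b->value)
                return 0;
            a = a->next;
            b = b->next;
        }
        return a == NULL && b == NULL;   /* both lists must end together */
    }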

2.2. Errors in data

A first set of rules concerns the variables defined and used by the program. We refer to high-level code only, and we do not care whether these variables are stored in the main memory, in a cache, or in a processor register. The proposed rules complement other Error Detection Mechanisms that may exist in the system (e.g., based on parity bits or on error correction codes stored in memory). It is important to note that the detection capability of our rules is much higher, since they address any error affecting the data, without any limitation on the number of modified bits or on the physical location of the bits themselves. Moreover, it is important to note that this kind of information redundancy is often implemented in external memories, while it is very seldom adopted to protect memory elements inside a processor (e.g., registers and caches). The following rules can be formulated:
• Rule #1: every variable x must be duplicated: let x0 and x1 be the names of the two copies
• Rule #2: every write operation performed on x must be performed on both x0 and x1
• Rule #3: after each read operation on x, the two copies x0 and x1 must be checked for consistency, and an error detection procedure must be activated if an inconsistency is detected.

Here, the term variable is used in a broad sense to indicate local variables, global variables, and function parameters. The above rules mean that any variable x must be split into two copies x0 and x1 that should always store the same value. A consistency check on x0 and x1 must be performed each time the variable is read. The check must be performed immediately after the read operation in order to block the propagation of the fault effect. Please note that variables should also be checked when they appear in any expression used as a condition for branches or loops, thus allowing the detection of errors that corrupt the correct control flow of the program.
Each instruction that writes variable x must also be duplicated in order to update the two copies of the variable. Every fault that occurs in any variable during the program execution is detected as soon as the variable is the source operand of an instruction, i.e., when the variable is read, thus resulting in minimum error latency, which is approximately equal to the period between the fault occurrence and the first read operation. Errors affecting variables after their last usage, and thus not modifying the program behavior, are not detected.

Original code:
    int a, b;

    a = b;

    a = b + c;

Modified code:
    int a0, b0, a1, b1;

    a0 = b0;
    a1 = b1;
    if (b0 != b1) error();

    a0 = b0 + c0;
    a1 = b1 + c1;
    if ((b0 != b1) || (c0 != c1)) error();

Fig. 1: Modification for errors affecting data.

Two simple examples are reported in Fig. 1, which shows the code modification for an assignment operation and for a sum operation involving the variables a, b and c.

Original code:
    int res, a;
    …
    res = search(a);
    …
    int search(int p)
    {
      int q;
      …
      q = p + 1;
      …
      return(1);
    }

Modified code:
    int res0, res1, a0, a1;
    …
    search(a0, a1, &res0, &res1);
    …
    void search(int p0, int p1, int *r0, int *r1)
    {
      int q0, q1;
      …
      q0 = p0 + 1;
      q1 = p1 + 1;
      if (p0 != p1) error();
      …
      *r0 = 1;
      *r1 = 1;
      return;
    }

Fig. 2: Transformation for errors affecting procedure parameters.

The parameters passed to a procedure, as well as the returned values, should also be considered as variables. Therefore, the rules defined above can be extended as follows:
• Every procedure parameter is duplicated
• Each time the procedure reads a parameter, it checks the two copies for consistency
• The return value is also duplicated (in C, this means that the addresses of the two copies are passed as parameters to the called procedure).
Fig. 2 reports an example of the application of Rules #1 to #3 to the parameters of a procedure.
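Figures 1 and 2 cover assignments, expressions, and parameters; the rules also require checking variables that are read inside branch or loop conditions. The following sketch is our own illustration, not taken from the paper: the function max_dup and the error() handler are hypothetical (the paper only requires that some error detection procedure be activated), and the duplication of the return value is omitted for brevity.

    /* Hypothetical error handler: a real system would signal the failure
       and take a recovery action; here we simply stop the computation. */
    void error(void)
    {
        for (;;);
    }

    /* Original code:
         int max;
         if (a > b)
           max = a;
         else
           max = b;
    */

    /* Hardened sketch: a, b and max are duplicated (Rule #1), every write
       is duplicated (Rule #2), and the operands read by the condition are
       checked for consistency right after the test (Rule #3). */
    int max_dup(int a0, int a1, int b0, int b1)
    {
        int max0, max1;
        if (a0 > b0) {
            if ((a0 != a1) || (b0 != b1)) error();
            max0 = a0;
            max1 = a1;
            if (a0 != a1) error();
        } else {
            if ((a0 != a1) || (b0 != b1)) error();
            max0 = b0;
            max1 = b1;
            if (b0 != b1) error();
        }
        if (max0 != max1) error();   /* consistency check before the value is used */
        return max0;
    }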

2.3. Errors in the code

The proposed approach addresses errors affecting the code of instructions, no matter whether these are stored in memory, in a cache, in the processor instruction register, or elsewhere. Several processors have built-in Error Detection Mechanisms (EDMs) able to detect part of these errors, e.g., by activating Illegal Instruction Exception procedures. Other faults can be detected by software checks (implementing non-systematic additional EDMs) introduced by the programmer. We propose a set of transformation rules to make the code able to detect most of the faults not detected by the other EDMs (if any) existing in the system. A representation of the possible transformations caused by errors is reported in Fig. 3, in which arrows represent the transformation of a statement into a different one. For the purpose of this paper, statements can be divided into two types:
• Type S1: statements affecting data only (e.g., assignments, arithmetic expression computations, etc.)
• Type S2: statements affecting the control flow (e.g., tests, loops, procedure calls and returns, etc.).
On the other side, errors affecting the code can be divided into two types, depending on the way they transform the statement whose code is modified:
• Type E1: errors changing the operation to be performed by the statement, without changing the code control flow
• Type E2: errors changing the control flow.
E1 errors transform an S1 statement into a statement of the same class (e.g., by changing an add operation into a sub). They are represented by the upper left arrow in Fig. 3. Errors transforming an S2 statement into another S2 statement (e.g., by transforming a jump operation into a conditional jump), transforming an S1 statement into an S2 statement (e.g., by transforming an add operation into a jump), or vice versa, are all E2 errors.

2.3.1. E1 errors affecting S1 statements

As far as errors of type E1 affecting statements of type S1 are considered, they are automatically detected by simply applying the transformation rules introduced above for errors affecting data. For example, if we consider a statement executing an addition between two operands, Rules #2 and #3 also guarantee the detection of any error of type E1 that transforms the addition into another operation.
[Fig. 3 is a diagram with two boxes, S1 (statements affecting data) and S2 (statements affecting the execution flow): an E1 arrow loops on S1, while E2 arrows loop on S2 and connect S1 and S2 in both directions.]
Fig. 3: Classification of the effects of the errors.

2.3.2. E2 errors affecting S1 statements

When an error of type E2 affects a statement of type S1 (e.g., the error transforms an addition operation into a jump), the proposed solution is based on tracking the control flow, trying to detect differences with respect to the correct behavior. This task is performed by first identifying all the basic blocks composing the code. A basic block is a sequence of statements that are always indivisibly executed (i.e., they are branch-free). The following rule #4 is then introduced, in order to check whether all the statements in every basic block are executed in sequence:
• an integer value ki is associated with every basic block i in the code
• a global control check flag (ecf) variable is defined; a statement assigning to ecf the value ki is introduced at the very beginning of every basic block i; a test on the value of ecf is also introduced at the end of the basic block.
The aim of the above rule is to check whether any error happened whose effect is to modify the correct control flow, introducing a jump to an incorrect target address. An example of this situation is an error modifying the field containing the target address in a jump instruction. As a further example, consider an error that changes an ALU instruction (e.g., an add) into a branch: if the instruction format includes an immediate field, this may be interpreted as a target address. Unfortunately, the above rule has an incomplete detection capability: there are some faults which cannot be detected by the proposed rules, e.g., any error producing a jump to the first assembly instruction of a basic block (the one assigning to ecf the value corresponding to the block). Figure 4 provides an example of the application of rule #4.

Original code:
    /* beginning of basic block */
    …
    /* basic block end */

Modified code:
    /* beginning of basic block #371 */
    ecf = 371;
    …
    if (ecf != 371) error();
    /* basic block #371 end */

Fig. 4: Example of code transformation for E2 errors affecting S1 statements.
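To make the transformation of Fig. 4 more concrete, the following sketch is our own illustration, not from the paper: the block numbers are arbitrary, error() is the hypothetical handler sketched earlier, and the data duplication rules are deliberately not applied, as in the paper's own figures.

    extern void error(void);   /* hypothetical handler from the earlier sketch */

    int ecf;   /* global control check flag */

    int abs_diff(int a, int b)
    {
        int d;
        /* basic block #12 */
        ecf = 12;
        d = a - b;
        if (ecf != 12) error();

        if (d < 0) {
            /* basic block #13 */
            ecf = 13;
            d = -d;
            if (ecf != 13) error();
        }

        /* basic block #14 (here empty apart from the return) */
        ecf = 14;
        if (ecf != 14) error();
        return d;
    }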

2.3.3. Errors affecting S2 statements

When errors affecting S2 statements are considered, the issue is how to verify that the correct control flow is followed. In order to detect errors affecting a test statement, we introduce the following rule #5:
• For every test statement, the test is repeated at the beginning of the target basic block of both the true and (possibly) false clause. If the two versions of the test (the original and the newly introduced one) produce different results, an error is signaled.
Figure 5 provides an example of the application of the above rule. In order to simplify the presentation of each rule, we do not consider in the examples the combined application of different rules: as an example, in Figure 5 we did not apply Rules #1 and #2 to the variable named condition, which should be duplicated and checked for consistency after the test.

Original code:
    if (condition)
    {/* Block A */
      …
    }
    else
    {/* Block B */
      …
    }

Modified code:
    if (condition)
    {/* Block A */
      if (!condition) error();
      …
    }
    else
    {/* Block B */
      if (condition) error();
      …
    }

Fig. 5: Code transformation for a test statement.

The code modification for the other S2 statements can be obtained starting from the solution proposed for the test statement.
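The paper does not show the corresponding transformation for loops; the following sketch is our own reading of how rule #5 could be extended to a while statement (as in Fig. 5, the data duplication rules are not applied to the variable condition, and error() is the hypothetical handler used above).

    /* Original code:
         while (condition) {
           ...loop body...
         }
    */

    /* Hardened sketch: the test is repeated at the beginning of the block
       reached when the condition is true (the loop body) and of the block
       reached when it is false (the code following the loop). */
    while (condition) {
        if (!condition) error();
        /* ...loop body... */
    }
    if (condition) error();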

Special attention has to be devoted to procedure call and return statements. In order to detect possible errors affecting these statements, we devised the following rule #6:
• an integer value kj is associated with every procedure j in the code
• immediately before every return statement of the procedure, the value kj is assigned to ecf; a test on the value of ecf is also introduced after any call to the procedure.
Fig. 6 shows the code modification for the procedure call and return statements. As for the previous figure, we just applied Rule #6 to the considered piece of code, ignoring the other previously defined rules. Rule #6 allows the detection of a number of errors, including the following ones:
• Errors causing a jump into the procedure code
• Errors causing a jump to the statement following the call statement
• Errors affecting the target address of the call instruction
• Errors affecting the register (or stack location) storing the procedure return address.

Original code:
    …
    ret = my_proc(a);   /* procedure call */
    …
    /* procedure definition */
    int my_proc(int a)
    {/* procedure body */
      …
      return(0);
    }

Modified code:
    …
    /* call of procedure #790 */
    ret = my_proc(a);
    if (ecf != 790) error();
    …
    /* procedure definition */
    int my_proc(int a)
    {/* procedure body */
      …
      ecf = 790;
      return(0);
    }

Fig. 6: Code transformation for the procedure call and return statements (transformations for parameter passing are not shown).

3. The translator tool

An automatic tool, named ThOR (Translator for Reliable Software), has been implemented in order to transform safety-critical source code into a hardened one by applying the transformation rules described in the previous section. ThOR is essentially a source-to-source compiler: it reads a C source module and generates a functionally equivalent C source module which implements the hardening rules. ThOR falls into the broad category of Program Comprehension Tools (PCTs), i.e., tools able to build a representation of a program and then to perform some operation on it. Usually such operations are property verification, generation of related code, and code restructuring [15].

The rules implemented in ThOR operate at the source level, so a custom intermediate representation has been developed which records the high-level information only. Such an approach abstracts the machine-level details, making the tool suitable for any specific hardware architecture. The overall software architecture of the ThOR tool is composed of two modules: a Front-end and a Redundancy Engine. The purpose of the Front-end is to translate C source code into a suitable internal representation, which can be processed by the Redundancy Engine in order to generate hardened C code according to the rules of Section 2. The separation of the two modules gives several benefits: the tool is better structured and thus more robust and maintainable, the input and output languages are virtually independent, and, finally, new rules can be added without modifying the front-end module. The Front-end has been developed by means of the standard free-software compiler construction tools Bison and Flex. Syntactic and lexical analysis are based upon a slightly modified version of the C grammar and lexicon by Jeff Lee [16]. The ThOR tool amounts to about 8,300 lines of standard C code.

An important element in source analysis tools is the intermediate representation (IR), and many possible IRs have been proposed, particularly in the reverse engineering literature, for example [19]. A driving factor in the selection of the IR was the fact that the transformation rules presented in the previous section require mostly high-level information. We thus adopted an IR based on the Abstract Syntax Tree [17] technique. Such a solution is widely known in the compiler community and lends itself to an easy implementation of transformations.

The Redundancy Engine module takes the intermediate representation as its input, applies the transformation rules on it, and generates the hardened code as its output. ThOR can be customized by the user, who can specify the set of rules to be applied by the engine. The task performed by the Redundancy Engine can be divided into two different steps:
• transformation into canonical form,
• application of the hardening rules.
The former phase applies some preliminary transformations to the internal representation, whose purpose is to obtain code in a form suitable for the second phase. Each transformation in each phase has been carefully designed to be applied as independently from the others as possible. Such a choice makes it possible to selectively apply the hardening rules and eases the task of verifying that the semantics of the code is preserved along the transformation. In the following we will illustrate the operations performed by the tool in a particularly complex case: function calls.
In this case the transformation into canonical form is called function call fixing. The application of the data duplication rule (Rule #1) requires the function prototypes to be transformed into the following canonical form:

    function(parameter, &result)

The function call fixing step makes all functions compliant with the above form, by turning the return value of each function into an output parameter. Once a function is in the canonical form, it can be seamlessly translated into the following hardened form:

    function(parameter1, parameter2, &result1, &result2)

where all parameters have been duplicated according to Rule #1. Function call fixing is made up of two steps, which will be examined in detail in the next subsections. Each step preserves the consistency of the program being processed and its original semantics.

3.1. Function Call Fixing: step 1

The purpose of the first step is to isolate the function call, which may appear inside a complex expression. Therefore, for each call a new temporary variable is defined, which will hold the return value of the function; then the function calls are replaced by the associated temporary variables in the initial expression. An example of the transformations performed in the first step is shown in the following figure.

Initial source code:
    int f(int i) {...}

    void main(void){
      int i, j;
      i = j + f(1);
    }

Output of Step 1:
    int f(int i) {...}

    void main(void){
      int i, j;
      int tmp;
      tmp = f(1), i = j + tmp;
    }

Fig. 7: Example of Function Call Fixing, Step 1

3.2. Function Call Fixing: step 2

In the second step both the prototype of the function and the related function calls are modified. The return value of the function is turned into an additional output parameter of the function. Output parameters are implemented in a standard way in the C programming language by means of pointers. A pointer to the variable that will hold the result is passed to the function: it is the caller's responsibility to allocate space for the variable and the callee's responsibility to store the result value into that location. An example of the transformations performed in the second step is shown in the following figure.

Code resulting from Step 1:
    int f(int i) {...}

    void main(void){
      int i, j;
      int tmp;
      tmp = f(1), i = j + tmp;
    }

Output of Function Call Fixing:
    void f(int i, int *result) {...}

    void main(void){
      int i, j;
      int tmp;
      f(1, &tmp), i = j + tmp;
    }

Fig. 8: Example of Function Call Fixing, Step 2
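The figures stop at the canonical form; applying Rule #1 on top of it would then duplicate both the parameter and the output location, as stated above. The following sketch is our own illustration, reusing the hypothetical f and tmp of Fig. 8 and the error() handler of the earlier sketches, of what the hardened call site could look like.

    extern void error(void);

    /* hardened prototype: parameter and output location are both duplicated */
    void f(int i0, int i1, int *result0, int *result1);

    void main(void){
      int i0, i1, j0, j1;
      int tmp0, tmp1;
      f(1, 1, &tmp0, &tmp1);
      i0 = j0 + tmp0;
      i1 = j1 + tmp1;
      if ((j0 != j1) || (tmp0 != tmp1)) error();
    }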

4. Experimental evaluation of the transformation rules

The purpose of the experimental evaluation of the transformation rules is twofold. On the one hand, we intend to measure the overhead introduced by the code hardening rules. We thus adopted the code and data segment size increases as measures of the space overhead, and the increase in program execution time as a performance loss indicator. On the other hand, we are interested in measuring the level of protection against transient errors that the approach guarantees; we measured this property as the number of faults a hardened program is able to detect versus the ones detected by its un-hardened version. A fault is detected when it either triggers an error detection mechanism the processor already embeds (e.g., it produces an invalid instruction or a division-by-zero interrupt) or it triggers one of the software error detection mechanisms our approach provides. We point out that in our experiments we concentrate on transient errors induced in the system by the environment. In particular, due to its practical relevance, we focused on a particular transient error type, called upset or bit-flip, which results in the modification of the content of a storage cell. To perform the experiments, the translator described in Section 3 was used to harden a set of simple C benchmark programs. Then, a set of Fault Injection experiments was conducted.

4.1. The Fault Injection environment

To assess the effectiveness of the adopted software fault tolerance technique in terms of fault detection capabilities, we performed a set of Fault Injection experiments on a commercial M68KIDP Motorola board hosting an M68040 microprocessor with a 25 MHz clock frequency, 2 Mbytes of RAM, and some peripheral devices.
The Fault Injection environment is based on an external board (Fault Injection Support Board, FISB) [18], connected to the target system bus and able to count the number of executed instructions by monitoring the values on the processor status pins. In this way, FISB is able to trigger an interrupt procedure at a given point in the program execution (in terms of number of executed instructions): the activated procedure can be exploited to inject a fault, observe a value in the system, trigger a timeout condition, etc. Running concurrently with the target processor, the board achieves nearly null intrusiveness. In particular, the execution speed of the target system is not slowed down during Fault Injection experiments. The only intrusiveness stems from the interrupt procedure in charge of injecting faults and from the code in charge of observing the system behavior once the fault has been injected. Every fault is classified on the basis of its effect on the program, according to the following categories:
• Effect-less: the injected fault does not affect the program output behavior or the fault is detected by one of the proposed rules
• Detection: either the error is detected by the program, or it produces an illegal operation (e.g., illegal jumps in the program code), or it causes the program not to reach the end of the execution (e.g., because it entered an endless loop)
• Wrong answer: the error is not detected and the result is different from the expected one.
For the purpose of our experiments, the following benchmark programs have been adopted:
• Matrix: multiplication of two matrices composed of 10x10 integer values
• Quicksort: an implementation of the well-known sort algorithm, running on a vector of 10 integer elements
• LU decomposition: the Gaussian elimination method of factoring a matrix to solve the linear system Ax = b, with a matrix A of 6x6 integer values
• Fast Fourier Transform: the algorithm adopted in digital signal processing to increase the computing efficiency when obtaining large discrete Fourier transforms, applied to an 8-point FFT network.

Tab. 1: Effects of ThOR transformations (size ratios and slow-down factors).

                        Code Segment     Data Segment     Executable Code    Performance
                        size increase    size increase    size increase      slow-down
    Matrix                  5.42             2.05              2.48              4.56
    Quicksort               5.03             2.07              4.60              3.65
    LU decomposition        5.12             2.10              4.72              3.22
    FFT                     4.86             2.09              3.78              3.05
    Average                 5.10             2.07              3.89              3.62

4.2. Analysis of the overheads

Tab. 1 reports, for all the benchmarks, the ratio between the code segment, data segment, and executable code sizes (in bytes) before and after the application of ThOR. The adopted compiler is SingleStep™ 7.4 by SDS Inc. for the Motorola 68040 processor. The average increase in the size of the code segment is 5.1, the average increase in the size of the data segment is 2.07, and the average increase in the overall executable code is 3.9. Table 1 also reports the effects of the transformations on the program execution speed. An average slow-down of about 3.6 times is observed. Some considerations should be made about this performance decay. A simple duplication of operations should result in a slow-down factor of about 2, because each operation has to be performed twice. Observing the code resulting from the application of the rules, we can see that a lot of code is added, mostly dealing with consistency checks. As an example, the code redundancy rule dictates the addition of an "if" statement after each expression. Such a structure of the hardened code introduces two main performance penalties:
• the amount of code to be executed is far more than double the original;
• a high number of conditional jumps are inserted, drastically lowering the efficiency of modern processors' pipelines and speculative execution strategies.

4.3. Analysis of the fault detection capability

Tables 2 and 3 report results assessing the fault detection capabilities of our approach. The reader should note that the number of faults injected in each session was 2,000 for the original version of the programs. Conversely, in the modified versions we injected a number of faults obtained by multiplying 2,000 by a factor measuring the memory size increase, thus accounting for the higher probability that a fault affects the memory (for instance, the data segment of the hardened Matrix benchmark grows by a factor of 2.05 and its code segment by a factor of 5.42, which corresponds to the 4,100 and 10,840 faults injected into its hardened version in Tab. 2 and Tab. 3, respectively).
Tab. 2: Fault Injection results in the data area.

    Programs                           Total    Effect-less    Detected    Wrong Answer
    Matrix Multiplication  Original    2,000      1,964             7            29
                           ThOR        4,100         17         4,082             1
    Quick sort             Original    2,000      1,501            92           407
                           ThOR        4,140      1,659         2,479             2
    LU decomposition       Original    2,000      1,752           124           124
                           ThOR        4,200      1,470         2,714            16
    FFT                    Original    2,000      1,653            11           336
                           ThOR        4,120      2,205         1,870            43

Tab. 3: Fault Injection results for faults in the code area.

    Programs                           Total    Effect-less    Detected    Wrong Answer
    Matrix Multiplication  Original     2,000     1,160           588           252
                           ThOR        10,840     4,083         6,732            25
    Quick sort             Original     2,000       817           766           417
                           ThOR        10,260     3,609         6,624            27
    LU decomposition       Original     2,000     1,119           779           102
                           ThOR        10,240     3,586         6,615            39
    FFT                    Original     2,000     1,203           563           234
                           ThOR         9,720     3,783         5,971            53

When analyzing the results of Tables 2 and 3, the reader should mainly look at the last column, reporting the number of faults causing the system to produce a Fail Silent Violation, i.e., to complete the execution producing a wrong answer. The percentage of faults leading the un-hardened programs to wrong answers is close to 10%, while it is about 0.5% for the hardened ones. Our method is thus able to guarantee nearly complete fault coverage with a limited overhead both in terms of memory and performance. As far as effect-less faults are considered, we observed that for the un-hardened programs most of the injected faults belong to this category; conversely, the percentage of faults not affecting the program behavior is close to 50% for the hardened benchmarks. Since the un-hardened programs are quite simple, there is a high probability of injecting faults that corrupt memory locations no longer used. Conversely, since the memory occupation of the hardened programs is larger, faults are likely to affect memory locations still in use, and thus the software detection mechanisms may be triggered.

5. Conclusions

In this paper, a SIHFT methodology was described and a software tool supporting it was presented. The method is based on a set of transformation rules which can be automatically applied to high-level source code and which introduce data and code duplication. An extensive experimental study on the effectiveness of the proposed approach was performed, showing the costs and benefits stemming from the application of our method. We are currently working on three issues not yet covered in this paper:
• Reduction of the space/time overheads, by reducing the number of replicated variables as well as the number of additional control statements.
• Definition of new rules allowing Fault Tolerance by detecting and correcting errors induced by transient faults.
• Analysis of the state of the art, in order to include in our methodology already existing techniques that are highly optimized for particular data structures (e.g., matrices).

6. References

[1] M. Nicolaidis, "Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies", IEEE VLSI Test Symposium, 1999, pp. 86-94.
[2] L. Anghel, M. Nicolaidis, "Cost Reduction and Evolution of a Temporary Faults Detecting Techniques", IEEE Design, Automation and Test in Europe, 2000, pp. 591-598.
[3] A. Avizienis, J. P. J. Kelly, "Fault Tolerance by Design Diversity: Concepts and Experiments", IEEE Computer, Aug. 1984, pp. 67-80.
[4] A. Avizienis, "The N-Version Approach to Fault-Tolerant Software", IEEE Trans. on Software Engineering, Vol. 11, No. 12, Dec. 1985, pp. 1491-1501.
[5] B. Randell, "System Structure for Software Fault Tolerance", IEEE Trans. on Software Engineering, Vol. 1, No. 2, Jun. 1975, pp. 220-232.
[6] P. Shirvani, N. Saxena, E. J. McCluskey, "Software-Implemented EDAC Protection Against SEUs", IEEE Transactions on Reliability, Special Issue on Fault-Tolerant VLSI Systems, June 2000.
[7] K. H. Huang, J. A. Abraham, "Algorithm-Based Fault Tolerance for Matrix Operations", IEEE Transactions on Computers, Vol. 33, Dec. 1984, pp. 518-528.
[8] Z. Alkhalifa, V. S. S. Nair, N. Krishnamurthy, J. A. Abraham, "Design and Evaluation of System-level Checks for On-line Control Flow Error Detection", IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 6, Jun. 1999, pp. 627-641.
[9] S. S. Yau, F.-C. Chen, "An Approach to Concurrent Control Flow Checking", IEEE Trans. on Software Engineering, Vol. 6, No. 2, March 1980, pp. 126-137.
[10] N. Oh, S. Mitra, E. J. McCluskey, "Error Detection by Diverse Data and Duplicated Instructions", Center for Reliable Computing, Technical Report, available at http://www-crc.stanford.edu/users/ejm/trs.html
[11] M. Rebaudengo, M. Sonza Reorda, M. Torchiano, M. Violante, "Soft-error Detection through Software Fault-Tolerance Techniques", IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 1999, pp. 210-218.
[12] M. Rebaudengo, M. Sonza Reorda, M. Violante, P. Cheynet, B. Nicolescu, R. Velazco, "Evaluating the Effectiveness of a Software Fault-Tolerance Technique on RISC- and CISC-based Architectures", IEEE International On-Line Testing Workshop, 2000.
[13] P. Cheynet, B. Nicolescu, R. Velazco, M. Rebaudengo, M. Sonza Reorda, M. Violante, "Experimentally Evaluating an Automatic Approach for Generating Safety-Critical Software with Respect to Transient Errors", IEEE Transactions on Nuclear Science, Vol. 47, No. 6, December 2000, pp. 2231-2236.
[14] A. Benso, S. Chiusano, P. Prinetto, L. Tagliaferro, "A C/C++ Source-to-Source Compiler for Dependable Applications", International Conference on Dependable Systems and Networks, June 2000.
[15] B. di Martino, C. W. Kessler, "Two Program Comprehension Tools for Automatic Parallelization", IEEE Concurrency, Vol. 8, No. 1, January-March 2000.
[16] ftp://iecc.com/pub/file/c-grammar.gz
[17] A. Aho, R. Sethi, J. Ullman, "Compilers - Principles, Techniques and Tools", Addison-Wesley, 1986.
[18] A. Benso, M. Rebaudengo, M. Sonza Reorda, P. L. Civera, "An Integrated HW and SW Fault Injection Environment for Real-Time Systems", IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 1998, pp. 117-122.
[19] R. Koschke, J.-F. Girard, M. Würthner, "An Intermediate Representation for Reverse Engineering Analyses", Proc. of the Working Conference on Reverse Engineering, 1998.