CFCSS without Aliasing for SPARC Architecture

2010 10th IEEE International Conference on Computer and Information Technology (CIT 2010)

Wang Chao*, Fu Zhongchuan*, Chen Hongsong**, Ba Wei*, Li Bin*, Chen Lin*, Zhang Zexu***, Wang Yuying*, Cui Gang*

(*Department of Computer Science and Technology, ***Astronautic School, Harbin Institute of Technology, Harbin 150001; **Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083)

Abstract

With the increasing popularity of COTS (commercial off-the-shelf) components and multi-core processors in space and aviation applications, software fault tolerance has become an attractive way to overcome their main weakness, susceptibility to transient faults. CFCSS (Control Flow Checking by Software Signatures) is one of the most important pure-software fault tolerance techniques for mitigating control flow errors in harsh environments. Aliasing, its most prominent deficiency, is the focus of this paper, and a novel algorithm, CFCSS without aliasing, is put forward. First, the cause of aliasing, namely the existence of branch-fan-in nodes in the program control flow graph, is investigated in depth, and the minimal flow graph structure that gives rise to aliasing, the “3-2 structure”, is extracted. The “3-2 structure” can be extended to a broader class of flow graphs, which this paper names the “n-(n-1) structure” and which previous CFCSS algorithms cannot handle. Second, based on a thorough analysis of the traditional CFCSS algorithm, a method of inserting an additional basic block into the program control flow graph is proposed, and the CFCSS without aliasing algorithm is designed. Because the algorithm is independent of the shape of the program flow graph, it is more general: in theory it can handle any flow graph structure, including the “n-(n-1) structure” and other typical flow graphs not covered by traditional algorithms. Third, the compilation time of the algorithm is linear in the number of basic blocks of the program control flow graph. CFCSS without aliasing is implemented in GCC 4.2.1 for the SPARC architecture, with support for the delay slot. Fault injection campaigns on representative integer-dominated benchmarks from MiBench and SPEC CINT2000 are used to investigate the correctness, fault detection capability, and overhead of the algorithm in detail.

Key words: Aliasing, CFCSS (Control Flow Checking by Software Signatures), COTS (Commercial-Off-The-Shelf), multi-core, transient fault

978-0-7695-4108-2/10 $26.00 © 2010 IEEE DOI 10.1109/CIT.2010.356

I. Introduction

As technology advances, processors are inevitably entering the multi-core era. WaveScalar and TRIPS are leading representatives of academic multi-core research, and Intel, IBM, Sun and other manufacturers are striving to offer COTS (Commercial-Off-The-Shelf) multi-core products. Recently, much attention has been paid to spaceflight, aviation, and military systems built from multi-core and COTS components. In ST-8 (Space Technology) [1], NASA proposed an advanced space computing project that aims to construct the first-generation space supercomputer [2], in part from COTS multi-core products such as CELL [3], TRIPS [4], Monarch [5], and Tile64 [6]. In short, the adoption of multi-core processors and COTS components in spaceflight, aviation, and military systems has become an inevitable trend, and software fault tolerance is of vital importance for overcoming their susceptibility to transient faults.

Recently, much work has been done on software fault-tolerance techniques to mitigate control flow errors caused by transient faults [7]-[10], among which CFCSS is one of the most promising approaches [11]-[15]. However, two cases still cannot be handled by the traditional CFCSS algorithm: aliasing and some specific control flow graph structures, typically the “n-(n-1) structure”. These are the focus of this paper. The rest of the paper is organized as follows. Section II introduces the aliasing problem of the traditional CFCSS algorithm and analyzes its cause in detail. Section III proposes a method that eliminates aliasing by inserting an extra node into the program flow graph, and presents the CFCSS without aliasing algorithm. Section IV describes implementation details, and Section V evaluates the algorithm by fault injection experiments. Section VI concludes and outlines future work.

destinations, aliasing occurs and CFCSS cannot detect this kind of control flow fault [11].

Figure 1. Aliasing Example

Through analysis of many program flow graphs that exhibit aliasing, a typical flow graph structure is abstracted, which this paper calls the “3-2 structure”, as depicted in Fig. 2.

II. Aliasing and “n-(n-1)” structure

Figure 2. “3-2 structure”

For vertices V1 to V5 of the “3-2 structure”, taking V2 as the reference node, aliasing occurs as follows:

    V1: s1=001; D1=s2⊕s1=011;
    V2: s2=010; D2=s2⊕s2=000;
    V3: s3=011; D3=s2⊕s3=001;
    V4: s4=100; d4=s2⊕s4=110;
    V5: s5=101; d5=s2⊕s5=111;

When a transient fault makes the branch instruction at the end of V1 jump to V5, as indicated by the dashed line in Fig. 2, the fault checking process in V5 is as follows:

    G = G⊕D = s1⊕D1 = 001⊕011 = 010;
    G = G⊕d5 = 010⊕111 = 101;
    G == s5;

Because G equals s5, this control flow fault cannot be detected by CFCSS. This is the so-called aliasing problem. In fact, the typical “3-2 structure” shown in Fig. 2 can be extended to a broader class of flow graph structures, which we call the “n-(n-1) structure”, where n is greater than or equal to four. This class of control flow graph structures cannot be handled by previous CFCSS algorithms [9][11]. In summary, there exist two cases that cannot be

A. Acronyms

    Vi: vertex i in the program flow graph
    si: signature of Vi
    di: signature difference of Vi
    G: the global signature
    D: run-time adjusting signature
    D_flagi: a flag used by the algorithm to avoid repeated computation of Di

The signature of Vi, denoted by si, and the signature difference di are calculated and inserted into the fault detection code at compile time. Both the global signature G and the run-time adjusting signature D are passed from the predecessor basic block at run time. Once Di of Vi has been calculated, D_flagi is set to true; otherwise it remains false. The compiler uses this flag to avoid computing Di more than once.

B. Aliasing and “n-(n-1)” flow graph structure

In the traditional CFCSS algorithm, a run-time adjusting signature D is introduced to solve the problem of assigning the same signature to multiple predecessors of a branch-fan-in node. It is precisely this mechanism that gives rise to aliasing. Take the example depicted in Fig. 1: because multiple nodes, for instance V1, V2, and V3, share multiple branch-fan-in nodes V5 and V6 as their


program flow graph, aliasing is avoided, while making the algorithm independent of the flow graph solves the specific flow graph structure problem. The fault detection capability of CFCSS without aliasing is thereby increased.

B. CFCSS without Aliasing Algorithm

In designing the algorithm, three issues are considered: correctness, efficiency, and generality. It is extremely important for the algorithm to be independent of the program flow graph. This makes the algorithm more general and easier to implement, so that in theory any kind of flow graph structure can be handled. The CFCSS without aliasing algorithm is described as follows.

    1. Assign a unique signature si to each node Vi.
    2. For each node Vi, i = 1, 2, …, N:
       2.1. if Vi is not a branch-fan-in node:
                di = sp⊕si, where Vp is the predecessor of Vi.
       2.2. else: // Vi is a branch-fan-in node
                try to select a node Vb whose D_flag equals 0 in the predecessor set as the reference node;
            2.2.1. if Vb exists:
                       di = sb⊕si; D_flagb = 1;
                       For each predecessor node Vp of Vi:
                           2.2.1.1. if D_flagp == 1, then:
                                        node Vadd is inserted between Vp and Vi;
                                        Vp = Vadd;
                                    End if
                           Dp = sb⊕sp; D_flagp = 1;
                       End For
            2.2.2. else:
                       For each predecessor node Vp of Vi:
                           insert a node between Vp and Vi
                       End For
                       Select one of the inserted nodes as reference and adjust the information of the other added nodes.
                   End if
            End if
       End for

When a program is compiled, every basic block, represented by a node Vi in the program flow graph, is identified and assigned a unique signature si. In

dealt with by the traditional CFCSS algorithm: aliasing and some specific program flow graph structures, such as the “n-(n-1) structure” (n>=4). These are the focus of this paper.
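The aliasing case of the “3-2 structure” can be replayed in a few lines of Python. This is a minimal sketch, not the authors' implementation: the edge set (V1→V4, V2→V4, V2→V5, V3→V5) is an assumption inferred from the worked example, and `check_passes` is a hypothetical helper that mimics the check G = G⊕D⊕d, G == s at the entry of a branch-fan-in block.

```python
# Signatures from the "3-2 structure" example (3-bit values).
s = {1: 0b001, 2: 0b010, 3: 0b011, 4: 0b100, 5: 0b101}

# Assumed legal edges of the 3-2 structure (not spelled out in the text).
legal_edges = {(1, 4), (2, 4), (2, 5), (3, 5)}

ref = 2  # traditional CFCSS: V2 is the reference predecessor of both V4 and V5
d = {v: s[ref] ^ s[v] for v in (4, 5)}      # compile-time signature differences
D = {p: s[ref] ^ s[p] for p in (1, 2, 3)}   # run-time adjusting signatures


def check_passes(src, dst):
    """Return True if the check at the entry of dst passes after a jump from src."""
    G = s[src]      # on leaving src, G holds the signature of src
    G ^= D[src]     # fan-in adjustment with the D set by src
    G ^= d[dst]     # signature difference of the destination block
    return G == s[dst]


for src in (1, 2, 3):
    for dst in (4, 5):
        kind = "legal" if (src, dst) in legal_edges else "ILLEGAL"
        print(f"V{src}->V{dst} ({kind}): check passes = {check_passes(src, dst)}")
```

With V2 as the single reference node, every source/destination pair passes the check, including the illegal jumps V1→V5 and V3→V4, which is the aliasing behaviour analyzed in this section.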

III. CFCSS without Aliasing

A. Aliasing Elimination

To solve the problems of aliasing and of specific flow graph structures, an easy-to-implement and cost-effective method is proposed. By inserting an extra basic block into the flow graph, the traditional CFCSS algorithm is improved and aliasing is eliminated. This process is illustrated in Fig. 3, supposing s1=001, s2=010, s3=011, s4=100, s5=101.

Figure 3. Aliasing elimination method

In contrast with the traditional algorithm, V2 can no longer be selected as the reference node of both V4 and V5. For vertex V4, d4=s2⊕s4=110, D2=000, D1=s2⊕s1=011. For vertex V5, d5=s3⊕s5=110, D3=000. In this case, an extra node V6 must be inserted between V2 and V5, with s6=110, d6=s2⊕s6=100, D6=s3⊕s6=101. Now suppose an illegal branch br15 occurs and jumps to the first line of V5. Then Gprev is G1=s1, and G is updated to G5 = Gprev⊕D⊕d5 = G1⊕D1⊕d5 = 001⊕011⊕110 = 100. The updated G5 (100) does not equal s5 (101), so the illegal branch br15 is detected by the fault detection instructions in V5. Thus, by inserting an extra node into the control flow graph, aliasing can be avoided. There is, however, another case that the traditional algorithm cannot deal with: some specific flow graph structures in the program, typically the “n-(n-1) structure”. With careful algorithm design, this problem of the traditional CFCSS algorithm can be tackled by making the algorithm independent of the program flow graph. In summary, by inserting an extra basic block into the


algorithm steps 2.2.1.1 and 2.2.2, an extra node is inserted into the original program flow graph. Inserting the fault detection instructions of the extra basic block requires some care. If the new node lies on the fall-through edge of the program control flow graph, the fault detection code is inserted directly. If instead the new node lies on the target edge, the fault detection code must be inserted at the end of the function, together with a branch instruction that jumps to the target node. In summary, independence from the program flow graph makes the algorithm more general, cost-efficient, and easy to implement. In theory, therefore, any kind of program control flow graph structure can be handled, and the specific flow graph structure problem of the original algorithm is solved.
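The signature-assignment steps of Section III.B can be sketched in Python as follows. This is one possible reading of the pseudocode, not the authors' GCC implementation, and it rests on several assumptions: blocks are numbered 1..n and block i simply gets signature s_i = i; the reference node is taken to be the last unflagged predecessor (the pseudocode leaves the choice open); the inner loop runs over the predecessors other than the reference; and a block that was never assigned a D is assumed to load D = 0. The helper names `assign_signatures`, `insert_between`, and `check_passes` are hypothetical.

```python
def assign_signatures(n, edges):
    """Sketch of steps 1-2; blocks are numbered 1..n and s_i = i (step 1)."""
    edges = set(edges)
    d, D, flagged = {}, {}, set()
    next_id = n + 1

    def insert_between(p, v):
        """Split edge p->v with a new single-predecessor block; return its id."""
        nonlocal next_id
        add, next_id = next_id, next_id + 1
        d[add] = p ^ add                 # new block is not a fan-in node
        edges.discard((p, v))
        edges.update({(p, add), (add, v)})
        return add

    for v in range(1, n + 1):            # step 2, original blocks only
        preds = sorted(p for p, q in edges if q == v)
        if not preds:
            continue                     # entry block: no predecessor to check
        if len(preds) == 1:              # step 2.1
            d[v] = preds[0] ^ v
            continue
        free = [p for p in preds if p not in flagged]
        if free:                         # step 2.2.1
            b = free[-1]                 # assumed pick; reproduces the Fig. 3 values
            others = [p for p in preds if p != b]
        else:                            # step 2.2.2
            added = [insert_between(p, v) for p in preds]
            b, others = added[0], added[1:]
        d[v], D[b] = b ^ v, 0
        flagged.add(b)
        for p in others:
            if p in flagged:             # step 2.2.1.1: D of p is already in use
                p = insert_between(p, v)
            D[p] = b ^ p
            flagged.add(p)
    return d, D, edges


def check_passes(src, dst, d, D, edges):
    """Simulate the entry check of dst after a jump from the end of src."""
    fan_in = sum(1 for _, q in edges if q == dst) > 1
    G = src                              # G = s_src on leaving src
    if fan_in:
        G ^= D.get(src, 0)               # assumption: blocks without a D load 0
    return (G ^ d[dst]) == dst


# Assumed edges of the "3-2 structure": V1->V4, V2->V4, V2->V5, V3->V5.
d, D, new_edges = assign_signatures(5, {(1, 4), (2, 4), (2, 5), (3, 5)})
print("d =", {v: format(x, "03b") for v, x in sorted(d.items())})
print("D =", {v: format(x, "03b") for v, x in sorted(D.items())})
print("V1->V5 passes:", check_passes(1, 5, d, D, new_edges))
```

On this graph the sketch inserts one block, V6, between V2 and V5 and yields d4=110, d5=110, d6=100, D1=011, and D6=101, the values of the Fig. 3 example. In the transformed graph a jump to the start of a checked block passes the check only if it follows a legal edge, so the illegal branch V1→V5 is now rejected.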

Like the traditional algorithm, CFCSS without aliasing requires two registers, so the register pressure remains unchanged. In the SPARC architecture, the lower-priority local registers L6 and L7 are reserved to hold the global signature G and the run-time adjusting signature D, respectively. Register reservation involves modifying the macros FIXED_REGISTERS and CALL_USED_REGISTERS in the MD file.

B. Fault Detection Code Insertion

Fault detection code insertion is somewhat intricate because of the delay slot in the SPARC architecture. There are two situations. If the basic block is a branch-fan-in basic block, the following instructions are inserted at the beginning of the basic block.

    xor %G, %D, %G
    xor %G, d, %G
    cmp %G, S
    bne %icc, .error
    or %g0, D, %D

If the basic block is a non-branch-fan-in basic block, the following code is inserted at the beginning of the basic block.

    xor %G, d, %G
    cmp %G, S
    bne %icc, .error
    or %g0, D, %D

The error handler is written in SPARC assembly language and is inserted at the head of the asm file, as shown below.

    .section ".rodata",#alloc,#progbits
    .align 8
    .Lerror:
    .asciz "Error detected by CFCSS\n"
    .section ".text"
    .align 4
    .error:
    sethi %hi(.Lerror),%i5
    or %i5,%lo(.Lerror),%o0
    call puts
    nop
    or %g0,-1,%o0
    call exit
    nop

C. GCC Optimizations

Leaf-routine optimization does not incur register

C. Discussions

1) Fault Coverage

In addition to the control flow faults covered by the traditional CFCSS algorithm, our algorithm covers aliasing as well as the specific control flow structures that are difficult for traditional CFCSS to handle, so fault coverage is increased.

2) Compilation Time Complexity

The algorithm scans each basic block only once and processes each of its edges. The processing time of the compilation algorithm is therefore proportional to the number of edges in the control flow graph, denoted by E, which is less than or equal to 2*n, where n is the number of basic blocks. The time overhead of calculating di is proportional to n, and the time overhead of inserting the checking instructions is directly proportional to E. In summary, the time complexity of CFCSS without aliasing is linear in the number of basic blocks in the program control flow graph, that is, O(n).

IV. Implementation Details

This section introduces some implementation details of CFCSS without aliasing, including register reservation, insertion of fault checking instructions, and GCC optimizations.

A. Registers Reservation


constitute the type of fault that can be detected by neither CFCSS nor the operating system, shown as “incorrect output undetected”. This is the first type of fault that CFCSS aims to prevent because of its severity. Faults detected by the operating system can be handled by the OS, so they are

window switching in the SPARC architecture; consequently, it should be switched off in CFCSS without aliasing. The tradeoffs between CFCSS and other optimizations should be investigated further in future work.
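The two check sequences of Section IV.B differ only in the leading `xor` with D, so they can be produced by one small emitter. The following Python sketch is illustrative only: `check_sequence` is a hypothetical helper, the register names `%l6` and `%l7` follow the reservation described in Section IV.A, and the constants d, s, and D are assumed to fit the 13-bit signed immediate field of SPARC arithmetic instructions.

```python
def check_sequence(d, s, D, fan_in, g_reg="%l6", d_reg="%l7"):
    """Return the check instructions for one basic block as a list of lines.

    d, s, D are the block's signature difference, signature, and run-time
    adjusting signature; they are emitted as immediates (assumed to fit simm13).
    """
    lines = []
    if fan_in:
        lines.append(f"xor {g_reg}, {d_reg}, {g_reg}")  # G = G xor D (fan-in only)
    lines += [
        f"xor {g_reg}, {d}, {g_reg}",   # G = G xor d
        f"cmp {g_reg}, {s}",            # compare G with the block signature
        "bne %icc, .error",             # mismatch: jump to the error handler
        f"or %g0, {D}, {d_reg}",        # follows the branch (delay slot): set D
    ]
    return lines


# Example with the Fig. 3 values of V5 (d5=110, s5=101) and D=0 for V5 itself.
for line in check_sequence(d=0b110, s=0b101, D=0, fan_in=True):
    print(line)
```

Because the instruction that loads D directly follows `bne`, it occupies the branch delay slot, so no separate `nop` is needed after the branch in either sequence.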

less harmful.

Table 2. Fault injection results of benchmarks with improved CFCSS

V. Evaluations

A. Environmental Setup

GCC 4.2.1 is used as the compiler infrastructure, and assembly code is generated for the SPARC V9 architecture [16]. One of three types of fault (branch deletion, branch creation, or branch operand change) is randomly injected into the assembly code. Fault injection campaigns are carried out under Virtutech Simics 3.0.27 [17], a full-system simulator, running a simulated UltraSPARC II processor that supports the SPARC V9 ISA. Representative integer-dominated benchmarks from MiBench [18] and SPEC CINT2000 [19], such as FFT, Mcf, Stringsearch, Quick sort, Dijkstra, and Gzip, are evaluated with ref inputs for 500 iterations each.

B. Error Detection Capability

The fault injection results for the original programs and for CFCSS without aliasing are shown in Table 1 and Table 2, respectively.

For the original programs, an average of 49.9% of the injected branch faults produced undetected incorrect outputs, whereas for the CFCSS-hardened versions only 10.2% did, indicating a higher error detection capability. Four primary cases account for undetected incorrect output.

1) The faulty branch is in the same basic block as its target. If the branch precedes its target, an incorrect result may be produced and the fault remains undetected. Conversely, if the target precedes the branch, an infinite loop may result, and this fault cannot be detected either.
2) The target of the faulty branch is the first instruction of its successor basic block. This fault cannot be detected by the algorithm, and an incorrect result is produced.
3) A conditional branch that is the last instruction of a basic block is deleted. CFCSS cannot deal with this type of fault.
4) The target of a conditional branch is changed to the first instruction of its other successor basic block. This kind of fault cannot be detected by CFCSS either.

C. Implementation Overhead

Table 3 shows the code size and block overhead introduced by the CFCSS without aliasing algorithm. The following metrics are listed in Table 3: the number of instructions of the original program, the number of extra fault checking instructions, and the numbers of basic blocks and of extra basic blocks inserted by this

Table 1. Fault Injection Result of Original Program

                               Stringsearch   FFT      Qsort    Dijkstra
    incorrect result           34.2%          40.3%    43.2%    29.8%
    execute timeover           15.7%          16.5%    10.6%    8.2%
    detected by OS             26.1%          31.1%    16.0%    36.6%
    correct result             24.0%          11.1%    30.2%    25.4%
    total                      100%           100%     100%     100%
    incorrect output undetected 49.9%         56.8%    53.8%    38.0%
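The last row of Table 1 appears to be the sum of the “incorrect result” and “execute timeover” rows, the two outcomes that neither CFCSS nor the operating system catches. A short Python check of that reading, using only the values printed in the table (the dictionary layout and key names are illustrative):

```python
# Values copied from Table 1, in percent.
table1 = {
    "Stringsearch": {"incorrect": 34.2, "timeover": 15.7, "undetected": 49.9},
    "FFT":          {"incorrect": 40.3, "timeover": 16.5, "undetected": 56.8},
    "Qsort":        {"incorrect": 43.2, "timeover": 10.6, "undetected": 53.8},
    "Dijkstra":     {"incorrect": 29.8, "timeover": 8.2,  "undetected": 38.0},
}


def undetected_share(row):
    """Undetected incorrect output = incorrect result + timeover (in percent)."""
    return row["incorrect"] + row["timeover"]


for name, row in table1.items():
    total = undetected_share(row)
    matches = abs(total - row["undetected"]) < 1e-9
    print(f"{name}: {total:.1f}% (matches table: {matches})")
```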

“Incorrect result” is the fraction of faults that cause the program to produce an incorrect result without being detected at any level. “Timeover” denotes a fault that causes an infinite loop, which must be stopped manually. Segmentation faults and bus errors are typical faults detected by the operating system. “Correct result” means that the injected fault has no effect on the final output of the benchmark, which reflects the inherent fault tolerance of the program itself, known as ‘Y-behavior’ or a benign fault at the application level. Incorrect result and timeover


algorithm. Instruction overhead is the ratio of the number of inserted instructions to the number of instructions in the original program, while block overhead is the ratio of the number of inserted basic blocks to the number of basic blocks in the original program.

Table 3. Overhead of CFCSS without aliasing

As shown in Table 3, the large basic block size of Basicmath gives it a lower instruction overhead of 20.8%. On average, the CFCSS without aliasing algorithm introduces 39% instruction overhead and 5.6% block overhead. Representative benchmarks with simple control flow structures, such as FFT, Qsort, and Basicmath, have very low block overhead. The low average block overhead of benchmarks with complicated control flow graph structures means that aliasing-causing situations are relatively rare.

VI. Conclusions and Future Work

CFCSS is one of the most important pure-software fault tolerance techniques for mitigating control flow errors. Two cases that cannot be handled by the traditional CFCSS algorithm, aliasing and some specific control flow graph structures, are the focus of this paper, and a novel algorithm is put forward. First, the cause of CFCSS aliasing is analyzed and the minimal flow graph structure that leads to aliasing, the “3-2 structure”, is extracted. It is then extended to a broader class of flow graphs, which we call the “n-(n-1) structure”. Second, a method of inserting an additional basic block into the program control flow graph is proposed, and the CFCSS without aliasing algorithm is designed. Independence from the program flow graph makes the algorithm more general and cost-efficient, so that in theory any kind of flow graph structure can be handled. Third, the compilation time of the algorithm is linear in the number of basic blocks in the program control flow graph. CFCSS without aliasing is implemented in GCC 4.2.1 for the SPARC architecture, with support for the delay slot, and several implementation aspects are described in detail. The correctness, fault detection capability, and overhead of the algorithm are evaluated by fault injection campaigns on representative integer-dominated benchmarks from MiBench and SPEC CINT2000.

Some issues still need further exploration:

1) The algorithm should be optimized, and the fault detection latency and other aspects should be investigated further.
2) Pure software fault-tolerance techniques such as CFCSS and EDDI are difficult to port to different platforms. Implementing them in the machine-independent front end of the compiler can solve this problem [20].
3) Research from Illinois has shown that the classical micro-architecture-level stuck-at fault model is inaccurate for capturing the actual system-level fault effects, and two new micro-architecture-level fault models based on probability theory have been proposed. Exploring and validating the system-level manifestations of transient faults on the SPARC architecture is our future work [21].

ACKNOWLEDGMENTS

We thank all the staff and students of our research group, including Zhao Shengyi, Zha Bin, Hong Bin, Gao Yang, Ji Shi, Cao Han, Zhu Dongjie, and Wang Yan, for their diligent contributions. We also thank the anonymous reviewers for their sincere support and valuable advice. This work is supported by the National Natural Science Foundation of China under grant No. 90818016, the Heilongjiang Provincial Natural Science Foundation of China under grant No. F200822, the China Postdoctoral Science Foundation under grant No. 20070420868, the National Key Laboratory of Science and Technology on Avionics System Integration & Foundation of Avionics Science under grant No. 20095577012, Project (HIT.NSRIF.2009066) supported by the Natural Scientific Research Innovation Foundation in HIT, Astronautics Innovation Fund CASC200902-4, and Project (HIT.KLOF.2009070) supported by the Key Laboratory Opening Funding of Deep Exploration Landing and Returning Control.

References

[1] http://nmp.jpl.nasa.gov/st8/index.html
[2] http://nmp.jpl.nasa.gov/st8/tech/eaftc_tech1.html
[3] M. Gschwind et al. Synergistic Processing in Cell's Multicore Architecture. IEEE Micro, 2006. pp.10~24.
[4] www.cs.utexas.edu/~TRIPS/
[5] Raytheon Technology Today. Issue 2:26, 2006. http://www.raytheon.com/technologytoday/current/archive.html
[6] http://www.tilera.com/
[7] B. Nicolescu, Y. Savaria, and R. Velazco. Software Detection Mechanisms Providing Full Coverage against Single Bit-Flip Faults. IEEE Transactions on Nuclear Science, Vol. 51, No. 6. Dec. 2004.
[8] Nahmsuk Oh. Software Implemented Hardware Fault Tolerance. Stanford University Dissertation. Dec. 2000.
[9] Huang Zhenyuan. Research and Implementation of a Kind of SIHFT for Astronautic Applications. Master Thesis of Harbin Institute of Technology. 2006.
[10] G. A. Reis, J. Chang, N. Vachharajani, R. Rangan, et al. SWIFT: Software Implemented Fault Tolerance. In Proceedings of the International Symposium on Code Generation and Optimization (CGO). 2005.
[11] N. Oh, P. P. Shirvani, and E. J. McCluskey. Control-flow checking by software signatures. In IEEE Transactions on Reliability. Vol 51, March 2002. pp.111~122.
[12] LI Jian-ming, TAN Qing-ping, XU Jian-jun, JIANG Cheng. Control Flow Detection Based on Path Tracking. Computer Engineering, Vol. 35 No. 20. Oct. 2009. pp.68~70.
[13] Yanxia Wu, Guochang Gu, Shaobin Huang, Jun Ni. Control Flow Checking Algorithm using Soft-based intra-/Inter-block Assigned signature. Second International Multisymposium on Computer and Computational Sciences, 2007. pp.412~415.
[14] Yan-xia Wu, Guo-chang Gu, Ke-hui Wang. An improved CFCSS control flow checking algorithm. 2007.
[15] Yan-xia Wu, Guo-chang Gu, Ke-hui Wang. Power-aware control flow checking compilation using less branches to reduce power dissipation. The Sixth International Conference on Machine Learning and Cybernetics, Hong Kong. August 2007. pp.2986~2989.
[16] http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/
[17] Getting Started With Simics. http://www.virtutech.com/products/manuals
[18] M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, T. Mudge, R. Brown. “MiBench: A Free, Commercially Representative Embedded Benchmark Suite”. Proc. of the 4th IEEE Workshop Workload Characterization. pp.10~22, Dec. 2001.
[19] http://www.spec.org/cpu2000/CINT2000/
[20] Jing Yu, María Jesús Garzarán, Marc Snir. EsoftCheck: Removal of Non-vital Checks for Fault Tolerance. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), 2009.
[21] Man-Lap Li, Pradeep R., et al. Accurate Microarchitecture-Level Fault Modeling for Studying Hardware Faults. 15th International Symposium on High-Performance Computer Architecture. Feb. 2009.