
2011 14th Euromicro Conference on Digital System Design

Soft Error Detection Technique in Multi-threaded Architectures Using Control-Flow Monitoring

Mohammad Maghsoudloo, Hamid R. Zarandi, Saadat Pour Mozafari, Navid Khoshavi
Department of Computer Engineering and Information Technology
Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
{m.maghsoudloo, h_zarandi, saadat, navid.khoshavi}@aut.ac.ir

978-0-7695-4494-6/11 $26.00 © 2011 IEEE. DOI 10.1109/DSD.2011.104

Abstract: This paper presents a software-based error detection technique that monitors program control flow in multi-threaded architectures. The technique rests on two key ideas: 1) Modifying the structure of the traditional control-flow graphs used by control-flow checking methods so that they can be applied to multi-core and multi-threaded architectures, which increases the applicability of control-flow error detectors to current architectures. 2) Adjusting the locations of the additional checking assertions in a given program in order to increase the ability to detect control-flow errors while significantly reducing overheads. The experimental results, taking into account both detection coverage and overheads, demonstrate that on average about 94% of the control-flow errors can be detected by the proposed technique, which is more efficient than previous works.

Keywords: On-line error detection, control-flow error, error detection coverage, multi-threaded processor, multi-threaded programs.

I. INTRODUCTION

In recent decades, the performance of microprocessors has improved steadily, for two reasons. First, a higher degree of parallel code execution, obtained by spreading multiple threads of a process across several execution windows, can outperform traditional single-threaded execution [1]. Second, smaller transistors with low threshold voltages and tighter noise margins, enabled by modern fabrication technologies, make transistors much faster [1]. However, decreasing supply voltage levels and increasing clock frequencies make microprocessors more susceptible to hardware transient faults (also called soft errors) [2]. Soft errors, one of the major threats in modern microprocessors, are induced by energetic particle strikes, such as high-energy neutrons from cosmic rays and alpha particles from decaying radioactive impurities in packaging and interconnect materials [2]. Soft errors in a computer system can manifest themselves in three categories: benign errors, data errors, and control-flow errors (CFEs) [3]. It has been shown that a considerable fraction of them, between 33% and 77%, are control-flow errors. Given this incidence, the ability to deal with them is important, especially for safety-critical applications. A CFE disturbs the correct sequence in which the instructions of a program are executed [3]. CFEs are divided into two types: intra-node and inter-node. An intra-node CFE is an illegal movement within a basic block, which is a branch-free group of instructions terminated by a branch. An inter-node CFE is an illegal movement between two different basic blocks (nodes), or an illegal movement from a basic block to an unused space of the memory, which is called a partition block.

Several Control-Flow Checking (CFC) methods have been designed for detecting each type of CFE in program code. From the implementation point of view, methods are either software-based or hardware-based. Software-based techniques make use of redundant software and are based on assigning a signature to each basic block. Signatures are calculated at runtime and then compared with the original ones, which were calculated at design time [4]. Hardware-based methods, on the other hand, use extra hardware such as a watchdog processor, instead of software, to monitor the state and performance of the master processor [5]. In the general-purpose computing market, low cost and a quick time to market are most important. The design and implementation of new redundant hardware is costly and may not be feasible in cost-sensitive markets. Moreover, the inclusion of redundant design elements may negatively impact the design and product cycles of systems, as well as the area and power efficiency of modern processors [1]. Therefore, making use of redundant software in current architectures is a more convenient option than redundant hardware. However, there are three serious problems in applying previous software CFC techniques to modern processors:

A. High Overheads and Low Efficiency

Unfortunately, software-based methods can impose high memory and performance overheads that undermine the benefits of modern processors. Also, most of them focus on just one specific type of CFE (intra-node or inter-node), and their reported detection coverage and method efficiency rest on this assumption [4]. Ignoring either type of CFE directly reduces the effectiveness of these techniques.

B. False Positive CFEs

Traditional assumptions in the design of Control-Flow Graphs (CFGs) prevent them from being used for multi-threaded programs running on multi-core or multi-threaded processors, because switches and dependencies among the threads of a process are not represented in the structure of conventional CFGs. As Fig. 1 shows, ignoring dependencies among running threads when designing the CFGs causes interactions and switches among threads to be treated as CFEs. Moreover, using one shared global variable or register to store the signature value is another source of wrong CFE detection, since the running threads can access and alter the signature register simultaneously and without any restriction. This weakness is known as false positive CFEs: a technique reports some execution behaviors as CFEs when in reality they are not.

Figure 1. Effects of context switching between two threads

C. False Negative CFEs

In addition to the problems caused by false positive CFEs, traditional techniques are unable to detect new types of inter-thread inter-node CFEs. These errors arise in current architectures because of the new hardware and software components used for scheduling and managing threads and processes. This weakness is known as false negative CFEs: a technique is unable to detect some types of real CFEs.

Therefore, given the importance of handling all types of CFEs and the unsuitability of previous techniques for modern processors, this paper presents a software technique for automatic CFE detection in multi-threaded architectures. First, a method for extracting CFGs from program code, a prerequisite step in all CFC methods, is proposed that takes the criteria of multi-threaded programming into account. Second, by modifying the locations of the instructions added to the programs, an efficient algorithm is implemented to detect both types of CFEs (intra-node and inter-node) with negligible overheads.

II. THE PROPOSED TECHNIQUE

A. Integrated CFGs (ICFGs)

As mentioned in the previous section, any inaccuracy or limitation in capturing the control dependencies among the nodes of the CFG means that the flow of a given program cannot be followed precisely in the checking phase. The Integrated CFG (ICFG) of a multi-threaded program is an edge-classified graph that consists of a collection of separate CFGs, each representing the execution flow of a single thread, together with some special kinds of dependence edges that model interactions among different threads. These interactions can be classified into two categories [6]:

1) Synchronization dependencies: During execution, a thread can create slave threads to perform subtasks, and each slave thread can in turn create more threads. Therefore, as Fig. 2 (a) illustrates, the basic block of the main thread in which the slave thread is created (B1,1) should be considered a prerequisite for the first basic block of the slave thread (B1,2) when the CFG of the program is designed. Moreover, a thread can join other threads if it needs the results they provide; the thread calling the join method may proceed only after the target threads terminate. Therefore, as Fig. 2 (a) shows, in the corresponding CFG the last basic block of each thread that the calling thread has joined (B2,2) should be considered a prerequisite for the calling thread's basic block before which the join method is called (B3,1). In addition, a master (parent) thread can suspend the execution of its slave (child) threads, and the suspended threads cannot continue until the parent notifies them. So, referring to Fig. 2 (b), when designing the CFGs, the basic block of the master thread in which the notification signal is sent to the suspended threads (B3,1) should be considered a prerequisite for the basic blocks of the slave threads in which execution resumes (B2,2).

2) Communication dependencies: During execution, two or more threads of the same process can communicate with each other by exchanging data through shared variables. Most programming languages and multi-threaded programming standards use shared memory to support communication among threads. In the general case, only one thread at a time can execute a given piece of code (a critical section) that presumably alters some global data or reads from or writes to a device, and other threads that want to execute the same critical section must wait until the first thread finishes. To reflect this type of dependency in the corresponding CFG, the basic blocks that include critical sections are first marked as critical blocks, and then the critical block of the winner thread is considered a prerequisite basic block for the critical blocks of the losers. So, to capture the communication dependency of two or more basic blocks in different threads (for example, between nodes B2,2 and B3,1 in Fig. 2 (b)), a special kind of bidirectional dependence edge is defined. Consequently, the critical block of the winner thread is always identified as a prerequisite for the critical blocks of the losers, even if the winner changes in later competitions during the execution.
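As a rough illustration of the ICFG idea described above, the following Python sketch stores per-thread control edges together with synchronization and communication dependence edges, and then asks whether a transition between two basic blocks is legal. It is only a sketch under stated assumptions: the class `ICFG`, the helper `block`, the edge-kind names, and the choice of B2,1 and B2,2 as critical blocks are hypothetical and are not taken from the paper. The create and join edges mirror the description of Fig. 2 (a), with labels read as B(block, thread).

```python
from collections import defaultdict

# Edge kinds used in this sketch (illustrative names, not from the paper).
CONTROL, SYNC, COMM = "control", "sync", "comm"


class ICFG:
    """Toy integrated CFG: per-thread control edges plus dependence edges."""

    def __init__(self):
        # Maps a source block to {destination block: edge kind}.
        self.succ = defaultdict(dict)

    def add_edge(self, src, dst, kind):
        self.succ[src][dst] = kind
        if kind == COMM:
            # Communication dependence edges are bidirectional, because the
            # winner of the critical section can change between competitions.
            self.succ[dst][src] = kind

    def is_legal(self, src, dst):
        """Return True if a transition from src to dst is allowed by the graph."""
        return dst in self.succ.get(src, {})


def block(i, t):
    """Label of basic block i in thread t, written B(i,t) as in Fig. 2."""
    return f"B{i},{t}"


g = ICFG()

# Separate CFGs of two threads (hypothetical straight-line flows).
g.add_edge(block(1, 1), block(2, 1), CONTROL)
g.add_edge(block(2, 1), block(3, 1), CONTROL)
g.add_edge(block(1, 2), block(2, 2), CONTROL)

# Synchronization dependencies: thread creation and join.
g.add_edge(block(1, 1), block(1, 2), SYNC)  # B1,1 creates the slave thread
g.add_edge(block(2, 2), block(3, 1), SYNC)  # last slave block precedes B3,1

# Communication dependency between two assumed critical blocks.
g.add_edge(block(2, 1), block(2, 2), COMM)

# A checker built only on per-thread CFGs would flag B1,1 -> B1,2 as a CFE;
# with the dependence edges it is accepted, while B1,1 -> B2,2 is still illegal.
print(g.is_legal(block(1, 1), block(1, 2)))
print(g.is_legal(block(1, 1), block(2, 2)))
```

In this sketch a thread switch that follows a dependence edge is treated as legal flow, which is the situation that leads to false positives when only single-thread CFGs are used, and any cross-thread jump without an edge remains detectable.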

Figure 2. Dependencies between two threads, panels (a) and (b)

B. Adjusting the Locations of the Added Instructions

Adding instructions at the beginning and at the end of the basic blocks to check and update the signatures is the principal means of reaching the highest coverage for inter-node CFEs. In order to detect a number of intra-node CFEs along with inter-node ones, prior methods proposed that, if more coverage is needed, a node can be divided into sub-nodes, so that intra-node CFEs become inter-node ones and can be detected [4]. This idea enhances coverage. On the other hand, it also imposes more significant performance and memory overheads. This paper proposes that, if only one set of checking and updating instructions is inserted into the middle of each basic block (instead of two sets at the beginning and at the end), the detection coverage of both intra-node and inter-node CFEs improves while the overhead decreases. An analytical estimate supports this claim. Referring to Fig. 3, all inter-node CFEs can be detected by the proposed technique except the ones that occur between the second region and the first region of two different consecutive basic blocks, for example the inter-node CFEs between Region 2 and Region 3 in Fig. 3, whereas previous techniques can detect these types of inter-node CFEs. Therefore, the number of new undetectable inter-node CFEs in two consecutive basic blocks is:

$$\#_{\text{new undetectable inter-node CFEs}} = 2 \times \left(\frac{n+2}{2}\right)^{2}, \qquad (1)$$

because the new undetectable CFEs can occur from each instruction in Region 2 of the first basic block to every instruction residing in Region 3 of the second basic block (1 to n+2), and vice versa. The variable $n$ refers to the average number of instructions in each basic block. The average is used instead of the per-block instruction counts because these counts vary from block to block in any program. Moreover, referring to Fig. 3, all intra-node CFEs that occur from Region 1 to Region 2 and vice versa in the first basic block, and from Region 3 to Region 4 and vice versa in the second basic block, can be detected by the proposed technique. So, the number of all new detectable intra-node CFEs is calculated as below:

$$\#_{\text{new detectable intra-node CFEs}} = 4 \times n_1 \times n_2, \qquad (2)$$

where $n_1$ is the average number of instructions in the first segment of the basic blocks (Region 1 and Region 3 in Fig. 3), and $n_2$ is the average number of instructions in the second segment of the basic blocks (Region 2 and Region 4 in Fig. 3). Since the sum of $n_1$ and $n_2$ is constant ($n+2$), the product $n_1 \times n_2$ is maximized when $n_1$ and $n_2$ are equal. Therefore, the best location for inserting the added instructions is the middle of each basic block. So,

$$\#_{\text{new detectable intra-node CFEs}} = 4 \times \left(\frac{n+2}{2}\right)^{2}. \qquad (3)$$

Obviously, the number of new detectable CFEs, computed by (3), is more than the number of new undetectable CFEs, obtained by (1). Therefore, the proposed technique appears to be a reasonable alternative to conventional methods for determining the locations of the added instructions in the programs.

Figure 3. The structure of the basic blocks in the proposed method

III. EXPERIMENTAL RESULTS

In order to evaluate the proposed techniques, a functional simulation infrastructure called SIMICS [7] has been used
as the simulation environment. The behaviors of six well-known benchmarks, Single-threaded Matrix Multiplication (SMM), Single-threaded Quick Sort (SQS), Single-threaded Linked List (SLL), Multi-threaded Matrix Multiplication (MMM), Multi-threaded Quick Sort (MQS), and Multi-threaded Linked List (MLL), have been studied on a simple quad-core processor simulated with the SPARC V9 ISA and running a real operating system. Also, in order to reproduce the real behavior of soft errors that change the program sequence, a software function was written as a saboteur thread that manipulates the content of some registers in a random fashion (mimicking the effects of the bit-flip and stuck-at fault models) to produce the traditional models (branch deletion, branch insertion, and branch operand changes) and the new models (illegal switching and branches between two threads) of CFEs.

Table I compares the related techniques in terms of four parameters. The first one is the percentage of False Negative CFEs (FNC), that is, undetected CFEs. According to these results, the error detection coverage of the proposed technique is reduced by only about 2.7% for multi-threaded programs, whereas the detection coverage of the traditional techniques for multi-threaded programs decreases noticeably (on average 12%). The second parameter is the number of False Positive CFEs (FPC), the normal behaviors of the program wrongly detected as CFEs. The third parameter, Memory Overhead (M.O.), shows the amount of memory that the methods add to the program, and the fourth parameter, Performance Overhead (P.O.), shows the amount of performance degradation caused by the operation of the methods. Finally, in order to give a general comparison among all the methods that takes all of the influential parameters into account (similar to some previous works [4], [8]), a metric called Method Efficiency is defined to estimate the efficiency of the methods:

Method Efficiency = [...] % [...] 100, (4)

where the overall cost is calculated depending on the type of application to which the techniques are applied:

$$\text{Overall Cost} = \alpha \times \text{Performance Cost} + \beta \times \text{Memory Cost}. \qquad (5)$$

Without loss of generality, let α = β = 0.5, which means that the performance and memory costs are weighted equally in the estimation. Fig. 4 compares the level of efficiency among the related methods. According to it, the proposed technique is more efficient than the conventional techniques for multi-threaded programs, especially as the number of running threads grows.

IV. CONCLUSION

In this paper, an efficient software-based technique was proposed to detect control-flow errors in multi-threaded processors. The first goal was to make control-flow checking techniques usable on multi-core or multi-threaded processors. The second goal was to detect both intra-node and inter-node control-flow errors, achieving high detection coverage along with a significant reduction in the imposed overheads. A metric for estimating and comparing the efficiency of the methods was defined, and it was shown that the proposed technique is more efficient than conventional methods.

REFERENCES

[1] G. A. Reis, J. Chang, N. Vachharajani, R. Rangan and D. I. August, "SWIFT: Software Implemented Fault Tolerance," 3rd International Symposium on Code Generation and Optimization, pp. 243-254, 2005.
[2] N. Aggarwal, P. Ranganathan, N. P. Jouppi and J. E. Smith, "Configurable Isolation: Building High Availability Systems with Commodity Multi-Core Processors," 34th Annual International Symposium on Computer Architecture, pp. 340-347, 2007.
[3] U. Gunneflo, J. Karlsson and J. Torin, "Evaluation of Error Detection Schemes Using Fault Injection by Heavy-Ion Radiation," 19th International Symposium on Fault Tolerant Computing, pp. 340-347, 1989.
[4] R. Vemu, S. Gurumurthy and J. A. Abraham, "ACCE: Automatic Correction of Control-flow Errors," IEEE International Test Conference, October, pp. 1-10, 2007.
[5] A. Rajabzadeh and S. G. Miremadi, "CFCET: A Hardware-Based Control Flow Checking Technique in COTS Processors Using Execution Tracing," Elsevier Journal of Microelectronics Reliability, vol. 46, pp. 959-972, 2006.
[6] E. D. Berger, T. Yang, T. Liu and G. Novark, "Grace: Safe Multithreaded Programming for C/C++," 24th SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, pp. 81-96, 2009.
[7] Functional Full-System Simulation Infra-Structure (SIMICS), http://www.virtutech.com.
[8] H. R. Zarandi, M. Maghsoudloo and N. Khoshavi, "Two Efficient Software Techniques to Detect and Correct Control-flow Errors," 16th IEEE Pacific Rim International Symposium on Dependable Computing, 2010.

TABLE I. COMPARISON AMONG FOUR SOFTWARE-BASED TECHNIQUES IN TERMS OF THE FOUR FACTORS

                  Single-Threaded Benchmarks              Multi-Threaded Benchmarks
Methods           FNC (%)  FPC (#)  M.O. (%)  P.O. (%)    FNC (%)  FPC (#)  M.O. (%)  P.O. (%)
CFCSS             8.5      0.0      49.3      43.7        19.7     7.7      42.8      37.1
YACCA+            2.7      0.0      167.1     162.0       16.3     9.7      154.0     153.0
CEDA              2.8      0.0      106.2     93.4        16.1     10.0     94.8      72.9
Proposed Method   4.3      0.0      52.6      35.5        7.0      0.0      57.2      38.6
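The headline figures quoted in the text can be re-derived from Table I with a few lines of arithmetic. The Python sketch below is only a cross-check under stated assumptions: detection coverage is taken as 100 minus FNC, and the "Performance Cost" and "Memory Cost" of (5) are assumed to be the P.O. and M.O. percentages, which the paper does not state explicitly. The dictionary layout and function names are hypothetical. Method Efficiency (4) is not computed, since its full expression is not available here.

```python
# Values transcribed from Table I: (FNC %, FPC #, M.O. %, P.O. %).
TABLE_I = {
    "CFCSS":    {"single": (8.5, 0.0, 49.3, 43.7),   "multi": (19.7, 7.7, 42.8, 37.1)},
    "YACCA+":   {"single": (2.7, 0.0, 167.1, 162.0), "multi": (16.3, 9.7, 154.0, 153.0)},
    "CEDA":     {"single": (2.8, 0.0, 106.2, 93.4),  "multi": (16.1, 10.0, 94.8, 72.9)},
    "Proposed": {"single": (4.3, 0.0, 52.6, 35.5),   "multi": (7.0, 0.0, 57.2, 38.6)},
}


def coverage(method, mode):
    """Detection coverage in percent, ASSUMED to be 100 minus FNC."""
    return 100.0 - TABLE_I[method][mode][0]


def coverage_drop(method):
    """Loss of coverage (percentage points) from single- to multi-threaded runs."""
    return coverage(method, "single") - coverage(method, "multi")


def overall_cost(method, mode, alpha=0.5, beta=0.5):
    """Equation (5), ASSUMING the costs are the P.O. and M.O. percentages."""
    _, _, memory, performance = TABLE_I[method][mode]
    return alpha * performance + beta * memory


traditional = ["CFCSS", "YACCA+", "CEDA"]
avg_traditional_drop = sum(coverage_drop(m) for m in traditional) / len(traditional)
proposed_drop = coverage_drop("Proposed")
avg_proposed_coverage = (coverage("Proposed", "single") + coverage("Proposed", "multi")) / 2

print(f"proposed coverage drop:        {proposed_drop:.1f} points")
print(f"traditional coverage drop avg: {avg_traditional_drop:.1f} points")
print(f"proposed average coverage:     {avg_proposed_coverage:.2f} %")
print(f"proposed overall cost (multi): {overall_cost('Proposed', 'multi'):.1f}")
```

Under these assumptions the proposed method loses about 2.7 points of coverage in the multi-threaded case, the three traditional methods lose between roughly 11 and 14 points each, and the average coverage of the proposed method over both benchmark groups comes out near the 94% quoted in the abstract.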


Figure 4. Comparison of the method efficiency [series: CFCSS, YACCA, CEDA, Proposed Technique; horizontal axis: Single-Threaded Benchmarks (SMM, SQS, SLL) and Multi-Threaded Benchmarks (MMM, MQS, MLL)]