A Hardware Immune System for Benchmark State Machine Error Detection

Daryl Bradley, Andy Tyrrell
Bio-Inspired Engineering, Department of Electronics, University of York, York, UK
www.bioinspired.com

Abstract – A novel error detection mechanism is demonstrated for integration into a hardware fault tolerant system. Inspiration is taken from principles of immunology to create a hardware immune system that runs in real-time hardware and continuously monitors a finite state machine architecture for errors. The work is demonstrated through immunisation of the ISCAS’89 benchmark state machine data set.

I. INTRODUCTION

Ensuring the reliability of computing and electronic systems has always been a challenge. As the complexity of systems increases, the inclusion of reliability measures becomes progressively more complex, and is often a necessity for VLSI circuits where a single error could potentially render an entire system useless. Research into biologically inspired systems has recently begun investigating both evolutionary and developmental approaches to reliable system design in the form of Evolvable Hardware [1] and Embryonics [2]. This paper demonstrates a completely new approach that takes inspiration from the vertebrate immune system to create the beginnings of a complete hardware immune system. Section II introduces the field of reliability engineering, the importance of fault tolerance and the hardware architectures that are being investigated. Section III provides an overview of the vertebrate immune system, discussing its architecture and methods of achieving biological fault tolerance. Section IV discusses biologically inspired methods of fault tolerance and then section V introduces the field of artificial immune systems that we aim to use to develop a hardware immune system. Section VI discusses the architecture of the hardware immune system, with the system applied to a set of benchmark circuits in section VII. The paper is concluded in section VIII.

II. RELIABILITY ENGINEERING

Reducing the failure probability and increasing reliability has been a goal of electronic systems designers ever since the first components were developed. No matter how much care is taken designing and building an electronic system, sooner or later an individual component will fail. For systems operating in remote environments such as space applications, the effect of a single failure could result in a multi-million pound installation being rendered useless. With safety critical systems such as aircraft the effects are even more severe. Reliability techniques need to be implemented in these applications and many more. The development of fault tolerant techniques was driven by the need for ultra-high availability, reduced maintenance costs, and long life applications to ensure systems can continue to function in spite of faults occurring. The implementation of a fault tolerant mechanism requires four stages [3]:

1) Detection of the error,
2) Confinement of the error, to prevent propagation through the system,
3) Error recovery, to remove the error from the system,
4) Fault treatment and continued system service, to repair and return the system to normal operation.

We deal with the detection of errors in this paper. The digital systems being analysed are modelled as a finite state machine (FSM) with its associated state table description. In principle, any sequential digital system can be modelled as an FSM, or a set of interconnecting FSMs, such as the example in Figure 1, which shows normal states and transitions together with samples of those that could potentially lead to a system failure. The FSM is therefore an ideal representation for developing a hardware immune system.

[Figure 1 diagram: valid states q0 to q4; invalid states e5 and e6; valid transitions tq10, tq12, tq20, tq23, tq30, tq32, tq34, tq40; invalid transitions te06, te21, te45]

Figure 1: Finite state machine representation of system
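As a minimal sketch of the idea behind Figure 1, the Python fragment below treats the FSM as a set of permitted (current state, next state) pairs and rejects anything else. It assumes that a label such as tq10 denotes a transition from q1 to q0; that reading of the figure labels, and the names VALID_TRANSITIONS and is_valid_transition, are illustrative assumptions rather than part of the original design.

```python
# Valid transitions read from the labels in Figure 1.
# Assumption: a label "tqXY" means a transition from state qX to state qY.
VALID_TRANSITIONS = {
    ("q1", "q0"), ("q1", "q2"),
    ("q2", "q0"), ("q2", "q3"),
    ("q3", "q0"), ("q3", "q2"), ("q3", "q4"),
    ("q4", "q0"),
}


def is_valid_transition(current: str, nxt: str) -> bool:
    """Return True only for transitions that belong to normal operation."""
    return (current, nxt) in VALID_TRANSITIONS


# tq23: a normal transition.
print(is_valid_transition("q2", "q3"))  # True
# te21: both states are valid, but the transition between them is not.
print(is_valid_transition("q2", "q1"))  # False
# te06: the destination e6 is not a valid state at all.
print(is_valid_transition("q0", "e6"))  # False
```

Checking pairs rather than single states is what allows a case such as te21 to be caught even though q2 and q1 are each legitimate on their own.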

We concentrate on finite state machine representations as they are used throughout all stages of a sequential system design, are a source of comparison with reliability engineering research, and are also used in hardware design packages to permit direct instantiation of their design as a complete system netlist. Faults are represented and analysed through the use of fault models at both the gate and functional level within an electronic system [4]. Gate level fault models describe the effect of an error in terms of individual logic gates and their connections. Functional fault models check the entire function of a system at once, under the premise that if the functionality is correct, then the system under test is fault free. The work presented in this paper concentrates on the development of a novel functional approach to error detection. By modelling the faults of a sequential circuit through an analysis of the state table (that describes the functionality of the circuit) it is possible to generate tests before the circuit is even implemented, or after a change to the internal architecture and logic design. This is a feature that could be very useful with biologically inspired hardware systems.

III. THE RELIABLE HUMAN BODY

In contrast to the reliability techniques that have been developed for fault tolerant hardware, biology has managed to solve the problem in a remarkably different way. The fabulously complex defence system in vertebrates has evolved over hundreds of millions of years to give rise to what we now call the immune system – a remarkable collection of organs, ducts, and cells comparable in complexity to the body’s nervous system [5]. The immune system is distributed, layered, and ingenious in its methods of protecting the body from invasion by billions of different bacteria and viruses [6]. If one layer is penetrated, another comes into play, presenting the invader with progressively more complex and clever barriers.

We concentrate on the acquired component of the immune system, specifically the humoral immune response that protects the body from bacterial infection. Cells of the body and invaders, or antigens, are distinguished as two different entities, one should be there, and one should not. The immune system achieves this through the concept of self/nonself discrimination. Cells of the body define self, anything else nonself.

IV. BIO-INSPIRED FAULT TOLERANCE

The similarities in requirements imposed on reliable hardware systems and those already achieved by the vertebrate immune system were first highlighted by Avizienis [7]: distributed detection, autonomous operation, diversity, memory, and imperfect detection are all achieved by the vertebrate immune system and are ideal for a hardware immune system. Many features are already applied to reliable system design. For example, Embryonics has demonstrated one approach to distributed fault tolerance by creating cellular electronic systems [2]. If the layers of protection in the human body and existing methods of hardware protection are compared, as in Table 1, we find there is a gap that existing hardware protection systems could potentially benefit from filling.

TABLE 1: LAYERS OF PROTECTION IN THE HUMAN BODY AND HARDWARE

Defence mechanism          | Human immune system                   | Hardware protection
Atomic barrier (physical)  | Skin, mucous membranes                | Hardware enclosure (physical/EM protection)
Physiological              | Temperature, acidity                  | Environmental settings (temperature control)
Innate immunity            | Phagocytes                            | N-modular redundancy, Embryonics
Acquired immunity          | Humoral immunity, cellular immunity   | ?

One solution to completing Table 1 is demonstrated with the development of immunological electronics, or Immunotronics – the creation of an artificial hardware immune system.

V. ARTIFICIAL IMMUNE SYSTEMS

Artificial immune systems take their inspiration from the operation of the human immune system to create novel solutions to problem solving. Although still a relatively new area of research, the range and number of applications is already diverse [8][9]. Computer security, virus protection, anomaly detection, process monitoring, pattern recognition, robot control, and software fault tolerance are some of the applications artificial immune systems are being applied to. One key feature links all of these applications: they operate in a software domain. Our approach demonstrates that artificial immune systems can also exist in the hardware domain [10].

Two distinct algorithms have emerged as successful implementations of artificial immune systems: the immune network model hypothesised by Jerne [11] and the negative selection algorithm developed by Forrest [12]. The negative selection algorithm is used to differentiate between normal system operation (self) and abnormal operation (nonself). This is achieved by generating a set of detectors R, with each detector r ∈ R of length l, that fail to match any self strings s ∈ S, also of length l, in at least c contiguous positions [12].

VI. IMMUNOTRONICS

A. Domain mapping

In transferring the concepts from immunology to a hardware domain we adopt the following analogies:

• Self → normal hardware operation and nonself → faulty operation.
• Memory T cells → Set of stored tolerance conditions (detectors) and antibodies → State/tolerance condition comparator and response initiator.
• Learning during gestation → Generation of the tolerance conditions.
• Inactivation of antigen → Return to normal operation.
• Lifetime of organism → Operational lifetime of the hardware.

Using the FSM description of the hardware shown in Figure 1, under normal conditions (self) only transitions tqx can occur. The advent of a fault could cause an undefined transition tex. Concentrating on the transitions rather than the individual states is very important, as it enables incorrect transitions between two individually correct states to be detected.

B. Choice of Algorithm

The negative selection algorithm is adopted for the hardware immune system for two reasons:

1) Complex detector set generation benefits a simple operational phase, which is ideal for a hardware environment where reduced complexity simplifies the design, reduces component count, and promotes distribution throughout the hardware architecture.

2) Probabilistic detection permits a trade off between storage requirements and the probability of failing to detect a nonself hardware condition. To cater for changes in system functionality, the use of a reconfigurable platform such as a field programmable gate array (FPGA) enables the operation of a system to be updated or completely changed. The elimination of rigid boundaries between function and protection is ideal, a requirement provided by probabilistic detection.
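The negative selection step can be illustrated with a short Python sketch. It generates random candidate detectors and keeps only those that fail to match every self string in c contiguous positions. The random candidate generation, the function names and the tiny self set are assumptions made for illustration; the detector generation actually used in this work is a separate matter.

```python
import random


def matches(detector: str, string: str, c: int) -> bool:
    """True if detector and string agree in at least c contiguous positions."""
    run = 0
    for d, s in zip(detector, string):
        run = run + 1 if d == s else 0
        if run >= c:
            return True
    return False


def generate_detectors(self_set, length, c, n_candidates, seed=0):
    """Keep random candidates that match no self string (negative selection)."""
    rng = random.Random(seed)
    detectors = set()
    for _ in range(n_candidates):
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, c) for s in self_set):
            detectors.add(candidate)
    return detectors


# Hypothetical 4-bit self set, used only to exercise the functions.
self_set = {"0000", "0011", "1100"}
detectors = generate_detectors(self_set, length=4, c=3, n_candidates=200)

# By construction, no surviving detector matches any self string.
assert all(not matches(d, s, 3) for d in detectors for s in self_set)
print(sorted(detectors))
```

All of the cost sits in generate_detectors, which runs once; the run-time check is only the comparison in matches, which is the property that makes the scheme attractive for hardware.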

C. Architecture of the Hardware Immunisation Suite

The hardware immune system is divided into two components:

1) Software/hardware testbench for data gathering and tolerance condition generation.
2) The run-time hardware immune system to provide real-time error detection to the finite state machine.

The software/hardware testbench permits data to be extracted from an already constructed sequential hardware system where the state table description is not fully known. The system to be ‘immunised’ is inserted into a test wrapper that enables the software to initiate a cycle of normal operation and to monitor and record the states of the hardware. The operation of this is discussed further in [13]. This paper concentrates on the benchmark state machines that already have a complete state table description and so this stage is not required here. Self strings are formed as in Figure 2.

System inputs / Current state / Next state / (Outputs)
0010 / 01101 / 01110 / (101)

Figure 2: Organisation of the strings to be protected. The system outputs may be optionally added.
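The string layout of Figure 2 amounts to concatenating the bit fields of one valid transition. The following Python sketch shows this for the example row in the figure; the function name make_self_string and the validation step are illustrative assumptions.

```python
def make_self_string(inputs: str, current: str, nxt: str, outputs: str = "") -> str:
    """Concatenate the bit fields of one valid transition into a self string."""
    fields = [inputs, current, nxt, outputs]
    if any(set(f) - {"0", "1"} for f in fields):
        raise ValueError("fields must be binary strings")
    return "".join(fields)


# The example row of Figure 2, without and with the optional outputs.
print(make_self_string("0010", "01101", "01110"))         # 00100110101110
print(make_self_string("0010", "01101", "01110", "101"))  # 00100110101110101
```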

Tolerance condition generation is also carried out in software by application of the negative selection algorithm using the greedy detector generating (GDG) algorithm developed by D’haeseleer [14]. D’haeseleer showed how optimal coverage of nonself strings, or faulty operation in our case, could be achieved with a minimal number of detectors by extracting those that match the most nonself strings first and then those that match the most as yet uncovered nonself strings. This is critical for an application such as this where hardware storage space could potentially be limited. Probabilistic detection also enables high compaction of the set of nonself strings.

Generated tolerance conditions are analysed to assess the experimental probability of failing to detect an invalid string. Two measures are used: a total failure probability, for errors detectable over a number of cycles when an error may have propagated, and a single cycle error detection (SCED) failure probability. Strings are single cycle detectable if both the input and current state bits are contained within a self string, and the next state bits are contained within a nonself string. By analysing the next state bits the SCED failure probability can also be determined. This is important for finite state machine architectures where it is desirable to detect the presence of an error before the effects propagate. Section VII demonstrates this for a selection of benchmark state machines.

In the operational phase, the hardware immune system acts as a wrapper, monitoring the system inputs and states (and if required the system outputs) to enable errors to be detected before the system propagates to its next state on the following clock edge. The hardware immune system consists of two components:

1) Antigen Presenting Cell (B cell). This extracts the data from the FSM and presents it to the T cells, to determine if a response should be initiated.
2) T cell storage. The tolerance conditions (detectors) are stored in a hardware content addressable memory (HCAM) that allows parallel searching of all memory locations [15]. Parallel searching of all memory locations meets the requirement of single cycle detection of nonself strings. (In a reversal of roles, models of the immune system have previously been used to create novel forms of content addressable memory [16][17].)
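The interaction between the two components can be modelled in software as follows. This is only a behavioural sketch in Python: the loop stands in for comparisons that the HCAM performs in parallel, and the optional per-bit mask, the function names and the single stored detector are assumptions chosen for illustration.

```python
def cam_search(detectors, word, mask=None, width=14):
    """Compare word against every stored detector and return the match lines.

    In hardware all comparisons happen in parallel within one clock cycle;
    this loop only models the result. Bits where mask is 0 are don't-care.
    """
    if mask is None:
        mask = (1 << width) - 1  # compare every bit
    return [((word ^ d) & mask) == 0 for d in detectors]


def error_detected(detectors, inputs, current, nxt, mask=None):
    """Model of the B cell: build the string and present it to the T cell store."""
    bits = inputs + current + nxt
    return any(cam_search(detectors, int(bits, 2), mask, len(bits)))


# One hypothetical tolerance condition: a wrong next state (01111) for the
# input and current state used in the Figure 2 example.
detectors = [int("00100110101111", 2)]

print(error_detected(detectors, "0010", "01101", "01110"))  # False (self)
print(error_detected(detectors, "0010", "01101", "01111"))  # True (nonself)
# With a mask that keeps only the five next-state bits, other inputs match too.
print(error_detected(detectors, "1111", "01101", "01111", mask=0b11111))  # True
```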

Figure 3 shows the hardware immune system configured to monitor system inputs and state. The HCAM has been developed as a generic VHDL model allowing resynthesis using standard development tools to create varying sizes of memory depending on the desired storage space. A demonstration system was synthesised for a Xilinx Virtex FPGA, 64 bits wide and 128 words deep, to create the CAM organisation in Figure 4. The architecture of the Virtex FPGA is ideal for constructing 4-bit CAMs using the lookup tables (LUTs) [18]. The LUTs are then connected together to create greater width data. Parallel matching of all tolerance conditions during operation ensures single cycle error detection whatever speed the hardware system being protected is running at. With no speed optimisations turned on within the Xilinx Foundation synthesis tools, the XCV300 device that contained the hardware immune system was estimated to operate at 45 MHz. Considering operational speed for a custom fabricated device, parallel HCAM searching ensures the system would operate at the full required speed of any system it was implemented to protect.

[Figure 3 diagram: user input; state machine (self) with output and state; wait signal; state recognition (B cells); signalling peptide; costimulation; CAM search and mask (nonself recognition); CAM (memory) holding the tolerance conditions (T cells)]

Figure 3: Structure of the hardware immune system, incorporating the finite state machine to be protected.

Partial matching capabilities were further added to the HCAM so that c contiguous bit matching can be implemented rather than just complete string matching. A bit string mask is added that allows each bit of the tolerance conditions to be selectively included in a matching operation, or selectively set to a don’t care condition. With the addition of the masking bits, the demonstration system provides a 32-bit wide, 128-word deep partial matching HCAM as shown in Figure 4. Matches are selectively made by selecting c contiguous bits to require a match at any one time.

[Figure 4 diagram: 32-bit data input; data words 0 to 127, each of length l; masking logic with mask; 128 match lines; found output]

Figure 4: Architecture of the partial matching hardware content addressable memory.

The synthesised hardware immune system was applied to a state machine based decade counter, the architecture and results of which are shown in [13][19].

VII. BENCHMARK STATE MACHINE IMMUNISATION

A. Benchmark State Machine Description

The benchmark state machines were developed to create a standard set of finite state machines that could be used for comparing methods of logic synthesis and optimisation capabilities and are used in numerous research papers [20]. They have also been used to compare test sequence generation for circuit validation, fault coverage and error detection. This section presents results for a subset of the benchmarks and provides a preliminary comparison against other methods of error detection on equivalent state machines. The hardware immune system was applied to the benchmark state machines defined in Table 2. Results are presented for strings created from input, current, and next state.

TABLE 2: BENCHMARK STATE MACHINE DEFINITIONS

FSM name | Inputs | Outputs | Product terms | States
cse      | 7      | 7       | 91            | 16
dk14     | 3      | 5       | 56            | 7
ex6      | 5      | 8       | 34            | 8
mc       | 3      | 5       | 10            | 4
sse      | 7      | 7       | 56            | 16
train11  | 2      | 1       | 25            | 11

B. Repeated Cycle Error Detection

Table 3 shows the probability of failing to detect an error for a selection of different match lengths over repeated clock cycles during the state machine operation. This represents the complete coverage of errors that will eventually be detected during the state machine operation, but not necessarily within a single clock cycle. The second column in Table 3 defines the string length used during the immunisation process from Figure 2, using binary encoding for the states. The third column (min c) represents the coverage using the minimum length matching distance (c) that allows at least one tolerance condition to be generated. The columns labelled x, y, and z represent three additional match lengths within the data created. The final column shows the results when the match length is equal to the string length. In each case the failure probability is followed by the match length c in brackets.

TABLE 3: FAILURE PROBABILITY FOR A SAMPLE OF MATCH LENGTHS (C) FOR REPEATED CYCLE ERROR DETECTION

FSM name | String length l | min c     | x         | y         | z         | c = l
cse      | 15              | 0.81 (11) | 0.27 (12) | 0.16 (13) | 0.04 (14) | 0.00 (15)
dk14     | 14              | 0.26 (6)  | 0.01 (8)  | 0.00 (10) | 0.00 (12) | 0.00 (14)
ex6      | 11              | 0.91 (8)  | 0.29 (9)  | 0.06 (10) | -         | 0.00 (11)
mc       | 7               | 0.33 (6)  | -         | -         | -         | 0.00 (7)
sse      | 15              | 0.82 (11) | 0.25 (12) | 0.06 (13) | 0.03 (14) | 0.00 (15)
train11  | 10              | 0.39 (4)  | 0.34 (5)  | 0.07 (6)  | 0.02 (8)  | 0.00 (10)
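To give a feel for how a failure probability of this kind relates to detector storage, the Python sketch below enumerates every string of a small length, selects detectors greedily by how many still-uncovered nonself strings they match, and reports the fraction of nonself strings left uncovered. It is a brute-force illustration under assumed names and a made-up 6-bit self set; it is not the GDG algorithm of [14], which avoids this kind of exhaustive enumeration, and it is not the procedure used to produce Table 3.

```python
from itertools import product


def matches(detector, string, c):
    """True if detector and string agree in at least c contiguous positions."""
    run = 0
    for d, s in zip(detector, string):
        run = run + 1 if d == s else 0
        if run >= c:
            return True
    return False


def greedy_detectors(self_set, length, c, max_detectors):
    """Greedily pick detectors; return them with the resulting failure probability."""
    universe = ["".join(bits) for bits in product("01", repeat=length)]
    nonself = [s for s in universe if s not in self_set]
    # Negative selection: a candidate must not match any self string.
    candidates = [d for d in universe
                  if not any(matches(d, s, c) for s in self_set)]
    uncovered = set(nonself)
    chosen = []
    while uncovered and candidates and len(chosen) < max_detectors:
        best = max(candidates,
                   key=lambda d: sum(matches(d, s, c) for s in uncovered))
        newly = {s for s in uncovered if matches(best, s, c)}
        if not newly:
            break  # no remaining candidate covers anything new
        chosen.append(best)
        uncovered -= newly
    return chosen, len(uncovered) / len(nonself)


# Hypothetical self set of four 6-bit strings (60 nonself strings remain).
self_set = {"000000", "000111", "111000", "101010"}

# With c equal to the string length each detector covers exactly one string.
_, p_full = greedy_detectors(self_set, length=6, c=6, max_detectors=60)
_, p_half = greedy_detectors(self_set, length=6, c=6, max_detectors=30)
print(p_full)  # 0.0
print(p_half)  # 0.5
```

Halving the storage budget in this toy case leaves half of the nonself strings undetected, which is the storage versus coverage trade off in its simplest form; shorter match lengths let one detector cover several nonself strings at the price of fewer admissible detectors.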

Table 3 shows that as the match length increases, the failure probability reduces as progressively more tolerance conditions are created that are able to match nonself strings but fail to match self strings. The final column of Table 3 is used to confirm that the algorithm works as expected with complete coverage of all errors, i.e. each tolerance condition completely matches one nonself string.

C. Single Cycle Error Detection

Table 4 shows the same set of experiments run, but this time showing the failure probability for single cycle error detection. The difference in failure probability shows the change from the coverage provided for repeated cycle error detection in Table 3, and is noted as the third italicised value in each table position.

TABLE 4: FAILURE PROBABILITY FOR A SAMPLE OF MATCH LENGTHS (C) FOR SINGLE CYCLE ERROR DETECTION

FSM name | String length l | min c           | x               | y               | z               | c = l
cse      | 15              | 0.87 (11) +0.06 | 0.28 (12) +0.01 | 0.11 (13) -0.05 | 0.04 (14) +0.00 | 0.00 (15) +0.00
dk14     | 14              | 0.30 (6) +0.04  | 0.02 (8) +0.01  | 0.00 (10) +0.00 | 0.00 (12) +0.00 | 0.00 (14) +0.00
ex6      | 11              | 0.94 (8) +0.03  | 0.30 (9) +0.01  | 0.06 (10) +0.00 | -               | 0.00 (11) +0.00
mc       | 7               | 0.33 (6) +0.00  | -               | -               | -               | 0.00 (7) +0.00
sse      | 15              | 0.91 (11) +0.09 | 0.29 (12) +0.04 | 0.07 (13) +0.01 | 0.03 (14) +0.00 | 0.00 (15) +0.00
train11  | 10              | 0.65 (4) +0.26  | 0.55 (5) +0.21  | 0.18 (6) +0.11  | 0.03 (8) +0.01  | 0.00 (10) +0.00

The change in failure probability when concentrating on single cycle error detection is minimal for the benchmark state machines shown in Table 4.

D. Failure Probability Evaluation

Comparing the results against other error detection techniques is difficult. On one hand, many results show the failure probability and error coverage for automated test pattern generators, designed to run as off-line testing systems. The alternative is the design of checkers integrated into sequential systems that check for the presence of errors in real-time. The normal approach for real-time checkers is to concentrate on single errors within the system. Unless the next state logic for each state bit is completely independent, the potential for more than a single error certainly needs to be considered. The challenges and results presented in most real-time error detection papers concentrate on minimising the additional logic required to provide single error detection. The approach of this paper is to concentrate on the coverage abilities of the real-time error detection hardware. Jha and Wang [21] have however analysed the failure probability of m-hot codes for unidirectional errors (those that result in a change of any number of bits all to the same logic level), which allows a certain degree of comparison. Table 5 compares the failure probability for single cycle error detection for the benchmark state machines. The first three result columns are for the hardware immune system, with match lengths shown in brackets, and the final three for the results from [21].

TABLE 5: A COMPARISON OF ERROR COVERAGE FOR THE HARDWARE IMMUNE SYSTEM AND M-HOT ENCODING

FSM name | min c     | x         | c = l     | 1-hot | 2-hot | 3-hot
cse      | 0.87 (11) | 0.11 (13) | 0.00 (15) | 0.06  | 0.13  | 0.17
dk14     | 0.30 (6)  | 0.02 (8)  | 0.00 (14) | 0.1   | 0.05  | -
ex6      | 0.94 (8)  | 0.06 (10) | 0.00 (11) | 0.02  | 0.00  | -
mc       | 0.33 (6)  | -         | 0.00 (7)  | 0.11  | 0.00  | -
sse      | 0.91 (11) | 0.07 (13) | 0.00 (15) | 0.06  | 0.02  | 0.09
train11  | 0.64 (4)  | 0.18 (6)  | 0.00 (10) | -     | -     | -

Table 5 demonstrates that similar error detection coverage abilities can be achieved with the hardware immune system, with the benefit of a separate system that can be easily configured to protect different systems by just running the immunisation process again.

VIII. CONCLUSION

This work has demonstrated that taking inspiration from the human immune system, in the form of the negative selection algorithm, is suitable for the design of novel error detection mechanisms for integration into reliable hardware systems. Error detection is performed in real-time, albeit at present through the use of centralised error detection. Error detection is probabilistic, permitting a trade off between storage requirements and the ability to detect an error within the sequential system. In contrast to existing error detection techniques that concentrate on single bit errors, and can sometimes fail to detect multiple errors, the hardware immune system is adept at this task. Generation of tolerance conditions is still implemented in software; the unique part of this work is the demonstration of a hardware wrapper that provides fully embedded error detection using principles from immunology. The immune system is also a separate component, permitting integration in a variety of different systems, either built with a hardware immune system in mind, or added at a later point.

The results demonstrate the error detecting abilities of the hardware immune system on a range of distinctly different benchmark finite state machines and compare them, although somewhat loosely, to the established techniques of m-hot coding. In the paper only error detection coverage results are presented. Another method of comparison would be to assess the storage requirements. The work has concentrated on immunising systems that already possess a complete state description. Future work would benefit from investigating systems already built and configured, treating them as a ‘black box’, or possibly even investigating the inclusion of continuous updates to the tolerance conditions whilst the system is in operation.

Work in the field of Immunotronics is now progressing in the Bio-Inspired Engineering research group at the University of York into the design of microprocessor immune systems and distributed hardware immune systems. Immunotronics is also currently being integrated with other biologically inspired architectures such as Embryonics and Evolvable Hardware as part of a European Commission Future and Emerging Technologies project. Information is available at http://www.poetictissue.org .

IX. ACKNOWLEDGEMENTS

This work has been supported by the Engineering and Physical Sciences Research Council, UK and Xilinx Inc.

X. REFERENCES

[1] A.M.Tyrrell, G.S.Hollingworth, S.L.Smith, “Evolutionary Strategies and Intrinsic Fault Tolerance”, in Proceedings of the 3rd NASA/DoD Workshop on Evolvable Hardware, pp. 98-106, July 2001.
[2] D.Mange, M.Sipper, A.Stauffer, G.Tempesti, “Toward Robust Integrated Circuits: The Embryonics Approach”, Proceedings of the IEEE, Vol. 88:4, pp. 516-541, April 2000.
[3] P.A.Lee, T.Anderson, Fault Tolerance Principles and Practice, Springer-Verlag, 2nd ed. 1990.
[4] D.K.Pradhan, Fault Tolerant Computing: Theory and Techniques Volume 1, Prentice-Hall, 1986.
[5] N.K.Jerne, “The Immune System”, Scientific American, Vol. 229:1, pp. 52-60, 1973.
[6] C.A.Janeway, P.Travers, Immunobiology, the Immune System in Health and Disease, Churchill Livingstone, 3rd ed. 1997.
[7] A.Avizienis, “Towards Systematic Design of Fault-Tolerant Systems”, IEEE Computer, Vol. 30:4, pp. 51-58, April 1997.
[8] D.Dasgupta, N.Attoh-Okine, “Immunity-Based Systems: A Survey”, IEEE International Conference on Systems, Man and Cybernetics, 1997.
[9] D.Dasgupta, N.Majumdar, F.Nino, “Artificial Immune Systems: A Bibliography”, CS Technical Report – CS-01-002, ver. 2.0, The University of Memphis USA, June 2001.
[10] D.W.Bradley, A.M.Tyrrell, “The Architecture for a Hardware Immune System”, Proceedings of the 3rd NASA/DoD Workshop on Evolvable Hardware, pp. 193-200, July 2001.
[11] N.K.Jerne, “Towards a network theory of the immune system”, Ann. Immunol. (Inst. Pasteur), Vol. 125C, pp. 373-379, 1974.
[12] S.Forrest, A.S.Perelson, L.Allen, R.Cherukuri, “Self-Nonself Discrimination in a Computer”, Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, pp. 202-212, 1994.
[13] D.W.Bradley, A.M.Tyrrell, “Immunotronics: Novel Finite State Machine Architectures with Built in Self Test using Self-Nonself Differentiation”, to appear in IEEE Transactions on Evolutionary Computation.
[14] P.D'haeseleer, “Further Efficient Algorithms for Generating Antibody Strings”, Technical Report CS95-3, Department of Computer Science, University of New Mexico, 1995.
[15] T.Kohonen, Content-Addressable Memories, Springer-Verlag, 2nd ed. 1987.
[16] C.J.Gibert, T.W.Routen, “Associative Memory in an Immune-Based System”, in Proceedings of the 12th International Conference on Artificial Intelligence AAAI-94, pp. 852-857, 1994.
[17] J.E.Hunt, D.E.Cooke, “Learning using an artificial immune system”, Journal of Network and Computer Applications, Vol. 19, pp. 189-212, 1996.
[18] Xilinx Inc, “Virtex data sheet”, 1999, http://www.xilinx.com/partinfo/virtex.pdf
[19] D.W.Bradley, A.M.Tyrrell, “Multi-layered Defence Mechanisms: Architecture, Implementation and Demonstration of a Hardware Immune System”, in Proceedings of 4th International Conference on Evolvable Systems: From Biology to Hardware (ICES2001), Lecture Notes in Computer Science 2210, pp. 140-150, October 2001.
[20] S.Yang, “Logic Synthesis and Optimization Benchmarks User Guide Version 3.0”, Technical report, Microelectronics Center of North Carolina, January 1991.
[21] N.K.Jha, S.J.Wang, “Design and Synthesis of Self-Checking VLSI Circuits”, IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol. 12(6), pp. 878-887, June 1993.
