IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 37, NO. 5, SEPTEMBER 2007
Immune-Inspired Adaptable Error Detection for Automated Teller Machines

Rogério de Lemos, Member, IEEE, Jon Timmis, Member, IEEE, Modupe Ayara, and Simon Forrest
Abstract: This paper presents an immune-inspired adaptable error detection (AED) framework for automated teller machines (ATMs). The framework has two levels: one is local to a single ATM, while the other is network-wide. It employs the vaccination and adaptability analogies of the immune system. To discriminate between normal and erroneous states, an immune-inspired one-class supervised algorithm was employed, which supports continual learning and adaptation. The effectiveness of the proposed approach was confirmed in terms of classification performance and impact on availability. The overall results are encouraging, as the downtime of ATMs can be reduced by anticipating failures before they occur.

Index Terms: Adaptable error detection (AED), artificial immune systems (AIS), automated teller machines (ATMs), availability, fault tolerance.
I. INTRODUCTION

AUTOMATED teller machines (ATMs) are embedded systems for financial services. The work presented in this paper is concerned with improving the availability of these systems through adaptable error detection (AED). Adaptability is important for availability because ATMs exhibit different operational profiles depending on the environment in which they operate [1]. If the downtime of these machines is to be reduced, techniques must be investigated that go beyond a fixed set of error detectors identified at design time. The proposed error-detection technique aims to reduce system downtime by detecting, at run time, those states that are precursors of system failure. This is achieved by employing immune-inspired continuous learning to update the set of error detectors in a system. The technique relies on the existence of sequences of states that represent the operational status of an ATM, from which the AED identifies those sequences that might contain fatal states. At present, ATMs do not include the notion of adaptability in their error-detection systems, although this is an important feature for improving the maintainability of these machines.

This paper reports an industry-led research project, developed in close cooperation with NCR Financial Solutions Group, whose aim was to investigate how machine learning techniques could be used to improve the quality of service of ATMs. This is the only work of its kind on ATMs; therefore, it is not possible to benchmark the final system against another. Extensive testing has been carried out with the proposed system, and it has laid the foundations for AED in ATMs.

This paper details the investigations undertaken to develop an immune-inspired AED technique for ATMs. In the context of this application, error detection entails identifying a sequence of states that precedes a system failure, and adaptability consists of changing a set of detectors according to the operational profile of the ATM. Underlying the immune-inspired AED is a framework based on the architecture of a network of ATMs, which consists of individual ATMs networked to a central management system. The network supports two-way communication between the central management system and the connected ATMs. Likewise, the proposed framework for AED consists of two levels of error detection: one is local to a single ATM, while the other is network-wide. The latter exchanges information on new and common error behaviors amongst individual ATMs. In this architecture, each ATM hosts a local AED, while the network-wide AED is implemented within the central management system. By exploiting the communication mechanism between the central management system and individual ATMs, information about error detectors can be exchanged amongst the local AEDs through the network-wide AED. The implementation undertaken in this paper was limited to the local AED.

Manuscript received April 26, 2005; revised June 8, 2006. This work was supported by the NCR Financial Solutions Group. This paper was recommended by Associate Editor L. Meng.
R. de Lemos is with the University of Kent, Canterbury, Kent CT2 7NF, U.K.
J. Timmis is with the University of York, Heslington, York YO10 5DD, U.K.
M. Ayara is with BNP Paribas, London NW1 6AA, U.K.
S. Forrest is with Advanced Technology and Research (AT&R), NCR Financial Solutions Group, Dundee DD2 4SW, U.K. (e-mail: simon.forrest@scotland.ncr.com).
Digital Object Identifier 10.1109/TSMCC.2007.900662
The network-wide AED is based on the same techniques. The only differences are that the error detectors produced at this level are general across a number of ATMs, and that a human operator must decide which detectors should be incorporated into the network of ATMs. An ATM is made up of several modules, but a single module, the cash dispenser, was employed for the implementation and validation of the local AED technique. The basis of this technique was an artificial immune system originally developed for e-mail classification [2]. Artificial immune systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles, and models, which are applied to problem solving [3]. Adaptability in the immune system arises from features such as learning and memory, which give the immune system the ability to combat a large variety of invaders. The application of AIS to fault tolerance was initially motivated by Avižienis, who described the analogy between the immune system and fault tolerance [4]. Since then, several approaches have been proposed in the literature that apply AIS to problems related to both software [5] and hardware [6] fault tolerance.

1094-6977/$25.00 © 2007 IEEE

Although a wide range of machine learning techniques have been applied to error detection, some of these techniques have potential drawbacks. In particular, from the perspective of AED, an extensive study comparing different techniques [1] highlighted the disadvantages of several of them. For example, artificial neural networks are "black boxes," which means that it is difficult to understand the knowledge encoded in them. Expert systems are cumbersome to maintain, since their knowledge base must be continually updated to keep up with changes. In addition, most of the solutions are not capable of continuous learning, which means that they cannot update their knowledge while monitoring the target system's operation. In the context of this paper, two alternative techniques were investigated in more detail and were shown to be inappropriate for detector generation [1]. The C5.0 algorithm (a rule induction algorithm) was shown to generate overly general detectors that would significantly increase the number of false positives. In fact, our analysis found that, for the data used in this study, the detectors generated were so general that a decision was made to investigate alternative approaches. In addition, a heuristic-based algorithm generated detectors so specialized that it was difficult to discern the true positives [1]. Moreover, neither approach allows easy online adaptation, which is a requirement for AED in ATMs. We therefore concluded that an alternative approach was required.
An appropriate approach for AED should generate suitable and effective detectors with good and comprehensible classification performance, and should incorporate a continuous learning feature. Continuous learning is needed to incorporate new information about errors into an AED system during system operation. An AIS algorithm was found to possess these characteristics. It was evaluated against relevant criteria, including: 1) the classification performance of the algorithm in discriminating normal behaviors from potential failure behaviors and 2) the time interval between detection and the actual system failure. The evaluation demonstrated that the proposed AED technique could detect an incipient system failure approximately 12 h in advance for one data set and 2 h in advance for a second data set. Based on these results, we conclude that the framework, and the subsequent prototype, are effective for AED.

The rest of the paper is structured as follows. Section II introduces the problem domain, which is the enhancement of availability and maintainability in ATMs. Section III reports on related work in the area of AIS applied to fault tolerance. Section IV defines a general framework for AED in the context of a single ATM and a network of ATMs, and Section V discusses how AIS can be exploited for AED. The subsequent three sections describe, respectively, the prototype that was implemented, the experiments performed, and an analysis of the results obtained. The final section presents concluding remarks and future work concerning the application of AIS techniques to error detection in systems that are continuously subject to change.

II. AVAILABILITY AND MAINTAINABILITY IN ATMS

A. Automated Teller Machines

ATMs are embedded systems for financial services, such as dispensing cash, bank account enquiries, printing of balances, and cheque deposits. An ATM is made up of self-contained modules. These include a cash dispenser for delivering money, a magnetic card reader (MCRW) for reading debit or credit cards, a keypad as the input interface, a display as the output interface, a printer for printing receipts, and a depository for depositing cheques.

The architecture of a system of ATMs is composed of individual ATMs that are networked to a central manager. The network supports two-way communication between the central manager and the connected ATMs. This architecture is depicted in Fig. 1. Each ATM (ATM 1, ATM 2, ATM 3, and ATM 4) is connected in a network to a Central Manager, which can receive information from and send information to the connected ATMs. Both the ATMs and the central manager mentioned in this paper are proprietary to NCR Financial Solutions Group; thus, certain technical details have to be omitted.

Fig. 1. System architecture for AED in ATMs.

ATMs are expected to be highly available systems, hence the need for effective error-detection techniques that are able to reduce their downtime. One of the error-detection mechanisms incorporated into ATMs exploits the syntactic knowledge of a code space. This space is generated by a set of rules that map each state of a device into a corresponding code. As a result, error detection simply identifies the erroneous states of each component based on the semantic description of relevant codes. Another error-detection mechanism used in ATMs consists of preemptive diagnostic checks that test the states of the processors controlling the modules.
A problem with these error-detection techniques, in the context of ATMs, is that they are unable to detect erroneous behaviors that have no corresponding codes. Where complete knowledge of all the possible states of a system is available, this approach is appropriate. However, ATMs are complex systems derived from the integration of disparate components. It is therefore difficult to identify all the possible interactions between components, and their failure behaviors. Also, there are different families of ATMs, which may be used in disparate geographical regions with particular environmental conditions. Consequently, the errors generated by individual ATMs may vary with family or location, making it difficult to anticipate all possible errors of an ATM. Furthermore, different maintenance procedures and personnel might affect the operation of an ATM. The objective is to address these issues through AED, in order to enable an ATM to cope with uncertainties, as well as to facilitate informed and quicker maintenance.

B. Dependability

Availability and maintainability are attributes of dependability, which is essentially the ability of a system to deliver service that can justifiably be trusted [7]. The dependability attributes express the properties of a system and allow the quality of its services to be evaluated. There are several means for obtaining dependable systems (in our case, highly available and maintainable systems), including rigorous design, verification and validation, system evaluation, and fault tolerance [7]. The scope of this paper is fault tolerance, which allows a system to deliver its specified service despite the presence of faults [8]. The premise is that systems will always contain residual faults, regardless of fault prevention and removal mechanisms. Fault tolerance is carried out via error detection and recovery. Error detection identifies the presence of an error in a system. Recovery transforms a system state that contains one or more errors and (possibly) faults into a state without detected errors and without faults that can be activated again [7]. Since error detection is the trigger for fault tolerance, a system that is expected to tolerate faults needs an effective error-detection capability.
Error-detection techniques usually exploit known error profiles for detecting erroneous states and behaviors. Such techniques rely on monitoring the system's behavior with respect to a given set of rules that include: 1) adherence to given control-flow paths, 2) execution time limits, 3) data integrity checks, 4) comparison among redundant components, and 5) algorithm-based plausibility checks of data [9]. However, these approaches restrict detection to errors that are known at design time. A further limitation is that such techniques are not expected to adapt to new patterns of behavior that can emerge during system operation. This observation motivated the investigation of AED for enhancing the availability of ATMs.

III. RELATED WORK

The analogy between fault tolerance and the immune system was first expressed by Avižienis [4]. That paper identifies four attributes of the immune system that support the analogy:
- the immune system functions continuously and autonomously, independent of cognition;
- its elements (lymph nodes, other lymphoid organs, and lymphocytes) are distributed throughout the body to serve all of its organs;
- it has its own communication links (the network of lymphatic vessels);
- its elements (organs and vessels) are themselves redundant and, in some cases, diverse.

The paper concludes that a nature-inspired model such as the immune system will stimulate the development of fault-tolerant solutions that outperform current ones. This suggestion triggered current research efforts in the area. Properties such as diversity, redundancy, self-organization, anomaly detection, learning, and memory are all important from a fault-tolerance perspective.

Pioneering research on the application of AIS to fault tolerance was presented in [5], where the authors applied AIS to tolerate software design faults. One important contribution of that paper is the mapping of the immune system analogy onto the fault-tolerance context, based on the components of the immune system. The authors further developed a model for software fault tolerance based on immune network theory. The idea is to generate artificial antibodies that can be used to detect errors in software. The model is divided into two phases: learning and operational. During the learning phase, antibodies are generated randomly and evolved using a genetic algorithm. In the operational phase, the generated antibodies are applied to error detection in software.

Research into hardware fault tolerance can be divided into fault diagnosis and error detection. Work on fault diagnosis has focused on applying immune network concepts to define relationships between data from sensors, e.g., [10]. The sensors can be likened to cells in an immune network: each sensor dynamically evaluates other connected sensors for inconsistencies based on the relationships between them. This network model was applied to the automatic diagnosis of faults in cement plants [11]. The results indicate that the model provides accurate information about faulty sensors. However, the model is limited by the need to define accurate relationships between sensors.
In another work related to failure propagation, relationships between sensors and other system components are identified to indicate the direction of failure propagation within a system [12]. The approach was based on the stimulation and suppression analogies of the immune network. Based on the stimulation and suppression from connected components, each component is assigned numerical values known as failure origin ratios. These ratios indicate how likely the component is to fail and can be used to locate the source of the failure. Simulation results showed that the origin of failures could be traced successfully using the failure propagation network.

More pertinent to this research are the investigations of AIS for error detection, which can be found in [6], [13], and [14]. Taking ideas from [4] and [5], Bradley and Tyrrell examined the application of AIS to error detection in hardware [15]. They coined the name immunotronics for immune-based hardware fault tolerance. They proposed a mapping from the immune system to hardware fault tolerance that later led to the development of models for a hardware immune system using the attributes specified in [4]. One of the approaches proposed was based on a lymphatic network, to be implemented as an error-detection system for an embryonic architecture [16]. Embryonics takes its inspiration from the embryonic development of multicellular organisms. This concept relates to the generation of generic cells during cell reproduction, such that generic cells are able to take over the function of any other cell before differentiation occurs. The idea was then applied to the implementation of component-level redundancy in hardware for achieving fault tolerance. A further model for hardware immune fault tolerance was developed [14], [15], [17], based on the negative selection algorithm [18], [19]. Using this model, hardware can be immunized with tolerance conditions, or antibodies, that act as error detectors. The target hardware is represented as a finite state machine (FSM) that defines the acceptable states and the transitions between them.

However, the work undertaken by Bradley et al. is fundamentally different from the work presented in this paper. First, Bradley's approach was concerned with the fault tolerance of hardware, i.e., chips that could reconfigure themselves once an error had been detected. We, in contrast, are concerned with detecting errors before they lead to failures. In addition, Bradley's approach was underpinned by negative selection. Once the detectors had been placed on the chip, no immune mechanism was employed for AED; this was achieved via a developmental approach instead.

IV. FRAMEWORK FOR AED

A framework for AED was developed that employs ideas from the vaccination and adaptability analogies of the immune system. Vaccination, or immunization, is a process of priming the natural immune system against a disease by introducing attenuated antigens of the disease [20]. This process allows the immune system to generate antibodies for the introduced antigens, so that subsequent invasions by similar antigens induce secondary immune responses. Therefore, this process gives the immune system knowledge about antigens that it had not previously encountered, and enables it to adapt to novel antigens during the primary immune response.
This gives the immune system the ability to detect novel patterns and react accordingly, thereby supplementing its existing knowledge about antigens. In the proposed framework, the immunization metaphor corresponds to the traditional error-detection approach of deploying a set of error detectors that represent known error signatures. However, traditional techniques cannot detect unexpected erroneous behaviors. What is required is a system that can continually learn about these unknown behaviors, and adapt a set of detectors capable of identifying them in the future. This requirement motivated us to adopt ideas from the continuous learning nature of the immune system.

The framework consists of two phases, namely, design-time immunization and run-time adaptation, which correspond to the immune metaphors of immunization and continual learning, respectively. Design-time immunization distributes generic error detectors amongst systems; these detectors are produced by an offline process of detector generation. To be more precise, assume that there is a family of embedded systems with similar functions and behaviors, in which each system is characterized by its own unique features. The idea is to extract generic error detectors corresponding to error signatures common to these systems. These generic error detectors therefore serve as the minimum set of detectors across all the systems, in contrast to the populations of detectors that are unique to individual systems. The run-time adaptation phase, on the other hand, gives each system a more specialized set of detectors. It is responsible for augmenting the more generic detectors (through the use of an evolutionary process). The specialized error detectors are generated from error sequences observed during the run-time operation of the system. Furthermore, the framework divides the learning mechanisms into two levels: 1) learning within a system and 2) learning amongst systems. The two levels are represented as local AED and network-wide AED, as illustrated in Fig. 2.

Fig. 2. Activity diagram of the framework for AED.

V. AIS FOR AED

As already mentioned, the design of the framework for AED was inspired by the natural immune system. In the following, the main features of AIS are outlined, and the algorithm that essentially implements the framework for AED is described.

A. Artificial Immune Systems

AIS are an example of nature-inspired problem-solving systems. They can be defined as adaptive systems, inspired by theoretical immunology and observed immune functions, principles, and models, that are applied to problem solving [3]. AIS have demonstrated significant advantages in diverse scenarios, for example:
- where the inputs are prone to noise and large perturbations, and the system needs to recover and continue operation [21];
- where there can be concept shifts in the input space that the system must track [2];
- where the memory structure must be stable over time but able to cope with novel structures [22];
- where the system needs to generalize or adapt quickly [23];
- where, for the system to operate in real time, special hardware may be necessary to cope with the load and throughput [16];
- where self-repair and continual development are required to take full advantage of the immune system's capabilities [2], [22].

For these reasons, AIS are proving to be an interesting and fruitful avenue of research for fault tolerance.

In an attempt to create a common basis for AIS, a framework has been proposed [3]. The basic elements of that framework are the following.
1) Application domain: This includes the application data for which an AIS is to be developed.
2) Representation: An appropriate representation needs to be developed for the components of the AIS.
3) Affinity measure: This defines how the components of the AIS interact, and also determines how the interactions are evaluated.
4) Immune algorithms: These include the processes that govern the dynamics of the system, i.e., how the behavior of the AIS varies over time.

The AIS framework can be thought of as a layered approach, starting from an application domain or target function. On this basis, the designer decides how the components of the system will be represented. For example, the representation of network traffic may well differ from the representation of a real-time embedded system. Once the representation has been chosen, one or more affinity measures are used to quantify the interactions of the elements of the system. There are many possible affinity measures (partially dependent on the representation adopted), such as Hamming and Euclidean distances. The final layer involves the use of algorithms, which govern the behavior (dynamics) of the system.
There are several algorithms, and these can be based on the following immune processes: negative and positive selection, clonal selection, bone marrow, and immune network algorithms. Work in [24] discusses the importance of adopting a problem-oriented approach to the development of AIS, rather than the more ad hoc adoption of techniques. In line with this, we carefully analyzed the problem domain, and at all stages, we have focused on solving the problem at hand with the most suitable solution.
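To make the pairing of representation and affinity measure concrete, here is a minimal Python sketch of the two distances named above. The function names are illustrative assumptions, and so is normalizing the Hamming distance into an affinity in [0, 1]; neither comes from the framework in [3]. Hamming distance suits discrete codes such as ATM state values, whereas Euclidean distance suits real-valued feature vectors.

```python
import math
from collections.abc import Sequence


def hamming(a: Sequence, b: Sequence) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_affinity(a: Sequence, b: Sequence) -> float:
    """Illustrative affinity in [0, 1]: the fraction of positions that agree."""
    return 1.0 - hamming(a, b) / len(a) if a else 1.0


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two real-valued vectors."""
    return math.dist(a, b)
```

A Hamming-style comparison aligns positions one to one. The prototype described in Section VI instead counts contiguous runs of shared states, which tolerates shifts in where a pattern appears within a window.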
B. AIS for E-Mail Classification (AISEC) Algorithm

The field of AIS [3] has had success in the development of effective classification algorithms [25]–[27]. For our solution, we required a technique that was capable of continual learning, was highly adaptable, and produced a model of the system that could prove useful to field engineers. Through our investigations, we found that the artificial immune system for e-mail classification (AISEC) [2] has these properties. AISEC was developed for a two-class problem: discriminating between interesting and uninteresting e-mails. However, it has properties such as continual learning and adaptation that are required for our application to ATMs. We therefore decided to investigate the applicability of that approach in this domain. AISEC has been shown to perform as well as Bayesian systems in terms of classification, and proved efficient at adapting to changes in the data space: an important requirement in our study.

The AISEC algorithm exploits two sets of artificial immune cells that display features and behaviors of natural B cells and T cells. For simplicity, all the immune cells in the AISEC algorithm are referred to as B cells. One set of immune cells is naive, while the other set consists of memory cells. The AISEC algorithm consists of training and testing phases. The training phase produces B cells that represent uninteresting e-mails. Each B cell is a feature vector containing words from the subject and sender fields of the corresponding uninteresting e-mail. During the testing phase, new e-mails are classified as interesting or uninteresting. These new e-mails are regarded as antigens, and are first processed into the same format as B cells. If the affinity between an antigen and a B cell exceeds a threshold, the antigen is classified as an uninteresting e-mail. This classification requires feedback from a user, termed costimulation, to confirm its accuracy. If costimulation confirms that the antigen is an uninteresting e-mail, the corresponding B cell is rewarded by being promoted to a long-lived memory B cell (assuming it was not already a memory B cell). In addition, the B cell responsible for the correct classification undergoes clonal selection to produce variants of itself. Conversely, an incorrect classification induces the death of the B cell responsible for the classification, as well as of other similar B cells. An antigen classified as interesting is simply passed on to the user. Altogether, the continuous learning feature of the AISEC algorithm is a product of the intermittent reproduction of B cells, user feedback on the classification of e-mails, and cell death.

Other techniques, such as rule induction, were investigated, and we concluded that this algorithm had the required simplicity and properties. As part of this preliminary work, we undertook considerable investigations into the use of negative selection [1]. However, these investigations revealed significant problems with the approach. These included detector coverage, detector generation, the use of discrete data sources, and the need for two classes from which to learn. These problems have since been reported in the literature, and the reader is referred to [28]–[30] for further information.
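The costimulation loop described above can be sketched as follows, adapted to fatal-sequence detection. This is a simplified Python illustration, not the published AISEC algorithm, and it rests on several assumptions:
- B cells are tuples of M-Status values that end in a fatal state.
- Affinity is the longest shared contiguous run divided by the cell length.
- A confirmed detection promotes and clones the responsible cell.
- A false alarm removes that cell together with similar naive cells.

The class and method names, the threshold, the clone count, and the mutation rule are all hypothetical. Naive-cell ageing and other AISEC details are omitted.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

Seq = tuple[int, ...]  # a window of M-Status values


def longest_common_run(a: Seq, b: Seq) -> int:
    """Length of the longest run of consecutive states shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def affinity(cell: Seq, antigen: Seq) -> float:
    """Assumed affinity: shared contiguous run relative to the cell length."""
    return longest_common_run(cell, antigen) / len(cell) if cell else 0.0


@dataclass
class AisecLikeDetector:
    """Hypothetical one-class population of fatal-sequence B cells."""
    naive: list[Seq] = field(default_factory=list)
    memory: list[Seq] = field(default_factory=list)
    threshold: float = 0.5
    n_clones: int = 2
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def classify(self, antigen: Seq) -> tuple[bool, Seq | None]:
        """Flag the antigen if its best-matching B cell reaches the threshold."""
        cells = self.memory + self.naive
        if not cells:
            return False, None
        best = max(cells, key=lambda c: affinity(c, antigen))
        return affinity(best, antigen) >= self.threshold, best

    def feedback(self, antigen: Seq, cell: Seq, confirmed: bool) -> None:
        """Costimulation: reward a confirmed detection, kill cells on a false alarm."""
        if confirmed:
            if cell in self.naive:  # promote to a long-lived memory cell
                self.naive.remove(cell)
                self.memory.append(cell)
            self.naive.extend(self._clone(cell, antigen))
        else:
            self.memory = [c for c in self.memory if c != cell]
            self.naive = [c for c in self.naive
                          if c != cell and affinity(c, cell) < self.threshold]

    def _clone(self, cell: Seq, antigen: Seq) -> list[Seq]:
        """Variants of cell with one non-final state replaced by an antigen state."""
        if len(cell) < 2 or not antigen:
            return []
        clones = []
        for _ in range(self.n_clones):
            pos = self.rng.randrange(len(cell) - 1)  # keep the fatal state last
            clones.append(cell[:pos] + (self.rng.choice(antigen),) + cell[pos + 1:])
        return clones
```

In the ATM setting, costimulation would have to come from later evidence, for instance whether the monitored window was in fact followed by an M-Status value of 10. This section does not describe how the prototype obtains that signal.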
VI. AIS PROTOTYPE FOR AED IN ATMS

In this section, we outline a prototype system that was realized as part of the research. We first describe the mapping of the framework for AED onto a network of ATMs. We then discuss issues relating to the data and the immune-inspired techniques employed. The framework for AED outlined in Section IV can be placed in the context of a network of ATMs, as depicted in Fig. 1. In this diagram, ATMs labeled ATM 1, ATM 2, ATM 3, and ATM 4 are connected to a Central Manager, which can receive information from and send information to the connected ATMs. As can be observed, the framework exploits the network infrastructure to support local and network-wide learning. A local AED system is implemented within an ATM, while the network-wide AED is hosted by the Central Manager, which supports the exchange of information amongst local AEDs. Altogether, the connection of each ATM to the Central Manager enables learning amongst ATMs through the network-wide AED. In the following, we describe the prototype for AED in terms of the basic elements of the AIS framework introduced in Section V-A.
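The division of labor between local and network-wide AED can be illustrated with a short sketch. Note that the prototype itself implemented only the local AED. Every detail below is an assumption made for illustration:
- Detectors are tuples of M-Status values.
- The Central Manager proposes as generic any detector reported by at least `min_atms` local AEDs.
- Only detectors a human operator has approved (as the framework requires) are pushed back to an ATM, alongside its own specialized detectors.

```python
from collections import Counter
from collections.abc import Iterable, Mapping

Detector = tuple[int, ...]


def propose_generic(local_sets: Mapping[str, Iterable[Detector]],
                    min_atms: int) -> list[Detector]:
    """Detectors reported by at least min_atms ATMs, for operator review."""
    counts: Counter = Counter()
    for detectors in local_sets.values():
        counts.update(set(detectors))  # count each ATM at most once per detector
    return sorted(d for d, n in counts.items() if n >= min_atms)


def vaccinate(local: Iterable[Detector], proposed: Iterable[Detector],
              approved: set[Detector]) -> set[Detector]:
    """Local (specialized) detectors plus operator-approved generic ones."""
    return set(local) | {d for d in proposed if d in approved}
```

Counting each ATM at most once per detector keeps a single noisy machine from promoting its own detectors to the generic set.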
Fig. 3. Illustration of fatal and nonfatal sequences.
Fig. 4. r-Contiguous bits matching rule.
A. Application Domain

The data source provided for the implementation of the local AED consisted of ATM log files, which capture histories of error occurrences in an ATM. Each error occurrence is associated with a time stamp indicating the time of the error event. Each log file records error occurrences related to different modules of an ATM, for example, an error in the MCRW module. Each record is made up of fields that describe an error occurrence. Throughout our work, we investigated alternative ways of using the data, ranging from individual fields (time stamps, M-Status, and M-Data) to combinations of fields. After careful investigation, we concluded that combining a number of fields, e.g., M-Status and time, may be beneficial, but the simplest approach was adopted. Consequently, the M-Status field was selected for the implementation. The M-Status field takes discrete values that are error codes describing the state of an ATM at the time an error was detected. Further investigations revealed that the M-Status field was sufficient for identifying possible fatal states of an ATM (assumed to be an M-Status value of 10). A fatal state is indicative of the failure of an ATM.
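The NCR log format is proprietary and is not given here, so the following Python sketch assumes a hypothetical record type with only a time stamp, a module name, and an M-Status code. It shows the data preparation the text implies:
1) keep the records of one module;
2) order them by time stamp;
3) reduce them to the M-Status stream;
4) locate the fatal states (value 10).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

FATAL_STATUS = 10  # M-Status value treated as a fatal state


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log entry; real ATM log files contain further fields."""
    timestamp: datetime
    module: str
    m_status: int


def status_stream(records: list[LogRecord], module: str | None = None) -> list[int]:
    """M-Status values in time-stamp order, optionally for a single module."""
    selected = [r for r in records if module is None or r.module == module]
    selected.sort(key=lambda r: r.timestamp)
    return [r.m_status for r in selected]


def fatal_positions(states: list[int]) -> list[int]:
    """Indices in the stream at which a fatal state occurs."""
    return [i for i, s in enumerate(states) if s == FATAL_STATUS]
```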
B. Representation
Given the ATM log files, the task of the AED is the a priori detection of fatal states that correspond to an impending system failure. The patterns to be detected are therefore the sequences of states that precede fatal states, i.e., fatal sequences of states. The approach monitors the sequences of ATM states until a fatal sequence of states is detected; hence, detectors identify only fatal sequences, not nonfatal ones. Because there is no information on markers that tag the beginning of sequences of states, a fixed window size was adopted for generating sequences, which constrains the number of states in a sequence. Sequences of states are fatal when they are terminated by a fatal state (an M-Status value of 10), and nonfatal when they are not (any M-Status value other than 10). Fig. 3 shows examples of fatal and nonfatal sequences of states, generated using a fixed window size of 6: the sequence terminated by an M-Status value of 10 is a fatal sequence of states, while the sequence that ends with an M-Status value of 13 is a nonfatal sequence of states.
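As a minimal sketch of this representation, the Python function below splits a stream of M-Status codes into fixed-size windows and marks a window as fatal when its last state is the fatal code 10. The function name make_windows and its step parameter are hypothetical; nonoverlapping windows are assumed by default (step equal to the window size), and step=1 reproduces the overlapping mode discussed in Section VI-D. The example stream is invented, chosen so that its two windows of size 6 end in 10 and 13 like the sequences described for Fig. 3.

```python
from typing import List, Optional, Tuple

FATAL_STATE = 10  # M-Status value assumed to denote a fatal state


def make_windows(stream: List[int], window_size: int,
                 step: Optional[int] = None) -> List[Tuple[List[int], bool]]:
    """Split an M-Status stream into fixed-size windows labeled fatal/nonfatal.

    A window is fatal when its last state is FATAL_STATE. With the default
    step (equal to window_size) the windows do not overlap; step=1 gives the
    overlapping mode. A trailing partial window is discarded.
    """
    step = window_size if step is None else step
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    for start in range(0, len(stream) - window_size + 1, step):
        window = stream[start:start + window_size]
        windows.append((window, window[-1] == FATAL_STATE))
    return windows


if __name__ == "__main__":
    # Hypothetical stream of M-Status codes (not real ATM log data).
    stream = [8, 18, 8, 5, 35, 10, 3, 4, 18, 7, 2, 13]
    for window, fatal in make_windows(stream, window_size=6):
        print(window, "fatal" if fatal else "nonfatal")
```

Run as a script, this prints the window [8, 18, 8, 5, 35, 10] as fatal and [3, 4, 18, 7, 2, 13] as nonfatal.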
C. Affinity Measure
For a detector to determine whether a sequence of data is a precursor to failure, some form of affinity measure between the two is required. Since sequences are being used, any affinity metric should take into account a number of states (a history) in order to make the prediction. To this end, the most obvious choice is to apply a window to the incoming data so that a sequence can be matched against a detector. If sufficiently many states within the sequence match a fatal sequence detector, the sequence can be classified as a precursor to a fatal state. The affinity measure employed adapts to this problem the r-contiguous bits matching rule, under which two data items match when they have at least r contiguous bits in common. Accordingly, the affinity between a sequence of run-time ATM states (antigen) and an error detector (B cell) is computed by identifying the number of contiguous states that they have in common. An illustration is shown in Fig. 4, where r is the minimum number of contiguous states required to define affinity. In our prototype, affinity is calculated from the r-contiguous states common to an antigen and a B cell. The affinity measure also takes into account how close the common contiguous states in the B cell are to the fatal state. For example, in Fig. 4, the B cell has two states (35, 35) between the contiguous states 8 18 8 5 and the fatal state 10, while the antigen has two states (4, 18) after the contiguous states 8 18 8 5. These states lying between the contiguous states and the fatal state provide another factor for the affinity. Affinity is a value between 0 and 1 and is computed using (1), with the following notation.
DE LEMOS et al.: IMMUNE-INSPIRED ADAPTABLE ERROR DETECTION FOR ATMs
affinity: variable that stores the affinity between antigen and B cell;
r-contiguous: number of contiguous states (bits) common to antigen and B cell;
windowSize: window size for generating sequences;
abs(x): absolute value of x;
antigenInterval: number of states between the r-contiguous bits and the fatal state in the antigen;
b-cellInterval: number of states between the r-contiguous bits and the fatal state in the B cell.

$$\text{affinity} = \frac{\text{r-contiguous}}{\text{windowSize} + \text{abs}(\text{antigenInterval} - \text{b-cellInterval})} \qquad (1)$$
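The following Python sketch is one possible reading of (1), resting on assumptions of ours rather than details of the prototype: a B-cell state equal to "*" (the do-not-care symbol introduced by generalization, Section VI-D) matches any antigen state; common runs are searched at any offset rather than only at aligned positions; each interval counts the states after the common run, excluding a terminal fatal state; and when several runs of at least r states exist, the highest-scoring one is used. The function and helper names are hypothetical.

```python
from typing import Sequence, Union

FATAL_STATE = 10  # M-Status value assumed to denote a fatal state
WILDCARD = "*"    # "do not care" symbol introduced by generalization
State = Union[int, str]


def _matches(antigen_state: State, bcell_state: State) -> bool:
    """A B-cell state matches an antigen state if it is equal or a wildcard."""
    return bcell_state == WILDCARD or antigen_state == bcell_state


def _interval(seq: Sequence[State], run_end: int) -> int:
    """Number of states after a common run, not counting a terminal fatal state."""
    after = len(seq) - run_end
    if after > 0 and seq[-1] == FATAL_STATE:
        after -= 1
    return after


def affinity(antigen: Sequence[State], bcell: Sequence[State],
             window_size: int, r: int) -> float:
    """Equation (1): r-contiguous / (windowSize + abs(antigenInterval - b-cellInterval)).

    Every run of at least r contiguous common states, at any offset, is
    scored and the best score is returned; 0.0 means no such run exists.
    Sequences are assumed to be no longer than window_size.
    """
    best = 0.0
    for i in range(len(antigen)):
        for j in range(len(bcell)):
            length = 0
            while (i + length < len(antigen) and j + length < len(bcell)
                   and _matches(antigen[i + length], bcell[j + length])):
                length += 1
            if length >= r:
                diff = abs(_interval(antigen, i + length) - _interval(bcell, j + length))
                best = max(best, length / (window_size + diff))
    return best


if __name__ == "__main__":
    # States from the Fig. 4 description; the antigen's leading 3 is invented.
    antigen = [3, 8, 18, 8, 5, 4, 18]
    bcell = [8, 18, 8, 5, 35, 35, 10]
    print(round(affinity(antigen, bcell, window_size=7, r=3), 3))  # 0.571
```

In this example the common run 8 18 8 5 has length 4 and both intervals equal 2, so the affinity is 4/7; an antigen identical to the B cell scores exactly 1.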
D. AISEC Algorithm for Local AED
Since the AISEC algorithm cannot be applied directly to AED, the algorithm had to be redesigned for the new application [24]. The modified AISEC algorithm for AED has two phases. In the training phase, error detectors are generated offline from data obtained from the error logs produced by an ATM. In the testing phase, which corresponds to the online execution of the AISEC algorithm, errors are identified using the previously generated detectors, and these detectors are adapted through the learning capabilities of the AISEC algorithm. These two phases are presented in more detail in the following sections. For both phases, the sequences of states (which characterize detectors) are of fixed length, and fatal sequences are identified by applying a nonoverlapping sliding window to a stream of incoming data. Empirical studies have shown that the AISEC algorithm could not discriminate between fatal and nonfatal sequences when sequences are generated in an overlapping mode [1], because the nonfatal sequence immediately preceding a fatal sequence shares n − 1 of its states with it, where n is the length of each sequence.
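To make the overlap argument concrete: with a sliding window of length n and step 1 over a stream of states s_1, s_2, ..., the fatal window ending at the fatal state s_t and the window immediately before it are

$$W_{t-1} = (s_{t-n}, s_{t-n+1}, \ldots, s_{t-1}), \qquad W_t = (s_{t-n+1}, \ldots, s_{t-1}, s_t),$$

so the two windows have the n − 1 states s_{t-n+1}, ..., s_{t-1} in common and differ only in their first and last states. With a step equal to n, consecutive windows share no states.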
1) Training Phase of AISEC Algorithm: During the offline phase of the AISEC algorithm, M-Status sequences terminated by a fatal state are generated from the ATM log files; these sequences are the basis for generating the error detectors. The pseudocode in Fig. 5 outlines the offline phase of the AISEC algorithm. The AISEC algorithm is trained with B cells (trainingBcells) obtained from these fatal-terminated M-Status sequences. First, the memory set (memoryCells) is initialized with a subset of the B cells in trainingBcells; the number of B cells used to initialize the memory set is limited by the memory seed (memorySeed). Then, the remaining B cells undergo cloning and mutation to produce a diverse set of B cells, which are introduced into the naive set (naiveCells). The outcome of this process is a set of generic error detectors that are used to immunize the local AED for the classification of potential failure sequences.
Fig. 5. Pseudocode for the offline process of the AISEC algorithm.
2) Testing Phase of AISEC Algorithm: The online phase of the AISEC algorithm for AED handles the error detection, learning, local tolerization, validation, and evaluation processes. It involves classifying each fixed-length sequence of states presented to the algorithm and reacting to the feedback on each classification. The classification process is outlined by the pseudocode in Fig. 6. An antigen, that is, a sequence of states denoted sequence, is compared with the naive and memory B cells, termed naiveCells and memoryCells, respectively. If sequence matches a B cell in naiveCells or memoryCells, it is classified as fatal; otherwise, it is classified as nonfatal. The sequence matches a B cell when their affinity exceeds the classification threshold (classificationThreshold).
Fig. 6. Pseudocode for the classification of a sequence by the AISEC algorithm.
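Since the pseudocode of Figs. 5 and 6 is not reproduced here, the offline process and the classification step can be sketched in Python as below. This is a minimal sketch under our own assumptions: each remaining training B cell is kept and also cloned clone_constant times; mutation changes one non-terminal state by generalization ("*") or specialization (another state from the gene library) and never alters the terminal fatal state; initial lifespans are the stimulation counts; and a simple aligned-position stand_in_affinity replaces (1) to keep the block short. All names are hypothetical, and the tiny example uses an invented training set and memory seed.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, Union

State = Union[int, str]
WILDCARD = "*"  # "do not care" symbol introduced by generalization
AffinityFn = Callable[[Sequence[State], Sequence[State]], float]


@dataclass
class BCell:
    states: Tuple[State, ...]  # fixed-length sequence ending in the fatal state
    lifespan: int


def stand_in_affinity(antigen: Sequence[State], bcell: Sequence[State]) -> float:
    """Hypothetical stand-in for (1): fraction of aligned positions that match."""
    if len(antigen) != len(bcell):
        return 0.0
    hits = sum(b == WILDCARD or a == b for a, b in zip(antigen, bcell))
    return hits / len(bcell)


def mutate(states: Tuple[State, ...], gene_library: List[int],
           rng: random.Random) -> Tuple[State, ...]:
    """Generalize or specialize one state; the terminal fatal state is kept."""
    pos = rng.randrange(len(states) - 1)
    options = [g for g in gene_library if g != states[pos]]
    if options and rng.random() < 0.5:
        new_state: State = rng.choice(options)  # specialization
    else:
        new_state = WILDCARD                    # generalization
    return states[:pos] + (new_state,) + states[pos + 1:]


def train(training: List[Tuple[State, ...]], memory_seed: int,
          clone_constant: int, gene_library: List[int], stim_naive: int,
          stim_memory: int, rng: random.Random) -> Tuple[List[BCell], List[BCell]]:
    """Offline phase: seed the memory set, then clone and mutate the rest."""
    memory = [BCell(s, stim_memory) for s in training[:memory_seed]]
    naive: List[BCell] = []
    for s in training[memory_seed:]:
        naive.append(BCell(s, stim_naive))
        naive.extend(BCell(mutate(s, gene_library, rng), stim_naive)
                     for _ in range(clone_constant))
    return naive, memory


def classify(sequence: Sequence[State], naive: List[BCell],
             memory: List[BCell], affinity: AffinityFn,
             classification_threshold: float) -> bool:
    """Fatal if any naive or memory B cell's affinity exceeds the threshold."""
    return any(affinity(sequence, c.states) > classification_threshold
               for c in naive + memory)


if __name__ == "__main__":
    rng = random.Random(0)
    training = [(8, 18, 8, 5, 35, 10), (4, 4, 18, 8, 5, 10), (7, 3, 3, 2, 35, 10)]
    naive, memory = train(training, memory_seed=1, clone_constant=7,
                          gene_library=[2, 3, 4, 5, 7, 8, 18, 35],
                          stim_naive=25, stim_memory=15, rng=rng)
    print(len(memory), len(naive))  # 1 16
    print(classify((8, 18, 8, 5, 35, 10), naive, memory, stand_in_affinity, 0.98))
    print(classify((3, 4, 18, 7, 2, 13), naive, memory, stand_in_affinity, 0.98))
```

Keeping the terminal fatal state out of mutation is a design choice that preserves the defining property of a detector, namely that it describes a sequence ending in failure.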
TABLE I DATA SETS IN THE TRAINING, VALIDATION, AND TESTING PARTITIONS OF ATM DATA FOR EVALUATING THE CLASSIFICATION PERFORMANCE OF THE LOCAL AED
TABLE II DATA SETS IN THE TRAINING AND TESTING PARTITIONS OF ATM DATA FOR EVALUATING THE MEAN DETECTION TIME INTERVAL OF THE LOCAL AED
Fig. 7. Pseudocode for the online phase of the AISEC algorithm.
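The reaction summarized by the Fig. 7 pseudocode, and described in the paragraphs that follow, can be sketched in Python as below. It is a minimal sketch under several assumptions of ours: the costimulation is modeled as a Boolean confirming whether the sequence really was fatal; a promoted naive cell is moved out of the naive set; naive cells whose lifespan reaches zero are discarded; the incorporation of undetected fatal sequences (the fourth variant in Section VIII) is an optional flag; and a simple aligned-position stand-in replaces the affinity of (1). All names are hypothetical; the default parameter values echo the captions of Figs. 8 and 9.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, Union

State = Union[int, str]
WILDCARD = "*"  # "do not care" symbol introduced by generalization
AffinityFn = Callable[[Sequence[State], Sequence[State]], float]


@dataclass
class BCell:
    states: Tuple[State, ...]
    lifespan: int


@dataclass
class Params:
    classification_threshold: float = 0.98
    affinity_threshold: float = 0.95
    clone_constant: int = 7
    stim_naive: int = 25   # stimulation count for naive B cells
    stim_memory: int = 15  # stimulation count for memory B cells
    gene_library: List[int] = field(default_factory=list)


def stand_in_affinity(antigen: Sequence[State], bcell: Sequence[State]) -> float:
    """Hypothetical stand-in for (1): fraction of aligned positions that match."""
    if len(antigen) != len(bcell):
        return 0.0
    return sum(b == WILDCARD or a == b for a, b in zip(antigen, bcell)) / len(bcell)


def mutate(states: Tuple[State, ...], p: Params, rng: random.Random) -> Tuple[State, ...]:
    """Generalize or specialize one state; the terminal fatal state is kept."""
    pos = rng.randrange(len(states) - 1)
    options = [g for g in p.gene_library if g != states[pos]]
    new_state: State = rng.choice(options) if options and rng.random() < 0.5 else WILDCARD
    return states[:pos] + (new_state,) + states[pos + 1:]


def present(sequence: Tuple[State, ...], costimulation_fatal: bool,
            naive: List[BCell], memory: List[BCell], p: Params,
            affinity: AffinityFn, rng: random.Random,
            incorporate: bool = False) -> bool:
    """One iteration: classify, react to costimulation, then age naive cells."""
    predicted = any(affinity(sequence, c.states) > p.classification_threshold
                    for c in naive + memory)
    if predicted and costimulation_fatal:
        # Positive reaction: stimulate, clone, and mutate matching naive cells.
        for cell in list(naive):
            if affinity(sequence, cell.states) > p.affinity_threshold:
                cell.lifespan += p.stim_naive
                naive.extend(BCell(mutate(cell.states, p, rng), p.stim_naive)
                             for _ in range(p.clone_constant))
        if naive:
            b_best = max(naive, key=lambda c: affinity(sequence, c.states))
            m_best = max((affinity(sequence, c.states) for c in memory), default=0.0)
            if affinity(sequence, b_best.states) > m_best:
                # Promote bCellBest; mCellBest stays in the memory set.
                naive[:] = [c for c in naive if c is not b_best]
                memory.append(BCell(b_best.states, p.stim_memory))
    elif predicted:
        # Negative reaction: purge naive and memory cells that matched.
        naive[:] = [c for c in naive
                    if affinity(sequence, c.states) <= p.affinity_threshold]
        memory[:] = [c for c in memory
                     if affinity(sequence, c.states) <= p.affinity_threshold]
    elif costimulation_fatal and incorporate:
        # Missed fatal sequence: incorporate it as a new naive B cell.
        naive.append(BCell(tuple(sequence), p.stim_naive))
    for cell in naive:
        cell.lifespan -= 1  # each presentation is one iteration
    naive[:] = [c for c in naive if c.lifespan > 0]
    return predicted
```

A nonfatal classification triggers no reaction, so in that case only the aging step changes the naive set.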
From the outcome of the classification process, the AISEC algorithm may generate a positive, negative, or nil response. No response is generated when a sequence is classified as nonfatal. When a sequence is classified as fatal, however, the AISEC algorithm reacts positively or negatively depending on the costimulation: a costimulation that confirms the classification triggers a positive reaction, whereas a costimulation that rebuts it triggers a negative one. The pseudocode in Fig. 7 outlines the reaction of the AISEC algorithm when a sequence is classified as fatal. A sequence classified as fatal and confirmed to be correct by the costimulation triggers the evolutionary process. Every naive B cell in naiveCells whose affinity with the classified sequence (sequence) exceeds the affinity threshold (affinityThreshold) has its lifespan increased by the stimulation count for naive B cells. In addition, these naive B cells are cloned and mutated, and the clones are added to naiveCells. Furthermore, bCellBest is initialized with the naive B cell in naiveCells that has the highest affinity for sequence, and mCellBest with the memory B cell in memoryCells that has the highest affinity for sequence. If the affinity between bCellBest and sequence exceeds the affinity between mCellBest and sequence, bCellBest is promoted to become a memory B cell. mCellBest is not removed from the memory set at this
stage. Memory cells are removed only once their lifespan indicator has fallen below a certain threshold. The promotion entails initializing the lifespan of the new memory B cell with the stimulation count for memory B cells. When sequence is classified as fatal and the costimulation disproves the classification, all naive B cells whose affinity for sequence exceeds affinityThreshold are removed from naiveCells; the same applies to the memory B cells in memoryCells. After a sequence has been presented for classification, the lifespan of every B cell in naiveCells is decremented. Each presentation of a sequence of states for classification corresponds to an iteration of the algorithm.
The AISEC algorithm for AED employs generalization and specialization of B cells as its mutation mechanism. Generalization substitutes a valid state in a B cell with a "do not care" symbol (*), while specialization substitutes a state with another valid state from the gene library. The gene library constrains the algorithm to mutate only with valid states. In addition, new B cells can be introduced into the detector set by incorporating undetected fatal sequences. Unlike cloning and mutation, which are guided random processes, the incorporation of fatal sequences enables learning about specific failure sequences: if they occur again, they will be detected.
VII. EXPERIMENTAL SETUP
For evaluating the classification performance of the AISEC algorithm, two data sets were used. Data set ATM-data-set-A was derived from the concatenation of preprocessed ATM log files generated by different ATMs located in a common geographical area. Similarly, the data in ATM-data-set-B were obtained from a geographical area different from that of ATM-data-set-A. Although data from a single ATM would ideally have been used for this purpose, the insufficiency of data required concatenating data from different ATMs.
TABLE III COMPARISON OF CLASSIFICATION PERFORMANCES OF THE VARIANTS OF THE AISEC ALGORITHM USING TRAINING AND TESTING DATA FROM ATM-data-set-A
TABLE IV COMPARISON OF CLASSIFICATION PERFORMANCES OF THE VARIANTS OF THE AISEC ALGORITHM USING TRAINING AND TESTING DATA FROM ATM-data-set-B
Fig. 8. Changes to classification accuracy of the AISEC algorithm. Data set applied is ATM-data-set-A. Parameters include: window size = 6, classification threshold = 0.98, affinity threshold = 0.95, memory seed = 65, clone constant = 7, stimulation count (naive) = 25, stimulation count (memory) = 15, train data = 68 detectors, test data = 38 (24 fatal sequences and 14 nonfatal sequences).
Fig. 9. Changes to classification accuracy of the AISEC algorithm. Data set applied is ATM-data-set-B. Parameters include: window size = 14, classification threshold = 0.98, affinity threshold = 0.95, memory seed = 50, clone constant = 7, stimulation count (naive) = 25, stimulation count (memory) = 15, train data = 55 detectors, test data = 89 (17 fatal sequences and 72 nonfatal sequences).
It was assumed that geographically colocated ATMs provide a common data set for experimentation; an initial investigation confirmed that this was not the case for ATMs that are not geographically colocated. Prior to the detector generation process, the ATM log files are preprocessed. Data sets ATM-data-set-A and ATM-data-set-B were each divided into three parts: two-thirds of each data set were used for training, while the remaining one-third of each data set was divided into halves for validation and testing. The objective of partitioning the data into training, validation, and testing sets is to obtain an accurate and unbiased measure of classification performance from the experiments [31]. The numbers of records in each partition of the data sets are shown in Table I. For evaluating the mean detection time interval of the AISEC algorithm, the algorithm was trained with data sets ATM-data-set-A’ and ATM-data-set-B’. The detectors generated from data set ATM-data-set-A’ were tested using data set ATM-data-set-A”. Data set ATM-data-set-A” corresponds to a single log file in
ATM-data-set-A, while the remaining concatenated log files in ATM-data-set-A make up ATM-data-set-A’. The same applies to data sets ATM-data-set-B’ and ATM-data-set-B”, which are subsets of ATM-data-set-B. It was assumed that the training data from ATM-data-set-A’ were representative of the log files in the relevant geographical region, so that the generated detectors were adequate for detecting errors in a single ATM of that region, i.e., the ATM of ATM-data-set-A”. The same assumption holds for the experiments with ATM-data-set-B’ and ATM-data-set-B”. Table II presents the number of records in the training data sets ATM-data-set-A’ and
ATM-data-set-B’, as well as in the testing data sets ATM-data-set-A” and ATM-data-set-B”. All experiments were repeated for 30 independent runs, and the results were averaged. AISEC has a number of parameters, described in more detail in [2]. We undertook an extensive analysis of the parameter space; the parameters used in our experiments were determined empirically and are given in the corresponding captions (e.g., Figs. 8 and 9). For detailed information, refer to [1].
VIII. RESULTS AND ANALYSIS
Experiments were carried out using four variants of the AISEC algorithm to understand the roles of the offline and online processes, as well as of the different evolutionary mechanisms. The variants of the local AED are the following.
1) Static AED: the AISEC algorithm without the offline evolutionary process during the training of naive B cells, without online feedback on classification, and without the online evolutionary process.
2) Static AED with evolution: the AISEC algorithm with the offline evolution of naive B cells, but without online feedback on classification and without the online evolutionary process.
3) Online AED with evolution: the full AISEC algorithm, which includes online feedback on classification, the online evolutionary process through cloning and mutation, and the offline evolutionary process during the training of naive B cells.
4) Online AED with incorporation of fatal sequences: the full AISEC algorithm, but instead of the evolutionary process by cloning and mutation, new B cells are recruited into the naive pool by incorporating undetected fatal sequences.
A. Classification Performance
The results from the experiments using ATM-data-set-A are shown in Table III, and those for ATM-data-set-B in Table IV. Column (a) gives the results of the static AED, column (b) those of the static AED with evolution, column (c) those of the online AED with evolution, and column (d) those of the online AED with incorporation of fatal sequences.
The standard deviations are shown in brackets. As can be seen from Table III, the classification accuracy of the local AED (i.e., how well it predicts a fatal sequence) is consistently high. The results also show that the static AISEC achieved a higher classification accuracy and true positive rate than the static AISEC with evolutionary process, a difference that is significant according to the Z statistic at the 0.05 significance level (Z = 5.06, against a critical value of ±1.96). A likely reason for this outcome is that the single application of the evolutionary mechanism during the training of the static AISEC with evolutionary process did not generate B cells that were useful for detecting fatal sequences of states. What is very encouraging is the low rate of false positives (i.e., how many times the AED system signaled a potential failure when there was none). This rate should be as low as possible, since a high false positive rate amounts to a high false alarm rate. However, it should be noted that there is very
little difference between the classification performances of the four variants. This is attributed to the relatively small amount of data available: the data employed in these experiments come from real ATMs in operation, and retrieving such data is (at present) difficult. This restriction did not give the immune mechanism within the AED with evolution sufficient experience and time to improve. For the second data set, ATM-data-set-B, the results in Table IV indicate that the classification accuracy, true positive rate, and false positive rate of the static AISEC are identical to those of the static AISEC with evolutionary process. The reason is that the B cells generated from the training data were similar, since the data set ATM-data-set-B is highly repetitive. In contrast with Table III, the static and online AISEC variants in Table IV display higher false positive rates. This may stem from the repetitiveness of ATM-data-set-B, such that the fatal and nonfatal sequences of states generated from it displayed some similarities. The lower false positive rates of the online AISEC variants in Table IV, compared with the static AISEC variants, result from the continuous learning that purged false positive B cells. Figs. 8 and 9 show the changes in the classification accuracy of the online AISEC variants for ATM-data-set-A and ATM-data-set-B, respectively. The experiments relating to the online AISEC with evolutionary process are labeled "evolutionary process" in the graphs, and the label "incorporation of fatal sequences" relates to the online AISEC with incorporation of fatal sequences. Fig. 8 shows that during the online execution, the presentation of at least five testing sequences of states leads to almost 100.00% classification accuracy of the AISEC algorithm. Fig. 9 displays 100.00% classification accuracy for at least five testing sequences of states presented to the AISEC algorithm. Subsequently, misclassifications of fatal and nonfatal sequences of states decrease the classification accuracy. However, in the long term, the classification accuracy becomes stable, as depicted in Fig. 9, indicating that the AISEC algorithm might have reached an equilibrium. Furthermore, Figs. 8 and 9 show that the classification accuracies of the online AISEC with evolutionary process and of the online AISEC with incorporation of fatal sequences overlap, more markedly in Fig. 9. This confirms that the outcomes of the two online AISEC variants are comparable, as observed in the tabular results. Although the previous experiments demonstrated that the AISEC algorithm applied to AED augments its error detectors through continuous learning, they do not confirm whether the classification performance of the online AISEC algorithm improves with the recruitment of new error detectors. Therefore, experiments were performed to determine whether the online AISEC with evolutionary process and the online AISEC with incorporation of fatal sequences improve as new error detectors are learned. For these experiments, the testing data were passed twice through the AISEC algorithm. The experiments showed that, through the learning of effective B cells, the classification accuracy rises progressively.
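The classification accuracy, true positive rate, and false positive rate discussed in this subsection can be computed from confusion-matrix counts, with a fatal sequence as the positive class. The sketch below uses the standard definitions, which are our assumption since the formulas are not spelled out here; the function name and the counts in the example are hypothetical, sized like the ATM-data-set-A test set of Fig. 8 (24 fatal and 14 nonfatal sequences).

```python
from typing import Dict


def classification_rates(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard rates from confusion-matrix counts (fatal = positive class).

    tp: fatal sequences classified fatal     fn: fatal sequences missed
    fp: nonfatal sequences classified fatal  tn: nonfatal sequences classified nonfatal
    """
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("no classified sequences")
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,  # false alarm rate
    }


if __name__ == "__main__":
    # Hypothetical counts for 24 fatal and 14 nonfatal test sequences.
    rates = classification_rates(tp=22, fn=2, fp=1, tn=13)
    for name, value in rates.items():
        print(f"{name}: {value:.3f}")
```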
TABLE V MEAN DETECTION TIME INTERVAL OF THE AISEC ALGORITHM VARIANTS, WHEN TRAINED WITH DATA SET ATM-data-set-A’ AND TESTED WITH DATA SET ATM-data-set-A”
TABLE VI MEAN DETECTION TIME INTERVAL OF THE AISEC ALGORITHM VARIANTS, WHEN TRAINED WITH DATA SET ATM-data-set-B’ AND TESTED WITH DATA SET ATM-data-set-B”
After a period of reinforcing effective B cells and removing ineffective ones, the classification accuracy stabilizes, indicating that the AISEC algorithm may have reached an equilibrium.
B. Mean Detection Time Interval
The other criterion for evaluating the AISEC algorithm is the impact of the local AED on the availability of ATMs. The availability of ATMs could not be estimated from the usual equation because no information was available on the time after which an ATM resumes operation following a fatal state (or failure). Instead, the mean detection time interval was computed to infer how the adaptable error detectors would affect the availability of ATMs. The detection time interval is the difference between the time at which the consequent fatal state occurs and the time at which the fatal sequence of states is detected. The mean detection time interval is obtained by summing the differences between the time at which the system is expected to reach a fatal state (t_f) and the time of correct error detection (t_correct) over all correct detections, and dividing the sum by the number of correct error detections (N_correct):

$$\text{Mean detection time interval} = \frac{\sum (t_f - t_{\text{correct}})}{N_{\text{correct}}} \qquad (2)$$
This mean value was computed from the time stamps associated with each state, which were extracted from the ATM data used in the experiments. It is assumed that the lead time provided by the early detection of imminent failures is sufficient to trigger actions that avoid system downtime, which increases the mean time to failure (MTTF) of a system. If the failure cannot be prevented, early detection still reduces the mean time to repair (MTTR) of the system.
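The computation in (2) can be sketched in Python as follows, assuming that each correct detection is recorded as a pair of time stamps (t_correct, t_f) taken from the log; the function names and the example time stamps are hypothetical. The helper formats the result in the days:hours:minutes:seconds layout used for Tables V and VI.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_detection_interval(detections: List[Tuple[datetime, datetime]]) -> timedelta:
    """Equation (2): sum of (t_f - t_correct) divided by N_correct.

    Each pair is (t_correct, t_f): the time of a correct (true positive)
    detection and the time of the fatal state that followed it.
    """
    if not detections:
        raise ValueError("no correct detections")
    total = sum((t_f - t_correct for t_correct, t_f in detections), timedelta())
    return total / len(detections)


def as_days_hms(delta: timedelta) -> str:
    """Format a non-negative interval as days:hours:minutes:seconds."""
    seconds = int(delta.total_seconds())
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}:{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    detections = [  # hypothetical (t_correct, t_f) pairs
        (datetime(2005, 1, 1, 0, 0), datetime(2005, 1, 1, 10, 0)),
        (datetime(2005, 1, 2, 6, 0), datetime(2005, 1, 2, 20, 0)),
    ]
    print(as_days_hms(mean_detection_interval(detections)))
```

With these two detections made 10 h and 14 h before their fatal states, the mean is 12 h, which the helper renders as 0:12:00:00.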
For these experiments, the AISEC algorithm was trained using ATM-data-set-A’ and tested with ATM-data-set-A” to assess the local AED of an ATM located in a given geographical region (the data in ATM-data-set-A” are specific to one ATM). The training data, i.e., data set ATM-data-set-A’, were constructed from ATMs (other than the tested ATM of ATM-data-set-A”) colocated in the same geographical region. The same applies to ATM-data-set-B’ and ATM-data-set-B” (see [1] for more details on these data sets). The results of the experiments are shown in Tables V and VI, which should be read in the format days:hours:minutes:seconds; standard deviations are shown in brackets. Table V shows that the AISEC algorithm detected approximately 87.00% of the fatal sequences of states in an ATM, on average 12 h before the failures occurred. Table VI shows that the AISEC algorithm detected approximately 90.00% of the fatal sequences in an ATM, on average 2 h before the failures. These results support the inference that the immune-inspired AED evaluated in this work is capable of detecting fatal sequences of states ahead of the occurrence of the actual fatal state. The earliness of the detection is determined by the window size, the classification threshold, and the time lag between the indicators of a fatal state and the occurrence of the fatal state. For example, the mean detection time interval of approximately 12 h in Table V mirrors the frequency of failures in the corresponding ATM, and likewise for the results in Table VI. This implies that the mean detection time interval of an AED in an ATM that fails less frequently is likely to be longer than that of one that fails more often.
By demonstrating that the immune-inspired AED provides early warning on future failures, and given the supposition that the time interval prior to a failure is adequate for initiating timely actions to avoid or reduce system downtime, this work has shown that an implemented AED can lead to improved availability of ATMs.
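For reference, the usual steady-state availability relation mentioned in the subsection on the mean detection time interval, which could not be evaluated here for lack of repair-time data, expresses the availability A in terms of MTTF and MTTR:

$$A = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$

Since A increases with MTTF and decreases with MTTR, both effects attributed above to early detection (a longer MTTF when a failure is avoided, a shorter MTTR when it is not) would raise availability. For instance, with hypothetical values MTTF = 100 h and MTTR = 4 h, A = 100/104 ≈ 0.962; halving MTTR to 2 h gives A = 100/102 ≈ 0.980.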
C. Discussion of Results
From the experiments performed, it has been concluded that the AISEC algorithm detects fatal sequences of states before failures occur, provided that adequate training data and appropriate parameters are used. The results have shown that, in the offline training phase, the classification performance of the static AISEC algorithm with the evolutionary process was not better than that of the static AISEC algorithm without it. For the online testing phase of the AISEC algorithm, it was confirmed that the incorporation of novel fatal sequences of states improved the classification performance of the AISEC algorithm, in contrast to the exploratory evolutionary process. This led to the conclusion that the continuous learning used by the AED is associated with an improvement in its detection capability. In this case, the continuous learning of new errors takes place through the incorporation of novel fatal sequences of states; it was inferred that the online evolutionary process might need more time to bring about a significant improvement in the classification performance of the AISEC algorithm. It was also observed that the AISEC algorithm generated low false alarm rates. In addition, continuous learning reduced the false positive rate by enabling ineffective B cells to be purged from the naive and memory sets. Furthermore, the measured time interval between the detection of impending fatal states and their occurrence provides the basis for stating that AED is useful for enhancing the availability of ATMs, either by circumventing fatal occurrences or by speeding up their repair. Therefore, early detection of errors may reduce the MTTR of the target system or increase its MTTF. The results of the experiments also show that there is a tradeoff between the amount of training data and the classification threshold parameter.
With more training data, a high classification threshold may be used to achieve high true positive and low false positive rates. With little training data, however, the classification threshold must be lowered in order to recognize the fatal sequences of states, which increases the false positive rate of the AISEC algorithm. It is also worth noting that how early an AED detects imminent failures is influenced by the frequency of failures in an ATM. Another observation from the experiments is that the online AISEC with incorporation of fatal sequences achieved better classification performance than the online AISEC with evolutionary process. This is attributed to the fact that incorporating novel B cells, representative of novel fatal sequences of states, is more likely to produce effective B cells. In contrast, the random changes through cloning and mutation did not generate effective B cells and might require more time to do so. This raises the question of whether an alternative way of mutating the B cells could be more effective. One suggestion is to select a subsequence within a sequence of states for mutation and replace it with a valid subsequence from the gene library [32].
IX. CONCLUSION
We have proposed a framework for improving the availability of ATMs in which a key component is a local AED implemented by adapting the AISEC algorithm. The effectiveness of the local AED was established using data that correspond to the error incidences in the cash-handler module of an ATM. The results of the empirical studies have confirmed the efficacy of the local AED at forecasting system failures. The findings are summarized as follows.
1) Detection of failure occurrences: The classification performance was derived from the classification accuracy of the AISEC algorithm (employed in the local AED), which is simply the number of failure occurrences detected out of the total number of failure occurrences reported in the system. Classification accuracies of approximately 90% were recorded for one of the data sets.
2) Enhancement of availability: The local AED was assessed with regard to its ability to detect potential failures in the system before they occur. Through the early detection of failures, it is assumed that the necessary repair actions could be undertaken to prevent system downtime; in other words, the MTTF of the system can be increased and the MTTR reduced. Based on this criterion, the time intervals between the detection and the occurrence of failures were monitored. The estimated means of these time intervals showed that the local AED detected failure occurrences on average 12 h in advance for one data set and 2 h in advance for a second data set. These mean time intervals are absolute values, and their significance for carrying out repairs to circumvent failures depends on the opinion of domain experts.
One of the limitations of the proposed approach is that it does not cater to rare events that may be associated with elusive faults, some of which can be very difficult to reproduce.
The reason is that the algorithm stores as error detectors only those recurrent system states that are precursors of system failure. Otherwise, there would be an explosion of error detectors and an undesirable increase in false positives that might hinder the overall performance of the local AED. An alternative approach would be to have a separate system that monitors and identifies potential rare events, with the corresponding error detectors incorporated into the main pool of detectors by maintenance personnel. The aim of future work is to implement the network-wide AED, as initially foreseen in the framework. For that, it is necessary to specify the protocol for exchanging information between the local AEDs and the network-wide AED, and to determine the minimum number of local AEDs that must propagate a novel error detector to the network-wide AED before the detector is considered generic to the connected ATMs. Another area of research is the generation of variable-length error detectors. Error detectors in the implemented framework were represented as fixed-length sequences of states that were
DE LEMOS et al.: IMMUNE-INSPIRED ADAPTABLE ERROR DETECTION FOR ATMs
terminated by fatal states. The motivation for constraining the sequences to fixed lengths was due to a lack of domain knowledge on the markers that tag the beginning of sequences. In a situation where the markers are known, it may be necessary to investigate the feasibility of exploiting variable-length sequences, error detectors, and window sizes. Above all, we would like to continue to explore other immune-based ideas. The framework for AED was inspired by the immunization and adaptability concepts taken from the immune system; thus, other immuneinspired ideas may be explored for error detection, such as the danger theory [33] and the self-assertion view of the immune system [34].
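As an illustration of the detector representation described above (fixed-length state sequences terminated by fatal states, kept only when they recur) and of the two evaluation measures in the summary (detection accuracy and mean lead time), a minimal Python sketch follows. It is a toy under stated assumptions, not the AISEC-based local AED: it assumes a time-stamped log of discrete ATM states, uses exact matching of the precursor states instead of AISEC's classification and continual learning, and raises an alarm on the first match since the previous failure. The window length, recurrence threshold, function names, and state labels are all hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical parameters (not values reported in the paper).
WINDOW = 4          # detector length, counting the terminating fatal state
MIN_RECURRENCE = 2  # occurrences needed before a window becomes a detector


def harvest_detectors(log, fatal_states, window=WINDOW, min_count=MIN_RECURRENCE):
    """Return the recurrent precursor windows that end in a fatal state.

    `log` is a time-ordered list of (time_h, state) pairs. Each detector is
    stored as the tuple of the window - 1 states preceding a fatal state.
    """
    states = [s for _, s in log]
    counts = Counter()
    for i, s in enumerate(states):
        if s in fatal_states and i >= window - 1:
            counts[tuple(states[i - window + 1:i])] += 1
    # Keep only recurrent windows, so one-off (rare) precursors are discarded.
    return {prefix for prefix, n in counts.items() if n >= min_count}


def monitor(log, detectors, fatal_states, window=WINDOW):
    """Scan a log; return (failures, detected failures, lead times in hours)."""
    states = [s for _, s in log]
    k = window - 1
    alarm_time = None  # earliest alarm raised since the last failure
    failures = detected = 0
    lead_times = []
    for i, (t, s) in enumerate(log):
        if s in fatal_states:
            failures += 1
            if alarm_time is not None:  # this failure was anticipated
                detected += 1
                lead_times.append(t - alarm_time)
            alarm_time = None
        elif i >= k - 1 and tuple(states[i - k + 1:i + 1]) in detectors:
            if alarm_time is None:
                alarm_time = t
    return failures, detected, lead_times


def summarize(failures, detected, lead_times):
    """Accuracy = detected / total failures; mean lead time over detections."""
    accuracy = detected / failures if failures else 0.0
    mean_lead = mean(lead_times) if lead_times else None
    return accuracy, mean_lead


# Toy logs with made-up state labels; "F" marks a fatal state.
FATAL = {"F"}
train = [(0, "ok"), (1, "jam"), (2, "retry"), (3, "jam"), (4, "F"),
         (5, "ok"), (6, "jam"), (7, "retry"), (8, "jam"), (9, "F"),
         (10, "ok"), (11, "low"), (12, "ok"), (13, "ok"), (14, "F")]
test = [(0, "ok"), (1, "jam"), (2, "retry"), (3, "jam"), (5, "ok"),
        (7, "F"), (8, "ok"), (9, "ok"), (10, "F")]

detectors = harvest_detectors(train, FATAL)
result = monitor(test, detectors, FATAL)
accuracy, mean_lead = summarize(*result)
print(detectors, result, accuracy, mean_lead)
```

On these made-up logs, the single recurrent window (jam, retry, jam) becomes a detector, the first test failure is flagged 4 h ahead, and the second is missed, giving an accuracy of 0.5 and a mean lead time of 4 h. The one-off window (low, ok, ok) is discarded by the recurrence threshold, which mirrors the trade-off with rare events noted in the limitations.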
REFERENCES
[1] M. Ayara, "An immune-inspired solution for adaptable error detection in embedded systems," Ph.D. dissertation, Comput. Lab., Univ. Kent, Canterbury, U.K., Sep. 2005.
[2] A. Secker, A. Freitas, and J. Timmis, "AISEC: An artificial immune system for e-mail classification," in Proc. Congr. Evol. Comput., R. Sarker, R. Reynolds, H. Abbass, T. Kay Chen, R. McKay, D. Essam, and T. Gedeon, Eds. Canberra, Australia, Dec. 2003, pp. 131–139.
[3] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach. New York: Springer-Verlag, 2002.
[4] A. Avižienis, "Towards systematic design of fault-tolerant systems," IEEE Comput., vol. 30, no. 4, pp. 51–58, Apr. 1997.
[5] S. Xanthakis, S. Karapoulios, R. Pajot, and A. Rozz, "Immune system and fault tolerant computing," in Lecture Notes in Computer Science, vol. 1063. Berlin, Germany: Springer-Verlag, 1996, pp. 181–197.
[6] D. W. Bradley and A. M. Tyrrell, "A hardware immune system for benchmark state machine error detection," in Proc. Congr. Evol. Comput. World Congr. Comput. Intell., Honolulu, HI, 2002, pp. 813–818.
[7] A. Avižienis, J. Laprie, B. Randell, and C. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," IEEE Trans. Dependable Secure Comput., vol. 1, no. 1, pp. 11–33, Jan. 2004.
[8] P. Jalote, Fault Tolerance in Distributed Systems. Upper Saddle River, NJ: Prentice-Hall, 1998.
[9] C. Scherrer and A. Steininger, "Dealing with dormant faults in an embedded fault-tolerant computer system," IEEE Trans. Reliab., vol. 52, no. 4, pp. 512–522, Dec. 2003.
[10] Y. Ishida, "Active diagnosis by self-organization: An approach by the immune network metaphor," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI 1997), Nagoya, Japan, pp. 1084–1089.
[11] F. Mizessyn and Y. Ishida, "Immune networks for cement plants," in Proc. Int. Symp. Auton. Decentralised Syst., 1993, pp. 282–288.
[12] A. Ishiguro, Y. Watanabe, and Y. Uchikawa, "Fault diagnosis of plant systems using immune networks," in Proc. IEEE Int. Conf. Multisensor Fusion Integr. Intell. Syst. (MFI 1994), Las Vegas, NV, Oct. 2–5, pp. 34–42.
[13] D. W. Bradley and A. M. Tyrrell, "Immunotronics: Hardware fault tolerance inspired by the immune system," in Proc. 3rd Int. Conf. Evolvable Syst. (ICES 2000), vol. 1801, Apr., pp. 11–20.
[14] D. W. Bradley and A. M. Tyrrell, "The architecture for the hardware immune system," in Proc. 3rd NASA–DoD Workshop Evolvable Hardware, D. Keymeulen, A. Stoica, J. Lohn, and R. S. Zebulum, Eds. Long Beach, CA, Jul. 2001, pp. 193–200.
[15] D. W. Bradley and A. Tyrrell, "Hardware fault tolerance: An immunological solution," in Proc. IEEE Int. Conf. Syst., Man, Cybern., 2000, pp. 107–112.
[16] D. W. Bradley, C. Ortega, and A. Tyrrell, "Embryonics + immunotronics: A bio-inspired approach to fault tolerance," in Proc. 2nd NASA/DoD Workshop Evolvable Hardware, J. Lohn et al., Eds. Long Beach, CA, Jul. 2000, pp. 215–233.
[17] D. W. Bradley and A. M. Tyrrell, "Immunotronics—Novel finite-state-machine architectures with built-in self-test using self-nonself differentiation," IEEE Trans. Evol. Comput., vol. 6, no. 3, pp. 227–233, Jun. 2002.
[18] P. D'haeseleer, "An immunological approach to change detection: Theoretical results," Univ. New Mexico, Albuquerque, Tech. Rep., 1996.
[19] P. D'haeseleer, S. Forrest, and P. Helman, "An immunological approach to change detection: Algorithms, analysis and implications," in Proc. IEEE Symp. Res. Security Privacy, Oakland, CA, May 1996, pp. 110–119.
[20] B. Hérve, "A brief history of the prevention of infectious diseases by immunisations," Comparative Immunol., Microbiol., Infect. Dis., vol. 26, no. 5, pp. 293–308, 2003.
[21] J. Kim and P. Bentley, "Towards an artificial immune system for network intrusion detection: An investigation of dynamic clonal selection," in Proc. Congr. Evol. Comput. (CEC 2002), Honolulu, HI, May, pp. 1015–1020.
[22] M. Neal, "Meta-stable memory in an artificial immune network," in Proc. 2nd Int. Conf. Artif. Immune Syst. (ICARIS), J. Timmis, P. J. Bentley, and E. Hart, Eds. Edinburgh, U.K., Sep. 2003, pp. 168–180.
[23] J. Kelsey and J. Timmis, "Immune inspired somatic contiguous hypermutation for function optimisation," in Proc. Genet. Evol. Comput. (GECCO 2003), E. Cantú-Paz et al., Eds. Chicago, IL, Jul. 12–16, pp. 207–218.
[24] A. A. Freitas and J. Timmis, "Revisiting the foundations of artificial immune systems: A problem-oriented perspective," in Proc. 2nd Int. Conf. Artif. Immune Syst. (ICARIS), J. Timmis, P. J. Bentley, and E. Hart, Eds. Edinburgh, U.K., Sep. 2003, pp. 229–241.
[25] D. Goodman, L. Boggess, and A. Watkins, "Artificial immune system classification of multiple-class problems," in Proc. Intell. Eng. Syst. New York: ASME, 2002, pp. 179–184.
[26] J. Kim and P. Bentley, "Immune memory in the dynamic clonal selection algorithm," in Proc. 1st Int. Conf. Artif. Immune Syst. (ICARIS), J. Timmis and P. J. Bentley, Eds. Canterbury, U.K.: Univ. Kent at Canterbury Printing Unit, Sep. 2002, pp. 59–67.
[27] A. Watkins, J. Timmis, and L. Boggess. (2004, Sep.). Artificial immune recognition system (AIRS): An immune inspired supervised machine learning algorithm. Genet. Progr. Evolvable Mach., vol. 5, no. 3, pp. 291–318. [Online]. Available: http://www.cs.kent.ac.uk/pubs/2004/1634
[28] M. Ayara, J. Timmis, R. de Lemos, L. N. de Castro, and R. Duncan, "How to generate detectors," in Proc. 1st Int. Conf. Artif. Immune Syst. (ICARIS), J. Timmis and P. J. Bentley, Eds. Canterbury, U.K.: Univ. Kent at Canterbury Printing Unit, Sep. 2002, pp. 89–98.
[29] T. Stibor, K. Bayarou, and C. Eckert, "An investigation of R-chunk detector generation on higher alphabets," in Lecture Notes in Computer Science, vol. 3102. Berlin, Germany: Springer-Verlag, 2004, pp. 26–30.
[30] J. Timmis, R. de Lemos, M. Ayara, and R. Duncan, "Towards immune inspired fault tolerance in embedded systems," in Proc. 9th Int. Conf. Neural Inf. Process. (ICONIP), Singapore, Nov. 2002, pp. 1459–1463.
[31] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. San Mateo, CA: Morgan Kaufmann, 1991.
[32] V. Cutello, G. Nicosia, and M. Pavone, "A hybrid immune algorithm with information gain for the graph coloring problem," in Proc. Genet. Evol. Comput. (GECCO 2003), E. Cantú-Paz et al., Eds. Chicago, IL, Jul. 12–16, pp. 171–182.
[33] P. Matzinger, "The danger model: A renewed sense of self," Science, vol. 296, pp. 301–305, Apr. 2002.
[34] H. Bersini, "Self-assertion versus self-recognition: A tribute to Francisco Varela," in Proc. 1st Int. Conf. Artif. Immune Syst. (ICARIS), J. Timmis and P. J. Bentley, Eds. Canterbury, U.K.: Univ. Kent at Canterbury Printing Unit, Sep. 2002, pp. 107–112.
Rogério de Lemos (S'89–M'92) received the B.Eng. degree in electrical engineering and the M.Eng. degree in electrical engineering (control systems) from UFSC in 1983 and 1988, respectively, and the Ph.D. degree in computing science from the University of Newcastle upon Tyne, Newcastle upon Tyne, U.K., in 1994. He was a Senior Research Officer at the Centre for Software Reliability (CSR), University of Newcastle upon Tyne. He is currently a Lecturer in the Computing Laboratory, University of Kent, Canterbury, U.K. His current research interests include software architectures for dependable and secure systems, with emphasis on dynamic reconfiguration, and bio-inspired computing applied to dependability.
Jon Timmis (M'02) received the Ph.D. degree in computer science from the University of Wales, Aberystwyth, U.K., in 2002. He was with the University of Kent, Canterbury, U.K. He is currently a Reader at the University of York, Heslington, U.K., in a joint appointment with the Department of Computer Science and the Department of Electronics. His current research interests include the computational abilities of the immune system and their relevance to computer science and engineering. He is on the Editorial Board of Evolutionary Computation. He has been a Member of the IEEE since 2002.
Modupe Ayara received the M.Sc. degree in information systems from the University of Leeds, Leeds, U.K., in 2001, and the Ph.D. degree in computer science from the University of Kent, Canterbury, U.K., in 2005. She is currently with BNP Paribas, London, U.K.
Simon Forrest received the M.Phil. degree in mechatronics from the University of Abertay, Dundee, U.K., in 1994. In 1994, he joined the Advanced Mechatronics Research Centre (AMRC), NCR Financial Solutions Group, Dundee, where he is currently a Senior Research Engineer and a member of Advanced Technology and Research (AT&R), working on emerging wireless and software technologies as well as sensors and intelligent algorithms. He holds several patents on the integration of new technologies into financial self-service. Mr. Forrest received the 1993 AT&T Inventor of the Year Award for his work on expert systems.