A Hybrid Intelligent Architecture for Refining Input Characterization and Domain Knowledge
Ismail Taha and Joydeep Ghosh
Department of Electrical and Computer Engineering, University of Texas, Austin, TX 78712-1084
E-mail: {Ismail, [email protected]

Abstract: A Hybrid Intelligent Architecture (HIA) that aims to exploit the complementary features of expert systems and connectionist architectures is proposed to revise input characterization and initial domain knowledge. HIA has two building blocks, a Rule-Based module and a Connectionist Architecture module. A specific format for the rule-based description of the initial theory acquired from the application domain enables its mapping into a uniform, three-layer network. Continuous inputs are discretized into input vectors using a new coarse coding scheme. An extension of the Backpropagation Algorithm allows refinement of the discretization functions. A successful application to the control of dams on the Colorado river near Austin is described.
1 Introduction
Hybrid Intelligent Systems that aim to exploit the complementary features of the expert system and neural network paradigms have been studied by several researchers [1, 5, 3, 8]. Knowledge bases often suffer from the knowledge acquisition bottleneck [4] and the brittleness of rules [7], particularly when domains are noisy or hard to acquire. Connectionist models, on the other hand, can be trained in a supervised or unsupervised fashion to perform reasonably well on training data sets, even in domains with noisy data. However, they cannot readily incorporate domain knowledge or provide a symbolic explanation of results. In this paper, we augment a knowledge-based system with a connectionist network that helps the former refine its domain knowledge. A novel feature is the ability to change the characterization of continuous-valued inputs based on training data. The proposed HIA has two main modules. The first one is a rule-based system; it represents the initial domain theory extracted from domain representatives. The second is a connectionist architecture. The rule-based system is mapped into an initial connectionist architecture with a uniform structure. During the training phase of the connectionist architecture, an Augmented Backpropagation Algorithm (ABA) with a momentum term is used to enhance domain parameters and refine inputs. Enhancing domain parameters better exploits the extracted domain knowledge to minimize the output error. At the end of the training phase, the final connectionist architecture, with the updated weights and links, can be viewed as a revised domain theory. It can be converted back, if needed, to a rule-based format to achieve the power of explanation [6].
2 Representation of Initial Domain Knowledge
The first module represents the initial domain knowledge through Domain Operational Rules (DOR). The DOR are built using only the basic domain primitives, which can be acquired easily from the domain without consuming much time or effort. They represent the initial domain theory and may not be sufficient for representing the complete problem in a rule-based format. However, they can be used to build an initial rule-based expert system. The DOR format given below is a general rule-based format that can be used to represent any rule-based system:

  w: IF Compound-Condition [OR Compound-Condition] -> Consequent+
  Compound-Condition ::= Simple-Condition | Simple-Condition "AND" Compound-Condition
  Simple-Condition ::= Boolean-Expression | Negated Simple-Condition

Each rule has an attached value, w, which indicates the measure of belief or disbelief in the rule consequent provided the premise (left-hand side) of the rule is true. Rule consequents are not permitted to be used as conditions in any other rule. Such a restriction leads to a simpler uniform connectionist network, as seen later.
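To make the DOR grammar concrete, the following is a minimal sketch of how one such rule could be represented and evaluated; the class and function names (`Rule`, `SimpleCondition`, `fires`) are illustrative assumptions, not part of the paper:

```python
from dataclasses import dataclass

@dataclass
class SimpleCondition:
    name: str            # a basic domain primitive, e.g. "A"
    negated: bool = False

@dataclass
class Rule:
    premise: list        # SimpleConditions, implicitly ANDed together
    consequent: str      # e.g. "Q1"; never reused as a condition in another rule
    w: float             # measure of belief/disbelief in the consequent

# "w=0.8: IF A AND NOT B -> Q1"
r = Rule(premise=[SimpleCondition("A"), SimpleCondition("B", negated=True)],
         consequent="Q1", w=0.8)

def fires(rule, facts):
    """True when every (possibly negated) condition in the premise holds."""
    return all(facts[c.name] != c.negated for c in rule.premise)

print(fires(r, {"A": True, "B": False}))   # True
```

A disjunctive DOR rule (premises joined by OR) would simply be several such `Rule` objects sharing a consequent, which matches how the NLA later creates one hidden node per conjunct group.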
3 The Node Links Algorithm (NLA)
The NLA utilizes a set of mapping principles to map the initial domain theory, represented in the DOR format, into an initial connectionist architecture that can learn new domain concepts during the training phase. It results in a three-layer AND-OR tree, as exemplified by Figure 1. Note that a Negated Simple-Condition is translated into a negative initial weight (-0.6 or -0.7) for the corresponding link. Also, the NLA creates a hidden node even when there is only one Simple-Condition in the premise. In this case, the NLA creates a hidden node with one link to a single input node. This type of hidden node is named a self-anded hidden node, because it ANDs one input node with itself. Therefore, output nodes are viewed as OR nodes and hidden nodes are viewed either as AND or as self-anded nodes. The NLA creates a light link between each self-anded node and all other input and output nodes that are not linked with it. Introducing such self-anded hidden nodes and light links provides the initial connectionist architecture with the power to learn more domain knowledge and extract new features during the training phase. The overhead due to the introduction of the self-anded nodes and their related links is much less than that incurred by interrupting the training phase and adding, heuristically, a random number of hidden nodes [2]. The initial connectionist architecture generated by the NLA has only three layers, independent of the hierarchy of the initial rule-based system and regardless of the nature of the application domain. Moreover, all hidden nodes functionally represent a conjunction concept and use the same activation function, which clearly improves the training phase of the initial connectionist architecture. This is in contrast to models that have a variable network structure and hidden units with different functionalities.
[Figure 1 here. (a) A simple rule-based system:
  W1: IF A -> Q1
  W2: IF B AND C -> Q2
  W3: IF B OR C AND D -> Q3
  W4: IF E AND D OR C -> Q2
  W5: IF A AND C OR B AND D -> Q1
  W6: IF A AND B AND D -> Q2
  W7: IF C OR E -> Q4
  W8: IF NOT B -> Q1
(b) The corresponding initial connectionist architecture generated by the NLA: input nodes A-E, hidden AND/self-anded nodes, output nodes Q1-Q4; the NOT B link carries initial weight -0.7, and light links run to and from the self-anded nodes.]
Figure 1: From rule-based system to initial connectionist architecture using the Node-Links Algorithm.
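The NLA mapping can be sketched as follows. The function name, the `light` weight value, and the positive initial weight (0.6) are illustrative assumptions; the negative weight for negated conditions (-0.7) and the light links to/from self-anded nodes follow the text above:

```python
def nla(rules, light=0.05, pos=0.6, neg=-0.7):
    """Map DOR rules (w, [conjuncts], consequent) into a three-layer AND-OR net.

    Each conjunct list becomes one hidden AND node (a 'self-anded' node when it
    holds a single condition). Negated conditions ('~A') get a negative initial
    weight; self-anded nodes get light links to all unlinked inputs and outputs.
    Returns (input->hidden, hidden->output) weight dictionaries.
    """
    w_ih, w_ho = {}, {}
    inputs, outputs = set(), set()
    for h, (w, conjuncts, consequent) in enumerate(rules):
        outputs.add(consequent)
        for cond in conjuncts:
            name = cond.lstrip('~')
            inputs.add(name)
            w_ih[(name, h)] = neg if cond.startswith('~') else pos
        w_ho[(h, consequent)] = w
    for h, (_, conjuncts, _) in enumerate(rules):
        if len(conjuncts) == 1:  # self-anded hidden node
            for name in inputs:
                w_ih.setdefault((name, h), light)
            for q in outputs:
                w_ho.setdefault((h, q), light)
    return w_ih, w_ho

# Three of the Figure 1(a) rules, with illustrative certainty values:
rules = [(0.9, ['A'], 'Q1'),        # W1: IF A -> Q1 (self-anded)
         (0.8, ['B', 'C'], 'Q2'),   # W2: IF B AND C -> Q2
         (0.7, ['~B'], 'Q1')]       # W8: IF NOT B -> Q1 (self-anded, negated)
w_ih, w_ho = nla(rules)
print(w_ih[('B', 2)])   # -0.7, the negated-condition weight
```

A disjunctive rule such as W7 (IF C OR E -> Q4) would be passed in as two entries sharing the consequent Q4, so each disjunct gets its own hidden node, as the NLA prescribes.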
4 Connectionist Input Representations
Measured inputs in control domains are often continuous. Since the operational rules that represent the system use a multi-interval range to describe the application domain, a discretization function is needed to map continuous inputs into multiple-interval inputs. Assume a continuous measured input z in [a, b]. A discretization function is used to map it into a corresponding vector X = (x_1, x_2, ..., x_n), where x_i is in [0, 1] for all i and n is the number of discretized intervals. A continuous discretization approach is preferred to the binary discretization approach (x_i in {0, 1}) since it allows "coarse coding", i.e., more than one interval can be active at the same time with different certainty values, based on the value of the measured input z. Coarse coding is a more robust representation for noisy data, which is a prime objective here. This discretization process is a typical fuzzification approach for determining the degree of membership of the measured input z in each interval i. The value of each element x_i is interpreted as the measure of belief that z falls in the i-th interval.
[Figure 2 here: n = 5 Gaussian membership curves with means mu_1..mu_5 and standard deviations sigma_1..sigma_5 over the range [a, b]; the vertical axis shows the degree of membership. For the measured input feature z shown, the discretization result is X = [0.39, 0.75, 0, 0, 0].]
Figure 2: Discretizing a continuous measured input into n intervals using Gaussian distribution functions.

A Gaussian distribution function with mean mu_i and standard deviation sigma_i is selected to represent the distribution of the measured input z over each interval i, so n Gaussian curves are used to fuzzify z into n intervals. As shown in Figure 2, a continuous measured input z has been fuzzified into an input vector X, resulting in x_1 = 0.39 and x_2 = 0.75. This fuzzification is done as a preprocessing phase to the initial connectionist architecture. The output of the fuzzification process, X, represents the activation values of the input nodes of the initial connectionist architecture, where each interval is represented by an input node. If the application domain has k continuous measured inputs, the fuzzification approach results in a total of sum_{m=1}^{k} n_m input nodes, where n_m is the number of discretized intervals of the m-th measured input.
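The fuzzification step can be sketched as below; the interval means and widths are illustrative values, not the ones used in the application:

```python
import math

def fuzzify(z, mus, sigmas):
    """Coarse-code a continuous input z into one certainty value per interval,
    using an (unnormalized) Gaussian membership curve for each interval."""
    return [math.exp(-0.5 * ((z - m) / s) ** 2) for m, s in zip(mus, sigmas)]

# Five intervals over a hypothetical range (values illustrative)
mus    = [10.0, 20.0, 30.0, 40.0, 50.0]
sigmas = [5.0, 5.0, 5.0, 5.0, 5.0]
x = fuzzify(13.0, mus, sigmas)
# The two intervals nearest z are both strongly active, as in Figure 2;
# intervals far from z receive membership values near zero.
```

Because neighbouring Gaussians overlap, a measured value near an interval boundary activates both adjacent input nodes with graded certainties, which is exactly the coarse-coding robustness argued for above.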
5 Refining Input Characterization
The initial connectionist architecture is trained on the output vectors (X's) of the fuzzification function. Assuming that the measured input values are normally distributed within each interval i with mean mu_i and standard deviation sigma_i, the Gaussian functions

  x_i = f_i(z) = e^(-(1/2) ((z - mu_i) / sigma_i)^2)    (1)

are used to discretize the measured input value z. An Augmented version of the Backpropagation Algorithm, ABA, with a momentum term is used to train the initial architecture and stochastically search for the optimal weights. Moreover, the ABA is used to refine the initial discretization parameters mu_i and sigma_i for each interval i. The ABA calculates the stochastic gradient descent of the output error with respect to mu_i and sigma_i and propagates it one more step back to the fuzzification function, i.e., to the external input of the connectionist architecture. Refining the discretization parameters (mu_i, sigma_i) makes the effective inputs to the feedforward network more useful for the decision process. Moreover, adjusting the input parameters helps the connectionist architecture extract accurate features from the measured inputs and learn more domain knowledge through the training phase. The chain rule was used to derive the derivative of the output error with respect to mu_i and sigma_i:

  dE/dmu_i = (dE/df_i(z)) (df_i(z)/dmu_i) = ((z - mu_i) / sigma_i^2) sum_{j=0}^{h-1} w_ij (dE/dw_ij)    (2)

  dE/dsigma_i = (dE/df_i(z)) (df_i(z)/dsigma_i) = ((z - mu_i)^2 / sigma_i^3) sum_{j=0}^{h-1} w_ij (dE/dw_ij)    (3)

where the term sum_{j=0}^{h-1} w_ij (dE/dw_ij) represents the gradient descent of the output error propagated back through all the h hidden nodes linked to input node number i. Note that the dE/dw_ij's do not need to be recomputed, as they are already obtained from updating the weights into the hidden units. The center and width of the i-th interval are adjusted as follows:

  mu_i(new) = mu_i(old) - dE/dmu_i + MomentumTerm    (4)
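A minimal sketch of this refinement step is given below. The function name, the explicit learning rate `lr`, and the scalar `momentum_term` are illustrative assumptions; the gradients follow Eqs. (2)-(3) and the updates follow the adjustment rules for the interval center and width:

```python
def refine_interval(z, mu, sigma, backprop_term, lr=0.1, momentum_term=0.0):
    """One ABA refinement step for interval i's (mu, sigma).

    `backprop_term` is sum_j w_ij * dE/dw_ij, already available from updating
    the weights into the hidden units, so no extra backward pass is needed.
    """
    dE_dmu = (z - mu) / sigma ** 2 * backprop_term            # Eq. (2)
    dE_dsigma = (z - mu) ** 2 / sigma ** 3 * backprop_term    # Eq. (3)
    mu_new = mu - lr * dE_dmu + momentum_term                 # center update, Eq. (4)
    sigma_new = sigma - lr * dE_dsigma + momentum_term        # width update
    return mu_new, sigma_new

mu_new, sigma_new = refine_interval(z=13.0, mu=10.0, sigma=5.0, backprop_term=0.2)
```

Note that both gradients reuse the same backpropagated sum, which is why refining the discretization adds almost no cost on top of standard weight updates.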
  sigma_i(new) = sigma_i(old) - dE/dsigma_i + MomentumTerm    (5)

6 Implementation
Deciding the amount of water that should be released from any of the Colorado river dams or lakes is a complicated process. The Lower Colorado River Authority (LCRA) makes this decision for each dam or lake based on the current elevation of the water in the lake, the inflow rate from upstream dams and lakes, the outflow rate from the current lake, the predicted weather (rainfall rate), the predicted elevation of the downstream dams and lakes, and many other factors. We acquired the main operational rules for controlling the flood gates of the Colorado river, specifically in the greater Austin area, from different documents issued by the LCRA after the 1991 Christmas flood in the Austin area. The first module of the HIA was used to implement this domain knowledge in the DOR format. The NLA mapped these rules into an initial connectionist architecture. A data set of 600 patterns was gathered from the LCRA historical records of the different dams and lakes, together with the corresponding best decisions at each time based on the data available at that time. The acquired data set was divided into two sets. The first one had 400 patterns and was used as the training set; the second was the test set. Each pattern includes the measured elevation of three lakes at some given time and the corresponding best decision. A Gaussian discretization function was used to provide the initial architecture with the
equivalent input vector X for each of the measured elevations. The constructed architecture has 23 input nodes, 18 hidden nodes, and 8 output nodes representing the different possible decisions at any time. The ABA was used to train the network. After the training phase, the test data was used to measure how the hybrid architecture would perform in a real flood situation. The same data from the 1991 Christmas flood and the historically recorded decisions at that time were compared with the decisions taken by the proposed architecture. 94.2% of the time, the decisions taken by the HIA matched the test data decisions. Most of the time the decisions taken by the HIA connectionist module were better than the actual decisions, and we were also able to get more than one decision at a time, with different certainty factors. For comparison, a fully connected MLP was trained and tested with the same data sets used by the HIA. This MLP architecture does not have any a priori knowledge about the domain. Only 72.4% of the time did the decisions taken by the randomly initialized MLP match the best decisions. Moreover, the number of epochs needed to train this MLP was much larger than the number needed to train the HIA connectionist architecture. Table 1 provides a summary of the test results for both the HIA connectionist architecture and the MLP.

Table 1: Test results (MSE and classification rate, %) of the HIA and MLP networks

  No. of epochs | HIA MSE | HIA Class. Rate | MLP MSE | MLP Class. Rate
              1 |   0.096 |          76.176 |   1.367 |           64.0
             10 |   0.015 |          93.323 |   0.889 |           68.5
             20 |   0.014 |          94.234 |   0.334 |           71.7
             30 |   0.013 |          94.234 |   0.233 |           72.38
References
[1] Fu, L.M. 1993. Knowledge-based connectionism for revising domain theories. IEEE Transactions on Systems, Man, and Cybernetics, 23(1), pp. 173-182.
[2] Fu, L.M. 1994. Neural Networks in Computer Intelligence. McGraw-Hill, Inc.
[3] Glover, C.W., Silliman, M., Walker, M., and Spelt, P. 1990. Hybrid neural network and rule-based pattern recognition system capable of self-modification. In Proceedings of SPIE, Applications of Artificial Intelligence VIII, pp. 290-300.
[4] Jackson, P. 1990. Introduction to Expert Systems. Addison-Wesley.
[5] Lacher, R.C. 1992. Backpropagation learning in expert networks. IEEE Transactions on Neural Networks, 3(1), pp. 62-72.
[6] Murphy, P.M., and Pazzani, M.J. 1991. ID2-of-3: Constructive induction of N-of-M concepts for discriminators in decision trees. In Proceedings of the Eighth International Machine Learning Workshop, Evanston, pp. 183-187.
[7] Sun, R. 1994. Integrating Rules and Connectionism for Robust Commonsense Reasoning. John Wiley and Sons, Inc.
[8] Towell, G.G., Shavlik, J.W., and Noordewier, M.O. 1990. Refinement of approximate domain theories by knowledge-based artificial neural networks. In Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, pp. 861-866.