NUWC-NPT Technical Report 10,U2
AD-A280 641
1 March 1994
Concepts of Fuzzy Model Assessment

P. R. Kersten
Combat Control Systems Department
S. C. Nardone
University of Massachusetts Dartmouth
Naval Undersea Warfare Center Division
Newport, Rhode Island
Approved for public release; distribution is unlimited.
PREFACE
This report was funded under NUWC Newport IR/IED Project
No. 802424 "Fuzzy Expert Systems." The IR/IED program is funded by the Office of Naval Research; the NUWC Newport program manager is K. M. Lima (Code 102). The technical reviewer for this report was S. E. Maloney (Code 2211).
The authors gratefully acknowledge the enthusiastic support of K. F. Gong, S. E. Hammel, and W. G. Ravo for the Fuzzy Systems effort in Code 2211. The authors also wish to thank D. J. Ferkinhoff for supplying a copy of the compatibility maps, some data samples, and many interesting conversations about the Contact Management Model Assessment system.
Reviewed and Approved: 1 March 1994
P. A. La Brecque Head, Combat Control Systems Department
REPORT DOCUMENTATION PAGE

1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: 1 March 1994
3. REPORT TYPE AND DATES COVERED
4. TITLE AND SUBTITLE: Concepts of Fuzzy Model Assessment
5. FUNDING NUMBERS
6. AUTHOR(S): P. R. Kersten, S. C. Nardone
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Undersea Warfare Center Division, Newport, RI
8. PERFORMING ORGANIZATION REPORT NUMBER: TRI1034
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
10. SPONSORING/MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.
12b. DISTRIBUTION CODE
13. ABSTRACT (Maximum 200 words): This report discusses the concepts and mathematical models needed to apply fuzzy system theory to contact management model assessment. The detection and classification of propagation models are essential components of contact tracking. The classification of propagation models is a pattern recognition problem, which is addressed in this report by using fuzzy system theory. The sensor data are modeled with fuzzy numbers, the decision rules are constructed using fuzzy rules, and the decision quality is evaluated by interval-valued certainties. This system is one key component to building a fuzzy expert system tracker.
14. SUBJECT TERMS: Fuzzy Sets; Fuzzy Logic; Pattern Recognition; Fuzzy Expert Systems
15. NUMBER OF PAGES: 62
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: SAR
TABLE OF CONTENTS

Section                                                                Page

      LIST OF ILLUSTRATIONS ............................................ ii
      LIST OF ABBREVIATIONS AND ACRONYMS ............................... iii

1     INTRODUCTION ..................................................... 1-1
1.1   Data Abstraction of the Track .................................... 1-1
1.2   Classification of the Model ...................................... 1-3
1.3   Extension of the Classification Process .......................... 1-7

2     FUZZY DECISION SYSTEMS ........................................... 2-1
2.1   Embedding Fuzzy Systems .......................................... 2-1
2.2   Fuzzy Control Example ............................................ 2-4
2.3   Soft Decisions Versus Hard Decisions ............................. 2-6
2.4   Cluster Interpretation of Rules .................................. 2-8

3     FUZZY RULES IN CLASSIFICATION .................................... 3-1
3.1   Fuzzy Rules ...................................................... 3-1
3.1.1 Interval-Valued Certainties ...................................... 3-1
3.1.2 Propagation of Evidence Through Fuzzy Rules ...................... 3-3
3.2   Decision Space ................................................... 3-7
3.3   Aggregation of Certainty for Conclusions ......................... 3-9

4     FUZZY CMMA SYSTEM ................................................ 4-1
4.1   Data in the CMMA System .......................................... 4-1
4.2   Structure of the Fuzzy Rules ..................................... 4-3

5     CONCLUSIONS AND FUTURE DIRECTIONS ................................ 5-1

6     REFERENCES ....................................................... 6-1

      APPENDIX - CONTINUOUS LINEAR PIECEWISE REPRESENTATION
      OF FUZZY SETS .................................................... A-1
LIST OF ILLUSTRATIONS

Figure                                                                 Page

1-1   Solution Hierarchy Induced by the Piecewise Stationarity of the
      System Processes ................................................. 1-2
1-2   Fukunaga's Flow Chart of the Process of Classifier Design ........ 1-4
1-3a  Decision Rule in Statistical Pattern Recognition ................. 1-5
1-3b  Decision Rule in Fuzzy Pattern Recognition ....................... 1-6
1-4a  Decision Rule with Interval-Valued Certainty ..................... 1-7
1-4b  Decision Rule with Interval-Valued Memberships ................... 1-7
2-1   Linguistic Variable COLOR ........................................ 2-1
2-2   Fuzzy System Model ............................................... 2-2
2-3   Typical Output Strength Calculation in Fuzzy Control Logic ....... 2-5
2-4   Hard Decisions vs Soft Decisions ................................. 2-7
2-5   Interval-Valued Fuzzy Sets ....................................... 2-8
2-6   Formation of Clusters in Model Assessment ........................ 2-9
2-7   Fuzzy Rule Maps to Interval-Valued Fuzzy Set ..................... 2-10
3-1   Fuzzy Data Fitted to a Fuzzy Premise ............................. 3-1
3-2   Calculation of the Necessity and Possibility of A is B ........... 3-2
3-3a  Calculation of the Possibility from the Data for the Moderate Term 3-3
3-3b  Calculation of the Necessity from the Data for the Moderate Term . 3-3
3-4   Linguistic Variable TRUTH ........................................ 3-6
3-5   Alternate Definition of the Linguistic Variable TRUTH ............ 3-7
4-1   Regression Fit of Input Data, Model Fit by Fuzzy Numbers, then
      Fuzzify the Fuzzy Numbers from the Fit ........................... 4-1
4-2   Certainty Plot of Data Designed to Test for a Contact Maneuver,
      Hypothesis h2 .................................................... 4-5
4-3   Aggregated Certainty Plot for a Contact Maneuver, Hypothesis h2 .. 4-6
4-4   Certainty Plot of Low SNR Data Designed to Test for a Contact
      Maneuver, Hypothesis h2 .......................................... 4-6
A-1   Trapezoidal Membership Function, TRAP1 ........................... A-2
A-2   Trapezoidal Membership Function, TRAP2 ........................... A-2
A-3   Points of the Universe of Discourse Where the Partition is Made .. A-2
A-4a  Constructing the Common Partition of the Two Fuzzy Sets .......... A-3
A-4b  Resulting Fuzzy Set Prior to Simplification of the Partition for
      the Fuzzy OR Operator ............................................ A-3
LIST OF ABBREVIATIONS AND ACRONYMS

bf       Base frequency change
bpa      Basic probability assignment
BIOFAM   Binary input/output fuzzy associative memory
CRI      Compositional rule of inference
CLOS     Common Lisp Object System
CMMA     Contact management model assessment
CDF      Cumulative density function
FNN      Fuzzy neural network
h0       Null hypothesis or no change in model
h2       Contact maneuver
inf      Infimum
LISP     List processing language
MF       Membership function
MPG      Modus ponens generating function
NN       Neural network
nl       Unknown anomaly
PDF      Probability density function
pp       Propagation path
ROC      Rule of combination
FLOPS    Fuzzy Logic Official Production System
FPR      Fuzzy pattern recognition
FS       Fuzzy system
LPC      Linear piecewise continuous
s-norm   Triangular conorm
SPR      Statistical pattern recognition
sup      Supremum
SNR      Signal-to-noise ratio
t-norm   Triangular norm
CONCEPTS OF FUZZY MODEL ASSESSMENT

1. INTRODUCTION

When tracking a contact via passive sonar, one finds the solution structure of the track is in large part determined by the stationarity assumptions made on the stochastic processes associated with the tracking solution. Stationarity is assumed for the noise processes, the kinematics of the two vessels, and the acoustic channel linking the source and the observer, which in our model correspond to the contact and own ship, respectively. Essentially, one is forced to assume the processes are piecewise stationary, i.e., stationary for periods of time with the change points between stationary periods occurring quickly with respect to the expected length of stationarity. During periods of stationarity, a filter can track the contact. Between the periods of stationarity, the filter may lose track and should in any event be reinitialized after the switching period between stationary periods. Thus, one has a renewal process with periods of stationarity during which tracking is possible. The points of renewal in this process are the ends of the transition periods of stationarity. To successfully model this process, one must not only design the tracking filters for the stationary periods but also detect the change points and determine the new propagation model that will hold during the next period of stationarity.

The essence of model assessment is the classification of the possible propagation models. Model assessment is a pattern recognition problem, which can be approached in many ways. Here a fuzzy system is used to determine the possible models and the confidence associated with each model. From this information, a decision is made as to which possible models should be maintained in constructing the tracking solution during the next period of stationarity. This report describes the application of fuzzy system modeling to propagation model assessment, with the emphasis on theoretical issues.
1.1 DATA ABSTRACTION OF THE TRACK

The piecewise stationarity of the observation process is reflected in the tracking solution. The overall tracking solution is a string of tracks built up over many periods of stationarity. A track established on a single period of stationarity is called a segment. For a single contact, the
piecewise construction of the solution leads to the hierarchical representation of the tracking solution as a tree. For a set of contacts, the solution is a forest of trees.

Segments are classified according to the mechanism that caused the segment formation. The mechanisms include the following:

1. Change of propagation path (pp) (e.g., a change from a direct path to a bounce path between source and observer or vice versa). In fact, any change in the propagation channel fits into this category, including multi-bounce models.

2. Changes in the own ship heading, speed, or depth.

3. Changes in the base frequency of the source (bf), which should produce minor changes in propagation channel characterization provided the change in frequency is a small percentage of the base frequency.

4. Changes in the contact heading, speed, or depth (h2), which are usually observable in the Doppler and bearing measurements.
5. Changes in the system of unknown type (nl). Unknown anomalies are a way of specifying the class of changes not modeled so far.
Segments not caused by contact maneuvers are used to build up contact segments, or those parts of the track where the contact has not changed its kinematics. A sequence of contact segments is then used to form a contact track, which is represented as a tree in this hierarchy. The set of all contact tracks is a collection of trees or a forest of trees, which is the contact situation assessment.
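The data hierarchy just described (points, segments, contact segments, contact tracks, and the contact-assessment forest) can be sketched as nested containers. The class and field names below are illustrative only; the report does not specify a data structure.

```python
# Illustrative sketch of the data hierarchy of figure 1-1: points ->
# segments -> contact segments -> contact tracks -> contact assessment.
# All class and field names are hypothetical.

class Segment:
    """A track fitted over one stationary period (a group of data points)."""
    def __init__(self, points, cause="h0"):
        self.points = points   # raw (x, y) data points
        self.cause = cause     # mechanism that started this segment:
                               # 'h0', 'pp', 'ownship', 'bf', 'h2', or 'nl'

class ContactSegment:
    """Consecutive segments over which the contact did not maneuver."""
    def __init__(self, segments):
        self.segments = segments

class ContactTrack:
    """A sequence of contact segments for one physical contact (a tree)."""
    def __init__(self, contact_segments):
        self.contact_segments = contact_segments

def contact_assessment(tracks):
    """The set of all contact tracks: a forest, one tree per contact."""
    return list(tracks)

# A toy track: a base-frequency change (bf) does not break the contact
# segment, while a contact maneuver (h2) starts a new contact segment.
seg1 = Segment([(0, 0), (1, 1)])
seg2 = Segment([(1, 1), (2, 2)], cause="bf")
seg3 = Segment([(2, 2), (3, 1)], cause="h2")
track = ContactTrack([ContactSegment([seg1, seg2]), ContactSegment([seg3])])
forest = contact_assessment([track])
```

This mirrors the segmentation rules above: only segments whose cause is a contact maneuver (h2) open a new contact segment.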
This solution construction is illustrated in figure 1-1, where on the right side of the figure a single physical contact track is drawn and on the left side is the corresponding data abstraction that represents the ideal solution for this track. In the physical track, the small white circles represent data points in the x-y plane.

Figure 1-1. Solution Hierarchy Induced by the Piecewise Stationarity of the System Processes

Tracks formed during stationary periods are called segments and drawn as short solid lines fitted to the data points. The linear fit is the result of the tracking process or state estimation. This fit is sometimes called feature extraction, data reduction, or data abstraction. As long as the contact does not maneuver, the segments can be viewed as data points in a "super segment" called a
contact segment. Contact segments are drawn as bold dashed lines in the physical track, and there are two contact segments in figure 1-1, where the track evolves in time from the bottom of the figure to the top. The contact segments are concatenated to form a single physical track in this figure.

Since the physical track is built up by successive data abstraction, a natural data hierarchy emerges as shown on the left-hand side of figure 1-1. The most basic data elements or data points form the first level of the hierarchical structure. Proceeding up the data hierarchy, groups of points generated during track formation produce the segments, groups of segments form contact segments, groups of contact segments form contact tracks, and at the top of the hierarchy, the set of all contact tracks forms the contact assessment. Each level of the hierarchy has a corresponding physical meaning shown by the arrows.

For the tracker to perform successfully, the system must be able to detect the change points in the stochastic processes that are affecting the system, and then classify them so that one can reinitialize the tracking filters properly. This latter task, if successful, allows a savings of computational resources because the alternative is to initialize a new filter for every possible scenario and see which ones converge. Previous efforts in this field have used Dempster-Shafer evidential reasoning and compatibility maps that map observed effects to possible causes and, at the same time, produce a consistent basic probability assignment (bpa) to the possible causes. Reference 1 describes one such effort. Although this subtask is critical, it is by no means the only issue in designing a Fuzzy Expert System. This report considers modeling issues relevant to the Fuzzy Expert System Tracker. Many of these issues are also relevant to the Contact Management Model Assessment (CMMA) simply because the tracker includes the CMMA functionality within the tracking mechanism.
A more complete description of the CMMA system is contained in reference 2. This report emphasizes theoretical issues. The fuzzy CMMA system's description and performance are a topic for future work, once a suitable and correct data set is obtained.

1.2 CLASSIFICATION OF THE MODEL

Classifying the cause of change points is a pattern recognition problem, which can be formulated using many different methods. Statistical pattern recognition (SPR) is one and fuzzy pattern recognition (FPR) is another. Although there are many different techniques and approaches to pattern recognition, they all require (1) a correct model of the problem, (2) extraction of a set of features that describe the differences in the classes, and (3) construction of a decision rule to map the feature space to the decision space.

Fukunaga (reference 3) describes the pattern recognition process using the flow diagram of figure 1-2. In this figure, the first three blocks represent the initial data collection, processing, and exploratory data analysis. The data structure block consists of the search for structure in the data using clustering and modeling techniques along with data reduction or feature extraction. Feature extraction is an art requiring iterative refinement measured by the error estimates achieved. Finally, the features are used as input to the classifier designed to fit the data for the anticipated use of the system. The final product needs to be tested to validate the design procedure. Note the number of loops and the emphasis on nonparametric methods to determine the performance of the classifier before it is built. The flow diagram assumes a valid data set, which is not available at this point. So the blocks that address normalization, error estimation, and actual clustering and modeling of the features will have to wait until valid simulated data are available or real data are found.
So in terms of Fukunaga's flow chart, this report can only address the data structure analysis block and the classifier design block. The approach is to replace the classical decision rules and the feature extraction by their fuzzy system counterparts, which are the fuzzy rule base and the fuzzy term sets, respectively. This issue will be discussed in sections 2 and 3.
[Figure 1-2 flow chart blocks: registration; tests; statistical modeling; data structure analysis (clustering, feature extraction); error estimation (nonparametric); and classifier design (linear, quadratic, piecewise, and nonparametric classifiers)]
Figure 1-2. Fukunaga's Flow Chart of the Process of Classifier Design

The goal of pattern recognition is to design a decision rule that partitions the feature space into equivalence classes, one equivalence class per cause of change point. The extracted features are usually represented as vectors in a Euclidean space. Thus, the decision rule is a map from the observation space to the causes or to the description of the equivalence classes. So the decision fits the traditional representation of a decision rule as a map from the observation space to the set of integers {1,...,c}, where each integer represents a class (δ: X → {1,...,c}). The statistical decision rule is shown in figure 1-3a as a map that places a unit value at the "proper" class element and zero values in all the other elements. So for this example, δ(x) = 2. In this figure, the one-dimensional observation space is partitioned into five disjoint regions, and the labeled areas are the characteristic functions associated with each of the regions. Classical sets may be defined in terms of a characteristic function that takes on the value 1 if the point is in the set and 0 if the point is not in the set, i.e.,

    χ_A(x) = 1 if x ∈ A, and 0 otherwise.
For example, the second equivalence class has a characteristic function denoted by χ_2(x) = χ_(b1,b2](x), where

    χ_(b1,b2](x) = 1 if x ∈ (b1, b2], and 0 otherwise.

For this example, the statistical decision rule is given by

    δ(x) = Σ_{i=1}^{5} i · χ_i(x),

where only one of the χ_i(x) can be non-zero. In the vector u(x) = [χ_1(x),...,χ_5(x)], each coordinate is associated with a class, and so each decision is represented by a unit vector. So the vector of classes for figure 1-3a is u(x) = [0,1,0,0,0]. The fact that only one element of the function can be 1 also follows from the definition of partition. A point may be in one and only one partition and one and only one equivalence class, and thus in only one class. This is known as a "hard" decision, where the word hard refers to the all-or-nothing quality of the decision. In figure 1-3a, the sample x can belong only to one class, class 2. Conceptually, the decision rule δ(x) and the vector u(x) merely represent the decision in different formats. In the latter format, δ(x) can be thought of as the index of the non-zero element in the vector u(x), and each element of u(x) can be interpreted to mean the degree that the data point belongs to the class associated with that index. In fact, the vector u(x) represents the characteristic function of the class chosen by the decision rule δ(x), so in effect, the decision can be thought of as a map from the observation space to the space of characteristic functions representing the classes. Thus one could recast the decision rule as a mapping δ: X → {0,1}^c, where {0,1} is the two-element set containing 1 and 0.
Figure 1-3a. Decision Rule in Statistical Pattern Recognition

By replacing the partition of the data space by a soft partition, one can construct soft decision rules. In the previous paragraph, a decision rule was interpreted as a mapping to the set of characteristic functions. Characteristic functions reflect the philosophy of classical set theory: a point either belongs to a set (takes value 1) or it does not (takes value 0); there is no in-between. Similarly, fuzzy decision rules can be described as mappings to the set of membership functions (MFs) of fuzzy sets. MFs generalize characteristic functions and define fuzzy sets. When evaluated at a point, an MF determines the degree that the point belongs to the set. For a fuzzy set A, the MF is denoted by μ_A(x), and takes on values in [0,1]. In SPR, a decision rule
partitions the observation space as shown in figure 1-3a, where each partition is a subset of the observation space and that subset is defined by a characteristic function, say χ_2(x) as discussed above. In a fuzzy partition, a point in the observation space can be in any one of the classes with varying degrees of membership. The memberships are constrained by the requirement that they sum to one for each point in the observation space, i.e.,

    ∀x ∈ X, Σ_{i=1}^{c} μ_i(x) = 1.

The decision is no longer a mapping into the set of integers {1,...,c} but instead into the vector space δ: X → [0,1]^c, just as the hard decision rule may be thought of as a mapping to the vector space δ: X → {0,1}^c, where u(x) = [χ_1(x),...,χ_c(x)] represents the map. By softening the hard decision rule so that one has a fuzzy partition of the observation space, the decision rule is interpreted to be a vector-valued map to a fuzzy unit vector (fit vector), which is one representation of the MF of a fuzzy set. Here, u(x) = [μ_1(x),...,μ_c(x)] is both the decision and the fit vector. Each element of the fit vector is the membership of the data point in the class denoted by the coordinate index. One class exists for each dimension of the vector. In the notation of fuzzy sets, the fit vector can be represented by μ(x) = μ_1(x)/1 + ··· + μ_c(x)/c, where each pair μ_i(x)/i is the membership of the point in class i: one class for each dimension of the vector. Because the observation space partition is fuzzy, the vector of classes is no longer a unit vector, but only a vector whose elements sum to 1. Using the same observation space of figure 1-3a, consider the fuzzy partition shown in figure 1-3b. Now the classes are not disjoint and overlap exists between the classes. The partition of the observation space is fuzzy and is defined by the fuzzy sets, one for each class. As noted, the fuzzy sets are the generalizations of the characteristic functions defining the classical
partition of figure 1-3a. In a fuzzy partition, a data point, represented by x in this figure, may support a decision with varying degrees in class 2 or class 3 or both. Thus, each possible decision has associated with it a membership value, which represents the truth that the data support that decision. In fact, the decision rule can be generalized further, so that the elements in the vector of classes do not have to sum to 1, but only have to be bounded above by 1. In summary, SPR produces a single unambiguous decision, whereas FPR produces an MF that evaluates, for each class, to the belief that the data belong to that class.
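A soft counterpart to the hard rule can be sketched with overlapping triangular membership functions whose values sum to one at every point of the universe, giving the fit vector described above; the centers and half-widths are invented for illustration.

```python
# Fuzzy partition of the interval [0, 4] by five overlapping triangular
# membership functions with peaks at 0, 1, 2, 3, 4.  At every point of
# the universe the memberships sum to 1 (the fuzzy-partition constraint).
# Centers and widths are illustrative only.
centers = [0.0, 1.0, 2.0, 3.0, 4.0]

def mu(i, x):
    """Triangular MF of class i: peak 1 at centers[i], half-width 1."""
    return max(0.0, 1.0 - abs(x - centers[i]))

def fit_vector(x):
    """Soft decision u(x) = [mu_1(x), ..., mu_c(x)], a fit vector."""
    return [mu(i, x) for i in range(len(centers))]
```

A point at x = 1.4 yields memberships of about 0.6 in class 2 and 0.4 in class 3, so, unlike the hard rule, the data softly support both adjacent classes.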
Figure 1-3b. Decision Rule in Fuzzy Pattern Recognition
1.3 EXTENSION OF THE CLASSIFICATION PROCESS

This report takes the above concepts one step further: the decision rule maps from the feature space to interval-valued sets, one set for each class. So each element of the class vector is no longer a real number between 0 and 1, but instead a closed subset of the interval [0,1]. Figure 1-4a illustrates this decision rule for the same observation space of figure 1-3a. Classes 2 and 3 have some measure of support, but now that support is represented by an interval. In general, all the classes can have interval-valued support, and the general form of the decision rule is

    μ(x) = χ_{A_1}(x)/1 + ··· + χ_{A_c}(x)/c,

where each A_i is a closed interval. Each interval represents the range of truth the data can support the decision for that class, where truth is measured on a scale from zero to one. The data are represented by fuzzy numbers illustrated by the shaded region in the observation space. This vector of intervals is described in the literature as a fuzzy set of type 2 and called an interval-valued fuzzy set (reference 4, p. 14). The interval-valued measure of truth is a generalization of FPR. Because the decision rule maps to a closed interval, which itself can be represented by a characteristic function or an MF, one can illustrate the decision rule as shown in figure 1-4b. Conceptually, no difference exists between figures 1-4a and 1-4b; the only change is the representation of the certainty intervals yielded by the decision rule. The decision system described in this report extracts features that are fuzzy sets, uses a collection of fuzzy rules to replace the decision rule of SPR, and uses interval-valued fuzzy sets to represent the output of the decision rule. Note that each interval in this figure may be the result of many rules. Each rule tries to assess the support a feature gives to some class. If an unambiguous decision is required, then further processing is needed.
The important point is that the entire pattern recognition process is extended to produce not just a single membership value for each class, but a collection of MFs represented by interval-valued fuzzy sets.
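In the spirit of the interval-valued certainties developed in section 3, one minimal way such a lower and upper bound of support might be computed for a single class uses the standard necessity/possibility formulas from possibility theory; the fuzzy datum and premise term below are invented for illustration and are not the report's term sets.

```python
# Sketch: an interval [Nec, Pos] of support for one class, computed from
# a fuzzy datum A and a fuzzy premise term B using the standard
# possibility-theory bounds
#   Pos(A, B) = sup_x min(mu_A(x), mu_B(x))
#   Nec(A, B) = inf_x max(mu_B(x), 1 - mu_A(x))
# evaluated on a discretized universe.  The membership functions are
# illustrative only.

def tri(center, width):
    """Triangular fuzzy number: peak 1 at `center`, support width 2*width."""
    return lambda x: max(0.0, 1.0 - abs(x - center) / width)

def support_interval(mu_a, mu_b, xs):
    pos = max(min(mu_a(x), mu_b(x)) for x in xs)        # possibility
    nec = min(max(mu_b(x), 1.0 - mu_a(x)) for x in xs)  # necessity
    return nec, pos

xs = [i / 100.0 for i in range(1001)]  # universe [0, 10], step 0.01
datum = tri(4.0, 1.0)                  # fuzzy number "about 4"
term = tri(4.5, 2.0)                   # hypothetical premise term
nec, pos = support_interval(datum, term, xs)  # nec <= pos always holds here
```

The resulting pair (nec, pos) is exactly the kind of closed subinterval of [0,1] that one coordinate of the class vector carries in figure 1-4a.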
Figure 1-4a. Decision Rule with Interval-Valued Certainty
Figure 1-4b. Decision Rule with Interval-Valued Memberships
In the following sections, fuzzy logic is applied to generate a decision rule of the form

    μ(x) = χ_{A_1}(x)/1 + ··· + χ_{A_c}(x)/c

and then applied to model assessment in contact tracking. In classical decision theory, the decision rule is derived by optimizing a functional such as the probability of error or another loss function. In this report, the decision rule is never explicitly stated because the certainty intervals are the result of a fuzzy rule base that models the governing expert decision system.

In section 2, fuzzy systems are discussed in more detail to explain how the fuzzy features are extracted from the data and processed by fuzzy rules to yield a decision that is interval valued as shown in figure 1-4. In section 3, interval-valued fuzzy logic is described and used to solve the pattern recognition problem. This discussion includes the determination of the satisfaction of a fuzzy premise by a fuzzy feature, the representation of the strength of the rule, the propagation of evidence through the rules, and the aggregation of support for a given conclusion. In section 4, an exploratory version of a fuzzy model assessment system is discussed. Included in this discussion is the construction of the term sets, the rule syntax and implementation, the object-oriented aspects of the program, and examples of the outputs. In the last section, the implications of this work are given and possible extensions suggested.
2. FUZZY DECISION SYSTEMS

Fuzzy decision systems use fuzzy logic to reason about decisions. These systems represent signals as fuzzy numbers or linguistic variables and employ fuzzy rules to process these signals. In this section, a cursory description of linguistic variables is presented and the basic concepts of fuzzy systems are described. A simple control example is given to illustrate how fuzzy systems process the input signals to yield a control signal. This example also gives the simplest illustration of a certainty measure associated with the conclusion. Fuzzy decision systems are compared with classical decision systems, and a physical interpretation of the system flow is given.

2.1 EMBEDDING FUZZY SYSTEMS

To understand fuzzy decision systems, one must first understand the signals that flow through them, i.e., the linguistic variables. Formally, a linguistic variable is defined (reference 5, p. 132) as a quintuple (x, T(x), U, G, M), where x is the name of the variable, T(x) is the term set of the variable x, U is the domain of definition of the variable x, G is a rule that names the terms, and M is the fuzzy set that is used to define each term. All this formality can be easily explained using an example. Consider the linguistic variable called COLOR, where the linguistic variable name is x = COLOR. Figure 2-1 illustrates this linguistic variable, where the term set T(x) = (RED, YELLOW, ORANGE, GREEN, BLUE, INDIGO, VIOLET). The rule G that generates the names of the terms is just the list of basic colors but, in general, G can be a complicated grammar. The universe of discourse U or domain of definition is the frequency of the visible spectrum. Finally, the meaning or semantic definition of the terms is given in this example by a fuzzy set, drawn as a bell-shaped curve around the central frequency associated with the color.
Figure 2-1. Linguistic Variable COLOR
The shapes of the linguistic terms are important and are usually defined mathematically by MFs such as μRED(x). Intuitively, this function μRED(x) is interpreted as the degree of membership of the frequency x in the set of RED colors, or how "red" the frequency x is. The range of the function μRED(x) is [0,1], where 0 means no membership in the set and 1 means total membership in the set of red colors. Partial membership is best illustrated by the example of a sunset, which is reddish-orange, having membership in both the RED and ORANGE terms. In figure 2-1, this frequency is labeled reddish-orange. Another example in this figure is the frequency labeled chartreuse, a frequency half-way between GREEN and BLUE. The terms of the linguistic variables can divide the space producing a fuzzy partition, i.e.,

∀x ∈ X, there is at least one term t ∈ T(x) such that μt(x) > 0, and the sum over t ∈ T(x) of μt(x) = 1,
so that at every frequency at least one MF is nonzero and the sum of the MFs at that value of frequency is 1. Note that the terms of the linguistic variable COLOR do not form a fuzzy partition because, by inspection, at the frequency located between BLUE and GREEN, the sum of the membership values is less than one. Fuzzy partitions must then have special term sets. Some of the concepts of a fuzzy system for model assessment are best described by using linguistic variables. Bezdek outlines these concepts best in his introductory article in the first issue of the IEEE Transactions on Fuzzy Systems (reference 6). Figure 2-2 illustrates the fuzzy system model used in this report and reference 6. In this figure, the data are first fuzzified, i.e., transformed or mapped into linguistic variables. The fuzzy system (FS) processes these linguistic variables by using rule-based methods and obtains one or more conclusions. The processing is carried out by the fuzzy inference engine using the rules stored in the fuzzy rule base, where application-specific knowledge is contained in the data term sets. So the processing solves the problem. However, the rules yield answers that are in terms of fuzzy sets or in terms of conclusions with associated certainties. To apply the answers, one must be able to project back into the control or decision space. In a control problem, this space is a control value or vector. In a decision problem, this space is a finite set of decisions. The process of projection is called defuzzification. The four-step process is summarized by Bezdek as "fuzzify, solve, defuzzify, control" (reference 6, p. 3).
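The fuzzy-partition condition stated above can be checked numerically. The following sketch (not from the report) uses triangular term sets whose feet sit on the neighboring centers, a common construction that sums to 1 at every interior point; the term names and center frequencies are illustrative assumptions.

```python
# Sketch: triangular membership functions with 50% overlap form a fuzzy
# partition -- at every interior point the memberships sum to 1.
# Term names and center "frequencies" are invented for illustration.

def triangle(x, left, center, right):
    """Triangular membership function, zero outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Evenly spaced centers; each term's feet sit on its neighbors' centers.
centers = [400.0, 480.0, 560.0, 640.0]   # notional frequencies
terms = {f"TERM{i}": c for i, c in enumerate(centers)}

def memberships(x):
    step = centers[1] - centers[0]
    return {name: triangle(x, c - step, c, c + step)
            for name, c in terms.items()}

def is_fuzzy_partition(xs, tol=1e-9):
    """At least one nonzero term everywhere, and memberships sum to 1."""
    for x in xs:
        mu = memberships(x)
        if max(mu.values()) <= 0.0:
            return False
        if abs(sum(mu.values()) - 1.0) > tol:
            return False
    return True

# Check only the interior, where the partition condition is meant to hold.
interior = [400 + k * 2.4 for k in range(1, 100)]
print(is_fuzzy_partition(interior))  # True for half-overlap triangles
```

A term set with gaps between adjacent MFs, like the COLOR example in figure 2-1, would fail this check because the summed membership dips below 1 between terms.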
Figure 2-2. Fuzzy System Model

This solution process is similar to using Fourier transforms to solve a linear system. Here one transforms the input into the complex frequency domain and multiplies the input transform by the system transform, thereby solving for the output in the transformed space. However, to know what this output is, the inverse Fourier transform must be applied to return to the time domain:
transform, solve, and take the inverse transformation. Using transforms is similar to, but not the same as, the four-step process of fuzzy systems. Fuzzy systems are similar in that one transforms to another representation to make the problem easier, and then converts back from that representation. However, the Fourier transform is invertible in some sense because the inversion from the transformed space totally recovers the signal for a given class of signals. The linguistic variables used in the fuzzy system are not just another representation of the input. They are, in a sense, an embedding of the input into the space of linguistic variables. This process can represent numerical vectors or functions without loss of fidelity. Since linguistic variables can represent data exactly, one could think of these as embedding the input into this larger space. However, this is not usually the case. Once the data have been fuzzified, i.e., mapped into linguistic terms, usually some information has been lost, so the input cannot be retrieved. The important system information has been extracted from the input, and this information is then used to solve the problem. The model trades representational complexity for solution tractability and robustness. Embedding as a solution technique has long been used by mathematicians. An example is the evaluation of real integrals by complex analysis. Here one embeds the integrand in the complex number system, and the path of integration within the complex plane. The path of integration includes the real number line, so that one can apply the residue theorem in evaluating the integral on a closed path, thus obtaining the real integral as part of the entire solution. Here
the portion of the integral along the real number line represents the solution and the process of projecting back onto the real number line becomes a triviality. The similarity is more evident when the process is described as complexify, solve, and decomplexify (reference 6).
For the system to solve the problem, linguistic variables are used to form fuzzy rules. The rules are mappings from the transformed input space of linguistic variables to the solution space of linguistic variables. Multiple levels of rules can be thought of as a composition of mappings. Unfortunately, these mappings are multivalued. So for control problems, the rules can produce ambiguous results. In the case of decision problems, the rules can produce a set of possible decisions. In binary logic, either a rule fires or it does not, depending on whether the premise is satisfied. But in fuzzy logic, all the fuzzy rules fire with varying levels of satisfaction of the premise and with varying levels of confidence in the implication, leading to conclusions with varying levels of belief. These levels of satisfaction, confidence, and belief are termed certainties, and the passage of confidence through the rules is called propagation of evidence or propagation of certainty through the rules. With multiple answers at the output of the system, the control designer has to take the multitude of answers and disambiguate or interpolate between them to obtain a single value. This process in FSs is called defuzzification, which is analogous to the projection back into the solution space when solving by embedding. In decision systems, multiple answers mean the designer must select a subset of the solutions and use these to further the decision process in some manner. Methods for handling ambiguity in the decision process are an active area of research. The philosophy of a fuzzy system bears a resemblance to the linear systems concept of resolution of signals into sinusoids or impulse functions. In fuzzy systems, the representation or signal space is now linguistic, which resolves the signal into membership in the term sets of the linguistic variable. The solution technique uses fuzzy rules applied to the input term sets and yields an output fuzzy set or a set of decisions.
In linear systems, the signal resolution is combined at the end of the analysis using superposition. In fuzzy systems, the output linguistic terms are aggregated in a non-linear fashion. The defuzzification of the output term often requires a functional to map the output to the solution space. The strength of fuzzy systems is that in some circumstances, fuzzy rules can model nonlinear systems more easily than more
mathematically precise methods. The rules can be heuristic in nature and derived not just from
the physical model of the system, but from the experts' knowledge of the behavior of the system learned over time. The learning need not be from just human experts, but can also be derived from learning methods such as neural networks (NNs). Fuzzy systems are applicable to model assessment because humans can take the sensor readings and, using heuristics, determine what type of propagation path or model is now valid. Heuristics can be implemented as fuzzy rules, and it is hoped that these rules will be as efficient and as accurate as human observers. The process is one of pattern recognition, i.e., given a set of tracking residuals and knowing the current track, determine the new propagation model. Although this can be modeled as a statistical pattern recognition problem, it becomes a hard decision problem, i.e., only one decision is accepted. However, this really does not make sense from an operational point of view. If the data do not support a clear and obvious model for the propagation, but instead a set of possible models, then all of these models should be pursued in parallel until further evidence resolves the ambiguity. Clearly, it is more efficient to resolve this ambiguity as soon as possible, and this is part of the reason for doing the work. However, if one discards the correct model by using a hard classifier, then the contact can be lost if the correct model is not used in the tracking process. The approach is to make a soft decision, clearly identifying the most promising model candidates with the hope that the true model is one of the model candidates.

2.2 FUZZY CONTROL EXAMPLE

The four-step solution sequence is "...fuzzify, solve, defuzzify, control" (reference 6). Fuzzification is based on the definition of the terms in the linguistic variables. For any data value or input x, the nonzero terms t that have support, i.e., {t | t ∈ T(x), μt(x) > 0}, are used to code or to fuzzify the data into the terms of the linguistic variable.
Lotfi Zadeh, the father of fuzzy sets and systems, calls this "fuzzy quantification." So in the example of the sunset, the terms supported by the reddish-orange hue are {RED, ORANGE}. For a sensor measuring some quantity such as speed, the terms may be {ZERO, POSITIVE SMALL}. It is important to remember that associated with each value x and with each applicable term t, a membership value exists describing the degree of membership in this term. This value is very useful, as illustrated in the following control example. A taxi driver who is approaching a red light must brake to come to a safe stop. The control variable is the rate of braking and the two inputs are the speed of the car and the distance to the stop light. In this simple example there are only two variables and two fuzzy rules used to determine the braking rate. The two rules have the following form:

RULE 1: IF the speed of the car is HIGH, AND the distance to the stop light is NEAR, THEN the braking should be HARD.

RULE 2: IF the speed of the car is MEDIUM, AND the distance to the stop light is MEDIUM, THEN the braking should be MEDIUM.

The term set for speed is {SMALL, MEDIUM, HIGH}, the term set for the distance is {NEAR, MEDIUM, FAR}, and the term set for the braking is {LIGHT, MEDIUM, HARD, PANIC}. Both rules are illustrated in figure 2-3: the first rule is on top and the second rule on
the bottom. The terms are defined by triangular fuzzy sets that cover the three universes of discourse, i.e., speed, distance, and braking rate. The important point is that the fuzzy rules are approximations to an exact analytical description of the system. In this figure, the input speed x has membership in both terms {MEDIUM, HIGH} of the linguistic variable speed. The degree of membership is evaluated using these term sets. Similarly, the distance y to the red light has membership in {NEAR, MEDIUM}. The minimum value of the memberships of these two inputs is used to determine the "strength" of the conclusion, which is a term of the linguistic variable called braking rate.
Figure 2-3. Typical Output Strength Calculation in Fuzzy Control Logic

The two braking rules yield different braking rate terms, which are aggregated to yield a single output braking rate. The aggregation procedure is called "defuzzification" and is a simple averaging of the areas of the rule output strength fuzzy sets. Again, the relative output strength of each rule is determined by the minimum of the memberships of the inputs in each of the input premises. In effect, the strength of the output for a rule is determined by the degree of satisfaction of the premise clauses. Premise satisfaction or certainty and its propagation through the rules are an inherent part of the solution technique. In fact, data are evidence only if they satisfy the premise as measured by the certainty. Propagation of the certainty through the ply to determine the certainty of the conclusion is the propagation of evidence through the fuzzy rule. The single
value obtained in the solution space by defuzzification is the control part of Bezdek's "...fuzzify, solve, defuzzify, control."
Note in this example, the certainty of the premise or its validity or its degree of satisfaction is a single-valued real number. The strength of the premise is equivalent to the
notion of the MF value or validity. However, certainty can be represented in other forms besides
a single-valued real number between zero and one, such as interval valued and functional valued. In this report, the interval-valued version of the certainty is used, which is more general than the single-valued certainty illustrated in the control example. However, Bezdek's model still
holds. The main change is in the certainty representation, which will be discussed in detail in section 3.

2.3 SOFT DECISIONS VERSUS HARD DECISIONS

In section 1, it was pointed out that a hard decision produces a single cause or conclusion and a soft decision produces either a list of possible conclusions or a list of conclusions with a relative ranking. The Dempster-Shafer theory of evidential reasoning is an example of soft decision. The support for the soft decision is on the set of all subsets of the possible conclusions. So if the possible conclusions are {h0, pp, bf, h2, nl}, then decisions of the form "bf-or-h2" are allowable decisions. SPR produces a hard decision since one is essentially doing an N-class test of hypothesis where the decision is one of N disjoint conclusions. SPR produces a single conclusion after the test is made. FPR can do the same thing if the decision process is based on the Bayes classifier as described in reference 7. Here, fuzzy clustering is used to replace the standard statistical estimation of centroids and dispersion, and then these estimates are used in a maximum a posteriori decision rule. The answer is still "hard," only one decision value. Assume, as discussed in section 1, that one has a vector u(x) = [u1, ..., uc]; then a "hard" decision is represented as a unit vector where only one component has a value of one and all others have a value of zero. A data vector is represented by a boldfaced letter, e.g., xk. With unsupervised clustering, one learns about the structure of the data without the aid of labeled data or, as is said in some cases, without a teacher. As the data are clustered, each data point xk has associated with it a fit vector u(xk) = [μ1(xk), ..., μc(xk)], representing the memberships of xk in each of the c classes.
This vector can be thought of as a fuzzy unit vector or "fit" vector, as Kosko calls it (reference 8), or as a fuzzy set, represented by μ(xk) = μ1(xk)/1 + ... + μc(xk)/c, which is the standard fuzzy set notation. The only requirement placed on the components of this vector is that they sum to 1, i.e.,

sum over i = 1, ..., c of μi(xk) = 1.

As shown in figure 2-4, this means that for three dimensions, the vector is constrained to lie in the plane illustrated by the shaded part. For hard classifications, the constraint is even more severe; the membership vector or fit vector is required to be one of the unit vectors, also illustrated in figure 2-4. An application is the fuzzy k-nearest neighbor algorithm. One finds the
k-nearest neighbors, and then averages their membership vectors. The membership vector indicates to what degree the sample vector xk belongs to each class. If one must make a hard decision, then the class with the maximum component will provide the hard decision. Ties can be broken randomly or arbitrarily.
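The fuzzy k-nearest-neighbor rule just described can be sketched directly: average the fit vectors of the k nearest labeled samples, then (only if a hard decision is forced) take the class with the largest component. The sample data and class labels below are invented for illustration.

```python
# Sketch of the fuzzy k-nearest-neighbor rule: average the fit vectors
# of the k nearest labeled samples. Data and classes are invented.

import math

def fuzzy_knn(x, samples, k=3):
    """samples: list of (feature_vector, fit_vector) pairs.
    Returns the averaged membership (fit) vector of the k nearest."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], x))[:k]
    c = len(nearest[0][1])
    return [sum(s[1][i] for s in nearest) / k for i in range(c)]

def harden(u):
    """Hard decision: index of the maximum membership (ties arbitrary)."""
    return max(range(len(u)), key=lambda i: u[i])

# Toy two-feature samples with fit vectors over three classes.
samples = [
    ((0.0, 0.0), [0.9, 0.1, 0.0]),
    ((0.1, 0.2), [0.8, 0.2, 0.0]),
    ((1.0, 1.0), [0.1, 0.7, 0.2]),
    ((1.2, 0.9), [0.0, 0.8, 0.2]),
]

u = fuzzy_knn((0.05, 0.1), samples, k=2)
print(u, harden(u))  # soft fit vector favors class 0
```

The averaged vector is itself a valid fit vector (its components still sum to 1), so the soft decision can be carried forward unchanged; hardening is deferred until a single answer is actually required.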
Figure 2-4. Hard Decisions vs. Soft Decisions
The type of classification or pattern recognition occurring in model assessment is even
more general than the fuzzy nearest neighbor. Associated with each class is a closed interval, which represents the lower and upper bounds on the membership in that class. So, the membership vector has now been replaced with what is called an interval-valued fuzzy set. Klir defines the interval-valued fuzzy sets as having "membership functions of the form μA: X → P([0,1]), where μA(x) is a closed interval in [0,1] for each x" (reference 4, p. 14). Here X, the universe of discourse of the fuzzy set, is the set of possible classes, and the interval associated with each class is the closed interval representing the lower and upper bound associated with the certainty of membership of each class. This situation is illustrated in figure 2-5 for a five-class problem. The fuzzy set is given by

μ(xk) = A1(xk)/1 + ... + Ac(xk)/c,

where instead of the single-valued membership functions of figure 1-3a, one has interval-valued membership functions as illustrated in figure 1-4. In this fuzzy set, the integers {1, ..., c}, c = 5, are the decisions and the universe of discourse. The support for each decision is a set represented by the indicator function fA(x), which is 1 if x ∈ A and 0 elsewhere. For the interval-valued fuzzy sets, the set A has the special form A = [a1, a2], where a1 is the lower bound of the support and a2 is the upper bound of the support. As in FPR, there is no clear-cut winner since each of the possible conclusions can have support. The process of aggregating this support will be discussed later.
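An interval-valued decision like the five-class case above can be represented as a mapping from class to [lower, upper] certainty bounds. The sketch below is illustrative only: the intervals, the competitiveness threshold, and the lower-bound hardening heuristic are all assumptions, not the report's aggregation method.

```python
# Sketch: an interval-valued fuzzy set over the five model classes,
# stored as class -> [lower, upper] certainty bounds. The intervals
# and the threshold are invented for illustration.

decision = {
    "h0": [0.0, 0.2],
    "pp": [0.1, 0.4],
    "bf": [0.6, 0.9],
    "h2": [0.5, 0.8],
    "nl": [0.0, 0.1],
}

def soft_decision(iv_set, threshold=0.5):
    """Keep every class whose upper certainty bound reaches threshold."""
    return sorted(c for c, (lo, hi) in iv_set.items() if hi >= threshold)

def hard_decision(iv_set):
    """Force a single class by the largest lower bound (one heuristic)."""
    return max(iv_set, key=lambda c: iv_set[c][0])

print(soft_decision(decision))  # ['bf', 'h2'] -- both models kept in play
print(hard_decision(decision))  # 'bf'
```

The soft decision keeps both competitive models alive for parallel tracking, which is exactly the operational motivation given in section 2.1 for avoiding a premature hard classification.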
Figure 2-5. Interval-Valued Fuzzy Sets
2.4 CLUSTER INTERPRETATION OF RULES

Supervised clustering occurs when data samples that lead to a known decision are clustered. In effect, one is approximating the decision function by determining the pre-image of a known conclusion. The decision function then can be specified by δ(xk) = i if and only if xk ∈ Xi, where Xi is the i-th cluster. Essentially, these clusters are fuzzy rules mapping portions of the input space to integers in the output space. The integers are the classes that are referred to as models, such as bf, h2, pp, h0, and nl, as well as the index into the vector of classes. Kosko points out that "The key idea is cluster equals rule" (reference 8, p. 330). Although Kosko was referring to a very specific case of a binary input/output fuzzy associative matrix (BIOFAM), the general idea holds here as well. Figure 2-6 illustrates the formation of clusters or rules for model assessment. In this figure, the feature extraction from the data does the same thing that a Hough transform would, i.e., maps lines to points. Here, a line is represented by a two-dimensional point (intercept, slope). The slope is called the drift and the intercept is called the jump because of their historical reference to the bearing tracks. The feature space in figure 2-6 is a four-dimensional space, and the clusters are mapped to the decision called h2. The point is that this cluster represents a region of the feature space that maps to an integer in the decision space. Intuitively, these mappings are the fuzzy rules. In this report, features are extracted from the data to drive the decision system and the features are then modeled as fuzzy sets. Using the variance of each feature, a normal density is fitted to the feature. When renormalized, this density is interpreted as a fuzzy number that is treated as input data. It is this interpretation that allows the decision process to be modeled totally in terms of fuzzy constructs.
In fact, the entire system is a fuzzy expert system where the data objects are fuzzy sets and the inference engine uses fuzzy logic. The fuzzy rules are viewed as composite mappings from the input space to the decision space. The steps are summarized as follows:
Figure 2-6. Formation of Clusters in Model Assessment
1. The data are fitted using linear regression techniques.

2. The features are extracted from the linear fit and their densities are re-normalized to construct fuzzy numbers. These numbers now form the input data to the fuzzy system whose general structure was illustrated in figure 2-2.

3. The clauses of the fuzzy rules and the fuzzy partition of the feature space are then used to construct closed certainty intervals associated with each clause. These intervals are, in turn, combined to form a premise certainty interval.

4. The premise certainty interval is mapped to the output decision space using the strength of the rule itself.

Figure 2-7 illustrates this mapping procedure. Note that the clusters illustrated in figure 2-7 are two-dimensional, but this is not how they are presently implemented. Instead, they are implemented as two one-dimensional clusters. This figure is for conceptual purposes only. As
mentioned in section 2.2, the decisions are described in terms of interval-valued fuzzy sets. So
conceptually, the fuzzy rule is a mapping from regions in the input space to the output space, but in this case, because of the form of the input data and the output decision, the mapping is harder to visualize. But conceptually, Kosko's claim remains, "...cluster equals rule."
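Step 1 of the summary above, and the jump/drift features of section 2.4, can be sketched with an ordinary least-squares line fit: the intercept of the fitted line is the jump and the slope is the drift. The synthetic residuals below are illustrative; the report's actual regression windowing is not specified here.

```python
# Sketch of step 1: fit a line to a window of bearing residuals; the
# intercept is the "jump" feature and the slope the "drift" feature.
# The residual data here are synthetic and noiseless.

def least_squares_line(ts, ys):
    """Ordinary least-squares fit y = jump + drift * t."""
    n = len(ts)
    tbar = sum(ts) / n
    ybar = sum(ys) / n
    sxx = sum((t - tbar) ** 2 for t in ts)
    sxy = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
    drift = sxy / sxx
    jump = ybar - drift * tbar
    return jump, drift

# Synthetic residual track: a 2-degree jump with a 0.5 deg/min drift.
ts = [float(t) for t in range(10)]
ys = [2.0 + 0.5 * t for t in ts]
jump, drift = least_squares_line(ts, ys)
print(jump, drift)  # 2.0 0.5
```

On noisy residuals, the scatter of the fit supplies the feature variance used in step 2 to renormalize a normal density into a fuzzy number.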
Figure 2-7. Fuzzy Rule Maps to an Interval-Valued Fuzzy Set
3. FUZZY RULES IN CLASSIFICATION

3.1 FUZZY RULES

In section 2, fuzzy rules were described as mappings from a cluster or region of the feature space into the decision space. Propagation of certainty through fuzzy rules was also discussed and illustrated in a simple control example. The main components of Bezdek's description of a fuzzy system were illustrated in this example, namely, "...fuzzify, solve, defuzzify and control." In the taxi driver example of figure 2-3, the certainty was represented as a real number in the interval [0,1]. In this section, the model is extended to include a more sophisticated method of certainty representation, an interval-valued representation. For the remainder of this report, only the interval-valued certainty representation is discussed. Intuitively, this interval can be thought of as lower and upper bounds on the satisfaction of an event, premise, conclusion, or fact.

3.1.1 Interval-Valued Certainties
First consider how interval-valued certainty arises from the interaction of the fuzzy data
and the fuzzy rule. When the data are themselves uncertain, the input is modeled as a fuzzy number, which is a special type of fuzzy set. A fuzzy set is represented by an MF, which is a generalization of the characteristic function used to express sets in real analysis. So it is a mapping from the domain of definition X to the interval [0,1], denoted μA(x): X → [0,1]. A fuzzy number is a convex normalized fuzzy set (reference 4, p. 17), normalized meaning the MF has a maximum of 1, or max over x of μA(x) = 1. Intuitively, the convexity of a fuzzy set implies that it looks like a bell-shaped or single-mode curve. Formally, convexity of fuzzy sets is defined in terms of α-cuts of the MF, which are crisp sets representing the region of X with membership greater than or equal to α, i.e., the crisp set defined by Aα = {x | x ∈ X, μA(x) ≥ α}. A fuzzy set A is convex if and only if all of its α-cuts are convex. In figure 3-1, both the data and the property are convex sets, and this makes the derivation of the certainty interval easier.
Figure 3-1. Fuzzy Data Fitted to a Fuzzy Premise

Certainty intervals are derived in CMMA from the fact that the input data are fuzzy numbers, and the properties defining the clauses are represented as fuzzy sets. Again, in figure 3-2 one sees the property as μB(x) and the data as μA(x), except here only one term set is shown. Conceptually, the upper bound of the certainty interval represents the amount of overlap of the fuzzy data A and the fuzzy property B. In possibility theory, the standard intersection of two fuzzy sets is given by the minimum function, i.e.,
μ(A∩B)(x) = min[μA(x), μB(x)], x ∈ X,

and the maximum of μ(A∩B)(x) represents the largest degree of overlap. This minimum of two fuzzy sets is the counterpart of the logical AND. The standard union of two fuzzy sets is given by

μ(A∪B)(x) = max[μA(x), μB(x)], x ∈ X.

The union is the counterpart of the logical OR, and complementation is the counterpart of the logical NOT. Complementation is represented as μA'(x) = 1 - μA(x). Both of these concepts are needed to construct the possibility and the necessity. The lower bound of the certainty interval is represented by the measure of subsethood of fuzzy set A in fuzzy set B, or the degree that A is a subset of B. Mathematically, the upper bound is defined as the possibility:

Π = sup over x ∈ X of min[μA(x), μB(x)].

The lower bound is defined as the necessity:

N = inf over x ∈ X of max[1 - μA(x), μB(x)].
The graphical construction for both of these measures is illustrated in figure 3-2.
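The two bounds can also be evaluated numerically on a grid, which mirrors the graphical construction. The triangular shapes chosen for the datum A and the property B below are illustrative assumptions.

```python
# Numerical sketch of the two bounds: possibility = sup_x min(mu_A, mu_B)
# and necessity = inf_x max(1 - mu_A, mu_B), evaluated on a grid.
# The triangular shapes of A and B are illustrative.

def tri(x, a, b, c):
    """Triangular MF with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

mu_A = lambda x: tri(x, 2.0, 3.0, 4.0)   # fuzzy datum A
mu_B = lambda x: tri(x, 2.5, 4.0, 5.5)   # fuzzy property B

xs = [k * 0.01 for k in range(601)]      # grid over [0, 6]
possibility = max(min(mu_A(x), mu_B(x)) for x in xs)
necessity   = min(max(1.0 - mu_A(x), mu_B(x)) for x in xs)

print(round(possibility, 3), round(necessity, 3))  # 0.6 0.2
assert 0.0 <= necessity <= possibility <= 1.0
```

The resulting certainty interval [N, Π] = [0.2, 0.6] says the datum partially overlaps the property (possibility 0.6) but is far from being contained in it (necessity 0.2).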
Figure 3-2. Calculation of the Necessity and Possibility of "A is B"

In this example, the data are used directly to calculate the bounds. In this model assessment system, the fuzzy data are first fitted by a piecewise continuous approximation and then the necessity and the possibility are constructed. This approximation increases the speed of the system by reducing the time complexity of the problem. For example, in figure 3-3a the fuzzy input is first approximated with a trapezoid, and then the possibility calculation is based on the trapezoid. The necessity is illustrated in figure 3-3b. The certainty interval that results is given by [N, Π].
Figure 3-3a. Calculation of the Possibility from the Data for the Moderate Term
Figure 3-3b. Calculation of the Necessity from the Data for the Moderate Term

Note that when the data are singletons, then Π = N = μB(x), and the interval reduces to
the single-valued representation used in the control problem example of the taxi driver. In fact, the control problem can be considered a special case of the decision problem where there is only one class; instead of estimating the certainty that each class has occurred, one uses the fuzzy rules to estimate the control parameter itself. The next section discusses the propagation of evidence through fuzzy rules when the certainty representation is interval valued.

3.1.2 Propagation of Evidence Through Fuzzy Rules
In the taxi driver control example of figure 2-3, the propagation of evidence through the fuzzy rules was described as using the minimum of the MF values for the two inputs. More specifically, the two input variables, speed and distance, were the arguments to the term sets known as HIGH and NEAR. The term sets were triangular fuzzy sets and, when evaluated at these inputs, the minimum membership value was used to truncate the output fuzzy set for the braking control called HARD. See the first rule in figure 2-3. In particular, the output is the rate of braking and is given by

μ_braking(z) = min(μ_HARD(z), min[μ_HIGH(x), μ_NEAR(y)]),

so the certainty of the output can be thought of as min[μ_HIGH(x), μ_NEAR(y)]. This concept is generalized for the interval-valued certainty measures.
In this report, the fuzzy model assessment uses interval-valued measures of certainty, and the propagation of certainty is based upon t-norms, s-norms, and the detachment operator. These operations are generalizations of the logical operations of conjunction, disjunction, and implication, respectively, and have the same look and feel as the operations used to propagate evidence in the control problem. The phrases "propagation of certainty" and "propagation of evidence" are used interchangeably. In section 3.1, the certainty of the premise was derived using the necessity and the possibility. When the premise has multiple clauses, the clause certainties are aggregated to yield the premise certainty. Intuitively, aggregation means the combination of
certainties, and the mathematical definitions are deferred until section 3.3. In this report, conjunctions are allowed in the premise clauses, but premise disjunctions must be expanded into distinct fuzzy rules. For a more complete description of this approach, Bonissone's papers (references 9-12) are recommended to the reader. For a complete discussion of the representations of certainty, refer to reference 13. To aggregate the premise certainty, the t-norm is used. Since the triangular norm or t-norm behaves like a logical conjunction, the strength of the premise is based on the weakest clause. The t-norm is a binary operator denoted by T(x,y) where x,y ∈ [0,1], and the minimum function is a well-known example. The t-norm is associative, T(x,T(y,z)) = T(T(x,y),z), commutative, T(x,y) = T(y,x), and monotonically nondecreasing in both arguments:
T(x,y) ≤ T(v,w) if x ≤ v and y ≤ w. The t-norm satisfies the boundary conditions T(0,0) = 0 and T(x,1) = T(1,x) = x, which also satisfy the definition of the logical conjunction. To apply the t-norm to the interval-valued certainty, assume that the premise is the conjunction of n fuzzy clauses C1, ..., Cn, where each clause Ci has interval-valued certainty [ai, Ai]. Then the premise interval certainty is given by

[b,B] = [T(a1, a2, ..., an), T(A1, A2, ..., An)].
For the special case when T(x,y) = min(x,y), the premise certainty is given by the minimum of the certainty minimums and the minimum of the certainty maximums. In general, the t-norm can be designed to reflect the data association or correlation of the premise clauses. For positively correlated clauses, the norm T(x,y) = min(x,y) is a good choice; for uncorrelated clauses, the norm T(x,y) = xy may be a good choice; and for negatively correlated clauses, the bold intersection T(x,y) = max(0, x + y - 1) may best capture the association (reference 10). Although the work so far has used only T(x,y) = min(x,y), the mechanism to select the appropriate t-norm for each rule is clearly in place. The s-norm or t-conorm is a generalization of the logical disjunction or logical OR operator. The s-norm S(x,y) with x,y ∈ [0,1] has properties similar to those of the t-norm. These properties include associativity, S(x,S(y,z)) = S(S(x,y),z), commutativity, S(x,y) = S(y,x), and monotonicity in both arguments, S(x,y) ≤ S(v,w) if x ≤ v and y ≤ w. The boundary conditions resemble the logical OR in that S(x,1) = S(1,x) = 1 and S(x,0) = S(0,x) = x. The s-norm used in this report is the maximum function S(x,y) = max(x,y). The t-norm and s-norm are related by a generalized version of DeMorgan's law provided one uses a hard complement, i.e., N(x) = 1 - x. DeMorgan's laws then become S(x,y) = N(T(N(x),N(y))) and T(x,y) = N(S(N(x),N(y))). The detachment operator is the mathematical mechanism needed to propagate the premise certainty through the implication operator to the conclusion. The detachment operator is defined in terms of the s-norm and t-norm operators and provides the mechanism for the propagation of evidence, not only for the interval-valued certainty representation, but also for the single-valued certainty representation. In the taxi driver control problem, the output fuzzy set of figure 2-3 can be written as μ_braking(z) = min(μ_HARD(z), min[μ_HIGH(x), μ_NEAR(y)]).
This output is derived from the following rule: if the speed x is HIGH and the stop light is NEAR, then brake HARD. Yet there is nothing in this conclusion that includes the confidence in the rule itself. In general, if one has a rule P → Q, then if v(P) denotes the certainty or validity of the premise and
v(P → Q) denotes the strength or validity of the forward implication operator, then the detachment operator must relate these two quantities to the conclusion validity v(Q). More precisely, the binary detachment operator m(v(P), v(P → Q)) is defined so that m is as large as possible while still satisfying the constraint m(v(P), v(P → Q)) ≤ v(Q) (reference 14, p. 167). That is, it gives the strongest conclusion one can infer without overstating the truth of the conclusion. (The validity or certainty can be thought of as the truth as well.) If one assumes that v(P → Q) = 1, then the control example is also an application of a detachment operator. The detachment operator m can be taken to be a t-norm (reference 14); in particular, the minimum function min(v(P), v(P → Q)) is the single-valued certainty that is often used by fuzzy expert system shells like the fuzzy logic official production system (FLOPS) (reference 15).

Just as the detachment operator can propagate evidence through fuzzy rules for single-valued certainty measures, it can also propagate evidence using interval-valued certainty measures. The detachment operator discussed above was a function of two arguments, since the validity of both the premise and the rule were real numbers in the interval [0,1]. Now the premise certainty is given by the interval [b,B], and the conclusion certainty is represented by an interval as well. Moreover, it is not clear how to represent the certainty of the implication in this description. Bonissone (reference 10) uses the "necessity" and "sufficiency" of the implication operator to construct the certainty of the conclusion. The necessity is given by n = v(Q → P) and the sufficiency is given by s = v(P → Q). (Note this is not the same necessity used to measure the subsethood of one fuzzy set in another; instead, it is the classical mathematical definition of the necessary part of the if-and-only-if logical implication.)
The sufficiency s is the strength of the implication in the forward direction, and the necessity n is the strength of the implication in the backward direction. The detachment operator is then a vector-valued function

    m([b,B],[s,n]) = [T(s,b), S(B,1-n)],

where T is a t-norm and S is its corresponding s-norm. For example, if T = min, then S = max and

    m([b,B],[s,n]) = [min(s,b), max(B,1-n)],
which is the detachment operator used in the examples of section 4. One can get a feel for this operator by considering the following four special cases, where the strengths of the forward and backward implication operators are set at their limiting values:

1. [s,n] = [0,0] ⇒ m = [0,1]. Here one cannot rely on the rule in either direction, so the inference mechanism has no strength. The certainty interval is appropriately given by [0,1], since the lower bound on the certainty is zero and the upper bound is 1. This simply says that nothing is known about the certainty of the conclusion.

2. [s,n] = [1,1] ⇒ m = [b,B]. Here the implication operator is totally reliable in both the forward and the reverse direction. The strength of the conclusion is then bounded above and below by the corresponding bounds of the premise, simply because the implication operator has infinite fidelity.

3. [s,n] = [1,0] ⇒ m = [b,1]. Here the implication operator in the forward direction is absolutely reliable, but the reverse implication operator is absolutely unreliable. The conclusion certainty interval is lower bounded by the lower bound of the premise validity, but the upper bound of the conclusion validity is 1, since no reliability can be placed on the reverse implication operator. Without the reverse implication, nothing can be inferred about the upper bound except that it is 1.
4. [s,n] = [0,1] ⇒ m = [0,B]. Here the forward implication operator is useless, and so no lower bound can be established on the conclusion validity. However, the strength of the reverse implication operator allows an upper bound on the conclusion validity. One expects this in the case of modus tollens or backward chaining.

Note that the premise certainty interval, in conjunction with the forward ply, determines the strength of the lower bound and, in conjunction with the reverse ply, determines the strength of the upper bound. However, the interval interpretation is not the only one that can be placed upon the conclusion validity. The truth of the conclusion can also be thought of as a linguistic variable. Figure 3-4 shows the definition of the linguistic variable TRUTH. In this figure, the semantic interpretation of the linguistic term "true" appears as a ramp function, and "absolutely true" appears as a Kronecker delta function located at a truth value of 1. Note that the term "undecided" is a constant function shaped like a uniform distribution function, much like case 1 in the list above. In fact, the certainty intervals can be thought of as fuzzy sets (as illustrated in figure 1-4b) and thus seen to be related to the terms of the linguistic variable TRUTH (reference 5). Redefining TRUTH as in figure 3-5, the certainty intervals can be interpreted as approximations to the terms of TRUTH (reference 13). Thus the interval-valued certainties approximate the functional-valued linguistic terms of the linguistic variable TRUTH.
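The interval detachment operator and its four limiting cases can be sketched in a few lines. This is an illustrative Python rendering (the report's own implementation is in Common LISP), and the function name is invented:

```python
def detach_interval(premise, ply, t_norm=min, s_norm=max):
    """Interval-valued detachment m([b,B],[s,n]) = [T(s,b), S(B, 1-n)]
    (Bonissone, reference 10).  premise is the certainty interval [b, B];
    ply holds the forward (s) and backward (n) implication strengths."""
    (b, B), (s, n) = premise, ply
    return [t_norm(s, b), s_norm(B, 1.0 - n)]

# The four limiting cases from the text, for a premise interval [0.6, 0.9]:
for ply in ([0, 0], [1, 1], [1, 0], [0, 1]):
    print(ply, detach_interval([0.6, 0.9], ply))
# [0,0] -> [0, 1]: nothing known; [1,1] -> [0.6, 0.9]: premise passes through;
# [1,0] -> [0.6, 1]: lower bound only; [0,1] -> [0, 0.9]: upper bound only.
```

With T = min and S = max, this reproduces the operator used in the examples of section 4.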
[Figure: membership functions for the terms FALSE, FAIRLY FALSE, UNDECIDED, FAIRLY TRUE, TRUE, and VERY TRUE plotted against truth value.]

Figure 3-4. Linguistic Variable TRUTH
[Figure: alternate membership functions for the terms FALSE, UNDECIDED, TRUE, and VERY TRUE plotted against truth value.]

Figure 3-5. Alternate Definition of the Linguistic Variable TRUTH

3.2 DECISION SPACE

In section 2, it was pointed out that the decision function of classical pattern recognition mapped the data to singletons or single classes in the decision space. FPR generalizes classical pattern recognition by mapping to a fuzzy unit vector (fit vector), where the elements of the vector are associated with classes and the values of the elements represent the membership in those classes. It was also pointed out that fuzzy model assessment generalizes the certainty representation so that the elements in the fit vector are now interval-valued sets and the fit vector becomes an interval-valued fuzzy set. Section 3.1 established that one could view these interval-valued fuzzy sets as second-order fuzzy sets whose members are fuzzy sets approximating the terms of the linguistic variable called TRUTH. That is, think of these intervals as approximations to the terms of the linguistic variable TRUTH.

One further extension is needed. Instead of each element of the fit vector being associated with singletons in the decision space, the dimension of this vector is expanded to include one element for each member of the power set of the classes, just as in the Dempster-Shafer formulation. That is, the elements of the fit vector are members of the power set of {h0, bf, pp, h2, nl}, denoted P{h0, bf, pp, h2, nl} = P. In practice, fuzzy model assessment maps to a subset of P. So a fuzzy rule can have a conclusion {bf, h2}, which is interpreted as either a change in base frequency or a contact maneuver has
occurred. Associated with this decision or element of the fit vector is a fuzzy set representing the certainty interval of the conclusion.

The fact that the elements of the decision space are now members of P is a result of the fuzzy rule base. The fuzzy rules were fashioned after the compatibility maps of CMMA. The Dempster-Shafer approach used in CMMA maps the evidence or data to the frame of discernment whose initial set is the set of all possible decisions; this means that the basic probability assignment (bpa) is on P. The evidence or sensor readings are mapped to the decision frame via compatibility maps. Since different sets of sensors can map to the decision frame, the evidence is combined from these different sources using the Dempster-Shafer rule of combination (ROC). Compatibility maps are very convenient in that they only ask how the current inputs can support the elements in the final decision frame of discernment. Then the support for each conclusion is aggregated using the ROC, eliminating the question of how the different sensors affect each other, because this is all taken care of in the ROC. However, the ROC assumes that the sources of information or data sources are "independent," although it is not clear what independence means. Fuzzy rules also provide support for the conclusions from the data, but the support representation and the propagation mechanism are quite different.

The compatibility maps of CMMA map the bpa of the sensor frame to the bpa of the decision frame. Compatibility maps are based on compatibility relations, which are defined on the product space of the two frames. "A compatibility relation simply describes which elements from the two frames can be true simultaneously" (reference 16). A compatibility map is defined as

    C_A→B(Ai) = {bj | (ai,bj) ∈ Θ_AB, ai ∈ Ai},
where Θ_A = {a1, a2, ..., an} and Θ_B = {b1, b2, ..., bn}, so support is mapped from sets in Θ_A to sets in Θ_B. The support is translated from frame to frame via the summation of the bpa that maps to a specific set, i.e.,

    m_B(Bj) = Σ m_A(Ai), summed over {Ai : C_A→B(Ai) = Bj},
where m_B(Bj) is the bpa assigned to set Bj in frame B. From the formula for the bpa in frame B, it is clear that support is mapped from frame A in an additive manner. This is not the case with fuzzy rules. Fuzzy rules map support to the same sets in the decision frame, but in a different manner and in a different form.

The decision space is the power set P of the decision frame, and the compatibility maps are the source of the multivaluedness of the mappings. The fuzzy rules reflect the multivaluedness of the compatibility maps since they were fashioned from them. On a more basic level, the multivaluedness is due to the fact that single-sensor, single-measurement rules are not specific enough to reduce the ambiguity of the compatibility map or the fuzzy rule. So ambiguity is not a flaw, but a natural result of the partial observation of the data. When more specific rules are designed, the premise contains the conjunction of many measurement conditions and produces more specific decisions. Unlike binary rule-based systems, all the rules fire for each data input, but the strength of the premise satisfaction and the strength of the rule determine the strength of the conclusion. The next step is to aggregate the conclusion support from all rules, which is the counterpart to the Dempster-Shafer ROC in CMMA evidential reasoning. Aggregation is discussed in the next section.
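The additive translation of support, m_B(Bj) = Σ m_A(Ai) over those A-sets whose compatibility map yields Bj, can be illustrated with a toy example. This Python sketch uses invented frames and bpa values, not actual CMMA data:

```python
def translate_bpa(bpa_a, comp_map):
    """Map a basic probability assignment from frame A to frame B,
    summing the mass of every A-set whose compatibility map gives
    the same B-set.  Sets are frozensets so they can be dict keys."""
    bpa_b = {}
    for a_set, mass in bpa_a.items():
        b_set = comp_map[a_set]
        bpa_b[b_set] = bpa_b.get(b_set, 0.0) + mass
    return bpa_b

# Hypothetical sensor-frame sets and their images in the decision frame:
comp_map = {
    frozenset({"a1"}): frozenset({"h2"}),
    frozenset({"a2"}): frozenset({"h2", "pp"}),
    frozenset({"a3"}): frozenset({"h2"}),
}
bpa_a = {frozenset({"a1"}): 0.5, frozenset({"a2"}): 0.3, frozenset({"a3"}): 0.2}
print(translate_bpa(bpa_a, comp_map))
# support for {h2} is additive (0.5 + 0.2 = 0.7); {h2, pp} keeps its 0.3
```

The fuzzy rules, by contrast, carry support as certainty intervals rather than additive mass.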
3.3 AGGREGATION OF CERTAINTY FOR CONCLUSIONS

Conclusion certainty is combined using general aggregation operators, which behave as averaging operations (reference 4, section 2.6). Aggregation operators combine fuzzy sets in a specific manner that can be defined by a function with the appropriate properties. This function is defined as h: [0,1]^n → [0,1], and when applied to fuzzy sets defined on a common universe of discourse it has the form

    μ_A(x) = h(μ_A1(x), μ_A2(x), ..., μ_An(x)).

The function h is assumed to be continuous, symmetric, and monotonic nondecreasing in all its arguments. Moreover, two boundary conditions must be satisfied, namely h(0,0,...,0) = 0 and h(1,1,...,1) = 1, which reflect the averaging nature of the operator. Many operators possess these properties, including the maximum and the minimum operators representing the union and intersection of the fuzzy sets in our system.

Aggregation operators are used in two different ways. The first way is to aggregate the certainty obtained from one set of data across the rules. The rules are not independent, i.e., different rules can map to the same conclusion, and the data and the rules can in some sense be correlated or associated. Therefore, there may be many fuzzy certainty intervals associated with
one conclusion of P originating from a single data set. These certainties need to be combined or aggregated to form one certainty interval. The second way to aggregate is across data samples.
With several sets of samples, evidence will accumulate at the conclusions in the form of these interval-valued sets, and this evidence has to be combined or aggregated across the samples or across time. The second aggregation method does not have to be the same as the first. So far, this latter aggregation method has not been tested simply because the work has not progressed far enough. The first method has been tried, and the following simple aggregation procedure has been tested. For each element of the decision space, there will be a collection of intervals generated by
all the rule firings. Bonissone's method of conclusion aggregation is to replace these intervals [ci, Ci] with the single interval

    [c, C] = [S(c1, c2, ..., cn), S(C1, C2, ..., Cn)]

(reference 10).
When the s-norm operator is chosen as S(x,y) = max(x,y), this yields a certainty interval from the max of the minimums to the max of the maximums for the conclusion interval. The support of the conclusion then is the maximum support that any one rule gives to the conclusion; likewise, the possibility is the maximum of all the possibilities. This method of conclusion aggregation always yields a nonempty interval provided some rule has yielded a nonempty certainty interval. More formally, if the individual intervals are viewed as fuzzy sets μi(x) = μ[ci,Ci](x), i = 1, ..., n, the aggregation is expressed through the intersection and union of these interval-valued sets,

    μ_I(x) = ∩ μi(x) and μ_F(x) = ∪ μi(x), i = 1, ..., n.
The union is implemented using the maximum operator and the intersection using the minimum operator. From this definition, the aggregation function h is continuous, symmetric, and nondecreasing in all its arguments; the boundary conditions are also satisfied. The simplicity of this method makes it practical, requiring only minimum and maximum operations. Note further that this aggregation is conservative, in that each lower bound ci is itself the minimum of the certainties of the premise clauses and the certainty of the forward ply. So the aggregation procedure provides a conservative necessity measure. Observe that
the support that is mapped into the decision space by the fuzzy rules is not additive, as it is with
the Dempster-Shafer approach.

Other alternatives to this aggregation procedure exist. Source consensus (reference 10) is one more aggregation method. Here the form of the aggregation is

    [c, C] = [S(c1, ..., cn), T(C1, ..., Cn)],

which, for the choice of T(x,y) = min(x,y), would yield the max of the minimums and the min of the maximums. This interval would be considerably shorter, and in fact could be null even when each component interval is not null. However, when all the intervals supporting a conclusion overlap, this would give a smaller interval. The necessity would be conservative, but the possibility may not be. This shortened interval length tends to indicate too much knowledge of the certainty of the conclusion. Conflicting answers can yield an empty set, leaving no idea of what type of support exists for the conclusions other than that they may be conflicting. Bonissone points out that this is a test for conflicting evidence.
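The two aggregation methods are easy to compare side by side. A minimal Python sketch, with invented certainty intervals:

```python
def aggregate_max(intervals):
    """Bonissone conclusion aggregation with S = max:
    [max of the lower bounds, max of the upper bounds]."""
    return [max(c for c, C in intervals), max(C for c, C in intervals)]

def source_consensus(intervals):
    """Source consensus with T = min, S = max:
    [max of the lower bounds, min of the upper bounds]."""
    return [max(c for c, C in intervals), min(C for c, C in intervals)]

rule_intervals = [(0.2, 0.6), (0.5, 0.9)]          # two rules, same conclusion
print(aggregate_max(rule_intervals))               # [0.5, 0.9]
print(source_consensus(rule_intervals))            # [0.5, 0.6]
# Conflicting rules can null the consensus interval (lower bound > upper):
print(source_consensus([(0.0, 0.3), (0.7, 1.0)]))  # [0.7, 0.3]
```

A lower bound exceeding the upper bound signals conflicting evidence, which is exactly the test Bonissone describes.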
4. FUZZY CMMA SYSTEM

4.1 DATA IN THE CMMA SYSTEM

The CMMA system, as described in section 1, is a system driven by features extracted from contact tracks. These features are linear fits to data, i.e., linear regressions of order one. In general, this fit could be a fuzzy regression, a robust nonlinear regression, or just an ordinary least-squares regression. Robust regression procedures that can be applied to the detection and filtering of features in the model assessment problem are discussed in reference 17. The slope and intercept estimates describe a straight line, and the distribution of these parameters is known, provided the errors are additive normal random deviates. Each parameter is a statistic, and its distribution is known to be normal (reference 18). The probability density function (PDF) of the derived parameters is used to represent the information as a fuzzy number, merely by renormalizing the PDF so it has a mode of one. Describing the parameters as fuzzy numbers enables the uncertainty of the linear fit to be propagated through the rule.

Figure 4-1 illustrates the overall process of fuzzifying the data from the standpoint of contact management. This solution flow diagram is also part of the tracking blackboard currently part of an independent research study (reference 19). First the track is re-initialized by detecting the start of the new segment, as discussed in section 1. The new segment is modeled by a regression fit, and the distribution of the extracted parameters is used to form the fuzzy sets labeled A and B, respectively, in the figure. Figure 4-1 shows how the jump axes are partitioned using trapezoidal term sets, which results in a waffle-iron texture in the two-dimensional feature space. In both dimensions of the jump-drift space, the trapezoids provide a fuzzy pseudopartition of the space. In this figure, only the jump feature is shown along with the term sets needed to fuzzify the data.
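The feature-extraction step can be sketched as follows. This Python sketch uses the standard normal-theory regression formulas (in the spirit of reference 18) and then renormalizes the normal PDF of each parameter to a mode of one to obtain a fuzzy number; it is an illustration, not the report's code, and the example data are invented:

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b.
    Returns (a, b, sigma_a, sigma_b) with the usual normal-theory
    standard errors for the slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * y for x, y in zip(xs, ys)) / sxx
    b = sum(ys) / n - a * mx
    resid = [y - (a * x + b) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return a, b, math.sqrt(s2 / sxx), math.sqrt(s2 * (1 / n + mx * mx / sxx))

def fuzzy_number(center, sigma):
    """Normal PDF renormalized to a mode of one: a Gaussian fuzzy number."""
    return lambda x: math.exp(-0.5 * ((x - center) / sigma) ** 2)

# Hypothetical residual data for one track segment:
xs = list(range(10))
ys = [2 * x + 1 + (0.1 if x % 2 else -0.1) for x in xs]
a, b, sa, sb = fit_line(xs, ys)
mu_slope = fuzzy_number(a, sa)   # fuzzy number for the slope estimate
```

The fuzzy numbers built this way are what the term sets of the linguistic variables are matched against.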
[Figure: track segments are fit by a regression AX+B; the jump and drift parameters are passed through membership functions to construct the linguistic variables.]

Figure 4-1. Regression Fit of Input Data, Model Fit by Fuzzy Numbers, then Fuzzify the Fuzzy Numbers from the Fit
As a practical matter, the data are not fitted directly to the term sets of the features. Instead, the data are first approximated by a piecewise continuous MF, and then the possibility and the necessity are calculated from this MF. This calculation was illustrated in figure 3-3. Figure 3-1 showed how a normal data pulse is fitted by a trapezoidal approximation for one of the parameter values. This simplifies the code considerably and does so with little loss in performance and with full generality in the shape of the data. The approximation MF is represented and stored as a sequence of linear line segments. The details of this representation are contained in appendix A, so it suffices to say an MF is represented as a list of segments, ((P0,P1), (P1,P2), ..., (Pn-1,Pn)), where each point Pi is a pair of the form (xi, yi). The advantages of this format are simplicity, low spatial complexity, and low time complexity of the fuzzy logic operators.

The formation of the possibility and the necessity was discussed conceptually in section 3. In practice, the possibility

    Π = sup min[μ_A(x), μ_B(x)]

is calculated by first forming the intersection (minimum of the two MFs) and then finding the supremum over the resulting MF by simply looking for the maximum ordinate over the set of the segment end-points (xi, yi). In like manner, the necessity

    N = inf max[1 - μ_A(x), μ_B(x)]

is calculated by first forming the complement of the fuzzy set A, taking its union with the fuzzy set B, and then searching for the minimum ordinate of the piecewise representation of this union. The space and time complexity of this process is clearly linear in the number of segments, and thus an efficient implementation of the calculation. Figure 3-3a illustrated the calculation of the possibility once the data had been approximated by a trapezoidal term set. The necessity is illustrated in figure 3-3b for the same trapezoidal term set.
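The possibility and necessity calculations over a piecewise-linear representation can be sketched directly. In this Python sketch a fine sample grid stands in for the endpoint search described above, and the two trapezoids are invented examples:

```python
def mu(points, x):
    """Evaluate a piecewise-linear MF stored as (x, y) breakpoints;
    the function is taken to be zero outside its support."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 0.0

def possibility(A, B, grid):
    """Pi = sup min[mu_A(x), mu_B(x)] over the sample grid."""
    return max(min(mu(A, x), mu(B, x)) for x in grid)

def necessity(A, B, grid):
    """N = inf max[1 - mu_A(x), mu_B(x)] over the sample grid."""
    return min(max(1.0 - mu(A, x), mu(B, x)) for x in grid)

A = [(0, 0), (1, 1), (3, 1), (4, 0)]    # trapezoidal data approximation
B = [(2, 0), (3, 1), (5, 1), (6, 0)]    # trapezoidal term set
grid = [i * 0.01 for i in range(701)]   # samples covering both supports
print(possibility(A, B, grid), necessity(A, B, grid))
```

For these two trapezoids, the possibility is 1 (the cores touch at x = 3) while the necessity is 0 (part of A lies entirely outside B), illustrating a wide certainty interval.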
The exploratory program for the fuzzy model assessment is written in Common LISP using the Common LISP Object System (CLOS) for the object-oriented programming aspects of the problem. Object-oriented programming attempts to encapsulate the data and the code associated with objects in the programs. Accordingly, the objects in the code correspond to natural entities in the decision process. In particular, there are objects for linguistic variables, for the conclusion, and for the data. An example is the linguistic variable object called "lingvar." Instances of this object are created for each sensor value, like spherical bearing. The general form of the definition looks as follows:

(defclass lingvar ()
  ((base     :initarg :base     :initform 1.0 :accessor base)     ;; scaling term
   (xlow     :initarg :xlow     :initform -1  :accessor xlow)     ;; lower bound of range
   (xhigh    :initarg :xhigh    :initform 1   :accessor xhigh)    ;; upper bound of range
   (numterms :initarg :numterms :initform 7   :accessor numterms) ;; number of terms
   (terms    :initarg :terms    :initform '(ns nm nw ze pw pm ps)
             :accessor terms)                                     ;; term names
   (nsj :initarg :nsj :initform '() :accessor nsj)  ;; nsj names size of jump
   (nmj :initarg :nmj :initform '() :accessor nmj)  ;; nmj names size of jump
   (nwj :initarg :nwj :initform '() :accessor nwj)  ;; nwj names size of jump
   (zej :initarg :zej :initform '() :accessor zej)  ;; zej names size of jump
   (pwj :initarg :pwj :initform '() :accessor pwj)  ;; pwj names size of jump
   (pmj :initarg :pmj :initform '() :accessor pmj)  ;; pmj names size of jump
   (psj :initarg :psj :initform '() :accessor psj)  ;; psj names size of jump
   (nsd :initarg :nsd :initform '() :accessor nsd)  ;; nsd names size of drift
   (nmd :initarg :nmd :initform '() :accessor nmd)  ;; nmd names size of drift
   (nwd :initarg :nwd :initform '() :accessor nwd)  ;; nwd names size of drift
   (zed :initarg :zed :initform '() :accessor zed)  ;; zed names size of drift
   (pwd :initarg :pwd :initform '() :accessor pwd)  ;; pwd names size of drift
   (pmd :initarg :pmd :initform '() :accessor pmd)  ;; pmd names size of drift
   (psd :initarg :psd :initform '() :accessor psd)) ;; psd names size of drift
  (:documentation "Trapezoidal representation of a MF for sensor variable."))
This definition of the class called "lingvar" is used to represent terms of the sensor measurements. It represents the term sets for the features extracted from the residual data on the measurement. The linear fit to the residuals produces two components, the jump and the drift. The syntax of the term set names is {n|p}{w|m|s}{j|d} if the sensor term set is not zero; for the zero term set, the syntax is {ze}{j|d}. In this grammar, n stands for negative and p for positive; the size is given as w for weak, m for moderate, and s for strong; and the type of variable is j for jump and d for drift. The range of the measurement must be bounded, which incurs no loss of generality. The range is [xlow, xhigh], and the total number of terms needed to cover the measurement space is given by "numterms," currently set at seven. The variable "terms" is a list of names for these terms without the qualification of jump or drift. The important thing is not the names or the syntax, but the fact that this measurement form is captured in the object class, and each measurement has its own instantiation of this class. This class represents one of the entities in the fuzzy system diagram that describes the term sets, and it holds the information needed to fuzzify the data on input.

At present, only the positive parts of the rules are being used because the data received are folded, i.e., the absolute value of the center of the fuzzy term sets is used. This reduces the number of rules. Because the rules are essentially symmetric, the additional rules are redundant for testing purposes and add little knowledge to the exploratory testing. Clearly, these rules can be added easily at a later point. Of more importance is the question of scaling. The data seen in the CMMA problem range over three orders of magnitude, sometimes more. This is one reason that trapezoidal sets were used instead of triangular sets. A more appropriate scaling might be a logarithmic scaling of the data; however, the data may be either positive or negative, requiring separate scaling of the positive and negative parts and an adjustment for the values around zero. Although this decision is of no consequence for this study, it will be important when running larger data sets and handling both the negative and positive parts of the input data, so that the transformed data are more uniformly distributed across a finite range. When making this transformation, however, the normal shape of the data will also be transformed, and how that looks in the transformed space must also be determined. The next issue is when the trapezoidal approximation to the data should be made: before or after the transformation? It is probably advantageous to approximate first, then transform.

4.2 STRUCTURE OF THE FUZZY RULES

The form of the fuzzy rules in the system is similar to the simple control rules that were discussed in section 2. An example follows: IF the drift in the spherical bearing is POSITIVE WEAK OR MODERATE AND the jump in the spherical bearing is POSITIVE WEAK OR MODERATE, THEN a contact maneuver has occurred.
In this format, the terms drift and jump are represented by fuzzy numbers, and both the terms and their logical constructs are given in capital letters. So drift is a fuzzy number, and POSITIVE WEAK is a term that is then OR'd with the term POSITIVE MODERATE. The necessity and possibility of the drift feature being a member of POSITIVE WEAK OR MODERATE are calculated, and the certainty interval is formed for this clause. A similar procedure is applied to the second clause of the premise. The certainty of the premise is combined from the certainty of the clauses using the t-norm operator. The certainty of the conclusion is again an interval, which assumes that the strength of the forward and backward implication is one, that is, necessity = sufficiency = 1. These variables are in the system and can be changed for all the rules, or the inference engine can be modified to include the strengths in the rules themselves.

The actual rule itself is almost self-documenting, provided one recalls that in LISP the functional forms are in Polish notation, i.e., (function arg1 arg2 ...). The cply function is the detachment operator, and the spbrd function is an accessor method that retrieves the data from the data object called sensor. The t+ function is the fuzzy OR of the two terms POSITIVE WEAK and POSITIVE MODERATE, and the tisa function is interpreted as the "is a" or "is a member of" function. So the second and third rows of the rule produce certainty intervals, the output of the tisa function. The term ply is a list that contains the strength of the implication in the forward and reverse directions. For this report, ply has been set to (1 1), or (forward backward) implication strength. The conclusion, h2, is a contact maneuver. Note the t+ and tisa functions both utilize the piecewise continuous representation of the MFs adopted for this system.

(cply 'RULEBR3
      (c* ;; AND the predicates to form the premise
          (tisa (spbrd sensor) (t+ (pwd spbr) (pmd spbr)))  ;; is bearing drift positive weak or mod
          (tisa (spbrj sensor) (t+ (pwj spbr) (pmj spbr)))) ;; is bearing jump positive weak or mod
      ply
      '(h2))  ;; THEN conclude contact maneuver
The rules are kept in a separate file and used in a read-only manner. Since each rule has the form of a LISP function call, a rule is fired by activating it with the evaluate function. Some of the rules presently have disjunctive forms using a function called c+, but this is allowed at this point only because the t-norm is the minimum and the s-norm is the maximum. In this one special case, disjunctions in the premises are allowed. These rules are being rewritten as separate rules. The code is designed so that any t-norm can be used in place of the minimum and any s-norm in place of the maximum with minor modifications of the code.

After all the rules have fired, the conclusions can be displayed using certainty plots. Figure 4-2 is a certainty plot for moderately high signal-to-noise ratio (SNR) test data when a contact maneuver has occurred. Note that the single hypothesis h2 has several strong certainty intervals supporting this hypothesis. No other single hypothesis has any support. However, the multiple hypotheses have an absolute certainty consisting of the interval [1,1]. These are due to single-sensor measurement rules designed to support the h2 hypothesis. Note further that all the conclusions having absolute support contain the h2 hypothesis, which confirms consistency in the rule base for this hypothesis.
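The data flow of such a rule (clause intervals combined by c*, then detached by cply) can be mirrored in a short Python sketch. The clause certainty intervals below are invented stand-ins for the tisa outputs:

```python
def c_star(*intervals):
    """Premise conjunction with T = min: minimum of the lower bounds
    and minimum of the upper bounds (section 3.1)."""
    return [min(i[0] for i in intervals), min(i[1] for i in intervals)]

def cply(premise, ply):
    """Detachment with T = min, S = max: [min(s, b), max(B, 1 - n)]."""
    (b, B), (s, n) = premise, ply
    return [min(s, b), max(B, 1.0 - n)]

drift_clause = [0.7, 0.9]   # bearing drift is POSITIVE WEAK or MODERATE
jump_clause = [0.6, 1.0]    # bearing jump is POSITIVE WEAK or MODERATE
ply = [1.0, 1.0]            # (forward backward) implication strengths
print(cply(c_star(drift_clause, jump_clause), ply))   # [0.6, 0.9] for h2
```

With ply set to (1 1), the conclusion interval is just the conjunction of the clause intervals, matching the behavior described above.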
[Figure: bar plot of certainty intervals versus the hypotheses h0, bf, h2, pp, nl, and their combinations.]

Figure 4-2. Certainty Plot of Data Designed to Test for a Contact Maneuver, Hypothesis h2

The corresponding aggregated certainty plot is given in figure 4-3. Aggregation, as presently set up for testing purposes, uses the more conservative form of conclusion aggregation, where if the certainty intervals for a single conclusion are given by [ci, Ci], then the certainty interval for the aggregated conclusion is given by [max(c1,...,cn), max(C1,...,Cn)]. Again, the code can easily be generalized so that other s-norms can be used in place of the maximum function. This is an ideal case with strong data in all the sensor measurements.

When the SNR deteriorates, the certainty in the single hypothesis weakens, causing the necessity to decrease. So the minimum support drops and the certainty intervals widen. This is illustrated in figure 4-4, a low SNR version of figure 4-2. Note further that the propagation path certainty for the single hypothesis has also widened. This is because the SNR of all the sensor measurements was lowered in this example. The rule that builds the certainty for the change-of-propagation-path alternative requires that the bearing have zero jump, and as the support of the bearing information widens (higher uncertainty in the exact bearing), the data lend support to the zero-jump clause, causing the pp hypothesis certainty to increase. In effect, the certainty "leaks" out of the rule for which it was designed and "spills" into other rules. The weakening of the minimum support for the h2 hypothesis is caused by the leakage, and the increase of the support for pp is caused by the spillage. Again, this is not a flaw in the system; instead, it illustrates how the deterioration of the data must temper the strength of the conclusions.
[Figure: aggregated certainty intervals versus the hypotheses h0, bf, h2, pp, nl, and their combinations.]

Figure 4-3. Aggregated Certainty Plot for a Contact Maneuver, Hypothesis h2

[Figure: widened certainty intervals versus the same hypotheses under low SNR.]

Figure 4-4. Certainty Plot of Low SNR Data Designed to Test for a Contact Maneuver, Hypothesis h2
5. CONCLUSIONS AND FUTURE DIRECTIONS

In this report, fuzzy logic has been applied to model assessment. This problem is a pattern recognition problem, which could be modeled as statistical pattern recognition or fuzzy pattern recognition. The approach investigated uses a generalized form of FPR in which the conclusions are all possible to varying degrees. Instead of modeling the degrees of possibility of the conclusions as memberships or as real numbers, the degree is modeled as an interval-valued set. The conclusions are generated by a fuzzy rule-based system using fuzzy data extracted from the residuals of linear fits to the sensor data. The fuzzy rules map the data into conclusions just as the discriminant functions used in SPR do, except for the format of the conclusion. The fuzzy rules represent heuristics based on physical laws of motion and Doppler shifts. The system has been tested using only a few key cases to ensure that the rules and the fuzzification and aggregation algorithms work. The rule-based system works, although performance tests have not been done because no correct data set was available.

This work has generated promising results and indicated many promising directions of research. In particular, the performance of the system is dependent on the number and shape of the terms in the linguistic variables of the extracted features. The design of these terms is dependent on the clustering or learning algorithms used to create these term sets. There are numerous methods for learning the term sets. One promising technique, first suggested in references 13 and 19, applies fuzzy neural networks (FNN). These FNNs learn not only the terms but also the fuzzy rules themselves. A second interesting area of study is the relationship between the fuzzy logic approach and the Dempster-Shafer approach. Clearly these two approaches are related in some sense, but the detailed relationship is a difficult theoretical problem. The approximation of discriminant functions by fuzzy rules is a theoretical problem that begs for an answer. A third area of research, mentioned in section 3, is the generalization of the interval-valued certainty representation to term sets of the linguistic variable TRUTH. Finally, further work is needed on the aggregation of conclusion certainty, not only across rules but also across data samples. The directions for future study require both theoretical and simulation study of the issues associated with the design of this modeling problem.
REFERENCES

1. S. C. Nardone, D. J. Ferkinhoff, S. E. Hammel, and K. F. Gong, "Hypothesis Selection for Evidential Reasoning Systems," 1993 Basic Research Group Symposium on Command and Control Research, National Defense University, Washington, DC, 28-29 June 1993 (UNCLASSIFIED).

2. S. C. Nardone et al., "Contact Management Model Assessment: A System Description," NUWC-NPT Technical Document (in preparation), Naval Undersea Warfare Center Division, Newport, RI (UNCLASSIFIED).

3. K. Fukunaga, Introduction to Statistical Pattern Recognition, Second Edition, Academic Press, Boston, MA, 1990.

4. G. Klir and T. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice Hall, Englewood Cliffs, NJ, 1988.

5. H. Zimmermann, Fuzzy Set Theory and Its Applications, Second Edition, Kluwer Academic Publishers, Boston, MA, 1992.

6. J. Bezdek, "Editorial: Fuzzy Models - What Are They, and Why?" IEEE Transactions on Fuzzy Systems, vol. 1, no. 1, February 1993, pp. 1-6.

7. J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.

8. B. Kosko, Neural Networks and Fuzzy Systems, Prentice Hall, Englewood Cliffs, NJ, 1992.

9. P. Bonissone and K. S. Decker, "Selecting Uncertainty Calculi and Granularity: An Experiment in Trading off Precision and Complexity," in Uncertainty in Artificial Intelligence, L. N. Kanal and J. F. Lemmer, eds., Elsevier Science Publishers B.V., North-Holland, 1986, pp. 217-247.

10. P. Bonissone, "Summarizing and Propagating Uncertain Information with Triangular Norms," International Journal of Approximate Reasoning, vol. 1, January 1987, pp. 71-101.

11. P. Bonissone, S. Gans, and S. Decker, "RUM: A Layered Architecture for Reasoning with Uncertainty," in Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87), Milan, Italy, 1987, pp. 891-898 (UNCLASSIFIED).

12. P. Bonissone, "Using T-norm Based Uncertainty Calculi in a Naval Situation Assessment Application," in Third Workshop on Uncertainty in Artificial Intelligence, Seattle, WA, 10-12 July 1987, pp. 250-261.

13. P. R. Kersten, "Propagation of Evidence through Fuzzy Rules," NUWC-NPT Technical Report 10,222, Naval Undersea Warfare Center Division, Newport, RI, 1 September 1993 (UNCLASSIFIED).

14. D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, San Diego, CA, 1980.

15. J. Buckley, W. Siler, and D. Tucker, "A Fuzzy Expert System," Fuzzy Sets and Systems, vol. 20, 1988, pp. 129-148.

16. J. D. Lowrance, T. D. Garvey, and T. M. Strat, "A Framework for Evidential-Reasoning Systems," Proceedings AAAI-86, Fifth National Conference on Artificial Intelligence, vol. 1, Philadelphia, PA, 11-15 August 1986.

17. P. R. Kersten, "Fuzzy Robust Detection and Regression in Model Assessment Problems," NUWC-NPT Technical Report (in preparation), Naval Undersea Warfare Center Division, Newport, RI (UNCLASSIFIED).

18. D. J. Ferkinhoff, J. G. Baylog, K. F. Gong, and S. C. Nardone, "Feature Detection for Model Assessment in State Estimation," NUWC Technical Report 7064, Naval Undersea Warfare Center Division, Newport, RI, 5 October 1991 (UNCLASSIFIED).

19. P. R. Kersten, "Fuzzy Expert System FY 94 IR Proposal," Naval Undersea Warfare Center Division, Newport, RI, June 1993.
APPENDIX CONTINUOUS LINEAR PIECEWISE REPRESENTATION OF FUZZY SETS
One issue in implementing fuzzy logic and control is the choice of the term sets in the fuzzification and defuzzification process. Conceptually, this seems to be a trivial issue; however, for implementing logical operations and for evaluating the inclusion index, it is an important practical one. Brute-force implementations, such as vector representations of the membership functions (MFs), have both high space and time complexity, which can slow the logic and yield erroneous results. An elegant and efficient representation of the fuzzy sets is imperative. One such representation, the linear piecewise continuous (LPC) approximation of the term sets, is discussed in this appendix.

In section 2, a linguistic variable was defined as a quintuple (x, T(x), U, G, M), where the elements are

1. x, the name of the variable;
2. T(x), the term set of the variable x;
3. U, the universe of discourse (the domain of definition, or base);
4. G, a syntactic rule for generating the names;
5. M, a fuzzy set representing the meaning of T(x); M is called a semantic rule.
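As an illustration, the quintuple can be sketched in code. The following is a hypothetical Python sketch, not part of the report: the variable SPEED, its ranges, and the helper trapezoid are invented for illustration. The dictionary holds x (the name), U (the universe), and M (a membership function for each term in T(x)); the syntactic rule G is left implicit.

```python
def trapezoid(a, b, c, d):
    """Return a trapezoidal membership function rising on [a, b],
    flat on [b, c], and falling on [c, d]."""
    def mu(x):
        if b <= x <= c:          # plateau of full membership
            return 1.0
        if x <= a or x >= d:     # outside the support
            return 0.0
        if x < b:                # rising edge
            return (x - a) / (b - a)
        return (d - x) / (d - c)  # falling edge
    return mu

# Hypothetical linguistic variable SPEED on the universe [0, 10].
speed = {
    "name": "SPEED",                       # x
    "universe": (0.0, 10.0),               # U
    "terms": {                             # M: semantic rule for each term
        "SLOW": trapezoid(0, 0, 2, 4),
        "MEDIUM": trapezoid(2, 4, 6, 8),
        "FAST": trapezoid(6, 8, 10, 10),
    },
}

print(speed["terms"]["MEDIUM"](5.0))   # 1.0, fully MEDIUM
print(speed["terms"]["MEDIUM"](3.0))   # 0.5, on the rising edge
```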
An example of a linguistic variable, COLOR, was displayed in figure 2-1. Conceptually, the fuzzy sets that define the terms in the linguistic variable are easy to represent both mathematically and symbolically. Even numerically, the MFs are quite easy to evaluate directly. In general, however, the result of logic operations on the MFs (the semantic rules of the term sets) is not in the same class of functions as the original MFs. For example, if the intersection operator is the minimum function and the union operator is the maximum function, the intersection of triangular functions is again triangular; the union, however, is not. For practicality, one wants the representation of the fuzzy sets to be closed under all the logical operations, such as OR, AND, and NOT. One set of functions that has this property is the piecewise constant functions. For this appendix, the fuzzy set intersection and union will be implemented using the minimum and maximum, respectively; that is, μ_A∩B(x) = min[μ_A(x), μ_B(x)] and μ_A∪B(x) = max[μ_A(x), μ_B(x)]. The fuzzy set complement is given by μ_Ā(x) = 1 - μ_A(x). The goal is closure under the three logical operations {∩, ∪, ¬}, the symbol set for {AND, OR, NOT} = L, the standard set of logical operations.

Another set of functions closed under L is the LPC approximation of the MFs. This representation is both simple and efficient: simple, since it is just a string of linear line segments connected at their endpoints; and efficient, since the space complexity is low and the time complexity is directly related to the space complexity. The sequence of segments is represented by the list ((P0 P1) (P1 P2) ... (Pn-1 Pn)), where the points are themselves lists Pi = (xi yi). This representation is somewhat redundant, but it simplifies coding. Figures A-1 and A-2 contain two examples of LPC membership functions; these are trapezoidal MFs, TRAP1 and TRAP2, respectively.
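A minimal sketch of the LPC representation in Python (hypothetical; the report's implementation was in LISP): a membership function is stored as a list of breakpoints (xi, yi), which carries the same information as the segment-pair list above, and is evaluated by linear interpolation. The breakpoint values for TRAP1 are illustrative, not taken from the figures.

```python
# An LPC membership function as a breakpoint list; straight lines join
# consecutive points. Example: a trapezoidal MF.
TRAP1 = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0), (4.0, 0.0)]

def lpc_eval(points, x):
    """Evaluate an LPC membership function by linear interpolation;
    outside the listed domain, extend the boundary values."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]

print(lpc_eval(TRAP1, 0.5))   # 0.5, on the rising edge
print(lpc_eval(TRAP1, 2.0))   # 1.0, on the plateau
```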
[Figure A-1. Trapezoidal Membership Function, TRAP1]
[Figure A-2. Trapezoidal Membership Function, TRAP2]
The sequence of x values associated with each LPC MF induces a partition on the domain, or universe of discourse. The partition can be represented by the sequence of points {x0, x1, ..., xn}, corresponding to the intervals {[x0, x1), [x1, x2), ..., [xn-1, xn]}. To implement binary logic operations between two MFs in an efficient manner, both MFs should have the same partition, including the same domain, which means the first and last points of the partitions must be identical. Therefore, part of the implementation must include a refinement of both membership partitions so that both MFs are defined on a common partition. Moreover, this partition is still not refined enough, since it must also include all the crossover points of the two MFs. Figure A-3 shows all the points that must be included in the partition, including the crossover points.
[Figure A-3. Points of the Universe of Discourse Where the Partition Is Made]
The algorithm to carry out the binary operations is as follows:

1. Construct the sequence of points {x0, x1, ..., xn} that represents the partition associated with the i-th MF, i = 1, 2.
2. Refine the two partitions by appending one to the other and then ordering the points to form a bag. Reduce the bag to a set by removing the duplicate points. The resulting set is a partition refinement on which both MFs are defined.
3. Construct the list of linear segments based on the refined partition for both MFs.
4. Find all the crossing points located strictly within an interval. Add these points to the refined partition and reconstruct the list of linear segments for both functions. Both MFs associated with the logical operation now have the same partition.
5. Carry out the binary logic function on each interval of the partition, constructing the linear piece on each interval and concatenating these results to form the resulting MF.
6. Simplify the resulting representation by joining adjacent intervals that represent the same linear line segment, that is, either adjacent constant line segments or continued lines with the same slope.

Figure A-4 illustrates the steps of the algorithm before collapsing the representation. The algorithm is conceptually easy to understand and moderately easy to implement in LISP.
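Steps 1 through 5 can be sketched in Python (a hypothetical sketch; the report's implementation was in LISP, and step 6, merging collinear segments, is omitted for brevity). The helper names refine and fuzzy_or, and the trapezoid breakpoints, are invented for illustration.

```python
def refine(f, g):
    """Steps 1-4: merge the breakpoints of the two LPC MFs, then add
    any crossing points that fall strictly inside an interval."""
    def at(pts, x):
        # Linear interpolation of an LPC function at x (0 outside its support).
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0) if x1 > x0 else y0
        return 0.0
    xs = sorted({x for x, _ in f} | {x for x, _ in g})   # step 2: common partition
    crossings = []
    for x0, x1 in zip(xs, xs[1:]):                       # step 4: crossings
        d0 = at(f, x0) - at(g, x0)
        d1 = at(f, x1) - at(g, x1)
        if d0 * d1 < 0:  # the two linear pieces cross strictly inside (x0, x1)
            crossings.append(x0 + (x1 - x0) * d0 / (d0 - d1))
    xs = sorted(set(xs) | set(crossings))
    # Step 3 (rebuilt): both MFs re-sampled on the refined partition.
    return [(x, at(f, x)) for x in xs], [(x, at(g, x)) for x in xs]

def fuzzy_or(f, g):
    """Step 5: pointwise max on the common refined partition."""
    rf, rg = refine(f, g)
    return [(x, max(yf, yg)) for (x, yf), (_, yg) in zip(rf, rg)]

TRAP1 = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0), (4.0, 0.0)]
TRAP2 = [(2.0, 0.0), (3.0, 1.0), (5.0, 1.0), (6.0, 0.0)]
print(fuzzy_or(TRAP1, TRAP2))
```

The fuzzy AND is identical except that max is replaced by min in step 5; once the crossings are in the partition, one segment dominates the other throughout each interval, so the pointwise operation at the breakpoints is exact.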
[Figure A-4a. Constructing the Common Partition of the Two Fuzzy Sets]

[Figure A-4b. Resulting Fuzzy Set Prior to Simplification of the Partition for the Fuzzy OR Operator]
The closure properties of this representation under the operators L are guaranteed. The minimum operation is carried out on each interval by taking the linear segment lying below the other; the maximum is implemented by taking the segment lying above the other; and, lastly, the complement is implemented by replacing each y-component with its complement, 1 - y. So once the MFs have a common partition, the logical operations are not only trivial to implement, they are also efficient.

In summary, the LPC representation of MFs is an efficient representation that includes the trapezoidal and triangular MFs. The logical operations are exact up to the round-off errors of the computer language implementation. Moreover, the LPC representation approximates general continuous MFs, allows efficient implementation, and retains its closure properties under unions, intersections, and complements.
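The complement is the simplest of the three operations to sketch: since 1 minus a linear segment is again a linear segment on the same partition, closure is immediate and no partition refinement is needed. A hypothetical Python sketch (the name fuzzy_not and the breakpoint values are illustrative):

```python
def fuzzy_not(points):
    """Complement an LPC membership function by replacing each
    y-component with 1 - y; the x-partition is unchanged."""
    return [(x, 1.0 - y) for x, y in points]

TRAP1 = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0), (4.0, 0.0)]
print(fuzzy_not(TRAP1))  # [(0.0, 1.0), (1.0, 0.0), (3.0, 0.0), (4.0, 1.0)]
```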
INITIAL DISTRIBUTION LIST

Addressee                                                           No. of Copies

Program Executive Officer for Undersea Warfare (ASTO-E)                   1
Chief of Naval Operations (N872T, N872E2)                                 2
Chief of Naval Research (ONR-333)                                         2
Naval Air Warfare Center Weapons Division (Code 2158 - O. A. Bergman)     1
Naval Surface Warfare Center Carderock Division
  (Code NOS - McNiel, J. Hodge, M. Stripling)                             3
Commander Submarine Force Atlantic Fleet                                  1
Commander Submarine Force Pacific Fleet                                   1
Commander Submarine Development Squadron 12                               1
Advanced Research Projects Agency                                         2
Defense Technical Information Center                                      2
Center for Naval Analyses                                                 1