Target Classification Approach Based on the Belief Function Theory
BRANKO RISTIC, DSTO, Australia
PHILIPPE SMETS, IRIDIA, Belgium
A theoretical framework is presented for target classification based on the belief theory on the continuous space. The proposed approach is applicable when class-conditioned densities of feature/attribute measurements are known only partially, as subjective models of a potential “betting” behaviour. Prior class probabilities may also be unknown. Numerical examples are provided to illustrate how the proposed approach is more cautious in decision making and produces very different results from those obtained using the Bayesian classifier.
Manuscript received March 24, 2004; revised October 1, 2004; released for publication December 9, 2004. IEEE Log No. T-AES/41/2/849020. Refereeing of this contribution was handled by C. Jauffret. Authors’ addresses: B. Ristic, DSTO, ISR Division, 200 Labs, PO Box 1500, Edinburgh, SA 5111, Australia, E-mail: [email protected]; P. Smets, IRIDIA, Université Libre de Bruxelles, 50 Av. Roosevelt, CP 194-6, 1050 Bruxelles, Belgium.
© 2005 IEEE 0018-9251/05/$17.00
I. INTRODUCTION
One of the most important tasks of an air surveillance system is to correctly classify noncooperative flying objects within its surveillance volume. In general, target classification is based on a set of features or attributes which distinguish targets according to their shape, kinematic behaviour, and electromagnetic (EM) emissions. To fully exploit the feature space, a surveillance system often consists of multiple sensors [20]. For example, a radar can provide kinematic features (maximum observed speed, acceleration) and shape features (length, range profile, radar cross section (RCS), etc.). An infrared (IR) sensor can supply shape features such as the target area or its spatial distribution. An electronic support measures (ESM) sensor exploits target EM emissions to determine transmission frequency, pulse repetition interval (PRI), or pulsewidth (which can then be related to the emitter type). This paper deals with the identification of objects as members of predefined classification categories when the underlying classification knowledge is imperfectly known. This is a typical case in target classification, where class-conditioned probability functions of feature measurements, for practical reasons, are rarely obtained by repeated experiments (the frequency approach). Instead, these functions are modelled using prior knowledge of performance limits, typically expressed as expected or nominal feature values. For example, knowing the minimum, nominal, and maximum speed of a class of “commercial” jets, we usually model the distribution of target speed measurements corresponding to this class by a suitable probability density function (Gaussian, uniform, beta, etc.). These class-conditioned densities are the key to model-based classification and a requirement for the Bayesian classifier. In the work presented here we develop a model-based target classification scheme using belief functions as understood in the transferable belief model (TBM) [17, 18].
We assume, as in the Bayesian classifier, that probabilistic models of class-conditioned densities of feature measurements are available. However, we treat these densities as subjective probabilistic models, based on potential “betting” behaviour.¹ In the terminology of the TBM, they represent the (class-conditioned) pignistic densities. Our classification algorithm then proceeds in the following steps: 1) it builds, from the class-conditioned pignistic densities, their corresponding least committed (LC) class-conditioned belief functions; 2) it applies the generalised Bayesian theorem to compute beliefs over the frame of classes [13]; 3) it applies the pignistic transform to compute the class probabilities. Since the feature measurements are often continuous-valued (e.g., aircraft length, speed, area, RCS, radar pulsewidth), we present the proposed classification scheme within the framework of belief function theory on continuous spaces [10, 15]. The paper is organised as follows. Section II presents a motivating example and prepares the necessary theoretical background. Section III describes the proposed classification scheme. Section IV illustrates the theory by numerical examples. Section V is a summary.

¹ These densities represent an expert’s opinion rather than objective knowledge. Subjective probabilistic models are built as follows. When an expert is ready to bet with probability p that a measurement will be smaller than a certain value, then this value is declared to be the p quantile (the 100p-th percentile) of the distribution.

IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 41, NO. 2 APRIL 2005

II. BACKGROUND
A. Motivation
Consider the following example as a motivation. Suppose the aim is to classify air targets into one of three platform categories: 1) commercial planes, 2) large military aircraft, 3) light and agile military aircraft. The available target feature for classification is the maximum observed acceleration (during a certain interval of time). This feature is related to target manoeuvrability, and our classification knowledge with respect to this feature can be described as follows. For class 1, the acceleration is rarely higher than 1 g ($g = 9.81$ m/s² is the acceleration due to gravity), because acceleration higher than ±1 g causes sickness in passengers. Targets of class 2 sometimes perform mild evasive manoeuvres, but their maximum acceleration (due to their size) is rarely higher than, say, ±4 g. Targets of class 3 are light and agile, with highly trained pilots (the maximum acceleration of modern fighter planes can go up to ±7 g). The steady-state acceleration for all three classes of targets, however, can be considered to be zero (because of minimal fuel consumption, minimal stress for pilots, etc.). Finally, the prior class probabilities are unknown.
Let us denote our feature measurement by $z$, and the set of classes by $\Theta = \{\theta_1, \theta_2, \theta_3\}$. In order to perform model-based classification we need to adopt a suitable model for the conditional probability function of the feature measurement for each class separately, i.e., $p(z \mid \theta_i)$, for $i = 1, 2, 3$. The Bayesian classifier, for example, would compute the posterior class probabilities as
$$p(\theta_i \mid z) = \alpha\, p(z \mid \theta_i)\, p(\theta_i) \tag{1}$$
where $\alpha$ is a normalisation constant and $p(\theta_i)$ are the prior class probabilities. Although our underlying classification knowledge is imperfect, we are forced to adopt probabilistic models for $p(z \mid \theta_i)$ and $p(\theta_i)$. These models, however, are not obtained by repeated experiments and hence represent only subjective probabilities. While the Bayesian approach would use these imperfect models as if they were the true models, we build from them the least committed (LC) belief functions and then apply the machinery of the TBM to perform classification. The described example is revisited in Section IVA to demonstrate how drastically the classification results of the two approaches (Bayesian versus the one proposed here) can differ.

B. Review of Belief Function Theory
The belief function theory (also known as evidential theory) was originally developed for a discrete set of elementary events related to a given problem [11, 17]. This set is referred to as the frame of discernment (or frame):
$$\Theta = \{\theta_1, \theta_2, \dots, \theta_N\} \tag{2}$$
and it has a finite cardinality $N = |\Theta|$. Beliefs are expressed on the subsets of $\Theta$. All subsets of $\Theta$ form the power set, denoted $2^\Theta$ and defined as $2^\Theta = \{A : A \subseteq \Theta\}$. The belief is represented by a so-called basic belief assignment (BBA) $m: 2^\Theta \to [0, 1]$ that satisfies $\sum_{A \subseteq \Theta} m(A) = 1$. The value $m(A)$ represents the amount of belief that is exactly committed to $A \subseteq \Theta$ and, due to lack of further information, cannot be transferred to any more specific event. The mass assigned to the empty set, $m(\emptyset)$, is interpreted as the amount of conflict or as the result of the possibility that $\Theta$ is not exhaustive. The subsets $A$ with $m(A) > 0$ are referred to as the focal sets of the BBA. The state of complete ignorance is represented by the vacuous BBA, defined as $m(A) = 1$ if $A = \Theta$ and zero otherwise.

Belief, Plausibility and Commonality: The belief function bel, the plausibility function pl, and the commonality function q are all in one-to-one correspondence with the BBA. They are introduced as alternative, convenient interpretations of belief:
$$\mathrm{bel}(A) = \sum_{\emptyset \neq B \subseteq A} m(B) \qquad \forall A \subseteq \Theta \tag{3}$$
$$\mathrm{pl}(A) = \sum_{B:\, A \cap B \neq \emptyset} m(B) \qquad \forall A \subseteq \Theta \tag{4}$$
$$q(A) = \sum_{B:\, A \subseteq B} m(B) \qquad \forall A \subseteq \Theta \tag{5}$$
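As an illustration only (not part of the original method description), definitions (3) to (5) can be evaluated directly when a BBA on a small finite frame is stored as a Python dictionary from `frozenset` focal sets to masses. The helper names and the class labels `t1`, `t2`, `t3` are hypothetical; the last check reflects the remark that the vacuous BBA has $q(A) = 1$ for every $A$.

```python
from itertools import combinations


def powerset(frame):
    """All subsets of a finite frame as frozensets (empty set included)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def bel(m, A):
    """Eq. (3): total mass of the non-empty subsets of A."""
    return sum(v for B, v in m.items() if B and B <= A)


def pl(m, A):
    """Eq. (4): total mass of the sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)


def q(m, A):
    """Eq. (5): total mass of the supersets of A (commonality)."""
    return sum(v for B, v in m.items() if A <= B)


# Hypothetical BBA on the frame {t1, t2, t3}
frame = frozenset({"t1", "t2", "t3"})
m = {frozenset({"t1"}): 0.5, frozenset({"t1", "t2"}): 0.3, frame: 0.2}
print(bel(m, frozenset({"t1", "t2"})), pl(m, frozenset({"t2"})), q(m, frozenset({"t1"})))
```

With this encoding, set inclusion (`<=`) and intersection (`&`) map one-to-one onto the conditions under the sums in (3) to (5).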
In words, $\mathrm{bel}(A)$ represents the total belief that is committed to $A$ without also being committed to its complement $\bar A$; $\mathrm{pl}(A)$ corresponds to the total belief that does not contradict $A$; $q(A)$ is the total belief “above” the set $A$ (or implied by $A$).

RISTIC & SMETS: TARGET CLASSIFICATION APPROACH BASED ON THE BELIEF FUNCTION THEORY

Conjunctive Rule of Combination: Let $m_1$ and $m_2$ be two BBAs defined on the same frame $\Theta$. Suppose that the two BBAs result from two distinct² pieces of evidence. Then the joint impact of the two pieces of evidence can be expressed by the BBA:
$$m_{12}(A) = (m_1 \mathbin{\bigcirc\mkern-15mu\cap}\, m_2)(A) = \sum_{B, C \subseteq \Theta:\; B \cap C = A} m_1(B)\, m_2(C). \tag{6}$$
Note that the conjunctive rule $\mathbin{\bigcirc\mkern-15mu\cap}$ is both commutative and associative.

Pignistic Probability: The pignistic probability is the result of mapping a belief measure to a probability measure. For the singletons $\theta_i \in \Theta$ it is defined as
$$\mathrm{BetP}(\theta_i) = \sum_{A:\; \theta_i \in A \subseteq \Theta} \frac{1}{|A|}\, \frac{m(A)}{1 - m(\emptyset)}. \tag{7}$$
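A minimal sketch of (6) and (7) on a finite frame, using the same dictionary-of-`frozenset` encoding (an assumption made for illustration; the function names and the two example BBAs are hypothetical). The conjunctive rule is kept unnormalised, so mass reaching the empty set is retained as conflict, and (7) then divides by $1 - m(\emptyset)$.

```python
from collections import defaultdict


def conjunctive(m1, m2):
    """Unnormalised conjunctive rule, Eq. (6): the product m1(B) * m2(C)
    is committed to B & C; mass landing on frozenset() measures conflict."""
    m12 = defaultdict(float)
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            m12[B & C] += v1 * v2
    return dict(m12)


def pignistic(m):
    """Pignistic transformation, Eq. (7): each non-empty focal set shares its
    mass equally among its elements; the result is divided by 1 - m(empty).
    Undefined (division by zero) if all mass sits on the empty set."""
    conflict = m.get(frozenset(), 0.0)
    betp = defaultdict(float)
    for A, v in m.items():
        for theta in A:  # the empty set contributes nothing
            betp[theta] += v / (len(A) * (1.0 - conflict))
    return dict(betp)


# Two hypothetical pieces of evidence on the frame {a, b}
frame = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.6, frame: 0.4}
m2 = {frozenset({"b"}): 0.5, frame: 0.5}
m12 = conjunctive(m1, m2)
print(m12)
print(pignistic(m12))
```

Here $m_{12}(\emptyset) = 0.3$ quantifies the conflict between the two sources, and the pignistic probabilities come out as $4/7 \approx 0.571$ for `a` and $3/7 \approx 0.429$ for `b`.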
The pignistic transformation (7) is linear and has some other useful properties [17], such as $\mathrm{bel}(\theta_i) \le \mathrm{BetP}(\theta_i) \le \mathrm{pl}(\theta_i)$ if $m(\emptyset) = 0$. BetP is the probability measure that we use for decision making (betting), hence its name.³

Generalised Bayesian Theorem (GBT): Let $\Theta$ defined by (2) be the frame of (target) classes and $\mathcal{Z} = \{z_1, z_2, \dots, z_L\}$ be the frame of the feature measurements. Suppose that for every $Z \subseteq \mathcal{Z}$ we know the conditional BBAs⁴ $m^{\mathcal{Z}}(Z \mid \theta_i)$, $i = 1, \dots, N$. Then the GBT provides a means to compute, for every $A \subseteq \Theta$, the conditional BBA $m^\Theta(A \mid Z)$ as follows [13, 3]:
$$m^\Theta(A \mid Z) = \prod_{\theta_i \in A} \mathrm{pl}^{\mathcal{Z}}(Z \mid \theta_i) \prod_{\theta_i \in \bar A} \bigl[1 - \mathrm{pl}^{\mathcal{Z}}(Z \mid \theta_i)\bigr] \qquad (A \subseteq \Theta,\; Z \subseteq \mathcal{Z}) \tag{8}$$
where $\mathrm{pl}^{\mathcal{Z}}(Z \mid \theta_i)$ is the conditional plausibility function which corresponds to $m^{\mathcal{Z}}(Z \mid \theta_i)$ via (4), and $\bar A$ is the complement of $A$ in frame $\Theta$. If there is some prior belief over $\Theta$, represented by $m_0^\Theta(A)$, then this belief is combined with $m^\Theta(A \mid Z)$ using the conjunctive rule of combination.

² The notion of “distinctness,” discussed in [17], is often called independence, although these two concepts are subtly different.
³ Pignus means a bet or a wager in Latin.
⁴ Since the GBT deals with two frames, to avoid confusion in our notation we use, when necessary, a superscript to denote the frame of discernment over which the beliefs are expressed.

Least Committed BBA: Suppose we know the pignistic probability BetP on a frame $\Theta$ and would like to build its underlying basic belief masses. Clearly there is no unique solution: the collection of BBAs on $\Theta$ that induce the same pignistic probability on $\Theta$ is referred to as the set of isopignistic BBAs. A reasonable solution is to be cautious and to select from the isopignistic BBAs the one with the “smallest” amount of committed belief [4, 17]. For this, however, we need to be able to compare BBAs according to the amount of committed belief. We say that $m_1$ is q-less committed than $m_2$ if and only if $q_1(A) \ge q_2(A)$ for every $A \subseteq \Theta$.⁵ It then turns out that the q-least committed (q-LC) BBA among all the isopignistic BBAs is characterised by nested focal sets [14]. A belief (plausibility) function with nested focal sets is referred to as a consonant belief (plausibility) function [11].⁶ Let us label the elements $\theta_i \in \Theta$ so that $\mathrm{BetP}(\theta_1) \ge \mathrm{BetP}(\theta_2) \ge \dots \ge \mathrm{BetP}(\theta_N)$. Then the q-LC BBA has $N$ nested focal sets defined as $A_i = \{\theta_1, \theta_2, \dots, \theta_i\}$, $i = 1, \dots, N$ (the focal sets are nested because $A_1 \subset A_2 \subset \dots \subset A_N$), and the BBA mass allocated to each $A_i$ is given by [14]:
$$m_{\mathrm{LC}}(A_i) = |A_i|\,\bigl(\mathrm{BetP}(\theta_i) - \mathrm{BetP}(\theta_{i+1})\bigr), \qquad i = 1, \dots, N \tag{9}$$
where $\mathrm{BetP}(\theta_{N+1}) = 0$. Using the q-LC isopignistic belief function in the TBM framework is conceptually similar to using the probability function with the largest entropy in probability theory.

⁵ The origin of this order results from the fact that $q(A)$ measures the amount of uncertainty in the context where $A$ is true. Indeed, $q(A)$ is the mass given to $A$ after revising the belief represented by $m$ given the fact that $A$ is true. This mass $q(A)$ is free to be allocated to any subset of $A$ and thus represents the part of belief that is completely uncommitted in the context where $A$ is true. The larger the $q$, the less committed the masses [17]. Note that of all BBAs, the q-LC one is the vacuous BBA: its $q$ function equals 1 for all $A \subseteq \Theta$.
⁶ Mathematically, consonant belief and plausibility functions are also known as necessity and possibility functions, respectively. They form a basis of possibility theory [5].

III. CLASSIFICATION USING LC BELIEF FUNCTION ON CONTINUOUS FRAME
The bulk of this section is devoted to the development of the belief function theory on a continuous space. This development introduces the concepts of the basic belief density (BBD) function, the pignistic density function, etc. But first let us describe the main steps of the proposed classification scheme, to explain why the theoretical development that follows in this section is necessary.

A. Classification Scheme
The basic idea of the proposed classification scheme is to treat the densities $p(z \mid \theta_i)$ as the class-conditioned pignistic densities. Then, as stated in the introduction, we adopt a continuous-valued feature space and proceed with classification in three steps as follows.
Step 1: From $p(z \mid \theta_i)$, build the plausibility density function $\mathrm{pl}^{\mathcal{Z}}(z \mid \theta_i)$ corresponding to the q-LC isopignistic belief density function.
Step 2: Apply the GBT to obtain $m^\Theta(A \mid z)$. Note that: 1) the plausibility density (from Step 1) is all we need to apply (8), and 2) $m^\Theta(A \mid z)$ is a special case of $m^\Theta(A \mid Z)$ expressed by (8).
Step 3: Apply the pignistic transformation (7) to $m^\Theta(A \mid z)$ to compute the pignistic class probabilities over the frame $\Theta$.

Handling multiple independent features or attributes is straightforward. Let $u$ be a feature measurement independent of $z$. Then $m^\Theta(A \mid z)$ is combined with $m^\Theta(A \mid u)$ using the conjunctive rule of combination (6) after Step 2 but before Step 3. This method results from one of the essential properties of the GBT: combining the two BBAs on $\Theta$, or computing the BBA on $\Theta$ induced by the product of the class-conditioned plausibility functions, produces the same result [13]. Steps 2 and 3 of the proposed scheme are straightforward to understand and implement (see [3] for details). Hence in the next two subsections we concentrate on the derivation of the q-LC plausibility density function induced by a pignistic density function whose domain is the real axis $\mathbb{R}$. In order to simplify our presentation, in the remaining text of this section we drop from our notation 1) the conditioning on the class and 2) the superscript denoting the frame (since we deal with a single frame $\mathcal{Z}$).
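The three steps can be tried out end to end on a finite feature frame before moving to $\mathbb{R}$. The sketch below is a discrete analogue only, under the assumption that the feature has been quantised into a few hypothetical cells (`low`, `mid`, `high`) with hypothetical class models `t1` to `t3`; it uses the discrete q-LC construction (9) where the paper uses the continuous derivation that follows. It also lets one check numerically the GBT property quoted above: combining $m^\Theta(\cdot \mid z)$ and $m^\Theta(\cdot \mid u)$ with (6) matches applying (8) to the product of the plausibilities.

```python
from itertools import combinations


def lc_bba(betp):
    """Eq. (9): q-LC BBA of a pignistic distribution on a finite frame.
    Elements are sorted by decreasing BetP; A_i holds the i most probable ones."""
    order = sorted(betp, key=betp.get, reverse=True)
    probs = [betp[e] for e in order] + [0.0]      # BetP(theta_{N+1}) = 0
    m = {}
    for i in range(1, len(order) + 1):
        mass = i * (probs[i - 1] - probs[i])
        if mass > 0:                              # keep focal sets only
            m[frozenset(order[:i])] = mass
    return m


def singleton_pl(m, z):
    """Plausibility of the singleton {z}: mass of the focal sets containing z."""
    return sum(v for A, v in m.items() if z in A)


def gbt(pl_by_class):
    """Eq. (8) for one observation, given pl(z | class) for every class."""
    classes = sorted(pl_by_class)
    m = {}
    for r in range(len(classes) + 1):
        for A in combinations(classes, r):
            mass = 1.0
            for c in classes:
                p = pl_by_class[c]
                mass *= p if c in A else 1.0 - p
            m[frozenset(A)] = mass
    return m


def conjunctive(m1, m2):
    """Eq. (6), unnormalised."""
    out = {}
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            out[B & C] = out.get(B & C, 0.0) + v1 * v2
    return out


def pignistic(m):
    """Eq. (7); assumes m(empty) < 1."""
    conflict = m.get(frozenset(), 0.0)
    betp = {}
    for A, v in m.items():
        for c in A:
            betp[c] = betp.get(c, 0.0) + v / (len(A) * (1.0 - conflict))
    return betp


# Hypothetical class-conditioned pignistic models on three feature cells
models = {
    "t1": {"low": 0.80, "mid": 0.15, "high": 0.05},
    "t2": {"low": 0.50, "mid": 0.40, "high": 0.10},
    "t3": {"low": 0.40, "mid": 0.30, "high": 0.30},
}
z = "mid"
pl_z = {c: singleton_pl(lc_bba(bp), z) for c, bp in models.items()}  # Step 1
m_z = gbt(pl_z)                                                       # Step 2
print(pignistic(m_z))                                                 # Step 3
```

The singleton plausibility of the q-LC BBA built this way equals $\sum_j \min(\mathrm{BetP}(z_j), \mathrm{BetP}(z))$, which is the discrete counterpart of relation (21) derived later for densities.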
Fig. 1. Point $K = (a, b)$ inside triangle $T_{[\alpha,\beta)}$ uniquely defines interval $[a, b) \subseteq [\alpha, \beta)$.
Fig. 2. Graphical representation of (a) belief, (b) commonality, (c) plausibility.
B. Belief Function Theory on ℝ
To fix ideas, we first consider the case where the frame $\mathcal{Z}$ is a segment of the real axis $\mathbb{R}$, discretised into a finite number of intervals. Then we look at the case where the frame is the entire real axis $\mathbb{R}$, with infinite cardinality. Mathematical rigour is omitted from our presentation, but the details can be found in [15].

1) Finite discretisation of ℝ: Let $[\alpha, \beta)$ be a segment of the real axis $\mathbb{R}$ such that $\alpha < \beta$. Consider the frame of discernment $\mathcal{Z} = \{z_1, z_2, \dots, z_L\}$ whose elementary events are the intervals
$$z_\ell = [\underline{z}_\ell, \bar{z}_\ell), \qquad \ell = 1, 2, \dots, L$$
where $\underline{z}_\ell = \alpha + \Delta(\ell - 1)$, $\bar{z}_\ell = \alpha + \Delta \ell$, and $\Delta = (\beta - \alpha)/L$. As usual, the beliefs are expressed over the subsets of $\mathcal{Z}$. However, we restrict ourselves to those belief functions that assign nonzero belief masses only to the simple segments of $\mathbb{R}$. A simple segment is a union of consecutive elements of $\mathcal{Z}$. For example, the subset $\{z_1, z_2\} = [\alpha, \alpha + 2\Delta)$ is a simple segment, but $\{z_1, z_3\} = [\alpha, \alpha + \Delta) \cup [\alpha + 2\Delta, \alpha + 3\Delta)$ is not. Thus the focal sets belong to
$$\mathcal{Z}^L_{[\alpha,\beta)} = \bigl\{ Z_i = [a_i, b_i) : \alpha \le a_i < b_i \le \beta;\; a_i \in \{\underline{z}_1, \dots, \underline{z}_L\};\; b_i \in \{\bar{z}_1, \dots, \bar{z}_L\};\; i = 1, 2, \dots, S_L \bigr\} \cup \{\emptyset\}$$
where $S_L = \sum_{\ell=1}^{L} \ell = L(L+1)/2$. For convenience, let us use the notation $Z_0 = \emptyset$. The function $m: \mathcal{Z}^L_{[\alpha,\beta)} \to [0, 1]$ is a BBA with the property
$$\sum_{i=0}^{L(L+1)/2} m(Z_i) = 1.$$
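To make the discretised frame concrete, the following sketch (an illustration with hypothetical names and masses, not taken from the paper) enumerates the simple segments of $[\alpha, \beta)$ for a given $L$, confirming the count $S_L = L(L+1)/2$, and evaluates bel, q and pl of an interval $[a, b)$ by restricting (3) to (5) to intervals: bel sums the focal intervals inside $[a, b)$, q those containing it, and pl those overlapping it.

```python
def simple_segments(alpha, beta, L):
    """All simple segments [a_i, b_i) made of consecutive cells of width
    delta = (beta - alpha) / L, i.e. the non-empty candidate focal sets."""
    delta = (beta - alpha) / L
    return [(alpha + delta * i, alpha + delta * j)
            for i in range(L)              # index of the first cell
            for j in range(i + 1, L + 1)]  # boundary after the last cell


def interval_bel_q_pl(m, a, b):
    """bel, q and pl of [a, b) for a BBA m given as {(a_i, b_i): mass}."""
    bel = sum(w for (ai, bi), w in m.items() if a <= ai and bi <= b)  # inside
    q = sum(w for (ai, bi), w in m.items() if ai <= a and b <= bi)    # covering
    pl = sum(w for (ai, bi), w in m.items() if ai < b and a < bi)     # overlapping
    return bel, q, pl


segments = simple_segments(0.0, 4.0, 4)
print(len(segments))  # 10 = L(L+1)/2 for L = 4
m = {(0.0, 1.0): 0.5, (0.0, 3.0): 0.3, (2.0, 4.0): 0.2}  # hypothetical BBA
print(interval_bel_q_pl(m, 0.0, 2.0))
```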
There is a very convenient graphical representation of these intervals [12, 19]: every $A = [a, b)$ such that $a, b \in [\alpha, \beta)$ and $a \le b$ corresponds to a single point in the triangle of Fig. 1, and vice versa. This triangle is defined as $T_{[\alpha,\beta)} = \{(x, y) : x, y \in [\alpha, \beta),\; x \le y\}$. To each point in the triangle $T_{[\alpha,\beta)}$ that corresponds to a focal set of BBA $m$, we assign a probability mass equal to the BBA mass. Hence the value of $m([a, b))$ is assigned to the point $(a, b) \in T_{[\alpha,\beta)}$ for every $Z \in \mathcal{Z}^L_{[\alpha,\beta)}$. The result of this assignment is a (discrete) probability distribution on $T_{[\alpha,\beta)}$ (provided that $m(\emptyset) = 0$). The convention for axes $x$ and $y$ (the start and the end point of an interval) is adopted as shown in Fig. 1. The corresponding graphical representations of belief, plausibility and commonality functions are shown in Fig. 2. For example, the function $\mathrm{bel}(Z)$, where $Z = [a, b)$, represents the sum of all masses given to the subsets of $Z$, thus to the intervals $Z_i = [a_i, b_i) \subseteq [a, b)$, i.e., $a_i \ge a$ and $b_i \le b$. Graphically, every mass included in $\mathrm{bel}(A)$ must lie in the shaded triangle of Fig. 2(a), and to compute $\mathrm{bel}(A)$ one has to add up the masses of all the focal sets located in this triangle. Similarly, to compute $q(A)$ and $\mathrm{pl}(A)$, one needs to add up the masses of the focal sets located inside the shaded rectangle of Fig. 2(b) and the shaded area of Fig. 2(c), respectively.

2) Real Axis as a Frame: The case where the frame is the entire real axis $\mathbb{R}$ corresponds to the one we just considered in Section IIIB1, with $L \to \infty$, $\alpha \to -\infty$ and $\beta \to \infty$. The beliefs are now expressed over the set of all intervals of $\mathbb{R}$ plus the empty set; that is, the focal sets belong to
$$\mathcal{Z} = \{Z = [a, b) : a, b \in \mathbb{R},\; a \le b\} \cup \{\emptyset\} \tag{10}$$
where by convention $[a, a)$ denotes a singleton $\{a\}$ (formally, a singleton is a zero-length interval $\lim_{\epsilon \to 0} [a, a + \epsilon)$, which degenerates into a point $a \in \mathbb{R}$). Hence the concepts we introduced in Section IIIB1 remain valid, except that belief masses become belief densities and sums become integrals [10, 15]. Let $m([a, b))$ now be a BBD defined on the domain $\mathcal{Z}$. The value of $m([a, b))$ is assigned to a point of the triangle $T = \{(x, y) \in \mathbb{R}^2 : x \le y\}$. Let $f(a, b) = m([a, b))$. Then $f: T \to [0, \infty)$ is a probability density function on $T$ with the property that:⁷
$$\int_{x=-\infty}^{\infty} \int_{y=x}^{\infty} f(x, y)\, dy\, dx = 1. \tag{11}$$
The belief function $\mathrm{bel}: \mathcal{Z} \to [0, 1]$ is defined as an integral of $f(x, y)$ with the limits of integration defined by the shaded triangle in Fig. 2(a):
$$\mathrm{bel}([a, b)) = \int_{x=a}^{b} \int_{y=x}^{b} f(x, y)\, dy\, dx. \tag{12}$$
Similar expressions can be written for $q([a, b))$ and $\mathrm{pl}([a, b))$; see [10, 15]. Finally, the pignistic probability density function $\mathrm{Bet}f(s)$ in this case is defined on singletons $s \in \mathbb{R}$ as follows [10, 15]:
$$\mathrm{Bet}f(s) = \lim_{\epsilon \to 0} \int_{x=-\infty}^{s} \int_{y=s+\epsilon}^{\infty} \frac{f(x, y)}{y - x}\, dy\, dx. \tag{13}$$
We do not directly put $\epsilon = 0$ in (13) in order to avoid division by zero. A brief, mathematically nonrigorous explanation of (13) follows. Recall the definition of pignistic probability in the discrete case, given by (7), and assume we deal with a normalised BBA, that is, $m(\emptyset) = 0$. Equation (7) states that the pignistic probability $\mathrm{BetP}(\theta_i)$ is a sum of masses $m(A)$, weighted by $1/|A|$, taken over all sets $A$ that contain the singleton $\theta_i$. For the case of a finite discretisation of $\mathbb{R}$ (Section IIIB1), this means that $\mathrm{BetP}(s)$, where $s$ is a singleton, is a weighted sum of all masses that lie in the shaded triangle of Fig. 3: every point in this shaded area defines an interval which contains $s$. When dealing with intervals on $\mathbb{R}$, the summation weights are the reciprocals of the interval lengths, hence the $y - x$ term in the denominator of (13). The limits of integration in (13) correspond to the shaded area of Fig. 3 when $\alpha \to -\infty$ and $\beta \to \infty$.

The theory of belief functions can be extended to $\mathbb{R}^n$ provided every focal set can be defined by a finite number of parameters. For instance, in $\mathbb{R}^2$ focal sets could be rectangles, ellipses, hexagons, etc.

⁷ Normalisation of the BBD as in (11) is not necessary. The integral of $f(x, y)$ over $T$ may be allowed to result in a value less than 1, with the missing belief allocated to the empty set, just as is done in the TBM [17].

Fig. 3. Area of integration for computation of pignistic probability (finite discretisation of ℝ).
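For intuition about (13), consider the simplest case where the belief is carried by finitely many intervals with point masses on $T$ (the discrete setting of Section IIIB1). An interval $[a, b)$ contains $s$ exactly when $a \le s < b$, and the weight $1/(y - x)$ then spreads each mass uniformly over its interval. The sketch below assumes that simplified setting; the function names and masses are hypothetical. It also checks numerically that the resulting density integrates to the total mass.

```python
def bet_density(focal, s):
    """Eq. (13) for finitely many interval masses: an interval [a, b) with
    mass w contains s iff a <= s < b and then contributes w / (b - a)."""
    return sum(w / (b - a) for (a, b), w in focal.items() if a <= s < b)


def integrate_midpoint(g, lo, hi, n=4000):
    """Midpoint rule; exact up to rounding for a piecewise-constant g
    whose breakpoints fall on cell boundaries."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))


focal = {(0.0, 1.0): 0.5, (0.0, 3.0): 0.3, (2.0, 4.0): 0.2}  # hypothetical masses
print(bet_density(focal, 0.5), bet_density(focal, 2.5))
print(integrate_midpoint(lambda s: bet_density(focal, s), 0.0, 4.0))
```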
C. LC BBD on ℝ
Given a pignistic density function $\mathrm{Bet}f$, the q-LC belief density among the set of belief functions isopignistic with $\mathrm{Bet}f$ is a consonant belief density. On the frame $\mathbb{R}$, the focal sets of the q-LC BBD (which by definition are intervals $[a, b) \subset \mathbb{R}$) can be ordered in such a way that each one is contained in the next. We further focus on a unimodal pignistic density on $\mathbb{R}$, with mode $\mu = \arg\max_s \mathrm{Bet}f(s)$. (Multimodal densities can be treated separately as mixtures of unimodal ones.) The focal sets of the q-LC belief density are nested intervals $[a, b)$, and in this case the interval limits have to satisfy the property $\mathrm{Bet}f(a) = \mathrm{Bet}f(b)$ [15]. As a result, for every focal interval $[a, b)$ of the q-LC belief density we have $\mu \in [a, b)$. Another very important property of the focal intervals of the q-LC belief density is that they form a line on the triangle $T$. This line has the following two properties:
1) It starts from $(x, y) = (\mu, \mu)$; the plausibility at this point equals 1.
2) For all symmetric pignistic densities $\mathrm{Bet}f(s)$ (e.g., normal, Laplace, Cauchy) centred at $\mu$, it is a straight line given by
$$y = -(x - 2\mu), \qquad -\infty < x \le \mu.$$
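The line of focal intervals can be traced numerically: for each right end $y \ge \mu$, the left end $x = \varphi(y) \le \mu$ solves $\mathrm{Bet}f(x) = \mathrm{Bet}f(y)$. The sketch below uses plain bisection under the stated unimodality assumption; the function names and the search bound `lo` are illustrative choices, not from the paper. It uses the two densities of Fig. 4, and for the normal case it should reproduce the straight line $x = 2\mu - y$.

```python
import math


def phi(betf, mode, y, lo, tol=1e-12):
    """Left end x = phi(y) of the q-LC focal interval [x, y), y >= mode:
    solves betf(x) = betf(y) on [lo, mode] by bisection, assuming betf is
    unimodal, increasing on [lo, mode], and betf(lo) <= betf(y)."""
    target = betf(y)
    a, b = lo, mode
    while b - a > tol:
        c = 0.5 * (a + b)
        if betf(c) < target:
            a = c  # still below the target level: move right
        else:
            b = c
    return 0.5 * (a + b)


def gamma_betf(s):
    """Gamma pignistic density s * exp(-s), s > 0 (mode 1), as in Fig. 4(b)."""
    return s * math.exp(-s) if s > 0 else 0.0


def normal_betf(s, mu=2.5, sigma=1.0):
    """Normal pignistic density with the Fig. 4(a) parameters."""
    return math.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# A few points (phi(y), y) on the line of focal intervals for the gamma case
for y in (1.5, 2.0, 3.0):
    print(y, phi(gamma_betf, 1.0, y, 0.0))
# Symmetric case: approximately 2*mu - y = 1.3
print(phi(normal_betf, 2.5, 3.7, 2.5 - 10.0))
```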
Fig. 4 shows the line of focal intervals in $T$ for (a) the normal pignistic density with $\mu = 2.5$ and $\sigma = 1$, and (b) the gamma pignistic density $\mathrm{Bet}f(s) = s e^{-s}$ $(s > 0)$, with mode $\mu = 1$.

Fig. 4. Focal sets of q-LC belief density (solid line in upper triangle) induced by (a) normal pignistic density, and (b) gamma pignistic density.

The relationship between $\mathrm{Bet}f(s)$ and any BBD in general is expressed by (13). However, as the q-LC BBD induced by $\mathrm{Bet}f(s)$ is characterised by focal sets that form a line in $T$, the expression in (13) can be simplified. From the discussion so far and Fig. 4 we note that the probability density function on $T$ which corresponds to the q-LC BBD induced by $\mathrm{Bet}f(s)$ can be expressed as
$$f(x, y) = \tilde\kappa(x, y)\, \delta(x - \varphi(y)) \tag{14}$$
if $y \ge \mu \ge x$. Here $\tilde\kappa(x, y)$ is the function that we wish to determine, $\delta(x)$ is the Dirac delta function, and the value of $\varphi(y)$, which is unique due to the unimodality of $\mathrm{Bet}f$, satisfies $\mathrm{Bet}f(\varphi(y)) = \mathrm{Bet}f(y)$, with $y \ge \mu \ge \varphi(y)$. The substitution of (14) into (13) yields, for $s \ge \mu$ [10, 15]:
$$\mathrm{Bet}f(s) = \lim_{\epsilon \to 0} \int_{x=-\infty}^{s} \int_{y=s+\epsilon}^{\infty} \frac{\tilde\kappa(x, y)\, \delta(x - \varphi(y))}{y - x}\, dy\, dx \tag{15}$$
which simplifies to
$$\mathrm{Bet}f(s) = \int_{y=s}^{\infty} \frac{\tilde\kappa(\varphi(y), y)}{y - \varphi(y)}\, dy. \tag{16}$$
Using the notation $\tilde\kappa(\varphi(y), y) = \kappa(y)$ we have
$$\mathrm{Bet}f(s) = \int_{y=s}^{\infty} \frac{\kappa(y)}{y - \varphi(y)}\, dy. \tag{17}$$
By differentiation of (17) we obtain
$$\kappa(s) = -(s - \varphi(s))\, \frac{d\,\mathrm{Bet}f(s)}{ds}. \tag{18}$$
The BBD $\kappa(s)$ is always positive, since $s \ge \varphi(s)$ and $d\,\mathrm{Bet}f(s)/ds < 0$ for $s \ge \mu$.

For model-based classification problems, the belief function theory applies the GBT. From (8) we have seen that the GBT requires the plausibility function to be computed from the BBD. Since the q-LC BBD is a consonant belief function whose focal sets are the points along a line in $T$, we can write, for $x \ge \mu$,
$$\mathrm{pl}(x) = \int_x^\infty \kappa(a)\, da = -\int_x^\infty (a - \varphi(a))\, \frac{d\,\mathrm{Bet}f(a)}{da}\, da. \tag{19}$$
The limits of integration in (19) reflect the fact that only the focal intervals with the property $x \le a \le \infty$ have a nonempty intersection with $x$. Using the differentiation rule $(uv)' = u'v + uv'$ (i.e., integrating by parts) and the property of our unimodal BBD $\lim_{x \to \infty} \mathrm{Bet}f(x) = 0$ (more precisely, that $(x - \varphi(x))\,\mathrm{Bet}f(x) \to 0$ as $x \to \infty$), we obtain
$$\mathrm{pl}(x) = (x - \varphi(x))\, \mathrm{Bet}f(x) + \int_x^\infty \Bigl(1 - \frac{d\varphi(a)}{da}\Bigr) \mathrm{Bet}f(a)\, da. \tag{20}$$
One can also prove that relation (19) can be written as
$$\mathrm{pl}(x) = \int_{-\infty}^{\infty} \min\bigl(\mathrm{Bet}f(a), \mathrm{Bet}f(x)\bigr)\, da. \tag{21}$$
This result was derived in [6, eq. (6)]. Analytic expressions for the q-LC BBD and its plausibility are given below for the Gaussian and the exponential pignistic densities. Expressions for other densities can be derived likewise.⁸

Gaussian Pignistic Density: Suppose the pignistic density is Gaussian, $\mathrm{Bet}f(x) = \mathcal{N}(x; \mu, \sigma)$. In order to work out the q-LC BBD $\kappa(x)$ and its corresponding plausibility $\mathrm{pl}(x)$, we make the standard substitution $y = (x - \mu)/\sigma$. In this case $y - \varphi(y) = 2y$ and thus $d\varphi(y)/dy = -1$. Application of (18) and (20) yields, for $y \ge 0$:
$$\kappa(y) = \frac{2y^2}{\sqrt{2\pi}}\, e^{-y^2/2} \tag{22}$$
$$\mathrm{pl}(y) = \frac{2y}{\sqrt{2\pi}}\, e^{-y^2/2} + \operatorname{erfc}\bigl(y/\sqrt{2}\bigr) \tag{23}$$
where $\operatorname{erfc}(s) = \frac{2}{\sqrt{\pi}} \int_s^\infty e^{-t^2}\, dt$. It then follows that $\kappa(x) = \kappa(y)/\sigma$ and $\mathrm{pl}(x) = \mathrm{pl}(y)$. The two functions are shown in Fig. 5 for $\mu = 1$ and $\sigma = 1.5$.

Exponential Pignistic Density: Let $\mathrm{Bet}f(x)$ be an exponential density:
$$\mathrm{Bet}f(x) = \begin{cases} \dfrac{1}{\theta}\, e^{-(x - a)/\theta}, & x \ge a \\ 0, & x < a \end{cases} \tag{24}$$
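Returning to the Gaussian case, expressions (22) and (23) lend themselves to a quick numerical cross-check against the general relations (17) and (21). The sketch below is not from the paper: the helper names are hypothetical, the integrals are truncated at ±12 standard deviations (assumed negligible), and a composite Simpson rule is used, split at the kinks of the `min` in (21). The `abs()` in `pl_gauss` relies on the symmetry of (21) about $\mu$ for a symmetric density.

```python
import math


def kappa_std(y):
    """Eq. (22): q-LC BBD for the standard normal pignistic density, y >= 0."""
    return 2.0 * y * y * math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def pl_std(y):
    """Eq. (23): plausibility of the singleton y, y >= 0."""
    return (2.0 * y * math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
            + math.erfc(y / math.sqrt(2.0)))


def kappa_gauss(x, mu, sigma):
    """kappa(x) = kappa(y) / sigma with y = (x - mu) / sigma, for x >= mu."""
    return kappa_std((x - mu) / sigma) / sigma


def pl_gauss(x, mu, sigma):
    """pl(x) = pl(y); abs() uses the symmetry of (21) about mu."""
    return pl_std(abs(x - mu) / sigma)


def std_pdf(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0


# Eq. (17) with y - phi(y) = 2y should give back the standard normal density
s = 0.7
print(simpson(lambda y: kappa_std(y) / (2.0 * y), s, s + 12.0), std_pdf(s))
# Fig. 5 parameters: mu = 1, sigma = 1.5
print(pl_gauss(1.0, 1.0, 1.5), pl_gauss(4.0, 1.0, 1.5))
```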