IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 36, NO. 5, SEPTEMBER 1990

Geometrical Properties of Optimal Volterra Filters for Signal Detection

BERNARD PICINBONO, FELLOW, IEEE, AND PATRICK DUVAUT
Abstract-Volterra filters are a particular class of nonlinear filters defined by an extension of the concept of impulse response to the nonlinear case. Linear-quadratic filters are a special example of Volterra filters limited to the second order. In the first part of this paper it is shown that all the results recently published, valid in the linear-quadratic case, can be extended with the appropriate notations to Volterra filters of arbitrary order. In particular, the optimum Volterra filter giving the maximum of the deflection for the detection of a signal in noise is wholly calculated. In the second part several geometrical properties of optimal Volterra filters are investigated by introducing appropriate scalar products. In particular, the concept of the space orthogonal to the signal and the noise alone reference (NAR) property are introduced, allowing a decomposition of the optimal filter that exhibits a relation between detection and estimation. Extensions to the infinite case and relations with the likelihood ratio are also investigated.
I. INTRODUCTION
THE CONCEPT of an optimal linear-quadratic system for detection and estimation was introduced in a previous paper [1]. It was also noted that linear-quadratic systems can be described by a Volterra expansion limited to the second-order term. Volterra filters (VF) have been introduced into many aspects of nonlinear filtering problems, and we shall not mention here all the papers discussing their properties. For an introduction to this field applied to problems similar to those discussed hereafter, the reader should consult [2]-[5]. It is thus natural to extend the results presented in [1] to VF of arbitrary order, and this is the first purpose of this paper. At first glance this extension is quite obvious [6], and to avoid excessive length, we will present the main ideas and fundamental results. Our principal effort will be to introduce simplified notations in such a way that the analogy between the second-order case and the general case appears evident. In fact, the input-output relationship of a VF is extremely tedious to write out, and without our simplified notation the paper would be full of complex equations contributing little to the understanding of the physical meaning of the results. It should be clear, however, that the exact analytical solution of our condensed equations would require extensive calculations.

Manuscript received December 10, 1988; revised September 10, 1989. This work was supported in part by the Direction des Constructions Navales under Contract No. C8848 603 003 from the GERDSM (Groupe d'étude et de recherche de détection sous-marine) du Brusc. The authors are with the Laboratoire des Signaux et Systèmes, ESE, Plateau du Moulon, 91192 Gif-sur-Yvette, France. IEEE Log Number 9036190.
The extension of a second-order to an arbitrary-order VF exhibits new problems which were not analyzed in the linear-quadratic case. Among these problems we are mainly interested in the relation between the optimal VF and the likelihood ratio (LR) filter, which is the absolute optimal system for the test, or the detection, between two simple hypotheses, say the null hypothesis $H_0$, corresponding to the noise-only situation, and the hypothesis $H_1$, corresponding to the presence of a signal. For this purpose filters are considered as vectors in appropriate Hilbert spaces. In particular it is shown that the filters belonging to the space orthogonal to the LR have the NAR property. This is the basis of a decomposition of the optimal VF into two terms that exhibits relations between detection and estimation. Some extensions to the infinite case are also discussed.

II. OPTIMAL VOLTERRA FILTERS
A. Principles of Volterra Filtering

A Volterra filter of order $v$ ($v$ a positive integer) is a nonlinear system whose input-output relationship is described by a finite Volterra expansion. To simplify the presentation we will consider the discrete-time case only, and we call $x[k]$ and $y[k]$ the input and the output, respectively. The output is expressed by
$$
y[k] = \sum_{m=0}^{v} \sum_{\{i_j^m\}} h_m[i_1^m, i_2^m, \ldots, i_m^m]\, x[k-i_1^m]\, x[k-i_2^m] \cdots x[k-i_m^m]. \tag{2.1}
$$
In this expression $k$ and the $i_j^m$ are arbitrary integers, and the symbol $\{i_j^m\}$ means that the sum is taken over all the $i_j^m$'s. The term $h_0$ corresponding to $m = 0$ is a constant and is sometimes omitted. In this latter case, and if $v = 1$, we obtain the classical convolution characterizing linear filters. By an obvious extension of the terminology used in the linear case, we will say that the Volterra filter has a finite impulse response if the integers $i_j^m$ are taken in a finite interval, as, for example, $1 \le i_j^m \le N$. We make this assumption henceforth to avoid convergence problems. In this case the input at time $k$ can be considered as a vector $x$ with components $x[i]$, $1 \le i \le N$, and the output can be
written as
$$
y = h_0 + \sum_{m=1}^{v} \sum_{\{i_j^m\}} h_m[i_1^m, i_2^m, \ldots, i_m^m]\, x[i_1^m]\, x[i_2^m] \cdots x[i_m^m] \tag{2.2}
$$
where the $i_j^m$ satisfy $1 \le i_j^m \le N$.
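As a purely illustrative aside (not part of the original development), the finite expansion (2.2) can be evaluated by brute force with a few lines of Python; the function name and the NumPy/itertools choices below are ours, the indices run from $0$ to $N-1$ rather than $1$ to $N$, and the order-2 cross-check simply restates the linear-quadratic form recalled from [1].

```python
import itertools
import numpy as np

def volterra_output(h0, kernels, x):
    """Evaluate y = h0 + sum_m sum_{i_1..i_m} h_m[i_1,...,i_m] x[i_1]...x[i_m], eq. (2.2).
    `kernels[m-1]` is the order-m kernel h_m, an m-dimensional array of shape (N,)*m."""
    N = len(x)
    y = h0
    for m, h_m in enumerate(kernels, start=1):
        for idx in itertools.product(range(N), repeat=m):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            y += h_m[idx] * prod
    return y

# Example: v = 2 (linear-quadratic filter) with N = 3
rng = np.random.default_rng(0)
N = 3
h1 = rng.standard_normal(N)          # linear kernel
h2 = rng.standard_normal((N, N))     # quadratic kernel
x = rng.standard_normal(N)
y = volterra_output(0.0, [h1, h2], x)
# Cross-check against the closed form h1^T x + x^T h2 x for v = 2
assert np.isclose(y, h1 @ x + x @ h2 @ x)
```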
For the discussion that follows it is necessary to simplify the notation a little more, and to this end we first write
$$
h_m[i_1^m, i_2^m, \ldots, i_m^m] = h_m[\mathbf{i}_m], \tag{2.3}
$$
where $\mathbf{i}_m$ stands for the whole set of indices. Furthermore, we associate with the $v$ kernels $h_m$ the quantity
$$
|h, v\rangle = \{h_1[\mathbf{i}_1], h_2[\mathbf{i}_2], \ldots, h_v[\mathbf{i}_v]\}. \tag{2.4}
$$
The quantity $\lambda|h, v\rangle$ is obtained by replacing the $h$'s by $\lambda h$, and the sum $|h, v\rangle + |h', v\rangle$ is obtained by replacing the $h$'s by $h + h'$. With these rules the quantity $|h, v\rangle$ becomes a vector of a linear space called $\mathbb{R}^N[v]$. For $v = 1$ and without a constant term we find the classical vector space $\mathbb{R}^N$. As in any linear vector space, we can introduce linear operators such as
$$
A|h, v\rangle = |h', v\rangle \tag{2.5}
$$
and all the concepts associated with these operators are valid in $\mathbb{R}^N[v]$. More important is the concept of a scalar product of vectors of $\mathbb{R}^N[v]$. The scalar product of two vectors $|h, v\rangle$ and $|h', v\rangle$ is defined by
$$
\langle h, v | h', v\rangle = \sum_{m=1}^{v} \sum_{\mathbf{i}_m} h_m[\mathbf{i}_m]\, h'_m[\mathbf{i}_m], \tag{2.6}
$$
which is a direct extension of that used in $\mathbb{R}^N$. This allows the introduction of the transpose of an operator $A$, defined by
$$
\langle g, v | A | h, v\rangle = \langle h, v | A^T | g, v\rangle. \tag{2.7}
$$
Finally, an operator $A$ is said to be symmetric if $A = A^T$, and positive definite if for any $|h, v\rangle$ we have
$$
\langle h, v | A | h, v\rangle \ge 0. \tag{2.8}
$$
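To make the notation concrete, here is a minimal sketch (our own illustration, not the paper's) in which $|h, v\rangle$ is stored as a list of kernel arrays and the scalar product (2.6) is a sum of elementwise products; scaling and addition act kernel by kernel, as stated above.

```python
import numpy as np

def scalar_product(h, hp):
    """<h, v | h', v> = sum_m sum_{i_m} h_m[i_m] h'_m[i_m], eq. (2.6).
    `h` and `hp` are lists of kernel arrays [h_1, ..., h_v] with matching shapes."""
    return sum(np.sum(hm * hpm) for hm, hpm in zip(h, hp))

rng = np.random.default_rng(1)
N, v = 3, 2
h  = [rng.standard_normal((N,) * m) for m in range(1, v + 1)]   # |h, v>
hp = [rng.standard_normal((N,) * m) for m in range(1, v + 1)]   # |h', v>
lam_h = [2.0 * hm for hm in h]                 # lambda |h, v>
h_sum = [a + b for a, b in zip(h, hp)]         # |h, v> + |h', v>
print(scalar_product(h, hp))
```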
Let us now define the input vector $|X, v\rangle$. The input signal in (2.2) is a real $N$-dimensional vector $x$ with components $x[i]$. To this vector of $\mathbb{R}^N$ we associate the vector of $\mathbb{R}^N[v]$ defined by
$$
|X, v\rangle \triangleq \{x[i_1],\; x[i_1]x[i_2],\; \ldots,\; x[i_1]x[i_2]\cdots x[i_v]\}. \tag{2.9}
$$
It is the $v$-order input vector associated with the input $x$. With this notation (2.2) can be written as
$$
y = h_0 + \langle h, v | X, v\rangle, \tag{2.10}
$$
which greatly simplifies the notations. Of course the exact calculation of $y$ requires the return to (2.2). It is clear that if $v = 1$ or $v = 2$ we again find the linear and linear-quadratic filters. Finally, to simplify a little more, we will omit the letter $v$ when no ambiguity is possible in the order of the filter. Thus in the following $|h\rangle$ means $|h, v\rangle$ whenever the order $v$ of the filter is well specified.

In detection and estimation problems the input $x$ is usually random. When this is the case, the vector $|X\rangle$ also becomes random, as does the output $y$ given by (2.10). The randomness of the input introduces two points that must be briefly discussed. First, and most important, the output $y$ can be undefined, in the sense that it is not a second-order random variable. To avoid this situation we must assume that the input vector $x$ has finite moments up to order $2v$. This ensures that $y$ has a finite second-order moment, or is a second-order random variable. Secondly, $y$ does not in general have a zero-mean value. To define this value let us introduce
$$
|m\rangle \triangleq E[\,|X\rangle\,], \tag{2.11}
$$
which is a vector of $\mathbb{R}^N[v]$ obtained by taking the expectation value of the terms of (2.9). Then
$$
E(y) = h_0 + \langle h | m\rangle \tag{2.12}
$$
and if $h_0 = -\langle h | m\rangle$, $y$ becomes, of course, zero-mean. For any arbitrary $|h\rangle$ it is then possible to use the constant term $h_0$ in the Volterra expansion (2.2) in such a way that $y$ becomes zero-mean. This procedure is equivalent to using the vector defined by
$$
|X_0\rangle = |X\rangle - |m\rangle, \tag{2.13}
$$
which is, of course, zero-mean, and to writing (2.10) with $h_0 = 0$. This is done in what follows. This was already the case in [1], where linear-quadratic systems were written as $y = h^T x + x^T M x - \mathrm{tr}(CM)$, where $x$ has zero mean and covariance matrix $C$.
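The following sketch (again our own illustration; the zero-mean Gaussian choice for $H_0$ is arbitrary) builds the lifted vector $|X, v\rangle$ of (2.9), estimates $|m\rangle$ of (2.11) from samples, and checks that the output (2.10) written with $|X_0\rangle = |X\rangle - |m\rangle$ is indeed zero-mean under $H_0$.

```python
import numpy as np

def lift(x, v):
    """|X, v> of eq. (2.9): the list {x[i1], x[i1]x[i2], ..., x[i1]...x[iv]}
    stored as successive outer-product arrays."""
    X, outer = [], np.array(1.0)
    for _ in range(v):
        outer = np.multiply.outer(outer, x)
        X.append(outer)
    return X

rng = np.random.default_rng(2)
N, v, n_samples = 3, 2, 20000
h = [rng.standard_normal((N,) * m) for m in range(1, v + 1)]

# |m> = E[|X>] estimated from samples under H0 (eq. (2.11));
# H0 is taken here as zero-mean white Gaussian noise purely for illustration
samples = rng.standard_normal((n_samples, N))
m_vec = [np.mean([lift(x, v)[k] for x in samples], axis=0) for k in range(v)]

def centered_output(x):
    """y = <h | X0> with |X0> = |X> - |m>, eqs. (2.10) and (2.13), so E0(y) = 0."""
    X0 = [Xk - mk for Xk, mk in zip(lift(x, v), m_vec)]
    return sum(np.sum(hk * X0k) for hk, X0k in zip(h, X0))

# ~ 0 (exactly so here, since |m> was estimated from the same samples)
print(np.mean([centered_output(x) for x in samples]))
```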
B. Optimal Volterra Filter for Detection

As in [1], our purpose is to find the optimal filter giving the maximum of the deflection of the output $y$, defined by
$$
D(y) = N/D \tag{2.14}
$$
where
$$
N = [E_1(y) - E_0(y)]^2 \tag{2.15}
$$
$$
D = V_0(y) \tag{2.16}
$$
and $E_0$ and $E_1$ mean expectation value under hypotheses $H_0$ and $H_1$, respectively, while $V_0$ denotes the variance under $H_0$. To calculate the numerator we introduce the notation
$$
E_{1-0}(y) \triangleq E_1(y) - E_0(y) \tag{2.17}
$$
and we define the signal vector by
$$
|s\rangle \triangleq E_{1-0}[\,|X\rangle\,] = E_{1-0}[\,|X_0\rangle\,] = \{E_{1-0}(x[i_1]),\; E_{1-0}(x[i_1]x[i_2]),\; \ldots,\; E_{1-0}(x[i_1]x[i_2]\cdots x[i_v])\}. \tag{2.18}
$$
It is clear that the first elements of (2.18) are the mean value of $x$ and its covariance. The other terms have the same meaning for higher-order moments. With this notation we get
$$
N = \langle h | s\rangle^2. \tag{2.19}
$$
As $y$ is zero-mean under $H_0$, its variance is $E_0(y^2)$, or
$$
V_0(y) = E_0[\langle h | X_0\rangle \langle X_0 | h\rangle] \tag{2.20}
$$
where $|X_0\rangle$ is defined by (2.13). Note that as the expectation values in (2.11), (2.13), and (2.20) are taken under $H_0$, $y$ is zero-mean under $H_0$ but not under $H_1$, which appears in (2.15) and (2.17). Finally the variance can be written
$$
V_0(y) = \langle h | K | h\rangle \tag{2.21}
$$
with
$$
K \triangleq E_0[\,|X_0\rangle\langle X_0|\,]. \tag{2.22}
$$
This is a positive definite symmetric operator. With these notations and using the Schwarz inequality we find that the maximum value of the deflection obtained with a Volterra filter of order $v$ is
$$
d_v^2 = \langle s, v | K^{-1} | s, v\rangle \tag{2.23}
$$
and the corresponding optimal filter is
$$
T_0(x) = a\,\langle s, v | K^{-1} | X_0, v\rangle. \tag{2.24}
$$
We will assume henceforth that $a = 1$. This result gives the classical matched filter [7] for $v = 1$, and the optimal linear-quadratic filter for $v = 2$. The exact calculation of this filter requires the solution of the equation
$$
K|\hat{s}\rangle = |s\rangle, \tag{2.25}
$$
which can, of course, be a tedious job for high values of $v$. As a result, $T_0$ takes the form
$$
T_0(x) = \langle \hat{s} | X_0\rangle. \tag{2.26}
$$
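As an illustration of (2.22)-(2.26) (not taken from the paper), the sketch below flattens $|X, v\rangle$ into an ordinary vector so that $K$ becomes a matrix estimated from samples; the Gaussian models and the signal are arbitrary choices, and a pseudo-inverse is used because the flattened representation duplicates the symmetric entries $x_i x_j = x_j x_i$, which makes $K$ rank-deficient.

```python
import numpy as np

def lift_flat(x, v):
    """Flatten |X, v> of eq. (2.9) into one 1-D array so that the operator K of
    eq. (2.22) becomes an ordinary matrix."""
    parts, outer = [], np.array(1.0)
    for _ in range(v):
        outer = np.multiply.outer(outer, x)
        parts.append(outer.ravel())
    return np.concatenate(parts)

rng = np.random.default_rng(3)
N, v, n = 4, 2, 50000
s_true = np.array([1.0, 0.5, 0.0, -0.5])        # illustrative deterministic signal
noise0 = rng.standard_normal((n, N))             # samples under H0
noise1 = rng.standard_normal((n, N)) + s_true    # samples under H1

X0_raw = np.array([lift_flat(x, v) for x in noise0])
X1_raw = np.array([lift_flat(x, v) for x in noise1])
m  = X0_raw.mean(axis=0)                         # |m>, eq. (2.11)
X0 = X0_raw - m                                  # |X0>, eq. (2.13), under H0
s  = X1_raw.mean(axis=0) - X0_raw.mean(axis=0)   # |s>, eq. (2.18)
K  = X0.T @ X0 / n                               # K = E0[|X0><X0|], eq. (2.22)

# Minimum-norm solution of K|s^> = |s>, eq. (2.25); pinv handles the exact
# rank deficiency coming from the duplicated x_i x_j = x_j x_i coordinates.
s_hat = np.linalg.pinv(K) @ s
d2 = s @ s_hat                                   # deflection d_v^2, eq. (2.23)

def T_opt(x):
    """Optimal Volterra filter T_0(x) = <s^ | X0>, eq. (2.26)."""
    return s_hat @ (lift_flat(x, v) - m)

print(d2, T_opt(noise1[0]))
```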
III. GEOMETRICAL CONCEPTS FOR FINITE-ORDER VOLTERRA FILTERS

A. The Space $\mathbb{R}^N[v, M]$

Consider an arbitrary symmetric positive definite operator $M$ and two vectors $|a\rangle$ and $|b\rangle$ of $\mathbb{R}^N[v]$. The quantity $\langle a | M | b\rangle$ satisfies all the requirements necessary to define a scalar product, written as
$$
\langle a | b\rangle_M = \langle a | M | b\rangle. \tag{3.1}
$$
The space $\mathbb{R}^N[v, M]$ is the space deduced from $\mathbb{R}^N[v]$ in which the scalar product of vectors is defined by (3.1). We obviously have $\mathbb{R}^N[v, I] = \mathbb{R}^N[v]$, where $I$ is the identity operator. In this space, orthogonal vectors are called orthogonal $(\perp, M)$ and denoted by
$$
|a\rangle \perp(M)\, |b\rangle. \tag{3.2}
$$
It is obvious that this simply means $\langle a | M | b\rangle = 0$. As the projection is a concept deduced from orthogonality, we will use the term projection $(\perp, M)$. In other words, the projection $(\perp, M)$ onto a subspace $S$ is written $\mathrm{proj}(\perp, M)[\,|a\rangle \,|\, S\,]$ and is defined by the orthogonality principle, or
$$
\{|a\rangle - \mathrm{proj}(\perp, M)[\,|a\rangle \,|\, S\,]\} \perp(M)\, |b\rangle \quad \text{for any vector } |b\rangle \text{ of } S. \tag{3.3}
$$

B. The Space $H_v$

The output $y(x)$ of a VF considered in Section II is a zero-mean second-order random variable written as $y(x) = \langle h, v | X_0, v\rangle$. This output can then be considered as a vector of the space $L_2$ of scalar second-order random variables. In this space the scalar product between two vectors $u$ and $v$, noted $(u, v)$, is $E_0[uv]$, and the orthogonality as well as the projection is written with the symbol $\perp$ to avoid any confusion with the concepts introduced before. Note that the expectation value is taken under the hypothesis $H_0$, as defined after (2.16). The space $H_v$ is the subspace of $L_2$ containing all the outputs of VF with zero-mean value, or
$$
H_v = \{F(x) \mid F(x) = \langle h, v | X_0, v\rangle,\; F(x) \in L_2\}. \tag{3.4}
$$
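A minimal numerical sketch of the projection $(\perp, M)$ defined by (3.1)-(3.3), with an arbitrary positive definite $M$ and a subspace spanned by the columns of a matrix (all names below are ours, not the paper's):

```python
import numpy as np

def proj_M(a, S, M):
    """proj(perp, M)[a | span(S)]: the M-orthogonal projection of `a` onto the span
    of the columns of `S`, defined by the orthogonality principle (3.3)."""
    G = S.T @ M @ S                          # Gram matrix in the <.|.>_M scalar product
    coeffs = np.linalg.solve(G, S.T @ M @ a)
    return S @ coeffs

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)                  # symmetric positive definite M
a = rng.standard_normal(n)
S = rng.standard_normal((n, 2))              # a 2-dimensional subspace
p = proj_M(a, S, M)
# Orthogonality principle (3.3): <a - p | b>_M = 0 for every b in the subspace
print(S.T @ M @ (a - p))                     # numerically ~ 0
```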
C. The Signal Subspace and Its Complement

Consider the vector $|s\rangle$ of the space $\mathbb{R}^N[v, M]$ defined by (2.18) and the subspace $\mathbb{R}^N[v, M; s_\perp]$ of vectors of $\mathbb{R}^N[v, M]$ that are orthogonal $(\perp, M)$ to $|s\rangle$. Using the projection theorem, any vector $|a\rangle$ of $\mathbb{R}^N[v, M]$ can be written as
$$
|a\rangle = a_s |s\rangle + |a, s_\perp\rangle \tag{3.5}
$$
where the component $a_s$ is given by
$$
a_s = \langle s | M | s\rangle^{-1} \langle s | M | a\rangle. \tag{3.6}
$$
Furthermore,
$$
|s\rangle \perp(M)\, |a, s_\perp\rangle, \tag{3.7}
$$
or, equivalently,
$$
\langle s | M | a, s_\perp\rangle = 0. \tag{3.8}
$$

D. The Signal Decomposition of the Optimal Filter

The optimal filter (2.26) can also be written as
$$
T_0(x) = \langle a | M | X_0\rangle = \langle a | X_0\rangle_M \tag{3.9}
$$
where
$$
|a\rangle = M^{-1} K^{-1} |s\rangle = M^{-1} |\hat{s}\rangle. \tag{3.10}
$$
Applying the decomposition (3.5) to the vectors $|a\rangle$ and $|X_0\rangle$ of (3.9) and taking into account the various orthogonality $(\perp, M)$ properties, we get
$$
T_0(x) = T_s(x) + T_{s\perp}(x) \tag{3.11}
$$
with
$$
T_s(x) = a_s \langle s | M | X_0\rangle. \tag{3.12}
$$
The term $T_{s\perp}(x)$ is deduced from (3.11) with the values (3.9) and (3.12) for $T_0(x)$ and $T_s(x)$, respectively. Let us investigate a little more precisely, in a particular case, the meaning of the term $T_s(x)$. Suppose that $M = I$, which is the more common situation. We deduce from (3.12)
$$
T_s(x) = d_v^2\, s^{-2} \langle s | X_0\rangle \tag{3.13}
$$
where $d_v^2$ is given by (2.23) and $s^2 = \langle s | s\rangle$. Suppose now,
for simplification, that $v = 2$, which corresponds to the linear-quadratic systems studied in [1]. In this case we obtain easily
$$
T_s(x) = d_2^2\, s^{-2}\,[\,s^T x + x^T K x - \mathrm{tr}(CK)\,] \tag{3.14}
$$
where $s = E_{1-0}[x]$, $K = E_{1-0}[x x^T]$, and $C = E_0[x x^T]$. The first term in the bracket of (3.14) is the output of the matched filter in white noise, sometimes also called the correlation receiver, because it performs a correlation between the input observation and the given vector $s$. The second term plays a similar role, but for a quadratic receiver. In conclusion, the term $T_s(x)$ uses directly the primary data on the hypotheses $(s, K, C)$ without any particular processing. This processing is performed through the term $T_{s\perp}(x)$.
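For $v = 1$ and $M = I$ the decomposition (3.11) can be checked numerically in closed form; the covariance and signal below are arbitrary, and the sketch is ours rather than the authors'.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)          # noise covariance under H0 (here K = C for v = 1)
s = np.array([1.0, -0.5, 0.25, 0.0]) # signal vector |s>

s_hat = np.linalg.solve(C, s)        # |s^> = K^{-1}|s>, eq. (2.25)
d2 = s @ s_hat                       # d_v^2, eq. (2.23)
a_s = d2 / (s @ s)                   # a_s of eq. (3.6) with M = I and |a> = |s^>
a_perp = s_hat - a_s * s             # |a, s_perp>, orthogonal to |s>

x0 = rng.multivariate_normal(np.zeros(N), C)  # one centered observation |X0>
T0  = s_hat @ x0                     # T_0(x) = <s^ | X0>, eq. (2.26)
Ts  = a_s * (s @ x0)                 # T_s(x) = d_v^2 s^{-2} <s | X0>, eq. (3.13)
Tsp = a_perp @ x0                    # T_{s_perp}(x)
print(np.isclose(T0, Ts + Tsp), s @ a_perp)   # decomposition (3.11); <s|a,s_perp> ~ 0
```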
IV. GEOMETRY IN $L_2$ FOR THE OPTIMAL VOLTERRA FILTER

A. Basic Relations and the Deflection Property

Let us assume that the observation vector $x(\omega)$ of $\mathbb{R}^N$ is a random variable with probability densities $p_0(x)$ and $p_1(x)$ under $H_0$ and $H_1$, respectively. The LR appearing in statistical decision theory is defined by
$$
L(x) \triangleq p_1(x)/p_0(x). \tag{4.1}
$$
We assume in what follows that $L(x) \in L_2$, which means that
$$
(L, L) = E_0(L^2) \triangleq I_2 < +\infty. \tag{4.2}
$$
There are some probability densities for which this is not satisfied. Such situations require a different approach from the following and are thus outside the scope of our discussion. In detection problems it is interesting to introduce the displaced LR, $R(x)$, defined by [8]
$$
R(x) = L(x) - 1. \tag{4.3}
$$
It is clear that $E_0(R) = 0$ and
$$
(R, R) = I_2 - 1 \triangleq d^2. \tag{4.4}
$$
Property 4.1: Any function $F(x)$ of $L_2$ satisfies the "deflection property" characterized by
$$
(R, F) = E_{1-0}[F(x)]. \tag{4.5}
$$
Proof: It is a direct consequence of the definition of the scalar product by $E_0[R(x)F(x)]$, where $R(x)$ is replaced by (4.3). □

Comments: The term "deflection" is due to the fact that (4.5) gives the numerator of the deflection criterion, as seen in (2.15). As all the components of $|X, v\rangle$ defined by (2.9) are vectors of $L_2$, we deduce immediately from (2.18) that $|s\rangle$ may be written as
$$
|s\rangle = (R, |X\rangle) = (R, |X_0\rangle). \tag{4.6}
$$
We can apply that to (3.9) and (3.12), which gives
$$
E_{1-0}[T_0(x)] = E_{1-0}[T_s(x)] = d_v^2 \tag{4.7}
$$
where $d_v^2$ is defined by (2.23).
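A quick Monte Carlo illustration of Property 4.1 for a toy scalar Gaussian mean-shift problem (our own example; the values of $\theta$ and the choice of $F$ are arbitrary): both sides of (4.5) are estimated from samples and agree.

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n = 0.7, 400000
x0 = rng.standard_normal(n)                    # samples under H0: N(0, 1)
x1 = theta + rng.standard_normal(n)            # samples under H1: N(theta, 1)

def R(x):
    """Displaced likelihood ratio R(x) = L(x) - 1 for N(0,1) vs N(theta,1), eq. (4.3)."""
    return np.exp(theta * x - 0.5 * theta**2) - 1.0

def F(x):
    """Any function of L2; x**2 is an arbitrary choice."""
    return x**2

lhs = np.mean(R(x0) * F(x0))                   # (R, F) = E0[R F]
rhs = np.mean(F(x1)) - np.mean(F(x0))          # E_{1-0}[F], right side of (4.5)
print(lhs, rhs)                                # both ~ theta**2
```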
B. The NAR Subspace

Let us introduce the subspace of $L_2$ defined by
$$
H_N = \{F(x) \mid (F, R) = 0\}, \tag{4.8}
$$
which is then the subspace of functions of the observation $x$ orthogonal $(\perp)$ to $R$. It is clear from (4.5) that any vector of this subspace satisfies
$$
E_1(F) = E_0(F), \tag{4.9}
$$
which is called the NAR (noise alone reference) property [7], [9]. This means that the mean value of $F$ is the same under $H_0$ and $H_1$, and as $H_0$ is the noise-only hypothesis, while $H_1$ corresponds to the signal-plus-noise, the mean value of $F$ is a property of the noise alone. Furthermore, it is clear from (2.15) that any function of this space gives a null deflection. It is important to note that the NAR property is only valid in the mean, as seen in (4.9). It is sometimes possible to introduce an almost surely NAR property, which is especially the case for $v = 1$ and for the detection of a deterministic signal in noise. In fact, the observation vector is $n(\omega)$ under $H_0$, and $n(\omega) + s$ under $H_1$. If we use (3.5) in $\mathbb{R}^N = \mathbb{R}^N[1, I]$, we obtain
$$
n(\omega) = n_s(\omega)\, s + n_{s\perp}(\omega) \tag{4.10}
$$
where $n_{s\perp}(\omega)$ is the projection of the noise vector $n(\omega)$ onto the subspace orthogonal to $s$. From this it is clear that $n_{s\perp}(\omega)$ is independent of the presence or absence of a signal.

Property 4.2: The function $T_{s\perp}(x)$ introduced in (3.11) belongs to the NAR subspace.

Proof: From (3.11) we get $T_{s\perp}(x) = T_0(x) - T_s(x)$. As a result of (4.7), we deduce that $E_{1-0}[T_{s\perp}(x)] = 0$, and this gives with (4.5) $(R, T_{s\perp}(x)) = 0$, which characterizes a vector of the NAR subspace. □

C. Optimal VF and the Likelihood Ratio

Property 4.3: The projection $(\perp)$ of the function $R(x)$ given by (4.3) onto the subspace $H_v$ defined by (3.4) is the optimal filter (3.9), or
$$
\mathrm{proj}(\perp)[\,R(x) \,|\, H_v\,] = T_0(x). \tag{4.11}
$$
Proof: We have to show that for any element $G_v(x)$ of $H_v$ defined by
$$
G_v(x) = \langle g, v | X_0, v\rangle \tag{4.12}
$$
we have
$$
(G_v, R - T_0) = 0. \tag{4.13}
$$
This last equation can be written
$$
E_0\{\langle g, v | X_0, v\rangle\,[\,R(x) - \langle X_0, v | K^{-1} | s, v\rangle\,]\} = 0. \tag{4.14}
$$
Using (4.6), this equation becomes
$$
\langle g, v |\,[\,|s, v\rangle - K K^{-1} |s, v\rangle\,] = 0, \tag{4.15}
$$
which gives the result. □
Comments: This property establishes the relation between the likelihood ratio $L(x)$, or its displaced version $R(x)$, and optimal Volterra filtering. It is a consequence of the fact that the filter with zero-mean value that maximizes the deflection without any constraint is $R(x)$ [8]. As the deflection can be interpreted as a distance [8], it is natural that the optimal Volterra filter is the projection of $R(x)$ onto the Volterra subspace. Furthermore, note that in the linear case ($v = 1$) it is possible to use either $R(x)$ or $L(x)$ if $E_0(x) = 0$. In fact, the orthogonality relation (4.14) can be written
$$
E_0\{g^T x\,[\,L(x) - x^T h\,]\} = 0, \qquad \forall\, g. \tag{4.16}
$$
If $E_0(x) = 0$, then $E_0[x L(x)] = E_1(x) = s$, and (4.16) gives
$$
\Gamma h = s \tag{4.17}
$$
where $\Gamma = E_0(x x^T)$. This is the equation defining the classical linear matched filter. Of course, as $E_0(x) = 0$, we have $E_0[x L(x)] = E_0[x R(x)]$, and the matched filter is the projection of the LR, or its displaced version, onto the subspace of linear filters.
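The linear case can be checked directly: for Gaussian noise with covariance $\Gamma$ and a deterministic signal $s$, the sketch below (our illustration, with arbitrary numerical values) solves $\Gamma h = s$ of (4.17) and verifies the orthogonality relation (4.16) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(7)
N, n = 3, 300000
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)                   # Gamma = E0(x x^T), noise covariance
s = np.array([1.0, 0.5, -0.5])                # deterministic signal under H1

h = np.linalg.solve(C, s)                     # Gamma h = s, eq. (4.17): matched filter

# Check the projection property: E0{ g^T x [R(x) - h^T x] } ~ 0 for any g, cf. (4.16)
Ci = np.linalg.inv(C)
x0 = rng.multivariate_normal(np.zeros(N), C, size=n)   # samples under H0
Rx = np.exp(x0 @ Ci @ s - 0.5 * s @ Ci @ s) - 1.0      # displaced LR for Gaussian noise
g = rng.standard_normal(N)
print(np.mean((x0 @ g) * (Rx - x0 @ h)))               # close to 0
```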
D. Scalar Products in $H_v$

Property 4.4: The scalar products in the $L_2$ sense between the functions $R(x)$, $T_{s\perp}(x)$, $T_s(x)$, and $T_0(x)$ are given in Table I.

TABLE I
SCALAR PRODUCTS BETWEEN $R$, $T_0$, $T_s$, AND $T_{s\perp}$

                 $R$        $T_0$      $T_s$            $T_{s\perp}$
  $R$            $d^2$      $d_v^2$    $d_v^2$          $0$
  $T_0$          $d_v^2$    $d_v^2$    $d_v^2$          $0$
  $T_s$          $d_v^2$    $d_v^2$    $d_v^2 + k^2$    $-k^2$
  $T_{s\perp}$   $0$        $0$        $-k^2$           $k^2$

Proof: At first we deduce from (4.11) that $R(x) - T_0(x)$ is orthogonal to $H_v$, and as $T_0(x)$, $T_s(x)$, and $T_{s\perp}(x)$ are vectors of $H_v$, we obtain
$$
(R, T_0) = (T_0, T_0); \quad (R, T_s) = (T_0, T_s); \quad (R, T_{s\perp}) = (T_0, T_{s\perp}). \tag{4.18}
$$
Furthermore, we deduce from (2.22), (2.23), and (2.24) with $a = 1$ that
$$
(T_0, T_0) = d_v^2. \tag{4.19}
$$
Property 4.2 gives
$$
(R, T_{s\perp}) = 0. \tag{4.20}
$$
As a result we deduce from (3.11) and (4.19) that $(T_0, T_s) = d_v^2$. Finally, we have
$$
(T_s, T_{s\perp}) = (T_0 - T_{s\perp}, T_{s\perp}) = -(T_{s\perp}, T_{s\perp}) \triangleq -k^2. \tag{4.21}
$$
This term $k^2$ is equal to $(T_s, T_s) - (T_0, T_0)$, and using (3.12) we get
$$
k^2 = \langle s | M | s\rangle^{-2}\,[\,d_v^4\, \langle s | M K M | s\rangle - d_v^2\, \langle s | M | s\rangle^2\,]. \tag{4.22}
$$
□

Comments: Let us present some consequences of these relations. We see in Table I that $T_{s\perp}(x)$ is orthogonal $(\perp)$ to $R(x)$ and $T_0(x)$. This result will be extended in Section IV-E. If $k^2 \ne 0$, the functions $T_{s\perp}(x)$ and $T_s(x)$ are not orthogonal in $L_2$ even though they are constructed from two orthogonal $(\perp, M)$ vectors, as in (3.5). We deduce from the Schwarz inequality that $k^2 = 0$ for any value of the signal vector $|s\rangle$ if and only if
$$
M = \lambda K^{-1} \tag{4.23}
$$
where $\lambda$ is an arbitrary positive scalar that can be assumed to be equal to 1. In this case, $T_{s\perp}(x) = 0$ and $T_0(x) = T_s(x)$. For $v = 1$, we find the classical reproducing kernel Hilbert space (RKHS) used in the case of Gaussian noise in [10] and [11]. The reproducing property may also be found in the space $\mathbb{R}^N[v, K^{-1}]$, but it is not in the scope of this paper. All these geometrical properties are pictured in Fig. 1. In this figure the "horizontal plane" represents the subspace $H_v$ that contains the vectors $T_0$, $T_s$, and $T_{s\perp}$. The projection of $R$ onto this subspace is $T_0$. Furthermore, $T_{s\perp}$ is orthogonal to $R$ and $T_0$. Finally, the distance between $R$ and $T_0$ is, of course,
$$
d_{\perp,v}^2 = d^2 - d_v^2. \tag{4.24}
$$

Fig. 1. Representation of the vectors $R$, $T_0$, and $T_{s\perp}$.

Let us now discuss the consequence of an increase in the order of the Volterra filter. Consider the subspace $H_{v+1}$ defined as $H_v$ by (3.4). It is clear that $H_v \subset H_{v+1}$, and we can thus write
$$
H_{v+1} = H_v + H_{v+1, \perp} \tag{4.25}
$$
where $H_{v+1, \perp}$ is the subspace of $H_{v+1}$ orthogonal $(\perp)$ to $H_v$. We then deduce from (4.11) that
$$
T_{0, v+1}(x) = T_{0, v}(x) + \mathrm{proj}(\perp)[\,R(x) \,|\, H_{v+1, \perp}\,]. \tag{4.26}
$$
As $d_v^2$ defined by (2.23) is equal to $E[T_0^2(x)]$ defined by
(2.24), where $a = 1$, we deduce from the orthogonality of the terms of (4.26) that
$$
d_{v+1}^2 = d_v^2 + \delta_v^2 \tag{4.27}
$$
where $\delta_v^2$ is the increment of the deflection when passing from a Volterra filter of order $v$ to one of order $v + 1$. It is clear that in this operation the distance $d_{\perp,v}^2$ defined by (4.24) decreases.
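The increase of the deflection with the order, as in (4.27), can be seen on a toy problem where the signal is a pure variance change (our own example, not the paper's): the linear deflection is essentially zero while the order-2 deflection is strictly positive.

```python
import numpy as np

def deflection(feats0, feats1):
    """d_v^2 = <s|K^{-1}|s>, eq. (2.23), estimated from feature samples under H0/H1."""
    s = feats1.mean(axis=0) - feats0.mean(axis=0)        # |s>, eq. (2.18)
    X0 = feats0 - feats0.mean(axis=0)                    # |X0> under H0
    K = X0.T @ X0 / len(X0)                              # eq. (2.22)
    return s @ np.linalg.pinv(K) @ s

rng = np.random.default_rng(8)
n, sigma = 500000, 1.5
x0 = rng.standard_normal(n)            # H0: N(0, 1)
x1 = sigma * rng.standard_normal(n)    # H1: N(0, sigma^2), a pure variance change

feats1_0, feats1_1 = np.column_stack([x0]), np.column_stack([x1])          # order v = 1
feats2_0, feats2_1 = np.column_stack([x0, x0**2]), np.column_stack([x1, x1**2])  # v = 2
print(deflection(feats1_0, feats1_1))   # ~ 0: a linear filter cannot detect this signal
print(deflection(feats2_0, feats2_1))   # ~ (sigma^2 - 1)^2 / 2 > 0: the increment
                                        # delta_v^2 of eq. (4.27)
```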
E. Interpretation of (3.11) in Terms of Estimation

Consider the subspace $H_v \cap H_N$. It is the subspace of functions $a(x)$ of the form $a(x) = \langle a | X_0\rangle$ orthogonal $(\perp)$ to $R(x)$, with the NAR property introduced in Section IV-B.

Property 4.5: The optimal filter $T_0(x)$ belongs to $H_v$ and is orthogonal to $H_v \cap H_N$.

Proof: Any element $a(x)$ of $H_v$ may be written $\langle a | X_0\rangle$, and if it belongs to $H_N$ we have also $(a, R) = 0$, from the definition (4.8). Since
$$
(a, R) \triangleq E_0[\langle a | X_0\rangle R(x)] = \langle a | s\rangle \tag{4.28}
$$
from (4.6), $a(x)$ as an element of $H_v \cap H_N$ is characterized by $\langle a | s\rangle = 0$. But then $(T_0, a) = E_0[\langle s | K^{-1} | X_0\rangle \langle X_0 | a\rangle] = \langle s | K^{-1} K | a\rangle = \langle s | a\rangle = 0$, which gives the result. □

The functions of the subspace $S$ considered here can be written as $\langle h | X_0, s_\perp\rangle$. Furthermore, since $(R, |X_0, s_\perp\rangle) = 0$, $S$ is a subspace of $H_v \cap H_N$. As a consequence, $\hat{x}_s = \mathrm{proj}(\perp)[\,x_s \,|\, S\,]$, which gives (4.34).

Comments: We can then write (4.32) in the form
$$
T_0(x) = d_v^2\,[\,x_s - \hat{x}_s\,] = d_v^2\, \tilde{x}_s. \tag{4.35}
$$
In this expression $x_s$ is the component of the vector $|X_0\rangle$ in the signal direction $|s\rangle$. We can also calculate $|X_0, s_\perp\rangle$ by (3.6) and solve the mean-square estimation problem of $x_s$ in terms of $|X_0, s_\perp\rangle$, which yields $\hat{x}_s$. It is worth noting that in this estimation problem the innovation is nothing but $\tilde{x}_s$. Using the previous results we easily get
$$
E[\tilde{x}_s^2] = \langle s | K^{-1} | s\rangle^{-1} = d_v^{-2} \tag{4.37}
$$
$$
E[\hat{x}_s^2] = E[x_s^2] - d_v^{-2}. \tag{4.38}
$$
The relation (4.35) has been obtained by a completely different method when $M = I$, for the linear matched filter in [7] and for linear-quadratic filters in [1]. It is then valid for arbitrary $M$ defining the scalar product (3.1) and for an arbitrary value of the order $v$ of the VF. It is interesting to note that if we take $M = K^{-1}$, we get