236 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-32, NO. 2, MARCH 1986
Computation of Minimum Cross Entropy Spectral Estimates: An Unconstrained Dual Convex Programming Method

PATRICK L. BROCKETT, ABRAHAM CHARNES, AND KWANG H. PAICK

Abstract-The minimum cross entropy spectral analysis procedure (a generalization of maximum entropy spectral analysis) is formulated as a convex programming problem, and its unconstrained dual convex programming problem is shown. In this dual setting the Lagrange multipliers are precisely the dual variables, and the numerical solution values are easily determined by any of a number of nonlinear programming codes. This result vastly simplifies the computation of all such spectral density estimates.

I.
INTRODUCTION

MINIMUM ENTROPY spectral density estimation and its generalizations have become an effective technique for estimating the power spectrum. When prior or initial estimates of the spectrum are available, the technique of minimum cross entropy (also called minimum discrimination information (MDI), directed divergence, I-divergence, relative entropy, or Khinchin-Kullback-Leibler (K2L) estimation) can be used to incorporate the prior estimate in an optimal manner. This technique has also been extended to include multiple signals [1] and weighted initial estimates [2]. For reviews and justifications of this technique see [2]-[4]. It has been applied in numerous fields, e.g., speech processing [1]-[4], statistical mechanics and thermodynamics [6]-[12], statistics [13]-[15], reliability theory [16], [17], traffic networks [18], image processing [19], [20], and, of course, maximum entropy spectral analysis [1]-[4].

In this paper we formulate minimum cross entropy (MCE) estimation by using an unconstrained dual convex program. In this dual setting the Lagrange multipliers in the original minimum cross entropy spectral density estimate are precisely the dual variables. The solution values can be easily obtained numerically because of the simple unconstrained nature of the dual problem.

A summary of the paper is as follows. Section II contains the mathematical formulation of minimum cross entropy in its primal form. In Section III we present a general review of spectral analysis and a development of spectral estimation as done by Shore in [3]. In Section IV we present the unconstrained dual formulation for Lagrange multipliers. A concluding discussion and extension is given in Section V.

Manuscript received May 14, 1984; revised July 29, 1985. This work was supported in part by the Office of Naval Research under Contracts N00014-81-K-0415 and N00014-82-K-0295.
P. L. Brockett is with the Department of Finance, University of Texas at Austin, Austin, TX 78712.
A. Charnes and K. H. Paick are with the Center for Cybernetic Studies, CBA 5.202, University of Texas at Austin, Austin, TX 78712.
IEEE Log Number 8406620.

II.
MINIMUM CROSS ENTROPY
The information theoretic approach of density estimation is based upon the mean information for discrimination between two densities. Mathematically, the problem is to estimate that density function $p$ which is "as close as possible" to some given function $q$, and which satisfies certain given moment constraints, e.g.,

$$\min_{p} I(p \mid q) = \int p(x) \ln\left[\frac{p(x)}{q(x)}\right] \lambda(dx) \tag{2.1}$$

subject to

$$\int h_1(x)\,p(x)\,\lambda(dx) = \theta_1 = 1, \quad \ldots, \quad \int h_k(x)\,p(x)\,\lambda(dx) = \theta_k. \tag{2.2}$$

Here $\lambda$ is some dominating measure for $p$ and $q$ (usually Lebesgue measure in the continuous case, or counting measure in the discrete case), $(\theta_1, \ldots, \theta_k)$ are given constant values for a known set of functions $h_1, \ldots, h_k$, and $h_1(x) = 1$. The explicit calculation of the MCE density subject to the given constraints is carried out by Lagrange multipliers. Introducing a Lagrange multiplier $\beta_i$ for constraint $i$, it can be shown that the optimizing density $p^*$ is given by

$$p^*(x) = q(x) \exp\left[-\sum_{i=1}^{k} \beta_i h_i(x)\right] \tag{2.3}$$
(see Kullback [21] for details). We shall call (2.3) the MCE density subject to the constraints. The parameters $\beta_i$, $i = 1, \ldots, k$, are chosen so that the density (2.3) satisfies all the moment constraints (2.2). Since solving for the MCE estimate $p^*(x)$ entails solving highly nonlinear constrained equations, it has been difficult to solve for the $\beta_i$ explicitly in order to obtain a closed-form solution expressed directly in terms of the known expected values $\theta_i$ rather than in terms of the Lagrange multipliers [1]-[3]. The purpose of this paper is to develop a general MCE estimation from a dual convex programming approach and to establish analytical properties of the estimates directly from this duality.

The principle of maximum entropy is a special case of MCE under the condition $q(x) = 1$ in (2.1). Of course, maximum entropy here refers to the $x \log x$ entropy function, not the $\log x$ entropy functional form. The question of which entropy functional form should be used in different situations is a controversial issue. This issue was discussed by Frieden [22] and Johnson and Shore [23]. Here we use $x \log x$ as a functional for the posterior probability density $p$, while Lang and McClellan [30] use the $\log x$ functional as a functional for the spectrum estimate.

0018-9448/86/0300-0236$01.00 © 1986 IEEE

III.
SPECTRAL ANALYSIS AND MCE

This section is a general review of spectral analysis and follows the development of spectrum estimation by Shore in [3], using some notation and results from [1], [2], and [4]. In MCE spectral analysis it is assumed that the time-domain signal is a stationary random process. The discrete spectrum approximation to the stationary random process has the form

$$s(t) = \sum_{k=1}^{N} \left\{ a_k \cos(2\pi f_k t) + b_k \sin(2\pi f_k t) \right\} \tag{3.1}$$

where the $a_k$ and $b_k$ are random variables and the $f_k$ are nonzero frequencies that are not necessarily uniformly spaced. Since the power at each frequency can be expressed by the random variables $x_k$, $x_k = (a_k^2 + b_k^2)/2$, the random process (3.1) can be expressed in terms of a joint probability density $p(x)$, where $x = (x_1, \ldots, x_N)$. We assume the a priori or initial estimate of the power spectrum at frequency $f_k$ is $S_k$, and the joint prior density is given by the exponential form

$$q(x) = \prod_{k=1}^{N} (1/S_k) \exp[-x_k/S_k] \tag{3.2a}$$

or in terms of the amplitudes $a_k$ and $b_k$,

$$q(a, b) = \prod_{k=1}^{N} \frac{1}{2\pi S_k} \exp\left[-\frac{a_k^2 + b_k^2}{2 S_k}\right]. \tag{3.2b}$$

Since $\int x_k q(x)\,dx = S_k$, the expected spectral power at frequency $f_k$ is precisely the prior estimate $S_k$. Note that (3.2) is equivalent to a Gaussian assumption for each of the amplitudes $a_k$, $b_k$. An additional advantage of the assumption for (3.2) is that (3.2) itself has the same structure as a maximum entropy (or MCE) posterior estimate. This is a member of the exponential family of distributions; hence the usual results concerning statistical properties, such as the existence of sufficient statistics, are valid in this form. For more discussion about this assumption see [24].

When an a priori estimate is available and we have new information in the form of values $\theta_i$, $i = 1, \ldots, M + 1$, of the autocorrelation function, we may use the inverse Fourier transform to obtain

$$\theta_i = \int \sum_{k=1}^{N} c_{ik} x_k\, p^*(x)\,dx, \qquad i = 1, \ldots, M + 1 \tag{3.3}$$

where

$$c_{ik} = 2\cos(2\pi f_k t_i), \qquad t_1 = 0, \quad i = 1, \ldots, M + 1.$$

We wish to obtain a posterior estimate of the spectral density $S$ consistent with this new information via minimum cross entropy. Thus the posterior estimate $p^*(x)$ is the solution to

$$\max_{p}\; -\int p(x) \ln\left[p(x)/q(x)\right] dx$$

subject to

$$\int p(x)\,dx = 1, \qquad \theta_i = \int \sum_{k=1}^{N} c_{ik} x_k\, p(x)\,dx, \quad i = 1, \ldots, M + 1. \tag{3.4}$$

According to (2.3) the minimum cross entropy posterior estimate $p^*$ is

$$p^*(x) = q(x) \exp\left[-\lambda - \sum_{i=1}^{M+1} \beta_i \sum_{k=1}^{N} x_k c_{ik}\right] \tag{3.5}$$

where the $\beta_i$ are Lagrange multipliers determined by the constraint (3.3) and $\lambda$ is determined by the normalization constraint. For comparison with the more traditional presentation of maximum entropy spectral density estimation, some modifications of these equations are provided. By substituting (3.2) for $q(x)$ and eliminating $\lambda$, (3.5) becomes

$$p^*(x) = \prod_{k=1}^{N} \left(\sum_{i=1}^{M+1} \beta_i c_{ik} + 1/S_k\right) \exp\left[-\left(\sum_{i=1}^{M+1} \beta_i c_{ik} + 1/S_k\right) x_k\right]. \tag{3.6}$$

Hence, for a Gaussian prior density (3.2), the posterior estimate of the power spectrum can be expressed as

$$T_k = \int x_k\, p^*(x)\,dx = 1 \Big/ \left(\sum_{i=1}^{M+1} \beta_i c_{ik} + 1/S_k\right) \tag{3.7}$$

where the $\beta_i$ are selected to satisfy the autocorrelation constraints,

$$\theta_i = \int \sum_{k=1}^{N} c_{ik} x_k\, p^*(x)\,dx = \sum_{k=1}^{N} T_k c_{ik}, \qquad i = 1, \ldots, M + 1.$$
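As an illustration of (3.7) and the autocorrelation constraints, the following minimal Python sketch evaluates the posterior spectrum $T_k$ for a given multiplier vector and the autocorrelation values it implies. The function names, the four frequencies, the two lags, the prior values, and the multiplier values are all hypothetical choices made for this example; they do not come from the paper. With all multipliers set to zero the posterior spectrum reduces to the prior $S_k$.

```python
import math

def cosine_matrix(freqs, lags):
    """c_ik = 2 cos(2 pi f_k t_i), one row per lag t_i (hypothetical helper)."""
    return [[2.0 * math.cos(2.0 * math.pi * f * t) for f in freqs] for t in lags]

def posterior_spectrum(beta, C, S):
    """T_k = 1 / (sum_i beta_i c_ik + 1/S_k), as in (3.7)."""
    T = []
    for k, s_k in enumerate(S):
        denom = 1.0 / s_k + sum(b * row[k] for b, row in zip(beta, C))
        if denom <= 0.0:
            # (3.6) is a density only when every denominator is positive
            raise ValueError("multipliers outside the admissible region")
        T.append(1.0 / denom)
    return T

def autocorrelation(T, C):
    """theta_i = sum_k T_k c_ik."""
    return [sum(c * t for c, t in zip(row, T)) for row in C]

# Illustrative (assumed) data: N = 4 frequencies, M + 1 = 2 lags with t_1 = 0.
freqs = [0.1, 0.2, 0.3, 0.4]
lags = [0.0, 1.0]
S = [1.0, 2.0, 1.5, 0.5]
C = cosine_matrix(freqs, lags)

T_prior = posterior_spectrum([0.0, 0.0], C, S)    # zero multipliers give the prior
T_post = posterior_spectrum([0.05, -0.02], C, S)  # arbitrary admissible multipliers
theta_post = autocorrelation(T_post, C)
```

Because $t_1 = 0$ gives $c_{1k} = 2$ for every $k$, the first autocorrelation value is twice the total posterior power, which is a convenient sanity check on any implementation.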
This method is different from previous maximum entropy spectral analysis (MESA) in its explicit inclusion of a prior estimate of the power spectrum. We can have, however, the very same results under certain conditions. If $S_k$ goes to infinity and $q(x)$ approaches one, the objective function of problem (3.4) becomes

$$\int p(x) \ln p(x)\,dx = -\sum_{k=1}^{N} \ln T_k - N.$$

Thus

$$\min_{p} \int p(x) \ln p(x)\,dx \iff \max \sum_{k=1}^{N} \ln T_k,$$

the MESA functional. In Brockett et al. [27] or Charnes et al. [28] it is shown how to obtain the constants $\beta_i$ and $\lambda$ as dual variables in an unconstrained convex programming problem. We shall discuss these matters further in Sections IV and V.

IV.
UNCONSTRAINED DUAL FORMULATION FOR LAGRANGE MULTIPLIERS

Numerous papers have been written regarding the dual formulation for Lagrange multipliers in spectral analysis [14], [29]-[31]. Mainly, however, the computation of the MESA has used the primal problem formulation. Here we present the explicit dual form for the discrete case and the continuous case of MCE. In the first part of this section, we shall present MCE estimation in the discrete case via constrained convex programming. As we mentioned before, the problem of determining the Lagrange multipliers using the primal formulation is in general quite difficult. The following Charnes-Cooper extremal principle, however, makes this computationally very easy when addressing the problem (3.4) rather than (3.7).

Theorem 1 [32], [33]: The linear constrained maximum entropy primal problem in the discrete case is

$$\sup_{\delta} v(\delta) = -\delta' \ln(\delta/ec) \quad \text{such that} \quad \delta' A = b', \quad \delta \ge 0 \tag{4.1}$$

where $\delta = (\delta_1, \ldots, \delta_m)'$ and $c = (c_1, \ldots, c_m)'$ are two finite measures, $A$ is an $m \times k$ matrix, and $b$ is a $k \times 1$ vector of constants. The unconstrained dual problem of (4.1) is

$$\inf_{z} \xi(z) = c' \exp\{Az\} - b'z, \tag{4.2}$$

where $z = (z_1, \ldots, z_k)'$. There are exactly three mutually exclusive and collectively exhaustive (MECE) duality states.

1) $\Lambda = \{\delta : \delta' A = b', \delta \ge 0\} = \emptyset$ and $\xi(z)$ is unbounded below.
2) $\Lambda \ne \emptyset$, every feasible solution $\delta \in \Lambda$ of (4.1) has a zero component, and $\xi(z)$ has only an infimum. In this case $\inf \xi(z) = \min \xi_0(z)$, where $\xi_0(z)$ contains only those terms of $\xi(z)$ for which $\delta_i > 0$ in some $\delta \in \Lambda$.
3) There exists $\delta \in \Lambda$ with $\delta > 0$ and $\xi(z)$ has a minimum at $z^*$. In this case the following relationships are obtained between the optimal primal and dual variables: a) $\inf \xi(z^*) = \sup v(\delta^*) = \max v(\delta^*) = \min \xi(z)$, b) $v(\delta)$ has a unique maximum at $\delta^* > 0$, and c) $\delta^{*\prime} = c' \exp[Az^*]$.

Note that state 3) is the usual state considered in applied problems.

Proof (adapted from Charnes et al. [28]): Because state 3) is the state most usually encountered, we shall present the proof for state 3) only. The proof of 1) and 2) may be found in [28].

Let

$$K(\delta, x) = \sum_{i=1}^{n} \left[c_i e^{x_i} - \delta_i x_i\right]$$

for $\delta_i \ge 0$, $c_i > 0$, $x_i \in R$, and define

$$g(\delta) = \inf_{x} K(\delta, x) = -\delta' \ln(\delta/ec). \tag{4.3}$$

By the duality inequality, for $\delta = (\delta_1, \ldots, \delta_n)$ and $x = (x_1, \ldots, x_n)$,

$$g(\delta) = -\sum_i \delta_i \ln(\delta_i / e c_i) \le \sum_i \left[c_i \exp(x_i) - \delta_i x_i\right] = K(\delta, x)$$

where $e$ is the base of the Napierian logarithms, and we set $0 \ln(0) = 0$. Because of the constraint $\sum_i \delta_i a_{ij} = b_j$ and by setting $x_i = \sum_j a_{ij} z_j$ for constant $a_{ij}$ and $b_j$ (here $n = m$), we have

$$g(\delta) = -\delta' \ln(\delta/ec) \le c' \exp(x) - \delta' x = c' \exp[Az] - \delta' A z = c' \exp[Az] - b'z = \xi(z) \tag{4.4}$$

which holds for all $z$, $c > 0$, and $\delta \in \Lambda = \{\delta : \delta' A = b', \delta \ge 0\}$. The equality holds for $\delta_i^* = c_i \exp[\sum_j a_{ij} z_j^*]$. Q.E.D.

If the requisites for state determination are not obvious, the state may be characterized by means of the following linear programming problem:

$$\max \mu \quad \text{such that} \quad \mu w' - \delta' \le 0, \quad \delta' A = b', \quad \delta \ge 0$$

where $w' = (1, \ldots, 1)$. State 1) corresponds to infeasibility, state 2) corresponds to $\mu^* = 0$, and state 3) corresponds to $\mu^* > 0$. Obviously, there is no linear independence requirement, and all possible behavior for the system $A$ is considered. We note that the methods of Gokhale and Kullback [14] and of Agmon et al. [29] require linear independence for their calculation.

The result is very attractive since the dual problem (4.2) is a convex programming problem involving only exponential and linear terms. Moreover, the desired Lagrange multipliers for the maximum entropy estimate (4.1) are precisely the dual variables to (4.2), and (4.2) is easily solved numerically because of the simple unconstrained nature of the problem (even in the inequality constrained case of the primal; see [33]). Any of a number of readily available nonlinear programming codes can be used to solve the dual formulation. Moreover, since we explicitly know the parametric form of the optimizing density in terms of the unknown parameters, the procedure we employ of obtaining the Lagrange parameters via the dual convex programming problem and then substituting into the parametric form is stable numerically.

In the extension of the foregoing discrete Charnes-Cooper duality to the continuous density case, we may assume that the variables in (4.1) and (4.2) are functions of $t$. For a given summable positive function $q: T \to R$, continuous functions $g_i: T \to R$ $(i = 1, \ldots, m)$, and real scalars $\theta_i$ $(i = 1, \ldots, m)$, the MCE estimation primal problem in the continuous case is

$$\text{(P)} \qquad \inf_{p} I(p \mid q) = \int_T p(t) \ln\left[\frac{p(t)}{q(t)}\right] dt$$

such that

$$\int_T g_i(t)\,p(t)\,dt = \theta_i, \qquad i = 1, \ldots, m,$$

with nonnegative $p$ and positive $q$. The Lagrangian function associated with the problem (P) is

$$L(p, \beta) = I(p \mid q) - \sum_{i=1}^{m} \beta_i \int_T g_i(t)\,p(t)\,dt + \theta'\beta.$$

Let $h(\beta) = \inf_p L(p, \beta)$. Then

$$h(\beta) = \inf_{p} \left\{ I(p \mid q) - \sum_{i=1}^{m} \beta_i \int_T g_i(t)\,p(t)\,dt + \theta'\beta \right\} = \inf_{p} \int_T p(t) \ln\left[\frac{p(t)}{q(t) \exp\left(\sum_{i=1}^{m} \beta_i g_i(t)\right) / \Phi(\beta)}\right] dt - \ln \Phi(\beta) + \theta'\beta$$

where

$$\Phi(\beta) = \int_T q(t) \exp\left[\sum_{i=1}^{m} \beta_i g_i(t)\right] dt.$$

Since $q(t) \exp(\sum_{i=1}^{m} \beta_i g_i(t)) / \Phi(\beta)$ is a density, the integral is nonnegative. Thus

$$\inf_{p} \int_T p(t) \ln\left[\frac{p(t)}{q(t) \exp\left(\sum_{i=1}^{m} \beta_i g_i(t)\right) / \Phi(\beta)}\right] dt = 0$$

with

$$p(t) = q(t) \exp\left(\sum_{i=1}^{m} \beta_i g_i(t)\right) \Big/ \Phi(\beta),$$

since $\ln(1) = 0$. Therefore,

$$h(\beta) = -\ln \int_T q(t) \exp\left(\sum_{i=1}^{m} \beta_i g_i(t)\right) dt + \theta'\beta.$$

So the dual problem of (P) is

$$\text{(D)} \qquad \sup_{\beta} \left\{ \theta'\beta - \ln \int_T q(t) \exp\left(\sum_{i=1}^{m} \beta_i g_i(t)\right) dt \right\}.$$

Although primal problem (P) is an infinite dimensional one with finitely many constraints, the dual problem is a finite unconstrained convex problem involving only exponential and linear terms. The duality theory for the primal (P) and its dual (D) is expressed in the following results [35].

Theorem 2: a) The problem (D) is bounded above if and only if problem (P) has a feasible solution. b) If problem (P) is feasible, then the infimum is attained and

$$\min(P) = \sup(D).$$

c) If a density $p$ exists satisfying the constraints

$$\int_T g_i(t)\,p(t)\,dt = \theta_i, \qquad i = 1, \ldots, m$$

strictly, then $\sup(D)$ is attained and

$$\min(P) = \max(D) = \sup(D).$$

To satisfy the constraints strictly means that a regular Borel measure $\mu$ exists whose derivative $p = d\mu/dt$ satisfies the constraints. Moreover, if $p^*$ solves (P) and $\beta^*$ solves (D), then

$$p^*(t) = \frac{q(t) \exp\left[\sum_{i=1}^{m} \beta_i^* g_i(t)\right]}{\int_T q(t) \exp\left[\sum_{i=1}^{m} \beta_i^* g_i(t)\right] dt} \quad \text{(almost everywhere).}$$
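Before turning to the proof, the relation $\min(P) = \max(D)$ can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it takes a single constraint ($m = 1$) on $T = [0, 1]$ with $q(t) = 1$, $g(t) = t$, and $\theta = 0.3$, replaces the integrals by a midpoint rule, and maximizes the dual objective $h(\beta) = \theta\beta - \ln \Phi(\beta)$ by Newton's method, using $h'(\beta) = \theta - E_p[g]$ and $h''(\beta) = -\mathrm{Var}_p[g]$. The function name and all numerical choices are hypothetical and are not taken from the paper.

```python
import math

def solve_continuous_dual(q, g, theta, a, b, n=2000, iters=50, tol=1e-12):
    """Maximize h(beta) = theta*beta - ln Phi(beta) for one constraint on [a, b]."""
    w = (b - a) / n
    ts = [a + (j + 0.5) * w for j in range(n)]  # midpoint-rule nodes
    qs = [q(t) for t in ts]
    gs = [g(t) for t in ts]

    def tilted(beta):
        # p(t) = q(t) exp(beta g(t)) / Phi(beta), evaluated on the grid
        vals = [qj * math.exp(beta * gj) for qj, gj in zip(qs, gs)]
        phi = sum(vals) * w
        return [v / phi for v in vals], phi

    beta = 0.0
    for _ in range(iters):
        p, _phi = tilted(beta)
        mean = sum(pj * gj for pj, gj in zip(p, gs)) * w
        var = sum(pj * (gj - mean) ** 2 for pj, gj in zip(p, gs)) * w
        grad = theta - mean           # h'(beta)
        if abs(grad) < tol:
            break
        beta += grad / var            # Newton step, since h''(beta) = -var

    p, phi = tilted(beta)
    mean = sum(pj * gj for pj, gj in zip(p, gs)) * w
    primal = sum(pj * math.log(pj / qj) for pj, qj in zip(p, qs)) * w  # I(p|q)
    dual = theta * beta - math.log(phi)                                # h(beta)
    return beta, mean, primal, dual

beta_star, mean_star, primal_value, dual_value = solve_continuous_dual(
    q=lambda t: 1.0, g=lambda t: t, theta=0.3, a=0.0, b=1.0)
```

In this example the multiplier comes out negative, because the required mean 0.3 lies below the prior mean 0.5, and the primal and dual values agree up to the residual of the constraint. A plain Newton iteration suffices here because $h$ is concave and the domain is bounded; a safeguarded step would be advisable for less benign choices of $q$ and $g$.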
Proof: We set problem (P) as a convex problem, in an appropriate vector space, with finitely many linear constraints as follows without loss of generality. Let $T$ be a locally compact Hausdorff space, $F$ the $\sigma$-field of Borel subsets of $T$, $dt$ a nonnegative regular Borel measure (rBm) on $T$, and $M(T)$ the linear space of real-valued finite rBm's on $T$. For the element $\mu \in M(T)$ we shall denote by $d\mu/dt$ its Radon-Nikodym derivative. For a given summable positive function $q: T \to R$, continuous functions $g_i: T \to R$ $(i = 1, \ldots, m)$, and real scalars $\theta_i$ $(i = 1, \ldots, m)$, we denote by $p(t) = d\mu/dt$ a density. Define

$$J(\mu) = \begin{cases} \int_T p(t) \ln\left[p(t)/q(t)\right] dt, & \text{if } \mu \text{ is an absolutely continuous probability measure and } p = d\mu/dt, \\ \infty, & \text{otherwise,} \end{cases}$$

and consider the linear operator $A: M(T) \to R^m$ given by

$$(A\mu)_i = \int_T g_i(t)\,d\mu, \qquad i = 1, \ldots, m.$$

Then, problem (P) can be written as

$$\inf_{\mu} \{ J(\mu) : A\mu = \theta, \ \mu \in S \} \tag{4.5}$$

where

$$\theta = (\theta_1, \ldots, \theta_m), \qquad S = \operatorname{dom} J = \{\mu : J(\mu) < \infty\}.$$

Problem (D) is just the Lagrangian dual of (4.5) (which here coincides with the Fenchel-Rockafellar dual), and most of the results in the theorem follow from standard duality relations. Parts b) and c) are just the usual dual statement. The "if" part follows from the weak duality relation $\inf(P) \ge \sup(D)$. Thus only the reverse implication, the "only if" part, is to be proved. Suppose (P) has no feasible solution, and consider the following subset of $R^m$:

$$K = \{ y \in R^m : y = A\mu \text{ for some } \mu \in M(T) \}$$

where $\mu$ is nonnegative and absolutely continuous with respect to $dt$. Since $K$ is a closed convex cone, it follows from $\theta \notin K$ that there exists a hyperplane passing through the origin that strictly separates $\theta$ and $K$, i.e., $z \in R^m$, $z \ne 0$ exists such that

$$z'\theta > 0, \qquad z'y \le 0 \ \text{ for every } y \in K$$

or

$$z'(A\mu) \le 0 \ \text{ for every } \mu \in M(T), \qquad z'\theta > 0$$

or

$$\langle A^*z, \mu \rangle \le 0 \ \text{ for every } \mu \in M(T), \qquad z'\theta > 0$$

with $\mu$ nonnegative and absolutely continuous with respect to $dt$. Here $A^*$ is the adjoint of $A$. Now $\langle A^*z, \mu \rangle = \int_T (A^*z)\,d\mu$ and the latter is nonpositive for every nonnegative rBm only if $A^*z \le 0$ (a.e.). Therefore, nonzero $z \in R^m$ exists such that $A^*z \le 0$ (a.e.) and $z'\theta > 0$. The dual problem (D) is equivalent to

$$\text{(D')} \qquad \sup_{\beta} \left\{ \theta'\beta - \ln \int_T q(t) \exp\left[(A^*\beta)(t)\right] dt \right\}.$$

By taking $\beta = Uz$ with $U > 0$ arbitrarily large, $\sup(D') = \infty$. Q.E.D.

As noted previously, $\xi(z)$ involves only an unconstrained finite vector $z$ if we have a finite number of constraints. Note that we have now shown that the MCE estimation problem subject to constraints is equivalent to an unconstrained convex programming problem. Also the unconstrained dual convex programming problem to (3.4) involves only exponential and linear terms. Moreover, the desired parameters $\lambda$ and $\beta_i$, $i = 1, \ldots, M + 1$, for the MCE spectrum estimate (3.7) are precisely the optimal dual variables to (3.4) [cf. (4.2)] and may easily be obtained by any of a number of nonlinear programming codes (e.g., the SUMT method codes due to McCormick and Mylander or the generalized reduced gradient code of Lasdon). This vastly simplifies the computation of MCE and other entropic spectral density estimators.

To be more explicit, referring back to (3.4) and the determination of $\lambda$ and $(\beta_1, \ldots, \beta_{M+1})$, we note that the dual problem of MCE is the unconstrained

$$\max_{\lambda, \beta} \left\{ \lambda + \sum_{i=1}^{M+1} \beta_i \theta_i - \int q(x) \exp\left[\lambda + \sum_{i=1}^{M+1} \beta_i \sum_{k=1}^{N} c_{ik} x_k\right] dx \right\}. \tag{4.6}$$

However, one can derive the following equivalent dual problem:

$$\max_{\beta} \left\{ \sum_{i=1}^{M+1} \beta_i \theta_i - \ln \int q(x) \exp\left[\sum_{i=1}^{M+1} \beta_i \sum_{k=1}^{N} c_{ik} x_k\right] dx - 1 \right\}. \tag{4.7}$$

To derive (4.7) from (4.6) one merely maximizes (4.6) with respect to $\lambda$ for fixed $\beta$, which can be done by equating its derivative to zero, i.e.,

$$1 - e^{\lambda} \int q(x) \exp\left[\sum_{i=1}^{M+1} \beta_i \sum_{k=1}^{N} c_{ik} x_k\right] dx = 0.$$

The analytic solution

$$\lambda = -\ln \int q(x) \exp\left[\sum_{i=1}^{M+1} \beta_i \sum_{k=1}^{N} c_{ik} x_k\right] dx$$

is substituted again in (4.6) and the result becomes (4.7). As mentioned before, (4.6) or (4.7) is the same as the dual problem of MESA under the condition $q(x) = 1$. The dual problem (4.6) or (4.7) involves an integral whose dimension is the number of discrete frequencies used to approximate a continuous spectrum. This integral can be vast, so (4.7) does not appear to be a convenient form for numerical work. Equation (4.7) can be simplified further in the following manner. From (3.2)

$$\int q(x) \exp\left[\sum_{i} \beta_i \sum_{k} c_{ik} x_k\right] dx = \prod_{k} \frac{1/S_k}{1/S_k - \sum_{i} \beta_i c_{ik}}.$$

Therefore,

$$-\ln \int q(x) \exp\left[\sum_{i} \beta_i \sum_{k} c_{ik} x_k\right] dx = \sum_{k} \ln\left(1/S_k - \sum_{i} \beta_i c_{ik}\right) - \sum_{k} \ln(1/S_k).$$

Hence (4.7) is equivalent to the simpler expression

$$\max_{\beta} \left\{ \sum_{k} \ln\left(1/S_k - \sum_{i} \beta_i c_{ik}\right) + \sum_{i} \beta_i \theta_i - \left[1 + \sum_{k} \ln(1/S_k)\right] \right\}. \tag{4.8}$$

Since the term inside brackets in (4.8) is a constant, we can simply delete this from the functional. If $S_k$ goes to infinity, the prior estimate density becomes more and more closely proportional to Lebesgue measure (cf. (3.2b)).

V.
EXTENSION AND CONCLUSION
As stated in [1], multisignal minimum cross entropy spectrum analysis can be performed as a generalization of minimum cross entropy spectrum analysis. This is a method for simultaneously estimating a number of power spectra when a prior estimate of each is available and new information is obtained in the form of values of the correlation function of their sum. Also, it is possible to have prior information about the spectrum of an individual signal, e.g., that one is more reliable in some frequency ranges than others. Thus one might wish to give different weight to different frequency ranges. In other situations, it might be plausible to assign more weight to the initial noise estimate than to the initial speech estimate in spectrum analysis [2]. In both cases, we might apply the technique of MCE spectrum analysis (see [1], [2] for the details), and we can apply the Charnes-Cooper duality theory to determine the Lagrange multipliers.

We finally remark that a set of constraints for an MCE estimate might consist of two different types: equality constraints and inequality constraints. Frequently, in hypothesis testing or estimation of an "inferential" distribution by MCE estimation, one has information about the possible candidate distributions in the form of inequality constraints in addition to equality constraints. An explicit convex programming duality theory for the MCE estimate with inequality constraints can be found in Charnes et al. [36]. Again, this duality result greatly simplifies computation. For one thing, linear independence checks are avoided, and for another, by working with the dual program and the Lagrange multipliers, we have continuity of the objective functional throughout and hence a relatively stable calculation.
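To make the use of the Charnes-Cooper duality for determining Lagrange multipliers concrete, the following minimal Python sketch solves the unconstrained dual (4.2) for a small discrete problem and recovers the primal solution through $\delta_i^* = c_i \exp[\sum_j a_{ij} z_j^*]$. Everything specific here is an assumption made for illustration: the data (a uniform measure $c$ on six outcomes, a normalization constraint, and a mean constraint equal to 4.5), the function names, and the choice of a damped Newton iteration in place of the general nonlinear programming codes mentioned in the text.

```python
import math

def delta_of(z, A, c):
    """delta_i = c_i * exp((A z)_i), relation c) of state 3)."""
    return [ci * math.exp(sum(a * zj for a, zj in zip(row, z)))
            for ci, row in zip(c, A)]

def xi(z, A, b, c):
    """Dual objective (4.2): c' exp(Az) - b'z."""
    return sum(delta_of(z, A, c)) - sum(bj * zj for bj, zj in zip(b, z))

def solve_dual(A, b, c, iters=100, tol=1e-12):
    """Damped Newton minimization of xi(z); written for k = 2 constraints only."""
    z = [0.0, 0.0]
    for _ in range(iters):
        d = delta_of(z, A, c)
        # gradient of xi: A' delta - b
        grad = [sum(di * row[j] for di, row in zip(d, A)) - b[j] for j in range(2)]
        if max(abs(gj) for gj in grad) < tol:
            break
        # Hessian of xi: A' diag(delta) A
        H = [[sum(di * row[i] * row[j] for di, row in zip(d, A)) for j in range(2)]
             for i in range(2)]
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        step = [(grad[0] * H[1][1] - H[0][1] * grad[1]) / det,
                (H[0][0] * grad[1] - H[1][0] * grad[0]) / det]
        t = 1.0
        # halve the step until the dual objective does not increase
        while t > 1e-12 and xi([zj - t * sj for zj, sj in zip(z, step)], A, b, c) > xi(z, A, b, c):
            t *= 0.5
        z = [zj - t * sj for zj, sj in zip(z, step)]
    return z

# Illustrative (assumed) data: outcomes 1..6, uniform c, constraints sum = 1 and mean = 4.5.
faces = [1, 2, 3, 4, 5, 6]
A = [[1.0, float(f)] for f in faces]   # m x k matrix with m = 6, k = 2
b = [1.0, 4.5]
c = [1.0 / 6.0] * 6

z_star = solve_dual(A, b, c)
delta_star = delta_of(z_star, A, c)
primal_value = -sum(d * math.log(d / (math.e * ci)) for d, ci in zip(delta_star, c))
dual_value = xi(z_star, A, b, c)
```

At the solution the recovered $\delta^*$ satisfies both constraints, and $v(\delta^*)$ coincides with $\xi(z^*)$, as relation a) of state 3) asserts. No linear independence check on the constraints is performed beyond what the $2 \times 2$ Newton system implicitly requires.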
REFERENCES

[1] R. W. Johnson and J. E. Shore, "Minimum cross-entropy spectral analysis of multiple signals," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 574-582, June 1983.
[2] R. W. Johnson, J. E. Shore, and J. P. Burg, "Multisignal minimum-cross-entropy spectrum analysis with weighted initial estimates," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 532-539, 1984.
[3] J. E. Shore, "Minimum cross-entropy spectral analysis," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 230-237, Apr. 1981.
[4] J. E. Shore and R. W. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Inform. Theory, vol. IT-26, pp. 26-27, Jan. 1980.
[5] E. T. Jaynes, "Information theory and statistical mechanics I," Phys. Rev., vol. 106, pp. 620-630, 1957.
[6] E. T. Jaynes, "Information theory and statistical mechanics II," Phys. Rev., vol. 108, pp. 171-190, 1957.
[7] E. T. Jaynes, "Information theory and statistical mechanics," in Statistical Physics, vol. 3, Brandeis Lectures, K. W. Ford, Ed. New York: Benjamin, 1963, pp. 182-218.
[8] E. T. Jaynes, "Foundations of probability theory and statistical mechanics," in Delaware Seminar in the Foundation of Science, vol. I, M. Bunge, Ed. New York: Springer-Verlag, 1967, pp. 77-101.
[9] O. C. de Beauregard and M. Tribus, "Information theory and thermodynamics," Helvetica Physica Acta, vol. 47, pp. 238-243, 1974.
[10] M. Tribus, Thermostatics and Thermodynamics. Princeton, NJ: Van Nostrand, 1961.
[11] A. Katz, Principles of Statistical Mechanics: The Information Theory Approach. New York: Freeman, 1967.
[12] A. Hobson, Concepts in Statistical Mechanics. New York: Gordon and Breach, 1971.
[13] I. J. Good, "Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables," Ann. Math. Statist., vol. 34, pp. 911-934, 1963.
[14] D. V. Gokhale and S. Kullback, The Information in Contingency Tables. New York: Marcel Dekker, 1978.
[15] J. P. Noonan, N. S. Tzannes, and T. Costello, "On the inverse problem of entropy maximizations," IEEE Trans. Inform. Theory, vol. IT-22, pp. 120-123, Jan. 1976.
[16] M. Tribus, Rational Descriptions, Decisions, and Designs. New York: Pergamon, 1969.
[17] M. Tribus, "The use of the maximum entropy estimate in the estimation of reliability," in Recent Developments in Information and Decision Processes, R. E. Machole and D. Gray, Eds. New York: Academic, 1965.
[18] V. E. Benes, Mathematical Theory of Connecting Networks and Telephone Traffic. New York: Academic, 1965.
[19] P. Brockett, A. Charnes, and K. Paick, "Information-theoretic non-parametric unimodal density estimation," Univ. of Texas at Austin, CCS Rep. 467, 1983.
[20] P. Brockett, A. Charnes, and K. Paick, "A method for constructing a unimodal inferential or prior distribution," Univ. of Texas at Austin, CCS Rep. 473, 1984.
[21] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[22] B. R. Frieden, "Image enhancement and restoration," in Picture Processing and Digital Filtering, T. S. Huang, Ed. Berlin, Germany: Springer-Verlag, 1975, pp. 177-248.
[23] R. Johnson and J. E. Shore, "Which is the better entropy expression for speech processing-S log S or log S?" IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 129-136, 1981.
[24] R. W. Johnson, "Axiomatic characterization of the directed divergences and their linear combination," IEEE Trans. Inform. Theory, vol. IT-25, pp. 709-716, Nov. 1979.
[25] E. T. Jaynes, "Prior probabilities," IEEE Trans. Syst. Sci. Cybern., vol. SSC-4, Sept. 1968.
[26] T. J. Ulrych and T. N. Bishop, "Maximum entropy spectral analysis and autoregressive decomposition," Rev. Geophys. Space Phys., vol. 43, pp. 183-200, 1975.
[27] P. Brockett, A. Charnes, and W. W. Cooper, "MDI estimation via unconstrained convex programming," Commun. Statist. Simul. Computation, vol. B9, pp. 223-234, 1980.
[28] A. Charnes, W. W. Cooper, and L. Seiford, "Extremal principles and optimization dualities for Khinchin-Kullback-Leibler estimation," Math. Operationsforsch. Statist. Ser. Optimization, vol. 9, pp. 21-29, 1978.
[29] N. Agmon, Y. Alhassid, and R. D. Levine, "An algorithm for finding the distribution of maximum entropy," J. Comput. Phys., vol. 30, pp. 250-258, 1979.
[30] S. W. Lang and J. H. McClellan, "Multidimensional MEM spectral estimation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 880-887, Dec. 1982.
[31] J. H. McClellan and S. W. Lang, "Duality for multidimensional MEM spectral analysis," Proc. Inst. Elec. Eng., vol. 130, Part F, pp. 230-235, Apr. 1983.
[32] A. Charnes and W. W. Cooper, "An extremal principle for accounting balance of a resource-value transfer economy: Existence, uniqueness and computation," Rend. Acad. Naz. Lincei, pp. 556-561, Apr. 1974.
[33] A. Charnes and W. W. Cooper, "Constrained Kullback-Leibler estimation, generalized Cobb-Douglas balance, and unconstrained convex programming," Rend. Acad. Naz. Lincei, pp. 568-576, Apr. 1975.
[34] T. Rockafellar, "Duality and stability in extremum problems involving convex functions," Pac. J. Math., vol. 21, pp. 167-186, 1976.
[35] A. Ben-Tal and A. Charnes, "A dual optimization framework for some problems of information theory and statistics," Prob. Contr. Inform. Theory, vol. 8, pp. 387-401, 1979.
[36] A. Charnes, W. W. Cooper, and J. Tyssedal, "Khinchin-Kullback-Leibler estimation with inequality constraints," Math. Operationsforsch. Statist. Ser. Optimization, vol. 14, no. 3, pp. 1-4, 1978.