A Bayesian Derivation of an Iterative Autofocus/Super-Resolution Algorithm ∗
Stephen P Luttrell
RSRE, Malvern, WR14 3PS, UK
We derive an estimate-maximise formulation of a Bayesian super-resolution algorithm for reconstructing scattering cross sections from coherent images. We generalise this result to obtain an `autofocus/super-resolution' method, which simultaneously autofocuses an imaging system and super-resolves its image data. We present an explanatory numerical example to illustrate the implementation of our method on images of single and double point targets that are defocused by O(depth of focus). These are successfully super-resolved by autofocus/super-resolution, but not by pure super-resolution. We conjecture that autofocus/super-resolution might usefully be applied to the interpretation of airborne synthetic aperture radar images that are subject to defocusing effects.

I. INTRODUCTION
Super-resolution is the name that we give to the process of increasing the effective bandwidth of an image (or time series) by introducing collateral information to augment the dataset: the classical Rayleigh resolution limit for distinguishing two point targets can thereby be overcome. Super-resolution belongs to the general class of `inverse problems' because it attempts to recover an object from its image by deconvolving the imaging operator. A limitation of our existing super-resolution technique [1] is its assumption that the parameters of the imaging system are known precisely, which causes problems for some applications. Although this problem of imaging system calibration is quite general, this work was originally prompted by the need to super-resolve images obtained from synthetic aperture radar (SAR) systems which have time varying imaging system parameters (due to phase shifts caused by anomalous motion of the transmitter/receiver). An ideal SAR can be modelled as if it were a simple linear imaging system, and the anomalous motion can (in first order) be modelled as a simple defocusing of this linear imaging system. It is therefore convenient to visualise a SAR as being a microwave version of an optical bench experiment using coherent illumination and with the lens misplaced from its correct focus. We shall consider O(depth of focus) errors in the placement of the lens, which can cause severe degradation in super-resolved image quality [2]. This O(depth of focus) criterion applies independently of what the actual physical dimensions of a depth of focus happen to be for a particular imaging system, because it can be expressed alternatively as an upper bound on the quadratic phase error (which is dimensionless) that is acceptable at the edge of the aperture. We need to use a precise autofocus method in order to super-resolve successfully. We therefore develop a hybrid `autofocus/super-resolution' technique, in which autofocusing
∗ This paper appeared in Inverse Problems, 1990, vol. 6, pp. 975-996. Received 5 February 1990, in final form 8 June 1990. © Controller, Her Majesty's Stationery Office, 1990.
uses the super-resolved image (rather than the original image) to adjust the focusing parameter(s). The philosophy of this technique is to find the set of imaging system parameters and super-resolved image that simultaneously fit the image data and collateral information. In this paper we develop a theoretical framework that is applicable to any coherent imaging system, so it is not important which specific application (namely SAR image analysis) originally prompted this work, and our results are therefore of interest to a wide audience. Throughout this paper we use Bayesian calculus, because it is the only fully consistent means of performing inferences from limited information [3, 4]. Bayesian calculus uses probabilities to encode information, and makes inferences by manipulating these probabilities. For clarity, we use physical arguments to justify the form of each probability that we introduce. One could also note the equivalence between inverse problems and inference problems as a justification for making Bayesian calculus the appropriate language for formulating and solving inverse problems. In Section II we summarise our Gaussian scattering model and our linear imaging model. We construct our models in terms of probabilities to facilitate the use of Bayes' theorem to solve the inverse problem of determining what caused a particular dataset. In Section III we present a complete and rigorous derivation of an iterative Bayesian super-resolution algorithm. This is an improved version of an algorithm that we described in [5, 6, 7], and should be regarded as our current definitive treatment of super-resolution. Our method is an application of the estimate-maximise (EM) method of solving maximum-likelihood problems [8, 9]. In Section IV we extend our iterative algorithm to account for uncertainties in the imaging system. The technique that we derive is a precise autofocus method, which effectively focuses on structure in the super-resolved image (rather than the original image). We also present a linearised version of the algorithm for use in simple cases. In Section V we present an explanatory numerical example to demonstrate how our autofocus/super-resolution method might be applied in practice. Note that we do not attempt to construct a robust general-purpose algorithm in this paper. Thus we use synthetic
data that is generated by a defocused `sinc' function, in which case the linearised version of autofocus/super-resolution is sufficient, provided that the lens is within O(depth of focus) of the correct focus. We demonstrate autofocus/super-resolution for both the single and double point target cases. In Appendix A we gather together various definitions and derivations that would otherwise distract from the flow of the argument in the main body of the paper.

II. THE MODEL
In this section we summarise our coherent scattering and imaging model. We attempt to formulate our Bayesian model by appealing to physical reasoning, wherever possible. In Figure 1 we represent as a network the various stages of image formation.
This has the form of a Markov tree, where each argument depends directly on only a limited number of other arguments, and there are no circular dependencies. We frequently use analogous Markovian decompositions of parts of Figure 1 as intermediate steps in our derivations, so it is helpful to keep Figure 1 in mind when reading this paper. We have deliberately omitted the annotation from the left hand part of Figure 1. This indicates that, in general, we are ignorant of the physical origin of P(σ) and P(θ): they serve only as prior probabilities to encode our state of ignorance about the values that σ and θ might have. This is not a failing of the Bayesian approach, rather it is an honest expression of our ignorance about the finer details of the imaging model.
A. Prior probability over cross sections
The upper left part of Figure 1 generates the prior probability over cross sections P(σ). The cross section σ is an idealised model of those properties of the illuminated object that affect the scattered field f. This use of a cross section is phenomenological, because it is incapable of capturing the full range of properties of the process that generates the scattered field. For completeness, we include the P(σ) term in our Bayesian derivations. However, the main goal of this paper is to demonstrate autofocus/super-resolution, which we manage to simulate in simple cases without introducing P(σ). In more complicated cases we might need to use explicit prior knowledge, such as Γ-distributed cross section models [10, 11], or Markov random field cross section models [12].
Figure 1: Network decomposition of imaging.
The notation in Figure 1 is defined as

σ ≡ scattering cross section
f ≡ scattered field
g ≡ image data
θ ≡ imaging parameters
P(σ) ≡ prior probability over cross sections
P(f|σ) ≡ scattering model
P(g|f, θ) ≡ imaging model
P(θ) ≡ prior probability over imaging parameters   (2.1)

Note that we are somewhat cavalier in our choice of notation, because for instance P(σ) and P(θ) are different functions of their respective arguments, yet we use the notation P(.) for both. Thus the meaning of P(.) should be deduced from context. Figure 1 may be used to decompose the joint probability P(g, f, θ, σ) into a product of factors

P(g, f, θ, σ) = P(g|f, θ) P(f|σ) P(σ) P(θ)   (2.2)
B. Stochastic scattering model
The upper centre part of Figure 1 generates a scattered field f according to a scattering model P(f|σ). We assume that each cross section element σi acts as if it produces a large number of scattered wavelets that combine coherently to produce an element of scattered field fi, which leads to Gaussian statistics via the central limit theorem. Note that f is a near field which retains essentially the same spatial structure as the scattering cross section itself, so it could be directly imaged without using a lens (in principle). For m cross section elements we obtain

P(f|σ) = ∏_{i=1}^{m} exp(−|fi|²/σi) / (πσi)
       = (1/det(πσ)) exp(−f† σ⁻¹ f)   (2.3)
where we define

σ ≡ diag(σ1, σ2, σ3, ..., σm)   (2.4)

f† σ⁻¹ f ≡ Σ_{i,j=1}^{m} fi* (σ⁻¹)ij fj   (2.5)
In the first line of Equation 2.3 we ignore any correlations that might exist between different components of f, whereas in the second line of Equation 2.3 we use a notation that allows us to include off-diagonal elements in σ to model such correlations, should we wish to do so. For generality, we assume that σ is matrix-valued throughout our derivations, unless we state otherwise. We use operator/state notation (as in f† σ⁻¹ f) throughout this paper, because it is economical and facilitates calculations that would be tedious if performed using explicit summations over indices. Note also that each component of the state vector f is a complex number, which explains the unusual normalisation of the Gaussian probability in Equation 2.3.
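For intuition, the diagonal form of Equation 2.3 is straightforward to simulate: each fi is an independent zero mean circular complex Gaussian whose mean square modulus is σi. The following sketch (Python, with illustrative helper names of our own; it is not part of the paper's algorithm) draws one realisation of the scattered field:

import numpy as np

def sample_scattered_field(sigma, rng):
    # Draw f ~ P(f|sigma) for diagonal sigma (first line of Equation 2.3).
    # Each f_i is circular complex Gaussian with E[|f_i|^2] = sigma_i,
    # i.e. real and imaginary parts each have variance sigma_i / 2.
    sigma = np.asarray(sigma, dtype=float)
    std = np.sqrt(sigma / 2.0)
    return std * (rng.standard_normal(sigma.size)
                  + 1j * rng.standard_normal(sigma.size))

rng = np.random.default_rng(0)
f = sample_scattered_field([1, 1, 1, 1, 1000, 1, 1, 1, 1], rng)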
C. Stochastic imaging model and its parameters
The right hand part of Figure 1 depends on both the scattered field f and on the imaging parameters θ. Define an imaging model P(g|f, θ) as

P(g|f, θ) = (1/det(πN)) exp[−(g − T(θ)f)† N⁻¹ (g − T(θ)f)]   (2.6)

where N is a positive semi-definite covariance matrix that is used to model additive Gaussian image data noise. In the limit where all the eigenvalues of N tend to zero this reduces to P(g|f, θ) = δ(g − T(θ)f) (where δ(g − T(θ)f) is the Dirac delta function), which describes the noiseless imaging equation g = T(θ)f. We use the parameter vector θ to parameterise variability in the imaging system, and we use P(θ) to model our prior knowledge of these parameters. In general, θ will be a low dimensional vector that describes those components of the imaging system that cannot be calibrated once and for all. We find that we do not need to introduce P(θ) in order to demonstrate autofocus/super-resolution in simple cases, but we include it in our derivations, for completeness. In Section IV B we derive a linearised autofocus scheme which uses a Gaussian model for P(θ). Thus

P(θ) = (1/det(πΛ)) exp(−θ† Λ⁻¹ θ)   (2.7)

where Λ is a positive semi-definite covariance matrix. We choose P(θ) to be zero mean, because all Gaussian prior probabilities can easily be transformed into this form.
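Under the simplifying assumption N = νI (an illustration of ours; the derivations above allow any positive semi-definite N), Equation 2.6 is just a linear operator plus additive circular complex Gaussian noise, e.g.:

import numpy as np

def sample_image(T, f, nu, rng):
    # Draw g ~ P(g|f, theta) of Equation 2.6 with N = nu * I: the complex
    # noise on each image sample has total variance nu (nu/2 per quadrature).
    m = T.shape[0]
    noise = np.sqrt(nu / 2.0) * (rng.standard_normal(m)
                                 + 1j * rng.standard_normal(m))
    return T @ f + noise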
III. SUPER-RESOLUTION
In this section we assume that the imaging parameters θ are exactly known, and we derive an iterative `re-estimation' scheme for computing the cross section σ0 that maximises the posterior probability P(σ|g). We call this `super-resolution' because σ0 can have a higher spatial resolution than the image data, although this effect is significant only where there are small bright regions embedded in the image data [13].
A. Posterior probability over cross sections
Suppose one asks the question: `What can I deduce about the cross section σ, given that I know the imaging system model (including all prior probabilities) and that I have available a dataset g?'. Bayesian calculus says categorically: `The answer to your question is the posterior probability P(σ|g)' [3, 4]. Any reply that does not include enough information to construct P(σ|g) has not answered the stated question. P(σ|g) can therefore be used to deduce the answer to any other question that might have been asked. For instance, the σ that maximises P(σ|g) (let us call it σ0) is usually requested. If P(σ|g) has a single well-defined peak in σ, then σ0 can be used as a representative reconstruction of the cross section σ given the data g. In order to calculate P(σ|g), we must first of all use Bayes' theorem to express P(σ|g) in terms of quantities that are defined in the imaging system model. Thus
P(σ|g) = P(g|σ) P(σ) / ∫dσ′ P(g|σ′) P(σ′)
       = P(g|σ) P(σ) / P(g)   (3.1)
which leads to

σ0 = arg max_σ { log[P(g|σ)] + log[P(σ)] }   (3.2)
σ0 is a compromise between maximising P(g|σ) and maximising P(σ). The P(g|σ) term attempts to maximise the probability that the data g could derive from σ, whereas the P(σ) term ignores the data entirely and attempts to maximise the prior probability that σ could have occurred irrespective of the data. σ0 is the cross section that best satisfies these conditions simultaneously. The information contained in σ0 is less than the information contained in P(σ|g) (except for the special case P(σ|g) = δ(σ − σ0)). P(σ|g) contains everything that can be inferred from the data and the stated prior knowledge, whereas σ0 is merely the mode of P(σ|g). In [14] we presented a calculation of the derivatives of P(g|σ), which provided a mechanism for iteratively computing σ0 by a `gradient ascent' (i.e. infinitesimal update step sizes) scheme. We now improve upon these results by deriving a `re-estimation' (i.e. finite update step sizes) scheme. This turns out to be very similar
to the empirical scheme that we suggested in [1], and, furthermore, it is very simple to relate this to the theory that we presented in [14].
B. Lower bound on the posterior probability
We now maximise a quantity that is related to P(σ|g), but which is constructed in such a way that it is much easier to maximise yet has the same local maxima as P(σ|g). The method that we use is based on the estimate-maximise (EM) method of maximising likelihood functions [9]. As a preliminary step we shall transform our probabilities into log-probabilities, because this will make our subsequent derivations much easier to follow. Thus define L1(σ|g) as

L1(σ|g) ≡ log[P(σ|g)]   (3.3)
Now we shall derive an important inequality that provides a lower bound for L1(σ′|g):

L1(σ′|g) = log[ P(g|σ′) P(σ′) / P(g) ]   (step 1)
= log[ ∫df P(g|f) P(f|σ′) P(σ′) / P(g) ]   (step 2)
= log[ ∫df P(f|g, σ) P(g|f) P(f|σ′) P(σ′) / (P(f|g, σ) P(g)) ]   (step 3)
= log[ ∫df P(f|g, σ) (P(f|σ′)/P(f|σ)) P(g|σ) P(σ′) / P(g) ]   (step 4)
≥ ∫df P(f|g, σ) log[ (P(f|σ′)/P(f|σ)) P(g|σ) P(σ′) / P(g) ]   (step 5)   (3.4)
We have used the following manipulations in the various steps of this derivation:

Step 1. Use Bayes' theorem as formulated in Equation 3.1 to express L1(σ′|g) in terms of quantities that are specified in the imaging system model.

Step 2. Introduce the scattered field f as intermediate variables, as in Equation A3 of Appendix A.

Step 3. Introduce a factor of unity in the form P(f|g, σ)/P(f|g, σ). This tautology prepares the integrand for step 5 of the manipulation.

Step 4. Use the following

P(f|g, σ) = P(g, f, σ) / P(g, σ)
          = P(g|f) P(f|σ) / P(g|σ)   (3.5)

to rearrange the 1/P(f|g, σ) term.

Step 5. Use Jensen's inequality for convex functions

log[ ∫dx u(x) v(x) ] ≥ ∫dx u(x) log[v(x)]   (3.6)

(where u(x) must satisfy ∫dx u(x) = 1) to move the integral outside the logarithm. Equality holds if, and only if, P(f|σ′) = P(f|σ) for all f. In our model, this requires σ′ = σ.

It is convenient to rewrite the final inequality in Equation 3.4 in the form

L1(σ′|g) ≥ L2(σ′, σ|g) + L1(σ|g)
L2(σ′, σ|g) ≡ ∫df P(f|g, σ) log[P(f|σ′)/P(f|σ)] + log[P(σ′)/P(σ)]   (3.7)

where the function L2(σ′, σ|g) contains all the σ′ dependence of the lower bound of L1(σ′|g), and is constructed so that L2(σ, σ|g) = 0. We summarise Equation 3.7 in Figure 2.
Figure 2: Jensen's inequality applied to the posterior probability.
Instead of maximising L1(σ′|g) with respect to σ′, we now maximise L2(σ′, σ|g) with respect to σ′ (for some fixed σ): this will recover a greatest lower bound for the true maximum of L1(σ′|g).
C. Iteratively maximising the lower bound
Before proceeding any further, we present an outline of our proposed algorithm for maximising the posterior probability P(σ|g). We do this to provide a concrete framework and motivation for the rather involved calculations that we perform later on. Maximising P(σ|g) is equivalent to maximising L1(σ|g). In turn, maximising L1(σ′|g) can be replaced by maximising L2(σ′, σ|g), although this leads only to a greatest lower bound for L1(σ′|g), as given in Equation 3.7. If this greatest lower bound process is iterated by replacing σ with the optimum value of σ′ that was found during the previous iteration, then the greatest lower bound converges towards a local maximum of L1(σ′|g). In broad outline, our proposed algorithm for maximising L1(σ′|g) is (a code sketch follows the discussion of Figure 3 below):

1. Initialisation step: Make an initial choice of σ.
2. Re-estimation step: Maximise L2(σ′, σ|g) with respect to σ′.
3. Update step: σ → σ′.
4. Iteration step: If the update in σ does not satisfy some convergence criterion, then go to step 2. σ is now close to a local maximum of L1(σ|g).

This does not guarantee that σ ≈ σ0, which is the required global maximum. Jensen's inequality, together with our model, guarantees only that fixed points of the re-estimation step are local maxima of L1(σ|g). In Figure 3 we show three iterations of this algorithm.
Figure 3: Three iterations of the maximisation algorithm.

We represent L1(σ|g) as a large inverted cup, and L2(σ′, σ|g) + L1(σ|g) as a small inverted cup. The initial choice is σ = σ1, so L2(σ′, σ1|g) + L1(σ1|g) ≤ L1(σ′|g) with equality at σ′ = σ1, as shown. σ → σ2 is then the outcome of the re-estimation step. Accordingly, we show σ2 as the position of the maximum of L2(σ′, σ1|g) in Figure 3. The remainder of Figure 3 depicts two further iterations, producing σ3 then σ4, and making steady progress towards a maximum of L1(σ|g). It is geometrically obvious how this algorithm is guaranteed to find a local maximum of L1(σ|g).
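The loop just described is the standard EM skeleton. A minimal sketch in Python (the function arguments are illustrative placeholders of our own, not part of the paper's formulation):

def em_maximise(sigma_init, argmax_L2, converged):
    # argmax_L2(sigma) performs the re-estimation step: it returns
    # arg max over sigma' of L2(sigma', sigma | g) for the current sigma.
    # converged(old, new) implements the convergence criterion.
    sigma = sigma_init                   # initialisation step
    while True:
        sigma_new = argmax_L2(sigma)     # re-estimation step
        if converged(sigma, sigma_new):  # iteration step
            return sigma_new             # near a local maximum of L1(sigma|g)
        sigma = sigma_new                # update step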
D. Greatest lower bound on the posterior probability
The algorithm in Section III C relies critically on the re-estimation step. We therefore derive the stationary point(s) of L2(σ′, σ|g), as defined in Equation 3.7. From Equation 3.7 and Equation 2.3 we obtain

L2(σ′, σ|g) = ∫df P(f|g, σ) { −log[det(πσ′)] − f† σ′⁻¹ f + log[P(σ′)] − [σ′ → σ] }   (3.8)

Neither the log[det(πσ′)] term nor the log[P(σ′)] term depends on f, so their f integrals can be discarded (using ∫df P(f|g, σ) = 1). However, the f† σ′⁻¹ f term requires more attention. P(f|g, σ) is given in Equation A5 of Appendix A, where we see that it is a Gaussian with mean f̄. This makes the f† σ′⁻¹ f term of Equation 3.8 relatively easy to evaluate. To facilitate this integration we rearrange f† σ′⁻¹ f so that f appears only in the combination f − f̄:

f† σ′⁻¹ f = (f − f̄)† σ′⁻¹ (f − f̄) + [f̄† σ′⁻¹ (f − f̄) + CC] + f̄† σ′⁻¹ f̄   (3.9)

where `CC' denotes `complex conjugate'. This yields

∫df P(f|g, σ) f† σ′⁻¹ f = tr[σ′⁻¹ C] + f̄† σ′⁻¹ f̄   (3.10)

To obtain this result we rewrite the (f − f̄)† σ′⁻¹ (f − f̄) term by using (f − f̄)† σ′⁻¹ (f − f̄) = tr[σ′⁻¹ (f − f̄)(f − f̄)†], and then use the covariance C of f − f̄ (see Equation A5 of Appendix A) to average under the trace. The term linear in f − f̄ averages to zero, by symmetry. Finally, insert Equation 3.10 into Equation 3.8 to obtain

L2(σ′, σ|g) = −log[det(πσ′)] − tr[σ′⁻¹ C] − f̄† σ′⁻¹ f̄ + log[P(σ′)] − [σ′ → σ]   (3.11)

which may also be written in the form

L2(σ′, σ|g) = log[Peff(g|σ′)/Peff(g|σ)] + log[Peff(σ′)/Peff(σ)]

where we include only the data dependent part in Peff(g|σ′), and where

Peff(g|σ′) ∝ exp(−f̄† σ′⁻¹ f̄)
Peff(σ′) ∝ (P(σ′)/det(πσ′)) exp(−tr[σ′⁻¹ C])   (3.12)
For a diagonal σ-matrix (as in Equation 2.4) this reduces to

Peff(g|σ′) ∝ ∏_{i=1}^{m} (1/(πσ′i)) exp(−|f̄i|²/σ′i)
Peff(σ′) ∝ P(σ′) ∏_{i=1}^{m} exp(−Cii/σ′i)   (3.13)
We have now greatly simplified L2(σ′, σ|g) (Equation 3.8) because we have succeeded in integrating out the intermediate variable f.
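For a diagonal σ-matrix the bound of Equation 3.11 is cheap to evaluate, which is useful for checking that each re-estimation step really does increase it. A sketch of ours (dropping the P(σ′) prior term, as we also do in Section V):

import numpy as np

def L2_diagonal(sigma_new, sigma_old, f_bar, C_diag):
    # sigma'-dependent part of Equation 3.11 for diagonal sigma, with the
    # P(sigma') term dropped. The [sigma' -> sigma] subtraction makes
    # L2(sigma, sigma | g) = 0 by construction.
    def terms(s):
        return -(np.sum(np.log(np.pi * s))
                 + np.sum((np.abs(f_bar) ** 2 + C_diag) / s))
    return terms(sigma_new) - terms(sigma_old)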
Figure 4: Network decomposition of reconstruction.

In Figure 4 we show the relationship between the various probabilities that we have introduced. Figure 4 contains two coupled inference processes. Firstly, the top half of Figure 4 constructs the posterior probability P(f|g, σ). This is a Gaussian probability with mean f̄ that is normally used as the `maximum posterior probability' reconstruction. Secondly, we input f̄ to the bottom half of Figure 4, to construct an effective posterior probability (proportional to Peff(σ′) Peff(g|σ′)) whose logarithm is (up to an additive constant) L2(σ′, σ|g) in Equation 3.11. It is important to note that the second stage of the above inference process does not construct a true Bayesian posterior probability, because σ′ is merely a mathematical convenience, not a physical reality. We have written the second stage in the style of a posterior probability merely to aid its interpretation. The true Bayesian posterior probability is P(σ|g), which is maximised by an algorithm that consists of several iterations as described above.

We now differentiate each term in Equation 3.11 with respect to σ′. It turns out to be convenient to differentiate with respect to the transpose of the inverse matrix (σ̃′)⁻¹. Differentiating the log[det(πσ′)] term requires considerable care, because matrix-valued quantities do not necessarily commute, so we give a compact derivation in Appendix A. The other terms in Equation 3.11 pose no problem, and we obtain finally

∂ log[det(πσ′)] / ∂(σ̃′)⁻¹ = −σ′
∂ tr[σ′⁻¹ C] / ∂(σ̃′)⁻¹ = C
∂ (f̄† σ′⁻¹ f̄) / ∂(σ̃′)⁻¹ = f̄ f̄†   (3.14)

Combine Equation 3.14 with Equation 3.11 to obtain

∂ L2(σ′, σ|g) / ∂(σ̃′)⁻¹ = σ′ − f̄ f̄† − C + ∂ log[P(σ′)] / ∂(σ̃′)⁻¹   (3.15)

This is the central result from which our super-resolution `re-estimation' algorithm can be deduced. We may express Equation 3.15 in a form that is appropriate for a diagonal σ-matrix. Thus

∂ L2(σ′, σ|g) / ∂ log[σ′i] = (|f̄i|² + Cii − σ′i)/σ′i + ∂ log[P(σ′)] / ∂ log[σ′i]   (3.16)
∂ L2(σ′, σ|g) / ∂ log[σ′i] = ([|f̄i|² − ⟨|fi|²⟩] + [σi − σ′i])/σ′i + ∂ log[P(σ′)] / ∂ log[σ′i]   (3.17)

where we have used the results in Equation A1 and Equation A2. The appropriate update σ → σ′ (to obtain the required greatest lower bound of L1(σ′|g)) is to replace σ by the value of σ′ that makes the right hand side of Equation 3.16 (and Equation 3.17) equal to zero. On the other hand, we may relate Equation 3.15 to the results of [14]. Thus we paraphrase equation (5.14) of [14] in the notation of this paper as follows

∂ L1(σ|g) / ∂ log[σi] = (|f̄i|² − ⟨|fi|²⟩)/σi + ∂ log[P(σ)] / ∂ log[σi]   (3.18)

When we set σ′i = σi in Equation 3.17 we recover Equation 3.18, as required. This demonstrates the consistency of the expressions for the gradient of P(σ|g) that are contained in this paper and in [14]. In a sense, [14] implements the bottom half of Figure 4 in a half-hearted fashion, by suggesting small `gradient ascent' style updates using Equation 3.18, rather than by making the better `re-estimation' style updates using the zero(s) of Equation 3.16 or Equation 3.17.
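Setting the right hand side of Equation 3.16 to zero (and dropping the P(σ) term, as we do in Section V) gives the explicit update σ′i = |f̄i|² + Cii, with f̄ and C computed from Equation A1 of Appendix A. A sketch of one such σ iteration, under the assumptions of a diagonal σ, a diagonal noise covariance N, and a known imaging operator T:

import numpy as np

def sigma_reestimate(sigma, T, N_diag, g):
    # One sigma re-estimation step: the zero of Equation 3.16 with the
    # P(sigma) term dropped, i.e. sigma'_i = |f_bar_i|^2 + C_ii, where
    # C^-1 = sigma^-1 + T^dag N^-1 T and f_bar = C T^dag N^-1 g (Equation A1).
    Ninv_T = T / N_diag[:, None]               # N^-1 T for diagonal N
    C = np.linalg.inv(np.diag(1.0 / sigma) + T.conj().T @ Ninv_T)
    f_bar = C @ (T.conj().T @ (g / N_diag))    # posterior mean of f
    return np.abs(f_bar) ** 2 + np.real(np.diag(C))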
IV. SIMULTANEOUS SUPER-RESOLUTION AND AUTOFOCUSING

In this section we extend the results of Section III to account for uncertain imaging parameters θ. We derive an iterative scheme for computing (σ0, θ0) that maximises the posterior probability P(σ, θ|g). We call this `autofocus/super-resolution' because we recover the imaging parameters θ (i.e. we `focus' the imaging system) at the same time as we recover the cross section σ (i.e. `super-resolve' the image).

A. Derivation of the re-estimation equation

Equation 3.2 becomes

(σ0, θ0) = arg max_{σ,θ} { log[P(g|σ, θ)] + log[P(σ)] + log[P(θ)] }   (4.1)

where we assume that the prior probabilities of σ and θ are independent. Equation 3.3 becomes

L1(σ, θ|g) ≡ log[P(σ, θ|g)]   (4.2)

and the derivation in Equation 3.4 becomes

L1(σ′, θ′|g) = log[ P(g|σ′, θ′) P(σ′) P(θ′) / P(g) ]
= log[ ∫df P(g|f, θ′) P(f|σ′) P(σ′) P(θ′) / P(g) ]
= log[ ∫df P(f|g, σ, θ) P(g|f, θ′) P(f|σ′) P(σ′) P(θ′) / (P(f|g, σ, θ) P(g)) ]
= log[ ∫df P(f|g, σ, θ) (P(f|σ′)/P(f|σ)) (P(g|f, θ′)/P(g|f, θ)) P(g|σ, θ) P(σ′) P(θ′) / P(g) ]
≥ ∫df P(f|g, σ, θ) log[ (P(f|σ′)/P(f|σ)) (P(g|f, θ′)/P(g|f, θ)) P(g|σ, θ) P(σ′) P(θ′) / P(g) ]   (4.3)
where we have manipulated the expressions in the same way as in Equation 3.4. Equation 3.7 becomes

L1(σ′, θ′|g) ≥ L2,σ(σ′, σ, θ|g) + L2,θ(θ′, σ, θ|g) + L1(σ, θ|g)
L2,σ(σ′, σ, θ|g) ≡ ∫df P(f|g, σ, θ) { −log[det(πσ′)] − f† σ′⁻¹ f + log[P(σ′)] − [σ′ → σ] }
L2,θ(θ′, σ, θ|g) ≡ ∫df P(f|g, σ, θ) { [f† T′† N⁻¹ g + CC] − f† T′† N⁻¹ T′ f + log[P(θ′)] − [θ′ → θ] }   (4.4)

which should be compared with the result in Equation 3.8 (here T′ ≡ T(θ′)). The most important property of Equation 4.4 is the separation of the σ′ and the θ′ dependences. This implies that we can optimise σ′ and θ′ independently. We take advantage of this by interleaving separate σ′ and θ′ update iterations in our implementation of autofocus/super-resolution in Section V. The σ′ re-estimation process is the same as in Section III, except that the imaging operator T(θ) is used with the current value of θ inserted. On the other hand, the θ′ re-estimation process requires some further calculation in order to obtain the corresponding re-estimation equation. There are two terms in L2,θ(θ′, σ, θ|g) that require attention. The f† T′† N⁻¹ T′ f term can be obtained by a derivation analogous to Equation 3.9 and Equation 3.10 to yield

∫df P(f|g, σ, θ) f† T′† N⁻¹ T′ f = tr[T′† N⁻¹ T′ C] + f̄† T′† N⁻¹ T′ f̄   (4.5)

The f† T′† N⁻¹ g term simply yields

∫df P(f|g, σ, θ) f† T′† N⁻¹ g = f̄† T′† N⁻¹ g   (4.6)

Finally, L2,θ(θ′, σ, θ|g) may be expressed (together with L2,σ(σ′, σ, θ|g), for completeness) as
L2,σ(σ′, σ, θ|g) ≡ −log[det(πσ′)] − tr[σ′⁻¹ C] − f̄† σ′⁻¹ f̄ + log[P(σ′)] − [σ′ → σ]
L2,θ(θ′, σ, θ|g) ≡ −tr[T′† N⁻¹ T′ C] − f̄† T′† N⁻¹ T′ f̄ + [f̄† T′† N⁻¹ g + CC] + log[P(θ′)] − [θ′ → θ]   (4.7)

We use the result for L2,θ(θ′, σ, θ|g) to derive a re-estimation equation for θ.

B. Greatest lower bound on the posterior probability

We now introduce a linearised model of T(θ)

T(θ) = T0 + Σ_{i=1}^{r} θi Ti   (4.8)

where T0 is the ideal imaging operator, and Ti, i = 1, 2, ..., r is a set of operators that we use to model the variation in T(θ). Furthermore, we use an explicit Gaussian model for P(θ), as specified in Equation 2.7. With these substitutions, Equation 4.7 for L2,θ(θ′, σ, θ|g) becomes

L2,θ(θ′, σ, θ|g) = −tr[ (T0† + Σ_{i=1}^{r} θ′i* Ti†) N⁻¹ (T0 + Σ_{j=1}^{r} θ′j Tj) C ]
− f̄† (T0† + Σ_{i=1}^{r} θ′i* Ti†) N⁻¹ (T0 + Σ_{j=1}^{r} θ′j Tj) f̄
+ [ f̄† (T0† + Σ_{i=1}^{r} θ′i* Ti†) N⁻¹ g + CC ]
− log[det(πΛ)] − Σ_{i,j=1}^{r} θ′i* θ′j (Λ⁻¹)ij − [θ′ → θ]   (4.9)

Differentiating this result to obtain ∂L2,θ(θ′, σ, θ|g)/∂θ′i* yields

∂L2,θ(θ′, σ, θ|g)/∂θ′i* = −tr[ Ti† N⁻¹ (T0 + Σ_{j=1}^{r} θ′j Tj) C ] − f̄† Ti† N⁻¹ (T0 + Σ_{j=1}^{r} θ′j Tj) f̄ + f̄† Ti† N⁻¹ g − Σ_{j=1}^{r} θ′j (Λ⁻¹)ij   (4.10)

which depends linearly on the quantity of interest θ′. We may simplify Equation 4.10 by making the definitions

Aij ≡ −tr[Ti† N⁻¹ Tj C] − f̄† Ti† N⁻¹ Tj f̄ − (Λ⁻¹)ij
bi ≡ tr[Ti† N⁻¹ T0 C] + f̄† Ti† N⁻¹ T0 f̄ − f̄† Ti† N⁻¹ g   (4.11)

Note that the matrix A and the vector b depend only on the old parameter values (σ, θ). Finally, we obtain the re-estimation equation for θ by locating the stationary point ∂L2,θ(θ′, σ, θ|g)/∂θ′i* = 0, which is the solution of the linear matrix equation

Σ_{j=1}^{r} Aij θ′j = bi   (4.12)

This is a remarkably simple re-estimation formula for the imaging system parameters θ, which depends on quantities that are easily computed. For completeness, we quote the analogous result for real-valued imaging parameters θ (which have a real symmetric covariance matrix Λ)

Aij ≡ Re{ −tr[Ti† N⁻¹ Tj C] − f̄† Ti† N⁻¹ Tj f̄ − ½ (Λ⁻¹)ij }
bi ≡ Re{ tr[Ti† N⁻¹ T0 C] + f̄† Ti† N⁻¹ T0 f̄ − f̄† Ti† N⁻¹ g }   (4.13)

In our numerical simulations in Section V we use a single real imaging parameter, so Equation 4.13 will be the appropriate re-estimation equation to use. In Section V we also explain an important modification to the true noise covariance N that we found to be necessary in order to obtain convergence in a reasonable number of θ re-estimation iterations.
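For real-valued parameters, Equations 4.12 and 4.13 amount to building a small real matrix A and vector b and solving a linear system (a scalar division in the single parameter case of Section V). A sketch of ours, with the Λ⁻¹ prior term dropped as in our simulations:

import numpy as np

def theta_reestimate(T_ops, f_bar, C, Ninv, g):
    # Solve sum_j A_ij theta'_j = b_i (Equations 4.12 and 4.13) for real
    # theta, dropping the Lambda^-1 term. T_ops = [T0, T1, ..., Tr].
    T0, rest = T_ops[0], T_ops[1:]
    r = len(rest)
    A = np.zeros((r, r))
    b = np.zeros(r)
    for i, Ti in enumerate(rest):
        Mi = Ti.conj().T @ Ninv                # T_i^dag N^-1
        b[i] = np.real(np.trace(Mi @ T0 @ C)
                       + f_bar.conj() @ Mi @ T0 @ f_bar
                       - f_bar.conj() @ Mi @ g)
        for j, Tj in enumerate(rest):
            A[i, j] = np.real(-np.trace(Mi @ Tj @ C)
                              - f_bar.conj() @ Mi @ Tj @ f_bar)
    return np.linalg.solve(A, b)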
V. AN EXPLANATORY NUMERICAL EXAMPLE
In this section we apply our super-resolution/autofocus re-estimation method to a variety of one dimensional images (e.g. time series). We wish only to demonstrate the principle of our method, so we provide an explanatory numerical example, rather than extensive numerical simulations and refinement of algorithms. A detailed analysis of a low complexity algorithm for the super-resolution part of our theory can be found in [1].
A. Imaging system - a defocused lens
For simplicity we consider the case of a scalar imaging parameter θ. In order to ensure that our simulations are relevant to a commonly encountered practical situation, we consider the problem of super-resolution in a defocused linear imaging system. Specifically, we consider the case where a lens is defocused by O(depth of focus). We model a defocused lens as follows

T(θ) = (1/2c) ∫_{−c}^{+c} dk exp(ikx + iθk²x²)
     ≈ (1/2c) ∫_{−c}^{+c} dk exp(ikx) [1 + iθk²x²]
     = T0 + θT1   (5.1)

where we have defined

T0 ≡ sin(cx)/(cx)
−iT1 ≡ cx sin(cx) + 2cos(cx) − 2sin(cx)/(cx)   (5.2)

The perturbation expansion in Equation 5.1 is appropriate when |θc²x²| ≤ 1, which ensures that the quadratic phase term exp(iθk²x²) is small at the edges (k = ±c) of the aperture. Physically, this condition requires that the lens be within O(depth of focus) of perfect focus. For |x| < π/c (i.e. within the main lobe) we therefore require |θ| < 1/π² ≈ 0.1. In our numerical simulations we use θ = 0.1 in order to defocus the lens by O(depth of focus). We sample the output space of the imaging system defined in Equation 5.1 and Equation 5.2 at intervals ∆x = 0.8 π/c (i.e. 0.8 of the Nyquist length), and we restrict our attention to an output space containing just five such samples. For simplicity, we do not use a continuous variable for position in input space; instead we merely sample it more finely than the output space. This strategy simplifies the software, without losing the essential properties of super-resolution and autofocusing. For our purposes we choose to sample the input space at intervals ∆x = 0.4 π/c (i.e. two samples per output sample), which we superimpose on the output sample positions (and the midpoints of the intersample intervals). Thus we have 9 input samples. In summary, our sampling lattices are (in Nyquist units)

input sample positions = {−1.6, −1.2, −0.8, −0.4, 0.0, 0.4, 0.8, 1.2, 1.6}
output sample positions = {−1.6, −0.8, 0.0, 0.8, 1.6}   (5.3)

These sampling lattices are not very long, but they are sufficient for us to demonstrate the properties of our re-estimation method.
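Once sampled on the lattices of Equation 5.3, T0 and T1 of Equation 5.2 become small matrices whose (i, j) element is the kernel evaluated at the separation between output sample i and input sample j. A sketch, under our reading that positions are in Nyquist units so that cx = π × (position difference):

import numpy as np

x_in = np.arange(-1.6, 1.61, 0.4)    # 9 input samples (Nyquist units)
x_out = np.arange(-1.6, 1.61, 0.8)   # 5 output samples

def lens_operators(x_out, x_in):
    # Sampled T0 and T1 of Equation 5.2; the guard value avoids the
    # removable singularity of sin(cx)/(cx) at zero separation.
    cx = np.pi * (x_out[:, None] - x_in[None, :])
    cx = np.where(cx == 0.0, 1e-12, cx)
    T0 = np.sin(cx) / cx
    T1 = 1j * (cx * np.sin(cx) + 2.0 * np.cos(cx) - 2.0 * np.sin(cx) / cx)
    return T0, T1

T0, T1 = lens_operators(x_out, x_in)
T = T0 + 0.1 * T1                    # defocused by O(depth of focus)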
B. One-target case
We now perform the simplest possible numerical simulation to demonstrate autofocus/super-resolution, so our numerical simulation should be regarded as an explanatory example, rather than a detailed practical implementation of our method. Furthermore, we omit the prior
probability terms P(σ) and P(θ), to perform a maximum likelihood (rather than maximum posterior probability) fit to the data. We thus rob our method of some of its power in order to produce a simple and uncluttered demonstration of its effectiveness. It is easy to reintroduce the effect of P(σ) and P(θ) into the re-estimation equations when more complicated problems (which do require prior knowledge to resolve ambiguous interpretations of the data) need to be solved.
We use the following point target embedded in a weak surrounding

σ = (1, 1, 1, 1, 1000, 1, 1, 1, 1)   (5.4)

which can produce a variety of scattered fields f distributed according to P(f|σ). The particular realisation that occurred in our simulation happened to be

Re(f) = (−0.4812, 0.0352, −0.5281, −0.6611, 23.8763, −0.0822, 0.1760, 0.4499, 0.6768)
Im(f) = (−0.1526, −0.3091, 0.0117, 0.5673, −15.4640, 0.0822, 0.3951, 0.0117, 1.1462)
|f|² = (0.2548, 0.0968, 0.2791, 0.7589, 809.2133, 0.0135, 0.1871, 0.2025, 1.7719)   (5.5)

where we use the notation |f|² informally to denote the vector of modulus squared components of f. We initialise the defocus parameter to θ = 0.1, which is O(depth of focus) away from perfect focus, which yields an image

Re(g) = (−10.0906, 2.8887, 23.3954, 5.8459, −9.4112)
Im(g) = (−6.5631, −4.2681, −15.2647, −4.3982, −5.0146)   (5.6)

We have not included the effects of noise yet. The norm of this image is |g|² = 1119.0, and we add complex Gaussian noise to each image sample so that the expected norm of the total noise is |g|²/100 (i.e. the image signal to noise ratio (SNR) is 100, or equivalently 20dB). Note that we define SNR as a ratio of total energies, because it is this quantity that determines the information content of the data (the conventional way of measuring SNR by using the peak amplitude is not information theoretically meaningful). A SNR of 20dB requires that the variance of each Gaussian noise variate (one for each real and each imaginary part) is 1.1190 (= 1119.0/(2×5×100)). The noisy image so generated turned out to be

Re(g) = (−10.6700, 3.2223, 22.3595, 5.7816, −7.9188)
Im(g) = (−7.0723, −6.2404, −15.4110, −4.0880, −5.0673)
|g|² = (163.8664, 49.3264, 737.4455, 50.1384, 88.3842)   (5.7)

where we informally use the notation |g|² to denote the vector of modulus squared components of g. This is the image that we use to explore the basic properties of our autofocus/super-resolution method.

In each of our experiments we initialise the cross section to be σ = (1, 1, 1, 1, 1, 1, 1, 1, 1). This is a featureless first guess in which we do not even bother to initialise the overall normalisation correctly. We perform experiments under three different focusing conditions: autofocus, defocus, and focus.

1. `Autofocus' initialises θ = 0 and then recovers both σ and θ from the data. This demonstrates the full power of our autofocus/super-resolution method.
2. `Defocus' constrains θ = 0 (out of focus) and then recovers σ from the data. This demonstrates what super-resolution (without autofocus) would do if an inappropriate imaging operator were used.
3. `Focus' constrains θ = 0.1 (in focus) and then recovers σ from the data. This demonstrates what super-resolution (without autofocus) would ideally do.

By comparing the results of each of these types of super-resolution we may determine the effectiveness of autofocus/super-resolution. The re-estimation algorithm that we use in all of our experiments consists firstly of two iterations in which we re-estimate only σ by setting the gradient in Equation 3.16 to zero (and dropping the P(σ) term), followed by one iteration in which we re-estimate only θ by using Equation 4.12 with the results in Equation 4.13 inserted (and dropping the Λ term). Note that because we have only one defocus parameter, Equation 4.12 reduces to a scalar equation. We repeat this prescription of two iterations of σ re-estimation followed by one iteration of θ re-estimation until the algorithm converges. Note that any sequence of re-estimations is permitted within our theoretical model, and they are all guaranteed to converge towards a local maximum of P(σ, θ|g).

For compactness we denote a σ re-estimation iteration as Σ, and a θ re-estimation iteration as Θ, so our algorithm consists of ΘΣ² (reading from right to left) repeatedly applied to the data. We discovered this re-estimation algorithm by experimentation: it is one of the few schemes that we discovered that seems to avoid the problems of local maxima (and large flat regions as well) of P(σ, θ|g). We do not fully understand the reason why this particular algorithm works, and it is likely that it will need to be extended in order to handle more complicated situations successfully. Note that any re-estimation algorithm based on Equation 4.4 is guaranteed to make steady progress towards a local maximum of P(σ, θ|g),
because at each stage it locates a greatest lower bound for P(σ, θ|g). The problem of designing an algorithm that makes rapid progress towards the global maximum of P(σ, θ|g) is an extensive research topic in its own right, and it would require that the topography of P(σ, θ|g) be investigated in detail. To improve the rate of convergence we artificially increase the noise N → 100N in Θ (i.e. we evaluate Equation 4.13 using an artificially boosted noise level), because this permits much larger θ updates to occur. If we left the noise at its correct value, then both the old and the new values of θ would have to be consistent with the observed image to a high tolerance, which forces the θ updates to be minuscule. On the other hand, if we boost the noise level, then consistency with the image need only be true to a low tolerance, so large θ updates become possible. We do not bother to change this artificial noise level as the algorithm converges. No doubt some further improvements in the rate of convergence could be won at the cost of making the iteration schedule more complicated. Note that this prescription is not part of our basic theoretical machinery, so it must be used with extreme care. For instance, it is important not to use an increased noise level in Σ, because this would destroy our ability to super-resolve.
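Putting the pieces together, one ΘΣ² pass (reading from right to left: two Σ steps, then one Θ step with the artificial N → 100N boost) might look like the following sketch, which reuses the hypothetical sigma_reestimate and theta_reestimate helpers sketched in Sections III and IV:

import numpy as np

def theta_sigma2_pass(sigma, theta, T0, T1, N_diag, g, boost=100.0):
    # Two sigma re-estimations at the current theta, then one theta
    # re-estimation evaluated with an artificially boosted noise level.
    for _ in range(2):
        T = T0 + theta * T1
        sigma = sigma_reestimate(sigma, T, N_diag, g)
    Nb = boost * N_diag                  # N -> 100N, only in the Theta step
    T = T0 + theta * T1
    C = np.linalg.inv(np.diag(1.0 / sigma) + T.conj().T @ (T / Nb[:, None]))
    f_bar = C @ (T.conj().T @ (g / Nb))
    theta = theta_reestimate([T0, T1], f_bar, C, np.diag(1.0 / Nb), g)[0]
    return sigma, theta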
Figure 5: Super-resolve a target (autofocus).

Figure 6: Super-resolve a target (defocus).

Figure 7: Super-resolve a target (focus).
We show the `autofocus' result in Figure 5, the `defocus' result in Figure 6, and the `focus' result in Figure 7. In each of these figures we present the final result as a bold line, and some of the intermediate results (before convergence) as dotted lines. Reading from right to left, we obtain the `autofocus' result by applying Σ²[ΘΣ²]⁴, the `defocus' result by applying Σ¹⁰, and the `focus' result by applying Σ⁵. In Figure 5 the algorithm responds initially by fitting a W-shaped σ to the data, because at this stage θ is incorrect. The algorithm then gradually corrects θ and simultaneously recovers the correct cross
section. In Figure 6 we show what would have happened had we not allowed θ to be adjusted as in Figure 5: the W-shaped σ is retained. It is one of the main aims of this paper to remove such artefacts. In Figure 7 we present the ideal result that would be obtained if somehow we knew in advance what θ was, and used it throughout the super-resolution iterations without change: the algorithm converges directly towards the correct solution. Differences between Figure 5 and Figure 7 are largely because we have not run our algorithm all the way to convergence. Figure 5, Figure 6 and Figure 7 provide the simplest demonstration of the utility of our autofocus/super-resolution method. We commented in [2] that a scattered field that consisted of a point target placed between a pair of appropriately chosen point targets could produce a correctly focused image that mimicked the image that we would obtain from a single point target observed through a defocused imaging system. The artefacts that we observe in Figure 6 are consistent with this. If we have available any prior knowledge,
we can introduce P(σ) and/or P(θ) terms into the re-estimation equations in order to reduce the effect of possibly ambiguous interpretations of the data.

C. Two-target case

We now repeat the above experiment using two point targets embedded in a weak surrounding. The various vectors become

σ = (1, 1, 1, 1000, 1, 1000, 1, 1, 1)
Re(f) = (−0.5751, −0.3091, −0.6768, −29.3196, 0.3482, −17.6908, −0.0430, 0.5125, 0.1682)
Im(f) = (0.0743, 0.5673, 0.4264, −1.1134, 0.1917, −7.2990, 0.3795, −0.3482, 0.3560)
|f|² = (0.3362, 0.4173, 0.6399, 860.8815, 0.1580, 366.2381, 0.1459, 0.3839, 0.1550)
Re(g) = (4.0108, −21.2726, −35.2127, −9.4149, 0.5343)
Im(g) = (6.8472, 5.4006, −6.9683, 4.4031, 2.1164)
|g|² = (62.9714, 481.6923, 1288.4935, 108.0277, 4.7648)   (5.8)
where we have quoted only the noisy version of the image (which has a SNR of 20dB as before). Note that the fields scattered by the two point targets are approximately in phase, and the targets are 0.8 Rayleigh resolution cells apart. As before we present the results that we obtain under three different focusing conditions. We show the `autofocus' result in Figure 8, the `defocus' result in Figure 9, and the `focus' result in Figure 10. Reading from right to left, we obtain the `autofocus' result by applying Σ²[ΘΣ²]⁴, the `defocus' result by applying Σ¹⁰, and the `focus' result by applying Σ¹⁰.
Figure 8: Super-resolve two targets (autofocus, in-phase).

Figure 9: Super-resolve two targets (defocus, in-phase).
In Figure 8 the algorithm responds initially by constructing a σ that has a featureless bump in the centre. This gradually evolves to a bimodal form, which
then converges towards the correct σ. In Figure 9 we do not update θ, and the result is a featureless bump. In Figure 10, we apply the correct value of θ throughout the super-resolution iterations, and the algorithm converges directly towards the correct solution. Differences between Figure 8 and Figure 10 are largely because we have not run the algorithm all the way to convergence. These results demonstrate very clearly how autofocus/super-resolution (Figure 8) can produce superior results to pure super-resolution (Figure 9).
Figure 10: Super-resolve two targets (focus, in-phase).

D. Discussion
The results of our numerical experiments show that our autofocus/super-resolution method performs as expected. In simple cases, we have demonstrated successful super-resolution when the lens is defocused by O(depth of focus). Our numerical experiments were successful because point targets have images with a lot of contrast, which are easy to autofocus. However, we anticipate that our method will not work in all circumstances. For instance, images containing clutter usually have a lower contrast than a point target image, and might therefore have several alternative feasible interpretations at different settings of the defocus parameter. The same problem would be encountered in any other method that had available the same information, because ambiguities can be resolved only by supplying appropriate additional information. In [2] we discussed some examples of target configurations that conspire to fool an autofocus algorithm in this way. The imaging parameter re-estimation method in Equation 4.12 caters for a vector of parameters, so we could extend the model in Equation 5.1 to include the effect of other terms if we wished. We did not introduce prior knowledge terms P(σ) or P(θ) into our numerical simulations. Such terms can only improve the performance of our method, provided that the prior knowledge is correct.
VI. CONCLUSIONS
Firstly, we have shown how to use Jensen's inequality (in the form of an estimate-maximise (EM) algorithm) to maximise the posterior probability P(σ|g) for recovering a scattering cross section σ from a coherent image g: we call this `super-resolution'. Secondly, we have extended this result to maximise the posterior probability P(σ, θ|g) for simultaneously recovering both the imaging system parameters θ and the cross section σ: we call this `autofocus/super-resolution'. The first result proves the validity of our heuristic iterative super-resolution algorithm [1]. The second result extends this algorithm to compensate for the effects of an uncertain imaging system. Although our results apply to coherent imaging systems in general, they have the potential to super-resolve synthetic aperture radar (SAR) images. SAR can be modelled as a linear imaging system, and anomalous motion of the transmitter/receiver can be modelled (in simple cases) as a simple defocusing of the type exemplified in Section V. There is a contrast maximisation autofocus method [15, 16] that corrects the focus of a SAR to within O(depth of focus) of the correct value, but this is inadequate for successful super-resolution to be attempted [2]. Our current autofocus/super-resolution method shows potential for correcting these residual focusing uncertainties, and thus offers the possibility of a robust super-resolution algorithm for SAR image analysis.

VII. ACKNOWLEDGEMENTS
I thank J S Bridle for explaining to me the advantages that `re-estimation' methods have over `gradient ascent' methods.

Appendix A: MISCELLANEOUS DERIVATIONS
First of all let us make some definitions, whose purpose will become clear later on

M ≡ T σ T† + N
C⁻¹ ≡ σ⁻¹ + T† N⁻¹ T
f̄ ≡ σ T† M⁻¹ g = C T† N⁻¹ g   (A1)

whence

M⁻¹ = N⁻¹ − N⁻¹ T C T† N⁻¹
C = σ − σ T† M⁻¹ T σ
⟨f f†⟩ = σ T† M⁻¹ T σ   (A2)

where ⟨···⟩ denotes an average over the g as described by the model probabilities (which is not necessarily the same as the average over observed g, because one's model can be, and usually is, wrong).
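The relations in Equation A2 follow from Equation A1 by standard (Woodbury-type) matrix algebra; a quick numerical check of the kind below is a useful guard against sign slips. This is a verification sketch of ours, not part of the derivation:

import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4                                    # image and field dimensions
T = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
sigma = np.diag(rng.uniform(0.5, 2.0, n))      # positive diagonal sigma
N = np.diag(rng.uniform(0.5, 2.0, m))          # noise covariance
Ninv = np.linalg.inv(N)

M = T @ sigma @ T.conj().T + N                 # Equation A1
C = np.linalg.inv(np.linalg.inv(sigma) + T.conj().T @ Ninv @ T)

Minv = np.linalg.inv(M)
assert np.allclose(Minv, Ninv - Ninv @ T @ C @ T.conj().T @ Ninv)    # A2, line 1
assert np.allclose(C, sigma - sigma @ T.conj().T @ Minv @ T @ sigma) # A2, line 2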
Deriving P(g|σ)

We derive the probability P(g|σ) that a cross section σ can give rise to dataset g. Potentially, there are two types of hidden variable to consider: the scattered field f, and the imaging parameters θ, which we showed in Figure 1. Let us assume that θ is known, so that it can be removed from the problem, to yield

P(g|σ) = ∫df P(g|f) P(f|σ)   (A3)

Equation A3 represents the top half of Figure 1 with f integrated out. We obtain P(f|σ) from Equation 2.3 and P(g|f) from Equation 2.6 (with θ removed). The integrand of Equation A3 is Gaussian, so it is easy to perform the integral. The steps are as follows:

P(g|σ) = (1/(det(πN) det(πσ))) ∫df exp[−f† σ⁻¹ f − (g − Tf)† N⁻¹ (g − Tf)]
= (1/(det(πN) det(πσ))) ∫df exp[−(f − f̄)† C⁻¹ (f − f̄) + g† (−N⁻¹ + N⁻¹ T C T† N⁻¹) g]
= (det(πC)/(det(πN) det(πσ))) exp[g† (−N⁻¹ + N⁻¹ − M⁻¹) g]
= (1/det(πM)) exp(−g† M⁻¹ g)   (A4)

In the last step we simplify the normalisation factor by noting (a posteriori) that the result must be a correctly normalised probability. P(g|σ) is a zero mean complex Gaussian probability with covariance M. The form of M given in Equation A1 is very simple: the TσT† piece is the linearly filtered scattered field covariance matrix σ, and the N piece is the data noise covariance. These terms sum together because the data noise is statistically independent of the scattered field.

Deriving P(f|g, σ)

We derive the posterior probability P(f|g, σ) that a scattered field f can occur, given that both the scattering cross section σ and the data g are known. The steps in the derivation are

P(f|g, σ) = P(g|f) P(f|σ) / P(g|σ)
= (det(πM)/(det(πN) det(πσ))) exp[−f† σ⁻¹ f − (g − Tf)† N⁻¹ (g − Tf) + g† M⁻¹ g]
= (det(πM)/(det(πN) det(πσ))) exp[−(f − f̄)† C⁻¹ (f − f̄) + g† (M⁻¹ − N⁻¹ + N⁻¹ T C T† N⁻¹) g]
= (1/det(πC)) exp[−(f − f̄)† C⁻¹ (f − f̄)]   (A5)

In the last step we simplify the normalisation factor by noting (a posteriori) that the result must be a correctly normalised probability. P(f|g, σ) is a Gaussian with mean f̄ and covariance C.

Differentiating log det

We differentiate the logarithm of the determinant of a matrix-valued quantity. We use this in order to differentiate Equation 3.11, so we present the derivation using an appropriate notation.

δ log[det(πσ′)] = −log[det(π(σ′⁻¹ + δσ′⁻¹))] + log[det(πσ′⁻¹)]   (step 1)
= −tr[log(σ′⁻¹ (1 + σ′ δσ′⁻¹))] + tr[log(σ′⁻¹)]   (step 2)
= −tr[log(1 + σ′ δσ′⁻¹)]   (step 3)
≈ −tr[σ′ δσ′⁻¹]   (step 4)   (A6)
We justify the various stages of this manipulation as follows:

Step 1. Matrix invert σ′, which introduces a minus sign outside the logarithm function. In order to calculate the derivative, write the difference that results from changing σ′⁻¹ infinitesimally.
Step 2. Use the identity log[det(X)] = tr[log(X)].

Step 3. Use the identity log(XY) = log(X) + log(Y) + (commutator terms from the Baker-Hausdorff identity) to obtain tr[log(XY)] = tr[log(X)] + tr[log(Y)], which causes a pair of terms to cancel, leaving only the
infinitesimal part. Note that the trace of any commutator is zero.

Step 4. Expand the logarithm using log(1 + X) = X + O(X²).

References

[1] L. M. Delves, G. Pryde and S. P. Luttrell (1988). A super-resolution algorithm for SAR images. Inverse Probl., 4(3), 681-703.
[2] S. P. Luttrell (1985). A super-resolution model for synthetic aperture radar. RSRE, Malvern. Technical Report 3785.
[3] H. Jeffreys (1939). Theory of Probability (Clarendon Press, Oxford).
[4] R. T. Cox (1946). Probability, frequency and reasonable expectation. Am. J. Phys., 14(1), 1-13.
[5] S. P. Luttrell (1985). Prior knowledge and object reconstruction using the best linear estimate technique. Opt. Acta, 32(6), 703-716.
[6] S. P. Luttrell and C. J. Oliver (1986). Prior knowledge in synthetic aperture radar processing. J. Phys. D: Appl. Phys., 19(3), 333-356.
[7] S. P. Luttrell (1991). The theory of Bayesian super-resolution of coherent images: a review. Int. J. Remote Sens., 12(2), 303-314.
[8] L. E. Baum (1972). An inequality and associated maximisation technique in statistical estimation for probabilistic functions of Markov processes. Inequalities, 3(1), 1-8.
[9] A. P. Dempster, N. M. Laird and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39(1), 1-38.
[10] E. Jakeman and P. N. Pusey (1976). A model for non-Rayleigh sea echo. IEEE Trans. Antenn. Propag., 24(6), 806-814.
[11] K. D. Ward (1981). Compound representation of high resolution sea clutter. Electron. Lett., 17(16), 561-565.
[12] S. Geman and D. Geman (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal., 6(6), 721-741.
[13] S. P. Luttrell (1987). The relationship between super-resolution and phase imaging of SAR data. RSRE, Malvern. Technical Report BS1/41.
[14] S. P. Luttrell (1989). The inverse cross section problem for complex data. Inverse Probl., 5(1), 35-50.
[15] I. P. Finley and J. W. Wood (1985). An investigation of synthetic aperture radar autofocus. RSRE, Malvern. Technical Report 3790.
[16] C. J. Oliver (1989). Synthetic aperture radar imaging. J. Phys. D: Appl. Phys., 22(7), 871-890.