Local regularization and Bayesian hypermodels

Daniela Calvetti^a and Erkki Somersalo^b

^a Department of Mathematics and Center for Modeling Integrated Metabolic Systems, Case Western Reserve University, Cleveland, OH, USA; ^b Department of Mathematics, Helsinki University of Technology, Helsinki, Finland

E-mail: [email protected], [email protected]

ABSTRACT

In this paper we restore, from a blurred and noisy specimen signal, a one-dimensional signal that is a priori known to be smooth except for a few jump discontinuities, using a local regularization scheme derived in a Bayesian statistical inversion framework. The proposed method is computationally efficient and reproduces the jump discontinuities well, and is thus an alternative to using a total variation (TV) penalty as a regularizing functional. Our approach avoids the non-differentiability problems encountered in TV methods and is completely data driven, in the sense that the parameter selection is done automatically and requires no user intervention. A computed example illustrating the performance of the method when applied to the solution of a deconvolution problem is also presented.

1. INTRODUCTION

In this article we consider a linear discrete ill-posed problem of the form
\[
  A f = y, \tag{1}
\]

where the matrix A is of ill-determined rank, the right-hand side y is contaminated by additive Gaussian noise, and the desired solution is known to be smooth except at jump discontinuities. Since the matrix A is ill-conditioned and the right-hand side is contaminated by noise, the solution of (1) requires some form of regularization. A popular regularization method for problems for which some a priori knowledge about the solution is available is Tikhonov regularization, with a regularizing operator designed according to the prior information. Since in our application the solution is expected to have a few jump discontinuities, a natural choice^7 for the Tikhonov functional would be the Total Variation (TV),^4
\[
  \mathrm{TV}(f) = \int_0^T |f'(t)| \, dt.
\]
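For a signal sampled on a uniform grid, the TV functional above reduces to a sum of absolute differences between neighboring samples (the grid spacing cancels). The following is a minimal sketch of this discrete TV functional; the sample signal is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def total_variation(f):
    """Discrete total variation: sum of |f(t_{j+1}) - f(t_j)|.

    Approximates TV(f) = int_0^T |f'(t)| dt for a uniformly sampled
    signal; the grid spacing cancels between the difference quotient
    and the quadrature weight.
    """
    return np.sum(np.abs(np.diff(f)))

# Illustrative example: a piecewise constant signal with one jump of height 1.
f = np.concatenate([np.zeros(50), np.ones(50)])
print(total_variation(f))  # -> 1.0, the height of the jump
```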

The numerical implementation of total variation regularization can be challenging.^{3,9} The main problem is that, since the corresponding objective function is non-differentiable, it requires some regularization itself when gradient-based optimization methods are used to compute the Tikhonov solution. Therefore the regularization of the TV functional depends on a parameter whose selection is not straightforward, and neither is the selection of the Tikhonov regularization parameter with this regularizing functional. For these reasons, no algorithm for TV-based regularization with automatic parameter selection has yet been proposed in the literature. The regularizing functional proposed in this paper is similar in principle to the TV functional, in the sense that it does not penalize the solution for jumping, but it has the advantage of not depending on any user-supplied parameters. Since the motivation and derivation of the method have their roots in Bayesian inversion and hierarchical prior models, we start with a quick review of their foundations and fundamental results.

Assume that the signal of interest is supported on the interval [0, T] and vanishes at the endpoints. Subdivide [0, T] into n + 1 equal subintervals and denote by τ_j the corresponding grid points,
\[
  \tau_j = j h, \qquad h = \frac{T}{n+1}, \qquad 0 \le j \le n+1, \tag{2}
\]

and represent the signal as a discretized piecewise linear function,
\[
  f_i = f(\tau_i) = \sum_{j=1}^{n} \alpha_j \varphi_j(\tau_i), \tag{3}
\]
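A minimal sketch of the representation (3): φ_j is the standard piecewise linear hat function centered at τ_j, so that f(τ_j) = α_j at the interior nodes and f vanishes at the endpoints. The values of T and n below are illustrative assumptions.

```python
import numpy as np

T, n = 1.0, 127                 # illustrative choices for the support and grid size
h = T / (n + 1)                 # grid spacing, as in (2)
tau = h * np.arange(n + 2)      # grid points tau_0, ..., tau_{n+1}

def hat(j, t):
    """Piecewise linear hat function phi_j with phi_j(tau_k) = delta_jk."""
    return np.clip(1.0 - np.abs(t - tau[j]) / h, 0.0, None)

def evaluate_f(alpha, t):
    """Evaluate f(t) = sum_j alpha_j * phi_j(t) from the coefficient vector alpha."""
    return sum(alpha[j - 1] * hat(j, t) for j in range(1, n + 1))

# Sanity check: at the interior grid points, f(tau_j) = alpha_j.
alpha = np.random.default_rng(0).standard_normal(n)
assert np.allclose(evaluate_f(alpha, tau[1:-1]), alpha)
```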

where φ_j(t) is the piecewise linear hat function such that φ_j(τ_k) = δ_{jk}. Observe that α_j = f(τ_j) and f(τ_0) = f(τ_{n+1}) = 0, so the vector α = [α_1, ..., α_n]^T parameterizes f. We remark that the number of basis functions in the representation of f determines both the number of unknown parameters and how accurately jump discontinuities can be resolved: if the basis is too small, the inability of the basis functions to represent the jumps gives rise to Gibbs-like ringing, as shown in the computed example.

The forward model is a blurring by a smoothing kernel a(t, t'). Assume that the blurred and noisy signal is measured at M equidistant points and, for convenience of the derivation to follow, let t_j denote the grid points obtained by subdividing [0, T] into M − 1 equidistant subintervals. The discretized forward model (1) is obtained by discretizing the integral,
\[
  y_j = \int_0^T a(t_j, t)\, f(t)\, dt \approx \frac{T}{M-1} \sum_{k=1}^{M} a(t_j, t_k)\, f(t_k), \qquad 1 \le j \le M,
\]
i.e., the matrix A is a square matrix. Using the representation (3), we can then write our problem in the form
\[
  A \Phi(\alpha) = y,
\]
where Φ(α) is the vector with entries Φ(α)_k = f(t_k) = Σ_{j=1}^{n} α_j φ_j(t_k).
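A minimal sketch of how the discretized forward model might be assembled. The Gaussian blurring kernel, its width, and the test coefficients are assumptions made for illustration only; the paper does not specify the kernel at this point.

```python
import numpy as np

T, n, M = 1.0, 127, 128                     # illustrative sizes
h = T / (n + 1)
tau = h * np.arange(n + 2)                  # signal grid
t = np.linspace(0.0, T, M)                  # measurement grid, spacing T/(M-1)

def kernel(s, u, width=0.03):
    """Assumed Gaussian blurring kernel a(s, u); the width is illustrative."""
    return np.exp(-0.5 * ((s - u) / width) ** 2)

# Riemann-sum discretization: A[j, k] = (T/(M-1)) * a(t_j, t_k), a square M x M matrix.
A = (T / (M - 1)) * kernel(t[:, None], t[None, :])

# Map the coefficients alpha to the samples Phi(alpha)_k = f(t_k) via the hat basis (3).
def hat(j, s):
    return np.clip(1.0 - np.abs(s - tau[j]) / h, 0.0, None)

Phi = np.column_stack([hat(j, t) for j in range(1, n + 1)])   # M x n matrix

alpha = (tau[1:-1] > 0.5).astype(float)     # coefficients of a signal with one jump
y_clean = A @ (Phi @ alpha)                 # blurred signal A * Phi(alpha)
```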

2. STATISTICAL INVERSION AND HIERARCHICAL PRIOR MODEL

In the Bayesian inversion framework, the solution of the inverse problem (1) is viewed as a problem of statistical inference. To begin with, all variables and unknown parameters are considered as random variables, where the randomness is a way of reflecting our lack of information about their values. Thus, instead of seeking a single vector of values for the unknowns, in the Bayesian setting the solution of an inverse problem amounts to gaining information about the probability densities of the variables of interest.

Consider a general observation model with additive noise,
\[
  y = \Psi(\alpha) + e, \tag{4}
\]
where y ∈ R^M is the observed variable, α ∈ R^N is the variable of primary interest that cannot be accessed directly, Ψ is the forward model, and e ∈ R^M is the additive measurement noise. In our case Ψ(α) = AΦ(α). The inverse problem is to estimate α from the observation y.

Assume that the noise e in y is a random variable that is stochastically independent of the parameter α, and denote its probability density by π_noise(e). Thus, the integral of π_noise(e) over a set B ⊂ R^M gives the probability of the event e ∈ B. Clearly, if α is fixed, then according to the observation model (4) the variable y must have the same distribution as the noise e shifted to the point Ψ(α), i.e.,
\[
  \pi(y \mid \alpha) = \pi_{\mathrm{noise}}(e), \qquad e = y - \Psi(\alpha). \tag{5}
\]

This conditional probability density is called the likelihood. We emphasize that the likelihood plays a key role in the derivation which follows, which underscores the importance of knowing the type of noise in the problem under consideration.

Assume further that, before measuring y, we have some information concerning the distribution of the variable α, expressed in terms of a prior probability density. It is quite common in applications that only prior information of a qualitative nature is available. This may lead to a prior probability density which depends, in turn, on additional unknown parameters. We denote the conditional prior probability density by π_pr(α | λ), where λ ∈ R^L is the new prior parameter, whose determination is also part of the estimation problem. After encoding any available information about λ in the hyperprior density π_h(λ), the joint probability density of all variables becomes
\[
  \pi(y, \alpha, \lambda) = \pi(y \mid \alpha)\, \pi_{\mathrm{pr}}(\alpha \mid \lambda)\, \pi_{\mathrm{h}}(\lambda). \tag{6}
\]
The task of statistical inversion is to estimate the probability density of the unknown variables α and λ based on the data y, known as the posterior probability density. The connection between the likelihood, the prior and the hyperprior is given by Bayes' formula,
\[
  \pi(\alpha, \lambda \mid y) = \frac{\pi(y, \alpha, \lambda)}{\pi(y)} \propto \pi(y, \alpha, \lambda), \tag{7}
\]
with y = y_observed. From the posterior we can calculate estimates for the unknowns. In the present paper we are interested in the Maximum A Posteriori (MAP) estimate, defined as
\[
  [\alpha_{\mathrm{MAP}}, \lambda_{\mathrm{MAP}}] = \operatorname*{argmax}\, \pi(\alpha, \lambda \mid y)
  = \operatorname*{argmin}\, \bigl( -\log \pi(\alpha, \lambda \mid y) \bigr), \tag{8}
\]
provided that such a maximizer exists.
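To make the connection between (6)-(8) explicit, note that taking the negative logarithm of (7) and using the factorization (6) gives
\[
  -\log \pi(\alpha, \lambda \mid y)
  = -\log \pi(y \mid \alpha) - \log \pi_{\mathrm{pr}}(\alpha \mid \lambda) - \log \pi_{\mathrm{h}}(\lambda) + \text{constant},
\]
so the MAP estimate (8) is obtained by minimizing, over (α, λ), the sum of the negative log-likelihood, the negative log-prior and the negative log-hyperprior. We spell out this decomposition only as an intermediate step; the individual terms are made explicit in the next section.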

3. LOCAL REGULARIZATION FOR DECONVOLUTION

We are now ready to apply the general statistical inversion framework introduced in the previous section to the deconvolution problem under consideration. From the assumption that the additive noise in the data is Gaussian, mutually independent at different time instances t_j, with zero mean and variance σ_j², it follows that the probability density of the noise is
\[
  \pi_{\mathrm{noise}}(e) \propto \exp\!\left( -\frac{1}{2} \sum_{i=1}^{M} \frac{e_i^2}{\sigma_i^2} \right)
  = \exp\!\left( -\frac{1}{2} \sum_{i=1}^{M} \frac{\bigl(y_i - [A\Phi(\alpha)]_i\bigr)^2}{\sigma_i^2} \right). \tag{9}
\]
Thus, from (5) and (9), the likelihood is
\[
  \pi(y \mid \alpha) \propto \exp\!\left( -\frac{1}{2} \bigl\| S\bigl(y - A\Phi(\alpha)\bigr) \bigr\|^2 \right), \tag{10}
\]
where
\[
  S = \mathrm{diag}\bigl( 1/\sigma_1, \ldots, 1/\sigma_M \bigr).
\]
We now describe how to construct a prior for α which corresponds to our a priori information. Since we expect the solution to be smooth except at some isolated locations where it has jump discontinuities, we introduce the integral
\[
  I = \int_0^T w(t)\, \bigl(f'(t)\bigr)^2 \, dt, \tag{11}
\]

where w : [0, T] → R_+ is a suitably chosen weight function which is small near the jumps and larger elsewhere. The integral (11) can be approximated by the Riemann sum
\[
  I \approx h \sum_{j=1}^{n+1} w(s_j) \left( \frac{f(\tau_j) - f(\tau_{j-1})}{h} \right)^2 = \| D L \alpha \|^2, \tag{12}
\]
where s_j denotes the midpoint of the jth subinterval [τ_{j−1}, τ_j], 1 ≤ j ≤ n + 1, L is the finite difference matrix
\[
  L = \frac{1}{\sqrt{h}}
  \begin{bmatrix}
     1  &        &        &    \\
    -1  & 1      &        &    \\
        & \ddots & \ddots &    \\
        &        & -1     & 1  \\
        &        &        & -1
  \end{bmatrix} \in \mathbb{R}^{(n+1)\times n},
\]
and D is the diagonal matrix
\[
  D = \mathrm{diag}(\lambda) \in \mathbb{R}^{(n+1)\times(n+1)}, \qquad
  \lambda = \bigl[ \lambda_1, \ldots, \lambda_{n+1} \bigr], \qquad
  \lambda_j = \sqrt{w(s_j)}.
\]
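A minimal sketch of the discrete penalty (12): it builds the finite difference matrix L and the diagonal weight matrix D, and checks that ‖DLα‖² agrees with the weighted Riemann sum. The weight function w used here is a placeholder assumption; in the paper it is determined from the data.

```python
import numpy as np

T, n = 1.0, 127
h = T / (n + 1)
tau = h * np.arange(n + 2)
s = 0.5 * (tau[:-1] + tau[1:])            # midpoints s_j of the n+1 subintervals

def w(t):
    """Placeholder weight function (constant); only for this check."""
    return np.ones_like(t)

# Finite difference matrix L in R^{(n+1) x n}:
# (L alpha)_j = (f(tau_j) - f(tau_{j-1})) / sqrt(h), with f(tau_0) = f(tau_{n+1}) = 0.
L = np.zeros((n + 1, n))
for j in range(n):
    L[j, j] = 1.0
    L[j + 1, j] = -1.0
L /= np.sqrt(h)

lam = np.sqrt(w(s))                        # lambda_j = sqrt(w(s_j))
D = np.diag(lam)

alpha = np.random.default_rng(1).standard_normal(n)
f = np.concatenate([[0.0], alpha, [0.0]])  # nodal values including the zero endpoints

riemann = h * np.sum(w(s) * (np.diff(f) / h) ** 2)
assert np.isclose(np.linalg.norm(D @ L @ alpha) ** 2, riemann)
```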

To keep the value of (12) small, either f′ must be small, or the weight function must be small where f′ is large, i.e., in the proximity of the jumps. This suggests that we choose the dynamic smoothness prior of α conditioned on λ to be
\[
  \pi_{\mathrm{pr}}(\alpha \mid \lambda) = C(\lambda) \exp\!\left( -\frac{1}{2} \| D L \alpha \|^2 \right)
  = \exp\!\left( -\frac{1}{2} \| D L \alpha \|^2 + \log C(\lambda) \right), \tag{13}
\]
where C(λ) is a norming constant. Although in general the norming constant of the prior plays no role and can be ignored in the computation of the MAP estimate, here it cannot be ignored because it is a function of the unknown weights λ_j. It has been shown^1 by an induction argument that
\[
  C(\lambda) = \left( \frac{\lambda_1^2 \cdots \lambda_{n+1}^2}{(2\pi)^n h} \sum_{j=1}^{n+1} \frac{1}{\lambda_j^2} \right)^{1/2}. \tag{14}
\]
Thus the prior probability density of α, conditioned on the hyperparameter λ, becomes
\[
  \pi_{\mathrm{pr}}(\alpha \mid \lambda) \propto \exp\!\left( -\frac{1}{2} \| D L \alpha \|^2
  + \sum_{j=1}^{n+1} \log \lambda_j
  + \frac{1}{2} \log \left( \sum_{j=1}^{n+1} \frac{1}{\lambda_j^2} \right) \right). \tag{15}
\]
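A minimal sketch of evaluating the negative logarithm of the conditional prior (15) for given α and λ; the matrix L is as in the sketch above, and the function is written up to an additive constant.

```python
import numpy as np

def neg_log_prior(alpha, lam, L):
    """-log pi_pr(alpha | lambda), up to an additive constant, from (15)."""
    D = np.diag(lam)
    return (0.5 * np.linalg.norm(D @ L @ alpha) ** 2
            - np.sum(np.log(lam))
            - 0.5 * np.log(np.sum(1.0 / lam ** 2)))

# Illustrative use with the L, lam and alpha built in the previous sketch:
# val = neg_log_prior(alpha, lam, L)
```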

Since we only assume that the weights λ are nonnegative,
\[
  \pi_{\mathrm{h}}(\lambda) = \pi_+(\lambda) =
  \begin{cases}
    1, & \lambda \ge 0, \\
    0, & \text{otherwise}.
  \end{cases}
\]
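To tie the pieces together, the following sketch assembles the negative log posterior that the MAP estimate (8) minimizes, combining the likelihood (10), the prior (15), and the nonnegativity constraint implied by the flat hyperprior above. The problem sizes, blurring kernel, noise level and synthetic data are placeholders for illustration; how the minimization over α and λ is actually carried out is described in the remainder of the paper and is not reproduced here.

```python
import numpy as np

# --- small illustrative problem (all sizes and the kernel are assumptions) ---
T, n, M = 1.0, 63, 64
h = T / (n + 1)
tau = h * np.arange(n + 2)
t = np.linspace(0.0, T, M)

hat = lambda j, s: np.clip(1.0 - np.abs(s - tau[j]) / h, 0.0, None)
Phi = np.column_stack([hat(j, t) for j in range(1, n + 1)])            # representation (3)
A = (T / (M - 1)) * np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.03) ** 2)

L = (np.eye(n + 1, n) - np.eye(n + 1, n, k=-1)) / np.sqrt(h)           # finite differences
sigma = 1e-2 * np.ones(M)                                              # assumed noise std
S = np.diag(1.0 / sigma)

rng = np.random.default_rng(2)
y = A @ Phi @ rng.standard_normal(n) + sigma * rng.standard_normal(M)  # synthetic data

def neg_log_posterior(alpha, lam):
    """-log pi(alpha, lambda | y) up to a constant, per (8), (10), (15)."""
    if np.any(lam <= 0):       # flat hyperprior: support on lambda >= 0 (kept strictly
        return np.inf          # positive here so the logarithms below stay finite)
    misfit = 0.5 * np.linalg.norm(S @ (y - A @ Phi @ alpha)) ** 2      # likelihood (10)
    prior = (0.5 * np.linalg.norm(np.diag(lam) @ L @ alpha) ** 2
             - np.sum(np.log(lam))
             - 0.5 * np.log(np.sum(1.0 / lam ** 2)))                   # prior (15)
    return misfit + prior

print(neg_log_posterior(np.zeros(n), np.ones(n + 1)))
```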
