2012 IEEE 13th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

EM-BASED SPARSE CHANNEL ESTIMATION IN OFDM SYSTEMS

Rodrigo Carvajal, Boris I. Godoy, Juan C. Agüero and Graham C. Goodwin
Centre for Complex Dynamic Systems and Control, The University of Newcastle, Australia
Email addresses: [email protected], {boris.godoy,juan.aguero,graham.goodwin}@newcastle.edu.au

ABSTRACT

In this paper, we address the problem of estimating sparse communication channels in OFDM systems. We consider the case where carrier frequency offset is present. The estimation problem is then approached by maximizing a regularized (modified) likelihood function. This regularized likelihood function includes a new term accounting for the a priori probability density function of the parameters, represented by a Gaussian mean-variance mixture. The maximization of the regularized likelihood function is carried out using the Expectation-Maximization (EM) algorithm. We show that the E-step in the proposed algorithm has a closed-form solution and that, in the M-step, the cost function can be concentrated in one variable (the carrier frequency offset).

1. INTRODUCTION

Multicarrier communications have become the core technique used in many communication systems and standards, such as the 3GPP Long-Term Evolution (LTE) radio access network [1], digital subscriber lines, European digital audio and digital video broadcasting (DAB/DVB), and the IEEE 802.11a wireless local area network (WLAN), among others (see e.g. [2]). This wide adoption can be explained by the many desirable properties that multicarrier communications exhibit, including relatively low complexity, robustness against fading caused by multipath propagation, and bandwidth efficiency. These characteristics are attained by keeping the subcarriers orthogonal to each other. This orthogonality, however, is broken by distortions (perturbations) of the phase of the subcarriers, including phase noise (PHN) and carrier frequency offset (CFO), to which multicarrier systems are sensitive. In practice, this lack of orthogonality generates intercarrier interference (ICI) [2, 3], which degrades overall system performance.

Sparse channel estimation is an important topic found in many applications such as high definition television (HDTV), communications near hilly terrain, and underwater acoustic communication near the surf zone (see e.g. [4, 5, 6, 7]).

This work was partially supported by CONICYT - Chile through grant ACT-053. Boris I. Godoy acknowledges support given by CONICYT - Chile through its Postdoctoral Fellowship Program 2011.



The problem found in sparse channel estimation is that traditional techniques used for estimation problems, such as least squares, result in poor estimates due to the lack of prior knowledge of the structure of the channel [8]. In this paper, we consider CFO estimation and sparse channel impulse response (CIR) estimation based on an ℓ1-norm constraint. Our work generalizes a previous work on joint CFO and CIR estimation, see [11]. Sparsity can be promoted in different ways. For example, in [9], sparsity is promoted by generating a pool of possible models and then performing model selection. Our approach to the estimation of sparse channels in OFDM systems is to immerse it in the general framework of maximum likelihood (ML) estimation. In addition, we use regularization of the likelihood function to account for the a priori probability density function of the parameters. This regularized likelihood function is then iteratively maximized using the EM algorithm.

2. CHANNEL MODELS IN OFDM SYSTEMS

In OFDM systems, the channel is typically modelled as a finite impulse response (FIR) filter h = [h_0 h_1 ... h_{L-1}]^T ∈ C^L with L taps [10]. Carrier frequency offset (CFO) can be modelled as a diagonal matrix C_ε = exp{j diag(2πεk/N_C)}, with k = 0, 1, ..., N_C - 1. Here, ε is the normalized frequency offset (|ε| ≤ 1/2) [11]. At the receiver, the cyclic prefix (CP) is removed. Consequently, the received signal can be expressed as:

    r = C_ε H̃ x + η,   (1)

where the channel matrix H̃ is an (N_C × N_C) circulant matrix whose first column is given by [h_0 h_1 ... h_{L-1} 0 ... 0]^T (see [10]), x is the transmitted signal (after the inverse discrete Fourier transform, IDFT), and η ∼ N(0, σ_η² I_{N_C}) is additive white Gaussian noise (AWGN).
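As a concrete illustration of the signal model (1), the following sketch (not from the paper; all sizes, seeds and values are assumptions) simulates r = C_ε H̃ x + η for a sparse CIR:

```python
# Sketch (assumed parameters): simulate the received signal of Eq. (1).
import numpy as np

rng = np.random.default_rng(0)
NC, L, eps = 64, 17, 0.1                  # subcarriers, taps, normalized CFO
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
h[rng.choice(L, size=7, replace=False)] = 0.0       # sparse CIR

col = np.concatenate([h, np.zeros(NC - L)])         # first column [h; 0]
H = np.array([np.roll(col, k) for k in range(NC)]).T   # circulant H~

C_eps = np.diag(np.exp(1j * 2 * np.pi * eps * np.arange(NC) / NC))
x = np.fft.ifft(rng.standard_normal(NC) + 1j * rng.standard_normal(NC))
sigma_eta2 = 0.1
eta = np.sqrt(sigma_eta2 / 2) * (rng.standard_normal(NC)
                                 + 1j * rng.standard_normal(NC))
r = C_eps @ H @ x + eta                             # Eq. (1)
```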

2.1. General State-Space Model

In recent years, there has been interest in exploiting the complex representation of signals in signal processing (see e.g. [12]). Complex signals can be classified as proper or improper.

Properness accounts for the statistical relationship between the real and imaginary parts of the signal. If the autocorrelation matrix of the real part and the autocorrelation matrix of the imaginary part are equal, and the pseudocovariance matrix is zero, then the signal is said to be proper [12, 13]. In general, the (statistical) treatment of complex proper signals does not depart much from that of real signals. Moreover, most of the modulation schemes used in wireless communications yield proper signals. However, binary phase shift keying (BPSK) and Gaussian minimum shift keying (GMSK) signals, for example, are improper (see e.g. [14]), and special attention must be paid to the pseudocovariance matrix in the development of techniques and algorithms. This motivates our real-variable representation of the complex multicarrier signals, which extends our results to all modulation schemes applied in OFDM systems.

Many communication systems involve training sequences. These sequences, in general, allow one to obtain good CIR estimates. When expressing the transmitted signal¹ in terms of its real part, x_R, and its imaginary part, x_I, we also need to express it in terms of the known (training) component, x^(T), and the unknown component, x^(U). Thus, we have

    x̄ = [x_R^(T)T  x_R^(U)T  x_I^(T)T  x_I^(U)T]^T ∈ R^{2N_C},   (2)

where (·)_R, (·)_I, (·)^(T) and (·)^(U) denote the real part, imaginary part, training part, and unknown part of x, respectively. When the transmitted signal is partially known, the stochastic part of x̄, namely x̄^(U), can be regarded as a constant Gaussian linear state. In this case, a state-space representation of the model in (1) can be expressed as²:

    x̄_{k+1}^(U) = x̄_k^(U),
    y_k = [a  -b; b  a] x̄_k + [Re{η_k}; Im{η_k}],   (3)

where y_k = [Re{r_k} Im{r_k}]^T, k = 0, 1, ..., N_C - 1 is the time-sample index of the OFDM symbol, Re{·} and Im{·} denote the real and imaginary parts, respectively,

    a = (cos ψ_k) e_{k+1}^T Re{H̃} - (sin ψ_k) e_{k+1}^T Im{H̃},   (4)
    b = (sin ψ_k) e_{k+1}^T Re{H̃} + (cos ψ_k) e_{k+1}^T Im{H̃},   (5)
    ψ_k = 2πkε/N_C,   (6)

and e_k is the kth column of the identity matrix. This state-space representation is equivalent to (1), but it is convenient for the identification approach used in this paper.
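A quick numerical check of this equivalence (a sketch under assumed small dimensions; noise omitted) that the real-valued output equations (3)-(6) reproduce the real and imaginary parts of the complex model (1):

```python
# Sketch (assumed setup): verify that (3)-(6) match Re{r_k}, Im{r_k} of (1).
import numpy as np

rng = np.random.default_rng(1)
NC, L, eps = 8, 3, 0.1
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
c0 = np.concatenate([h, np.zeros(NC - L)])
H = np.array([np.roll(c0, k) for k in range(NC)]).T      # circulant H~
x = rng.standard_normal(NC) + 1j * rng.standard_normal(NC)
r = np.diag(np.exp(1j * 2 * np.pi * eps * np.arange(NC) / NC)) @ H @ x

HR, HI = H.real, H.imag
xR, xI = x.real, x.imag
for k in range(NC):
    psi = 2 * np.pi * k * eps / NC                       # Eq. (6)
    e = np.zeros(NC); e[k] = 1.0                         # e_{k+1}
    a = np.cos(psi) * e @ HR - np.sin(psi) * e @ HI      # Eq. (4)
    b = np.sin(psi) * e @ HR + np.cos(psi) * e @ HI      # Eq. (5)
    yk = np.array([a @ xR - b @ xI, b @ xR + a @ xI])    # Eq. (3), noiseless
    assert np.allclose(yk, [r[k].real, r[k].imag])
```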

2.2. Statistical properties of multicarrier signals

For the unknown transmitted signal, the probability density function (pdf) is given by

    p(x̄^(U)) = (1 / ((2π)^{N_C} |Σ_{x̄^(U)}|^{1/2})) exp{-0.5 (x̄^(U))^T Σ_{x̄^(U)}^{-1} x̄^(U)},   (7)

____
¹ In this paper, the "transmitted" signal corresponds to the time-domain multiplexing of a training sequence and data coming from a data terminal equipment, after the application of the IDFT.
² The algorithm we propose in this paper is not restricted to multicarrier systems. However, in general, PHN and CFO are not considered in single-carrier systems.


where

    Σ_{x̄^(U)} = [ Σ_{x_R^(U)}           Σ_{x_R^(U) x_I^(U)} ;
                  Σ_{x_I^(U) x_R^(U)}   Σ_{x_I^(U)}         ],   (8)

and Σ_{x_R^(U)}, Σ_{x_I^(U)} are the (known) covariances of the real and imaginary parts of x̄^(U), respectively, and Σ_{x_R^(U) x_I^(U)} is the cross-correlation matrix of x_R^(U) and x_I^(U). The conditional pdf of y = [y_0^T, ..., y_{N_C-1}^T]^T is then given by:

    p(y | x̄, θ) = (1 / ((2π)^{N_C} |Σ_y|^{1/2})) exp{-0.5 (y - M x̄)^T Σ_y^{-1} (y - M x̄)},   (9)

where Σ_y = 0.5 σ_η² I_{2N_C} is the received-signal covariance matrix, I_{2N_C} is the identity matrix of dimension 2N_C,

    M = [ A  -B ;  B  A ],   Ψ = diag(ε),   (10)
    A = (cos Ψ) Re{H̃} - (sin Ψ) Im{H̃},   (11)
    B = (sin Ψ) Re{H̃} + (cos Ψ) Im{H̃},   (12)

and ε = [0  2πε/N_C  4πε/N_C  ...  2π(N_C-1)ε/N_C].
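To make the stacked notation concrete, the following sketch (assumed small dimensions) builds M from (10)-(12) and verifies that M [x_R^T x_I^T]^T matches [Re{r}^T Im{r}^T]^T for the noiseless model (1):

```python
# Sketch (assumed setup): construct M of Eqs. (10)-(12) and check it
# against the complex model (1) without noise.
import numpy as np

rng = np.random.default_rng(2)
NC, L, eps = 8, 3, 0.1
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
c0 = np.concatenate([h, np.zeros(NC - L)])
H = np.array([np.roll(c0, k) for k in range(NC)]).T
x = rng.standard_normal(NC) + 1j * rng.standard_normal(NC)
r = np.diag(np.exp(1j * 2 * np.pi * eps * np.arange(NC) / NC)) @ H @ x

psi = 2 * np.pi * eps * np.arange(NC) / NC           # entries of eps vector
cosP, sinP = np.diag(np.cos(psi)), np.diag(np.sin(psi))
A = cosP @ H.real - sinP @ H.imag                    # Eq. (11)
B = sinP @ H.real + cosP @ H.imag                    # Eq. (12)
M = np.block([[A, -B], [B, A]])                      # Eq. (10)
x_bar = np.concatenate([x.real, x.imag])
assert np.allclose(M @ x_bar, np.concatenate([r.real, r.imag]))
```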

3. REGULARIZED ML ESTIMATION IN OFDM SYSTEMS

In ML estimation, regularization refers to the inclusion of an extra term that accounts for statistical knowledge of the parameters; this extra term is regarded as an a priori distribution for the parameters θ. Hence, instead of solving an ML estimation problem, we solve a maximum a posteriori (MAP) problem. In general, with an a priori distribution for the parameters, the associated MAP problem is given by:

    θ̂ = arg max_θ p(y|θ) p(θ)   (13)
      = arg max_θ log p(y|θ) + log p(θ),   (14)

where p(θ) is the a priori distribution of θ. The second term, log p(θ), on the right hand side of (14) can be expressed as a function of θ (or, equivalently, of the individual elements θ_j, j = 1, 2, ..., of θ) as (see e.g. [15])

    log p(θ) = Σ_{j=1}^p g(θ_j / (τ s_j)),   (15)

where g(·) is a function specifying the log-prior, p is the number of elements of the parameter vector θ, θ_j is the jth element of θ (j = 1, ..., p), τ is a factor controlling the strength of the regularization, and the s_j are fixed scale factors. For instance, in Ridge regression (assuming s_j = 1, j = 1, ..., p), the function g corresponds to g(θ_j/τ) = (θ_j/τ)², and for Lasso, g(θ_j/τ) = |θ_j/τ|. Other commonly used regularizations are shown in Table 1 with the corresponding function g(·) [15].
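For illustration, here is a minimal sketch of the regularization term (15) for some of the penalties in Table 1; the sign convention follows (15), and the values of τ and s_j are assumptions:

```python
# Sketch: the regularization term (15) for penalties from Table 1.
import numpy as np

def log_prior(theta, g, tau=1.0, s=None):
    """log p(theta) = sum_j g(theta_j / (tau * s_j)), Eq. (15)."""
    s = np.ones_like(theta) if s is None else s
    return np.sum(g(theta / (tau * s)))

g_ridge = lambda t: t ** 2                        # Ridge
g_lasso = np.abs                                  # Lasso (l1)
g_bridge = lambda t, alpha=0.5: np.abs(t) ** alpha

theta = np.array([0.5, -1.0, 0.0, 2.0])
print(log_prior(theta, g_lasso, tau=2.0))         # -> sum_j |theta_j / 2|
```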

For sparse representations, one of the most commonly used approaches is the Lasso algorithm [16], in which an ℓ1-norm regularization is used to obtain estimates with coefficients that are exactly zero. This procedure can be cast into the framework of MAP estimation. However, the Lasso algorithm is not applicable to the system under study, since it applies to linear systems only (unlike ℓ1-norm regularization itself, which is applicable to any system). This motivates our study of sparse identification for the non-linear system in (3). In our particular case, we need to maximize (14), where p(y|θ) is given by

    p(y | θ) = ∫ p(y | x̄^(U), θ) p(x̄^(U)) dx̄^(U),   (16)

where p(x̄^(U)) and p(y | x̄^(U), θ) are given in (7) and (9), respectively, and θ = [h, ε]. For the sake of simplicity, we have assumed that the variance of the measurement noise, σ_η², is known. However, the algorithm can straightforwardly be extended to include the estimation of σ_η².

4. EM-BASED ESTIMATION

The MAP estimation problem in (14) can be solved by using the Expectation-Maximization (EM) algorithm [17]. The EM algorithm generates a sequence of estimates θ̂^(i), i = 1, 2, ..., of the parameters θ, alternating between an expectation step (E-step) and a maximization step (M-step). This sequence is known to converge to a local maximum of the cost function (14) [17].

The E-step corresponds to the computation of the joint likelihood function using the conditional density of the hidden variables based on a given parameter estimate, θ̂^(i). Thus, we have (see e.g. [18]):

    Q(θ, θ̂^(i)) = Q_ML(θ, θ̂^(i)) + Q_prior(θ, θ̂^(i)),   (17)

where

    Q_prior(θ, θ̂^(i)) = log p(θ),   (18)
    Q_ML(θ, θ̂^(i)) = E{log p(z, y|θ) | y, θ̂^(i)},   (19)

y is the received signal, z denotes the hidden variables, and θ̂^(i) is the estimated parameter vector at the ith iteration of the EM algorithm. Since, in general, there are no measurements of the transmitted signal, it is convenient to regard it as hidden variables in the EM algorithm. In our setting the transmitted signal is partially known (training), in which case z = x̄^(U). The M-step corresponds to the maximization of the function Q obtained in the E-step.
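The overall iteration can be summarized by the following skeleton (a sketch only: e_step and m_step are placeholders standing for (17)-(19) and for the maximization described later in Section 4.3):

```python
# Skeleton sketch of the EM iteration for the MAP problem (14).
import numpy as np

def em_map(theta0, e_step, m_step, n_iter=100, tol=1e-8):
    """Alternate E- and M-steps until the estimate stops moving."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        Q = e_step(theta)                 # build Q(., theta^(i)), Eqs. (17)-(19)
        theta_new = m_step(Q)             # maximize Q over theta
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```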

Table 1. Selection of mean-variance mixture representations for penalty functions, p(θ_j) = ∫_0^∞ N_{θ_j}(μ_j + λ_j u_j, τ² s_j² λ_j) p(λ_j) dλ_j.

    Penalty function             g(θ_j)                         u_j   μ_j   p(λ_j)
    Ridge                        (θ_j/τ)²                       0     0     λ_j = 1
    Lasso                        |θ_j/τ|                        0     0     Exponential
    Bridge                       |θ_j/τ|^α                      0     0     Stable
    Generalized Double-Pareto    (1+α) log(1 + |θ_j|/(ατ))      0     0     Exp-Gamma
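As a sanity check on the Lasso row of Table 1, the following sketch (assumed parameters) verifies by Monte Carlo that mixing N(0, τ²λ_j) over an exponential λ_j yields a Laplace marginal, whose negative log-density is the ℓ1 penalty up to a constant:

```python
# Sketch: Monte Carlo check of the Lasso row of Table 1. Mixing
# N(0, tau^2 * lambda) over lambda ~ Exponential(mean 2) gives a
# Laplace(0, tau) marginal, whose standard deviation is tau*sqrt(2).
import numpy as np

rng = np.random.default_rng(3)
tau, n = 1.5, 200_000
lam = rng.exponential(scale=2.0, size=n)          # p(lambda_j)
theta = rng.normal(0.0, tau * np.sqrt(lam))       # theta_j | lambda_j
print(theta.std() / (tau * np.sqrt(2)))           # approx. 1.0
```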

4.1. Evaluating Q_ML(θ, θ̂^(i)) and its derivative

The E-step of the EM algorithm is given by

    Q_ML(θ, θ̂^(i)) = K_x + K_y - 0.5 E{(y - M(θ) x̄^(U))^T Σ_y^{-1} (y - M(θ) x̄^(U)) | y, θ̂^(i)} - 0.5 E{x̄^(U)T x̄^(U) | y, θ̂^(i)},   (20)

where M(θ) is a (matrix) function of the parameters, M(h, ε), K_x = -N_C log(2π), and K_y = -N_C log(2π) - 0.5 log|Σ_y|. We define

    C = [ (cos Ψ) X̃_R^(U) J    -(sin Ψ) X̃_R^(U) J ;
          (sin Ψ) X̃_R^(U) J     (cos Ψ) X̃_R^(U) J ],   (21)

    D = [ (sin Ψ) X̃_I^(U) J     (cos Ψ) X̃_I^(U) J ;
         -(cos Ψ) X̃_I^(U) J     (sin Ψ) X̃_I^(U) J ],   (22)

where J = [I_L; 0] and X̃_R^(U), X̃_I^(U) are the circulant matrices generated by x_R^(U) and x_I^(U), respectively. Then, we can write M x̄^(U) = (C - D) h̄, with h̄ = [h_R^T h_I^T]^T. Differentiating Q_ML(θ, θ̂^(i)) with respect to h̄, we have

    ∂Q_ML/∂h̄ = -0.5 σ_η^{-2} [-2 E{(C - D)^T | y, θ̂^(i)} y + 2 E{(C - D)^T (C - D) | y, θ̂^(i)} h̄].   (23)

The expectations on the right side of (23) are calculated by applying Kalman filtering to the model in (3).
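Since the state in (3) is constant, the Kalman-filter computation of these expectations reduces to the posterior of a Gaussian linear model. A sketch follows (names and the training-subtraction step are assumptions):

```python
# Sketch: for the constant state in (3), the Kalman filter over the
# whole block reduces to the Gaussian linear-model posterior. Here
# y_u is the received vector with the known (training) contribution
# already subtracted, and M_u collects the columns of M acting on
# x_bar^(U); both names are assumptions for this example.
import numpy as np

def posterior_state(y_u, M_u, Sigma_x, sigma_eta2):
    """Posterior mean/covariance of x_bar^(U), with Sigma_y = 0.5*sigma^2*I."""
    Sy_inv = (2.0 / sigma_eta2) * np.eye(len(y_u))
    P = np.linalg.inv(np.linalg.inv(Sigma_x) + M_u.T @ Sy_inv @ M_u)
    mean = P @ (M_u.T @ Sy_inv @ y_u)
    return mean, P
```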

4.2. Evaluating Q_prior(θ, θ̂^(i)) and its derivative

In general, the distribution of an r-dimensional random vector θ is a normal mean-variance mixture with mixing distribution p if, for a given λ > 0, θ follows an r-dimensional normal distribution with covariance matrix λΔ and mean vector μ + λu, and λ follows a probability distribution p on [0, ∞). Then (see e.g. [19]),

    θ|λ ∼ N_θ(μ + λu, λΔ),   λ ∼ p.   (24)

Here, Δ denotes a constant, positive-definite r × r matrix, and μ and u are constant vectors of dimension r. In other words,

    p(θ) = ∫_0^∞ p(θ|λ) p(λ) dλ.   (25)

In (25), the random variable λ can be considered as a hidden variable in the EM algorithm. The pdf of λ depends on the penalty function; for common penalty functions, see Table 1 (for each component θ_j of θ, there is an associated λ_j, see [15]).

The E-step for the penalty, Q_prior(θ, θ̂^(i)), can be expressed in terms of the components of θ, since log p(θ, λ) = Σ_{j=1}^p log p(θ_j, λ_j). Then, we have:

    Q_prior(θ, θ̂^(i)) = Σ_{j=1}^p ∫ log[p(θ_j, λ_j)] p(λ_j | θ̂_j^(i)) dλ_j
                      = Σ_{j=1}^p ∫ (log[p(θ_j|λ_j)] + log[p(λ_j)]) p(λ_j | θ̂_j^(i)) dλ_j.   (26)

Lemma 1: If Q_prior(θ, θ̂^(i)) is given by (26), then its derivative is given by

    ∂Q_prior(θ, θ̂^(i))/∂θ_j = -∫ ((θ_j - μ_j - λ_j u_j) / (τ² s_j² λ_j)) p(λ_j | θ̂_j^(i)) dλ_j
                            = u_j/(τ² s_j²) - ((θ_j - μ_j)/(τ² s_j²)) E_{λ_j|θ̂_j^(i)}{λ_j^{-1}},   (27)

where E_{λ_j|θ̂_j^(i)}{λ_j^{-1}} is the expectation obtained from

    u_j/(τ² s_j²) - ((θ̂_j^(i) - μ_j)/(τ² s_j²)) E_{λ_j|θ̂_j^(i)}{λ_j^{-1}} = ġ(θ̂_j^(i)/(τ s_j)).   (28)

Proof: See Appendix.

Once an estimate θ̂_j^(i) of θ_j has been obtained, it can be placed into (28), and the solution obtained corresponds to an estimate of E_{λ_j|θ̂_j^(i)}{λ_j^{-1}}, which in turn can be utilized in the maximization of the Q function (M-step). Once the new estimate θ̂_j^(i+1) has been obtained, it is inserted into (28) and the iteration continues until convergence is reached.

In particular, our chosen penalty function is |h̄_j/τ|. Using (28), we have that E_{λ_j|h̄_j^(i)}{λ_j^{-1}} = -τ sign(h̄_j^(i))/h̄_j^(i) = -τ/|h̄_j^(i)|, where h̄_j^(i) denotes the estimate of h̄_j at the ith iteration. Using this value for E_{λ_j|h̄_j^(i)}{λ_j^{-1}}, we have that

    ∂Q_prior/∂h̄ = -(1/τ²) E h̄,   (29)

where E = diag(E_{λ_1|h̄_1^(i)}{λ_1^{-1}}, ..., E_{λ_{2L}|h̄_{2L}^(i)}{λ_{2L}^{-1}}).
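In implementation terms, this yields a simple reweighting step; the sketch below (with an assumed numerical floor to guard near-zero taps) forms the diagonal matrix E of (29):

```python
# Sketch of the reweighting step behind (29): build the diagonal of E
# from E{lambda_j^{-1}} = -tau / |h_j|. The small floor is an assumption
# guarding the division for taps estimated as (near) zero.
import numpy as np

def E_diag(h_bar_est, tau, floor=1e-12):
    """Diagonal entries of E in Eq. (29) for the penalty |h_j / tau|."""
    return -tau / np.maximum(np.abs(h_bar_est), floor)

E = np.diag(E_diag(np.array([0.5, -0.1, 0.0, 2.0]), tau=1.0))
```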

4.3. Combining Q_ML(θ, θ̂^(i)) and Q_prior(θ, θ̂^(i))

From the nonlinear estimation problem solved via EM, Q_ML and ∂Q_ML/∂θ are known. The strategy is then to derive the augmented E-step considering both Q_ML and Q_prior with respect to h̄, that is,

    ∂Q/∂h̄ = ∂Q_ML/∂h̄ + ∂Q_prior/∂h̄.   (30)

Thus, we can obtain a stationary point for h̄ as a function of ε, that is,

    h̄ = σ_η^{-2} [σ_η^{-2} E{(C - D)^T (C - D) | y, θ̂^(i)} + (1/τ²) E]^{-1} E{(C - D)^T | y, θ̂^(i)} y,   (31)

where C and D are functions of ε and x^(U). Replacing the expression for h̄ in (17), we can optimize Q in (17) with respect to the parameter ε alone. The parameter h̄ is then obtained by replacing the result of the optimization for ε in (31). This concentration of the cost function is straightforward in our method.
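A sketch of this concentrated optimization follows; the moment function, grid, and Q evaluator are placeholders (assumptions) standing for the Kalman-filter outputs and the cost (17): for each candidate ε, (31) gives h̄, and the pair maximizing Q is kept.

```python
# Sketch of the concentration step. 'moments' stands for the smoothed
# quantities S1(eps) = E{(C-D)^T (C-D) | y} and s2(eps) = E{(C-D)^T | y} y,
# and 'Q_of' for the cost (17); all interfaces are assumptions.
import numpy as np

def h_bar_of_eps(S1, s2, E, sigma_eta2, tau):
    """Stationary point (31) for a fixed eps."""
    lhs = S1 / sigma_eta2 + E / tau ** 2
    return np.linalg.solve(lhs, s2) / sigma_eta2

def concentrated_search(eps_grid, moments, Q_of, E, sigma_eta2, tau):
    """Grid search over eps; h_bar follows from (31) at the optimum."""
    best = None
    for eps in eps_grid:
        S1, s2 = moments(eps)                  # Kalman-filter outputs
        h_bar = h_bar_of_eps(S1, s2, E, sigma_eta2, tau)
        q = Q_of(h_bar, eps)                   # Q in (17) at (h_bar, eps)
        if best is None or q > best[0]:
            best = (q, eps, h_bar)
    return best
```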


5. NUMERICAL EXAMPLE

In this section, we present a numerical example of our approach for an OFDM system with CFO. We assume that the channel noise variance σ_η² is known and equal to 0.1. In this example, we consider a system with 64 subcarriers (N_C = 64) and a sparse channel impulse response of length 17. The channel exhibits 7 taps equal to zero, corresponding to a sparsity of 41%. The transmitted signal is Gaussian, and the signal-to-noise ratio is 10 dB. The normalized frequency offset is equal to 0.1. We also consider that the transmitted signal is partially known, i.e., some training has been performed, but some of the OFDM symbol samples are unknown at the receiver. In particular, we consider 50% training.

The normalized mean square error (N-MSE := (h - ĥ)^H (h - ĥ) / (h^H h)) average value for the estimates, considering different values of τ, is shown in Fig. 1.

[Fig. 1. N-MSE average value (30 Monte Carlo simulations) for the estimates obtained using (non-regularized) ML and regularized ML.]

We can observe that, in most cases, the estimates obtained with regularization have lower error than those obtained without regularization. For 100% training, only a small difference between the N-MSE values with and without regularization was observed. Therefore, we conclude that regularization helps when the amount of data is limited, as in the 50% training case.
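For reference, the figure of merit used in Fig. 1 can be computed as follows (a minimal sketch):

```python
# Sketch: the N-MSE figure of merit used in Fig. 1.
import numpy as np

def n_mse(h_true, h_est):
    """N-MSE = (h - h_hat)^H (h - h_hat) / (h^H h)."""
    d = h_true - h_est
    return float(np.real(np.vdot(d, d) / np.vdot(h_true, h_true)))
```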

6. CONCLUSIONS

We have presented a general sparse channel estimation method, based on Gaussian mean-variance mixtures, that promotes sparsity via ℓ1-norm regularization. The use of ℓ1-norm regularization can be seen as a generalization of the Lasso algorithm. In addition, we concentrate the cost function in the M-step in order to optimize numerically over a single variable (ε). The numerical example illustrates the effectiveness of this approach: for most values of τ, the regularized estimates attain a lower N-MSE than the non-regularized estimates.

A. APPENDIX

In the M-step for the penalty, we need to obtain the derivative of Q_prior(θ, θ̂^(i)) with respect to each component θ_j. Then,

    ∂Q_prior(θ, θ̂^(i))/∂θ_j = ∫ (d log p(θ_j|λ_j)/dθ_j) p(λ_j | θ̂_j^(i)) dλ_j.

On the other hand ([15]),

    d log p(θ_j|λ_j)/dθ_j = (1/p(θ_j|λ_j)) dp(θ_j|λ_j)/dθ_j = -(θ_j - μ_j - λ_j u_j)/(τ² s_j² λ_j),   (32)

    dN_{θ_j}(μ_j + λ_j u_j, τ² s_j² λ_j)/dθ_j = -((θ_j - μ_j - λ_j u_j)/(τ² s_j² λ_j)) N_{θ_j}(μ_j + λ_j u_j, τ² s_j² λ_j).   (33)

We observe that dp(θ_j)/dθ_j = ∫ (dp(θ_j|λ_j)/dθ_j) p(λ_j) dλ_j. Then,

    dp(θ_j)/dθ_j = ∫_0^∞ (dN_{θ_j}(μ_j + λ_j u_j, τ² s_j² λ_j)/dθ_j) p(λ_j) dλ_j
                 = (u_j/(τ² s_j²)) p(θ_j) - ((θ_j - μ_j)/(τ² s_j²)) ∫_0^∞ λ_j^{-1} N_{θ_j}(μ_j + λ_j u_j, τ² s_j² λ_j) p(λ_j) dλ_j.   (34)

On the other hand, p(θ_j|λ_j) p(λ_j) = p(λ_j|θ_j) p(θ_j). Hence,

    ∫_0^∞ λ_j^{-1} N_{θ_j}(μ_j + λ_j u_j, τ² s_j² λ_j) p(λ_j) dλ_j = ∫_0^∞ λ_j^{-1} p(λ_j|θ_j) p(θ_j) dλ_j
        = p(θ_j) ∫_0^∞ λ_j^{-1} p(λ_j|θ_j) dλ_j = p(θ_j) E_{λ_j|θ_j}{λ_j^{-1}}.   (35)

Then, combining (32), (34) and (35), we obtain

    dp(θ_j)/dθ_j = (u_j/(τ² s_j²)) p(θ_j) - ((θ_j - μ_j)/(τ² s_j²)) p(θ_j) E_{λ_j|θ_j}{λ_j^{-1}}.   (36)

Hence,

    d log p(θ_j)/dθ_j = (1/p(θ_j)) dp(θ_j)/dθ_j = u_j/(τ² s_j²) - ((θ_j - μ_j)/(τ² s_j²)) E_{λ_j|θ_j}{λ_j^{-1}}.   (37)

On the other hand, the regularization term is given by log p(θ) = Σ_{j=1}^p g(θ_j/(τ s_j)), which implies

    d log p(θ_j)/dθ_j = d g(θ_j/(τ s_j))/dθ_j = ġ(θ_j/(τ s_j)).   (38)

Finally, combining (37) and (38), and evaluating at θ_j = θ̂_j^(i), we obtain (28).

7. REFERENCES

[1] 3GPP TS 36.201 v.10.1.0, Physical channels and modulation, March 2011.
[2] A. Goldsmith, Wireless Communications, Cambridge, UK: Cambridge University Press, 2005.
[3] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Boston, USA: Artech House, 2000.
[4] S.F. Cotter and B.D. Rao, "Sparse channel estimation via matching pursuit with application to equalization," IEEE Trans. on Comms., vol. 50, no. 3, pp. 374-377, 2002.
[5] W. Li and J.C. Preisig, "Estimation of rapidly time-varying sparse channels," IEEE Journal of Oceanic Eng., vol. 32, no. 3, pp. 927-939, 2007.
[6] J. Homer, I. Mareels, R.R. Bitmead, B. Wahlberg, and F. Gustafsson, "LMS estimation via structural detection," IEEE Trans. Signal Process., vol. 46, no. 10, pp. 2651-2663, 1998.
[7] J. Homer, "Detection guided LMS estimation of sparse channels," in IEEE Globecom, 1998, vol. 6, pp. 3704-3709.
[8] C. Carbonelli, S. Vedantam, and U. Mitra, "Sparse channel estimation with zero tap detection," IEEE Trans. on Wireless Comms., vol. 6, no. 5, pp. 1743-1753, 2007.
[9] E.G. Larsson and Y. Selén, "Linear regression with a sparse parameter vector," IEEE Trans. Signal Process., vol. 55, no. 2, pp. 451-460, 2007.
[10] Z. Wang and G.B. Giannakis, "Wireless multicarrier communications: Where Fourier meets Shannon," IEEE Signal Process. Mag., vol. 17, pp. 29-48, 2000.
[11] R. Mo, Y.H. Chew, T.T. Tjhung, and C.C. Ko, "An EM-based semiblind joint channel and frequency offset estimator for OFDM systems over frequency selective fading channels," IEEE Trans. Veh. Technol., vol. 57, no. 5, pp. 3275-3282, 2008.
[12] P.J. Schreier and L.L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals, Cambridge University Press, 2010.
[13] K.S. Miller, Complex Stochastic Processes: An Introduction to Theory and Applications, Addison-Wesley Publishing Co., Inc., 1974.
[14] S. Buzzi, M. Lops, and S. Sardellitti, "Widely linear reception strategies for layered space-time wireless communications," IEEE Trans. Signal Process., vol. 54, no. 6, pp. 2252-2262, 2006.
[15] N.G. Polson and J.G. Scott, "Sparse Bayes estimation in non-Gaussian models via data augmentation," http://arxiv.org/abs/1103.5407v2.
[16] R. Tibshirani, "Regression shrinkage and selection via the Lasso," J. Royal Statist. Soc. B, vol. 58, no. 1, pp. 267-288, 1996.
[17] G.J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, John Wiley & Sons, 2nd edition, 2008.
[18] A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Stat. Soc. B, vol. 39, no. 1, pp. 1-38, 1977.
[19] O. Barndorff-Nielsen, J. Kent, and M. Sørensen, "Normal variance-mean mixtures and z distributions," Int. Stat. Review, vol. 50, no. 2, pp. 145-159, 1982.


