IEICE Transactions on Information and Systems, vol.E91-D, no.5, pp.1577-1580, 2008.


Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise

Masashi Sugiyama ([email protected])
Department of Computer Science, Tokyo Institute of Technology
2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan

Motoaki Kawanabe ([email protected])
Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany

Gilles Blanchard ([email protected])
Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany

Klaus-Robert Müller ([email protected])
Department of Computer Science, Technical University of Berlin
Franklinstrasse 28/29, 10587 Berlin, Germany

Abstract

Obtaining the best linear unbiased estimator (BLUE) of noisy signals is a traditional but powerful approach to noise reduction. Explicitly computing the BLUE usually requires prior knowledge of the noise covariance matrix and the subspace to which the true signal belongs. However, such prior knowledge is often unavailable in reality, which prevents us from applying the BLUE to real-world problems. To cope with this problem, we give a practical procedure for approximating the BLUE without such prior knowledge. Our additional assumption is that the true signal follows a non-Gaussian distribution while the noise is Gaussian.

Keywords: signal denoising, best linear unbiased estimator (BLUE), non-Gaussian component analysis (NGCA), Gaussian noise.


1 Introduction and formulation

Let $x \in \mathbb{R}^d$ be the observed noisy signal, which is composed of an unknown true signal $s$ and unknown noise $n$:
$$x = s + n. \tag{1}$$
We treat $s$ and $n$ as random variables (thus $x$ also), and assume that $s$ and $n$ are statistically independent. We further suppose that the true signal $s$ lies within a subspace $S \subset \mathbb{R}^d$ of known dimension $m = \dim(S)$, where $1 \le m < d$. On the other hand, the noise $n$ spreads over the entire space $\mathbb{R}^d$ and is assumed to have mean zero. Following this generative model, we are given i.i.d. observations $\{x_i\}_{i=1}^n$. Our goal is to obtain denoised signals that are close to the true signals $\{s_i\}_{i=1}^n$.

A standard approach to noise reduction in this setting is to project the noisy signal $x$ onto the true signal subspace $S$, by which the noise is reduced while the signal component $s$ is preserved. The projection does not have to be orthogonal, so we may optimize the projection direction such that the maximum amount of noise is removed. In statistics, the linear estimator which fulfills this requirement is called the best linear unbiased estimator (BLUE) [1]. The BLUE has the minimum variance among all linear unbiased estimators. More precisely, the BLUE of $s$, denoted by $\hat{s}$, is defined by
$$\hat{s} = Hx, \tag{2}$$
where
$$H \equiv \mathop{\mathrm{argmin}}_{\widetilde{H} \in \mathbb{R}^{d \times d}} E_n \big\|\widetilde{H}x - E_n[\widetilde{H}x]\big\|^2 \quad \text{subject to} \quad E_n[\widetilde{H}x] = s. \tag{3}$$

In the above equation, $E_n$ denotes the expectation over the noise $n$. Let $Q$ be the noise covariance matrix:
$$Q \equiv E_n[nn^\top], \tag{4}$$
which we assume to be non-degenerate. Let $P$ be the orthogonal projection matrix onto the subspace $S$. Then the estimation matrix $H$ defined by Eq.(3) is given as (see e.g., [1, 4])
$$H = (P Q^{-1} P)^\dagger P Q^{-1}, \tag{5}$$
where $\dagger$ denotes the Moore-Penrose generalized inverse. Let $N \equiv Q S^\perp$, where $S^\perp$ is the orthogonal complement of $S$. Then it is possible to show that $H$ is an oblique projection onto $S$ along $N$ (see e.g., [4]):
$$Hx = \begin{cases} x & \text{if } x \in S, \\ 0 & \text{if } x \in N. \end{cases} \tag{6}$$
This is illustrated in Figure 1.
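To make Eqs.(5) and (6) concrete, the following minimal NumPy sketch computes $H$ in a toy setting where $Q$ and the projector $P$ onto $S$ are known, and numerically checks the oblique-projection property (6). The variable names and the particular choice of $Q$ are ours, purely for illustration.

```python
import numpy as np

def blue_matrix(P, Q):
    """Estimation matrix of Eq.(5): H = (P Q^{-1} P)^+ P Q^{-1}."""
    Q_inv = np.linalg.inv(Q)                          # Q is assumed non-degenerate
    return np.linalg.pinv(P @ Q_inv @ P) @ P @ Q_inv  # Moore-Penrose pseudo-inverse

# Toy check of the oblique-projection property (6):
# vectors in S are kept, vectors in N = Q S^perp are mapped to zero.
B = np.array([[1.0], [0.0], [0.0]])               # basis of the signal subspace S
P = B @ np.linalg.pinv(B)                         # orthogonal projector onto S
Q = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])                   # some positive-definite noise covariance
H = blue_matrix(P, Q)

s = B[:, 0]                                       # a vector in S
v = np.array([0.0, 1.0, -0.5])                    # a vector in S^perp
print(np.allclose(H @ s, s))                      # True: Hx = x for x in S
print(np.allclose(H @ (Q @ v), 0))                # True: Hx = 0 for x in N = Q S^perp
```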


Figure 1: Illustration of the BLUE.

When computing the BLUE by Eqs.(2) and (5), the noise covariance matrix $Q$ and the signal subspace $S$ should be known. However, $Q$ and $S$ are often unknown in practice, and estimating them from the data samples $\{x_i\}_{i=1}^n$ is not generally a straightforward task. For this reason, the applicability of the BLUE to real-world problems has been rather limited so far.

In this paper, we therefore propose a new method which enables us to obtain an approximation of the BLUE even in the absence of prior knowledge of $Q$ and $S$. Our procedure is essentially based on two facts: the noise covariance matrix $Q$ is actually not required for computing the BLUE (see Section 2), and the signal subspace $S$ can be estimated by a recently proposed method of dimensionality reduction called non-Gaussian component analysis [2, 3] (see Section 3). We effectively combine these two ideas and propose a practical procedure for approximating the BLUE. Our additional assumption is that the true signal follows a non-Gaussian distribution while the noise is Gaussian.

2 Computing the BLUE without knowledge of the noise covariance matrix Q

In this section, we show that the noise covariance matrix $Q$ is not needed in computing the BLUE. Let
$$\Sigma \equiv E_x[xx^\top].$$
Then the following lemma holds.

Lemma 1 The subspace $N$ is expressed as $N = \Sigma S^\perp$.

Since $s$ and $n$ are independent and $E_n[n] = 0$, the above lemma can be immediately confirmed as
$$\Sigma S^\perp = E_s[ss^\top] S^\perp + E_x[sn^\top + ns^\top] S^\perp + E_n[nn^\top] S^\perp = Q S^\perp. \tag{7}$$

Lemma 1 implies that the noise reduction subspace $N$ can be characterized by $\Sigma$, without using the noise covariance matrix $Q$. Then the following lemma holds.

Lemma 2 The estimation matrix $H$ is expressed as $H = (P \Sigma^{-1} P)^\dagger P \Sigma^{-1}$.


Letting $P^\perp$ be the orthogonal projection matrix onto $S^\perp$, we have, for any $x \in \mathbb{R}^d$,
$$(P \Sigma^{-1} P)^\dagger P \Sigma^{-1} (P x) = (P \Sigma^{-1} P)^\dagger (P \Sigma^{-1} P) x = P x, \tag{8}$$
$$(P \Sigma^{-1} P)^\dagger P \Sigma^{-1} (\Sigma P^\perp x) = (P \Sigma^{-1} P)^\dagger (P P^\perp) x = 0, \tag{9}$$
from which the above lemma can be confirmed.

Lemma 2 implies that we can obtain the BLUE using $\Sigma$, without using $Q$. Roughly speaking, the "$N$-part" of $Q$ should agree with that of $\Sigma$ because the signal $s$ lies only in $S$. Therefore, it intuitively seems that we can replace $Q$ in Eq.(5) by $\Sigma$ because $H$ only affects the component in $N$ (see Eq.(6)). The above lemma theoretically supports this intuitive claim.

A practical advantage of the above lemma is that, while estimating the noise covariance matrix $Q$ from the data samples $\{x_i\}_{i=1}^n$ is not a straightforward task in general, $\Sigma$ can be directly estimated in a consistent way as
$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n x_i x_i^\top. \tag{10}$$
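The following short NumPy sketch illustrates Lemma 2 and the estimator (10) on synthetic data: the estimation matrix computed from the sample moment matrix $\hat{\Sigma}$ approaches the one computed from the (normally unknown) noise covariance $Q$. The generative setting, sample size, and all names are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting consistent with model (1): s lies in S = span{(1, 0, 0)^T}, Gaussian noise.
d, n = 3, 100000
B = np.array([[1.0], [0.0], [0.0]])
P = B @ np.linalg.pinv(B)                                   # orthogonal projector onto S
Q = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])

s = B @ rng.uniform(0, 10, size=(1, n))                     # non-Gaussian signal inside S
x = s + rng.multivariate_normal(np.zeros(d), Q, size=n).T   # observations x_i = s_i + n_i

Sigma_hat = x @ x.T / n                                     # consistent estimator (10)

def estimation_matrix(P, C):
    C_inv = np.linalg.inv(C)
    return np.linalg.pinv(P @ C_inv @ P) @ P @ C_inv

H_Q     = estimation_matrix(P, Q)                           # Eq.(5), needs the unknown Q
H_Sigma = estimation_matrix(P, Sigma_hat)                   # Lemma 2, uses only the data
print(np.max(np.abs(H_Q - H_Sigma)))                        # small; vanishes as n grows
```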

3 Estimating the signal subspace S by non-Gaussian component analysis

The remaining issue to be discussed is how to estimate the true signal subspace $S$ (or the orthogonal projection matrix $P$). Here we additionally assume that the signal $s$ follows a non-Gaussian distribution, while the noise $n$ follows a Gaussian distribution. The papers [2, 3] proposed a framework called non-Gaussian component analysis (NGCA). First we briefly review the key idea of NGCA and then show how we employ NGCA for finding the signal subspace $S$.

3.1 Non-Gaussian component analysis

Suppose we have i.i.d. observations $\{x'_i\}_{i=1}^n$, which follow a $d$-dimensional distribution with the following semiparametric probability density function:
$$p(x') = f(T x') \phi_{Q'}(x'), \tag{11}$$
where $f$ is a function from $\mathbb{R}^m$ to $\mathbb{R}$, $T$ is an $m \times d$ matrix, and $\phi_{Q'}$ is the centered Gaussian density with covariance matrix $Q'$. Note that we know neither $p$, $f$, $T$, nor $Q'$, but we only know that the density $p(x')$ is of the form of Eq.(11). Let $S' = \mathcal{R}(T^\top)$, where $\mathcal{R}(\cdot)$ is the range of a matrix. Then the following proposition holds.


Proposition 1 [2, 3] Suppose $E_{x'}[x' x'^\top] = I_d$, where $I_d$ is the $d$-dimensional identity matrix. For an arbitrary smooth function $h(x')$ from $\mathbb{R}^d$ to $\mathbb{R}$,
$$\beta \equiv E_{x'}[g(x')] \in S', \tag{12}$$
where $g(x')$ is a function from $\mathbb{R}^d$ to $\mathbb{R}^d$ defined by
$$g(x') \equiv x' h(x') - \nabla h(x'), \tag{13}$$
and $\nabla$ is the differential operator.

The above proposition shows that we can construct vectors which belong to $S'$. Generally, different functions $h(x')$ produce different vectors $\beta$. Therefore, we may construct a set of vectors which spans the entire subspace $S'$. However, $\beta$ cannot be computed in practice since the unknown $p(x')$ is included through $E_{x'}$ in Eq.(12); we employ its consistent estimator given by
$$\hat{\beta} = \frac{1}{n} \sum_{i=1}^n g(x'_i). \tag{14}$$
By applying principal component analysis (PCA) to the set of $\hat{\beta}$ obtained from a set of different $h(x')$, we can estimate the subspace $S'$.

Eqs.(13) and (14) imply that the mapping from $h$ to $\hat{\beta}$ is linear. This means that we can arbitrarily change the norm of $\hat{\beta}$ just by multiplying $h$ by an arbitrary scalar. This can totally change the PCA results, since vectors with larger norm have stronger impact on the PCA solutions. Therefore, we need to normalize $\hat{\beta}$ (or $h$) in a reasonable way. A desirable normalization scheme would possess the property that the resulting $\hat{\beta}$ has a larger norm if it is close to $S'$ (i.e., the angle between $\hat{\beta}$ and $S'$ is small). This desirable property can be approximately achieved by normalizing $\hat{\beta}$ by its standard deviation [2]. A consistent estimator of the variance of $\hat{\beta}$ is given by
$$\hat{N} = \frac{1}{n} \sum_{i=1}^n \|g(x'_i)\|^2 - \|\hat{\beta}\|^2. \tag{15}$$
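As an illustration of Eqs.(13)-(15), the sketch below computes $\hat{\beta}$ and its normalizer $\hat{N}$ for one particular test function, $h(x') = \sin(w^\top x')$, whose gradient is $w \cos(w^\top x')$; this is the family of test functions used later in the numerical example of Section 4. The function name and the random data are ours, for illustration only.

```python
import numpy as np

def beta_hat_and_variance(X_white, w):
    """For one test function h(x') = sin(w^T x'), estimate beta (Eq.14) and its
    variance-based normalizer N (Eq.15). X_white: whitened samples, shape (n, d)."""
    proj = X_white @ w                                   # w^T x'_i for every sample
    # g(x') = x' h(x') - grad h(x'),  with  grad h(x') = w cos(w^T x')   (Eq.13)
    G = X_white * np.sin(proj)[:, None] - np.outer(np.cos(proj), w)
    beta = G.mean(axis=0)                                # Eq.(14)
    N = (G ** 2).sum(axis=1).mean() - beta @ beta        # Eq.(15)
    return beta, N

# Hypothetical usage with random whitened data and a random direction w:
rng = np.random.default_rng(0)
X_white = rng.standard_normal((500, 5))
w = rng.standard_normal(5)
beta, N = beta_hat_and_variance(X_white, w)
print(beta / np.sqrt(N))                                 # beta normalized by its standard deviation
```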

Further discussions of NGCA including sophisticated implementation, rigorous theoretical error analysis, and extensive experimental studies are given in [2, 3].

3.2 Applying NGCA for estimating signal subspace S

Here we show how NGCA is employed for approximating the BLUE. Let us apply the whitening transformation to the data samples, i.e.,
$$x' = \Sigma^{-\frac{1}{2}} x. \tag{16}$$
Then the following lemma holds:


Lemma 3 The probability density function of $x'$ is expressed in the form of Eq.(11) with $\mathcal{R}(T^\top) = \Sigma^{-\frac{1}{2}} S$.

(Proof) Let us denote $S' \equiv \Sigma^{-\frac{1}{2}} S$ and $N' \equiv S'^\perp = \Sigma^{\frac{1}{2}} S^\perp = \Sigma^{-\frac{1}{2}} N$ (via Lemma 1). Decompose the whitened sample as follows:
$$x' = s' + n'_{S'} + n'_{N'}, \tag{17}$$
with $s' = \Sigma^{-\frac{1}{2}} s$, and $n'_{S'}$, $n'_{N'}$ the orthogonal projections of $\Sigma^{-\frac{1}{2}} n$ on $S'$, $N'$. By characterization of the whitening transform, we have $E_{x'}[x' x'^\top] = I_d$. Since $n'_{N'}$ and $(s' + n'_{S'})$ are the orthogonal projections of $x'$ onto two orthogonal subspaces, this entails
$$O = E_n[n'_{N'} (s' + n'_{S'})^\top] = E_n[n'_{N'} n'^\top_{S'}], \tag{18}$$
where $O$ is the null matrix, and the second equality holds because $s$ and $n$ are independent. This shows that $n'_{S'}$ and $n'_{N'}$ are independent. Let us consider an orthonormal basis $[T_{S'}^\top, T_{N'}^\top]^\top$ such that $\mathcal{R}(T_{S'}^\top) = S'$ and $\mathcal{R}(T_{N'}^\top) = N'$. Let
$$z_{S'} \equiv T_{S'} x' = T_{S'} s' + T_{S'} n'_{S'}, \tag{19}$$
$$z_{N'} \equiv T_{N'} x' = T_{N'} n'_{N'}. \tag{20}$$
Since $s$ follows a non-Gaussian distribution, $z_{S'}$ is also non-Gaussian. Let us denote the probability density function of $z_{S'}$ by $q(z_{S'})$. On the other hand, $z_{N'}$ follows a centered Gaussian distribution with identity covariance matrix $I_{d-m}$. Since $z_{S'}$ and $z_{N'}$ are independent, the probability density function of $x'$ is given by
$$p(x') = q(z_{S'}) \phi_{I_{d-m}}(z_{N'}) = \frac{q(z_{S'})}{\phi_{I_m}(z_{S'})} \phi_{I_m}(z_{S'}) \phi_{I_{d-m}}(z_{N'}) = \frac{q(z_{S'})}{\phi_{I_m}(z_{S'})} \phi_{I_d}([z_{S'}^\top, z_{N'}^\top]^\top) = \frac{q(T_{S'} x')}{\phi_{I_m}(T_{S'} x')} \phi_{I_d}(x'). \tag{21}$$
Putting $f(z) = \frac{q(z)}{\phi_{I_m}(z)}$, $T = T_{S'}$, and $Q' = I_d$ concludes the proof.

Lemma 3 shows that, by NGCA, we can estimate $S'$ from the whitened samples $x'$. Let $P'$ be the orthogonal projection matrix onto the subspace $S'$. We claim that $H = \Sigma^{\frac{1}{2}} P' \Sigma^{-\frac{1}{2}}$, which is readily checked by the characterization (6), using the fact that $N = \Sigma^{\frac{1}{2}} S'^\perp$. Therefore, the BLUE $\hat{s}$ can be expressed in terms of $P'$ and $x'$ as
$$\hat{s} = \Sigma^{\frac{1}{2}} P' x'. \tag{22}$$

Finally, replacing all the unknown quantities by their consistent estimators, we have an algorithm for approximating the BLUE (see Figure 2).


1. Prepare different functions $\{h_k(x')\}_{k=1}^L$.
2. $\hat{x}'_i = \hat{\Sigma}^{-\frac{1}{2}} x_i$ with $\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n x_i x_i^\top$.
3. $\hat{\beta}_k = \frac{1}{n} \sum_{i=1}^n g_k(\hat{x}'_i)$ with $g_k(x') = x' h_k(x') - \nabla h_k(x')$.
4. $\hat{N}_k = \frac{1}{n} \sum_{i=1}^n \|g_k(\hat{x}'_i)\|^2 - \|\hat{\beta}_k\|^2$.
5. $\{\hat{\psi}_i\}_{i=1}^m$: the $m$ leading eigenvectors of $\sum_{k=1}^L \hat{\beta}_k \hat{\beta}_k^\top / \hat{N}_k$.
6. $\hat{P}' = \sum_{i=1}^m \hat{\psi}_i \hat{\psi}_i^\top$.
7. $\hat{s}_i = \hat{\Sigma}^{\frac{1}{2}} \hat{P}' \hat{x}'_i$.

Figure 2: Algorithm for approximating the BLUE.
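Below is a minimal NumPy sketch of the procedure in Figure 2, assuming the sinusoidal test functions $h_k(x') = \sin(w_k^\top x')$ with random directions $w_k$ used in the numerical example of Section 4. The function name ngca_blue and all implementation details are ours; this is an illustrative sketch, not the authors' reference implementation (see [2, 3] for the more sophisticated MIPP and IMAK algorithms).

```python
import numpy as np

def ngca_blue(X, m, L=1000, seed=0):
    """Approximate the BLUE of each row of X (shape (n, d)) following Figure 2.
    m: assumed dimension of the signal subspace; L: number of test functions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Step 2: whitening with the consistent estimator (10); Sigma is assumed non-degenerate.
    Sigma = X.T @ X / n
    eigval, eigvec = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    Sigma_sqrt     = eigvec @ np.diag(eigval ** 0.5) @ eigvec.T
    Xw = X @ Sigma_inv_sqrt                        # whitened samples x'_i

    # Steps 1, 3, 4: beta_k and its normalizer N_k for h_k(x') = sin(w_k^T x').
    C = np.zeros((d, d))                           # accumulates sum_k beta_k beta_k^T / N_k
    for _ in range(L):
        w = rng.standard_normal(d)                 # random direction, as in Section 4
        proj = Xw @ w
        G = Xw * np.sin(proj)[:, None] - np.outer(np.cos(proj), w)    # g_k(x'_i), Eq.(13)
        beta = G.mean(axis=0)                                          # Eq.(14)
        N = (G ** 2).sum(axis=1).mean() - beta @ beta                  # Eq.(15)
        C += np.outer(beta, beta) / N

    # Steps 5, 6: m leading eigenvectors give the projector onto the estimated S'.
    _, vecs = np.linalg.eigh(C)
    Psi = vecs[:, -m:]                             # eigenvectors for the m largest eigenvalues
    P_prime = Psi @ Psi.T

    # Step 7: map back to the original space, s_hat_i = Sigma^{1/2} P' x'_i.
    return Xw @ P_prime @ Sigma_sqrt

# Example call (hypothetical data matrix X of shape (n, d)):
# S_hat = ngca_blue(X, m=1)
```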

4 Numerical example

We created two-dimensional samples, where $S$ is spanned by $(1, 0)^\top$, the first element of $s$ follows the uniform distribution on $(0, 10)$, the second element of $s$ is zero, and
$$Q = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$
We used $h_i(x) = \sin(w_i^\top x)$ $(i = 1, 2, \ldots, 1000)$ in NGCA, where $w_i$ is taken from the centered normal distribution with identity covariance matrix.

The samples in the original space, the transformed samples in the whitened space, and the $\hat{\beta}$ vectors are plotted in Figure 3. The mean squared errors of $\{\hat{s}_i\}_{i=1}^n$ obtained by the true BLUE (using the knowledge of $Q$ and $S$) and by the proposed NGCA-based method are depicted in Figure 4 as a function of the number of samples. This illustrates the usefulness of our NGCA-based approach.
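For concreteness, the toy data of this section could be generated and denoised as follows, reusing the hypothetical ngca_blue sketch given after Figure 2; the seed and the sample size are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Two-dimensional toy data: non-Gaussian signal on the first axis, correlated Gaussian noise.
Q = np.array([[1.0, 1.0],
              [1.0, 2.0]])
s = np.column_stack([rng.uniform(0, 10, n), np.zeros(n)])   # true signals s_i in S
x = s + rng.multivariate_normal(np.zeros(2), Q, size=n)     # observations x_i = s_i + n_i

s_hat = ngca_blue(x, m=1)                                    # NGCA-based approximation of the BLUE
print(np.mean(np.sum((s_hat - s) ** 2, axis=1)))             # mean squared error, cf. Figure 4
```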

5 Conclusions

For computing the BLUE, prior knowledge of the noise covariance matrix $Q$ and the signal subspace $S$ is needed. In this paper, we proposed an algorithm for approximating the BLUE without such prior knowledge. Our assumption is that the signal is non-Gaussian and the noise is Gaussian. We described a naive implementation of NGCA for simplicity. We may employ a more sophisticated implementation of NGCA, such as the multi-index projection pursuit (MIPP) algorithm [2] or iterative metric adaptation for radial kernel functions (IMAK) [3], which would further improve the accuracy.

Acknowledgments We acknowledge partial financial support by DFG, BMBF, the EU NOE PASCAL (EU # 506778), Alexander von Humboldt Foundation, MEXT (18300057 and 17700142), and Tateishi Foundation.

Figure 3: Illustration of simulation when n = 100. (a) Samples in the original space. (b) Samples in the whitened space. (c) $\hat{\beta}$ used in NGCA. We used 1000 $\hat{\beta}$ vectors, but plotted only 100 points for clear visibility.

Figure 4: Mean squared error $\sum_{i=1}^n (\hat{s}_i - s_i)^2 / n$ averaged over 100 runs, for the true BLUE and the proposed NGCA-based BLUE as a function of the number of samples $n$.

References

[1] A. Albert, Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York and London, 1972.

[2] G. Blanchard, M. Kawanabe, M. Sugiyama, V. Spokoiny, and K.-R. Müller, "In search of non-Gaussian components of a high-dimensional distribution," Journal of Machine Learning Research, vol.7, pp.247–282, Feb. 2006.

[3] M. Kawanabe, M. Sugiyama, G. Blanchard, and K.-R. Müller, "A new algorithm of non-Gaussian component analysis with radial kernel functions," Annals of the Institute of Statistical Mathematics, vol.59, no.1, pp.57–75, 2007.

[4] H. Ogawa and N.E. Berrached, "EPBOBs (extended pseudo biorthogonal bases) for signal recovery," IEICE Transactions on Information and Systems, vol.E83-D, no.2, pp.223–232, 2000.
