Image Denoising with a Multi-Phase Kernel Principal Component Approach and an Ensemble Version

Anshuman Sahu and George Runger
Department of Industrial Engineering, SCIDSE, Arizona State University, Tempe, AZ 85287 USA
Email: [email protected], [email protected]

Daniel Apley
Department of Industrial Engineering and Management Sciences, Evanston, IL 60208 USA
Email: [email protected]

Abstract—Image denoising is a technique of practical significance that serves as a preliminary step for other analyses such as image feature extraction and image classification. Two novel methods for denoising images are proposed that deal with the case when there is no noise-free training data. The basic method consists of several phases: the first phase preprocesses the given noisy data matrix to obtain a good approximation matrix; the second phase implements kernel principal component analysis (KPCA) on the approximation matrix obtained from the first phase. KPCA is a useful non-linear technique for image denoising. However, an important problem faced in KPCA is estimating the denoised pre-image. Consequently, we generate a pre-image by solving a regularized regression problem. The second method is an ensemble version of the basic method that provides robustness to noisy instances. Attractive properties of the proposed methods include numerical stability and ease of implementation; moreover, they are based on linear algebra and avoid any nonlinear optimization. Our methods are demonstrated on high-noise cases (for both Gaussian noise and "salt and pepper" noise) for the USPS digits dataset, and they outperform existing alternatives both in terms of lower mean square error and better visual quality of the reconstructed pre-images.

I. INTRODUCTION

Image denoising is of significant practical importance. It consists of removing noise from corrupted images while retaining the overall pattern structure, and it serves as a preliminary step for further analyses such as image feature extraction and image classification. A popular algorithm used in denoising is kernel principal component analysis (KPCA), which finds nonlinear patterns in the data. A common assumption when applying KPCA is the availability of a noise-free training dataset. However, in applications such as manufacturing variation analysis the training data itself is noisy, and algorithms devised for such applications have to take this fact into account. Therefore, the objective of our research is to present a new method for denoising images with KPCA for the case when no noise-free training data is available.

Our contribution is two-fold: first, we augment the kernel ridge regression procedure with a preprocessing step that is crucial in practice for high-noise scenarios; second, we present an ensemble procedure that is robust to noisy instances and is much faster and easier to implement in practice.

The paper is organized as follows: Section II describes KPCA briefly and summarizes the problem; Section III describes our methodology in detail; experimental results on the USPS dataset for different high-noise scenarios, for both Gaussian and "salt and pepper" noise, are presented in Section IV; finally, we conclude in Section V.

II. KPCA DISCUSSION AND PROBLEM DESCRIPTION

KPCA is a nonlinear extension of principal component analysis (PCA) wherein we transform the data from the input space to a feature space and perform PCA in the feature space [1]. The idea is to capture nonlinear variations in the data once it is transformed to a high-dimensional space. Furthermore, the kernel trick [2] makes it easy to extend the linear PCA calculations from the input space to the feature space. Basically, we extract the eigenvectors in the feature space based on the training data, project a test point in the feature space onto the space spanned by the leading eigenvectors, and then compute the pre-image of this projected point to estimate the denoised test point in the input space.

The last step, obtaining the pre-image, is a hard one. Since an exact pre-image may not exist [3], we seek an approximate solution. This is achieved by minimizing the squared distance $\|\varphi(\hat{\mathbf{X}}) - P\varphi(\mathbf{X})\|^2$ [3], where $P\varphi(\mathbf{X})$ is the projected test matrix in feature space, $\hat{\mathbf{X}}$ is the desired pre-image, and $\varphi$ is a nonlinear mapping from the input space to the feature space.

The pre-image problem has been studied extensively, and many different approaches have been proposed in the literature. Mika et al. [3] proposed a gradient descent approach, which suffers from numerical instabilities and can get trapped in local minima while solving a nonlinear optimization problem. Kwok and Tsang [4] proposed a method similar to multidimensional scaling and estimated the pre-image as a linear combination of the nearest neighbors. Bakir et al. [5] apply a kernel regression approach to obtain the pre-image. Both of the above approaches require noise-free training observations.

Zheng et al. [6] developed a penalized strategy for pre-image learning, in which penalization terms are used to guide the pre-image learning process. Nguyen et al. [7] present a method to handle noise, outliers, and missing data in the context of KPCA. Most of the methods described above, except for the one proposed by Nguyen et al. [7], require learning eigenvectors from a prototype noise-free training dataset, which is not available in our case. Nguyen et al. [7], however, solve a nonlinear optimization problem that is sensitive to the initial guess and can have numerical issues. We want a simple approach that uses only linear algebra.

We are faced with the problem of denoising a noisy data matrix that itself serves as the training set. We present two approaches to this problem in the next section: a basic two-phase method and its ensemble version.

III. APPROACHES FOR DENOISING

A. Two-Phase Method

The two-phase method consists of a preprocessing phase and a kernel regression phase, each of which is described in a separate subsection below. We henceforth denote this two-phase method by BKPCA.

1) First Phase: Given an $n \times d$ noisy data matrix $\mathbf{X}$, where $n$ is the number of data points and $d$ is the data dimensionality, we preprocess it to obtain an approximation matrix $\tilde{\mathbf{X}}$. The idea behind preprocessing is that the approximation matrix retains the overall structure of the original data while removing some of the noise, so that we obtain a better starting point for the second phase. This preprocessing is crucial because the second phase learns from this approximation matrix the reverse mapping from feature space to input space.

For our purpose, we use the singular value decomposition (SVD) of $\mathbf{X}$ to obtain $\tilde{\mathbf{X}}$. Consider the decomposition $\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}'$. Then $\tilde{\mathbf{X}} = \mathbf{U}\tilde{\mathbf{D}}\mathbf{V}'$, where $\tilde{\mathbf{D}}$ is the same matrix as $\mathbf{D}$ except that it retains only the top $s$ singular values of $\mathbf{D}$ and sets all other singular values to zero. Here $s$ is a user-defined parameter. In practice, since the data matrix is noisy, the lower singular values may be non-zero yet still correspond to noise. Thus, by keeping only the top $s$ (and not all non-zero) singular values, we expect to remove some of the noise.

Several other techniques developed for matrix approximation can also be employed. Concept Decomposition was proposed in [8], where the decomposition is obtained by taking the least-squares approximation onto the linear subspace spanned by all the concept vectors. Generalized Low Rank Approximations of Matrices (GLRAM) is proposed in [9] as an iterative algorithm that reduces the approximation error and converges rapidly; the author also notes that using GLRAM followed by SVD can greatly reduce the approximation error compared to GLRAM alone while keeping the computational cost low. Another approximation method, the CUR matrix decomposition, is developed in [10], where the low-rank decomposition is expressed in terms of a small number of actual rows and/or columns of the original data matrix, making it more interpretable. These references also empirically establish that the approximation errors of their methods are close to that of truncated SVD while achieving a significant speed-up and/or memory saving compared to SVD. Therefore, these methods can be used in the preprocessing phase, in particular for high data dimensionality, to obtain $\tilde{\mathbf{X}}$.
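To make the first phase concrete, below is a minimal sketch of the truncated-SVD preprocessing in Python/NumPy. The paper's original code was written in MATLAB; the function name, the NumPy calls, and the example sizes here are our own illustration, not the authors' implementation.

```python
import numpy as np

def truncated_svd_approximation(X, s):
    """First-phase preprocessing sketch: keep only the top s singular values
    of the noisy n x d data matrix X and rebuild the approximation matrix
    X_tilde = U D_tilde V' (all other singular values set to zero)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = U diag(d) V'
    d_trunc = np.zeros_like(d)
    d_trunc[:s] = d[:s]                                # retain the s largest singular values
    return (U * d_trunc) @ Vt

# Hypothetical usage: 300 noisy digit images, each flattened to a 256-vector
# X_noisy = np.random.randn(300, 256)
# X_tilde = truncated_svd_approximation(X_noisy, s=50)
```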

2) Second Phase: The approximation matrix $\tilde{\mathbf{X}}$ obtained from the first phase is the input to the second phase. We now apply KPCA to $\tilde{\mathbf{X}}$ to estimate the denoised pre-image of $\mathbf{X}$ in input space, denoted by $\hat{\mathbf{X}}$. Let $\mathbf{K}$ be the kernel matrix given by
$$\mathbf{K} = \varphi(\tilde{\mathbf{X}})\varphi(\tilde{\mathbf{X}})' \qquad (1)$$
For our case we use a Gaussian kernel with parameter $\sigma$. Here $\varphi(\tilde{\mathbf{X}})$ is the feature-space transformation of $\tilde{\mathbf{X}}$, and $P\varphi(\tilde{\mathbf{X}})$ is the matrix of projections of the test points in feature space onto the space spanned by the chosen number $\nu$ of leading eigenvectors. $\lambda$ is a positive constant. We will also use the Frobenius norm of a matrix, denoted by $\|\cdot\|_F$, which is the square root of the sum of the squares of the entries of the matrix.

We need to map $P\varphi(\tilde{\mathbf{X}})$ back to the input space to obtain $\hat{\mathbf{X}}$. We therefore seek a nonlinear transformation $\mathbf{Q}$ from the feature space back to the input space. $\mathbf{Q}$ is obtained by minimizing $\|\tilde{\mathbf{X}}' - \mathbf{Q}\varphi(\tilde{\mathbf{X}})'\|_F$. A simple choice is to set $\mathbf{Q} = \mathbf{T}\varphi(\tilde{\mathbf{X}})$, where $\mathbf{T}$ is a linear transformation matrix. We now show how to derive $\hat{\mathbf{X}}$ for this simple choice of $\mathbf{Q}$. Note that our derivation can be thought of as an alternative to the dual-variable approach to kernel ridge regression shown in [11]. Substituting for $\mathbf{Q}$, we seek $\mathbf{T}$ satisfying
$$\min_{\mathbf{T}} \left\| \tilde{\mathbf{X}}' - \mathbf{T}\mathbf{K} \right\|_F \qquad (2)$$
where $\mathbf{K}$ is the kernel matrix. If $\mathbf{K}^{-1}$ exists, we find $\mathbf{T} = \tilde{\mathbf{X}}'\mathbf{K}^{-1}$. Since $\mathbf{K}$ is positive semi-definite, we add the term $\lambda\mathbf{I}$ to $\mathbf{K}$ for $\lambda > 0$ to ensure numerical stability; this also provides a certain amount of regularization for non-zero $\lambda$. Once we obtain $\mathbf{T}$, and thus the nonlinear transformation $\mathbf{Q}$, we obtain $\hat{\mathbf{X}}$ by mapping $P\varphi(\tilde{\mathbf{X}})$ from feature space to input space using $\mathbf{Q}$. Since $\mathbf{Q}' = \varphi(\tilde{\mathbf{X}})'\mathbf{T}'$, this leads to
$$\hat{\mathbf{X}} = P\varphi(\tilde{\mathbf{X}})\mathbf{Q}' = P\varphi(\tilde{\mathbf{X}})\varphi(\tilde{\mathbf{X}})'(\mathbf{K} + \lambda\mathbf{I})^{-1}\tilde{\mathbf{X}} \qquad (3)$$
In order to calculate $P\varphi(\tilde{\mathbf{X}})\varphi(\tilde{\mathbf{X}})'$, we refer to [4] for the following derivation. Let $[\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2, \ldots, \tilde{\mathbf{x}}_n]$ be the $n$ data points. Consider the eigendecomposition $\mathbf{H}\mathbf{K}\mathbf{H} = \mathbf{U}\Lambda\mathbf{U}'$, where $\mathbf{H} = \mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'$, $\mathbf{1} = [1, 1, \ldots, 1]'$ is an $n \times 1$ vector, $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n]$ is the matrix of eigenvectors of the centered kernel matrix $\mathbf{H}\mathbf{K}\mathbf{H}$, and $\Lambda$ is the diagonal matrix of its eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Also let $\mathbf{k}_{\tilde{\mathbf{x}}}$ be an $n \times 1$ vector whose $i$th entry is the kernel dot product between $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{x}}_i$ for $i = 1, 2, \ldots, n$; with a Gaussian kernel this is given by
$$k(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}_i) = \exp\left( \frac{-\|\tilde{\mathbf{x}} - \tilde{\mathbf{x}}_i\|_F^2}{\sigma} \right) \qquad (4)$$
If we define $\mathbf{L} = \sum_{j=1}^{\nu} \frac{1}{\lambda_j}\mathbf{u}_j\mathbf{u}_j'$, where $\nu$ is the chosen number of top eigenvectors to project onto, then for any data points $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{x}}_l$,
$$P\varphi(\tilde{\mathbf{x}})\varphi(\tilde{\mathbf{x}}_l)' = \mathbf{k}_{\tilde{\mathbf{x}}_l}'\mathbf{H}'\mathbf{L}\mathbf{H}\left(\mathbf{k}_{\tilde{\mathbf{x}}} - \frac{1}{n}\mathbf{K}\mathbf{1}\right) + \frac{1}{n}\mathbf{1}'\mathbf{k}_{\tilde{\mathbf{x}}_l} \qquad (5)$$
The contribution here is to improve the kernel ridge regression procedure through the preprocessing phase for the case when the training data is noisy. We present the ensemble version in the next subsection.
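As an illustration of the second phase, the sketch below computes the Gaussian kernel of Eq. (4), the projection term of Eq. (5) using the top $\nu$ eigenvectors of the centered kernel matrix, and the pre-image of Eq. (3). It is written in the same illustrative Python/NumPy style as above, not the authors' MATLAB code; the default values of $\sigma$, $\nu$, and $\lambda$ are taken from the ranges used in Section IV and are only placeholders. The two-argument form (training matrix and test points) is our own generalization; in the basic method the denoised points are simply the rows of $\tilde{\mathbf{X}}$.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, sigma):
    # Entry (i, j) = exp(-||a_i - b_j||^2 / sigma), the Gaussian kernel of Eq. (4)
    d2 = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-d2 / sigma)

def bkpca_second_phase(X_tilde, X_test=None, sigma=100.0, nu=50, lam=0.01):
    """Second-phase sketch: feature-space projection (Eq. 5) followed by the
    regularized reverse mapping to input space (Eq. 3)."""
    if X_test is None:
        X_test = X_tilde              # in BKPCA the points to denoise are the rows of X_tilde
    n = X_tilde.shape[0]
    K = gaussian_kernel_matrix(X_tilde, X_tilde, sigma)        # Eq. (1) with a Gaussian kernel
    H = np.eye(n) - np.ones((n, n)) / n                        # centering matrix
    eigval, eigvec = np.linalg.eigh(H @ K @ H)                 # HKH = U Lambda U'
    top = np.argsort(eigval)[::-1][:nu]                        # top nu eigenpairs
    L = (eigvec[:, top] / eigval[top]) @ eigvec[:, top].T      # L = sum_j (1/lambda_j) u_j u_j'
    K_test = gaussian_kernel_matrix(X_test, X_tilde, sigma)    # row a holds k_{x_a}'
    c = K @ np.ones(n) / n                                     # (1/n) K 1
    # Eq. (5): entry (a, l) = k_{x_l}' H L H (k_{x_a} - (1/n)K1) + (1/n) 1' k_{x_l}
    proj = ((K_test - c) @ H @ L @ H @ K
            + np.ones((X_test.shape[0], 1)) * (K.sum(axis=0) / n))
    # Eq. (3): X_hat = P phi(X_tilde) phi(X_tilde)' (K + lam*I)^{-1} X_tilde
    return proj @ np.linalg.solve(K + lam * np.eye(n), X_tilde)
```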

B. Ensemble Version

The ensemble version of the basic two-phase method, henceforth called EBKPCA, consists of repeating the two-phase method on random subsets of data points chosen from $\mathbf{X}$ for a given number of iterations and estimating the final pre-image matrix $\hat{\mathbf{X}}$ as the average of the pre-images obtained across all iterations. The random subset chosen in each iteration can be thought of as a random sample from the training data whose size is a fraction $\rho$ of the size of the training data. The ensemble version provides some robustness to noisy training data. Moreover, it can be parallelized and implemented easily. It also reduces the computational burden by computing the kernel matrix on subsets of the data rather than on the entire data.

Let the number of iterations be $B$, and in each iteration let a random subset of data points (equal in number for each iteration) be chosen from $\mathbf{X}$ with replacement. We apply the two-phase method in each iteration and obtain an estimated pre-image for a test point. We estimate the final pre-image of the given test point as the average of the $B$ obtained points. Let $\hat{\mathbf{X}}_b$ denote the pre-image obtained from the $b$th sample. The final estimated pre-image $\hat{\mathbf{X}}$ is given by
$$\hat{\mathbf{X}} = \frac{1}{B} \sum_{b=1}^{B} \hat{\mathbf{X}}_b \qquad (6)$$
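A sketch of the EBKPCA loop follows, assuming the truncated_svd_approximation and bkpca_second_phase helpers from the earlier sketches are in scope (both are our own illustrative functions, not the paper's code). It also assumes one particular reading of the procedure, namely that each iteration fits the model on the random subset and then produces pre-images for all rows of $\mathbf{X}$, which are averaged as in Eq. (6).

```python
import numpy as np

def ebkpca_denoise(X, B=50, rho=0.5, s=50, sigma=100.0, nu=50, lam=0.01, seed=0):
    """EBKPCA sketch (Eq. 6): average the BKPCA pre-images obtained from B
    random subsets of size rho*n drawn with replacement from the noisy matrix X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(rho * n)
    X_hat = np.zeros_like(X, dtype=float)
    for _ in range(B):
        rows = rng.choice(n, size=m, replace=True)              # random subset of X
        X_sub_tilde = truncated_svd_approximation(X[rows], s)   # first phase on the subset
        # second phase trained on the subset; pre-images computed for every row of X
        X_hat += bkpca_second_phase(X_sub_tilde, X_test=X, sigma=sigma, nu=nu, lam=lam)
    return X_hat / B                                            # Eq. (6): average over iterations
```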

IV. EXPERIMENTAL RESULTS AND DISCUSSION

We consider the USPS digits dataset available at http://www-stat.stanford.edu/~tibs/ElemStatLearn/ for our experiments. We consider high-noise scenarios: Gaussian noise with mean $\mu = 0$ and $\sigma_G = 2$, and "salt and pepper" noise with probability parameter $p = 0.4$, where $p$ denotes the probability that a pixel value is set to zero. The evaluation of our methods is based on the mean squared error (MSE) as well as visual inspection. Let the true data matrix be $\mathbf{X}_0$. Then the MSE is given by
$$MSE = \left\| \hat{\mathbf{X}} - \mathbf{X}_0 \right\|_F \qquad (7)$$
The true data matrix is used only for evaluation purposes. We use a Gaussian kernel throughout. We compare our methods with the one proposed in [4], henceforth called KMDS, and we carry out the experiments over a range of parameter values to study the behavior of the methods extensively.

For our experiments, we chose 300 data points for each digit (0-9) and added the noise levels described at the beginning of this section. The parameter values were chosen as follows: $\sigma \in \{50, 100, 500, 5000\}$ and $\nu \in \{50, 100\}$. For simplicity, we kept $s = \nu$. For the EBKPCA method, the number of iterations $B$ was chosen to be 50, and the number of random data points chosen at each iteration was kept equal to 150 ($\rho = 0.5$). The number of nearest neighbors was kept equal to ten as described by Kwok and Tsang [4]. We set $\lambda$ to a small value, $\lambda = 0.01$. We wrote the code in MATLAB® and compared our methods with KMDS using the algorithm provided on the authors' website (http://www.cse.ust.hk/~ivor/).

First, we conducted initial experiments in which we ran the two-phase procedure for denoising the digits dataset with and without the preprocessing phase for all the chosen parameter values. We randomly chose three digits (4, 8, and 9) and added Gaussian noise with mean $\mu = 0$ and $\sigma_G = 2$ for this initial experiment. The acronym PPC in Fig. 1 stands for the case with preprocessing, whereas NOPPC stands for the case without the preprocessing phase. We see that preprocessing reduces the MSE significantly. Thus, the preprocessing phase validates our intuition that the approximation matrix yields better learning of the reverse mapping in the second phase than the original noisy data matrix.

Next, we show the results of our other experiments, in which the two-phase method (with the preprocessing phase) is compared to KMDS. Figs. 2 and 3 show how the MSE varies for different parameter settings for all the algorithms for the case with Gaussian noise $\mu = 0$ and $\sigma_G = 2$: digits 0 through 4 are shown in Fig. 2 and digits 5 through 9 in Fig. 3. Figs. 4 and 5 show the corresponding results for "salt and pepper" noise with $p = 0.4$: digits 0 through 4 are shown in Fig. 4 and digits 5 through 9 in Fig. 5. We clearly see from these figures that the BKPCA and EBKPCA methods outperform the KMDS algorithm significantly for all the chosen parameter values.

Finally, we apply all the methods to the digits dataset. The clean images to which we applied the noise are shown in Fig. 6. For the instances corrupted with Gaussian noise $\mu = 0$ and $\sigma_G = 2$, the noisy test images are shown in Fig. 7 and the denoised pre-images in Figs. 8-10 for the KMDS, BKPCA, and EBKPCA methods, respectively. For the instances corrupted with "salt and pepper" noise with $p = 0.4$, the noisy test images are shown in Fig. 11 and the denoised pre-images in Figs. 12-14 for the KMDS, BKPCA, and EBKPCA methods, respectively. These figures show that the BKPCA and EBKPCA methods produce visually superior denoised pre-images compared to the KMDS method under the different high-noise scenarios.
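For completeness, here is a small sketch of the two noise models and the evaluation criterion of Eq. (7) as described above; the function names and the random-seed handling are our own.

```python
import numpy as np

def add_gaussian_noise(X, sigma_g=2.0, seed=0):
    # Additive Gaussian noise with mean mu = 0 and standard deviation sigma_G
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, sigma_g, size=X.shape)

def add_salt_and_pepper_noise(X, p=0.4, seed=0):
    # As described in Section IV: each pixel is set to zero with probability p
    rng = np.random.default_rng(seed)
    return np.where(rng.random(X.shape) < p, 0.0, X)

def mse(X_hat, X_0):
    # Eq. (7): Frobenius norm of the difference between the estimated pre-image
    # matrix and the true (noise-free) data matrix
    return np.linalg.norm(X_hat - X_0, ord='fro')
```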

[Figure placeholders. In Figs. 1-5 each panel plots MSE against log(sigma) for one digit, with one curve per method and choice of ν (KMDS, BKPCA, EBKPCA at ν = 50 and ν = 100; PPC/NOPPC in Fig. 1); Figs. 6-14 show 16x16 digit images. Only the captions are reproduced here.]

Fig. 1. MSE values with and without the preprocessing phase for digits 4, 8, and 9.
Fig. 2. MSE for KMDS versus BKPCA and EBKPCA for Gaussian noise $\mu = 0$ and $\sigma_G = 2$ (digits 0 through 4).
Fig. 3. MSE for KMDS versus BKPCA and EBKPCA for Gaussian noise $\mu = 0$ and $\sigma_G = 2$ (digits 5 through 9).
Fig. 4. MSE for KMDS versus BKPCA and EBKPCA for "salt and pepper" noise with $p = 0.4$ (digits 0 through 4).
Fig. 5. MSE for KMDS versus BKPCA and EBKPCA for "salt and pepper" noise with $p = 0.4$ (digits 5 through 9).
Fig. 6. Clean images for digits 0 through 9 serially.
Fig. 7. Test images with Gaussian noise $\mu = 0$ and $\sigma_G = 2$ for digits 0 through 9 serially.
Fig. 8. Denoised pre-images for the KMDS method, serially 0 through 9, for Gaussian noise $\mu = 0$ and $\sigma_G = 2$.
Fig. 9. Denoised pre-images for the BKPCA method, serially 0 through 9, for Gaussian noise $\mu = 0$ and $\sigma_G = 2$.
Fig. 10. Denoised pre-images for the EBKPCA method, serially 0 through 9, for Gaussian noise $\mu = 0$ and $\sigma_G = 2$.
Fig. 11. Test images with "salt and pepper" noise $p = 0.4$ for digits 0 through 9 serially.
Fig. 12. Denoised pre-images for the KMDS method, serially 0 through 9, for "salt and pepper" noise $p = 0.4$.
Fig. 13. Denoised pre-images for the BKPCA method, serially 0 through 9, for "salt and pepper" noise $p = 0.4$.
Fig. 14. Denoised pre-images for the EBKPCA method, serially 0 through 9, for "salt and pepper" noise $p = 0.4$.

V. CONCLUSION

We present a novel two-phase approach and its ensemble version for image denoising with KPCA for the case when no noise-free training data is available. The first phase preprocesses the given noisy data matrix to obtain a good initial starting point for the second phase. The second phase implements KPCA and obtains the reconstructed pre-image by solving a regularized regression problem. The ensemble version additionally provides robustness to noisy instances in the training set. Both approaches are simple, numerically stable, and easy to implement; the ensemble version in particular can be easily parallelized. They are based on linear algebra and do not involve nonlinear optimization. Our methods perform better than existing alternatives in terms of both low mean square error and good visual quality of the reconstructed pre-images. Future research directions include other ensemble versions, such as model stacking and model bumping, and alternate sampling schemes for the ensemble version.

ACKNOWLEDGEMENTS

This material is based upon work supported by the National Science Foundation under Grant No. 0825331. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

[1] B. Schölkopf, A. J. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[2] M. Aizerman, E. Braverman, and L. Rozonoer, "Theoretical foundations of the potential function method in pattern recognition learning," Automation and Remote Control, vol. 25, pp. 821-837, 1964.
[3] S. Mika, B. Schölkopf, A. J. Smola, K.-R. Müller, M. Scholz, and G. Rätsch, "Kernel PCA and de-noising in feature spaces," in Neural Information Processing Systems (NIPS), 1998, pp. 536-542.
[4] J. Kwok and I. Tsang, "The pre-image problem in kernel methods," IEEE Transactions on Neural Networks, vol. 15, pp. 1517-1525, 2004.
[5] G. H. Bakir, J. Weston, and B. Schölkopf, "Learning to find pre-images," in Advances in Neural Information Processing Systems, vol. 16, 2004, pp. 449-456.
[6] W. S. Zheng, J. Lai, and P. C. Yuen, "Penalized preimage learning in kernel principal component analysis," IEEE Transactions on Neural Networks, vol. 21, no. 4, pp. 551-570, 2010.
[7] M. H. Nguyen and F. De la Torre, "Robust kernel principal component analysis," in Neural Information Processing Systems (NIPS), 2008, pp. 1185-1192.
[8] I. S. Dhillon and D. S. Modha, "Concept decompositions for large sparse text data using clustering," Machine Learning, vol. 42, no. 1/2, pp. 143-175, 2001.
[9] J. Ye, "Generalized low rank approximations of matrices," Machine Learning, vol. 61, no. 1-3, pp. 167-191, 2005.
[10] P. Drineas, M. W. Mahoney, and S. Muthukrishnan, "Relative-error CUR matrix decompositions," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 2, pp. 844-881, 2008.
[11] C. Saunders, A. Gammerman, and V. Vovk, "Ridge regression learning algorithm in dual variables," in International Conference on Machine Learning (ICML), 1998, pp. 515-521.