On Speeding Up Computation In Information Theoretic Learning

Sohan Seth and José C. Príncipe, Computational NeuroEngineering Lab, University of Florida, Gainesville

Introduction

With the recent progress in kernel based learning methods, computation with Gram matrices has gained considerable attention. Given n samples {x_1, ..., x_n} and a positive definite function κ(x, y), the Gram matrix K_XX is defined as

\[
K_{XX} =
\begin{bmatrix}
\kappa(x_1, x_1) & \cdots & \kappa(x_1, x_n) \\
\vdots & \ddots & \vdots \\
\kappa(x_n, x_1) & \cdots & \kappa(x_n, x_n)
\end{bmatrix}.
\]

Similarly, given a second set of samples {y_1, ..., y_n}, the cross Gram matrix K_XY is defined as

\[
K_{XY} =
\begin{bmatrix}
\kappa(x_1, y_1) & \cdots & \kappa(x_1, y_n) \\
\vdots & \ddots & \vdots \\
\kappa(x_n, y_1) & \cdots & \kappa(x_n, y_n)
\end{bmatrix}.
\]
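For concreteness, here is a minimal sketch of how such Gram matrices can be formed. The Gaussian kernel κ(x, y) = exp(−‖x − y‖² / 2σ²), the toy sample sizes, and the use of NumPy are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gram_matrix(X, Y, sigma=1.0):
    """Gaussian-kernel Gram matrix with entries kappa(X[i], Y[j])."""
    # squared Euclidean distances between every pair (X[i], Y[j])
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

# toy data: two sets of n samples in d dimensions (illustrative only)
rng = np.random.default_rng(0)
n, d = 500, 2
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, d))

K_XX = gram_matrix(X, X)   # n x n
K_XY = gram_matrix(X, Y)   # n x n
```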

However, consider the matrix K_ZZ constructed from the concatenated samples {z_1, ..., z_n, z_{n+1}, ..., z_{2n}} = {x_1, ..., x_n, y_1, ..., y_n}, such that

\[
K_{ZZ} =
\begin{bmatrix}
\kappa(z_1, z_1) & \cdots & \kappa(z_1, z_{2n}) \\
\vdots & \ddots & \vdots \\
\kappa(z_{2n}, z_1) & \cdots & \kappa(z_{2n}, z_{2n})
\end{bmatrix}
=
\begin{bmatrix}
K_{XX} & K_{XY} \\
K_{YX} & K_{YY}
\end{bmatrix}.
\]
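As a quick numerical check, stacking the two sample sets and forming the Gram matrix of the combined set reproduces this block structure; the snippet below reuses the hypothetical gram_matrix helper and data from the earlier sketch.

```python
# Stacking the samples and forming the Gram matrix of Z gives the block matrix.
Z = np.vstack([X, Y])                  # 2n x d
K_ZZ = gram_matrix(Z, Z)               # 2n x 2n

# the four n x n blocks of K_ZZ
assert np.allclose(K_ZZ[:n, :n], K_XX)               # top-left     = K_XX
assert np.allclose(K_ZZ[:n, n:], K_XY)               # top-right    = K_XY
assert np.allclose(K_ZZ[n:, :n], K_XY.T)             # bottom-left  = K_YX
assert np.allclose(K_ZZ[n:, n:], gram_matrix(Y, Y))  # bottom-right = K_YY
```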

Any n × n symmetric positive definite matrix K can be expressed as K = GG⊤, where G is an n × n lower triangular matrix with positive diagonal entries. This decomposition is known as the Cholesky decomposition. However, if the eigenvalues of K drop rapidly, then K can be approximated with arbitrary accuracy by an n × d (d ≤ n) lower triangular matrix G, i.e. K ≈ GG⊤. This low-rank factorization is known as the incomplete Cholesky decomposition.

Evaluation

Using G, Ĥ2(X) can be written as

\[
\hat{H}_2(X) \approx \frac{1}{n^2}\, \mathbf{1}^\top G_{XX} G_{XX}^\top \mathbf{1}
= \frac{1}{n^2}\, \left\| \mathbf{1}^\top G_{XX} \right\|_2^2 ,
\]

where G_XX denotes the incomplete Cholesky factor of K_XX.
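Below is a sketch of a pivoted incomplete Cholesky factorization and of evaluating the quantity in the formula above with it. The pivoting rule, the stopping tolerance, and the helper names are illustrative assumptions; K_XX comes from the earlier snippets.

```python
def incomplete_cholesky(K, tol=1e-6, max_rank=None):
    """Pivoted (incomplete) Cholesky: returns G (n x rank) with K ~= G @ G.T."""
    n = K.shape[0]
    max_rank = n if max_rank is None else max_rank
    G = np.zeros((n, max_rank))
    diag = np.diag(K).astype(float).copy()        # residual diagonal of K - G G^T
    for j in range(max_rank):
        i = int(np.argmax(diag))                  # pivot: largest residual diagonal entry
        if diag[i] <= tol:                        # remaining approximation error is negligible
            return G[:, :j]
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(diag[i])
        diag -= G[:, j] ** 2
    return G

G_XX = incomplete_cholesky(K_XX)                  # n x d, with d typically much smaller than n
ones = np.ones(n)

# (1/n^2) 1^T K_XX 1  versus its low-rank approximation (1/n^2) ||1^T G_XX||^2
exact  = (ones @ K_XX @ ones) / n**2
approx = np.sum((G_XX.T @ ones) ** 2) / n**2
print(exact, approx, G_XX.shape)
```

Computing 1⊤G_XX costs O(nd) rather than the O(n²) needed for 1⊤K_XX, which is where the speed-up comes from when d ≪ n.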