On Speeding Up Computation In Information Theoretic Learning
Sohan Seth and José C. Príncipe
Computational NeuroEngineering Lab, University of Florida, Gainesville
Introduction
With the recent progress in kernel based learning methods, computation with Gram matrices has gained considerable attention. Given n samples {x_i}_{i=1}^n and a positive definite function \kappa(x, y), the Gram matrix K_{XX} is defined as

\[
K_{XX} =
\begin{bmatrix}
\kappa(x_1, x_1) & \cdots & \kappa(x_1, x_n) \\
\vdots & \ddots & \vdots \\
\kappa(x_n, x_1) & \cdots & \kappa(x_n, x_n)
\end{bmatrix}.
\]
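As a concrete illustration (not part of the original poster), a Gram matrix can be built in a few lines of NumPy. The Gaussian kernel and the variable names below are illustrative assumptions; any positive definite \kappa works.

```python
import numpy as np

def gram_matrix(x, y, sigma=1.0):
    """Gram matrix K with K[i, j] = kappa(x_i, y_j), using a Gaussian
    kernel kappa(a, b) = exp(-(a - b)^2 / (2 sigma^2)) as an example."""
    diff = x[:, None] - y[None, :]          # pairwise differences
    return np.exp(-diff**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x = rng.normal(size=5)
K_XX = gram_matrix(x, x)   # symmetric, positive definite, unit diagonal
```

For the Gaussian kernel, K_{XX} is symmetric with ones on the diagonal, since \kappa(x_i, x_i) = 1.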
Similarly, given samples {y_i}_{i=1}^n, the cross Gram matrix K_{XY} is defined as

\[
K_{XY} =
\begin{bmatrix}
\kappa(x_1, y_1) & \cdots & \kappa(x_1, y_n) \\
\vdots & \ddots & \vdots \\
\kappa(x_n, y_1) & \cdots & \kappa(x_n, y_n)
\end{bmatrix}.
\]

Using the Cholesky factor G_{XX} of K_{XX} (introduced below), \hat{H}_2(X) can be written as

\[
\hat{H}_2(X) \approx \frac{1}{n^2}\,\mathbf{1}^\top G_{XX} G_{XX}^\top \mathbf{1}
= \frac{1}{n^2}\,\|\mathbf{1}^\top G_{XX}\|_2^2.
\]
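A minimal numerical check of this identity, assuming a Gaussian kernel and 1-D samples (and using the exact Cholesky factor for simplicity; the same algebra applies to a low-rank factor G with K \approx GG^\top):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
sigma = 1.0

# Gaussian Gram matrix K_XX
K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * sigma**2))

# Direct form: (1/n^2) 1^T K 1
ip_direct = K.sum() / n**2

# Via the Cholesky factor G with K = G G^T:
# (1/n^2) 1^T G G^T 1 = (1/n^2) ||1^T G||_2^2
G = np.linalg.cholesky(K + 1e-10 * np.eye(n))  # tiny jitter for stability
v = G.sum(axis=0)                              # the row vector 1^T G
ip_chol = (v @ v) / n**2
```

The benefit appears when G is a low-rank n x d factor: the quantity 1^\top G costs O(nd) instead of the O(n^2) needed to sum all entries of K.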
However, consider the matrix K_{ZZ} built from the concatenated samples {z_1, \ldots, z_n, z_{n+1}, \ldots, z_{2n}} = {x_1, \ldots, x_n, y_1, \ldots, y_n}, such that

\[
K_{ZZ} =
\begin{bmatrix}
\kappa(z_1, z_1) & \cdots & \kappa(z_1, z_{2n}) \\
\vdots & \ddots & \vdots \\
\kappa(z_{2n}, z_1) & \cdots & \kappa(z_{2n}, z_{2n})
\end{bmatrix}
=
\begin{bmatrix}
K_{XX} & K_{XY} \\
K_{YX} & K_{YY}
\end{bmatrix}.
\]
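The block structure can be verified directly. This sketch again assumes a Gaussian kernel and 1-D samples; `gauss_gram` is a hypothetical helper, not from the paper.

```python
import numpy as np

def gauss_gram(a, b, sigma=1.0):
    """Gram matrix for an example Gaussian kernel."""
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * sigma**2))

rng = np.random.default_rng(2)
n = 4
x, y = rng.normal(size=n), rng.normal(size=n)
z = np.concatenate([x, y])   # {z_1,...,z_2n} = {x_1,...,x_n, y_1,...,y_n}

K_ZZ = gauss_gram(z, z)      # 2n x 2n joint Gram matrix

# Its four n x n blocks are exactly K_XX, K_XY, K_YX, K_YY
assert np.allclose(K_ZZ[:n, :n], gauss_gram(x, x))   # K_XX
assert np.allclose(K_ZZ[:n, n:], gauss_gram(x, y))   # K_XY
assert np.allclose(K_ZZ[n:, :n], gauss_gram(y, x))   # K_YX
assert np.allclose(K_ZZ[n:, n:], gauss_gram(y, y))   # K_YY
```

Factoring K_{ZZ} once therefore yields all four blocks at the same time.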
Any n × n symmetric positive definite matrix K can be expressed as K = GG^\top, where G is an n × n lower triangular matrix with positive diagonal entries. This decomposition is known as the Cholesky decomposition. However, if the eigenvalues of K drop rapidly, then the matrix can be approximated with arbitrary accuracy by an n × d (d ≤ n) lower triangular matrix G, i.e., K \approx GG^\top.
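One common way to obtain such a low-rank factor is pivoted incomplete Cholesky, which stops once the residual trace falls below a tolerance. The following is a minimal sketch (function name, stopping rule, and kernel choice are my assumptions, not the paper's exact algorithm); because of pivoting, the returned factor is a row-permuted lower triangular matrix.

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-6):
    """Pivoted incomplete Cholesky: returns G (n x d) with K ~= G G^T.
    Stops when the residual trace (sum of the remaining diagonal) < tol."""
    n = K.shape[0]
    G = np.zeros((n, n))
    d = K.diagonal().astype(float).copy()   # residual diagonal
    for k in range(n):
        if d.sum() < tol:
            return G[:, :k]                 # rank-k approximation suffices
        i = int(np.argmax(d))               # pivot: largest residual entry
        G[:, k] = (K[:, i] - G[:, :k] @ G[i, :k]) / np.sqrt(d[i])
        d -= G[:, k]**2
        d[d < 0] = 0.0                      # guard round-off negatives
    return G

# Example: a Gaussian Gram matrix with a wide kernel has rapidly
# decaying eigenvalues, so d << n columns are enough.
rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 2.0**2))
G = incomplete_cholesky(K, tol=1e-6)
```

The cost is O(nd^2) rather than the O(n^3) of a full factorization, which is the source of the speed-up exploited throughout the paper.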