A Distributed Kernel Summation Framework for Machine Learning and Scientific Applications
Dongryeol Lee
Ph.D. Candidate in Computer Science
School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology
May 4, 2012, 10 AM - 12 PM, KACB 1212
Outline

1 Kernel Methods
    Solution Preview
2 Thesis Outline
    Problem Definition
    Thesis Statement and Contributions
3 Scaling Kernel Summations
    Approximation Methods
    Distributed and Shared Memory Parallelism
4 Distributed Averaging/Random Feature-based GPR/KPCA
    Distributed Averaging
5 Conclusion
    Omitted Research Work
Kernel Methods

A kernel function k : X × X → R defines a similarity measure between a pair of objects.

[Figure: an example point set in two dimensions and its kernel-induced similarity structure.]

Example: the Gaussian kernel, $\{K_{i,j}\}_{1 \le i,j \le N} = e^{-\|x_i - x_j\|^2/(2h^2)}$.
Kernel Summations

"Kernel summation is the computational bottleneck ubiquitous in kernel methods and scientific algorithms." A kernel function outputs a real number given a tuple of points; a kernel summation computes, approximately, the average similarity.
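A minimal sketch of the idea in plain NumPy, not the thesis code; the bandwidth h and the data sizes here are illustrative:

    import numpy as np

    def gaussian_kernel(x, y, h):
        """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
        d = x - y
        return np.exp(-np.dot(d, d) / (2.0 * h * h))

    def kernel_summation(Q, R, h):
        """Naive kernel summation Phi(q; R) = sum_{r in R} k(q, r):
        one pass over all |Q| * |R| pairs -- the quadratic bottleneck."""
        return np.array([sum(gaussian_kernel(q, r, h) for r in R) for q in Q])

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(5, 3))    # 5 query points in 3 dimensions
    R = rng.normal(size=(100, 3))  # 100 reference points
    print(kernel_summation(Q, R, h=0.5))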
Distributed Data

If disk space is cheap, why can't we store everything on one machine?

[Figure: data partitioned across processes P0 through P7.]

• It is more cost-effective to distribute data over a network of less powerful nodes than to store everything on one powerful node.
• Distribution allows distributed query processing for high scalability.
• In some cases, all of the data cannot be stored on one node due to privacy concerns.
Why are Kernel Methods Hard to Scale?

[Figure: the N × N kernel matrix K with entries $k(x_i, x_j)$.]

The problem is inherently super-quadratic in the number of data points. How do we break up the pairwise/higher-order interactions?
Solution Preview 1: Distributed Tree

[Figure: a global tree whose top levels are partitioned across processes P0 through P3.]
Solution Preview 2: Approximation Methods

Different approximation methods.

[Figure: query and reference tree nodes with centroids $c_Q$, $c_R$ and children $c_{Q^L}$, $c_{Q^R}$, $c_{R^L}$, $c_{R^R}$; interactions between distant node pairs are approximated through the centroids.]
Solution Preview 3: Distributed Averaging
Kernel Methods for Density Estimation

If the kernel is a pdf, then we can do density estimation. Kernel density estimation [Parzen 1962] is a weighted column average of the kernel matrix:
$$\operatorname*{map}_{q \in Q} \sum_{r_i \in R} w_i k(q, r_i) = K \cdot w$$
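A tiny check of this identity, with a stand-in kernel matrix rather than the thesis code:

    import numpy as np

    rng = np.random.default_rng(1)
    K = np.exp(-rng.random((5, 100)))  # stand-in |Q| x |R| kernel matrix
    w = np.full(100, 1.0 / 100)        # uniform KDE weights

    # The weighted column average, map over q of sum_i w_i k(q, r_i),
    # is exactly the matrix-vector product K w.
    loop = np.array([sum(w[i] * K[q, i] for i in range(100)) for q in range(5)])
    assert np.allclose(loop, K @ w)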
Kernel Methods

A kernel k is a similarity function. If, in addition, it satisfies Mercer's condition ($K \succeq 0$), then it corresponds to a dot product: $k(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$.

[Figure: data in the input space and its image under the feature map φ.]

The high-dimensional mapping φ can be provided implicitly by k, i.e. the kernel trick.
Kernel Methods for Nonlinear Feature Extraction

Kernel PCA [Scholkopf 1996]: eigendecompose the kernel matrix, $K = U \Sigma U^T$.

[Figure: KPCA on MNIST60k, 28 by 28 images.]
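A minimal sketch of the computation, assuming the usual double-centering step (plain NumPy, not the thesis code):

    import numpy as np

    def kpca(K, num_components):
        """Kernel PCA sketch: eigendecompose the centered kernel matrix."""
        n = K.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        Kc = J @ K @ J                      # double-center K in feature space
        vals, vecs = np.linalg.eigh(Kc)     # ascending eigenvalues; O(N^3)
        idx = np.argsort(vals)[::-1][:num_components]
        # Embed the training points onto the leading nonlinear components.
        return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))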
Kernel Methods for Regression

Gaussian process regression: solve a large linear system, $K^{-1} y$.

[Figure: a one-dimensional Gaussian process regression fit.]
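A sketch of the predictive mean under these definitions; the noise jitter is an assumption added for numerical stability, not something on the slide:

    import numpy as np

    def gpr_predict_mean(K, k_star, y, noise=1e-3):
        """GPR predictive mean: K is the N x N training kernel matrix,
        k_star the M x N query/training cross-kernel matrix, y the targets.
        The O(N^3) cost is the linear solve behind K^{-1} y."""
        alpha = np.linalg.solve(K + noise * np.eye(K.shape[0]), y)
        return k_star @ alpha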
Scientific Motivation for Kernel Methods

Sloan Digital Sky Survey:
• Collects around 200 GB of data per day.
• Has collected photometric data for around 500 million objects and spectra for more than 1 million objects.
Scientific Motivation for Kernel Methods: Redshift Prediction

Kernel density estimator: $\operatorname*{map}_{q \in Q} \sum_{r_j \in R} w_j k(q, r_j)$

Nadaraya-Watson: $\operatorname*{map}_{q \in Q} \dfrac{\sum_{(r_j, y_j) \in R} w_j y_j k(q, r_j)}{\sum_{r_j \in R} w_j k(q, r_j)}$

Gaussian process regression: $K^{-1} y$

[Figure: redshift prediction scatter plots.]

Large-scale redshift prediction of galaxies and quasars.
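A sketch of the Nadaraya-Watson estimator above (plain NumPy; K would be the query-by-reference kernel matrix):

    import numpy as np

    def nadaraya_watson(K, y, w=None):
        """Nadaraya-Watson: per-query kernel-weighted average of responses.
        K: |Q| x |R| kernel matrix; y: |R| responses (e.g. redshifts)."""
        w = np.ones(K.shape[1]) if w is None else w
        num = K @ (w * y)  # sum_j w_j y_j k(q, r_j)
        den = K @ w        # sum_j w_j k(q, r_j)
        return num / den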
Problem: Scaling Kernel Methods

[Figure: the N × N kernel matrix K with entries $k(x_i, x_j)$.]

Parameter optimization is computationally intensive:
• Weighted column average (KDE): $O(N^2)$ per parameter.
• Eigendecomposition (KPCA): $O(N^3)$ per parameter.
• Solving linear systems (GPR): $O(N^3)$ per parameter.
Problem: Scaling Kernel Methods

[Figure: data distributed across processes P0 through P7.]

Depending on the kernel (long-range), communication can be the bottleneck:
• Weighted column average (KDE): $O(N^2)$ per parameter.
• Eigendecomposition (KPCA): $O(N^3)$ per parameter.
• Solving linear systems (GPR): $O(N^3)$ per parameter.
Problem: Scaling Kernel Methods

MNIST: handwritten digit recognition data of 28 × 28 images. The 60K-point training set alone requires 28 GB to store the kernel matrix.
Problem: Scaling Kernel Methods

For the moment, assume the kernel hyperparameters are fixed. We are given a set of query points Q = {q} and a set of data points R.

For KDE: given $\epsilon > 0$, approximate $\Phi(q; R) = \sum_{r \in R} k(q, r)$ with $\widetilde{\Phi}(q; R)$ such that $|\widetilde{\Phi}(q; R) - \Phi(q; R)| \le \epsilon\, \Phi(q; R)$, as fast as possible. One can also impose probabilistic error bounds (Chapter 5 of the thesis).
Thesis Statement

"Utilizing the best general-dimension algorithms, approximation methods with error bounds, and distributed and shared memory parallelism can help scale kernel methods."
Thesis Contributions

A parallel kernel summation framework that utilizes:
• A recursive general-dimension divide-and-conquer algorithm using various types of deterministic and probabilistic approximations (Chapters 3, 4, 5).
• Indexing of the data using any multidimensional binary tree, with both distributed memory (MPI) and shared memory (OpenMP/Intel TBB) parallelism (Chapter 8).
• A dynamic load balancing scheme to adjust work imbalances during the computation (Chapter 8).
• A combination of distributed averaging and random feature extraction for scaling GPR and KPCA (Chapter 9).
Methods That Can Be Easily Parallelized Within the Framework

• Kernel density estimation.
• Kernel regression / local polynomial regression.
• Gaussian process regression.
• Kernel PCA.
• Kernel SVM (test phase).
• Kernel k-means.
• Kernel conditional density estimation.
• Kernel orthogonal centroid.
• Kernel (...).
Earlier Related Work

• "A framework for parallel tree-based scientific simulations," in Proceedings of the 26th International Conference on Parallel Processing, pp. 137-144, 1997. [Liu and Wu 1997]
• THOR: A Parallel N-body Data Mining Framework. [Boyer, Riegel, Gray 2007]
• The PEGASUS peta-scale graph mining framework. [Kang, Tsourakakis, Faloutsos 2009]

Our work: a more comprehensive arsenal of approximation methods and data structures; uses MPI/OpenMP/Intel TBB from the ground up. Caveat: still a work in progress.
The First Part of the Talk

• Density estimation (KDE): $O(N^2)$ per parameter.
• Clustering (mean shift): $O(N^2)$ per iteration per parameter.
• Regression (GPR): $O(N^3)$ per parameter.
• Nonlinear feature extraction (KPCA): $O(N^3)$ per parameter.
The Other Two

Claim: GPR/KPCA computations ≈ kernel summations.
• Regression (GPR): $O(N^3)$ per parameter.
• Nonlinear feature extraction (KPCA): $O(N^3)$ per parameter.
Contribution 1: Unified Framework for Kernel Summation

• Use multidimensional trees for full generality.
• Divide and conquer using trees, via approximations.
Multidimensional Trees
kd-tree [Bentley 1975]: recursively split the data points using an axis-aligned split.
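A minimal sequential sketch of the idea; median splits on cycling dimensions are one common choice, and the framework supports other split rules:

    import numpy as np

    class KdNode:
        """Minimal kd-tree node: recursive axis-aligned median splits."""
        def __init__(self, points, indices, leaf_size=16, depth=0):
            lo = points[indices].min(axis=0)    # hyper-rectangle bound
            hi = points[indices].max(axis=0)
            self.bounds, self.indices = (lo, hi), indices
            self.left = self.right = None
            if len(indices) > leaf_size:
                d = depth % points.shape[1]     # cycle the split dimension
                order = indices[np.argsort(points[indices, d])]
                mid = len(order) // 2           # median split on axis d
                self.left = KdNode(points, order[:mid], leaf_size, depth + 1)
                self.right = KdNode(points, order[mid:], leaf_size, depth + 1)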
Generalized N-body Framework [Gray/Moore 2000]

For problems of the form $\bigoplus_{q \in Q} \bigotimes_{r \in R} k(q, r)$, where ⊕ and ⊗ are associative, commutative binary operators:

DualTree(Q, R)
  if CanSummarize(Q, R) then
    Summarize(Q, R)
  else
    if Q and R are leaf nodes then
      DualTreeBase(Q, R)
    else
      DualTree(Q^L, R^L), DualTree(Q^L, R^R)
      DualTree(Q^R, R^L), DualTree(Q^R, R^R)
    end if
  end if
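A sketch of this recursion for the Gaussian kernel sum, reusing the KdNode sketch above; the absolute-error prune (tau per reference point) is a simplification of the relative-error and probabilistic criteria used in the thesis:

    import numpy as np

    def dist_bounds(Qn, Rn):
        """Min/max pairwise distance between two hyper-rectangles."""
        (qlo, qhi), (rlo, rhi) = Qn.bounds, Rn.bounds
        gap = np.maximum(0.0, np.maximum(qlo - rhi, rlo - qhi))
        far = np.maximum(qhi - rlo, rhi - qlo)
        return np.linalg.norm(gap), np.linalg.norm(far)

    def dual_tree(Qn, Rn, points, phi, h, tau):
        dmin, dmax = dist_bounds(Qn, Rn)
        kmax = np.exp(-dmin * dmin / (2 * h * h))
        kmin = np.exp(-dmax * dmax / (2 * h * h))
        if kmax - kmin <= 2 * tau:                 # CanSummarize(Q, R)
            phi[Qn.indices] += len(Rn.indices) * (kmax + kmin) / 2
        elif Qn.left is None and Rn.left is None:  # DualTreeBase(Q, R)
            for q in Qn.indices:
                d = points[Rn.indices] - points[q]
                phi[q] += np.exp(-(d * d).sum(axis=1) / (2 * h * h)).sum()
        else:                                      # recurse on child pairs
            for qc in ([Qn] if Qn.left is None else [Qn.left, Qn.right]):
                for rc in ([Rn] if Rn.left is None else [Rn.left, Rn.right]):
                    dual_tree(qc, rc, points, phi, h, tau)

    # Usage: root = KdNode(points, np.arange(len(points)))
    #        phi = np.zeros(len(points))
    #        dual_tree(root, root, points, phi, h=0.5, tau=1e-3)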
Contribution 2: Deterministic/Probabilistic Approximations

[Figure: query and reference tree nodes with centroids $c_Q$, $c_R$ and their children; approximations are formed between pairs of nodes.]

My research prior to 2008 focused on this aspect (Chapters 3, 4, 5 of the thesis); 16K lines of open-sourced C++ series expansion code.
Parallelizing Kernel Summations

[Figure: data distributed across processes P0 through P7.]

Based on Chapter 8 of the thesis: D. Lee, R. Vuduc, and A. G. Gray. A Distributed Kernel Summation Framework for General-Dimension Machine Learning. In Proceedings of the SIAM International Conference on Data Mining, 2012. Best Paper Award. An extended version was invited for a SIAM journal.
Recursive Doubling

[Figure: recursive doubling among eight processes; merged ranges grow from 0-1, 2-3, ... to 0-3, 4-7, and finally 0-7 over log p rounds.]

• Used for constructing trees in parallel.
• Used for passing data/messages among MPI processes.
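A toy simulation of the pattern; real code would use pairwise MPI exchanges, while here every "rank" is just a list entry:

    def recursive_doubling_allreduce(values):
        """Each round, rank r combines with rank r XOR step; after log2(p)
        rounds every rank holds the global sum."""
        p = len(values)              # assumed to be a power of two
        acc = list(values)
        step = 1
        while step < p:
            acc = [acc[r] + acc[r ^ step] for r in range(p)]
            step *= 2
        return acc

    print(recursive_doubling_allreduce([0, 1, 2, 3, 4, 5, 6, 7]))  # eight 28s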
Contribution 3: Distributed Tree

[Figure: distributed tree over processes P0 through P3. Black line: the global tree shared by all processes; red line: the local tree owned by process P0.]

Each process stores the global tree and its own local tree.
Distributed Memory Parallelism in Tree Building

[Figure: the first log p levels of the tree are built collectively across processes P0 through P7.]

Build the first log p levels in parallel. Overall complexity on the hypercube topology:
$$O\left(\frac{DN}{p}\log\frac{N}{p}\right) + O\left(D t_w m_{\mathrm{bound}}\,(2p - \log 4p)\right) + O\left(\frac{2N(D + t_w)}{p}\log p\right) + O\left(\frac{t_s}{2}\log p\,(\log p + 3)\right)$$
Shared Memory Parallelism in Tree Building

[Figure: the fraction of threads assigned halves (1, 1/2, 1/4, 1/8) down successive levels of the recursion.]

Assign the available threads to the reductions needed to compute the bounding primitive.
What Can Be Indexed in Parallel?

Any multidimensional binary tree.

[Figure: points indexed into the intervals [0,2), [2,5), [5,8), [8,9).]

Tree type    | Bound type                                              | Rule(x)
kd-trees     | hyper-rectangle $\{b_{d,\min}, b_{d,\max}\}_{d=1}^{D}$  | $x_i \le s_i$ for $1 \le i \le D$, with $b_{d,\min} \le s_i \le b_{d,\max}$
metric trees | hyper-sphere $B(c, r)$, $c \in R^D$, $r > 0$            | $\|x - p_{\mathrm{left}}\| < \|x - p_{\mathrm{right}}\|$ for $p_{\mathrm{left}}, p_{\mathrm{right}} \in R^D$
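The two Rule(x) columns written out as predicates; the split value s and the pivots are hypothetical parameters chosen at tree-build time:

    import numpy as np

    def kd_rule(x, d, s):
        """kd-tree Rule(x): route x left iff its d-th coordinate is at most
        the split value s, where b_{d,min} <= s <= b_{d,max}."""
        return x[d] <= s

    def metric_rule(x, p_left, p_right):
        """Metric-tree Rule(x): route x toward the closer of the two pivots."""
        return np.linalg.norm(x - p_left) < np.linalg.norm(x - p_right)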
Contribution 4: Parallelization

[Figure: the task Q × R pre-divided into Q^L × R and Q^R × R, then Q^{L,L} × R, Q^{L,R} × R, Q^{R,L} × R, Q^{R,R} × R.]

Pre-divide and spawn off independent computations.
Queueing up Independent Tasks

[Figure: independent tasks, tagged by owning process 0-3, queued per process.]
Overall Strong Scaling

10-million-point subset of SDSS Data Release 6.

[Figure: parallel efficiency (0 to 1.0) versus core count (24, 96, 384, 1536) for tree building, tree walk, and computation.]

Raw numbers per core count (tree building, tree walk, computation): (13.52, 339.36, 2371), (7.41, 24.38, 244), (2.93, 2.78, 98.78), (1.10, 0.27, 39.51).
Analysis of Strong Scaling

Building a distributed multidimensional tree is different from building a geometrically constrained data structure such as an octree (communication costs in green):
$$O\left(\frac{DN}{p}\log\frac{N}{p}\right) + O\left(D t_w m_{\mathrm{bound}}\,(2p - \log 4p)\right) + O\left(\frac{2N(D + t_w)}{p}\log p\right) + O\left(\frac{t_s}{2}\log p\,(\log p + 3)\right)$$

• Requires multiple all-reduce operations, which hurt scalability.
• It is possible to trade the quality of the constructed tree for scalability.
GPR and KPCA

• Density estimation (KDE): $O(N^2)$ per parameter.
• Clustering (mean shift): $O(N^2)$ per iteration per parameter.
• Regression (GPR): $O(N^3)$ per parameter.
• Nonlinear feature extraction (KPCA): $O(N^3)$ per parameter.
Distributed Averaging

Each process maintains an estimate of the global average of the numbers in the network and iterates the difference equation
$$\mu_i(t+1) = \mu_i(t) + \sum_{j \in N_i} \left(\mu_j(t) - \mu_i(t)\right).$$
The state on each process converges to the average of the initial values on all nodes.
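A toy simulation of the iteration; the explicit step size is an assumption added so that the toy iteration is stable on this graph, and it is not part of the slide's equation:

    import numpy as np

    def distributed_averaging(mu0, neighbors, step, iters):
        """mu_i(t+1) = mu_i(t) + step * sum_{j in N_i} (mu_j(t) - mu_i(t))."""
        mu = np.asarray(mu0, dtype=float)
        for _ in range(iters):
            mu = mu + step * np.array(
                [sum(mu[j] - mu[i] for j in neighbors[i])
                 for i in range(len(mu))])
        return mu

    # Ring of four processes; every state converges to the average, 6.0.
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    print(distributed_averaging([0.0, 4.0, 8.0, 12.0], ring, step=0.3, iters=50))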
Distributed Averaging

A popular technique in the literature on algorithms for wireless sensor networks; a type of gossip-based algorithm.
Contribution 5: Distributed Averaging + Random Features

Use random features to linearize the problem, then apply distributed averaging to the localized averages (Chapter 9 of the thesis).
• Regression (GPR): $O(N^3)$ per parameter.
• Nonlinear feature extraction (KPCA): $O(N^3)$ per parameter.
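A sketch of the linearization step using random Fourier features [Rahimi and Recht 2007]; the feature count and bandwidth are illustrative, and the per-node second-moment matrix at the end is the quantity distributed averaging would combine:

    import numpy as np

    def random_fourier_features(X, num_features, h, rng):
        """z(x) with E[z(x)^T z(y)] = exp(-||x - y||^2 / (2 h^2))."""
        W = rng.normal(scale=1.0 / h, size=(X.shape[1], num_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

    rng = np.random.default_rng(2)
    X_local = rng.normal(size=(1000, 10))   # one process's share of the data
    Z = random_fourier_features(X_local, num_features=64, h=1.0, rng=rng)
    # Local second-moment matrix of the features: a small D x D quantity
    # that distributed averaging turns into the global one; its leading
    # eigenvectors then give a decentralized, linearized KPCA.
    local_cov = Z.T @ Z / Z.shape[0]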
Distributed Averaging GPR

Plot of the relative error distribution between the centralized estimates and the decentralized estimates.

[Figure: min, lower quartile, mean, median, upper quartile, and max of the relative error versus iteration number (5 to 30), on a log scale from $10^{2}$ down to $10^{-10}$.]
Distributed KPCA: Strong Scaling

[Figure: strong scalability on 102,400 points of 784 dimensions; elapsed time versus number of cores for flat MPI and 2, 4, and 8 threads per MPI process.]
Distributed KPCA: Weak Scaling

[Figure: weak scalability at 100K points per core; total computation time versus number of cores for flat MPI and 2, 4, and 8 threads per MPI process.]
Conclusion

Summary of contributions:
• Parallel multidimensional trees.
• Algorithmic strategies adopted from a wide range of fields: distributed averaging (networking), random features (machine learning), series expansion (computational physics).
• Open-source contribution: MLPACK, more than 45K lines of contributed code, paving the way for the next generation.

"Utilizing the best general-dimension algorithms, approximation methods with error bounds, and distributed and shared memory parallelism can help scale kernel methods."
List of Other Research Work

Included in the thesis, but omitted in this presentation: fast nonparametric clustering (Chapter 6), with Ping Wang, James M. Rehg, Alexander G. Gray. AISTATS 2007.

$$\operatorname*{map}_{q \in Q} \frac{\sum_{r_j \in R} w_j r_j k(q, r_j)}{\sum_{r_j \in R} w_j k(q, r_j)}$$

[Figure: original and converged points of the mean shift iteration.]
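A sketch of one such update for a Gaussian kernel; the kernel choice is an assumption, since the formula above is kernel-generic:

    import numpy as np

    def mean_shift_step(Q, R, h, w=None):
        """Move each query to the kernel-weighted average of the references,
        per the map formula above."""
        w = np.ones(len(R)) if w is None else w
        d2 = ((Q[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / (2.0 * h * h)) * w   # |Q| x |R| weighted kernel
        return (K @ R) / K.sum(axis=1, keepdims=True)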
List of Other Research Work

Included in the thesis, but omitted in this presentation: higher-order extension of the kernel summation (Chapter 7), with Arkadas Ozakin, Alexander G. Gray. Submitted to the Journal of Computational Physics.

[Figure: multibody interactions between a query node Q and reference nodes R1, R2, approximated through the centroids $c_Q$, $c_{R_1}$, $c_{R_2}$.]
Collaborators

• Professor Edmond Chow (Georgia Tech)
• Professor Alexander G. Gray (Georgia Tech)
• Kihwan Kim (NVIDIA Research)
• Professor Richard Vuduc (Georgia Tech)
• William March (Georgia Tech)
• Nishant Mehta (Georgia Tech)
• Hua Ouyang (Georgia Tech)
• Parikshit Ram (Georgia Tech)
• Ryan Riegel (Georgia Tech)
• Nikolaos Vasiloglou (Georgia Tech)
List of Other Collaborations

Published but not included in the thesis chapters:
• Run-time analysis of N-body problems: with Parikshit Ram, William B. March, Alexander G. Gray. NIPS 2009.
• Rank-approximate NN: with Parikshit Ram, Hua Ouyang, Alexander G. Gray. NIPS 2009.
• GPR for motion trajectory analysis: with Kihwan Kim, Irfan Essa. ICCV 2011 (post-proposal).
• Time-constrained NN: with Parikshit Ram, Alexander G. Gray. SDM 2012 (post-proposal).
• GPR for camera motion automation: with Kihwan Kim, Irfan Essa. CVPR 2012 (post-proposal).

Ongoing collaborations involving significant code transfer:
• Parallel n-point correlation: with William B. March et al.

Other collaborations omitted...
MLPACK: Open-source ML

NIPS 2008 Demonstration, NIPS 2011 Big Learning Workshop.
Pre-Proposal Publication List

• Dongryeol Lee, Alexander G. Gray, and Andrew W. Moore. Dual-Tree Fast Gauss Transforms. In: Advances in Neural Information Processing Systems, 2005.
• Dongryeol Lee and Alexander G. Gray. Faster Gaussian Summation: Theory and Experiment. In: Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 2006.
• Ping Wang, Dongryeol Lee, Alexander G. Gray, and James M. Rehg. Fast Mean Shift with Accurate and Stable Convergence. In: Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007.
• Dongryeol Lee and Alexander G. Gray. Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method. In: Advances in Neural Information Processing Systems, 2008.
• Dongryeol Lee, Alexander G. Gray, and Andrew W. Moore. Dual-Tree Fast Gauss Transforms (arXiv).
• Parikshit Ram, Dongryeol Lee, Hua Ouyang, and Alexander G. Gray. Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions. In: Advances in Neural Information Processing Systems, 2009.
• Parikshit Ram, Dongryeol Lee, William B. March, and Alexander G. Gray. Linear-time Algorithms for Pairwise Statistical Problems. In: Advances in Neural Information Processing Systems, 2009. Spotlight presentation.
Post-Proposal Publication List

• Dongryeol Lee, Arkadas Ozakin, and Alexander G. Gray. Multibody Multipole Methods. In: Journal of Computational Physics, 2011.
• Kihwan Kim, Dongryeol Lee, and Irfan Essa. Gaussian Process Regression Flow for Analysis of Motion Trajectories. In: Proceedings of the IEEE International Conference on Computer Vision, 2011.
• William B. March, Arkadas Ozakin, Dongryeol Lee, Ryan Riegel, and Alexander G. Gray. Multi-Tree Algorithms for Large-Scale Astrostatistics. In: Advances in Machine Learning and Data Mining for Astronomy, Chapman and Hall/CRC Press, 2012.
• Dongryeol Lee, Richard Vuduc, and Alexander G. Gray. A Distributed Kernel Summation Framework for General-Dimension Machine Learning. To appear in: SIAM International Conference on Data Mining, 2012. Best Paper Award.
• Parikshit Ram, Dongryeol Lee, and Alexander G. Gray. Nearest-Neighbor Search on a Time Budget via Max-Margin Trees. To appear in: SIAM International Conference on Data Mining, 2012.
• Kihwan Kim, Dongryeol Lee, and Irfan Essa. Detecting Regions of Interest in Dynamic Scenes for Camera Motion. To appear in: IEEE Conference on Computer Vision and Pattern Recognition, 2012.
• William B. March, Kent Czechowski, Marat Dukhan, Thomas Benson, Dongryeol Lee, Andy J. Connolly, Richard Vuduc, Edmond Chow, and Alexander G. Gray. Optimizing the Computation of N-Point Correlations on Large-Scale Astronomical Data. To appear in: Proceedings of the ACM/IEEE Conference on Supercomputing (SC), 2012.
• Nishant A. Mehta, Dongryeol Lee, and Alexander G. Gray. Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL. To appear in: Advances in Neural Information Processing Systems, 2012.