A Summary of Parallel Learning Efforts
DIMACS Workshop on Parallelism: A 2020 Vision
John Langford (Yahoo!)

What is Machine Learning? The simple version:

Given data (x, y)* find a function f(x) which predicts y.

y ∈ {0, 1} is a “label”, x ∈ R^n is “features”, and f(x) = ⟨w · x⟩ is a linear predictor.

y might be more complex and structured. Or nonexistent...
x might be a sparse vector or a string.
f can come from many more complex functional spaces.
In general: the discipline of data-driven prediction.
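
As a concrete (made-up) illustration of the linear predictor above, in Python:

import numpy as np

def predict(w, x):
    """The linear predictor: f(x) = <w, x>."""
    return np.dot(w, x)

# Illustrative weights and features only; not from any real model.
w = np.array([0.2, -0.5, 1.0])
x = np.array([1.0, 0.0, 0.7])
score = predict(w, x)       # real-valued score
y_hat = int(score >= 0.5)   # one simple way to map the score to a {0, 1} label
print(score, y_hat)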

Where is Machine Learning? At Yahoo!:

1. Is the email spam or not?
2. Which news article is most interesting to a user?
3. Which ad is most interesting to a user?
4. Which result should come back from a search?

In the rest of the world:

1. “statistical arbitrage”
2. Machine Translation
3. Watson
4. Face detectors in cameras
5. ... constantly growing.

How does it work?

A common approach = gradient descent. Suppose we want to choose w for f(x) = ⟨w · x⟩.
Start with w = 0.
Compute a “loss” according to l_f(x, y) = (f(x) − y)².
Alter the weights according to w ← w − η ∂l_f/∂w.
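
A minimal sketch of that update in Python (the toy data, step size, and epoch count are mine, purely for illustration):

import numpy as np

def squared_loss_sgd(data, eta=0.1, epochs=100):
    """Gradient descent on l_f(x, y) = (f(x) - y)^2 for f(x) = <w, x>:
    dl_f/dw = 2 (f(x) - y) x, applied one example at a time."""
    dim = len(data[0][0])
    w = np.zeros(dim)                        # start with w = 0
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (np.dot(w, x) - y) * x
            w -= eta * grad                  # w <- w - eta * dl_f/dw
    return w

# Made-up toy data: the label equals the second feature.
data = [(np.array([1.0, 0.0]), 0.0), (np.array([1.0, 1.0]), 1.0)]
print(squared_loss_sgd(data))                # converges toward w ≈ [0, 1]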

There are many variations and many other approaches. All efficient methods have some form of greedy optimization core. But it’s not just optimization:

1. We must predict the y correctly for new x.
2. There are popular nonoptimization methods as well.

Demonstration

Learning to classify news articles (RCV1 dataset)
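
The live demo is not reproduced here. As a rough stand-in (not the speaker’s code), a linear classifier can be trained on RCV1 with off-the-shelf tools; the topic index and split below are arbitrary choices of mine:

from sklearn.datasets import fetch_rcv1
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

rcv1 = fetch_rcv1()                        # sparse TF-IDF features, 103 topic labels
X = rcv1.data
y = rcv1.target[:, 33].toarray().ravel()   # one arbitrary topic as a {0, 1} label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SGDClassifier()                      # linear classifier trained by SGD (hinge loss)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))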

An Outline of What’s Next

Ron Bekkerman, Misha Bilenko and I are editing a book on “Scaling up Machine Learning”. Overview next.

What’s in the book?

Parallel Unsupervised Learning Methods:

1. Information-Theoretic Co-Clustering with MPI
2. Spectral Clustering using MapReduce as a subroutine
3. K-Means with GPU
4. Latent Dirichlet Allocation with MPI
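
The chapters’ own code is not shown here; to make the flavor concrete, below is a small data-parallel K-Means sketch in the map/reduce style, with CPU worker processes standing in for GPU or MPI workers (all names and parameters are mine, not from the book):

import numpy as np
from multiprocessing import Pool

def assign_and_sum(args):
    """Map step: for one shard of the data, assign each point to its nearest
    centroid and return per-centroid partial sums and counts."""
    shard, centroids = args
    k, dim = centroids.shape
    sums, counts = np.zeros((k, dim)), np.zeros(k)
    for x in shard:
        c = np.argmin(((centroids - x) ** 2).sum(axis=1))
        sums[c] += x
        counts[c] += 1
    return sums, counts

def parallel_kmeans(X, k=3, iters=10, workers=4):
    """Data-parallel K-Means: map over shards, reduce the partial statistics,
    then recompute the centroids."""
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    shards = np.array_split(X, workers)
    with Pool(workers) as pool:
        for _ in range(iters):
            parts = pool.map(assign_and_sum, [(s, centroids) for s in shards])
            sums = sum(p[0] for p in parts)                  # reduce step
            counts = sum(p[1] for p in parts)
            centroids = sums / np.maximum(counts, 1)[:, None]
    return centroids

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(10_000, 2))    # made-up data
    print(parallel_kmeans(X))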

It’s very hard to compare different results... But let’s try.

[Figure: Speed per method (features/s, log scale; parallel vs. single) for the parallel unsupervised methods: Info-Theoretic Co-Clustering (MPI-400, RCV1 800x800), Spectral Clustering (MapReduce-128, RCV1), LDA-2000 (MPI-1024, Medline), K-Means-400 (GPU, Synthetic), RBM-93K (GPU, Images).]

Ground Rules

Ginormous caveat: Prediction performance varies wildly depending on the problem–method pair.

The standard: Input complexity/time.

⇒ No credit for creating complexity then reducing it. (Ouch!)

Most interesting results reported. Some cases require creative best-effort summary.
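
To make “input complexity/time” concrete, the speed plots report raw input features consumed per second of wall-clock time. A worked example with made-up numbers (my reading of the axis, not figures from the talk):

def features_per_second(num_examples, avg_nonzero_features, wall_clock_seconds):
    """Throughput in the sense used on the speed plots: total input features
    processed per second of wall-clock time."""
    return num_examples * avg_nonzero_features / wall_clock_seconds

# Illustrative only: ~800K documents, ~75 nonzero features each, 1000 s run.
print(features_per_second(800_000, 75, 1000))   # 60000.0 features/s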

[Figure: Supervised Training — Speed per method (features/s, log scale; parallel vs. single): Linear (Threads-2, RCV1), Boosted Decision Trees (MPI-32, Ranking), Decision Tree (MapReduce-200, Ad-Bounce), RBF-SVM (TCP-48, MNIST 220K), Ensemble Tree (MPI-128, Synthetic), RBF-SVM (MPI?-500, RCV1).]

[Figure: Supervised Testing (but not training) — Speed per method (features/s, log scale; parallel vs. single): Conv. NN (ASIC (sim), Caltech), Boosted Decision Trees (GPU, Images), Conv. NN (FPGA, Caltech), Inference (MPI-40, Protein), Speech-WFST (GPU, NIST).]

My Flow Chart for Learning Optimization

1. Choose an efficient, effective algorithm.
2. Use compact binary representations.
3. If (Computationally Constrained)
4. then GPU
5. else: if few learning steps then Map-Reduce, else Research Problem.
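
The same flow chart written as a function, for readers who prefer code (a paraphrase of the slide, not the speaker’s code):

def pick_parallel_approach(computationally_constrained, few_learning_steps):
    """The flow chart above, after choosing an efficient, effective algorithm
    and a compact binary representation for the data."""
    if computationally_constrained:
        return "GPU"
    if few_learning_steps:
        return "Map-Reduce"
    return "research problem"

print(pick_parallel_approach(False, True))   # -> "Map-Reduce"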