Qualitatively Characterizing Neural Network Optimization Problems
Ian Goodfellow
Oriol Vinyals
Andrew Saxe
Traditional view of NN training
Factored linear view (Cartoon of Saxe et al 2013’s worldview)
Attractive saddle point view (Cartoon of Dauphin et al 2014’s worldview)
Questions
• Does SGD get stuck in local minima?
• Does SGD get stuck on saddle points?
• Does SGD wind around numerous bumpy obstacles?
• Does SGD thread a twisting canyon?
History written by the winners
• Visualize trajectories of (near) SOTA results
• Selection bias: looking at success
• Failure is interesting, but hard to attribute to optimization
• Careful with interpretation
  • SGD never encounters X?
  • SGD fails if it encounters X?
2-D subspace visualization
(Figure: 3-D surface of the cost over a 2-D parameter subspace; axes: Projection 0, Projection 1, Cost)
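A surface like this can be generated by fixing two directions in parameter space and evaluating the objective on a grid around a reference point. The sketch below is a minimal illustration of that idea, not the authors' exact code; loss_fn, theta0, d1, and d2 are hypothetical placeholders (a cost function on a flat parameter vector, a center point such as the SGD solution, and two direction vectors).

```python
import numpy as np

def subspace_surface(loss_fn, theta0, d1, d2, radius=1.0, n=25):
    """Evaluate loss_fn on a grid spanning the plane theta0 + a*d1 + b*d2.

    loss_fn: maps a flat parameter vector to a scalar cost (hypothetical).
    theta0:  center of the plot, e.g. the SGD solution.
    d1, d2:  two direction vectors in parameter space.
    """
    alphas = np.linspace(-radius, radius, n)
    betas = np.linspace(-radius, radius, n)
    cost = np.empty((n, n))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            cost[i, j] = loss_fn(theta0 + a * d1 + b * d2)
    return alphas, betas, cost  # plot with e.g. matplotlib's plot_surface
```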
A special 1-D subspace
(Figure panels: interpolation plot, learning curve)
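The special 1-D subspace is the line segment from the initial parameters to the SGD solution; the interpolation plot evaluates the cost along that segment, shown next to the ordinary learning curve. A minimal sketch of the interpolation curve, assuming hypothetical loss_fn, theta_init, and theta_final inputs:

```python
import numpy as np

def linear_interpolation_curve(loss_fn, theta_init, theta_final, n=50):
    """Cost along theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final."""
    alphas = np.linspace(0.0, 1.0, n)
    costs = [loss_fn((1.0 - a) * theta_init + a * theta_final) for a in alphas]
    return alphas, np.array(costs)
```

The range of alpha can also be extended slightly beyond [0, 1] to examine the behavior just past the endpoints.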
2-parameter deep linear model
(Figure: cost surface over (w1, w2) on [-2, 2] x [-2, 2], showing the gradient field, the manifold of global minima, the saddle point, the gradient descent path, the linear path, and the interpolation between minima)
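The surface in this figure comes from a tiny factored (deep linear) model. As a hedged reconstruction, the sketch below assumes the scalar two-layer model y_hat = w2 * w1 * x fit to the identity target, so the cost is proportional to (w1*w2 - 1)^2; the exact target and scaling behind the slide's figure may differ, but this form reproduces the qualitative picture of a hyperbola of global minima and a saddle point at the origin.

```python
import numpy as np

def deep_linear_cost(w1, w2):
    """Assumed cost of the two-parameter deep linear model y_hat = w2 * w1 * x
    trained to reproduce y = x: global minima lie on w1 * w2 = 1,
    and (0, 0) is a saddle point."""
    return (w1 * w2 - 1.0) ** 2

w1, w2 = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))
cost = deep_linear_cost(w1, w2)  # contour-plot this grid to recover the figure
```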
Maxout / MNIST experiment
Other activation functions
The linear interpolation curves for fully connected networks with different activation functions. Left) Sigmoid activation function. Right) ReLU activation function.
Here we use linear interpolation to search for local minima. Left) By interpolating between different SGD solutions, we show that each solution is a different local minimum within this subspace. Right) If we interpolate between a random point in space and an SGD solution, we find no local minima besides the SGD solution, suggesting that local minima are rare.
Convolutional network
A small barrier
The linear interpolation experiment for a convolutional maxout network on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009). Left) At a global scale, the curve looks very well-behaved. Right) Zoomed in near the initial point, we see there is a shallow barrier that SGD must navigate near this anomaly.
LSTM
MP-DBM
3-D Visualization
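One way to build such a 3-D trajectory view (an assumed construction, consistent with the axes named in the random-walk figure below) is to decompose each saved parameter vector into its projection along the initialization-to-solution axis and the norm of the orthogonal residual, then plot the cost above that plane. A minimal sketch, with loss_fn and checkpoints as hypothetical inputs:

```python
import numpy as np

def trajectory_coordinates(loss_fn, checkpoints, theta_init, theta_final):
    """For each saved parameter vector, return (projection along the
    init->solution axis, norm of the orthogonal residual, cost)."""
    axis = theta_final - theta_init
    axis = axis / np.linalg.norm(axis)
    coords = []
    for theta in checkpoints:
        delta = theta - theta_init
        along = float(delta @ axis)                       # progress toward the solution
        residual = np.linalg.norm(delta - along * axis)   # deviation off the straight line
        coords.append((along, residual, loss_fn(theta)))
    return np.array(coords)  # scatter/line-plot in 3-D
```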
3-D MP-DBM visualization
Random walk control
(Figure 11: plots of the projection along the axis from initialization to solution versus the norm …)
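The control replaces the SGD trajectory with an untrained random walk in parameter space and applies the same projection. A minimal sketch, with a hypothetical step-size parameter:

```python
import numpy as np

def random_walk(theta_init, n_steps, step_size=0.01, seed=0):
    """Generate an untrained random-walk trajectory in parameter space,
    to contrast with the SGD trajectory under the same 3-D projection."""
    rng = np.random.default_rng(seed)
    theta = theta_init.copy()
    path = [theta.copy()]
    for _ in range(n_steps):
        theta = theta + step_size * rng.standard_normal(theta.shape)
        path.append(theta.copy())
    return path
```

The resulting path can be fed through the same trajectory_coordinates decomposition sketched above for the SGD runs.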
3-D Plots Without Obstacles
(Figure panels: Adversarial, ReLUs, LSTM, Factored Linear)
3-D Plot of Adversarial Maxout
SGD naturally exploits negative curvature!
Obstacles!
Conclusion
• For most problems, there exists a linear subspace of monotonically decreasing objective values
• For some problems, there are obstacles between this subspace and the SGD path
• Factored linear models capture many qualitative aspects of deep network training
• See more visualizations at our poster / demo / paper