Deep Neural Network and Transfer Learning 1
Workshop on Intro to Deep Neural Networks 26th to 27th August 2016
Presented By: Aqsa Saeed Qureshi Supervised By: Dr. Asifullah Khan (DCIS PIEAS)
Outline 2
- Learning feature hierarchies (deep learning)
- Auto-encoder
- Deep Belief Net
- Transfer Learning
- Transfer Learning in Deep Neural Networks
Deep Learning Overview 3
Train networks with many layers (vs. shallow nets with just a couple of layers).
Multiple layers work to build an improved feature space.
Learning feature hierarchies/Deep learning 4
Learning feature hierarchies/Deep learning 5
“Confronted with an array of pixels, no computer inherently knows the difference between a house, a tree and a cat. Deep learning is the guy with the paint brush.”
Learning feature hierarchies/Deep learning 6
http://deeplearning4j.org/whydeeplearning.html
Learning feature hierarchies/Deep learning 7
Image features 8
Features = local detectors, combined to make a prediction (in reality, learned features are more low-level).
[Figure: nose, eye, eye, and mouth detectors on a face image combine to predict “Face!”]
Standard image classification approach 9
Input → Extract features → Use simple classifier (e.g., logistic regression, SVMs) → “Face”
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
(Slide credit: Honglak Lee)
Many hand-crafted features exist… 10
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH (Slide credit: Honglak Lee)
… but very painful to design.
Change image classification approach? 11
Input → Extract features (SIFT, Spin image, HoG, RIFT, Textons, GLOH) → Use simple classifier (e.g., logistic regression, SVMs) → “Face”
Can we learn features from data? (Slide credit: Honglak Lee)
Why feature hierarchies 12
pixels → edges → object parts (combinations of edges) → object models
http://ufldl.stanford.edu/eccv10-tutorial/
Deep learning algorithms 13
Deep Belief Network (DBN) (Hinton)
Deep sparse auto-encoders (Bengio)
[Other related work: LeCun, Lee, Yuille, Ng …]
Deep learning with autoencoders 14
Logistic regression → Neural network → Sparse autoencoder → Deep autoencoder
Logistic regression 15
Draw a logistic regression unit as: inputs x1, x2, x3 and a bias term (+1) feeding a single sigmoid output.
http://ufldl.stanford.edu/eccv10-tutorial/
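As a concrete illustration (a minimal NumPy sketch, not from the original slides; the weights and inputs are made up), the unit computes a sigmoid of a weighted sum plus bias:

```python
import numpy as np

def logistic_unit(x, theta, bias):
    """A single logistic regression unit: sigmoid of a weighted sum plus bias."""
    return 1.0 / (1.0 + np.exp(-(np.dot(theta, x) + bias)))

# Three inputs, as in the slide's diagram
x = np.array([0.5, -1.2, 3.0])
theta = np.array([0.1, 0.4, -0.2])
print(logistic_unit(x, theta, bias=0.3))  # a probability-like value in (0, 1)
```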
Neural Network 16
String a lot of logistic units together. Example 3-layer network:
[Figure: inputs x1, x2, x3 plus a bias unit (+1) in Layer 1 feed hidden units a1, a2, a3 plus a bias unit (+1) in Layer 2, which feed one output unit in Layer 3]
http://ufldl.stanford.edu/eccv10-tutorial/
Neural Network 17
Example 4-layer network with 2 output units:
[Figure: inputs x1, x2, x3 with bias units (+1) in Layers 1–3 and two output units in Layer 4]
http://ufldl.stanford.edu/eccv10-tutorial/
Training a neural network 18
Given a training set (x1, y1), (x2, y2), (x3, y3), …, adjust the parameters θ (for every node) to make the network’s outputs match the targets. (Use gradient descent: the “backpropagation” algorithm. Susceptible to local optima.)
http://ufldl.stanford.edu/eccv10-tutorial/
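The objective itself did not survive extraction; a standard squared-error form, consistent with the UFLDL tutorial the slide cites, is:

```latex
J(\theta) = \tfrac{1}{2} \sum_i \left\lVert h_\theta(x_i) - y_i \right\rVert^2,
\qquad
\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)
```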
Unsupervised feature learning with a neural network 19
Network is trained to output the input (learn the identity function).
[Figure: inputs x1…x6 plus a bias unit (+1) in Layer 1, hidden units a1, a2, a3 plus a bias unit (+1) in Layer 2, and reconstructed outputs x1…x6 in Layer 3]
This is an autoencoder. The identity is a trivial solution unless we:
- constrain the number of units in Layer 2 (learn a compressed representation), or
- constrain Layer 2 to be sparse.
http://ufldl.stanford.edu/eccv10-tutorial/
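A minimal sketch of such an autoencoder (assuming Keras is available; the 6-3-6 shape mirrors the slide's diagram, and the data are random placeholders):

```python
import numpy as np
import tensorflow as tf

# 6 inputs compressed into 3 hidden units (Layer 2), then reconstructed back to 6 (Layer 3)
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="sigmoid", input_shape=(6,)),  # bottleneck
    tf.keras.layers.Dense(6, activation="sigmoid"),                    # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(1000, 6)                   # unlabeled data
autoencoder.fit(x, x, epochs=10, verbose=0)   # target = input: learn the identity
```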
So: multiple layers make sense 20
Many-layer neural network architectures should be capable of learning the true underlying features and ‘feature logic’, and therefore generalise very well …
http://ufldl.stanford.edu/eccv10-tutorial/
But, until very recently, our weight-learning algorithms simply did not work on multi-layer architectures 21
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 22
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 23
Train this layer first
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 24
Train this layer first, then this layer.
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 25
Train this layer first, then this layer, then this layer.
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 26
Train this layer first, then this layer, then this layer, then this layer.
The new way to train multi-layer NNs… 27
Train this layer first, then this layer, then this layer, then this layer, finally this layer.
The new way to train multi-layer NNs… 28
EACH of the (non-output) layers is trained to be an auto-encoder. Basically, each is forced to learn good features that describe what comes from the previous layer.
http://ufldl.stanford.edu/eccv10-tutorial/
an auto-encoder is trained, with an absolutely standard weight-adjustment algorithm 29
http://ufldl.stanford.edu/eccv10-tutorial/
an auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input 30
By making this happen with (many) fewer units than the inputs, this forces the ‘hidden layer’ units to become good feature detectors.
http://ufldl.stanford.edu/eccv10-tutorial/
intermediate layers are each trained to be auto-encoders 31
http://ufldl.stanford.edu/eccv10-tutorial/
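A hedged sketch of this greedy layer-wise procedure (hypothetical layer sizes and data; each layer is pretrained as an autoencoder on the previous layer's outputs):

```python
import numpy as np
import tensorflow as tf

def pretrain_layer(inputs, n_hidden, epochs=10):
    """Train one layer as an autoencoder; return its encoder half and the encoded data."""
    n_in = inputs.shape[1]
    ae = tf.keras.Sequential([
        tf.keras.layers.Dense(n_hidden, activation="sigmoid", input_shape=(n_in,)),
        tf.keras.layers.Dense(n_in, activation="sigmoid"),
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(inputs, inputs, epochs=epochs, verbose=0)
    encoder = tf.keras.Sequential([ae.layers[0]])  # keep only the encoding half
    return encoder, encoder.predict(inputs, verbose=0)

x = np.random.rand(1000, 64)       # unlabeled training data (placeholder)
encoders, h = [], x
for n_hidden in (32, 16, 8):       # train this layer first, then the next, ...
    enc, h = pretrain_layer(h, n_hidden)
    encoders.append(enc)
# The encoders can now be stacked and topped with a supervised output layer.
```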
Auto-Encoders 32
http://ufldl.stanford.edu/eccv10-tutorial/
Stacked Auto-Encoders 33
http://ufldl.stanford.edu/eccv10-tutorial/
Sparse Encoders 34
- At any given time, many/most of the features will have a value of 0
- Thus there is an implicit compression each time, but with varying nodes
- This leads to more localist, variable-length encodings, where a particular node with value 1 signifies the presence of a feature (a small set of bases)
- A type of simplicity bottleneck (regularizer)
- This is easier for subsequent layers to use for learning
Sparse Encoders 35
Sparsity regularization: attempts to enforce a constraint on the sparsity of the output from the hidden layer.
L2 regularization: when training a sparse autoencoder, it is possible to make the sparsity regularizer small simply by increasing the weights w; adding a regularization term on the weights to the cost function prevents this (see the cost written out below).
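A common way to write the resulting cost (a standard formulation, reconstructed rather than copied from the slides; β and λ weight the two regularizers):

```latex
J = \frac{1}{N}\sum_{n=1}^{N}\lVert x_n - \hat{x}_n\rVert^2
  \;+\; \beta \sum_{j} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right)
  \;+\; \lambda \lVert W \rVert_2^2
```

where \hat{\rho}_j is the average activation of hidden unit j and ρ is the desired (small) sparsity level.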
Unsupervised feature learning with a neural network 36
Training a sparse autoencoder: given an unlabeled training set x1, x2, …, learn sparse hidden activations a1, a2, a3.
Final layer trained to predict class based on outputs from previous layers 37
http://ufldl.stanford.edu/eccv10-tutorial/
And that’s that 38
That’s the basic idea. There are many, many types of deep learning: different kinds of autoencoder, variations on architectures and training algorithms, etc. It is a very fast-growing area.
Auto-Encoders 39
- A type of unsupervised learning which tries to discover generic features of the data
- Learns the identity function by learning important sub-features (not by just passing the data through); useful for compression, etc.
- Can use just the new features in the new training set, or concatenate both
Deep Belief Net 40
Deep Belief Net (DBN) is another algorithm for learning a feature hierarchy. Building block: 2-layer graphical model (Restricted Boltzmann Machine).
Deep Belief Net 41
“Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. […] The lower layers receive top-down, directed connections from the layers above. The states of the units in the lowest layer represent a data vector.”
Deep Belief Net 42
Motivation: the robustness and efficiency with which humans recognize objects has long been an intriguing challenge for computational intelligence. Theoretical results suggest that deep architectures are fundamental for learning the complex functions that can represent high-level abstractions (e.g., vision, language) [Bengio, 2009].
Deep Belief Net 43
Deep Versus Shallow Architecture:
http://ufldl.stanford.edu/eccv10-tutorial/
Deep Belief Net 44
DBNs are composed of several Restricted Boltzmann Machines (RBMs) stacked on top of each other.
Deep Belief Net 45
An RBM is an energy-based generative model that consists of a layer of binary visible units, v, and a layer of binary hidden units, h.
Deep Belief Net 46
Given an observed state, the energy of the joint configuration of the visible and hidden units (v, h) is given by (1):
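The equation image was lost in extraction; the standard RBM energy it refers to is:

```latex
E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i \, w_{ij} \, h_j
\qquad (1)
```

where a_i and b_j are the visible and hidden biases and w_ij are the connection weights.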
Deep Belief Net 47
The RBM defines a joint probability over (v, h):
where Z is the partition function, obtained by summing e^{−E(v, h)} over all possible (v, h) configurations:
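Reconstructing the lost equations in their standard form:

```latex
P(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z},
\qquad
Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}
```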
Deep Belief Net 48
Given a random input configuration v, the state of the hidden unit j is set to 1 with probability:
Similarly, given a random hidden vector, h, the state of the visible unit i can be set to 1 with probability:
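The standard conditionals (reconstructed, since the equation images were lost; σ is the logistic sigmoid):

```latex
P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i \, w_{ij}\Big),
\qquad
P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j w_{ij} \, h_j\Big)
```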
Deep Belief Net 49
Gibbs Sampling:
Deep Belief Net 50
Alternating Gibbs Sampling:
Deep Belief Net 51
Alternating Gibbs Sampling:
Deep Belief Net 52
CONTRASTIVE DIVERGENCE (CD-k):
  v(0) ← x
  Compute the binary states of the hidden units, h(0), using v(0)
  for n ← 1 to k:
    Compute the “reconstruction” states of the visible units, v(n), using h(n−1)
    Compute the “reconstruction” states of the hidden units, h(n), using v(n)
  end for
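A minimal NumPy sketch of one CD-k update (hypothetical shapes and learning rate; it uses the conditional probabilities given earlier):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(x, W, a, b, k=1, lr=0.1):
    """One CD-k step for an RBM: W is (visible, hidden); a, b are visible/hidden biases."""
    ph0 = sigmoid(b + x @ W)                          # P(h=1 | v(0)): positive phase
    h = (rng.random(ph0.shape) < ph0).astype(float)   # sample binary h(0)
    v = x
    for _ in range(k):
        pv = sigmoid(a + h @ W.T)                     # "reconstruction" of visible units
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(b + v @ W)                       # hidden probabilities for v(n)
        h = (rng.random(ph.shape) < ph).astype(float)
    W += lr * (np.outer(x, ph0) - np.outer(v, ph))    # <v h>^(0) - <v h>^(k)
    a += lr * (x - v)
    b += lr * (ph0 - ph)
```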
Deep Belief Net 53
Update the weights and biases, according to:
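The update rule the slide refers to, in its standard contrastive-divergence form (reconstructed, since the equation image was lost; ε is the learning rate):

```latex
\Delta w_{ij} = \epsilon \left( \langle v_i h_j \rangle^{(0)} - \langle v_i h_j \rangle^{(k)} \right),
\quad
\Delta a_i = \epsilon \left( v_i^{(0)} - v_i^{(k)} \right),
\quad
\Delta b_j = \epsilon \left( h_j^{(0)} - h_j^{(k)} \right)
```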
Deep Belief Net 54
Deep learning examples 55
Convolutional DBN on face images 56
pixels → edges → object parts (combinations of edges) → object models
Learning of object parts 57
Examples of learned object parts from object categories: Faces, Cars, Elephants, Chairs
Deep Net with Greedy Layer-Wise Training 58
[Figure: original inputs → unsupervised learning of a new feature space → supervised learning of the ML model]
http://axon.cs.byu.edu/~martinez/classes/678/Slides/Deep-Learning.pptx
TRANSFER OF LEARNING 59
http://www.slideshare.net/ocmonmoveonpeople/transfer-of-learning-by-lorraine-anoran?qid=2d5fdd3b-13e2-449b-9410dea9dcb2ed56&v=&b=&from_search=5
Transfer of Learning 60
The study of the dependency of human conduct, learning, or performance on prior experience. Thorndike and Woodworth [1901] explored how individuals transfer learning in one context to another context that shares similar characteristics. Examples: C++ → Java; Maths/Physics → Computer Science/Economics.
Transfer Learning 61
The ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks or new domains which share some commonality. Given a target task, how do we identify the commonality between the target task and previous (source) tasks, and transfer knowledge from the previous tasks to the target one?
Positive vs. Negative Transfer 62
Positive transfer: when learning in one context improves performance in some other context.
Negative transfer: when learning in one context has a negative impact on performance in another context.
http://www.slideshare.net/ocmonmoveonpeople/transfer-of-learning-by-lorraine-anoran?qid=2d5fdd3b-13e2-449b-9410dea9dcb2ed56&v=&b=&from_search=5
Motivation 63
[Figure: a trained model applied to new, unseen data]
Assumptions of traditional machine learning:
1. Training and test data are drawn from the same distribution
2. Training and test data are in the same feature space
Examples: Web-document Classification 64
[Figure: a classifier trained on web documents from the Physics and Machine Learning categories is applied to a new category, Life Science; a new model must be learned]
65
Learn a new model:
1. Collect new labeled data
2. Build a new model
Or: reuse and adapt the already-learned model!
Examples: Image Classification 66
[Figure: features extracted for Task One feed Model One]
Examples: Image Classification 67
[Figure: the features learned for Task One (Cars) are reused for Task Two (Motorcycles) to train Model Two]
Traditional Machine Learning vs. Transfer 68
[Figure: in traditional machine learning, a separate learning system is trained for each task; in transfer learning, knowledge from a source-task learning system is passed to the target-task learning system]
Traditional ML vs. TL 69
Traditional ML: humans can learn in many domains. Transfer learning: humans can also transfer knowledge from one domain to other domains. (IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, October 2010)
Traditional ML vs. TL 70
[Figure: traditional ML learns each domain from its own training and test items; transfer of learning reuses training items across domains]
http://www.slideshare.net/butest/ppt-3860159
Notation 71
Domain: consists of two components: a feature space X and a marginal probability distribution P(X). In general, if two domains are different, they may have different feature spaces or different marginal distributions.
Task: given a specific domain and a label space Y, a task is to predict the corresponding label f(x) for each x in the domain. In general, if two tasks are different, they may have different label spaces or different conditional distributions P(Y|X).
http://www.slideshare.net/butest/ppt-3860159
Notation 72
For simplicity, we consider at most two domains and two tasks. Source domain: D_S, with task T_S. Target domain: D_T, with task T_T.
http://www.slideshare.net/butest/ppt-3860159
Why Transfer Learning? 73
- In some domains, labeled data are in short supply
- In some domains, the calibration effort is very expensive
- In some domains, the learning process is time-consuming
How can we extract knowledge learnt from related domains to help learning in a target domain with few labeled data? How can we extract knowledge learnt from related domains to speed up learning in a target domain? Transfer learning techniques may help!
http://www.slideshare.net/butest/ppt-3860159
Settings of Transfer Learning 74

Transfer learning settings | Labeled data in source domain | Labeled data in target domain | Tasks
Inductive Transfer Learning | × or √ | √ | Classification, Regression, …
Transductive Transfer Learning | √ | × | Classification, Regression, …
Unsupervised Transfer Learning | × | × | Clustering, …

http://www.slideshare.net/butest/ppt-3860159
An overview of various settings of transfer learning 75

Transfer Learning
- Inductive Transfer Learning (labeled data are available in the target domain)
  - Case 1: no labeled data in the source domain → Self-taught Learning
  - Case 2: labeled data are available in the source domain; source and target tasks are learnt simultaneously → Multi-task Learning
- Transductive Transfer Learning (labeled data are available only in the source domain)
  - Assumption: different domains but a single task → Domain Adaptation
  - Assumption: a single domain and a single task → Sample Selection Bias / Covariate Shift
- Unsupervised Transfer Learning (no labeled data in either the source or the target domain)

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 10, OCTOBER 2010
Conclusions 76
- Transfer learning re-uses source knowledge to help a target learner
- Self-taught learning transfers features learned from unlabeled source data
Challenges of deep learning 77
Deep learning score card 78
Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains in computer vision and speech recognition
- Potential for much more impact
Deep learning workflow 79
[Figure: lots of labeled data is split into an 80% training set, used to learn the deep neural net model, and a 20% validation set, used to validate it]
http://www.slideshare.net/AmazonWebServices/cmp305-deep-learning-on-aws-made-easycmp305
Deep learning score card 80
Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains in computer vision and speech recognition
- Potential for much more impact
Cons:
- Computationally really expensive
- Requires a lot of data for high accuracy
- Extremely hard to tune: choice of architecture, parameter types, hyperparameters
Deep features: Deep learning + Transfer learning 81
Transfer learning: idea 82
Instead of training a deep network from scratch for your task:
- Take a network trained on a different domain for a different source task
- Adapt it for your domain and your target task
Transfer learning: idea 83
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
Algorithms: Self-Taught Learning 84
http://ufldl.stanford.edu/eccv10-tutorial/
Algorithms: Self-Taught Learning 85
Framework:
- Source: unlabeled data set
- Target: labeled data set
Goal: build a classifier for cars and motorbikes.
http://ufldl.stanford.edu/eccv10-tutorial/
Algorithm: Self-Taught Learning 86
Unlabeled Data Set
Algorithms: Self-Taught Learning 87
Transfer learning: use data from one domain to help learn on another 88
- Lots of data → learn a neural net → great accuracy
- Some data → reuse the neural net as a feature extractor + a simple classifier → great accuracy on the new problem
What’s learned in a neural net 89
[Figure: in a neural net trained for Task 1, the lower layers are more generic and can be used as a feature extractor, vs. the upper layers, which are very specific to Task 1]
Transfer learning in more detail… 90
For Task 2, learn only the end part: use a simple classifier (e.g., logistic regression, SVMs) to predict the class, and keep the earlier weights fixed! The lower layers of the net trained for Task 1 are more generic and can be used as a feature extractor; the layers being replaced were very specific to Task 1. A sketch follows below.
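A hedged PyTorch sketch of this recipe (assuming torchvision is available; the backbone choice and class count are placeholders, not the workshop's setup):

```python
import torch
import torch.nn as nn
from torchvision import models

# A net trained for Task 1 (here: ImageNet classification)
net = models.resnet18(pretrained=True)

# Keep weights fixed: freeze every parameter
for param in net.parameters():
    param.requires_grad = False

# For Task 2, learn only the end part: swap in a fresh final layer
num_classes = 2  # hypothetical target task, e.g. cars vs. motorcycles
net.fc = nn.Linear(net.fc.in_features, num_classes)  # new layer, trainable

# Only the new classifier's parameters are optimized
optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
```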
Transfer learning: idea 91
Example: PASCAL VOC 2007 92
Standard classification benchmark: 20 classes, ~10K images, 50% train / 50% test. Deep networks can have many parameters (e.g., 60M in AlexNet), so direct training (from scratch) using only 5K training images can be problematic. How can we use deep networks in this setting?
Example: PASCAL VOC 2007 93
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
“Off-the-shelf” 94
“Off-the-shelf” idea: use the outputs of one or more layers of a network trained on a different task as generic feature detectors, then train a new shallow model on these features.
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
95
Works surprisingly well in practice! Surpassed or was on par with the state of the art in several tasks in 2014.
Image classification: PASCAL VOC 2007, Oxford Flowers, CUB Birds, MIT Indoor Scenes
Image retrieval: Paris 6k, Holidays, UKBench
Razavian et al., “CNN Features off-the-shelf: an Astounding Baseline for Recognition,” CVPRW 2014. http://arxiv.org/abs/1403.6382
96
Can we do better than off-the-shelf features? Domain adaptation.
Fine-tuning: supervised domain adaptation 97
- Train a deep net on a “nearby” task for which it is easy to get labels, using standard backprop (e.g., ImageNet classification, or pseudo-classes from augmented data)
- Cut off the top layer(s) of the network and replace them with a supervised objective for the target domain
- Fine-tune the network using backprop with labels for the target domain until the validation loss starts to increase
Fine-tuning: supervised domain adaptation 98
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
Freeze or fine-tune? 99
The bottom n layers can be frozen or fine-tuned:
- Frozen: not updated during backprop
- Fine-tuned: updated during backprop
Which to do depends on the target task:
- Freeze when target-task labels are scarce, to avoid overfitting
- Fine-tune when target-task labels are more plentiful
In general, we can set a different learning rate for each layer to find a tradeoff between freezing and fine-tuning; see the sketch below.
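A hedged PyTorch sketch of that per-layer tradeoff (layer names follow torchvision's ResNet; the learning rates are placeholders):

```python
import torch
from torchvision import models

net = models.resnet18(pretrained=True)

# Learning rate 0 freezes a block; larger rates fine-tune it more aggressively
optimizer = torch.optim.SGD([
    {"params": net.layer1.parameters(), "lr": 0.0},   # frozen: generic low-level features
    {"params": net.layer2.parameters(), "lr": 1e-4},  # lightly fine-tuned
    {"params": net.layer3.parameters(), "lr": 1e-3},
    {"params": net.layer4.parameters(), "lr": 1e-2},  # more task-specific
    {"params": net.fc.parameters(),     "lr": 1e-1},  # new head: learns fastest
], momentum=0.9)
```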
Freeze or fine-tune? 100
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
How transferable are features? 101
- Lower layers: more general features; transfer very well to other tasks
- Higher layers: more task-specific
- Fine-tuning improves generalization when sufficient examples are available
- Transfer learning and fine-tuning often lead to better performance than training from scratch on the target dataset
- Even features transferred from distant tasks are often better than random initial weights!
Summary 102
- It is possible to train very large models on small data by using transfer learning and domain adaptation
- Off-the-shelf features work very well in various domains and tasks
- Lower layers of a network contain very generic features; higher layers contain more task-specific features
- Supervised domain adaptation via fine-tuning almost always improves performance
Questions… Thank You