Deep Neural Network and Transfer Learning 1
Workshop on Intro to Deep Neural Networks 26th to 27th August 2016
Presented By: Aqsa Saeed Qureshi Supervised By: Dr. Asifullah Khan (DCIS PIEAS)
Outline 2
- Learning feature hierarchies (deep learning)
- Auto-encoder
- Deep Belief Net
- Transfer Learning
- Transfer Learning in Deep Neural Networks
Deep Learning Overview 3
Train networks with many layers (vs. shallow nets with just a couple of layers).
Multiple layers work to build an improved feature space.
Learning feature hierarchies/Deep learning 4
Learning feature hierarchies/Deep learning 5
“Confronted with an array of pixels, no computer inherently knows the difference between a house, a tree and a cat. Deep learning is the guy with the paint brush.”
Learning feature hierarchies/Deep learning 6
http://deeplearning4j.org/whydeeplearning.html
Learning feature hierarchies/Deep learning 7
Image features 8
Features = local detectors, combined to make a prediction (in reality, learned features are more low-level).
[Figure: nose, eye, eye, and mouth detectors on a face image combine to predict “Face!”]
Standard image classification approach 9
Input → Extract features → Use simple classifier (e.g., logistic regression, SVMs) → “Face”
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
(Slide credit: Honglak Lee)
Many hand-crafted features exist… 10
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH (Slide credit: Honglak Lee)
… but very painful to design.
Change image classification approach? 11
Input → Extract features (SIFT, Spin image, HoG, RIFT, Textons, GLOH) → Use simple classifier (e.g., logistic regression, SVMs) → “Face”
Can we learn features from data? (Slide credit: Honglak Lee)
Why feature hierarchies 12
pixels → edges → object parts (combinations of edges) → object models
http://ufldl.stanford.edu/eccv10-tutorial/
Deep learning algorithms 13
Deep Belief Network (DBN) (Hinton)
Deep sparse auto-encoders (Bengio)
[Other related work: LeCun, Lee, Yuille, Ng …]
Deep learning with autoencoders 14
Logistic regression → Neural network → Sparse autoencoder → Deep autoencoder
Logistic regression 15
Draw a logistic regression unit as: inputs x1, x2, x3 and a bias term (+1) feeding a single sigmoid output.
http://ufldl.stanford.edu/eccv10-tutorial/
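As a concrete illustration (a minimal NumPy sketch, not from the original slides; the weights and inputs are made up), the unit computes a sigmoid of a weighted sum plus bias:

```python
import numpy as np

def logistic_unit(x, theta, bias):
    """A single logistic regression unit: sigmoid of a weighted sum plus bias."""
    return 1.0 / (1.0 + np.exp(-(np.dot(theta, x) + bias)))

# Three inputs, as in the slide's diagram
x = np.array([0.5, -1.2, 3.0])
theta = np.array([0.1, 0.4, -0.2])
print(logistic_unit(x, theta, bias=0.3))  # a probability-like value in (0, 1)
```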
Neural Network 16
String a lot of logistic units together. Example 3-layer network:
[Figure: inputs x1, x2, x3 plus a bias unit (+1) in Layer 1 feed hidden units a1, a2, a3 plus a bias unit (+1) in Layer 2, which feed one output unit in Layer 3]
http://ufldl.stanford.edu/eccv10-tutorial/
Neural Network 17
Example 4-layer network with 2 output units:
[Figure: inputs x1, x2, x3 with bias units (+1) in Layers 1–3 and two output units in Layer 4]
http://ufldl.stanford.edu/eccv10-tutorial/
Training a neural network 18
Given a training set (x1, y1), (x2, y2), (x3, y3), …, adjust the parameters θ (for every node) to make the network’s outputs match the targets. (Use gradient descent: the “backpropagation” algorithm. Susceptible to local optima.)
http://ufldl.stanford.edu/eccv10-tutorial/
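The objective itself did not survive extraction; a standard squared-error form, consistent with the UFLDL tutorial the slide cites, is:

```latex
J(\theta) = \tfrac{1}{2} \sum_i \left\lVert h_\theta(x_i) - y_i \right\rVert^2,
\qquad
\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)
```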
Unsupervised feature learning with a neural network 19
Network is trained to output the input (learn the identity function).
[Figure: inputs x1…x6 plus a bias unit (+1) in Layer 1, hidden units a1, a2, a3 plus a bias unit (+1) in Layer 2, and reconstructed outputs x1…x6 in Layer 3]
This is an autoencoder. The identity is a trivial solution unless we:
- constrain the number of units in Layer 2 (learn a compressed representation), or
- constrain Layer 2 to be sparse.
http://ufldl.stanford.edu/eccv10-tutorial/
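A minimal sketch of such an autoencoder (assuming Keras is available; the 6-3-6 shape mirrors the slide's diagram, and the data are random placeholders):

```python
import numpy as np
import tensorflow as tf

# 6 inputs compressed into 3 hidden units (Layer 2), then reconstructed back to 6 (Layer 3)
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="sigmoid", input_shape=(6,)),  # bottleneck
    tf.keras.layers.Dense(6, activation="sigmoid"),                    # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(1000, 6)                   # unlabeled data
autoencoder.fit(x, x, epochs=10, verbose=0)   # target = input: learn the identity
```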
So: multiple layers make sense 20
Many-layer neural network architectures should be capable of learning the true underlying features and ‘feature logic’, and therefore generalise very well …
http://ufldl.stanford.edu/eccv10-tutorial/
But, until very recently, our weight-learning algorithms simply did not work on multi-layer architectures 21
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 22
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 23
Train this layer first
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 24
Train this layer first, then this layer.
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 25
Train this layer first, then this layer, then this layer.
http://ufldl.stanford.edu/eccv10-tutorial/
The new way to train multi-layer NNs… 26
Train this layer first, then this layer, then this layer, then this layer.
The new way to train multi-layer NNs… 27
Train this layer first, then this layer, then this layer, then this layer, finally this layer.
The new way to train multi-layer NNs… 28
EACH of the (non-output) layers is trained to be an auto-encoder. Basically, each is forced to learn good features that describe what comes from the previous layer.
http://ufldl.stanford.edu/eccv10-tutorial/
an auto-encoder is trained, with an absolutely standard weight-adjustment algorithm 29
http://ufldl.stanford.edu/eccv10-tutorial/
an auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input 30
By making this happen with (many) fewer units than the inputs, this forces the ‘hidden layer’ units to become good feature detectors.
http://ufldl.stanford.edu/eccv10-tutorial/
intermediate layers are each trained to be auto-encoders 31
http://ufldl.stanford.edu/eccv10-tutorial/
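A hedged sketch of this greedy layer-wise procedure (hypothetical layer sizes and data; each layer is pretrained as an autoencoder on the previous layer's outputs):

```python
import numpy as np
import tensorflow as tf

def pretrain_layer(inputs, n_hidden, epochs=10):
    """Train one layer as an autoencoder; return its encoder half and the encoded data."""
    n_in = inputs.shape[1]
    ae = tf.keras.Sequential([
        tf.keras.layers.Dense(n_hidden, activation="sigmoid", input_shape=(n_in,)),
        tf.keras.layers.Dense(n_in, activation="sigmoid"),
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(inputs, inputs, epochs=epochs, verbose=0)
    encoder = tf.keras.Sequential([ae.layers[0]])  # keep only the encoding half
    return encoder, encoder.predict(inputs, verbose=0)

x = np.random.rand(1000, 64)       # unlabeled training data (placeholder)
encoders, h = [], x
for n_hidden in (32, 16, 8):       # train this layer first, then the next, ...
    enc, h = pretrain_layer(h, n_hidden)
    encoders.append(enc)
# The encoders can now be stacked and topped with a supervised output layer.
```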
Auto-Encoders 32
http://ufldl.stanford.edu/eccv10-tutorial/
Stacked Auto-Encoders 33
http://ufldl.stanford.edu/eccv10-tutorial/
Sparse Encoders 34
- At any given time, many/most of the features will have a value of 0
- Thus there is an implicit compression each time, but with varying nodes
- This leads to more localist, variable-length encodings, where a particular node with value 1 signifies the presence of a feature (a small set of bases)
- A type of simplicity bottleneck (regularizer)
- This is easier for subsequent layers to use for learning
Sparse Encoders 35
Sparsity regularization: attempts to enforce a constraint on the sparsity of the output from the hidden layer.
L2 regularization: when training a sparse autoencoder, it is possible to make the sparsity regularizer small simply by increasing the weights w; adding a regularization term on the weights to the cost function prevents this (see the cost written out below).
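A common way to write the resulting cost (a standard formulation, reconstructed rather than copied from the slides; β and λ weight the two regularizers):

```latex
J = \frac{1}{N}\sum_{n=1}^{N}\lVert x_n - \hat{x}_n\rVert^2
  \;+\; \beta \sum_{j} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right)
  \;+\; \lambda \lVert W \rVert_2^2
```

where \hat{\rho}_j is the average activation of hidden unit j and ρ is the desired (small) sparsity level.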
Unsupervised feature learning with a neural network 36
Training a sparse autoencoder: given an unlabeled training set x1, x2, …, learn sparse hidden activations a1, a2, a3.
Final layer trained to predict class based on outputs from previous layers 37
http://ufldl.stanford.edu/eccv10-tutorial/
And that’s that 38
That’s the basic idea. There are many, many types of deep learning: different kinds of autoencoder, variations on architectures and training algorithms, etc. It is a very fast-growing area.
Auto-Encoders 39
- A type of unsupervised learning which tries to discover generic features of the data
- Learns the identity function by learning important sub-features (not by just passing the data through); useful for compression, etc.
- Can use just the new features in the new training set, or concatenate both
Deep Belief Net 40
Deep Belief Net (DBN) is another algorithm for learning a feature hierarchy. Building block: 2-layer graphical model (Restricted Boltzmann Machine).
Deep Belief Net 41
“Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. […] The lower layers receive top-down, directed connections from the layers above. The states of the units in the lowest layer represent a data vector.”
Deep Belief Net 42
Motivation: the robustness and efficiency with which humans recognize objects has long been an intriguing challenge for computational intelligence. Theoretical results suggest that deep architectures are fundamental for learning the complex functions that can represent high-level abstractions (e.g., vision, language) [Bengio, 2009].
Deep Belief Net 43
Deep Versus Shallow Architecture:
http://ufldl.stanford.edu/eccv10-tutorial/
Deep Belief Net 44
DBNs are composed of several Restricted Boltzmann Machines (RBMs) stacked on top of each other.
Deep Belief Net 45
An RBM is an energy-based generative model that consists of a layer of binary visible units, v, and a layer of binary hidden units, h.
Deep Belief Net 46
Given an observed state, the energy of the joint configuration of the visible and hidden units (v, h) is given by (1):
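The equation image was lost in extraction; the standard RBM energy it refers to is:

```latex
E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} v_i \, w_{ij} \, h_j
\qquad (1)
```

where a_i and b_j are the visible and hidden biases and w_ij are the connection weights.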
Deep Belief Net 47
The RBM defines a joint probability over (v, h):
where Z is the partition function, obtained by summing e^{−E(v, h)} over all possible (v, h) configurations:
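Reconstructing the lost equations in their standard form:

```latex
P(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z},
\qquad
Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}
```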
Deep Belief Net 48
Given a random input configuration v, the state of the hidden unit j is set to 1 with probability:
Similarly, given a random hidden vector, h, the state of the visible unit i can be set to 1 with probability:
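The standard conditionals (reconstructed, since the equation images were lost; σ is the logistic sigmoid):

```latex
P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i \, w_{ij}\Big),
\qquad
P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j w_{ij} \, h_j\Big)
```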
Deep Belief Net 49
Gibbs Sampling:
Deep Belief Net 50
Alternating Gibbs Sampling:
Deep Belief Net 51
Alternating Gibbs Sampling:
Deep Belief Net 52
CONTRASTIVE DIVERGENCE (CD-k):
  v(0) ← x
  Compute the binary states of the hidden units, h(0), using v(0)
  for n ← 1 to k:
    Compute the “reconstruction” states of the visible units, v(n), using h(n−1)
    Compute the “reconstruction” states of the hidden units, h(n), using v(n)
  end for
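A minimal NumPy sketch of one CD-k update (hypothetical shapes and learning rate; it uses the conditional probabilities given earlier):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(x, W, a, b, k=1, lr=0.1):
    """One CD-k step for an RBM: W is (visible, hidden); a, b are visible/hidden biases."""
    ph0 = sigmoid(b + x @ W)                          # P(h=1 | v(0)): positive phase
    h = (rng.random(ph0.shape) < ph0).astype(float)   # sample binary h(0)
    v = x
    for _ in range(k):
        pv = sigmoid(a + h @ W.T)                     # "reconstruction" of visible units
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(b + v @ W)                       # hidden probabilities for v(n)
        h = (rng.random(ph.shape) < ph).astype(float)
    W += lr * (np.outer(x, ph0) - np.outer(v, ph))    # <v h>^(0) - <v h>^(k)
    a += lr * (x - v)
    b += lr * (ph0 - ph)
```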
Deep Belief Net 53
Update the weights and biases, according to:
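The update rule the slide refers to, in its standard contrastive-divergence form (reconstructed, since the equation image was lost; ε is the learning rate):

```latex
\Delta w_{ij} = \epsilon \left( \langle v_i h_j \rangle^{(0)} - \langle v_i h_j \rangle^{(k)} \right),
\quad
\Delta a_i = \epsilon \left( v_i^{(0)} - v_i^{(k)} \right),
\quad
\Delta b_j = \epsilon \left( h_j^{(0)} - h_j^{(k)} \right)
```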
Deep Belief Net 54
Deep learning examples 55
Convolutional DBN on face images 56
pixels → edges → object parts (combinations of edges) → object models
Learning of object parts 57
Examples of learned object parts from object categories: Faces, Cars, Elephants, Chairs
Deep Net with Greedy Layer-Wise Training 58
[Figure: original inputs → unsupervised learning of a new feature space → supervised learning of the ML model]
http://axon.cs.byu.edu/~martinez/classes/678/Slides/Deep-Learning.pptx
TRANSFER OF LEARNING 59
http://www.slideshare.net/ocmonmoveonpeople/transfer-of-learning-by-lorraine-anoran?qid=2d5fdd3b-13e2-449b-9410dea9dcb2ed56&v=&b=&from_search=5
Transfer of Learning 60
The study of the dependency of human conduct, learning, or performance on prior experience. Thorndike and Woodworth [1901] explored how individuals transfer learning in one context to another context that shares similar characteristics. Examples: C++ → Java; Maths/Physics → Computer Science/Economics.
Transfer Learning 61
The ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks or new domains which share some commonality. Given a target task, how do we identify the commonality between the target task and previous (source) tasks, and transfer knowledge from the previous tasks to the target one?
Positive vs. Negative Transfer 62
Positive transfer: when learning in one context improves performance in some other context.
Negative transfer: when learning in one context has a negative impact on performance in another context.
http://www.slideshare.net/ocmonmoveonpeople/transfer-of-learning-by-lorraine-anoran?qid=2d5fdd3b-13e2-449b-9410dea9dcb2ed56&v=&b=&from_search=5
Motivation 63
[Figure: a trained model applied to new, unseen data]
Assumptions of traditional machine learning:
1. Training and test data are drawn from the same distribution
2. Training and test data are in the same feature space
Examples: Web-document Classification 64
[Figure: a classifier trained on web documents from the Physics and Machine Learning categories is applied to a new category, Life Science; a new model must be learned]
65
Learn a new model:
1. Collect new labeled data
2. Build a new model
Or: reuse and adapt the already-learned model!
Examples: Image Classification 66
[Figure: features extracted for Task One feed Model One]
Examples: Image Classification 67
[Figure: the features learned for Task One (Cars) are reused for Task Two (Motorcycles) to train Model Two]
Traditional Machine Learning vs. Transfer 68
[Figure: in traditional machine learning, a separate learning system is trained for each task; in transfer learning, knowledge from a source-task learning system is passed to the target-task learning system]
Traditional ML vs. TL 69
Traditional ML: humans can learn in many domains. Transfer learning: humans can also transfer knowledge from one domain to other domains. (IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, October 2010)
Traditional ML vs. TL 70
[Figure: traditional ML learns each domain from its own training and test items; transfer of learning reuses training items across domains]
http://www.slideshare.net/butest/ppt-3860159
Notation 71
Domain: consists of two components: a feature space X and a marginal probability distribution P(X). In general, if two domains are different, they may have different feature spaces or different marginal distributions.
Task: given a specific domain and a label space Y, a task is to predict the corresponding label f(x) for each x in the domain. In general, if two tasks are different, they may have different label spaces or different conditional distributions P(Y|X).
http://www.slideshare.net/butest/ppt-3860159
Notation 72
For simplicity, we consider at most two domains and two tasks. Source domain: D_S, with task T_S. Target domain: D_T, with task T_T.
http://www.slideshare.net/butest/ppt-3860159
Why Transfer Learning? 73
- In some domains, labeled data are in short supply
- In some domains, the calibration effort is very expensive
- In some domains, the learning process is time-consuming
How can we extract knowledge learnt from related domains to help learning in a target domain with few labeled data? How can we extract knowledge learnt from related domains to speed up learning in a target domain? Transfer learning techniques may help!
http://www.slideshare.net/butest/ppt-3860159
Settings of Transfer Learning 74

Transfer learning settings | Labeled data in source domain | Labeled data in target domain | Tasks
Inductive Transfer Learning | × or √ | √ | Classification, Regression, …
Transductive Transfer Learning | √ | × | Classification, Regression, …
Unsupervised Transfer Learning | × | × | Clustering, …

http://www.slideshare.net/butest/ppt-3860159
An overview of various settings of transfer learning 75

Transfer Learning
- Inductive Transfer Learning (labeled data are available in the target domain)
  - Case 1: no labeled data in the source domain → Self-taught Learning
  - Case 2: labeled data are available in the source domain; source and target tasks are learnt simultaneously → Multi-task Learning
- Transductive Transfer Learning (labeled data are available only in the source domain)
  - Assumption: different domains but a single task → Domain Adaptation
  - Assumption: a single domain and a single task → Sample Selection Bias / Covariate Shift
- Unsupervised Transfer Learning (no labeled data in either the source or the target domain)

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 10, OCTOBER 2010
Conclusions 76
- Transfer learning re-uses source knowledge to help a target learner
- Self-taught learning transfers features learned from unlabeled source data
Challenges of deep learning 77
Deep learning score card 78
Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains in computer vision and speech recognition
- Potential for much more impact
Deep learning workflow 79
[Figure: lots of labeled data is split into an 80% training set, used to learn the deep neural net model, and a 20% validation set, used to validate it]
http://www.slideshare.net/AmazonWebServices/cmp305-deep-learning-on-aws-made-easycmp305
Deep learning score card 80
Pros:
- Enables learning of features rather than hand tuning
- Impressive performance gains in computer vision and speech recognition
- Potential for much more impact
Cons:
- Computationally really expensive
- Requires a lot of data for high accuracy
- Extremely hard to tune: choice of architecture, parameter types, hyperparameters
Deep features: Deep learning + Transfer learning 81
Transfer learning: idea 82
Instead of training a deep network from scratch for your task:
- Take a network trained on a different domain for a different source task
- Adapt it for your domain and your target task
Transfer learning: idea 83
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
Algorithms: Self-Taught Learning 84
http://ufldl.stanford.edu/eccv10-tutorial/
Algorithms: Self-Taught Learning 85
Framework:
- Source: unlabeled data set
- Target: labeled data set
Goal: build a classifier for cars and motorbikes.
http://ufldl.stanford.edu/eccv10-tutorial/
Algorithm: Self-Taught Learning 86
Unlabeled Data Set
Algorithms: Self-Taught Learning 87
Transfer learning: use data from one domain to help learn on another 88
- Lots of data → learn a neural net → great accuracy
- Some data → reuse the neural net as a feature extractor + a simple classifier → great accuracy on the new problem
What’s learned in a neural net 89
[Figure: in a neural net trained for Task 1, the lower layers are more generic and can be used as a feature extractor, vs. the upper layers, which are very specific to Task 1]
Transfer learning in more detail… 90
For Task 2, learn only the end part: use a simple classifier (e.g., logistic regression, SVMs) to predict the class, and keep the earlier weights fixed! The lower layers of the net trained for Task 1 are more generic and can be used as a feature extractor; the layers being replaced were very specific to Task 1. A sketch follows below.
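A hedged PyTorch sketch of this recipe (assuming torchvision is available; the backbone choice and class count are placeholders, not the workshop's setup):

```python
import torch
import torch.nn as nn
from torchvision import models

# A net trained for Task 1 (here: ImageNet classification)
net = models.resnet18(pretrained=True)

# Keep weights fixed: freeze every parameter
for param in net.parameters():
    param.requires_grad = False

# For Task 2, learn only the end part: swap in a fresh final layer
num_classes = 2  # hypothetical target task, e.g. cars vs. motorcycles
net.fc = nn.Linear(net.fc.in_features, num_classes)  # new layer, trainable

# Only the new classifier's parameters are optimized
optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
```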
Transfer learning: idea 91
Example: PASCAL VOC 2007 92
Standard classification benchmark: 20 classes, ~10K images, 50% train / 50% test. Deep networks can have many parameters (e.g., 60M in AlexNet), so direct training (from scratch) using only 5K training images can be problematic. How can we use deep networks in this setting?
Example: PASCAL VOC 2007 93
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
“Off-the-shelf” 94
“Off-the-shelf” idea: use the outputs of one or more layers of a network trained on a different task as generic feature detectors, then train a new shallow model on these features.
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
95
Works surprisingly well in practice! Surpassed or was on par with the state of the art in several tasks in 2014.
Image classification: PASCAL VOC 2007, Oxford Flowers, CUB Birds, MIT Indoor Scenes
Image retrieval: Paris 6k, Holidays, UKBench
Razavian et al., “CNN Features off-the-shelf: an Astounding Baseline for Recognition,” CVPRW 2014. http://arxiv.org/abs/1403.6382
96
Can we do better than off-the-shelf features? Domain adaptation.
Fine-tuning: supervised domain adaptation 97
- Train a deep net on a “nearby” task for which it is easy to get labels, using standard backprop (e.g., ImageNet classification, or pseudo-classes from augmented data)
- Cut off the top layer(s) of the network and replace them with a supervised objective for the target domain
- Fine-tune the network using backprop with labels for the target domain until the validation loss starts to increase
Fine-tuning: supervised domain adaptation 98
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
Freeze or fine-tune? 99
The bottom n layers can be frozen or fine-tuned:
- Frozen: not updated during backprop
- Fine-tuned: updated during backprop
Which to do depends on the target task:
- Freeze when target-task labels are scarce, to avoid overfitting
- Fine-tune when target-task labels are more plentiful
In general, we can set a different learning rate for each layer to find a tradeoff between freezing and fine-tuning; see the sketch below.
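A hedged PyTorch sketch of that per-layer tradeoff (layer names follow torchvision's ResNet; the learning rates are placeholders):

```python
import torch
from torchvision import models

net = models.resnet18(pretrained=True)

# Learning rate 0 freezes a block; larger rates fine-tune it more aggressively
optimizer = torch.optim.SGD([
    {"params": net.layer1.parameters(), "lr": 0.0},   # frozen: generic low-level features
    {"params": net.layer2.parameters(), "lr": 1e-4},  # lightly fine-tuned
    {"params": net.layer3.parameters(), "lr": 1e-3},
    {"params": net.layer4.parameters(), "lr": 1e-2},  # more task-specific
    {"params": net.fc.parameters(),     "lr": 1e-1},  # new head: learns fastest
], momentum=0.9)
```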
Freeze or fine-tune? 100
http://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-transfer-learning-and-domain-adaptation-upc-2016
How transferable are features? 101
- Lower layers: more general features; transfer very well to other tasks
- Higher layers: more task-specific
- Fine-tuning improves generalization when sufficient examples are available
- Transfer learning and fine-tuning often lead to better performance than training from scratch on the target dataset
- Even features transferred from distant tasks are often better than random initial weights!
Summary 102
- It is possible to train very large models on small data by using transfer learning and domain adaptation
- Off-the-shelf features work very well in various domains and tasks
- Lower layers of a network contain very generic features; higher layers contain more task-specific features
- Supervised domain adaptation via fine-tuning almost always improves performance
Questions… Thank You