Implicit Regularization in Variational Bayesian Matrix Factorization
Shinichi Nakajima (Nikon Corporation, Core Technology Center)
Masashi Sugiyama (Tokyo Institute of Technology)
June 22, 2010
NIKON CORPORATION Core Technology Center June 22, 2010
Contents
- Introduction
  - Bayesian matrix factorization
  - Approximation methods
- Theoretical results
  - VBMF has implicit regularization
- Discussion
  - Mechanism of implicit regularization
- Conclusion
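Matrix factorization
[Figure: an L × M matrix with rows User 1, User 2, User 3, …, User L and columns Movie 1, Movie 2, Movie 3, …, Movie M; the axes are also labeled Output and Input.]
Applications:
- Multivariate analysis (RRR, CCA, oPLS)
- Missing entries prediction (collaborative filtering, CF)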
Bayesian matrix factorization model
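A common way to write this model is sketched below under assumed notation: observed matrix $V$, factors $A$ and $B$ of rank $H$, noise variance $\sigma^2$, and diagonal prior covariances $C_A$, $C_B$. Treat it as a reference formulation, not necessarily the slide's exact statement.

```latex
V = B A^\top + \mathcal{E}, \qquad
V \in \mathbb{R}^{L \times M},\quad A \in \mathbb{R}^{M \times H},\quad B \in \mathbb{R}^{L \times H}

p(V \mid A, B) \propto \exp\!\Big(-\tfrac{1}{2\sigma^2}\,\big\|V - B A^\top\big\|_{\mathrm{Fro}}^2\Big)

p(A) \propto \exp\!\Big(-\tfrac{1}{2}\operatorname{tr}\big(A C_A^{-1} A^\top\big)\Big), \qquad
p(B) \propto \exp\!\Big(-\tfrac{1}{2}\operatorname{tr}\big(B C_B^{-1} B^\top\big)\Big)

C_A = \operatorname{diag}(c_{a_1}^2, \dots, c_{a_H}^2), \qquad
C_B = \operatorname{diag}(c_{b_1}^2, \dots, c_{b_H}^2)
```

Here $U = B A^\top$ denotes the low-rank target matrix.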
Bayesian estimation
The Bayes posterior is not easy to calculate, so approximation methods are needed. [Salakhutdinov&Mnih2008]
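A sketch of what exact Bayesian estimation requires, assuming the notation $p(V \mid A, B)$, $p(A)$, $p(B)$ for likelihood and priors and $U = B A^\top$ for the target. The normalizer $Z(V)$ integrates over every entry of $A$ and $B$, and because the likelihood depends on the product $B A^\top$, the posterior is not jointly Gaussian in $(A, B)$; this is why it is hard to compute in closed form.

```latex
p(A, B \mid V) = \frac{p(V \mid A, B)\, p(A)\, p(B)}{Z(V)}, \qquad
Z(V) = \iint p(V \mid A, B)\, p(A)\, p(B)\, \mathrm{d}A\, \mathrm{d}B

\widehat{U}^{\mathrm{Bayes}} = \big\langle B A^\top \big\rangle_{p(A, B \mid V)}
```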
Free energy minimization
[Lim&Teh2007, Raiko et al.2007]
Constraining the trial distribution makes the optimization much easier.
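A sketch of the free energy, assuming a trial distribution $r(A, B)$ and the product (mean-field) constraint that is standard in VB; we assume this is the constraint meant here. Minimizing $F$ over unrestricted $r$ recovers the exact posterior, since the KL term then vanishes; under the product constraint, the problem becomes alternating updates of $r_A$ and $r_B$.

```latex
F(r) = \Big\langle \log \frac{r(A, B)}{p(V \mid A, B)\, p(A)\, p(B)} \Big\rangle_{r(A, B)}
     = \mathrm{KL}\big(r(A, B) \,\big\|\, p(A, B \mid V)\big) - \log p(V)

\text{VB constraint:} \quad r(A, B) = r_A(A)\, r_B(B)
```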
VB hardly overfits
[Figure: MAP/ML test error and VB test error, from [Raiko et al.2007] and [Lim&Teh2007].]
Why?
Because it's Bayesian!
Implicit regularization exists.
In this paper, we
- show the implicit regularization of VBMF, and
- explain its mechanism.
Note: we assume no missing entries. [Raiko et al.2007]
Maximum likelihood (ML) estimator
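With Gaussian noise, maximizing the likelihood over $U = B A^\top$ of rank at most $H$ is a least-squares low-rank approximation, so by the Eckart-Young theorem the ML estimator is the rank-$H$ truncated SVD. In assumed notation, with singular values $\gamma_h$ and left/right singular vectors $\omega_{b_h}$, $\omega_{a_h}$ of the observed $V$:

```latex
V = \sum_{h=1}^{\min(L, M)} \gamma_h\, \omega_{b_h} \omega_{a_h}^\top
\qquad (\gamma_1 \ge \gamma_2 \ge \dots \ge 0)

\widehat{U}^{\mathrm{ML}} = \sum_{h=1}^{H} \gamma_h\, \omega_{b_h} \omega_{a_h}^\top
```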
MAP estimator (Theorem 1)
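For intuition about the MAP estimator, here is a minimal sketch under stated assumptions: Gaussian noise with variance $\sigma^2$, Gaussian priors with per-component scales $c_{a_h}, c_{b_h}$, and a problem that decouples over the singular components $\gamma_h$ of $V$. For one component with $u = ba$, the negative log prior $a^2/(2c_a^2) + b^2/(2c_b^2)$ is at least $|u|/(c_a c_b)$ by the AM-GM inequality (equality when $|a|/c_a = |b|/c_b$). MAP therefore minimizes $(\gamma - u)^2/(2\sigma^2) + |u|/(c_a c_b)$, and setting the derivative to zero gives soft thresholding, $\max(0, \gamma - \sigma^2/(c_a c_b))$. `map_singular_values` is a hypothetical helper, not the authors' code.

```python
def map_singular_values(gammas, sigma2, c_a, c_b):
    """Soft-threshold each observed singular value (MAP sketch, assumed model).

    gammas: observed singular values of V (non-negative).
    sigma2: noise variance.
    c_a, c_b: per-component prior standard deviations (same length as gammas).
    """
    return [
        # shift down by sigma^2 / (c_a * c_b) and clip at zero
        max(0.0, g - sigma2 / (ca * cb))
        for g, ca, cb in zip(gammas, c_a, c_b)
    ]


print(map_singular_values([3.0, 1.0, 0.2], 1.0, [1.0] * 3, [1.0] * 3))  # [2.0, 0.0, 0.0]
# Almost flat prior: the threshold sigma^2 / (c_a * c_b) is ~1e-12,
# so the result is essentially the unshrunk ML singular values.
print(map_singular_values([3.0, 1.0, 0.2], 1.0, [1e6] * 3, [1e6] * 3))
```

As $c_a c_b$ grows (an almost flat prior), the threshold vanishes and MAP coincides with ML, so in that limit MAP provides no regularization.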
VB estimator (Theorem 2, 3)
Implicit regularization
(Singular-component-wise) positive-part James-Stein (PJS) estimator [James&Stein1961, Efron&Morris1973].
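A positive-part James-Stein (PJS) rule shrinks an observation $\gamma$ by the factor $\max(0, 1 - \kappa/\gamma^2)$. Values with $\gamma^2 \le \kappa$ are set to zero, and larger ones are pulled toward zero. The sketch below applies this rule separately to each singular value. The shrinkage constant $\kappa = \max(L, M)\,\sigma^2$ for an $L \times M$ observation with noise variance $\sigma^2$ is an assumption made for illustration. `pjs_singular_values` is a hypothetical helper.

```python
def pjs_singular_values(gammas, sigma2, L, M):
    """Positive-part James-Stein shrinkage applied to each singular value.

    ASSUMPTION (illustrative): shrinkage constant kappa = max(L, M) * sigma2.
    Any singular value with gamma**2 <= kappa is set to zero.
    """
    kappa = max(L, M) * sigma2
    out = []
    for g in gammas:
        if g <= 0.0:
            out.append(0.0)  # nothing to keep; also avoids division by zero
        else:
            out.append(max(0.0, 1.0 - kappa / g**2) * g)
    return out


# Top singular values of a hypothetical 4 x 4 observation with sigma2 = 1 (kappa = 4):
# shrinks 10 -> 9.6 and 4 -> 3.0, and zeroes 2.
print(pjs_singular_values([10.0, 4.0, 2.0], sigma2=1.0, L=4, M=4))
```

By contrast, the ML estimator (rank-$H$ truncated SVD) keeps these singular values unchanged.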
Simplest case
Equivalence classes
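A sketch of the simplest case, assuming $L = M = H = 1$ so that $V$, $A$, and $B$ are scalars $v$, $a$, $b$. The map $(a, b) \mapsto u = ba$ is not one-to-one. For $u \neq 0$, every pair on the hyperbola $ba = u$ gives the same $u$, so each $u$ corresponds to an equivalence class of factorizations. For $u = 0$, the class is the union of the two axes.

```latex
v = b\,a + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2), \qquad u = b\,a

\big\{(a, b) : b\,a = u\big\} = \big\{(s\,a_0,\; b_0 / s) : s \neq 0\big\}
\quad \text{for any fixed } (a_0, b_0) \text{ with } b_0 a_0 = u \neq 0
```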
Bayes posterior
[Figure: Bayes posterior in the simplest case, with markers for the ML estimators and the MAP estimators.]
The Bayes posterior has two modes!
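To see the two modes numerically, here is a minimal sketch in the scalar simplest case ($v = ba + \varepsilon$). It assumes Gaussian priors $a \sim \mathcal{N}(0, c_a^2)$ and $b \sim \mathcal{N}(0, c_b^2)$ and noise variance $\sigma^2$, and evaluates the unnormalized log posterior on a grid. The posterior depends on $(a, b)$ only through $ba$, $a^2$, and $b^2$, so it is symmetric under $(a, b) \mapsto (-a, -b)$, and every maximizer has a mirror-image twin. These two maximizers are the MAP estimators. With a flat prior, the ML estimators instead form the whole hyperbola $ba = v$. The function names and the values $v = 2$, $\sigma^2 = 1$, $c_a = c_b = 2$ are illustrative choices.

```python
import math


def log_posterior(a, b, v, sigma2, c_a, c_b):
    """Unnormalized log Bayes posterior for the scalar model v = b*a + noise."""
    return (-(v - b * a) * (v - b * a) / (2.0 * sigma2)
            - a * a / (2.0 * c_a * c_a)
            - b * b / (2.0 * c_b * c_b))


def grid_argmax(v, sigma2, c_a, c_b, lim=3.0, n=601):
    """Return the grid point (a, b) in [-lim, lim]^2 with the largest log posterior."""
    step = 2.0 * lim / (n - 1)
    best, best_ab = -math.inf, None
    for i in range(n):
        a = -lim + i * step
        for j in range(n):
            b = -lim + j * step
            lp = log_posterior(a, b, v, sigma2, c_a, c_b)
            if lp > best:
                best, best_ab = lp, (a, b)
    return best_ab


v, sigma2, c_a, c_b = 2.0, 1.0, 2.0, 2.0
a_hat, b_hat = grid_argmax(v, sigma2, c_a, c_b)
print(a_hat, b_hat)  # a point close to (1.32, 1.32) or (-1.32, -1.32)
# The mirrored point (-a_hat, -b_hat) has exactly the same posterior value.
print(log_posterior(a_hat, b_hat, v, sigma2, c_a, c_b)
      == log_posterior(-a_hat, -b_hat, v, sigma2, c_a, c_b))  # True
```

For these values, the maximizers satisfy $ba = v - \sigma^2/(c_a c_b) = 1.75$ with $|a| = |b| = \sqrt{1.75} \approx 1.32$.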
VB posterior: approximation of the Bayes posterior
[Figure panels: (1) both modes are approximated; (2) one of the modes is approximated.]
VB posterior
The ridge between the peaks attracts the VB posterior, which yields implicit regularization.
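Here is a minimal mean-field sketch of the VB posterior in the simplest scalar case ($v = ba + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, priors $a \sim \mathcal{N}(0, c_a^2)$ and $b \sim \mathcal{N}(0, c_b^2)$). It assumes Gaussian factors $r_A(a) = \mathcal{N}(\mu_a, s_a^2)$ and $r_B(b) = \mathcal{N}(\mu_b, s_b^2)$. Each update is the closed-form Gaussian that minimizes the free energy over one factor with the other held fixed. With a flat prior ($c_a = c_b = \infty$) and $v > 0$, a short fixed-point calculation shows that the iteration settles at $\mu_a \mu_b = \max(0, v - \sigma^2/v)$, whereas the ML estimate is $v$ itself. The shrinkage therefore comes from the factorized approximation rather than from the prior. `vb_scalar` and its starting values are illustrative choices.

```python
import math


def vb_scalar(v, sigma2, c_a=math.inf, c_b=math.inf, iters=200):
    """Mean-field VB for v = b*a + N(0, sigma2), a ~ N(0, c_a^2), b ~ N(0, c_b^2).

    Returns (mu_a, s2_a, mu_b, s2_b) of the Gaussian factors r_A(a), r_B(b).
    c_a = c_b = inf gives a flat prior (1.0 / inf**2 == 0.0 in Python).
    """
    mu_b, s2_b = 1.0, 1.0  # arbitrary start with mu_b != 0
    for _ in range(iters):
        # Update r_A(a): precision = E[b^2]/sigma2 + 1/c_a^2, mean from E[b]
        eb2 = mu_b ** 2 + s2_b
        s2_a = 1.0 / (eb2 / sigma2 + 1.0 / c_a ** 2)
        mu_a = s2_a * v * mu_b / sigma2
        # Update r_B(b) the same way, using the new r_A(a)
        ea2 = mu_a ** 2 + s2_a
        s2_b = 1.0 / (ea2 / sigma2 + 1.0 / c_b ** 2)
        mu_b = s2_b * v * mu_a / sigma2
    return mu_a, s2_a, mu_b, s2_b


for v in (3.0, 0.5):
    mu_a, _, mu_b, _ = vb_scalar(v, sigma2=1.0)
    # VB estimate of u = b*a is mu_a * mu_b; ML would return v unchanged.
    print(v, mu_a * mu_b, max(0.0, v - 1.0 / v))
```

When $v^2 > \sigma^2$, the fraction $\mu_b^2 / \mathbb{E}[b^2]$ converges geometrically (rate $\sigma^2/v^2$ per half-step), so 200 iterations are far more than enough for these inputs.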
Conclusion
- Derived bounds of the VBMF solution.
- Explained the mechanism of implicit regularization.
- Omitted here (please visit our poster!):
  - Explanation with the Jeffreys prior.
  - Bounds of the empirical VBMF (EVBMF) solution.
- Future work:
  - Tighter bounds.
  - Analysis of imputation cases.
Empirical VBMF (EVBMF)
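In empirical VB, the prior hyperparameters are not fixed in advance. They are estimated from data by minimizing the same free energy jointly with the trial distributions. This sketch assumes the prior-scale notation $C_A = \operatorname{diag}(c_{a_h}^2)$ and $C_B = \operatorname{diag}(c_{b_h}^2)$ and a fixed noise variance $\sigma^2$.

```latex
F(r_A, r_B; C_A, C_B) = \Big\langle \log \frac{r_A(A)\, r_B(B)}{p(V \mid A, B)\, p(A \mid C_A)\, p(B \mid C_B)} \Big\rangle_{r_A(A)\, r_B(B)}

\big(\widehat{r}_A, \widehat{r}_B, \widehat{C}_A, \widehat{C}_B\big)
= \operatorname*{arg\,min}_{r_A,\, r_B,\, C_A,\, C_B} F(r_A, r_B; C_A, C_B)
```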
EVB estimator (Theorem 4, 5)
EVB posterior
EVB gives stronger regularization than VB with a flat prior.
Jeffreys prior [Jeffreys1946]: parameterization invariant.
[Figure annotations: "Equivalent"; "No regularization".]
A flat prior in (A, B) distributes more mass around the origin than a flat prior in U.
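The last point can be checked numerically in the scalar case $u = ba$. This sketch uses uniform densities on bounded ranges as stand-ins for flat priors, which is an assumption because truly flat priors are improper. If $a$ and $b$ are uniform on $[-1, 1]$, then $P(|u| < t) = t(1 - \ln t)$, which is much larger than the value $t$ obtained when $u$ itself is uniform on $[-1, 1]$. The sketch integrates over $b$ exactly and uses the midpoint rule over $a$. `prob_small_product` is a hypothetical helper.

```python
import math


def prob_small_product(t, n=100_000):
    """P(|a*b| < t) for a, b independent and uniform on [-1, 1].

    By symmetry this equals P(x*y < t) for x, y uniform on [0, 1].
    For fixed x, P(y < t/x) = min(1, t/x); average over x with the midpoint rule.
    """
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        total += min(1.0, t / x)
    return total / n


t = 0.1
print(prob_small_product(t))    # flat prior in (a, b): about 0.33
print(t * (1.0 - math.log(t)))  # closed form t(1 - ln t)
print(t)                        # flat prior in u on [-1, 1]: exactly t
```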
Appendix: Negative log likelihood
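This assumes independent Gaussian noise with variance $\sigma^2$ on each entry of an $L \times M$ observation $V$, modeled as $B A^\top$ plus noise. Under that notation, the negative log likelihood is:

```latex
-\log p(V \mid A, B) = \frac{LM}{2}\log\!\big(2\pi\sigma^2\big) + \frac{1}{2\sigma^2}\,\big\|V - B A^\top\big\|_{\mathrm{Fro}}^2
```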
Appendix: VB Free energy