Matrix Differential Calculus Cheat Sheet

81 downloads 1034962 Views 183KB Size Report
Aug 30, 2013 ... Matrix Differential Calculus Cheat Sheet ... Rules for the differentials1 ... (1) d(αφ) = α dφ d(αu) = α du d(αU) = α dU. (2) d(φ + ψ) = dφ + dψ.
Matrix Differential Calculus Cheat Sheet Blue Note 142 (started on 27-08-2013)

Stefan Harmeling compiled on 30-8-2013 15:45

Rules for the differentials1 Let α, a, A be constants and φ, ψ, u, v, x, f , U , V , F be functions. dα = 0

da = 0n

d(αφ) = α dφ

d(αu) = α du

d(φ + ψ) = dφ + dψ

d(u + v) = du + dv T

d(φ ψ) = (dφ)ψ + φ dψ

dA = 0mn

T

d(φ/ψ) = ((dφ)ψ − φ dψ) / ψ

T

d(φ ) = αφ

(3)

d(u ) = (du)

d(U V ) = (dU )V + U dV

T

T

T

d(U ) = (dU ) d tr U = tr dU

d(U ⊗ V ) = (dU ) ⊗ V + U ⊗ dV α−1

d(U + V ) = dU + dV T

d vec U = vec dU α

(2)

d(u v) = (du) v + u dv 2

(1)

d(αU ) = α dU



d det U = det(U ) tr(U

d(U −1

dU )

) = −U

−1

d log(det U ) = tr(U

d exp φ = exp(φ) dφ

(5) (6)

d(U V ) = (dU ) V + U dV −1

(4)

(dU ) U

−1

−1

dU )

(7) (8) (9)

tr(d exp U ) = tr(exp(U ) dU )

(10)

Identification table Note that the differential has always the same shape as the function. φ(ξ) φ(x) φ(X)

function

differential

derivative

R→R

dφ = α(ξ)dξ

Dφ(ξ) = α(ξ)

n

T

R →R R

n×q

dφ = a(x) dx

→R m

f (ξ)

R→R

f (x)

Rn → Rm

f (X) F (ξ) F (x) F (X)

R

n×q

→R

R→R n

R

n×q

m

m×p

R →R

m×p

→R

Dφ(x) = a(x)

T

m×p

shape of derivative T T

1×1

(11)

1×n

(12)

1 × nq

(13)

dφ = tr A(X) dX

Dφ(X) = (vec A(X))

df = a(ξ) dξ

Df (ξ) = a(ξ)

m×1

(14)

df = A(x) dx

Df (x) = A(x)

m×n

(15)

df = A(X) dvec X

Df (X) = A(X)

m × nq

(16)

dF = A(ξ) dξ

DF (ξ) = vec A(ξ)

mp × 1

(17)

dvec F = A(x) dx

DF (x) = A(x)

mp × n

(18)

dvec F = A(X) dvec X

DF (X) = A(X)

mp × nq

(19)

Notation α, β, φ, ξ, . . .

lower case Greek letters are scalars

(20)

a, b, c, f, x, . . . A, B, C, F, X, . . .

lower case Latin letters are (column) vectors capital Latin letters are matrices

(21) (22)



differential of φ

(23)

Dφ(x)

derivative of φ at x

(24)

0n , 0mn

n vector of zeros, m × n matrix of zeros

(25)

1n , 1mn

n vector of ones, m × n matrix of ones

(26)

In , Imn

n × n identity matrix, m × n identity matrix

(27)

A , tr A, det A

transpose of A, trace of A, determinant of A

(28)

vec A

vector containing the stacked columns of A

(29)

diag A

vector containing the diagonal of A

(30)

Diag a

diagonal matrix with a along the diagonal

(31)

exp α

scalar exponential function

(32)

exp A

matrix exponential function, det(exp(A)) = exp(tr(A))

(33)

A B

Hadamard product (component-wise product)

(34)

A B

component-wise division

(35)

T

A⊗B

T

Kronecker product, vec ab = b ⊗ a

(36)

1 Most of the material of this document is from [2], available at at the author’s webpage http://www.janmagnus.nl/misc/ mdc2007-3rdedition.pdf. Another good resource for matrix wisdom is [1].

1

Matrix Differential Calculus Cheat Sheet Blue Note 142 (started on 27-08-2013)

Stefan Harmeling compiled on 30-8-2013 15:45

More rules T T tr A = 1T m (A Imn ) 1n = 1 diag(A) = tr A

tr(AB) = tr(BA) diag(U V T ) = (U V )1n

(37)

tr(U T (V C)) = tr((U T V T )C)

(38)

A ⊗ 1l = (Im ⊗ 1l )A

1l ⊗ A = (1l ⊗ Im )A

(39)

Diag a = a1Tn In

diag A = vec(A In ) = (A In )1n

(40)

kU k2Fro = tr(U T U ) = vec(U )T vec(U )

(41)

Diag(diag A) = In A T

T

vec(ABC) = (C ⊗ A) vec(B)

vec(a) = vec(a ) = a T

T

T

T

(42) T

T

T

T

T

ABc = (c ⊗ A) vec B = (A ⊗ c ) vec B = vec(c B A )

tr(u v) = tr(v u) = v u

(43)

Interlude: finite differencing You should always check your derivatives with finite differencing which is an alternative way to calculate a derivative! Here is some matlab code: function df = finitediff(fun, x, d, varargin) %FINITEDIFF estimates a gradient by finite-differencing method. % (c) Stefan Harmeling, 2012-07-10. sx = size(x); nx = numel(x); df = zeros(sx); dx = zeros(sx); for i = 1:nx dx(i) = d; df(i) = (fun(x+dx, varargin{:})-fun(x-dx, varargin{:}))/(2*d); dx(i) = 0; end Some pros and cons of matrix differential calculus + clean notation + vectorized function leads to vectorized derivative (good for coding) + powerful: [2] shows how to take the derivative of eigenvalues and eigenvectors − complicated formulas − requires tricks and practice General recipe and examples (i) (ii) (ii) (iv)

write the letter d in front of the expression identity the constants and variables transform the expression read off the derivative using the identification table

1. Find the derivative of φ(ξ) = ξ 2 . dφ = dξ 2 = 2ξdξ thus Dφ(ξ) = 2ξ

(44)

2. Find the derivative of φ(x) = xT Ax. dφ = (dx)T Ax + xT Adx = xT AT dx + xT A dx = xT (A + AT ) dx thus Dφ(x) = xT (A + AT )

(45)

3. Find the derivative of φ(X) = tr(X T X). dφ = d tr(X T X) = tr((dX)T X + X T dX) = tr(2X T dX) thus Dφ(X) = 2(vec X)T 2

(46)

Matrix Differential Calculus Cheat Sheet Blue Note 142 (started on 27-08-2013)

Stefan Harmeling compiled on 30-8-2013 15:45

4. Find the derivative of φ(x) = (y − Ax)2 . dφ = d((y − Ax)T (y − Ax)) = −2(y − Ax)T A dx thus Dφ(x) = −2AT (y − Ax)

(47)

5. Find the derivative of f (X) = (X T X)−1 X T y. We write A = (X T X)−1 . df = d(X T X)−1 X T y

(48)

T

T

T

= −A((dX) X + X (dX))AX y T

T

(49)

T

T

T

= −A(dX) XAX y − AX (dX)AX y − A(dX) y T

T

T

(50)

T

T

= −(A ⊗ (XAX y) ) dvec X − ((XAX y) ⊗ A) dvec X − (A ⊗ y ) dvec X

(51)

= −(A ⊗ (XAX T y)T + (XAX T y)T ⊗ A + A ⊗ y T ) dvec X | {z }

(52)

Df (X)

6. Often it is easier to find the differential of a scalar function φ(X) = cT (X T X)−1 X T y. dφ = −cT A(dX)T XAX T y − cT AX T (dX)AX T y − cT A(dX)T y

(53)

= − tr(cT A(dX)T XAX T y) − tr(cT AX T (dX)AX T y) − tr(cT A(dX)T y) T

T

T

T

T

T

T

T

= − tr(yXA X (dX)A c) − tr(c AX (dX)AX y) − tr(y (dX)A c) T

T

T

T

T

T

T

T

= − tr(A c yXA X dX) − tr(AX yc AX dX) − tr(A cy dX) T

T

T

T

T

T

T

T

= − tr((A c yXA X + AX yc AX + A cy )dX)

(54) (55) (56) (57)

7. Sometimes it is good to rewrite with indices: T T T T d tr(A Diag v) = d tr(A(In v1T n )) = d tr((A ⊗ I)v1n = d1n Diag(A)v = d diag(A) v = diag(A) dv (58) X d tr(A Diag v) = d Aii vi = d tr diag(A)T v = tr diag(A)T dv (59) i

8. Find the derivative of Rayleigh coefficient φ(x) = xT Ax/(xT x) for symmetric A. 2xT A(dx)(xT x) − 2xT AxxT dx (xT x)2 T T 2(x x)x A(dx) − 2xT AxxT dx = (xT x)2 2(xT x)xT A − 2xT AxxT dx = (xT x)2 2 = T 2 xT (xxT A − AxxT ) dx (x x)

dφ =

(60) (61) (62) (63) (64)

Thus the derivative is: 2 (AxxT − xxT A)x (xT x)2 Ax xT Ax = 2 T − 2 T 2x x x (x x)

Dφ(x) =

(65) (66)

More difficult examples 9. Consider a steepest descent algorithm for minimizing the previous function: x(k+1) = x(k) − ξAT (y − Ax(k) )

(67)

(a) Find the derivative of x(1) (x(0) ) and of x(k+1) (x(k) ). dx(1) = dx(0) + ξAT A dx(0) = (In + ξAT A) dx(0)

(68)

(k+1)

(69)

dx

T

= (In + ξA A) dx 3

(k)

Matrix Differential Calculus Cheat Sheet Blue Note 142 (started on 27-08-2013)

Stefan Harmeling compiled on 30-8-2013 15:45

(b) Find the derivative of x(2) (x(0) ). dx(2) = (In + ξAT A) dx(1) = (In + ξAT A) (In + ξAT A) dx(0)

(70)

(c) Find the derivative of x(k) (x(0) ). dx(k) = (In + ξAT A)k dx(0)

(71)

dx(1) = −AT (y − Ax(0) ) dξ

(72)

(d) Find the derivative of x(1) (ξ).

(e) Find the derivative of x(2) (ξ). dx(2) = d(−ξAT (y − Ax(1) ) + x(1) ) T

(1)

T

(1)

= −A (y − Ax

= −A (y − Ax T

= (−A (y − Ax

(73)

T

) dξ + ξA A dx

(1)

+ dx

T

) dξ + (ξA A + In ) dx

(1)

T

(1)

(74)

(1)

(75)

T

) − (ξA A + In )A (y − Ax

(0)

)) dξ

(76) (77)

(f) Find the derivative of x(3) (ξ). dx(3) = d(−ξAT (y − Ax(2) ) + x(2) )

(78)

= −AT (y − Ax(2) ) dξ + (ξAT A + In ) dx(2) T

= −A (y − Ax =−

2 X

(2)

T

(79)

T

) dξ − (ξA A + In )A (y − Ax !

(ξAT A + In )i AT (y − Ax(2−i) )

(1)

T

2

T

) − (ξA A + In ) A (y − Ax



(0)

) dξ

(80) (81)

i=0

(g) Find the derivative of x(k+1) (ξ). dx

(k+1)

k X =− (ξAT A + In )i AT (y − Ax(k−i) )

! dξ

(82)

i=0

(h) Find the derivative of x(1) (A). dx(1) = d(x(0) − ξAT (y − Ax(0) )) T

= −ξ(dA) (y − Ax

(0)

(83) T

) + ξA (dA)x

= −ξ(In ⊗ (y − Ax

(0) T

= −ξ(In ⊗ (y − Ax

(0) T

(0)

(84)

) ) dvec A + ξ((x ) + (x

(0) T

(0) T

T

) ⊗ A ) dvec A

T

) ⊗ A ) dvec A

(85) (86)

10. Consider the norm φ = r2 = rT r of the residual r = y − Ax. Plug into φ the minimizer x = (AT A)−1 AT y, we obtain φ = (y − A(AT A)−1 AT y)2 = ((In − A(AT A)−1 AT )y)2

(87)

Additionally we have an expression for A: A = C T Diag(Za)B

(88)

So we can view φ as a function of a. Find the derivative of φ(a). Left as an exercise ;)

References [1] H. L¨ utkepohl. Handbook of matrices. Wiley, 1996. [2] J.R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley, Chichester, 1999.

4