Higher Derivatives in Matlab Using MAD Shaun Forth Engineering Systems Department Cranfield University (DCMT Shrivenham) Shrivenham, Swindon SN6 8LA, U.K. email:
[email protected] www.amorg.co.uk/AD/staff_SAF.html
5th European Workshop on Automatic Differentiation University of Hertfordshire, UK, May 21-22 2007
1/ 29
Higher Derivatives in Matlab Using MAD
Plan 1
Introduction
2
MAD’s Forward Mode fmad class derivvec class
3
Higher Derivatives Second Derivatives How The Heck Does This Work? Required Changes to MAD
4
Reverse Mode AD in Matlab
5
Results Minpack Elastic-Plastic Torsion (EPT) Problem Object-Oriented Test Case
6
Conclusions & Future Work
2/ 29
Higher Derivatives in Matlab Using MAD
Introduction
MAD’s overloaded forward mode is efficient for first derivatives [For06]. Requests for higher derivatives of order 2 to 4 for uncertainty analysis and robust optimisation. Could implement a Taylor series class `a la ADOL-C [GJU96] manpower expensive. OR use MAD to recursively differentiate itself (c.f. EuroAD2 - Barak Perlmutter and Jeffrey Siskind, [Gri00, p109]).
3/ 29
Higher Derivatives in Matlab Using MAD
MAD’s Forward Mode
Based on two classes: fmad class - forward mode AD by operator overloading derivvec class - for storage and combination of multiple directional derivatives.
4/ 29
Higher Derivatives in Matlab Using MAD
fmad Class Constructor
e.g. x=fmad([1.1 2 3],[4 5 6]);
5/ 29
Higher Derivatives in Matlab Using MAD
fmad Class Constructor
e.g. x=fmad([1.1 2 3],[4 5 6]); Defines fmad object with I I
5/ 29
value component - row vector [1.1 2 3] deriv component - single directional derivative [4 5 6]
Higher Derivatives in Matlab Using MAD
fmad Class Constructor
e.g. x=fmad([1.1 2 3],[4 5 6]); Defines fmad object with I I
value component - row vector [1.1 2 3] deriv component - single directional derivative [4 5 6]
Perform overloaded operations, e.g., element-wise multiplication via times z=x.*x value = 1.2100 4.0000 9.0000 derivatives = 8.8000 20.0000 36.0000 AD enabled by the times.m function of the fmad class.
5/ 29
Higher Derivatives in Matlab Using MAD
fmad times function for z=x.*y function z=times(x,y) if isa(x,’fmad’) z=x; % deep copy to avoid constructor, no refs. to x if isa(y,’fmad’) z.deriv=y.value.*z.deriv+z.value.*y.deriv; z.value=z.value.*y.value; else z.value=z.value.*y; z.deriv=y.*z.deriv; end else z=y; % deep copy to avoid constructor, no refs. to y z.value=x.*z.value; z.deriv=x.*z.deriv; end
6/ 29
Higher Derivatives in Matlab Using MAD
Working with Multiple Directional Derivatives What if we want the Jacobian?
7/ 29
Higher Derivatives in Matlab Using MAD
Working with Multiple Directional Derivatives What if we want the Jacobian? Seed derivatives with identity I3 x=fmad([1.1 2 3],eye(3));
7/ 29
Higher Derivatives in Matlab Using MAD
Working with Multiple Directional Derivatives What if we want the Jacobian? Seed derivatives with identity I3 x=fmad([1.1 2 3],eye(3)); Overloaded operation with same times function gives value = 1.2100 4.0000 Derivatives Size = 1 3 No. of derivs = 3 derivs(:,:,1) = 2.2000 0 derivs(:,:,2) = 0 4 derivs(:,:,3) = 0 0
7/ 29
9.0000
0 0 6
Higher Derivatives in Matlab Using MAD
The fmad and derivvec classes The fmad constructor function function xad=fmad(x,dx) % FUNCTION: FMAD % SYNOPSIS: Class constructor for forward Matlab AD objects xad.value=x; sx=size(xad.value); sd=size(dx); if prod(sx)==prod(sd) % #values=#derivs => single directional derivative xad.deriv=reshape(dx,sx); else % #values~=#derivs => multiple directional derivatives % Pass derivatives to derivvec constructor xad.deriv=derivvec(dx,size(xad.value)); end 8/ 29
Higher Derivatives in Matlab Using MAD
The fmad and derivvec classes The fmad constructor function function xad=fmad(x,dx) % FUNCTION: FMAD % SYNOPSIS: Class constructor for forward Matlab AD objects xad.value=x; sx=size(xad.value); sd=size(dx); if prod(sx)==prod(sd) % #values=#derivs => single directional derivative xad.deriv=reshape(dx,sx); else % #values~=#derivs => multiple directional derivatives % Pass derivatives to derivvec constructor xad.deriv=derivvec(dx,size(xad.value)); end 8/ 29
Higher Derivatives in Matlab Using MAD
The derivvec class Store derivatives as a matrix with each directional derivative unrolled into a column.
9/ 29
Higher Derivatives in Matlab Using MAD
The derivvec class Store derivatives as a matrix with each directional derivative unrolled into a column. e.g. derivvec(eye(3),[1 3]) derivatives conceptualised as as, direc 1 direc 2 direc 3 [1, 0, 0] [0, 1, 0] [0, 0, 1]
9/ 29
Higher Derivatives in Matlab Using MAD
The derivvec class Store derivatives as a matrix with each directional derivative unrolled into a column. e.g. derivvec(eye(3),[1 3]) derivatives conceptualised as as, direc 1 direc 2 direc 3 [1, 0, 0] [0, 1, 0] [0, 0, 1] But stored as,
direc 1 direc 2 direc 3 1 0 0 1 0 0 0 1 0 = 0 1 0 0 0 1 0 0 1
9/ 29
Higher Derivatives in Matlab Using MAD
The times operation of the derivvec class e.g. Need to calculate, direc 1 direc 2 direc 3 1 0 0 1.1 2 3 . ∗ 0 1 0 0 0 1
with multiplication of each of the 3 directional derivatives. Convert value to column matrix and replicate 3 times 1.1 1.1 1.1 1 0 0 1.1 0 0 2 2 2 . ∗ 0 1 0 = 0 2 0 3 3 3 0 0 1 0 0 3 columns give required directional derivatives.
10/ 29
Higher Derivatives in Matlab Using MAD
Accessor Functions Getting the value getvalue(z) ans = 1.2100
4.0000
9.0000
Getting external representation of derivatives getderivs(z) ans(:,:,1) = 2.2000 ans(:,:,2) = 0 ans(:,:,3) = 0
0 4 0
0 0 6
Getting unrolled internal representation getinternalderivs(z) ans = 2.2000 0 0 4 0 0
11/ 29
0 0 6
Higher Derivatives in Matlab Using MAD
Higher Derivatives Basic idea is to be able to use MAD recursively. For first derivatives: x=fmad([1.1 2 3],eye(3)); z=x.*x zvalue=getvalue(z) zvalue = 1.2100 4.0000 9.0000 zderivs=getinternalderivs(z) zderivs = 2.2000 0 0 0 4.0000 0 0 0 6.0000 Can we just differentiate the above process a second time with MAD?
12/ 29
Higher Derivatives in Matlab Using MAD
Second Derivatives xx=fmad([1.1 2 3],eye(3)); % xx’s value and derivs. D1=I_3 x=fmad(xx,eye(3)); % x’s value =xx, x’s derivs. D2=I_3 z=x.*x; zvalue=getvalue(getvalue(z)) % z’s value’s value zvalue = 1.2100 4.0000 9.0000 zderivs=getinternalderivs(getvalue(z)) % z’s value’s derivs % in D1 direcs. zderivs = 2.2000 0 0 0 4.0000 0 0 0 6.0000 zderivs=getvalue(getinternalderivs(z)) % value of z’s derivs % in D2 direc. zderivs = 2.2000 0 0 0 4.0000 0 0 0 6.0000 13/ 29
Higher Derivatives in Matlab Using MAD
Second Derivatives (ctd)
z2ndderivs=reshape(... getinternalderivs(getinternalderivs(z)),[3 3 3]) % derivs. in D1 direc. of z’s derivs. in D2 direc. z2ndderivs(:,:,1) = 2 0 0 0 0 0 0 0 0 z2ndderivs(:,:,2) = 0 0 0 0 2 0 0 0 0 z2ndderivs(:,:,3) = 0 0 0 0 0 0 0 0 2
14/ 29
Higher Derivatives in Matlab Using MAD
Why The Heck Does This Work? xx xx=fmad([1.1 2 3],eye(3)); creates object, value
deriv
[1.1 2 3] eye(3) x value
deriv
xx
eye(3) deriv
x=fmad(xx,eye(3)); creates object, value
[1.1 2 3] eye(3)
15/ 29
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value
deriv
x.value.*x.value 2.*x.value.*x.deriv
16/ 29
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value
deriv 2.*x.value.*x.deriv
xx value
xx deriv
[1.1 2 3] eye(3)
16/ 29
.*
value
deriv
[1.1 2 3] eye(3)
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value deriv 2.2 0 0 [1.21, 4.0, 9.0] 0 4 0 0 0 6 value
16/ 29
deriv 2.*x.value.*x.deriv
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value deriv 2.2 0 0 [1.21, 4.0, 9.0] 0 4 0 0 0 6
deriv
value
16/ 29
xx value
deriv .* 2 eye(3)
[1.1 2 3] eye(3)
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value deriv 2.2 0 0 [1.21, 4.0, 9.0] 0 4 0 0 0 6
deriv
value
16/ 29
xx value deriv deriv value .* 2 eye(3) .* 2 I3 1.1 1.1 1.1 22 3]2 eye(3) 2 [I3 , I3 , I3 ] [1.1 3 3 3
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value deriv 2.2 0 0 [1.21, 4.0, 9.0] 0 4 0 0 0 6
deriv
value
16/ 29
xx deriv value value deriv .* 2 eye(3) 2.2 0 0 e1 02 3] [I3 , I3 , I3 ] .* 2 e2 4 0eye(3) [1.1 0 0 6 e3
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value deriv 2.2 0 0 [1.21, 4.0, 9.0] 0 4 0 0 0 6
deriv
value
16/ 29
value xx deriv value deriv .* 2 eye(3) 2.2 0 0 I3 e 1 e 1 e1 02 3] I3 .* 2e2 e2 e2 4 0eye(3) [1.1 0 0 6 I3 e 3 e 3 e3
Higher Derivatives in Matlab Using MAD
How The Heck (ctd.) z=x.*x; forms object, z value deriv deriv 2.2 0 0 [1.21, 4.0, 9.0] 0 4 0 0 0 6 value
16/ 29
value xx deriv value deriv .* 2 eye(3) 2.2 0 0 e1 0 0 02 3] 0 e2 0 4 0eye(3) [1.1 0 0 6 0 0 e3
Higher Derivatives in Matlab Using MAD
Required Changes to MAD
Few changes needed Made derivvec class superiorto(’fmad’). Subscript Referencing - make via direct calls to fmad’s subsref function. For efficiency - eliminate unnecessary temporary fmad objects created.
17/ 29
Higher Derivatives in Matlab Using MAD
Made derivvec class superiorto(fmad) derivvec Constructor Function function dv=derivvec(x,shape,nd) . . otherwise error(’derivvec must have 1 or 2 inputs’) end superiorto(’fmad’) Required for binary operations between fmad and derivvec objects. e.g. xfmad.*yderivvec Ensures object precedence is such that derivvec operation called and not the fmad operation. Enables differentiation through the derivvec class.
18/ 29
Higher Derivatives in Matlab Using MAD
Subscript Referencing Cannot make subscript references such as x(1:5,2) within fmad class if x is fmad class. Have to replace overloaded subscripting with direct calls to fmad’s subsref function.
In fmad’s prod.m Product Function localderivs=pval(ones(s(1),1),:)./Aval; replaced with S.type=’()’; S.subs=ones(s(1),1),’:’; localderivs=subsref(pval,S)./Aval;
19/ 29
Higher Derivatives in Matlab Using MAD
Eliminate Unnecessary Temporary fmad Objects A.value of fmad class:
Original Code in prod.m (fmad objects highlighted) Aval=A.value; s=size(Aval); S.subs={ones(s(1),1),’:’}; localderivs=subsref(pval,S)./Aval;
Modified Code in prod.m (fmad objects highlighted) Aval=A.value; s=size(deactivate(Aval)); S.subs={ones(s(1),1),’:’}; localderivs=subsref(pval,S)./Aval; Reduces number of fmad operations performed.
20/ 29
Higher Derivatives in Matlab Using MAD
Reverse Mode AD in Matlab
Uses a madtape class to build up 2 tapes: 1 2
operations tape of all arithmetic operations, including subscripting. value tape of sizes of values involved in calculation, values of those involved nonlinearly.
Functions to set adjoints and propagate back through the computation. Ensure madtape class superior to fmad to enable forward-over-reverse.
21/ 29
Higher Derivatives in Matlab Using MAD
Use of Reverse Mode - Minpack EPT Gradient x=madtape(x0); % initialise madtape object f=MinpackEPT_F(x,Prob); % Evaluate the function % and create tapes fmadtape=getvalue(f) % Extract the objective function value setadjoint(f,1); % initialise adjoint of dependent var. MADCalcAdjoints % propagate adjoints back % through the operations tape gmadtape=getadjoint(x); % extract adjoint of independent
22/ 29
Higher Derivatives in Matlab Using MAD
Results
Very preliminary Minpack EPT case. Objected oriented application
23/ 29
Higher Derivatives in Matlab Using MAD
Minpack EPT Problem Objective function with sparse Hessian
24/ 29
Higher Derivatives in Matlab Using MAD
Minpack EPT Problem Ratio of Hessian time to function time. n = nx ∗ nx 4 16 64 256 hand-coded 12. 13. 12. 14. fmad(sparse) on fmad(sparse) 662. 664. 1154. 18553. fmad(sparse) on reverse 1160. 1147. 1235. 2897. Modify Source Code hand-coded 13. 14. 14. 15. fmad(sparse) on fmad(sparse) 569. 559. 757. 4778. fmad(sparse) on reverse 1114. 1094. 1194. 2727. n2 16 256 4096 65536 All times averaged over a minimum of 5s.
25/ 29
Higher Derivatives in Matlab Using MAD
Source Code Modification [f,fgrad,fhess]=MinpackEPT_FGH(x,Prob) nx=Prob.user.nx; % nx is class double ny=length(x)./nx; % x active => ny active with zero derivs. if ny~=Prob.user.ny error(’Must have length(x)==nx*ny’) end hx = 1/(nx+1); % hx is inactive hy = 1/(ny+1); % hy is active with zero derivs. v = zeros(nx+2,ny+2); % ny active => v active v(2:nx+1,2:ny+1) = reshape(x,nx,ny); % v active allows subscripted adssignment % computer dvdx and dvdy on each edge of the grid dvdx = (v(2:nx+2,:)-v(1:nx+1,:))/hx; dvdy = (v(:,2:ny+2)-v(:,1:ny+1))/hy % unnecessary differentiated division op.
26/ 29
Higher Derivatives in Matlab Using MAD
Source Code Modification [f,fgrad,fhess]=MinpackEPT_FGH(x,Prob) nx=Prob.user.nx; % nx is class double ny=length(x)./nx; % x active => ny active with zero derivs. if ny~=Prob.user.ny error(’Must have length(x)==nx*ny’) end hx = 1/(nx+1); % hx is inactive hy = 1/(ny+1); % hy is active with zero derivs. v = zeros(nx+2,ny+2); % ny active => v active v(2:nx+1,2:ny+1) = reshape(x,nx,ny); % v active allows subscripted adssignment % computer dvdx and dvdy on each edge of the grid dvdx = (v(2:nx+2,:)-v(1:nx+1,:))/hx; dvdy = (v(:,2:ny+2)-v(:,1:ny+1))/hy % unnecessary differentiated division op.
26/ 29
Higher Derivatives in Matlab Using MAD
Source Code Modification [f,fgrad,fhess]=MinpackEPT_FGH(x,Prob) nx=Prob.user.nx; % nx is class double ny=length(x)./nx; % x active => ny active with zero derivs. if ny~=Prob.user.ny error(’Must have length(x)==nx*ny’) end hx = 1/(nx+1); % hx is inactive hy = 1/(ny+1); % hy is active with zero derivs. v = zeros(nx+2,ny+2); % ny active => v active v(2:nx+1,2:ny+1) = reshape(x,nx,ny); % v active allows subscripted adssignment % computer dvdx and dvdy on each edge of the grid dvdx = (v(2:nx+2,:)-v(1:nx+1,:))/hx; dvdy = (v(:,2:ny+2)-v(:,1:ny+1))/hy % unnecessary differentiated division op.
26/ 29
Higher Derivatives in Matlab Using MAD
Source Code Modification [f,fgrad,fhess]=MinpackEPT_FGH(x,Prob) nx=Prob.user.nx; % nx is class double ny=length(x)./nx; % x active => ny active with zero derivs. if ny~=Prob.user.ny error(’Must have length(x)==nx*ny’) end hx = 1/(nx+1); % hx is inactive hy = 1/(ny+1); % hy is active with zero derivs. v = zeros(nx+2,ny+2); % ny active => v active v(2:nx+1,2:ny+1) = reshape(x,nx,ny); % v active allows subscripted adssignment % computer dvdx and dvdy on each edge of the grid dvdx = (v(2:nx+2,:)-v(1:nx+1,:))/hx; dvdy = (v(:,2:ny+2)-v(:,1:ny+1))/deactivate(hy); % deactivate hy => cheaper division
26/ 29
Higher Derivatives in Matlab Using MAD
Object-Oriented Test Case Preliminary Results Test case with 4 classes from aerospace sector n = 21, m = 101. max. 11 nonzeros/row of J. Require derivatives for uncertainty propagation. Users require derivatives to order 3 - results here to order 2 - worked first time! technique FD fmad fmad on fmad
cpu(JF)/cpu(F) 22.0 1.09 -
cpu(∇2 F)/cpu(F) 486.7 2.41
Very little computation performed - nearly all time creating objects and some trivial computation - highlights benefits of OO!! OO destroys performance of FD.
27/ 29
Higher Derivatives in Matlab Using MAD
Jacobian Sparsity
28/ 29
Higher Derivatives in Matlab Using MAD
Conclusions & Future Work
MAD can differentiate object-oriented code - including itself! Further performance optimisation of fmad class needed: I I
Inactive fmad object is needed. Store size of fmad object directly - don’t take size of it’s value.
Sparsity detection needed for first and higher derivatives - some proof of concept done using sparse logical matrices - summer 2007. madtape class needs improvements: I
I
29/ 29
Generate tape once and do multiple adjoints for same inputs x - need to extend to different x with same control flow. Performance analysis and optimisation.
Higher Derivatives in Matlab Using MAD
References
Shaun A. Forth. An efficient overloaded implementation of forward mode automatic differentiation in MATLAB. ACM Trans. Math. Softw., 32(2):195–222, June 2006. Andreas Griewank, David Juedes, and Jean Utke. Algorithm 755: ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Transactions on Mathematical Software, 22(2):131–167, 1996. Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 19 in Frontiers in Applied Mathematics. SIAM, Philadelphia, PA, 2000.
30/ 29
Higher Derivatives in Matlab Using MAD