Chain Rules for Higher Derivatives
H.-N. HUANG, S. A. M. MARCANTOGNINI, AND N. J. YOUNG
We define a notion of higher-order directional derivative of a smooth function and use it to establish three simple formulae for the $n$th derivative of the composition of two functions. These three "higher-order chain rules" are alternatives to the classical Faà di Bruno formula. They are less explicit than Faà di Bruno's formula but are much simpler and avoid Diophantine or combinatorial complications.
In principle it is easy to calculate a higher derivative of the composition $f \circ g$ of two sufficiently differentiable functions $f$ and $g$: one can simply apply the "chain rule" $(f \circ g)' = (f' \circ g)g'$ as many times as needed. For example,
$$(f \circ g)'' = ((f' \circ g)g')' = (f' \circ g)'g' + (f' \circ g)g'' = (f'' \circ g)(g')^2 + (f' \circ g)g''.$$
By continuing in this vein we can readily obtain an expression for any particular higher-order derivative of $f \circ g$ in terms of derivatives of $f$ and $g$. Formulae for ( f o g) y
inductively for $x_0 \in \Omega$, $x_1, x_2, \dots, x_n \in X$ by $\Delta_0 f(x_0) = f(x_0)$,
$$\Delta_{j+1} f(x_0, x_1, \dots, x_{j+1}) = \frac{d}{d(x_1, x_2, \dots, x_{j+1})}\, \Delta_j f(x_0, \dots, x_j). \tag{2}$$
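Thus
$$\Delta_{j+1} f(x_0, \dots, x_{j+1}) = \lim_{z \to 0} \frac{1}{z}\Bigl[\Delta_j f(x_0 + zx_1, x_1 + zx_2, \dots, x_j + zx_{j+1}) - \Delta_j f(x_0, x_1, \dots, x_j)\Bigr] = \frac{\partial}{\partial z}\, \Delta_j f(x_0 + zx_1, \dots, x_j + zx_{j+1})\Big|_{z=0},$$
and we may write
$$\Delta_n f(x_0, x_1, \dots, x_n) = \frac{d}{d(x_1, \dots, x_n)}\, \frac{d}{d(x_1, \dots, x_{n-1})} \cdots \frac{d}{d(x_1, x_2)}\, \frac{d}{dx_1}\, f(x_0).$$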
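EXAMPLE 1: The exponential function. Directly from Definition 1 we find, for $x_j \in \mathbb{C}$,
$$\Delta_1 \exp(x_0, x_1) = \frac{\partial}{\partial z}\, e^{x_0 + zx_1}\Big|_{z=0} = e^{x_0} x_1,$$
$$\Delta_2 \exp(x_0, x_1, x_2) = \frac{\partial}{\partial z}\, e^{x_0 + zx_1}(x_1 + zx_2)\Big|_{z=0} = e^{x_0}(x_1^2 + x_2),$$
$$\Delta_3 \exp(x_0, x_1, x_2, x_3) = \frac{\partial}{\partial z}\, e^{x_0 + zx_1}\bigl[(x_1 + zx_2)^2 + x_2 + zx_3\bigr]\Big|_{z=0} = e^{x_0}(x_1^3 + 3x_1x_2 + x_3).$$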
We regard $\Delta_n f(x_0, \dots, x_n)$ as the directional derivative of $f$ of order $n$ at $x_0$ in the direction $(x_1, \dots, x_n)$. There is another natural interpretation of the appellation "$n$th-order directional derivative of $f$", to wit the function $\Delta^n f : \Omega \times X^{2^n - 1} \to Y$ defined inductively by
$$\Delta^1 f = \Delta_1 f, \qquad \Delta^{n+1} f = \Delta_1(\Delta^n f).$$
There is a close connection between $\Delta_n$ and $\Delta^n$.
NICHOLAS YOUNG works on operator theory and complex analysis, with a leaning toward questions arising from $H^\infty$ control theory. He took his first and doctoral degrees in mathematics at Oxford University and has held posts at Glasgow, Lancaster, and Newcastle upon Tyne.
School of Mathematics, University of Leeds, Leeds, LS2 9JT, England
e-mail: [email protected]
© 2006 Springer Science+Business Media, Inc., Volume 28, Number 2, 2006
THEOREM 1 Let $X$, $Y$ be Banach spaces, let $\Omega$ be an open subset of $X$, and let $f : \Omega \to Y$ be $n$ times continuously Fréchet differentiable. For $m \in \mathbb{N}$, let
$$L_m : \Omega \times X^m \to \Omega \times X^{2^m - 1}$$
be defined inductively by $L_1(x_0, x_1) = (x_0, x_1)$,
$$L_{m+1}(x_0, x_1, \dots, x_{m+1}) = (L_m(x_0, \dots, x_m);\, L_m(x_1, \dots, x_{m+1})),$$
for $x_0 \in \Omega$ and $x_j \in X$, $j \ge 1$. Then
$$\Delta_n f = (\Delta^n f) \circ L_n : \Omega \times X^n \to Y.$$
Thus
$$\Delta_2 f(x_0, x_1, x_2) = \Delta^2 f(x_0, x_1, x_1, x_2),$$
$$\Delta_3 f(x_0, x_1, x_2, x_3) = \Delta^3 f(x_0, x_1, x_1, x_2, x_1, x_2, x_2, x_3).$$
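The inductive definition of $L_m$ in Theorem 1 can be traced mechanically. The following is a minimal sketch, not part of the original article: the function name `L` and the use of strings as stand-ins for the points $x_j$ are illustrative assumptions. It only builds the argument list; it does not evaluate any derivative.

```python
def L(m, xs):
    """Return L_m(x_0, ..., x_m) as a tuple with 2**m entries.

    xs holds the m + 1 arguments x_0, ..., x_m (any hashable stand-ins).
    Hypothetical helper written for illustration only.
    """
    xs = tuple(xs)
    if m < 1 or len(xs) != m + 1:
        raise ValueError("L_m needs m >= 1 and exactly m + 1 arguments")
    if m == 1:
        # Base case: L_1(x_0, x_1) = (x_0, x_1)
        return xs
    # Inductive step: L_m(x_0..x_m) = (L_{m-1}(x_0..x_{m-1}); L_{m-1}(x_1..x_m))
    return L(m - 1, xs[:-1]) + L(m - 1, xs[1:])


# Reproduces the two argument lists displayed after Theorem 1.
print(L(2, ["x0", "x1", "x2"]))
print(L(3, ["x0", "x1", "x2", "x3"]))
```

The output tuple has $2^m$ entries (one point of $\Omega$ followed by $2^m - 1$ directions), which makes concrete the remark that $\Delta_n$ is far more economical in its arguments than $\Delta^n$.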
$\Delta_n f$ has the merit of a relative economy in the number of its arguments, and earns its keep by virtue of the following three higher-order chain rules. The first two concern functions $f \circ g$ where $g$ is a function of a single real or complex variable. Let $\mathbb{K}$ denote either $\mathbb{R}$ or $\mathbb{C}$.
THEOREM 2 Let $X$, $Y$ be Banach spaces over $\mathbb{K}$ and let $\Psi$, $\Omega$ be open subsets of $\mathbb{K}$, $X$ respectively. Let $g : \Psi \to \Omega$, $f : \Omega \to Y$ have $n$ continuous Fréchet derivatives, for some $n \in \mathbb{N}$. For any $t \in \Psi$,
$$(f \circ g)^{(n)}(t) = \Delta_n f(g(t), g'(t), \dots, g^{(n)}(t)). \tag{3}$$
PROOF.
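In the case $n = 1$ we have
$$\Delta_1 f(g(t), g'(t)) = \frac{\partial}{\partial z} f(g(t) + zg'(t))\Big|_{z=0} = Df(g(t))(g'(t)) = (f \circ g)'(t)$$
by the classical chain rule. Suppose the assertion holds for some $j < n$, that is, for any $t \in \Psi$,
$$\frac{d^j}{dt^j}\, f \circ g(t) = \Delta_j f(g(t), g'(t), \dots, g^{(j)}(t)).$$
Then, for $t \in \Psi$,
$$\frac{d^{j+1}}{dt^{j+1}}\, f \circ g(t) = \frac{d^j}{dt^j}(f \circ g)'(t) = \frac{d^j}{dt^j}\, \Delta_1 f(g(t), g'(t)) = \frac{d^j}{dt^j}\Bigl(\frac{\partial}{\partial z} f(g(t) + zg'(t))\Big|_{z=0}\Bigr) = \frac{\partial^j}{\partial t^j} \frac{\partial}{\partial z} f(g(t) + zg'(t))\Big|_{z=0}.$$
The function $(t, z) \mapsto f(g(t) + zg'(t))$ has $j + 1$ continuous derivatives on an open set in $\mathbb{K}^2$ and so the partial derivatives can be interchanged to give
$$\frac{d^{j+1}}{dt^{j+1}}\, f \circ g(t) = \frac{\partial}{\partial z} \frac{\partial^j}{\partial t^j} f(g(t) + zg'(t))\Big|_{z=0}.$$
Hence, by the inductive hypothesis applied for each fixed $z$ to the function $t \mapsto g(t) + zg'(t)$, whose $k$th derivative is $g^{(k)}(t) + zg^{(k+1)}(t)$,
$$\frac{d^{j+1}}{dt^{j+1}}\, f \circ g(t) = \frac{\partial}{\partial z}\, \Delta_j f(g(t) + zg'(t), g'(t) + zg''(t), \dots, g^{(j)}(t) + zg^{(j+1)}(t))\Big|_{z=0} = \Delta_{j+1} f(g(t), g'(t), \dots, g^{(j+1)}(t)).$$
The theorem follows by induction. □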
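As a sanity check on formula (3), the case $n = 2$ can be written out by hand. The sketch below is an added illustration, not part of the original article; it uses only the definition of $\Delta_2 f$ as the $z$-derivative of $\Delta_1 f(x_0 + zx_1, x_1 + zx_2) = Df(x_0 + zx_1)(x_1 + zx_2)$ at $z = 0$.

```latex
\begin{align*}
\Delta_2 f(x_0, x_1, x_2)
  &= \frac{\partial}{\partial z}\, Df(x_0 + z x_1)(x_1 + z x_2)\Big|_{z=0}
   = D^2 f(x_0)(x_1, x_1) + Df(x_0)(x_2), \\
(f \circ g)''(t)
  &= \Delta_2 f\bigl(g(t), g'(t), g''(t)\bigr)
   = D^2 f(g(t))\bigl(g'(t), g'(t)\bigr) + Df(g(t))\bigl(g''(t)\bigr).
\end{align*}
```

For scalar-valued $f$ and $g$ the right-hand side reads $(f'' \circ g)(g')^2 + (f' \circ g)g''$, which agrees with the result of applying the classical chain rule twice.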
We can give an alternative expression for $\Delta_n f$ and hence for the chain rule (3) in terms of the elementary symmetric functions $\sigma_k$. If $z = (z_1, \dots, z_n)$ then $\sigma_k(z)$ will denote the elementary symmetric function of degree $k$ in $z_1, \dots, z_n$, so that, for all $\lambda \in \mathbb{C}$,
$$(\lambda + z_1) \cdots (\lambda + z_n) = \lambda^n + \sigma_1(z)\lambda^{n-1} + \cdots + \sigma_n(z).$$
LEMMA 1 Let $f : \Omega \subset X \to Y$ have $n$ continuous Fréchet derivatives, where $X$, $Y$ are Banach spaces and $\Omega$ is an open set. For any $x_0 \in \Omega$ and $x_1, \dots, x_n \in X$,
$$\Delta_n f(x_0, x_1, \dots, x_n) = \frac{\partial^n}{\partial z_n\, \partial z_{n-1} \cdots \partial z_1}\, f(x_0 + \sigma_1(z)x_1 + \cdots + \sigma_n(z)x_n)\Big|_{z=(0,\dots,0)}.$$
PROOF. The case $n = 1$ is simple. Suppose that $n > 1$ and that the equality holds for $n - 1$. If $z' = (z_1, \dots, z_{n-1})$ we have
$$\Delta_n f(x_0, x_1, \dots, x_n) = \frac{\partial}{\partial z_n}\, \Delta_{n-1} f(x_0 + z_n x_1, \dots, x_{n-1} + z_n x_n)\Big|_{z_n=0}$$
$$= \frac{\partial}{\partial z_n}\, \frac{\partial^{n-1}}{\partial z_{n-1} \cdots \partial z_1}\, f\bigl(x_0 + z_n x_1 + \sigma_1(z')(x_1 + z_n x_2) + \cdots + \sigma_{n-1}(z')(x_{n-1} + z_n x_n)\bigr)\Big|_{z'=(0,\dots,0),\, z_n=0}.$$
Since, for $0 \le j < n - 1$,
$$\sigma_j(z_1, \dots, z_{n-1})\, z_n + \sigma_{j+1}(z_1, \dots, z_{n-1}) = \sigma_{j+1}(z_1, \dots, z_n),$$
we have
$$\Delta_n f(x_0, x_1, \dots, x_n) = \frac{\partial^n}{\partial z_n\, \partial z_{n-1} \cdots \partial z_1}\, f(x_0 + \sigma_1(z)x_1 + \cdots + \sigma_n(z)x_n)\Big|_{z=(0,\dots,0)}$$
as required. □
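The identity for elementary symmetric functions used in the proof of Lemma 1 is easy to test on integers. The sketch below is an added illustration under stated assumptions: the names `sigma`, `product_expansion_holds` and `recurrence_holds` are hypothetical, and the conventions $\sigma_0 = 1$ and $\sigma_k = 0$ when $k$ exceeds the number of variables are assumed so that the endpoint cases $j = 0$ and $j = n - 1$ can be included in the check.

```python
from itertools import combinations
from math import prod


def sigma(k, z):
    """Elementary symmetric function of degree k in the entries of z.

    Assumed conventions: sigma(0, z) = 1, and sigma(k, z) = 0 when k > len(z)
    (the sum over an empty set of k-subsets is zero).
    """
    if k == 0:
        return 1
    return sum(prod(c) for c in combinations(z, k))


def product_expansion_holds(lam, z):
    """Check (lam + z_1)...(lam + z_n) = lam^n + sigma_1 lam^(n-1) + ... + sigma_n."""
    n = len(z)
    lhs = prod(lam + zi for zi in z)
    rhs = sum(sigma(k, z) * lam ** (n - k) for k in range(n + 1))
    return lhs == rhs


def recurrence_holds(z):
    """Check sigma_j(z') z_n + sigma_{j+1}(z') = sigma_{j+1}(z) for j = 0..n-1."""
    *z_prime, z_n = z
    return all(
        sigma(j, z_prime) * z_n + sigma(j + 1, z_prime) == sigma(j + 1, z)
        for j in range(len(z))
    )


print(recurrence_holds((2, -3, 5, 7)), product_expansion_holds(4, (2, -3, 5, 7)))
```

Exact integer arithmetic is used so that equality tests are meaningful; the check is a spot test at sample points, not a proof.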
COROLLARY 1 If $f$, $g$ are as in Theorem 2, then for any $t \in \Psi$,
$$(f \circ g)^{(n)}(t) = \frac{\partial^n}{\partial z_n\, \partial z_{n-1} \cdots \partial z_1}\, f\bigl(g(t) + \sigma_1(z)g'(t) + \cdots + \sigma_n(z)g^{(n)}(t)\bigr)\Big|_{z=(0,\dots,0)}.$$
The third chain rule applies to more general composite functions on Banach spaces.
THEOREM 3 Let $W$, $X$, $Y$ be Banach spaces over $\mathbb{K}$ and let $\Phi$, $\Omega$ be open subsets of $W$, $X$ respectively. Let $g : \Phi \to \Omega$, $f : \Omega \to Y$ have $n$ continuous Fréchet derivatives, for some $n \in \mathbb{N}$. For any $w_0 \in \Phi$ and $w_1, w_2, \dots, w_n \in W$,
$$\Delta_n (f \circ g)(w_0, w_1, \dots, w_n) = \Delta_n f\bigl(g(w_0), \Delta_1 g(w_0, w_1), \dots, \Delta_n g(w_0, w_1, \dots, w_n)\bigr).$$
PROOF. Given any $w_0 \in \Phi$ and $w_1 \in W$, set $G(z) := g(w_0 + zw_1)$ for $z$ in a neighbourhood $\Psi$ of 0, so that $G : \Psi \subseteq \mathbb{K} \to \Omega$ is continuously differentiable at 0 with $G'(0) = \Delta_1 g(w_0, w_1)$. Then
$$\Delta_1 (f \circ g)(w_0, w_1) = \frac{\partial}{\partial z} f(g(w_0 + zw_1))\Big|_{z=0} = (f \circ G)'(0).$$
By Theorem 2, $(f \circ G)'(0) = \Delta_1 f(G(0), G'(0)) = \Delta_1 f(g(w_0), \Delta_1 g(w_0, w_1))$. Hence
$$\Delta_1 (f \circ g)(w_0, w_1) = \Delta_1 f(g(w_0), \Delta_1 g(w_0, w_1)),$$
which proves the result in the case $n = 1$. Suppose the assertion holds for some $j < n$, that is, for any $w_0 \in \Phi$ and $w_1, \dots, w_j \in W$,
$$\Delta_j (f \circ g)(w_0, \dots, w_j) = \Delta_j f\bigl(g(w_0), \dots, \Delta_j g(w_0, \dots, w_j)\bigr).$$
Let $w_0 \in \Phi$, $w_1, \dots, w_{j+1} \in W$ be given and, for $z$ in a neighbourhood $\Psi$ of 0, set
$$Z_0(z) := g(w_0 + zw_1), \quad \dots, \quad Z_j(z) := \Delta_j g(w_0 + zw_1, \dots, w_j + zw_{j+1}).$$
By the inductive hypothesis,
$$\Delta_{j+1}(f \circ g)(w_0, \dots, w_{j+1}) = \frac{\partial}{\partial z}\, \Delta_j (f \circ g)(w_0 + zw_1, \dots, w_j + zw_{j+1})\Big|_{z=0} = \frac{\partial}{\partial z}\, \Delta_j f(Z_0(z), \dots, Z_j(z))\Big|_{z=0}.$$
Define $G : \Psi \subseteq \mathbb{K} \to \Omega \times X^j$ by $G(z) := (Z_0(z), \dots, Z_j(z))$, so that
$$\Delta_{j+1}(f \circ g)(w_0, \dots, w_{j+1}) = (\Delta_j f \circ G)'(0). \tag{4}$$
Write $\zeta_0 = Z_0(0), \dots, \zeta_j = Z_j(0)$ and $\zeta_{j+1} = Z_j'(0)$. Note that, for $1 \le k \le j + 1$,
$$\zeta_k = Z_{k-1}'(0) = \Delta_k g(w_0, \dots, w_k),$$
whence $G$ is differentiable at 0 with $G'(0) = (\zeta_1, \dots, \zeta_{j+1})$. By equation (4) and Theorem 2,
$$\Delta_{j+1}(f \circ g)(w_0, \dots, w_{j+1}) = (\Delta_j f \circ G)'(0) = \Delta_1 \Delta_j f(G(0), G'(0)) = \Delta_1 \Delta_j f\bigl((\zeta_0, \dots, \zeta_j); (\zeta_1, \dots, \zeta_{j+1})\bigr)$$
$$= \frac{\partial}{\partial t}\, \Delta_j f(\zeta_0 + t\zeta_1, \dots, \zeta_j + t\zeta_{j+1})\Big|_{t=0} = \Delta_{j+1} f(\zeta_0, \dots, \zeta_{j+1}) = \Delta_{j+1} f\bigl(g(w_0), \dots, \Delta_{j+1} g(w_0, \dots, w_{j+1})\bigr).$$
Therefore, by induction, the formula holds for $n \in \mathbb{N}$. □
Let us note some simple properties of higher directional derivatives.
THEOREM 4
Let $X$, $Y$ be Banach spaces over $\mathbb{K}$, let $\Omega$ be an open set in $X$, and let $f : \Omega \to Y$ have $n$ continuous Fréchet derivatives. For any $x_0 \in \Omega$, $x_1, \dots, x_n, \xi_1, \xi_2 \in X$ and $\alpha \in \mathbb{K}$,
$$\Delta_n f(x_0, \alpha x_1, \dots, \alpha^n x_n) = \alpha^n \Delta_n f(x_0, x_1, \dots, x_n)$$
and
$$\Delta_n f(x_0, \dots, x_{n-1}, \alpha\xi_1 + (1 - \alpha)\xi_2) = \alpha \Delta_n f(x_0, \dots, x_{n-1}, \xi_1) + (1 - \alpha)\Delta_n f(x_0, \dots, x_{n-1}, \xi_2).$$
These properties follow simply by induction. Note that $\Delta_n f(x_0, x_1, \dots, x_n)$ is affine in $x_n$. Second, a version of Leibniz's formula holds for $\Delta_n$.
THEOREM 5 Let $W$, $X_1$, $X_2$, and $Y$ be Banach spaces, let $\Psi$ be an open subset of $W$, and let $B : X_1 \times X_2 \to Y$ be a continuous bilinear operator. Let $f : \Psi \to X_1$ and $g : \Psi \to X_2$ be $n$ times continuously Fréchet differentiable. Then, for any $(w_0, w_1, \dots, w_n) \in \Psi \times W^n$,
$$\Delta_n (B \circ (f \times g))(w_0, w_1, \dots, w_n) = \sum_{m=0}^{n} \binom{n}{m} B\bigl(\Delta_m f(w_0, \dots, w_m),\, \Delta_{n-m}\, g(w_0, \dots, w_{n-m})\bigr).$$
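To make the Leibniz-type formula of Theorem 5 concrete, here is the case $n = 2$ written out term by term. This is an added worked instance rather than part of the original article; it uses only the statement of the theorem and the convention $\Delta_0 f = f$ from the definition of the higher directional derivatives.

```latex
\begin{align*}
\Delta_2\bigl(B \circ (f \times g)\bigr)(w_0, w_1, w_2)
  &= B\bigl(f(w_0),\, \Delta_2 g(w_0, w_1, w_2)\bigr)
   + 2\, B\bigl(\Delta_1 f(w_0, w_1),\, \Delta_1 g(w_0, w_1)\bigr) \\
  &\quad + B\bigl(\Delta_2 f(w_0, w_1, w_2),\, g(w_0)\bigr).
\end{align*}
```

In the scalar case $W = X_1 = X_2 = Y = \mathbb{R}$ with $B$ the ordinary product, both sides reduce to $(fg)''(w_0)\,w_1^2 + (fg)'(w_0)\,w_2$, which is the classical Leibniz rule combined with $\Delta_2 h(w_0, w_1, w_2) = h''(w_0)w_1^2 + h'(w_0)w_2$.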
Examples
Three examples will suffice to illustrate the chain rule formulae in the preceding section.
EXAMPLE 2: Derivatives of $e^g$. On combining Theorem 2 and Example 1 we find, for any $C^3$ function $g$ and $t \in \mathbb{C}$,
$$\frac{d^3}{dt^3}\, e^{g(t)} = \Delta_3 \exp(g(t), g'(t), g''(t), g'''(t)) = e^{g(t)}\bigl(g'(t)^3 + 3g'(t)g''(t) + g'''(t)\bigr).$$
By induction, for any $n \in \mathbb{N}$ there is a polynomial $Y_n$ of degree $n$ in $x_1, \dots, x_n$ such that
$$\Delta_n \exp(x_0, x_1, \dots, x_n) = e^{x_0}\, Y_n(x_1, \dots, x_n),$$
and so that
$$\frac{d^n}{dt^n}\, e^{g(t)} = e^{g(t)}\, Y_n(g'(t), \dots, g^{(n)}(t))$$
whenever $g$ is $n$ times continuously differentiable. The $Y_n$ can be derived recursively: $Y_1(x_1) = x_1$, and the relation
$$e^{x_0}\, Y_{n+1}(x_1, \dots, x_{n+1}) = \frac{\partial}{\partial z}\, e^{x_0 + zx_1}\, Y_n(x_1 + zx_2, \dots, x_n + zx_{n+1})\Big|_{z=0}$$
translates into the recursion
$$Y_{n+1}(x_1, \dots, x_{n+1}) = x_1 Y_n(x_1, \dots, x_n) + \sum_{j=1}^{n} x_{j+1}\, \frac{\partial Y_n}{\partial x_j}(x_1, \dots, x_n).$$
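The recursion for $Y_n$ in Example 2 can be run directly on a simple polynomial representation. The following is a minimal sketch and not part of the original article: the names `next_bell` and `bell`, and the choice to store a polynomial as a dictionary mapping exponent tuples $(e_1, \dots, e_n)$ to integer coefficients, are illustrative assumptions.

```python
def next_bell(Y, n):
    """Given Y_n as {exponents (e_1..e_n): coefficient}, return Y_{n+1}.

    Implements Y_{n+1} = x_1 * Y_n + sum_j x_{j+1} * dY_n/dx_j.
    Hypothetical helper written for illustration only.
    """
    out = {}

    def add(exponents, coeff):
        out[exponents] = out.get(exponents, 0) + coeff

    for exponents, coeff in Y.items():
        e = list(exponents) + [0]          # make room for x_{n+1}
        # Term x_1 * (monomial of Y_n)
        t = e.copy()
        t[0] += 1
        add(tuple(t), coeff)
        # Terms x_{j+1} * d/dx_j (monomial of Y_n), j = 1..n (0-based below)
        for j in range(n):
            if e[j] > 0:
                t = e.copy()
                t[j] -= 1
                t[j + 1] += 1
                add(tuple(t), coeff * e[j])
    return out


def bell(n):
    """Return Y_n for n >= 1, starting from Y_1 = x_1."""
    Y = {(1,): 1}
    for m in range(1, n):
        Y = next_bell(Y, m)
    return Y


# Y_3 = x1^3 + 3 x1 x2 + x3, matching Example 1 and Example 2.
print(bell(3))
```

Keys are exponent vectors, so `(1, 1, 0): 3` stands for the monomial $3x_1x_2$. Setting every $x_j = 1$ sums the coefficients, which for these polynomials gives the Bell numbers 1, 2, 5, 15, 52 (a standard fact, not stated in the article).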
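The polynomials $Y_n$ have a long history. They are named after Eric T. Bell, who studied them in 1934 [1], but Johnson [8, Section 2] traces related polynomials to sundry 19th-century authors, including Faà di Bruno himself. $Y_n$ can be defined combinatorially, or by means of the elegant determinantal formula (which apparently is essentially due to Faà di Bruno)
$$Y_n(x_1, \dots, x_n) = \begin{vmatrix}
\binom{n-1}{0}x_1 & \binom{n-1}{1}x_2 & \binom{n-1}{2}x_3 & \cdots & \binom{n-1}{n-2}x_{n-1} & \binom{n-1}{n-1}x_n \\
-1 & \binom{n-2}{0}x_1 & \binom{n-2}{1}x_2 & \cdots & \binom{n-2}{n-3}x_{n-2} & \binom{n-2}{n-2}x_{n-1} \\
0 & -1 & \binom{n-3}{0}x_1 & \cdots & \binom{n-3}{n-4}x_{n-3} & \binom{n-3}{n-3}x_{n-2} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \binom{1}{0}x_1 & \binom{1}{1}x_2 \\
0 & 0 & 0 & \cdots & -1 & \binom{0}{0}x_1
\end{vmatrix}.$$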
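The determinantal formula can be spot-checked numerically for small $n$. The sketch below is an added illustration, not the authors' method: the helper names are hypothetical, the determinant is computed by plain cofactor expansion (adequate only for small matrices), and the comparison value uses the standard binomial recurrence $Y_{n+1} = \sum_{k=0}^{n} \binom{n}{k} Y_{n-k}\, x_{k+1}$ with $Y_0 = 1$, a well-known identity for these polynomials that does not appear in the article.

```python
from math import comb


def bell_matrix(xs):
    """Build the n x n matrix of the determinantal formula; xs = [x_1, ..., x_n]."""
    n = len(xs)
    M = [[0] * n for _ in range(n)]
    for i in range(n):                     # 0-based row i uses binomials of n-1-i
        for j in range(i, n):
            M[i][j] = comb(n - 1 - i, j - i) * xs[j - i]
        if i > 0:
            M[i][i - 1] = -1               # subdiagonal entries
    return M


def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if not M:
        return 1
    total = 0
    for j, a in enumerate(M[0]):
        if a:
            minor = [row[:j] + row[j + 1:] for row in M[1:]]
            total += (-1) ** j * a * det(minor)
    return total


def bell_value(xs):
    """Evaluate Y_n(x_1..x_n) via the binomial recurrence (assumed identity)."""
    Y = [1]
    for n in range(len(xs)):
        Y.append(sum(comb(n, k) * Y[n - k] * xs[k] for k in range(n + 1)))
    return Y[-1]


xs = [2, 3, 5, 7, 11]
print(all(det(bell_matrix(xs[:n])) == bell_value(xs[:n]) for n in range(1, 6)))
```

Integer sample points keep the comparison exact. Agreement at a few points is evidence that the matrix has been transcribed correctly, not a proof of the formula.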
Faà di Bruno's formula can be succinctly expressed in terms of the coefficients of the Bell polynomials $Y_n$: for proofs and much interesting history, see Johnson, loc. cit. The purpose of this note is rather to exhibit some simpler forms of higher chain rules.
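EXAMPLE 3: Let us calculate the third derivative of $f(g_1(t), g_2(t))$ for suitably smooth functions $f$ on $\mathbb{R}^2$, $g_j$ on $\mathbb{R}$. For $x_j, y_j \in \mathbb{R}$ we find
$$\Delta_1 f((x_0, y_0), (x_1, y_1)) = \Bigl(x_1 \frac{\partial}{\partial x} + y_1 \frac{\partial}{\partial y}\Bigr) f(x_0, y_0),$$
$$\Delta_2 f((x_0, y_0), (x_1, y_1), (x_2, y_2)) = \Bigl(\Bigl(x_2 \frac{\partial}{\partial x} + y_2 \frac{\partial}{\partial y}\Bigr) f + \Bigl(x_1 \frac{\partial}{\partial x} + y_1 \frac{\partial}{\partial y}\Bigr)^2 f\Bigr)(x_0, y_0),$$
$$\Delta_3 f((x_0, y_0), (x_1, y_1), (x_2, y_2), (x_3, y_3)) = \Bigl(\Bigl(x_3 \frac{\partial}{\partial x} + y_3 \frac{\partial}{\partial y}\Bigr) f + 3\Bigl(x_2 \frac{\partial}{\partial x} + y_2 \frac{\partial}{\partial y}\Bigr)\Bigl(x_1 \frac{\partial}{\partial x} + y_1 \frac{\partial}{\partial y}\Bigr) f + \Bigl(x_1 \frac{\partial}{\partial x} + y_1 \frac{\partial}{\partial y}\Bigr)^3 f\Bigr)(x_0, y_0).$$
Hence
$$\frac{d^3}{dt^3} f(g_1(t), g_2(t)) = \Delta_3 f\bigl((g_1(t), g_2(t)), \dots, (g_1'''(t), g_2'''(t))\bigr)$$
$$= \Bigl(\Bigl(g_1'''(t) \frac{\partial}{\partial x} + g_2'''(t) \frac{\partial}{\partial y}\Bigr) f + 3\Bigl(g_1''(t) \frac{\partial}{\partial x} + g_2''(t) \frac{\partial}{\partial y}\Bigr)\Bigl(g_1'(t) \frac{\partial}{\partial x} + g_2'(t) \frac{\partial}{\partial y}\Bigr) f + \Bigl(g_1'(t) \frac{\partial}{\partial x} + g_2'(t) \frac{\partial}{\partial y}\Bigr)^3 f\Bigr)(g_1(t), g_2(t)).$$
More generally, one can express $\Delta_n f$ in terms of higher-order Fréchet derivatives of $f$, e.g.,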
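$$\Delta_3 f(x_0, x_1, x_2, x_3) = Df(x_0)(x_3) + 3D^2 f(x_0)(x_1, x_2) + D^3 f(x_0)(x_1, x_1, x_1).$$
In view of Theorem 2, it is clear that a similarly explicit expression for $\Delta_n f$ would be equivalent to a version of the Faà di Bruno formula; indeed, from equation (1) (or its multivariate generalization) we must have
$$\Delta_n f(x_0, \dots, x_n) = \sum \frac{n!}{k_1! \cdots k_n!\, (1!)^{k_1} \cdots (n!)^{k_n}}\, D^k f(x_0)(x_1, \dots, x_1, \dots, x_n, \dots, x_n)$$
where, on the right-hand side, the argument $x_j$ is repeated $k_j$ times, and the sum is taken over all nonnegative integer solutions for the $k_j$ of the Diophantine equation $k_1 + 2k_2 + \cdots + nk_n = n$, and $k := k_1 + \cdots + k_n$. An independent proof of this formula, together with Theorem 2, would yield yet another proof of the Faà di Bruno formula.
Here is a multivariate example.
EXAMPLE 4: Let $f_m : \mathbb{C}^{k \times k} \to \mathbb{C}$ be defined by $f_m(X) = \mathrm{tr}^2(X^m)$, and let us calculate $\Delta_2 f_m(X_0, X_1, X_2)$. Write $p_m(X) = X^m$. By Theorem 3 we have
$$\Delta_2 f_m(X_0, X_1, X_2) = \Delta_2(\mathrm{tr}^2 \circ p_m)(X_0, X_1, X_2) = \Delta_2 \mathrm{tr}^2\bigl(p_m(X_0), \Delta_1 p_m(X_0, X_1), \Delta_2 p_m(X_0, X_1, X_2)\bigr).$$
On the one hand,
$$\Delta_1 \mathrm{tr}^2(X_0, X_1) = 2\,\mathrm{tr}X_0\, \mathrm{tr}X_1, \qquad \Delta_2 \mathrm{tr}^2(X_0, X_1, X_2) = 2\,\mathrm{tr}^2 X_1 + 2\,\mathrm{tr}X_0\, \mathrm{tr}X_2.$$
On the other hand,
$$\Delta_1 p_m(X_0, X_1) = \frac{\partial}{\partial z}(X_0 + zX_1)^m\Big|_{z=0} = \sum_{\substack{\alpha, \beta \ge 0 \\ \alpha + \beta = m-1}} X_0^\alpha X_1 X_0^\beta,$$
$$\Delta_2 p_m(X_0, X_1, X_2) = \frac{\partial}{\partial z} \sum_{\substack{\alpha, \beta \ge 0 \\ \alpha + \beta = m-1}} (X_0 + zX_1)^\alpha (X_1 + zX_2)(X_0 + zX_1)^\beta\Big|_{z=0} = \sum_{\substack{\alpha, \beta \ge 0 \\ \alpha + \beta = m-1}} X_0^\alpha X_2 X_0^\beta + 2 \sum_{\substack{\alpha, \beta, \gamma \ge 0 \\ \alpha + \beta + \gamma = m-2}} X_0^\alpha X_1 X_0^\beta X_1 X_0^\gamma.$$
Therefore
$$\Delta_2 f_m(X_0, X_1, X_2) = 2\,\mathrm{tr}^2 \Delta_1 p_m(X_0, X_1) + 2\,\mathrm{tr}\, p_m(X_0)\, \mathrm{tr}\, \Delta_2 p_m(X_0, X_1, X_2)$$
$$= 2m^2\, \mathrm{tr}^2(X_0^{m-1} X_1) + 2\,\mathrm{tr}(X_0^m)\Bigl\{ m\, \mathrm{tr}(X_0^{m-1} X_2) + 2\sum_{\beta=0}^{m-2} (m - 1 - \beta)\, \mathrm{tr}(X_0^{m-2-\beta} X_1 X_0^\beta X_1) \Bigr\}.$$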
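Returning to the explicit Faà di Bruno-type expansion of $\Delta_n f$ stated before Example 4, its index set and coefficients can be enumerated by brute force. The sketch below is an added illustration, not part of the original article: the function names are hypothetical, and the coefficient is taken to be $n!/(k_1! \cdots k_n!\,(1!)^{k_1} \cdots (n!)^{k_n})$, the usual Faà di Bruno coefficient.

```python
from math import factorial


def solutions(n):
    """All tuples (k_1, ..., k_n) of nonnegative integers with k_1 + 2 k_2 + ... + n k_n = n."""
    def rec(j, remaining):
        if j > n:
            if remaining == 0:
                yield ()
            return
        for kj in range(remaining // j + 1):
            for rest in rec(j + 1, remaining - j * kj):
                yield (kj,) + rest
    return list(rec(1, n))


def coefficient(ks):
    """n! / (k_1! ... k_n! (1!)^k_1 ... (n!)^k_n), with n = sum of j * k_j."""
    n = sum(j * k for j, k in enumerate(ks, start=1))
    denom = 1
    for j, k in enumerate(ks, start=1):
        denom *= factorial(k) * factorial(j) ** k
    return factorial(n) // denom


def expansion(n):
    """Map each solution (k_1..k_n) to its coefficient in the expansion of Delta_n f."""
    return {ks: coefficient(ks) for ks in solutions(n)}


# For n = 3 this gives the coefficients 1, 3, 1 of
# D^3 f (x1,x1,x1), D^2 f (x1,x2) and D f (x3) in the displayed formula for Delta_3 f.
print(expansion(3))
```

Each key $(k_1, \dots, k_n)$ records how many times each $x_j$ appears as an argument of $D^k f(x_0)$, so the same dictionary also lists the monomials $x_1^{k_1} \cdots x_n^{k_n}$ of the polynomial $Y_n$ of Example 2 with their coefficients.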
ACKNOWLEDGEMENTS
The first and second named authors gratefully acknowledge the hospitality of the School of Mathematics and Statistics of the University of Newcastle upon Tyne. The third named author carried out this work while at Newcastle with partial support from EPSRC grant GR/T 20014/01.
THE MATHEMATICAL INTELLIGENCER
REFERENCES
[1] E. T. Bell, Exponential polynomials, Ann. of Math. (2) 35 (1934), 258-277.
[2] G. M. Constantine and T. H. Savits, A multivariate Faà di Bruno formula with applications, Transactions of the American Mathematical Society 348(2) (1996), 503-520.
[3] Cav. F. Faà di Bruno, Sullo sviluppo delle funzioni, Annali di Scienze Matematiche e Fisiche 6 (1855), 479-480.
[4] Cav. F. Faà di Bruno, Note sur une nouvelle formule de calcul differentiel, Quarterly J. Pure Appl. Math. 1 (1857), 359-360.
[5] http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Faa_di_Bruno.html.
[6] H.-N. Huang, S. A. M. Marcantognini, and N. J. Young, The spectral Carathéodory-Fejér problem, to appear in Integr. Equ. Oper. Theory.
[7] I. L. Iribarren, Calculo Diferencial en Espacios Normados, Equinoccio, 1980.
[8] W. P. Johnson, The curious history of Faà di Bruno's formula, American Mathematical Monthly 109 (2002), 217-234.
[9] R. L. Mishkov, Generalization of the formula of Faà di Bruno for a composite function with a vector argument, Internat. J. Math. Sci. 24(7) (2000), 481-491.
[10] J. D. DePree and C. W. Swartz, Introduction to Real Analysis, John Wiley & Sons, Inc., New York, 1998.
[11] K. Spindler, A short proof of the formula of Faà di Bruno, Elemente der Mathematik 60 (2005), 33-35.
Solutions to "Ein Kleines Schach" Problems on page 49
TIMOTHY CHOW
Michael Kleber and Ravi Vakil, Editors
Solution to Problem 1
One first verifies that White is not in check from the Black queen, and so is not constrained by having to parry a check. 1.Ba5 is checkmate; all the flight squares of the king are covered, and the queen cannot capture the bishop. Trying to interpose the queen also does not work, because the bishop is checking from two opposite directions. Closer analysis reveals that the bishop's moves enjoy a symmetry about the horizontal axis of the board; that is, barring obstructions, if the bishop can move to a certain square X on the board, then it can also move to the mirror image of X in the horizontal axis. It follows from this symmetry that 1.Ba4 is also checkmate.
Solution to Problem 2
Black plays Ka6, White plays Nf2, Black plays Kb7, and White checkmates with Nh1. On an ordinary chess board, knight moves have parity; that is, if a knight can reach a certain square in an even number of moves, then it cannot also reach the same square in an odd number of moves. Chess players say that a knight "cannot lose a tempo." However, on a Klein bottle, this is not true, because there are odd cycles (in the graph whose edges connect squares that are a knight's move apart), as this problem illustrates. Note that pawns "reverse direction" when they cross the left or right edge of the board, and that the standard starting position in chess is illegal on a Klein bottle. This raises some questions about the proper rules for pawn promotion, double moves, and en passant, which we have glossed over in this problem, even though pawns appear. Fortunately, almost any reasonable way of settling these questions leaves the solution unchanged, as long as one stipulates (as we did) that Black's pawns are moving down the board and White's pawn is moving up the board in the set position.
Center for Communications Research, Princeton, NJ 08540, USA
e-mail: [email protected]