Advanced Calculus: Lecture Notes for Mathematics 217-317
James S. Muldowney
Department of Mathematical and Statistical Sciences
The University of Alberta
Edmonton, Alberta, Canada
January 10, 2010
Contents

Preface

1 The Real Number System & Finite Dimensional Cartesian Space
  1.1 The Real Number System $\mathbb{R}$
    1.1.1 Fields
    1.1.2 Ordered Fields
    Exercises
    1.1.3 Complete Ordered Field
    1.1.4 Properties of $\mathbb{R}$
    Exercises
    Exercises
  1.2 Cartesian Spaces
    1.2.1 Functions
    Exercises
    1.2.2 Convexity
    Exercises
  1.3 Topology
    Exercises

2 Limits, Continuity, and Differentiation
  2.1 Sequences
    Exercises
    Exercises
  2.2 Continuity
  2.3 Global Properties of Continuous Functions
    Exercises
  2.4 Uniform Continuity
    Exercises
  2.5 Limits
  2.6 Differentiation of real valued functions of a real variable
    Exercises

3 Riemann Integration
  3.1 Content and the Riemann Integral
    3.1.1 Partition of $I$
    3.1.2 Riemann Sums
    Exercises
  3.2 Cauchy Criteria and Properties of Integrals
    Exercises
  3.3 Evaluation of Integrals
    3.3.1 Real valued functions of a real variable
    3.3.2 Real valued functions on $\mathbb{R}^2$
    3.3.3 Real valued functions on $\mathbb{R}^n$
    Exercises

4 Differentiation of Functions of Several Variables
  4.1 Preliminaries
    4.1.1 Linear Functions
    Exercises
    4.1.2 Straight Lines and Curves
  4.2 The Directional Derivative and the Differential
    Exercises
    Exercises
    4.2.1 Differentiation Rules
    Exercises
  4.3 Partial Derivatives of Higher Order
    4.3.1 Min-Max Theory
    Exercises
  4.4 Local Properties of $C^1$ Functions
    Exercises
  4.5 Implicit Function Theorem
    Exercises
    4.5.1 Dimension
    4.5.2 Application: Lagrange Multipliers
    Exercises

5 Further Topics in Integration
  5.1 Changes of Variables in Integrals
    Exercises
  5.2 Integration on Curves and Surfaces
    5.2.1 Curves (1-surfaces)
    5.2.2 Surfaces (2-surfaces)
    Exercises
Preface

These notes are not intended as a textbook. It is hoped, however, that they will minimize the amount of notetaking activity which occupies so much of a student’s class time in most courses in mathematics. Since the material is presented in the sterile “definition, theorem, proof” form without much background colour or discussion, most students will find it profitable to use the notes in conjunction with a textbook recommended by the instructor.

Probably the most important aspect of the notes is the set of exercises. You should develop the practice of attempting several of these problems every week. Many of the problems are quite difficult, so please consult your instructor if you are not blessed with success initially. Do not acquire the habit of abandoning a problem if it does not yield to your first attempt; a defeatist attitude is your greatest adversary. Solution of a problem, even with some assistance from the teacher when necessary, is a fine boost to your morale. You will find that a strong effort expended on the earlier part of the course will be rewarded by growing self-confidence and easier success later.
NOTATION: Except when specified otherwise, upper case (capital) letters will denote sets and lower case (small) letters will denote elements of sets.

$a \in A$ means $a$ is an element of the set $A$.
$A \subset B$ means $A$ is a subset of $B$.
$a \notin A$ means $a$ is NOT an element of the set $A$.
$A \not\subset B$ means $A$ is NOT a subset of $B$.
Remark: The slash through any symbol will mean the negation of the corresponding statement.
$B \supset A$ means $B$ contains $A$.
$A \subsetneq B$ means $A$ is a proper subset of $B$.
$P \implies Q$ means statement $P$ implies statement $Q$.
$P \iff Q$ means $P$ holds if and only if $Q$ holds.
$\exists$ means “there exists”.
$\forall$ means “for all”.
s.t. means “such that”.
$\square$ means “end of proof”.
$\{x : \ldots\}$ means the set of all things $x$ that satisfy the conditions specified in $\ldots$. For example, see the following.
$A \cup B$ is the union of sets $A$ and $B$, $\{x : x \in A \text{ or } x \in B\}$.
$A \cap B$ is the intersection of sets $A$ and $B$, $\{x : x \in A \text{ and } x \in B\}$.
$A \setminus B$ is the set difference, $\{x : x \in A \text{ and } x \notin B\}$.
$A \times B$ is the Cartesian product of sets $A$ and $B$, $\{(x, y) : x \in A,\ y \in B\}$.
$\emptyset$ is the empty set, a set with no elements.
Intervals: For $a, b \in \mathbb{R}$ with $a < b$,
  $[a, b]$ means $\{x \in \mathbb{R} : a \le x \le b\}$ (closed interval).
  $(a, b)$ means $\{x \in \mathbb{R} : a < x < b\}$ (open interval).
  $(a, b]$ means $\{x \in \mathbb{R} : a < x \le b\}$.
  $[a, b)$ means $\{x \in \mathbb{R} : a \le x < b\}$.
Chapter 1
The Real Number System & Finite Dimensional Cartesian Space

We begin by defining our basic tool, the real numbers $\mathbb{R}$. The real numbers can be constructed from more primitive notions such as the natural numbers $\mathbb{N} := \{1, 2, 3, \ldots\}$, or even from the fundamental axioms of set theory. Here we shall be content with a precise description of $\mathbb{R}$.
1.1 The Real Number System $\mathbb{R}$

Definition 1.1. $\mathbb{R}$ is a complete ordered field.

In the next several subsections we will explain the three key words: complete, ordered, and field.

1.1.1 Fields
A field is a set $F$ together with two binary operations $+$ and $\cdot$ (addition, multiplication) which satisfy the following axioms. For all $a, b, c, \ldots$ in $F$:

F1 $a + b \in F$ and $a \cdot b \in F$ (closure)
F2 $a + b = b + a$ and $a \cdot b = b \cdot a$ (commutativity)
F3 $a + (b + c) = (a + b) + c$ and $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associativity)
F4 $(a + b) \cdot c = (a \cdot c) + (b \cdot c)$ (distributivity)
F5 There exist unique elements $0$ and $1$ in $F$, $0 \neq 1$, such that $a + 0 = a$ and $a \cdot 1 = a$, $\forall a \in F$.
F6 For each $a \in F$, there exists $-a \in F$ such that $a + (-a) = 0$, and if $a \neq 0$, there exists $a^{-1} \in F$ such that $a \cdot a^{-1} = 1$ (existence of inverses).

Please note the following conventions and easy consequences:
1. $a \cdot b$ will henceforth be written $ab$.
2. $a(b + c) = ab + ac$ is a consequence of axioms F2 and F4, and need not be separately assumed. (Verify this for yourself.)
3. The elements $(-a)$ and $a^{-1}$ are uniquely determined by $a$. Suppose, for example, that there are two elements $(-a_1)$, $(-a_2)$ that satisfy $a + (-a_1) = 0 = a + (-a_2)$. Then
\[
(-a_1) = (-a_1) + 0 = (-a_1) + (a + (-a_2)) = ((-a_1) + a) + (-a_2) = 0 + (-a_2) = (-a_2).
\]
4. It is customary to write $a - b$ for $a + (-b)$ and $\frac{a}{b}$ for $ab^{-1}$.
5. $aa, aaa, \ldots$ are usually denoted $a^2, a^3, \ldots$.
6. $\{1,\ 1 + 1,\ 1 + 1 + 1,\ 1 + 1 + 1 + 1, \ldots\}$ is usually denoted $\{1, 2, 3, 4, \ldots\} = \mathbb{N}$.

Example 1.2

i The simplest (and least interesting) field is the set $\{\theta, e\}$ with the operations
\[
\begin{array}{c|cc}
+ & \theta & e \\ \hline
\theta & \theta & e \\
e & e & \theta
\end{array}
\qquad
\begin{array}{c|cc}
\cdot & \theta & e \\ \hline
\theta & \theta & \theta \\
e & \theta & e
\end{array}
\]

ii The set $\mathbb{Q}$ of rational numbers, i.e., numbers of the form $\frac{m}{n}$ ($n \neq 0$), with the usual addition and multiplication is a field.

iii The sets $\mathbb{R}$ and $\mathbb{C}$ of real and complex numbers respectively with the usual addition and multiplication are fields.

iv The set $\mathbb{Q}(t)$ of rational functions with rational coefficients (i.e. functions of the form $\frac{p(t)}{q(t)}$ where $p(t)$ and $q(t)$ are polynomials with rational coefficients) is a field.

v The set $\mathbb{N} = \{1, 2, 3, 4, \ldots\}$ of natural numbers and the set $\mathbb{Z}$ of integers are NOT fields.
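For a finite set, the field axioms F1-F6 can be checked by exhaustive search. The following is a minimal sketch for the two-element field of Example 1.2(i); the string labels `"theta"` and `"e"`, the dictionaries `ADD` and `MUL`, and the helper `is_field` are names chosen here for illustration and are not part of the notes. The check of F5 only confirms that the given candidates behave as identities; it does not test uniqueness.

```python
from itertools import product

# Illustrative labels: "theta" plays the role of 0 and "e" the role of 1.
F = ["theta", "e"]

# The two operation tables of Example 1.2(i), stored as dictionaries.
ADD = {("theta", "theta"): "theta", ("theta", "e"): "e",
       ("e", "theta"): "e", ("e", "e"): "theta"}
MUL = {("theta", "theta"): "theta", ("theta", "e"): "theta",
       ("e", "theta"): "theta", ("e", "e"): "e"}


def is_field(elems, add, mul, zero, one):
    """Check axioms F1-F6 by brute force over a finite set."""
    pairs = list(product(elems, repeat=2))
    triples = list(product(elems, repeat=3))
    # F1: closure
    closed = all(add[a, b] in elems and mul[a, b] in elems for a, b in pairs)
    # F2: commutativity
    commutative = all(add[a, b] == add[b, a] and mul[a, b] == mul[b, a]
                      for a, b in pairs)
    # F3: associativity
    associative = all(add[a, add[b, c]] == add[add[a, b], c]
                      and mul[a, mul[b, c]] == mul[mul[a, b], c]
                      for a, b, c in triples)
    # F4: distributivity
    distributive = all(mul[add[a, b], c] == add[mul[a, c], mul[b, c]]
                       for a, b, c in triples)
    # F5: zero and one are distinct and act as identities
    identities = zero != one and all(
        add[a, zero] == a and mul[a, one] == a for a in elems)
    # F6: additive inverses, and multiplicative inverses for nonzero elements
    has_neg = all(any(add[a, b] == zero for b in elems) for a in elems)
    has_inv = all(any(mul[a, b] == one for b in elems)
                  for a in elems if a != zero)
    return (closed and commutative and associative and distributive
            and identities and has_neg and has_inv)


print(is_field(F, ADD, MUL, "theta", "e"))
```

Changing a single table entry (for instance setting $e \cdot e = \theta$) makes the check fail, which is a quick way to see which axiom each entry supports.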
1.1.2 Ordered Fields

A field $F$ is ordered if there is a subset $P$ of $F$ (called the positive elements) such that the following order axioms hold:

O1 $a, b \in P \implies a + b \in P$ and $ab \in P$.
O2 $0 \notin P$.
O3 $x \in F$, $x \neq 0 \implies x \in P$ or $-x \in P$, but not both.
Remark: Every ordered field contains $\mathbb{Q}$ as a subfield (we do not prove this). Thus, $\mathbb{Q}$ may be characterized as an ordered field containing no ordered proper subfield, i.e., $\mathbb{Q}$ is the smallest ordered field. (Two ordered fields are considered the same if they are isomorphic and the isomorphism preserves the order.) A discussion of this point may be found in C. Goffman, Real Functions, Propositions 1 and 2 in Chapter 3, and in E. Hewitt and K. Stromberg, Real and Abstract Analysis, Theorem 5.9.

We define a relation $>$ on an ordered field as follows: if $a, b \in F$, write $a > b$ (equivalently, $b < a$) if $a - b \in P$. Then the axioms O1, O2, and O3 have the following consequences:

Proposition 1.3 (Properties of $<$).
i $a > b$, $b > c$ implies $a > c$.
ii If $a, b \in F$, then exactly one of the following holds: $a > b$, $b > a$, $a = b$.
iii $a \ge b$, $b \ge a$ implies $a = b$.
iv $a > b$ implies $a + c > b + c$ for each $c \in F$.
v $a > b$, $c > d$ implies $a + c > b + d$.
vi $a > b$, $c > 0$ implies $ac > bc$, and $a > b$, $c < 0$ implies $ac < bc$.
vii $a > 0$ implies $a^{-1} > 0$, and $a < 0$ implies $a^{-1} < 0$.
viii $a > b$ implies $a > \frac{a+b}{2} > b$.
ix $ab > 0$ implies either $a > 0$ and $b > 0$, or $a < 0$ and $b < 0$.
Exercises

1.1. Establish the following properties of a field:
(a) $a \cdot 0 = 0$
(b) $a \cdot (-1) = (-a)$
(c) $(-a)(-b) = ab$
(d) $(ab^{-1})(cd^{-1}) = ac(bd)^{-1}$
(f) If $ab = 0$, then $a = 0$ or $b = 0$.
1.2. Let $P$ be the set of positive elements of an ordered field. Observe that
(a) $1 \in P$
(b) $a \neq 0 \implies a^2 \in P$
(c) If $n \in \mathbb{N}$, then $n \in P$.
(d) The field $\{\theta, e\}$ in Example 1.2(i) cannot be ordered.
(e) The field $\mathbb{C}$ of complex numbers cannot be ordered.
(f) The fields $\mathbb{Q}$ and $\mathbb{R}$ are ordered by the usual notion of positivity.
(g) The field $\mathbb{Q}(t)$ in Example 1.2(iv) is ordered if $\frac{p(t)}{q(t)} \in P$ whenever the coefficient of the highest power of $t$ in the product $p(t)q(t)$ is positive.
1.3. Prove the statements (i)-(ix) of Proposition 1.3.
1.1.3 Complete Ordered Field

The notion of completeness for an ordered field will stretch your imagination a little further. We will introduce some terminology necessary to discuss this topic. Let $S$ be a subset of an ordered field $F$.

(a) An element $u \in F$ is an upper bound of the set $S$ if $s \le u$, $\forall s \in S$.
(b) $w \in F$ is a lower bound of $S$ if $w \le s$, $\forall s \in S$.
(c) $S$ is bounded above (below) if it has an upper (lower) bound in $F$. $S$ is bounded if it is bounded above and below. For example, the set of natural numbers $\mathbb{N}$ is bounded below and unbounded above; the interval $[0, 1) = \{x : 0 \le x < 1\}$ is bounded.
(d) $u$ is the least upper bound (or supremum) of $S$ if
  (i) $s \le u$, $\forall s \in S$, that is, $u$ is an upper bound, and
  (ii) $s \le v$, $\forall s \in S \implies u \le v$, that is, $u$ is smaller than any other upper bound for $S$.
  We write either $u = \sup S$ or $u = \operatorname{lub} S$.
(e) Similarly, the greatest lower bound (or infimum) of $S$ is a number $w$ which is a lower bound for $S$ and exceeds all other lower bounds. We write $w = \inf S$ or $w = \operatorname{glb} S$.

Definition 1.4. An ordered field $F$ is complete if each nonempty subset $S$ of $F$ which has an upper bound has a least upper bound (supremum).

The only (to within an isomorphism) complete ordered field is $\mathbb{R}$. Again we do not prove this. A discussion may be found in the book of Hewitt and Stromberg, p. 45.
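To see the definitions of upper bound and supremum at work, the sketch below squeezes $\sup S$ for the set $S = \{x \in \mathbb{Q} : x > 0,\ x^2 < 2\}$ between an element of $S$ and a rational upper bound of $S$, using exact rational arithmetic. The function name `squeeze_sup` and the starting bracket $[1, 2]$ are choices made here for illustration. Every rational upper bound produced this way can be improved by a further step, which is consistent with the fact (Exercise 1.15) that $\mathbb{Q}$ is not complete.

```python
from fractions import Fraction


def squeeze_sup(steps):
    """Bisect towards sup S, where S = {x in Q : x > 0 and x^2 < 2}.

    Invariant: lo is an element of S and hi is an upper bound of S.
    """
    lo, hi = Fraction(1), Fraction(2)  # 1 is in S; 2 is an upper bound of S
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid  # mid belongs to S, so sup S is at least mid
        else:
            hi = mid  # x in S gives x^2 < 2 <= mid^2, hence x < mid
    return lo, hi


lo, hi = squeeze_sup(20)
print(float(lo), float(hi))
```

After $k$ steps the bracket has length $2^{-k}$, so the two rationals agree with $\sup S$ to roughly $k$ binary digits, yet neither of them equals it.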
1.1.4 Properties of $\mathbb{R}$

The first property is that of being Archimedean.

Definition 1.5. An ordered field $F$ is Archimedean if $\forall a \in F$, there exists $n \in \mathbb{N} = \{1, 2, 3, 4, \ldots\}$ such that $n > a$. (That is, the set $\mathbb{N}$ is not bounded above.)
Theorem 1.6. $\mathbb{R}$ is Archimedean.

Proof. Suppose not. Then there is some $a \in \mathbb{R}$ with $a \ge n$, $\forall n \in \mathbb{N}$; that is, $a$ is an upper bound of $\mathbb{N}$. Since $\mathbb{R}$ is complete, there exists a least upper bound $b = \sup \mathbb{N}$. Thus, $b \ge n$, $\forall n \in \mathbb{N}$, and $b - 1 < n_0$ for some $n_0 \in \mathbb{N}$ (otherwise $b - 1$ would be a smaller upper bound). But then $b < n_0 + 1 \in \mathbb{N}$, contradicting $b = \sup \mathbb{N}$.

Corollary 1.7. If $a > 0$, there exists $n \in \mathbb{N}$ such that $0 < \frac{1}{n} < a$.

Proof. There exists $n > a^{-1} > 0$. (Why?) Thus, $a > \frac{1}{n} > 0$.

Corollary 1.8. $\mathbb{Q}$ is an Archimedean ordered field.

Corollary 1.9. If $a, b \in \mathbb{R}$, $a < b$, then there is a rational $r$ such that $a < r < b$.

Proof. There exists an $n \in \mathbb{N}$ such that $n(b - a) > 1$ (why?). Let $m$ be the least integer such that $m > na$. Hence, $m - 1 \le na$ and so
\[
na < m \le na + 1 < na + n(b - a) = nb \implies a < \frac{m}{n} < b.
\]

Exercises

1.4. Show that if $a > 0$, there exists $n \in \mathbb{N}$ such that $0 < \frac{1}{2^n} < a$. Hint: Show $2^n > n$, $\forall n \in \mathbb{N}$.
1.5. Show that $\mathbb{Q}(t)$, ordered as in Exercise 1.2(g), is not Archimedean.
1.6. Let $F$ be an Archimedean ordered field containing an irrational element $\xi$. Show that if $a, b \in F$, $a < b$, then there is an irrational element $\eta$ such that $a < \eta < b$.
1.7. Show that $\mathbb{R}$ contains an irrational element. Hint: Show first that no rational $p$ satisfies $p^2 = 2$. Then show that $p = \sup\{x > 0 : x^2 < 2\}$ must satisfy $p^2 = 2$.

Theorem 1.10. Let $I_n = [a_n, b_n]$, and $I_{n+1} \subset I_n$, $n \in \mathbb{N}$. Then $\bigcap_{n=1}^{\infty} I_n \neq \emptyset$. In other words, a nested sequence of closed intervals has at least one point common to all intervals.

Proof. First note that $a_n < b_m$ for all $n, m$. Thus, each $b_m$ is an upper bound for the set $\{a_n : n \in \mathbb{N}\}$. Therefore, $a := \sup\{a_n : n \in \mathbb{N}\} \le b_m$ for all $m$. It follows that $a_n \le a \le b_n$ for all $n$. Thus, $a \in I_n$ for all $n$.
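The proof of Corollary 1.9 is constructive: it names an $n$ with $n(b - a) > 1$ and then the least integer $m > na$. The sketch below follows those two steps literally. The function name `rational_between` is an illustrative choice, and the inputs are assumed to be `Fraction` or `float` values; with floats, rounding could in principle matter when $a$ and $b$ are extremely close.

```python
from fractions import Fraction
from math import floor


def rational_between(a, b):
    """Return a Fraction r with a < r < b, following the proof of Corollary 1.9."""
    if not a < b:
        raise ValueError("need a < b")
    n = floor(1 / (b - a)) + 1  # a natural number with n * (b - a) > 1
    m = floor(n * a) + 1        # the least integer with m > n * a
    return Fraction(m, n)


print(rational_between(Fraction(1, 3), Fraction(1, 2)))
```

The choice $n = \lfloor 1/(b-a) \rfloor + 1$ is one convenient way to realize the Archimedean property; any larger $n$ would serve equally well.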
Definition 1.11. For $x \in \mathbb{R}$, the absolute value $|x|$ is defined via
\[
|x| = \begin{cases} x, & \text{if } x \ge 0; \\ -x, & \text{if } x < 0. \end{cases}
\]
The main properties of the absolute value are listed in the following proposition.

Proposition 1.12 (Properties of $|\cdot|$). For $x, y, a, b \in \mathbb{R}$, the following hold:
(i) $|x| = 0 \iff x = 0$.
(ii) $|-x| = |x|$.
(iii) $|xy| = |x|\,|y|$.
(iv) If $c \ge 0$, then $|x| \le c \iff -c \le x \le c$.
(v) $\big||a| - |b|\big| \le |a + b| \le |a| + |b|$.

Proof. Parts (i)-(iv) are left as an exercise. For the proof of (v), note that
\begin{align*}
\text{(iv)} &\implies -|a| \le a \le |a|, \quad -|b| \le b \le |b| \\
&\implies -(|a| + |b|) \le a + b \le |a| + |b| \\
&\overset{\text{(iv)}}{\implies} |a + b| \le |a| + |b|,
\end{align*}
which is the desired right hand inequality. This implies the left hand inequality, since
\begin{align*}
|b| = |b - a + a| \le |b - a| + |a| &\implies |b| - |a| \le |b - a| = |a - b|, \\
\text{interchanging } a \text{ and } b &\implies |a| - |b| \le |a - b|, \\
&\implies \big||a| - |b|\big| \le |a - b| \quad \text{by definition of } |\cdot|.
\end{align*}
The rest of (v) follows from replacing $b$ by $-b$.
Exercises

1.9. Let $F$ be an ordered field with the property that if $\{I_n\}$ is a nested sequence of closed intervals in $F$, then $\bigcap_{n=1}^{\infty} I_n \neq \emptyset$. Show that $F$ is complete. (Remark: This exercise and Theorem 1.10 show that the supremum (completeness) property and the nested interval property are equivalent.)
1.10. Let $I_n = (0, \frac{1}{n})$. Show that $\bigcap_{n=1}^{\infty} I_n = \emptyset$.
1.11. Let $K_n = [n, \infty) := \{x : x \ge n\}$. Show that $\bigcap_{n=1}^{\infty} K_n = \emptyset$.
1.12. Show that if a set $S$ of real numbers contains one of its upper bounds $a$, then $a = \sup S$. Such a supremum is called a maximum.
1.13. Show that $S \subset \mathbb{R}$ cannot have two suprema.
1.14. Show that an ordered field $F$ is complete if and only if every non-empty subset of $F$ which has a lower bound has an infimum.
1.15. Show that $\mathbb{Q}$ is not complete.
1.16. If $S \subset \mathbb{R}$ is bounded and $S_0 \subset S$ is non-empty, show $\inf S \le \inf S_0 \le \sup S_0 \le \sup S$.
1.17. If $S = \{(-1)^n (1 - \frac{1}{n}) : n = 1, 2, \ldots\}$, find $\sup S$, $\inf S$. Prove any statements you make.
1.18. Prove (i)-(iv) of Proposition 1.12.
1.2 Cartesian Spaces

The Euclidean spaces $\mathbb{R}^n$ of dimension $n$ are defined as the Cartesian products of the real numbers $\mathbb{R}$. The following definition provides the notation and basic operations on these spaces.

Definition 1.13. The Euclidean spaces $\mathbb{R}^n$ are defined as
\[
\mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n\text{-times}} = \{(x_1, \ldots, x_n) : x_i \in \mathbb{R},\ i = 1, \ldots, n\}.
\]
The components of $(x_1, \ldots, x_n)$ are the $x_i$, $i = 1, \ldots, n$. A point, or vector, in $\mathbb{R}^n$ is $x := (x_1, \ldots, x_n)$. The zero vector, or origin, is the point $O = (0, \ldots, 0)$. For two points $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, we define the operations of addition
\[
x + y := (x_1 + y_1, \ldots, x_n + y_n),
\]
and scalar multiplication by any “scalar” $\lambda$ (a number from $\mathbb{R}$),
\[
\lambda x = (\lambda x_1, \ldots, \lambda x_n).
\]
Geometrically, the sum of points $x$ and $y$ and scalar multiplication are shown in Figure 1.1.

Figure 1.1. The sum of vectors $p$, $q$ and the scalar multiple $kp$, $k > 1$.

The following properties of $\mathbb{R}^n$ follow from the properties of $\mathbb{R}$:

Proposition 1.14. $\mathbb{R}^n$ forms a vector space.
(i) $x + y = y + x$
(ii) $(x + y) + z = x + (y + z)$
(iii) $x + O = O + x = x$
(iv) $x + (-1)x = O$
(v) $1x = x$, and $0x = O$
(vi) $\lambda(\mu x) = (\lambda\mu)x$
(vii) $\lambda(x + y) = \lambda x + \lambda y$

There is another kind of product on $\mathbb{R}^n$:

Definition 1.15. The inner product or dot product of two vectors $x$ and $y$ is the quantity
\[
x \bullet y := \sum_{i=1}^{n} x_i y_i = x_1 y_1 + \ldots + x_n y_n.
\]
The absolute value, or norm, of a point $x$ in $\mathbb{R}^n$ is
\[
|x| = \sqrt{x \bullet x} = (x_1^2 + \ldots + x_n^2)^{1/2}.
\]
The main properties of the inner product are summarized in the following proposition.

Proposition 1.16.
(i) $x \bullet x \ge 0$, with equality if and only if $x = O$.
(ii) $x \bullet y = y \bullet x$.
(iii) $x \bullet (y + z) = x \bullet y + x \bullet z$.
(iv) $(\lambda x) \bullet y = \lambda(x \bullet y) = x \bullet (\lambda y)$.

There is a famous inequality that links inner products and the norms of vectors.
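Theorem 1.17 (Cauchy-Bunyakowski-Schwarz inequality). For all $x, y \in \mathbb{R}^n$, we have
\[
x \bullet y \le |x|\,|y|,
\]
with equality if and only if either one of $x$, $y$ is $O$, or $x = \lambda y$ with $\lambda > 0$.

Proof. Let $z = \lambda x - \mu y$ with $\lambda, \mu \in \mathbb{R}$. From the properties of inner products (Proposition 1.16), we have
\begin{align*}
0 &\le z \bullet z && \text{by (i)} \\
&= \lambda^2\, x \bullet x - 2\lambda\mu\, x \bullet y + \mu^2\, y \bullet y && \text{by (ii), (iii) and (iv)} \\
&= |y|^2 |x|^2 - 2|x|\,|y|\, x \bullet y + |x|^2 |y|^2 && \text{choosing } \lambda = |y|,\ \mu = |x| \\
&= 2|x|\,|y| \left[\, |x|\,|y| - x \bullet y \,\right].
\end{align*}
Hence, $x \bullet y \le |x|\,|y|$: this is immediate if $x = O$ or $y = O$, and otherwise follows on dividing by $2|x|\,|y| > 0$. If equality holds in the last expression, then working backwards through the proof yields
\[
O = z = |y|x - |x|y, \quad \text{i.e.} \quad x = \frac{|x|}{|y|}\, y.
\]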
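As a numerical illustration of Theorem 1.17, the sketch below implements the inner product and norm of Definition 1.15 for tuples of floats and compares $x \bullet y$ with $|x|\,|y|$. The helper names `dot` and `norm` and the sample vectors are choices made here, not notation from the notes.

```python
import math


def dot(x, y):
    """Inner product x . y = x_1 y_1 + ... + x_n y_n."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Norm |x| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))


x = (1.0, 2.0, 2.0)
y = (3.0, 0.0, -4.0)
z = tuple(2 * xi for xi in x)  # z = 2x, a positive multiple of x

# Strict inequality for x and y, which are not positive multiples of each other.
print(dot(x, y), norm(x) * norm(y))
# Equality (up to rounding) for x and z = 2x.
print(dot(x, z), norm(x) * norm(z))
```

The second pair illustrates the equality case of the theorem: when one vector is a positive multiple of the other, the two sides agree.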
Corollary 1.18. For all $x, y \in \mathbb{R}^n$, we have $|x \bullet y| \le |x|\,|y|$, or
\[
|x_1 y_1 + \ldots + x_n y_n| \le \left(x_1^2 + \ldots + x_n^2\right)^{1/2} \left(y_1^2 + \ldots + y_n^2\right)^{1/2}.
\]

Corollary 1.19 (Triangle Inequality). $\big||x| - |y|\big| \le |x \pm y| \le |x| + |y|$.

Proof. We have
\begin{align*}
|x + y|^2 &= (x + y) \bullet (x + y) = x \bullet x + 2\, x \bullet y + y \bullet y \\
&= |x|^2 + 2\, x \bullet y + |y|^2 \\
&\le |x|^2 + 2|x|\,|y| + |y|^2 && \text{(CBS inequality)} \\
&= \left(|x| + |y|\right)^2
\end{align*}
$\implies |x + y| \le |x| + |y|$.

The left-hand inequality follows from this just as in the scalar triangle inequality.

Proposition 1.20 (Properties of Norm).
(i) $|x| \ge 0$, with equality if and only if $x = O$.
(ii) $|\lambda x| = |\lambda|\,|x|$.
(iii) $\big||x| - |y|\big| \le |x \pm y| \le |x| + |y|$.

Figure 1.2. An interval in $\mathbb{R}^2$ on the left and in $\mathbb{R}^3$ on the right.

Definition 1.21. An interval in $\mathbb{R}^n$ is the Cartesian product of intervals in $\mathbb{R}$: $I = I_1 \times \ldots \times I_n$, where the $I_i$ are intervals in $\mathbb{R}$. If each $I_i = [a_i, b_i]$, $i = 1, \ldots, n$, is closed, then $I$ is a closed interval in $\mathbb{R}^n$; thus, $I = \{(x_1, x_2, \ldots, x_n) : a_i \le x_i \le b_i\}$.

As in $\mathbb{R}$, we have a nested interval property in $\mathbb{R}^n$.

Theorem 1.22 (Nested Interval Property). If $\{I_k\}$, $k \in \mathbb{N}$, is a sequence of closed intervals in $\mathbb{R}^n$ such that $I_{k+1} \subset I_k$, $k = 1, 2, \ldots$, then $\bigcap_{k=1}^{\infty} I_k \neq \emptyset$.

Proof. If $I_k = I_{k,1} \times \ldots \times I_{k,n}$, $k = 1, 2, \ldots$, where the $I_{k,i}$ are closed intervals in $\mathbb{R}$, then for each $i$, $I_{k+1,i} \subset I_{k,i}$, $k = 1, 2, \ldots$, and $\bigcap_{k=1}^{\infty} I_{k,i} \neq \emptyset$ by Theorem 1.10. Thus, there exists $x_i \in I_{k,i}$, $\forall k \in \mathbb{N}$, for each $i = 1, 2, \ldots, n$. Hence, $(x_1, \ldots, x_n) \in I_{k,1} \times \ldots \times I_{k,n} = I_k$, $\forall k \in \mathbb{N}$.
1.2.1 Functions
Definition 1.23. A subset $f$ of $A \times B$ is a function from $A$ to $B$ if
\[
(x, y_1), (x, y_2) \in f \implies y_1 = y_2.
\]
We write $f : A \mapsto B$ (“$f$ is a function from $A$ to $B$”). For $(x, y) \in f$, we use the notation $y = f(x)$. The set $R_f := \{y : (x, y) \in f \text{ for some } x\}$ is called the range of $f$, and the set $D_f := \{x : (x, y) \in f \text{ for some } y\}$ is called the domain of $f$. If $U \subset A$, then $f(U) := \{f(x) : x \in U\}$ is called the image of $U$. If $V \subset B$, then $f^{-1}(V) := \{x : f(x) \in V\}$ is called the inverse image of $V$.

Example 1.24 Consider $f = \{(x, x^2) : -1 \le x \le 1\}$; the function $f(x) = x^2$. Then
\[
D_f = [-1, 1], \qquad R_f = [0, 1], \qquad f([-1, 1/2]) = [0, 1], \qquad f^{-1}([-1, 1/2]) = [-1/\sqrt{2},\ 1/\sqrt{2}].
\]

Definition 1.25. A function $f : A \mapsto B$ is one-to-one if it also satisfies
\[
(x_1, y), (x_2, y) \in f \implies x_1 = x_2.
\]
We write this $f : A \overset{1-1}{\mapsto} B$.

Note that if $f : A \overset{1-1}{\mapsto} B$, then $\{(y, x) : (x, y) \in f\}$ is also a one-to-one function, which is denoted by $f^{-1} : B \overset{1-1}{\mapsto} A$, and is called the inverse function of $f$. If $f : A \mapsto B$ and $g : B \mapsto C$ are two functions, then the composition of $g$ with $f$ is the function $g \circ f(x) = g(f(x))$ with domain given by $D_{g \circ f} = f^{-1}(D_g)$.

Example 1.26 As an example of a composition,
\begin{align*}
f : \mathbb{R} \mapsto \mathbb{R}^2, &\quad f(x) = (|x|,\ x^2 + 1), \\
g : \mathbb{R}^2 \mapsto \mathbb{R}^2, &\quad g(u, v) = (u + v,\ u - v), \\
g \circ f : \mathbb{R} \mapsto \mathbb{R}^2, &\quad g \circ f(x) = (|x| + x^2 + 1,\ |x| - x^2 - 1).
\end{align*}
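The composition in Example 1.26 translates directly into code. The sketch below defines $f$ and $g$ as Python functions and forms $g \circ f$; the name `g_after_f` is an illustrative choice.

```python
def f(x):
    """f : R -> R^2, f(x) = (|x|, x^2 + 1)."""
    return (abs(x), x ** 2 + 1)


def g(u, v):
    """g : R^2 -> R^2, g(u, v) = (u + v, u - v)."""
    return (u + v, u - v)


def g_after_f(x):
    """The composition g o f, evaluated as g(f(x))."""
    return g(*f(x))


print(g_after_f(-2))  # f(-2) = (2, 5), so g(f(-2)) = (7, -3)
```

Evaluating at a few points and comparing with the closed form $(|x| + x^2 + 1,\ |x| - x^2 - 1)$ is a quick check of the formula in the example.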
Exercises

1.19. Prove Corollary 1.18.
1.20. Show $|x + y|^2 + |x - y|^2 = 2(|x|^2 + |y|^2)$ for all $x, y \in \mathbb{R}^n$. (parallelogram identity)
1.21. If $x = (x_1, \ldots, x_n)$, show
\[
|x_i| \le |x| \le \sqrt{n}\, \sup\{|x_1|, \ldots, |x_n|\}, \quad \text{for } i = 1, \ldots, n.
\]
1.22. Show that $|x + y|^2 = |x|^2 + |y|^2 \iff x \bullet y = 0$. In this case, $x$ and $y$ are said to be orthogonal. This is sometimes denoted by $x \perp y$.
1.23. Is it true that $|x + y| = |x| + |y| \iff x = \lambda y$ or $y = \lambda x$ with $\lambda \ge 0$?
1.24. Two sets $A$ and $B$ have the same cardinality if there is a one-to-one function $\varphi : A \mapsto B$ such that $\varphi(A) = B$ and $\varphi^{-1}(B) = A$. Show that the following sets have the same cardinality:
(a) $\mathbb{N} = \{1, 2, 3, \ldots\}$ and $2\mathbb{N} = \{2, 4, 6, 8, \ldots\}$.
(b) $[0, 1]$ and $[0, 2]$.
(c) $(0, 1)$ and $(0, \infty) = \{x : x > 0\}$.
(d) $[0, 1]$ and $[0, 1)$.
1.25. A set $A$ is said to be finite if it has the same cardinality as some initial segment $\{1, \ldots, n\}$ of the natural numbers, and is said to be infinite otherwise. (Thus, finite means that the elements can be labeled $a_1, \ldots, a_n$.) Show that a finite set of real numbers contains its inf and its sup. Hint: Induction.
1.26. A set $A$ is countable if it has the same cardinality as $\mathbb{N}$, the set of natural numbers, or if it is finite. Otherwise, the set is said to be uncountable. Show that a countable set need not contain its sup or its inf. (Countability means all elements of the set can be labeled by the natural numbers: $\{a_1, a_2, a_3, \ldots\}$.)
1.27. Show that the union of a countable collection of countable sets is countable. Hint:
\[
\begin{array}{ccccccc}
S_1: & a_{1,1} & \to & a_{1,2} & & a_{1,3} & \to \cdots \\
 & & \swarrow & & \nearrow & & \swarrow \\
S_2: & a_{2,1} & & a_{2,2} & & a_{2,3} & \cdots \\
 & \downarrow & \nearrow & & \swarrow & & \nearrow \\
S_3: & a_{3,1} & & a_{3,2} & & a_{3,3} & \cdots \\
 & & \swarrow & & \nearrow & & \swarrow \\
S_4: & a_{4,1} & & a_{4,2} & & a_{4,3} & \cdots \\
\vdots & \vdots & & \vdots & & \vdots &
\end{array}
\]
The elements may be counted by the scheme indicated. Deduce that $\mathbb{Q}$ is countable.
1.28. Show that $[0, 1]$ is uncountable by completing this sketch of a proof: Suppose $[0, 1]$ is countable and that $[0, 1] = \{a_1, a_2, \ldots\}$. At least one of the intervals $[0, \frac{1}{3}]$, $[\frac{1}{3}, \frac{2}{3}]$, $[\frac{2}{3}, 1]$ does not contain $a_1$; call this interval $I_1$. Subdivide $I_1$ into three closed intervals; then $a_2$ is not in at least one of those three; call it $I_2$. Continuing in this manner, we obtain a nested sequence of closed intervals $I_n$ with the property that $a_n \notin I_n$. Therefore, none of the $a_n$ are in $\bigcap_{k=1}^{\infty} I_k$. But, by the nested interval theorem, the intersection is non-empty, so there must be an $x \in [0, 1]$ with $x \neq a_n$ for every $n$. This contradicts our assumption.
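The counting scheme hinted at in Exercise 1.27 can be made concrete. The sketch below is a simplified variant, not the exact zigzag of the hint: it walks each anti-diagonal $i + j = s$ in one direction only, which still reaches every index pair exactly once. The generator names `diagonal_pairs` and `positive_rationals` are illustrative; the second generator applies the scheme to the fractions $i/j$ and skips repeats, which is one possible route towards the countability of the positive rationals.

```python
from fractions import Fraction
from itertools import count, islice


def diagonal_pairs():
    """Yield every pair (i, j) of natural numbers, one anti-diagonal at a time."""
    for s in count(2):          # s = i + j runs over 2, 3, 4, ...
        for i in range(1, s):   # i = 1, ..., s - 1 and j = s - i
            yield (i, s - i)


def positive_rationals():
    """List the positive rationals i/j without repetition."""
    seen = set()
    for i, j in diagonal_pairs():
        q = Fraction(i, j)
        if q not in seen:       # e.g. 2/2 repeats 1/1 and is skipped
            seen.add(q)
            yield q


print(list(islice(diagonal_pairs(), 6)))
print([str(q) for q in islice(positive_rationals(), 5)])
```

Each pair $(i, j)$ appears after finitely many steps, namely on the diagonal $s = i + j$, which is the property a counting scheme needs.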
1.2.2 Convexity
Definition 1.27. For two points $x, y \in \mathbb{R}^n$ with $x \neq y$, we define
(i) the line through $x$ and $y$ as $\{x + t(y - x) : t \in \mathbb{R}\}$;
(ii) and the line segment between $x$ and $y$ as the set $\{x + t(y - x) : t \in [0, 1]\}$.
A subset $C$ of $\mathbb{R}^n$ is convex if
\[
x, y \in C \implies x + t(y - x) \in C, \quad \forall t \in [0, 1];
\]
that is, if the line segment between any two points of the set is a subset of $C$.

Example 1.28 $S := \{x : |x| \le 1\}$ is convex.

Proof. If $|x| \le 1$ and $|y| \le 1$, then for $0 \le t \le 1$,
\begin{align*}
|x + t(y - x)| = |(1 - t)x + ty| &\le |(1 - t)x| + |ty| && \text{(triangle inequality)} \\
&= (1 - t)|x| + t|y| \\
&\le (1 - t) + t = 1.
\end{align*}
Thus, $x + t(y - x) \in S$ for $0 \le t \le 1$; that is, $S$ is convex.
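The estimate in Example 1.28 can be spot-checked numerically. The sketch below samples pairs of points in the closed unit ball of $\mathbb{R}^3$ and verifies that a random point of the segment between them stays in the ball. This is only a sanity check on finitely many samples, not a proof; the helper names, the fixed random seed, and the small rounding tolerance are choices made here.

```python
import math
import random


def norm(x):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


def random_point_in_ball(n, rng):
    """Rejection-sample a point with |x| <= 1 from the cube [-1, 1]^n."""
    while True:
        x = [rng.uniform(-1, 1) for _ in range(n)]
        if norm(x) <= 1:
            return x


def segment_point(x, y, t):
    """The point x + t (y - x) of the segment between x and y."""
    return [xi + t * (yi - xi) for xi, yi in zip(x, y)]


rng = random.Random(0)
all_inside = True
for _ in range(1000):
    x = random_point_in_ball(3, rng)
    y = random_point_in_ball(3, rng)
    t = rng.random()  # a parameter in [0, 1)
    # tolerance absorbs floating-point rounding near the boundary
    all_inside = all_inside and norm(segment_point(x, y, t)) <= 1 + 1e-12
print(all_inside)
```

Replacing the ball by the sphere $\{x : |x| = 1\}$ of Exercise 1.29 and taking $t = 1/2$ with $y = -x$ gives a quick counterexample to convexity there.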
Exercises

1.29. Prove that $\{x : |x| = 1\}$ is not convex.
1.30. Prove that $\{(x, y) \in \mathbb{R}^2 : y > 0\}$ is convex.
1.31. Let $\mathcal{C}$ be any collection of convex sets. Show that $\bigcap_{A \in \mathcal{C}} A$ is convex. Is $\bigcup_{A \in \mathcal{C}} A$ necessarily convex?
1.32. The convex hull $H(A)$ of a set $A$ is the intersection of all convex sets containing $A$ as a subset. Prove that $H(A)$ is convex. What is $H(A)$ if $A$ is a set consisting of two points only?
1.33. A subset $C$ of $\mathbb{R}^n$ is a cone if $\{tx : x \in C\} \subset C$ for all $t \ge 0$.
(i) Prove that a cone $C$ is a convex set if and only if $\{x + y : x \in C,\ y \in C\} \subset C$.
(ii) Draw pictures of convex and non-convex cones in $\mathbb{R}^2$.
1.3 Topology

Definition 1.29. If $\rho > 0$ and $x_0 \in \mathbb{R}^n$, then the open ball of center $x_0$ and radius $\rho$ is the set
\[
B(x_0, \rho) := \{x : |x - x_0| < \rho\}.
\]
A neighborhood of $x_0$ is any set $U$ which contains an open ball with center $x_0$ as a subset. A set $A$ is open in $\mathbb{R}^n$ if it is a neighborhood of each of its points.
Example 1.30 Consider the following examples:
1. $\mathbb{R}^n$ is open.
2. $\emptyset$ is open.
3. $(0, 1)$ is open in $\mathbb{R}$.
4. $[0, 1)$ is not open in $\mathbb{R}$.
5. $\{(x, y) : 0 < x < 1,\ y = 1\}$ is not open in $\mathbb{R}^2$.
6. $B(x_0, \rho)$ is open in $\mathbb{R}^n$.

Proof. (1), (2) and (5) are left to the reader. For (3), note that if $x_0 \in (0, 1)$, then $B(x_0, \delta) = (x_0 - \delta, x_0 + \delta) \subset (0, 1)$ when $\delta = \min(x_0, 1 - x_0)$. For (4), note that $B(0, \delta) = (-\delta, \delta)$ is not contained in $[0, 1)$ for any $\delta > 0$. Finally, to see (6), let $x_1 \in B(x_0, \rho)$. We will show that $B(x_1, \rho_1) \subset B(x_0, \rho)$ when $\rho_1 = \rho - |x_1 - x_0| > 0$, so that $B(x_0, \rho)$ is a neighborhood of each of its points, and hence is open. Let $x \in B(x_1, \rho_1)$; then
\begin{align*}
|x - x_0| &= |x - x_1 + x_1 - x_0| \\
&\le |x - x_1| + |x_1 - x_0| && \text{(triangle inequality)} \\
&< \rho_1 + |x_1 - x_0| && (x \in B(x_1, \rho_1)) \\
&= \rho - |x_1 - x_0| + |x_1 - x_0| && \text{(definition of } \rho_1\text{)} \\
&= \rho.
\end{align*}
Thus, $x \in B(x_0, \rho)$, and hence $B(x_1, \rho_1) \subset B(x_0, \rho)$.

Proposition 1.31 (Open set properties). For open sets in $\mathbb{R}^n$, we have
(a) $\emptyset$ and $\mathbb{R}^n$ are open.
(b) If $A$ and $B$ are open sets, then $A \cap B$ is an open set.
(c) The union of any collection of open sets is open.

Proof. For (a), note that both $\emptyset$ and $\mathbb{R}^n$ are neighborhoods of their points (for $\emptyset$, it is true because there are no points). For (b), let $x_0 \in A \cap B$. Then
\begin{align*}
x_0 \in A &\implies \exists \rho_1 \text{ s.t. } B(x_0, \rho_1) \subset A && \text{(since } A \text{ is open)} \\
x_0 \in B &\implies \exists \rho_2 \text{ s.t. } B(x_0, \rho_2) \subset B && \text{(since } B \text{ is open)} \\
&\implies B(x_0, \rho) \subset A \text{ and } B(x_0, \rho) \subset B \text{ for } \rho = \min(\rho_1, \rho_2) \\
&\implies B(x_0, \rho) \subset A \cap B \\
&\implies A \cap B \text{ is open.}
\end{align*}
For (c), let $\mathcal{C}$ be a collection of open sets, and select any $x \in \bigcup_{A \in \mathcal{C}} A$. Then
\begin{align*}
x \in A \text{ for some } A \in \mathcal{C} &\implies B(x, \rho) \subset A \text{ for some } \rho > 0 && (A \text{ is open}) \\
&\implies B(x, \rho) \subset \textstyle\bigcup_{A \in \mathcal{C}} A \\
&\implies \textstyle\bigcup_{A \in \mathcal{C}} A \text{ is open.}
\end{align*}

Definition 1.32. A set $A$ is closed in $\mathbb{R}^n$ if its complement $A^c := \mathbb{R}^n \setminus A := \{x \in \mathbb{R}^n : x \notin A\}$ is an open set.

Proposition 1.33 (Properties of closed sets).
(a) $\emptyset$ and $\mathbb{R}^n$ are closed.
(b) If $A$ and $B$ are closed sets, then $A \cup B$ is a closed set.
(c) The intersection of any collection of closed sets is closed.

Example 1.34 Consider the following examples:
1. $[0, \infty) := \{x : x \ge 0\}$ is closed in $\mathbb{R}$. (Thus, $(-\infty, 0)$ is open.)
2. $[0, 1]$ is closed in $\mathbb{R}$, i.e. $(-\infty, 0) \cup (1, \infty)$ is open.
3. $[0, 1)$ is not closed in $\mathbb{R}$. (Why?)
4. $\{x : |x| \ge 1\}$ is closed in $\mathbb{R}^n$. (Since $B(O, 1)$ is open.)
5. $\{x : |x| \le 1\}$ is closed in $\mathbb{R}^n$. (Exercise.)

Proposition 1.35. For a closed set $C$ and an open set $V$ in $\mathbb{R}^n$,
(a) $C \setminus V$ is closed;
(b) $V \setminus C$ is open.

Proof. For (a), note that
\[
C \setminus V := \{x : x \in C \text{ and } x \notin V\} = C \cap V^c,
\]
which is closed since $C$ and $V^c$ are closed. Part (b) is an exercise.
Figure 1.3. A sequence of bisections.

Notice that there are sets which are neither open nor closed (for example, $[0, 1)$ in $\mathbb{R}$). The sets $\mathbb{R}^n$ and $\emptyset$ are both open and closed. We will see that they are the only sets in $\mathbb{R}^n$ with this property.

Definition 1.36. For a set $S$ in $\mathbb{R}^n$, a point $x_0$ is a cluster point of $S$ if each neighborhood of $x_0$ contains a point $x \in S$ with $x \neq x_0$.

A cluster point of $S$ need not be an element of $S$. For example, the set $S = \{\frac{1}{n} : n = 1, 2, 3, \ldots\}$ has $0$ as a cluster point.

Definition 1.37. A set $S$ in $\mathbb{R}^n$ is bounded if there is a $\rho > 0$ such that
\[
S \subset B(O, \rho) \quad (\implies |x| < \rho,\ \forall x \in S).
\]
Equivalently, $S$ is bounded if it is contained in some closed interval in $\mathbb{R}^n$. For a closed interval $I = [a_1, b_1] \times \ldots \times [a_n, b_n]$ in $\mathbb{R}^n$, the quantity
\[
\lambda(I) := \sqrt{(b_1 - a_1)^2 + \ldots + (b_n - a_n)^2}
\]
will be called the diameter of $I$. Note that if $x, y$ are two points in $I$, then $|x - y| \le \lambda(I)$. The next theorem has important consequences.

Theorem 1.38 (Bolzano-Weierstrass Theorem). Every bounded infinite subset of $\mathbb{R}^n$ has a cluster point.

Proof. Let $K$ be a bounded infinite subset of $\mathbb{R}^n$. Then, by definition, there is a closed interval $I_1$ such that $K \subset I_1$. Bisect the sides of $I_1$ to partition $I_1$ into $2^n$ subintervals. At least one of these new intervals must contain infinitely many points of $K$; pick one of these and call it $I_2$. Bisect the sides of $I_2$. Again, at least one of the $2^n$ subintervals of $I_2$ must contain infinitely many points of $K$; pick one of these and call it $I_3$. Continue this procedure inductively to define a sequence of closed intervals $I_k$ with the properties:
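1. Each $I_k$ contains infinitely many points of $K$.
2. $0 < \lambda(I_k) = \frac{1}{2^{k-1}} \lambda(I_1)$, $k = 1, 2, 3, \ldots$.
3. There is a point $x_0 \in \bigcap_k I_k \neq \emptyset$, by the Nested Interval Property (Theorem 1.22).

We must show that $x_0$ is a cluster point of $K$. If this were not true, then there would be a radius $\rho > 0$ such that the ball $B(x_0, \rho)$ contains no point of $K$ other than possibly $x_0$ itself: $B(x_0, \rho) \cap K \subset \{x_0\}$. Now, choose $k$ so that $0 < \lambda(I_k) < \rho$. Since $x_0 \in I_k$, every $x \in I_k$ satisfies $|x - x_0| \le \lambda(I_k) < \rho$, so $I_k \subset B(x_0, \rho)$. But $I_k$ contains infinitely many points of $K$, a contradiction.

Theorem 1.39. A set $K$ in $\mathbb{R}^n$ is closed if and only if it contains all of its cluster points.

Proof. “$\Longrightarrow$” Let $K$ be closed and let $x_0$ be a cluster point of $K$. If $x_0 \notin K$, then $x_0 \in K^c$, which is open, so there is a $\rho > 0$ so that $B(x_0, \rho) \subset K^c$. That contradicts the definition of $x_0$ being a cluster point of $K$. Hence, $x_0 \in K$.

“$\Longleftarrow$” Let $K$ contain all of its cluster points. If $x_1 \in K^c$, then $x_1 \notin K$, and so $x_1$ is not a cluster point of $K$. Therefore, by definition of cluster points, there exists $\rho > 0$ such that $B(x_1, \rho) \subset K^c$. Thus, $K^c$ is open and $K$ is closed.

Definition 1.40. A collection $\mathcal{G}$ of open sets is an open cover of a set $K$ if
\[
K \subset \bigcup_{U \in \mathcal{G}} U.
\]
A set $K$ in $\mathbb{R}^n$ is compact if every open cover of $K$ has a finite subcover; that is, for any open cover $\mathcal{G}$, there is a finite collection of sets $U_1, \ldots, U_m$ from $\mathcal{G}$ such that $K \subset \bigcup_{j=1}^{m} U_j$.

Example 1.41 The following examples illustrate the concept of compactness:
1. A finite set is compact.
2. $[0, \infty)$ is not compact. Indeed, the sets $G_k := (-1, k)$, $k \in \mathbb{N}$, form an open cover of $[0, \infty)$, but there cannot be a finite subcover. (Why?)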
3. $(0, 1)$ is not compact. (Find an open cover that does not have a finite subcover.)
The next theorem is a very important characterization of compact subsets of $\mathbb{R}^n$.

Theorem 1.42 (Heine-Borel Theorem). A subset $K$ of $\mathbb{R}^n$ is compact if and only if $K$ is closed and bounded.

Proof. "$\Longrightarrow$" Let $K$ be compact. We first show $K$ is closed. Let $x_0$ be any point in $K^c$. We wish to find a ball $B(x_0, \rho) \subset K^c$ for some $\rho > 0$. Then $K^c$ would be open, and hence, $K$ would be closed. Consider the collection $G$ of open sets $G_k$, $k \in \mathbb{N}$, where

$$G_k := \{x : |x - x_0| > \tfrac{1}{k}\}.$$

(Why is $G_k$ open?) Clearly, the collection $G$ covers all of $\mathbb{R}^n \setminus \{x_0\}$, and hence covers $K$. Since $K$ is compact, there is a finite subcover $\{G_{k_i} : i = 1, \ldots, m\}$ of $K$. Now, the $G_j$ are nested with $G_j \subset G_i$ if $i > j$. Hence, letting $k_0 = \max\{k_1, \ldots, k_m\}$, we have $K \subset G_{k_0}$. In other words,

$$B(x_0, \tfrac{1}{k_0}) := \{x : |x - x_0| < \tfrac{1}{k_0}\} \subset \{x : |x - x_0| \le \tfrac{1}{k_0}\} = G_{k_0}^c \subset K^c.$$

Hence, $K^c$ is open and $K$ is closed.

To show that $K$ is bounded, consider the open cover consisting of all balls $B(O, k)$, $k \in \mathbb{N}$. This collection covers all of $\mathbb{R}^n$, and so is an open cover of $K$. Therefore, there is a finite subcover $\{B(O, k_i) : i = 1, \ldots, m\}$ of $K$. Hence, $K \subset B(O, k_0)$ where $k_0 := \max\{k_1, \ldots, k_m\}$. Thus, $K$ is bounded.

"$\Longleftarrow$" Let $K$ be closed and bounded. The proof will be by contradiction. If $K$ is not compact, then there exists an open cover $G = \{G_\alpha\}$ of $K$ such that $K$ is not contained in any union of finitely many $G_\alpha$'s. Since $K$ is bounded, it is contained in some closed interval $I_1$ in $\mathbb{R}^n$. Bisect the sides of $I_1$ to obtain $2^n$ subintervals (as in the proof of the Bolzano-Weierstrass Theorem). At least one of these subintervals intersects $K$ in such a way that this intersection cannot be covered by finitely many $G_\alpha$'s. Select such a subinterval as $I_2$. Proceeding inductively in this way, we obtain a nested sequence of closed intervals $I_k$, $k = 1, 2, \ldots$, such that
1. Each $K \cap I_k$ cannot be covered by finitely many $G_\alpha$'s, $k = 1, 2, \ldots$.
2. $\lambda(I_k) = \lambda(I_1)/2^{k-1}$, $k = 1, 2, \ldots$.

By the Nested Intervals Theorem, there is a point $x_0 \in \bigcap_k I_k$, and this point is a cluster point of $K$ (as in the proof of the Bolzano-Weierstrass Theorem). Since $K$ is closed, $x_0 \in K$. Therefore, $x_0 \in G_{\alpha_0}$ for some $\alpha_0$, since the $G_\alpha$ cover $K$. But, $G_{\alpha_0}$
is open, so there is a $\rho > 0$ so that $B(x_0, \rho) \subset G_{\alpha_0}$. Now, by (2), we can choose $k$ so large that $\lambda(I_k) < \rho$. Hence, $I_k \subset B(x_0, \rho) \subset G_{\alpha_0}$, which contradicts (1).

Figure 1.4. A disconnection.
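The failure of compactness for $[0, \infty)$ in Example 1.41 can be made concrete with a small sketch. The function names `in_G` and `witness` are illustrative choices, not part of the notes; the code only checks the cover $G_k = (-1, k)$ from that example for one given finite family of indices.

```python
def in_G(x, k):
    """Membership test for the open set G_k = (-1, k) of Example 1.41."""
    return -1 < x < k


def witness(indices):
    """Return a point of [0, oo) missed by the finite family {G_k : k in indices}.

    The largest index k0 itself works: k0 >= 0, but k0 is not in (-1, k)
    for any k <= k0.
    """
    return max(indices)


if __name__ == "__main__":
    family = [3, 7, 2]
    w = witness(family)
    print(w, [in_G(w, k) for k in family])
```

Whatever finite family of indices is supplied, the returned point lies in $[0, \infty)$ but in none of the chosen sets, which is exactly why no finite subcover exists.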
Definition 1.43. A subset $D$ of $\mathbb{R}^n$ is disconnected if there are open sets $A$ and $B$ such that
(i) $A \cap D \neq \emptyset$, $B \cap D \neq \emptyset$.
(ii) $(A \cap D) \cap (B \cap D) = \emptyset$.
(iii) $(A \cap D) \cup (B \cap D) = D$.
The sets $A$ and $B$ are called a disconnection of $D$. A subset $D$ of $\mathbb{R}^n$ is said to be connected if it is not disconnected.

Example 1.44
1. The set $\mathbb{N} = \{1, 2, 3, \ldots\}$ is disconnected. A disconnection is given, for example, by $(A, B) = ((-\infty, \frac{3}{2}), (\frac{3}{2}, \infty))$.
2. $\mathbb{Q}$ is disconnected, with a disconnection given by $(A, B) = ((-\infty, \sqrt{2}), (\sqrt{2}, \infty))$.
3. $[0, 1]$ is connected.

Proof. We will prove that $[0, 1]$ is connected. Suppose it were disconnected and $(A, B)$ provides the disconnection:
(i) $A \cap [0, 1] \neq \emptyset$, $B \cap [0, 1] \neq \emptyset$.
(ii) $(A \cap [0, 1]) \cap (B \cap [0, 1]) = \emptyset$.
(iii) $(A \cap [0, 1]) \cup (B \cap [0, 1]) = [0, 1]$.
Figure 1.5. The line determined by x and y and a disconnection of R.

Now, $1$ belongs to one of the sets, say $1 \in B$. Let

$$c := \sup (A \cap [0, 1]),$$

so that $0 \le c \le 1$. Thus, by (iii), $c \in A \cup B$, so $c \in A$ or $c \in B$ (but not both, by (ii)). If $c \in A$, then $c \in A \cap [0, 1]$, and $1 \in B$ implies $c \in A \cap [0, 1)$. Since $A$ is open and $c < 1$, there is a $\delta > 0$ with $[c, c + \delta) \subset A \cap [0, 1)$. This contradicts the definition of $c$ as an upper bound. If $c \in B$, then since $B$ is open, there exists a $\delta > 0$ such that $(c - \delta, c] \subset B$. By (ii), no point of $A \cap [0, 1]$ lies in $(c - \delta, c]$, so $c - \delta$ is an upper bound of $A \cap [0, 1]$. But this again contradicts the definition of $c$ as $\sup (A \cap [0, 1])$. In either case, we arrive at a contradiction, so no disconnection of $[0, 1]$ exists.

Theorem 1.45. $\mathbb{R}^n$ is connected.

Proof.
Suppose $(A, B)$ is a disconnection of $\mathbb{R}^n$. Then
(i) $A \neq \emptyset$, $B \neq \emptyset$.
(ii) $A \cap B = \emptyset$.
(iii) $A \cup B = \mathbb{R}^n$.
Choose any points $x \in A$ and $y \in B$, and consider the subsets of $\mathbb{R}$ determined by

$$A_1 := \{t \in \mathbb{R} : x + t(y - x) \in A\}, \qquad B_1 := \{t \in \mathbb{R} : x + t(y - x) \in B\}.$$

Since $A$, $B$ are open in $\mathbb{R}^n$, both $A_1$ and $B_1$ are open in $\mathbb{R}$ (this is not obvious, so think how to show it). Now the statements (i), (ii) and (iii) above imply
(i)$_1$ $A_1 \cap [0, 1] \neq \emptyset$, $B_1 \cap [0, 1] \neq \emptyset$. (Indeed, $0 \in A_1$ and $1 \in B_1$.)
(ii)$_1$ $(A_1 \cap [0, 1]) \cap (B_1 \cap [0, 1]) = \emptyset$.
(iii)$_1$ $(A_1 \cap [0, 1]) \cup (B_1 \cap [0, 1]) = [0, 1]$.
This means that $(A_1, B_1)$ is a disconnection of $[0, 1]$ which, we have seen, is connected. The contradiction implies that $\mathbb{R}^n$ is connected.

Corollary 1.46. The only sets in $\mathbb{R}^n$ which are both open and closed are $\mathbb{R}^n$ and $\emptyset$.

Proof. If the nonempty set $A \neq \mathbb{R}^n$ is both open and closed, then so is $A^c$. Then, $(A, A^c)$ would provide a disconnection of $\mathbb{R}^n$, contradicting Theorem 1.45.

More restricted notions of connectedness are sometimes used. A set $C$ is polygonally connected if, for each pair of points $x_0, x_m$ in $C$, there is a finite subset $\{x_1, \ldots, x_{m-1}\}$ of $C$ such that the polygon $\{x_{i-1} + t(x_i - x_{i-1}) : t \in [0, 1],\ i = 1, 2, \ldots, m\}$ is a subset of $C$. A set $C$ is arcwise connected if, for each pair of points $x, y \in C$, there is a path joining $x$ to $y$ lying entirely in $C$. That is, there is a continuous function $f : [0, 1] \to C$ such that $f(0) = x$ and $f(1) = y$. Either polygonally connected or arcwise connected will imply the set is connected in the sense defined here. However, the converses are not true. For example, the set $\{(x, y) \in \mathbb{R}^2 : x \neq 0,\ 0 < y \le x^2\} \cup \{(0, 0)\}$ is connected (in fact, arcwise connected), but is not polygonally connected. The set $\{(x, y) \in \mathbb{R}^2 : y = \sin(\frac{1}{x}),\ x \neq 0\} \cup \{(0, y) : -1 \le y \le 1\}$ is connected, but is not arcwise connected.
Exercises

1.33. Property (b) of Proposition 1.31 implies that the intersection of any finite collection of open sets is open. Show that this need not hold for an infinite collection of open sets.
1.34. Prove items (a), (b), and (c) of Proposition 1.33.
1.35. Prove that the two definitions of a bounded set are equivalent.
1.36. Prove that the intersection of any finite collection of open sets is open. Hint: Use Property (b) of open sets and induction.
1.37. Prove that $\{x : |x| \le 1\}$ is closed in $\mathbb{R}^n$.
1.38. Prove that a subset $U$ of $\mathbb{R}^n$ is open if and only if it is the union of a collection of open balls.
1.39. If $A$ is a subset of $\mathbb{R}^n$, then $\overline{A}$, the closure of $A$, is the intersection of all closed sets which contain $A$ as a subset. Show that
(a) $\overline{A}$ is closed.
(b) $A \subset \overline{A}$.
(c) $\overline{\overline{A}} = \overline{A}$.
(d) $\overline{A \cup B} = \overline{A} \cup \overline{B}$.
(e) $\overline{\emptyset} = \emptyset$.
(f) Observe that $\overline{A}$ is the smallest closed set containing $A$.
(g) Prove that $\overline{B(O, 1)} = \{x : |x| \le 1\}$.
(h) If $A$ and $B$ are subsets of $\mathbb{R}$, then is $\overline{A \cap B} = \overline{A} \cap \overline{B}$?
1.40. If $A$ is a subset of $\mathbb{R}^n$, then $A^\circ$, the interior of $A$, is the union of all open sets contained in $A$. Show that
(a) $A^\circ$ is open.
(b) $A^\circ \subset A$.
(c) $(A^\circ)^\circ = A^\circ$.
(d) $(A \cap B)^\circ = A^\circ \cap B^\circ$.
(e) $(\mathbb{R}^n)^\circ = \mathbb{R}^n$.
(f) Observe that $A^\circ$ is the largest open set contained in $A$.
(g) Prove that $\overline{B(O, 1)}^{\,\circ} = B(O, 1)$.
(h) Is there a subset $A \subset \mathbb{R}$ such that $A^\circ = \emptyset$ and $\overline{A} = \mathbb{R}$?
1.41. Let $A \subset \mathbb{R}^n$. The derived set $A'$ of $A$ is the set of all cluster points of $A$.
(a) Prove that $A'$ is closed.
(b) Prove that $\overline{A} = A \cup A'$.
(c) Theorem 1.39 says that $A$ is closed if and only if $A' \subset A$. A set $A$ for which $A = A'$ is called perfect. Give examples of perfect and non-perfect sets.
1.42. If $A \subset \mathbb{R}^n$, then $\partial A$, the boundary of $A$, is the set of all points $x$ such that each neighborhood of $x$ contains a point of $A$ and a point of $A^c$.
(a) Show $A$ is closed if and only if $\partial A \subset A$.
(b) Show that $\partial(\partial A) = \partial A$; hence $\partial A$ is closed.
(c) Show that $\partial A = \overline{A} \setminus A^\circ$.
1.43. For each of the following sets give its closure, interior, derived set, and boundary.
(a) $\{x \in \mathbb{R}^n : 0 < |x| < 1\}$
(b) $\{\frac{1}{n} \in \mathbb{R} : n \in \mathbb{N}\}$
(c) $\{(\frac{1}{n}, \frac{1}{m}) \in \mathbb{R}^2 : n, m \in \mathbb{N}\}$
(d) $\{x \in \mathbb{R}^n : |x| < 1\}$
1.44. Without using the Heine-Borel Theorem show that $\{(x, y) : x^2 + y^2 < 1\}$ is not compact in $\mathbb{R}^2$.
1.45. Let $A$ and $B$ be open in $\mathbb{R}$. Prove that $A \times B$ is open in $\mathbb{R}^2$.
1.46. Show that a finite subset of $\mathbb{R}^n$ is closed.
1.47. Show that a nonempty countable subset of $\mathbb{R}$ is not open. Show that it may or may not be closed.
1.48. Let $S$ be an uncountable subset of $\mathbb{R}$. Show that $S$ has a cluster point. Hint: Show that at least one of the intervals $[n, n + 1]$, $n \in \mathbb{Z}$, must contain uncountably many points of $S$.
1.49. Show that a closed interval is closed.
1.50. Let $\mathbb{Q}^2$ denote the set of points in $\mathbb{R}^2$ with rational coordinates. What is the interior of $\mathbb{Q}^2$? The boundary of $\mathbb{Q}^2$? Show that $\mathbb{Q}^2$ is not connected.
1.51. Show that $\mathbb{Q}^c$ is not connected. Show that $(\mathbb{Q}^2)^c$ is connected, in fact, polygonally connected.
1.52. Show that an open connected set in $\mathbb{R}^n$ is also polygonally connected.
Chapter 2. Limits, Continuity, and Differentiation

2.1 Sequences
Definition 2.1. A sequence in $\mathbb{R}^n$ is a function $p : \mathbb{N} \to \mathbb{R}^n$. Sequences are usually denoted by $\{p_k\}$ where $p_k := p(k)$.

Example 2.2 Two simple sequences:
1. $p_k := \frac{1}{k}$; $\{p_k\}$ is a sequence in $\mathbb{R}$.
2. $p_k := (\frac{1}{k}, \sin(k^2))$; $\{p_k\}$ is a sequence in $\mathbb{R}^2$.

Definition 2.3. A sequence $\{p_k\}$ in $\mathbb{R}^n$ is convergent if there exists $p \in \mathbb{R}^n$ such that for each neighborhood $U$ of $p$, there is a natural number $N = N(U)$ for which $k \ge N$ implies $p_k \in U$. Write $\lim_{k\to\infty} p_k = p$, or $\lim\{p_k\} = p$. Equivalently, $\{p_k\}$ converges if there is a point $p \in \mathbb{R}^n$ such that for each $\epsilon > 0$, there exists an $N$ such that $k \ge N$ implies $|p_k - p| < \epsilon$. The point $p$ is called the limit of the sequence. The sequence $\{p_k\}$ is said to be divergent if it is not convergent.

Proposition 2.4. A convergent sequence cannot have two limits.

Proof. Suppose $\lim\{p_k\} = p$ and $\lim\{p_k\} = q$. If $\epsilon > 0$, then

$$\exists N_1 \text{ such that } k \ge N_1 \Longrightarrow |p_k - p| < \epsilon, \qquad \exists N_2 \text{ such that } k \ge N_2 \Longrightarrow |p_k - q| < \epsilon.$$
If $k = \max\{N_1, N_2\}$, then

$$|q - p| = |q - p_k + p_k - p| \le |p_k - q| + |p_k - p| < 2\epsilon.$$

Therefore, $|q - p| < 2\epsilon$ for each $\epsilon > 0$; thus $|p - q| = 0$ (Archimedean Property). Hence, $p = q$.

Example 2.5 Two simple limits:
1. If $x_k = 1$, $k = 1, 2, 3, \ldots$, then $\lim_{k\to\infty} x_k = 1$.
2. $\lim_{k\to\infty} \frac{1}{k} = 0$ (from the Archimedean Property, Theorem 1.6).
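The $\epsilon$-$N$ form of Definition 2.3 can be illustrated for the sequence $1/k$ of Example 2.5. The sketch below is only an illustration under the assumption that $\epsilon$ is supplied as an exact rational; the helper name `first_index` is hypothetical.

```python
import math
from fractions import Fraction


def first_index(eps):
    """Smallest natural number N such that 1/k < eps for every k >= N.

    Since 1/k < eps is equivalent to k > 1/eps, the smallest such N is
    floor(1/eps) + 1. Exact rational arithmetic avoids rounding issues.
    """
    eps = Fraction(eps)
    return math.floor(1 / eps) + 1


if __name__ == "__main__":
    for eps in (Fraction(1, 10), Fraction(3, 10), Fraction(1, 1000)):
        print(eps, first_index(eps))
```

The existence of such an $N$ for every $\epsilon > 0$ is precisely what the Archimedean Property guarantees.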
Remarks: Notice that if $\lim_{k\to\infty} p_k = p$, then either (a) $p$ is a cluster point of the set $\{p_k : k \in \mathbb{N}\}$, or (b) $\{p_k\}$ is ultimately constant and equal to $p$. (That is, there is an $N$ such that $k \ge N \Longrightarrow p_k = p$.) Note also that $\lim_{k\to\infty} p_k = p$ if and only if $\lim_{k\to\infty} |p_k - p| = 0$.

Theorem 2.6. A convergent sequence is bounded.

Proof. If $\lim_{k\to\infty} p_k = p$, there exists a natural number $N$ such that $k \ge N \Longrightarrow |p_k - p| < 1$. By the triangle inequality,

$$|p_k| - |p| \le |p_k - p| < 1 \text{ if } k \ge N \Longrightarrow |p_k| < |p| + 1 \text{ if } k \ge N \Longrightarrow |p_k| \le \max\{|p_1|, \ldots, |p_{N-1}|, |p| + 1\}, \quad \forall k \in \mathbb{N}.$$
Example 2.7 If $p_k = k$, then $\{p_k\}$ is divergent. By the Archimedean Property, $\{p_k\}$ is unbounded and so is divergent by Theorem 2.6.

Definition 2.8. If $\{k_j\}$ is a sequence of natural numbers such that $k_1 < k_2 < k_3 < \ldots$, then $\{p_{k_j}\}$ is called a subsequence of $\{p_k\}$.

Theorem 2.9. A bounded sequence has a convergent subsequence.

Proof. There are two cases to consider. Either $\{p_k : k \in \mathbb{N}\}$ is a finite set or it is an infinite set. If it is a finite set, then there is at least one value $p$ such that $p_k = p$ for infinitely many $k$; these terms in the sequence form a convergent subsequence of $\{p_k\}$. In the other case, $\{p_k : k \in \mathbb{N}\}$ is a bounded infinite set and so, by the Bolzano-Weierstrass Theorem 1.38, it has a cluster point $p$. For the ball $B(p, 1)$, there is a $k_1$ such that $p_{k_1} \in B(p, 1)$. Suppose $k_j$ is such that $p_{k_j} \in B(p, \frac{1}{j})$; then
there must exist $k_{j+1} > k_j$ such that $p_{k_{j+1}} \in B(p, \frac{1}{j+1})$ (each neighborhood of a cluster point contains infinitely many points of the set). So, by induction, there exists a subsequence $\{p_{k_j}\}$ of $\{p_k\}$ such that $|p_{k_j} - p| < \frac{1}{j}$, $j = 1, 2, 3, \ldots$. The Archimedean Property implies $\lim_{j\to\infty} p_{k_j} = p$.

Corollary 2.10. If $K \subset \mathbb{R}^n$, then $K$ is compact if and only if each sequence of points in $K$ has a subsequence convergent to a point in $K$.

Proof. "$\Longrightarrow$" The Heine-Borel Theorem shows that $K$ is compact if and only if it is closed and bounded. Thus, any sequence of points in $K$ is bounded, and so by Theorem 2.9 contains a convergent subsequence. The limit of this subsequence is either a cluster point of the subsequence (and hence of $K$), and so is contained in $K$ since $K$ is closed, or else the subsequence is ultimately constant, so its limit is one of the members of the sequence and is in $K$.

"$\Longleftarrow$" Suppose each sequence in $K$ has a subsequence convergent to a point of $K$. To show that $K$ is closed, suppose it is not. Then there exists a cluster point $p$ of $K$ such that $p \notin K$. Thus, there exists a sequence of points $\{p_k\}$ from $K$ ($p_k$ chosen in $B(p, \frac{1}{k}) \cap K$) such that $\lim_{k\to\infty} p_k = p \notin K$. Every subsequence of $\{p_k\}$ then also converges to $p \notin K$, contradicting our hypothesis that some subsequence has its limit in $K$. To show that $K$ is bounded, again suppose it is not. Then there exist $p_k \in K$ such that $|p_k| > k$, $k = 1, 2, 3, 4, \ldots$. Every subsequence of $\{p_k\}$ is unbounded and so cannot converge, contradicting our hypothesis. Thus, $K$ is closed and bounded, and thus compact, by the Heine-Borel Theorem.
Theorem 2.11. A sequence is convergent to a limit $p$ if and only if each subsequence is convergent and has limit $p$.

Proof. "$\Longrightarrow$" Suppose $\lim_{k\to\infty} p_k = p$, that is, for each $\epsilon > 0$ there exists $N$ such that $j \ge N \Longrightarrow |p_j - p| < \epsilon$. If $\{p_{k_j}\}$ is a subsequence of $\{p_k\}$, then $j \ge N \Longrightarrow k_j \ge j \ge N \Longrightarrow |p_{k_j} - p| < \epsilon$. Thus, $\lim_{j\to\infty} p_{k_j} = p$.

"$\Longleftarrow$" Suppose each subsequence $\{p_{k_j}\}$ of $\{p_k\}$ satisfies $\lim_{j\to\infty} p_{k_j} = p$. But $\{p_k\}$ is a subsequence of itself.

Example 2.12 If $p_k = (-1)^k$, then $\{p_k\}$ is divergent. Indeed,

$$\lim_{k\to\infty} p_{2k} = 1, \qquad \lim_{k\to\infty} p_{2k+1} = -1.$$

If $\{p_k\}$ were convergent, these two limits would be the same number by the previous theorem.
Theorem 2.13. If $\{p_k\}$ and $\{q_k\}$ are sequences in $\mathbb{R}^n$ such that

$$\lim_{k\to\infty} p_k = p \quad \text{and} \quad \lim_{k\to\infty} q_k = q,$$

then
(i) $\lim_{k\to\infty} (p_k + q_k) = p + q$,
(ii) $\lim_{k\to\infty} (p_k \cdot q_k) = p \cdot q$.
Further, if $\{x_k\}$ is a sequence in $\mathbb{R}$ such that $\lim_{k\to\infty} x_k = x$, then
(iii) $\lim_{k\to\infty} x_k p_k = x p$, and
(iv) $\lim_{k\to\infty} \frac{1}{x_k} p_k = \frac{1}{x} p$, if $x_k, x \neq 0$.

Proof. Exercise.

The following result shows that it is sufficient to consider only sequences in $\mathbb{R}$ when considering convergence or divergence.

Theorem 2.14. Let $p_k = (x_{1,k}, \ldots, x_{n,k}) \in \mathbb{R}^n$. Then $\{p_k\}$ is convergent if and only if each of the sequences $\{x_{i,k}\}$ is convergent for $i = 1, 2, \ldots, n$. Furthermore,

$$\lim_{k\to\infty} p_k = p = (x_1, \ldots, x_n) \iff \lim_{k\to\infty} x_{i,k} = x_i, \quad i = 1, 2, \ldots, n.$$

Proof. "$\Longrightarrow$" $|p_k - p| \ge |x_{i,k} - x_i|$, for all $k \in \mathbb{N}$ and for $i = 1, 2, \ldots, n$. The remainder of the proof is an exercise.

"$\Longleftarrow$" $|p_k - p| \le \sqrt{n} \max\{|x_{i,k} - x_i| : i = 1, \ldots, n\}$. Thus, if $\lim_{k\to\infty} x_{i,k} = x_i$, $i = 1, \ldots, n$, then given any $\epsilon > 0$, there exists $N_i$ such that $k \ge N_i \Longrightarrow |x_{i,k} - x_i| < \frac{\epsilon}{\sqrt{n}}$, for each $i = 1, \ldots, n$. Let $N = \max\{N_i : i = 1, \ldots, n\}$, so $k \ge N$ implies

$$|x_{i,k} - x_i| < \frac{\epsilon}{\sqrt{n}}, \quad i = 1, \ldots, n \Longrightarrow |p_k - p| < \epsilon \Longrightarrow \lim_{k\to\infty} p_k = p.$$
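The two inequalities used in the proof of Theorem 2.14, $\max_i |x_i - y_i| \le |p - q| \le \sqrt{n}\,\max_i |x_i - y_i|$, can be spot-checked numerically. This is a sketch only: the function names are hypothetical, and a small tolerance `tol` is an assumption introduced to absorb floating-point rounding.

```python
import math
import random


def norm(v):
    """Euclidean norm |v| of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def coordinate_bounds_hold(p, q, tol=1e-9):
    """Check max_i |p_i - q_i| <= |p - q| <= sqrt(n) * max_i |p_i - q_i|."""
    d = [a - b for a, b in zip(p, q)]
    biggest = max(abs(x) for x in d)
    n = len(d)
    return biggest <= norm(d) + tol and norm(d) <= math.sqrt(n) * biggest + tol


if __name__ == "__main__":
    random.seed(0)
    p = [random.uniform(-10, 10) for _ in range(4)]
    q = [random.uniform(-10, 10) for _ in range(4)]
    print(coordinate_bounds_hold(p, q))
```

A numerical check on sample points is of course not a proof; it only illustrates why convergence in $\mathbb{R}^n$ and coordinatewise convergence control each other.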
Example 2.15 Further examples of limits of sequences:

(1) $\lim_{k\to\infty} (\frac{1}{k}, \frac{1}{k^2} + 1) = (0, 1)$. Use $\lim_{k\to\infty} \frac{1}{k} = 0$ (Archimedean Property) to note that $\lim_{k\to\infty} \frac{1}{k^2} = 0$ and $\lim_{k\to\infty} (\frac{1}{k^2} + 1) = 1$ (both by Theorem 2.13), and then apply Theorem 2.14 to complete the example.

(2) $\lim_{k\to\infty} \frac{2k+3}{k+2} = 2$. Indeed,

$$\frac{2k + 3}{k + 2} = \frac{2 + \frac{3}{k}}{1 + \frac{2}{k}} \quad \text{and} \quad \lim_{k\to\infty} \frac{1}{k} = 0 \Longrightarrow \lim_{k\to\infty} \frac{3}{k} = 0, \quad \lim_{k\to\infty} \frac{2}{k} = 0,$$

and the result follows by an application of Theorem 2.13.
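As a quick sanity check of Example 2.15 (2), exact rational arithmetic shows that the distance from $\frac{2k+3}{k+2}$ to the limit $2$ is exactly $\frac{1}{k+2}$. The function name `a` below is an illustrative choice.

```python
from fractions import Fraction


def a(k):
    """The k-th term (2k + 3)/(k + 2) of Example 2.15 (2), as an exact rational."""
    return Fraction(2 * k + 3, k + 2)


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        # 2 - a(k) simplifies to 1/(k + 2), which tends to 0.
        print(k, a(k), 2 - a(k))
```

Since $|a_k - 2| = \frac{1}{k+2} < \epsilon$ as soon as $k > \frac{1}{\epsilon} - 2$, an explicit $N$ for any given $\epsilon$ can be read off directly.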
Exercises

2.1. Show that the two definitions of the limit of a sequence are equivalent.
2.2. Prove Theorem 2.13.
2.3. Finish the proof of "$\Longrightarrow$" in Theorem 2.14.
2.4. Let $x_k \le z_k \le y_k$. Prove that if $\{x_k\}$ and $\{y_k\}$ are convergent with limit $c$, then $\{z_k\}$ is convergent with limit $c$.
2.5. Discuss the convergence or divergence of the sequences whose $k$th terms are given by:
(a) $\frac{k}{k+1}$
(b) $\frac{(-1)^k k}{k+1}$
(c) $\frac{2k}{3k^2+1}$
(d) $\frac{2k^2+3}{3k^2+1}$
(e) $(\frac{1}{k}, k)$
(f) $((-1)^k, \frac{1}{k})$
2.6. If $\{x_k\}$ is a sequence of nonnegative real numbers which converges to $x$, show that $\lim_{k\to\infty} \sqrt{x_k} = \sqrt{x}$. HINT: $\sqrt{x_k} - \sqrt{x} = \frac{x_k - x}{\sqrt{x_k} + \sqrt{x}}$.
2.7. Let $x_k = \sqrt{k+1} - \sqrt{k}$. Discuss the convergence of $\{x_k\}$ and $\{\sqrt{k}\, x_k\}$.
2.8. Show that a set $C$ in $\mathbb{R}^n$ is closed if and only if each convergent sequence in $C$ has its limit in $C$.
2.9. Show $\lim_{k\to\infty} p_k = p$ if and only if $\lim_{k\to\infty} |p_k - p| = 0$.
2.10. If $0 < r < 1$, show that $\lim_{k\to\infty} r^k = 0$. HINT: Set $r = \frac{1}{1+s}$, $s > 0$. Show that $(1 + s)^k \ge 1 + ks$, $k = 1, 2, \ldots$, so $0 < r^k = \frac{1}{(1+s)^k} \le \frac{1}{1+ks}$. From this show that

$$\lim_{k\to\infty} r^k = \begin{cases} 0, & \text{if } |r| < 1; \\ 1, & \text{if } r = 1, \end{cases}$$

and $\{r^k\}$ is divergent if $r = -1$ or $|r| > 1$.
2.11. (Ratio Test) Let $\{x_k\}$ be such that $\lim_{k\to\infty} \frac{x_{k+1}}{x_k} = r$. Show
(a) If $0 \le r < 1$, then $\lim_{k\to\infty} x_k = 0$.
(b) If $r > 1$, then $\{x_k\}$ is divergent.
(c) If $r = 1$, give examples which show that $\{x_k\}$ may be convergent or divergent.
HINT: In (a), if $r < s < 1$, show that for some constant $A$ and all large enough $k$, $0 \le |x_k| \le A s^k$ (using the previous exercise).
2.12. Show that the sequence $\{\frac{x^k}{k}\}$ has limit $0$ if $-1 \le x \le 1$, and is divergent if $|x| > 1$.
2.13. Show that $\lim_{k\to\infty} \frac{x^k}{k!} = 0$ for all real numbers $x$.
2.14. If $x > 0$ show $\lim_{k\to\infty} x^{1/k} = 1$. HINT: If $0 < x < 1$, given $\epsilon > 0$ there exists $N$ such that $k \ge N$ implies $\frac{1}{1+\epsilon k} < x < 1$. (Why?) Therefore $\frac{1}{(1+\epsilon)^k} \le \frac{1}{1+\epsilon k} < x < 1$ [...]
2.15. [...] $\ge 0$. Show $k = (1 + x_k)^k > \frac{k(k-1)}{2} x_k^2$, and hence, $\lim_{k\to\infty} x_k = 0$.
2.16. (Root Test) Let $\{x_k\}$ be such that $\lim_{k\to\infty} |x_k|^{1/k} = r$. Show
(a) If $0 \le r < 1$, then $\lim_{k\to\infty} x_k = 0$.
(b) If $r > 1$, then $\{x_k\}$ is divergent.
(c) If $r = 1$, then $\{x_k\}$ may be either convergent or divergent.
2.17. If $a$ and $b$ are nonnegative real numbers, show that $\lim_{k\to\infty} (a^k + b^k)^{1/k} = \max\{a, b\}$.
Definition 2.16. A real sequence $\{x_k\}$ is increasing if $x_1 \le x_2 \le x_3 \le \ldots$, and is decreasing if $x_1 \ge x_2 \ge x_3 \ge \ldots$. The sequence is monotone if it is either increasing or decreasing.

Theorem 2.17. A monotone sequence is convergent if and only if it is bounded.

Proof. Let $\{x_k\}$ be an increasing sequence. By Theorem 2.6, $\{x_k\}$ convergent implies $\{x_k\}$ is bounded. On the other hand, $\{x_k\}$ bounded implies $\sup\{x_k : k = 1, 2, \ldots\} = x$ exists. If $\epsilon > 0$, then $x - \epsilon$ is not an upper bound for $\{x_k\}$, so there exists an $N$ such that $x - \epsilon < x_N \le x$. Since $\{x_k\}$ is increasing, $k \ge N \Longrightarrow x - \epsilon < x_N \le x_k \le x$, that is, $k \ge N \Longrightarrow |x_k - x| < \epsilon$. Thus, the increasing sequence converges to its supremum. The case of a decreasing sequence is similar.

Remark: Given any ordered field $F$, each monotone bounded sequence in $F$ is convergent (with its limit in $F$) if and only if $F$ is complete.

Example 2.18
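1. $\lim_{k\to\infty} r^k = 0$ if $0 \le r < 1$. The case $r = 0$ is trivial. For $0 < r < 1$, note that $0 < r^{k+1} = r\, r^k \le r^k < 1$, so $\{r^k\}$ is decreasing and bounded, hence is convergent with $0 \le \lim_{k\to\infty} r^k < 1$. Now $\{r^{k+1}\}$ is a subsequence of $\{r^k\}$, so they have the same limit. But $\lim_{k\to\infty} r^k > 0$ implies

$$\lim_{k\to\infty} r^{k+1} = r \lim_{k\to\infty} r^k < \lim_{k\to\infty} r^k,$$

a contradiction. (A different proof is given in the exercises.)

2. If $a_k = \frac{k}{2^k}$, then $\lim_{k\to\infty} a_k = 0$. Indeed,

$$a_{k+1} = \frac{k+1}{2^{k+1}} = \frac{k+1}{2k}\, a_k \Longrightarrow 0 < a_{k+1} \le a_k.$$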
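Hence, $\{a_k\}$ is convergent and $\lim_{k\to\infty} a_k \ge 0$. Now,

$$\lim_{k\to\infty} a_k = \lim_{k\to\infty} a_{k+1} = \lim_{k\to\infty} \frac{k+1}{2k}\, a_k = \frac{1}{2} \lim_{k\to\infty} a_k$$

implies $\lim_{k\to\infty} a_k$ must be zero.

3. For $a_k = (1 + \frac{1}{k})^k$, the sequence $\{a_k\}$ is convergent and $2 \le \lim_{k\to\infty} a_k \le 3$. Indeed, by the binomial theorem

$$a_k = \left(1 + \frac{1}{k}\right)^k = 1 + \frac{k}{1!}\frac{1}{k} + \frac{k(k-1)}{2!}\frac{1}{k^2} + \ldots + \frac{k!}{k!}\frac{1}{k^k}$$
$$= 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{k}\right) + \frac{1}{3!}\left(1 - \frac{1}{k}\right)\left(1 - \frac{2}{k}\right) + \ldots + \frac{1}{k!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{k-1}{k}\right).$$

Notice that the typical term after the first two has the form

$$\frac{1}{\ell!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{\ell-1}{k}\right)$$

and this increases in $k$. Thus, each term of $a_k$ is no greater than the corresponding term in $a_{k+1}$, and $a_{k+1}$ has one more term. Thus, $a_{k+1} > a_k$ and $\{a_k\}$ is increasing. We next show $\{a_k\}$ is bounded. Examining the above, we find

$$2 \le a_k \le 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \ldots + \frac{1}{k!} \le 1 + 1 + \frac{1}{2} + \frac{1}{2^2} + \ldots + \frac{1}{2^{k-1}} \le 3.$$

Thus, $2 \le \lim_{k\to\infty} (1 + \frac{1}{k})^k \le 3$. This limit is sometimes taken as the definition of $e$.

Note: We used the fact that $1 + r + r^2 + \ldots + r^k = \frac{1 - r^{k+1}}{1 - r}$ if $r \neq 1$. You will recall that this is easily proved by denoting the left-hand side by $s_k$ and showing $s_k - r s_k = 1 - r^{k+1}$.

4. If $A > 0$, $x_1 > 0$, and $x_{k+1} = \frac{1}{2}(x_k + \frac{A}{x_k})$, $k = 1, 2, \ldots$, then $\lim_{k\to\infty} x_k = \sqrt{A}$. This is an algorithm for computing square roots. Notice that $x_1 > 0 \Longrightarrow x_k > 0$ by induction. Then

$$x_{k+1}^2 = \frac{1}{4}\left(x_k^2 + 2A + \frac{A^2}{x_k^2}\right) \Longrightarrow x_{k+1}^2 - A = \frac{1}{4}\left(x_k^2 - 2A + \frac{A^2}{x_k^2}\right) = \frac{1}{4}\left(x_k - \frac{A}{x_k}\right)^2 \ge 0$$
$$\Longrightarrow x_{k+1}^2 \ge A \Longrightarrow x_{k+1} \ge \sqrt{A}, \quad k = 1, 2, \ldots.$$

Hence, $x_k \ge \sqrt{A}$, for $k = 2, 3, \ldots$. Thus, for $k \ge 2$,

$$x_k - x_{k+1} = x_k - \frac{1}{2}\left(x_k + \frac{A}{x_k}\right) = \frac{1}{2}\left(x_k - \frac{A}{x_k}\right) = \frac{1}{2}\, \frac{x_k^2 - A}{x_k} \ge 0.$$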
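So, $\{x_k\}$ is decreasing (from $k = 2$ on) and bounded below by $\sqrt{A} > 0$. Therefore, by Theorem 2.17, $\lim_{k\to\infty} x_k = L \ge \sqrt{A}$. But, on the other hand,

$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{A}{x_k}\right) \Longrightarrow L = \frac{1}{2}\left(L + \frac{A}{L}\right) \Longrightarrow L^2 = A \Longrightarrow L = \sqrt{A}.$$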
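The square-root recursion in item 4 of Example 2.18 is easy to run. The sketch below uses exact rationals so that the monotonicity and the lower bound $x_k^2 \ge A$ can be observed without rounding effects; the function name `sqrt_iterates` and the sample values $A = 2$, $x_1 = 5$ are illustrative assumptions.

```python
from fractions import Fraction


def sqrt_iterates(A, x1, steps):
    """Return [x_1, ..., x_{steps+1}] for the recursion x_{k+1} = (x_k + A/x_k)/2."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + A / x) / 2)
    return xs


if __name__ == "__main__":
    xs = sqrt_iterates(Fraction(2), Fraction(5), 6)
    for k, x in enumerate(xs, start=1):
        # x_k^2 - A is nonnegative for k >= 2 and shrinks rapidly.
        print(k, float(x), float(x * x - 2))
```

With these sample values the printed differences $x_k^2 - A$ stay nonnegative from $k = 2$ on and decrease very quickly, in line with the argument above.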
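The results on monotone sequences are interesting in that, unlike the preceding examples, it is not necessary first to guess the limit of a sequence in order to show that it converges. Fortunately, we are able to do this in general.

Definition 2.19. A sequence $\{p_k\}$ in $\mathbb{R}^n$ is a Cauchy sequence if, for each $\epsilon > 0$, there exists a natural number $N = N(\epsilon)$ such that if $k, m \ge N$, then $|p_k - p_m| < \epsilon$.

Theorem 2.20 (Cauchy Criterion). $\{p_k\}$ is convergent if and only if $\{p_k\}$ is a Cauchy sequence.

Proof. "$\Longrightarrow$" Suppose $\lim_{k\to\infty} p_k = p$. Thus, for each $\epsilon > 0$, there exists $N$ such that $k \ge N \Longrightarrow |p_k - p| < \epsilon/2$. Hence, if $k, m \ge N$, we have

$$|p_k - p_m| = |p_k - p + p - p_m| \le |p_k - p| + |p - p_m| < \epsilon/2 + \epsilon/2 = \epsilon.$$

"$\Longleftarrow$" Suppose $\{p_k\}$ is a Cauchy sequence. We first show that $\{p_k\}$ is bounded. Taking $\epsilon = 1$, there is an $N$ such that $k \ge N \Longrightarrow |p_k - p_N| < 1 \Longrightarrow |p_k| < |p_N| + 1$. Hence, $|p_k| \le \max\{|p_1|, \ldots, |p_{N-1}|, |p_N| + 1\}$ for all $k$. By Theorem 2.9, $\{p_k\}$ has a convergent subsequence. Suppose now that $\{p_{k_j}\}$ is the subsequence that converges, with limit $p$. We claim $\lim_{j\to\infty} p_{k_j} = p \Longrightarrow \lim_{k\to\infty} p_k = p$. Indeed, since $\lim_{j\to\infty} p_{k_j} = p$, for each $\epsilon > 0$ there is an $N_1$ such that $j \ge N_1 \Longrightarrow |p_{k_j} - p| < \epsilon/2$. Since $\{p_k\}$ is a Cauchy sequence, there is also an $N_2$ such that $m, k \ge N_2 \Longrightarrow |p_k - p_m| < \epsilon/2$. Fix any $j \ge \max\{N_1, N_2\}$, so that $k_j \ge j \ge N_2$. Then for $k \ge \max\{N_1, N_2\}$, we have

$$|p_k - p| = |p_k - p_{k_j} + p_{k_j} - p| \le |p_k - p_{k_j}| + |p_{k_j} - p| < \epsilon/2 + \epsilon/2 = \epsilon.$$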
2.25. Show that $\lim_{k\to\infty} x^{1/k} = 1$ if $x > 0$ (Exercise 15). An easier proof is now available to us from our results on monotone sequences. Show $\{x^{1/k}\}$ is monotone and bounded, hence convergent. Consider the subsequence $\{x^{1/(2k)}\}$ and deduce the result.
2.26. Let $\{x_k\}$ be a sequence of positive real numbers. Show that

$$\lim_{k\to\infty} \frac{x_{k+1}}{x_k} = r \Longrightarrow \lim_{k\to\infty} x_k^{1/k} = r.$$

Hint: If $r > 0$ and $\epsilon > 0$, then there exist positive numbers $A$, $B$ and a natural number $N$ such that

$$A(r - \epsilon)^k \le x_k \le B(r + \epsilon)^k, \quad \forall\, k > N.$$

2.27. By applying the last exercise to the sequence $\{\frac{k^k}{k!}\}$ show that

$$\lim_{k\to\infty} \frac{k}{(k!)^{1/k}} = e, \quad \text{where} \quad e = \lim_{k\to\infty} \left(1 + \frac{1}{k}\right)^k.$$

2.28. Let $s_k = (1 + \frac{1}{k})^k$, $t_k = 1 + \frac{1}{1!} + \frac{1}{2!} + \ldots + \frac{1}{k!}$. We have seen that both sequences are convergent. Show that they have the same limit. Hint: First use the Binomial Theorem to show

$$s_k = 1 + \frac{1}{1!} + \frac{1}{2!}\left(1 - \frac{1}{k}\right) + \ldots + \frac{1}{k!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{k-1}{k}\right) \le t_k.$$

Next, with $m$ fixed, and $k \ge m$, show that

$$s_k \ge 1 + \frac{1}{1!} + \frac{1}{2!}\left(1 - \frac{1}{k}\right) + \ldots + \frac{1}{m!}\left(1 - \frac{1}{k}\right) \cdots \left(1 - \frac{m-1}{k}\right)$$

and deduce that $\lim_{k\to\infty} s_k \ge t_m$ for each $m$.
2.29. Show that every sequence of real numbers has a monotone subsequence. (Be careful.)
2.30. Let $F$ be an ordered field. Show that the following statements are equivalent.
(a) $F$ is complete (sets bounded above have a supremum in $F$).
(b) $F$ has the nested interval property.
(c) Each bounded monotone sequence in $F$ converges to an element of $F$.
(d) Each Cauchy sequence in $F$ converges to an element of $F$.

Discussion: All four statements in this problem are equivalent. However, (a), (b), (c) are inapplicable if one wishes to consider completeness for sets which are not ordered, whereas (d) is applicable to any set $X$ for which distance between points, i.e., a metric, has been defined. A function $\rho : X \times X \to \mathbb{R}$ is a metric if
(i) $\rho(x, y) \ge 0$, $\forall\, x, y \in X$.
(ii) $\rho(x, y) = 0 \iff x = y$.
(iii) $\rho(x, y) = \rho(y, x)$, $\forall\, x, y \in X$.
(iv) $\rho(x, y) \le \rho(x, z) + \rho(z, y)$ for all $x, y, z \in X$. (Triangle inequality)

The pair $(X, \rho)$ is called a metric space. A sequence $\{x_k\}$ in $X$ is a Cauchy sequence (or fundamental sequence) if, for each $\epsilon > 0$, there exists a natural number $N$ such that $m, k \ge N \Longrightarrow \rho(x_m, x_k) < \epsilon$. A metric space $(X, \rho)$ is called complete if each Cauchy sequence in $(X, \rho)$ is convergent (i.e., there exists $x \in X$ such that $\lim_{k\to\infty} \rho(x_k, x) = 0$ whenever $\{x_k\}$ is a Cauchy sequence). For example, the space $(\mathbb{R}^n, \rho)$ is a complete metric space if $\rho(p, q) = |p - q|$. It is easily seen that this $\rho$ satisfies properties (i), (ii), (iii), and (iv) above, and Theorem 2.20 implies the completeness of $\mathbb{R}^n$.
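The four metric axioms can be spot-checked on a finite sample of points. The sketch below is illustrative only: `euclidean` implements $\rho(p, q) = |p - q|$, while `satisfies_metric_axioms` and the tolerance `tol` (used to absorb floating-point rounding) are hypothetical helpers. A finite check can refute the axioms for a candidate function, but it cannot prove them.

```python
import itertools
import math


def euclidean(p, q):
    """rho(p, q) = |p - q| for points given as tuples of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def satisfies_metric_axioms(points, d, tol=1e-12):
    """Check axioms (i)-(iv) for d on every pair and triple from `points`."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:                      # (i) nonnegativity
            return False
        if (d(x, y) <= tol) != (x == y):     # (ii) zero exactly when x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # (iii) symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:  # (iv) triangle inequality
            return False
    return True


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (-2.0, 5.0)]
    print(satisfies_metric_axioms(pts, euclidean))
    # The squared distance on R violates the triangle inequality.
    squared = lambda p, q: (p[0] - q[0]) ** 2
    print(satisfies_metric_axioms([(0.0,), (1.0,), (2.0,)], squared))
```

The second call shows why the triangle inequality is a genuine restriction: the squared distance is nonnegative, symmetric, and vanishes only on the diagonal, yet it is not a metric.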
2.2 Continuity
Let $f : \mathbb{R}^n \to \mathbb{R}^m$, and let $D \subset \mathbb{R}^n$ be the domain of $f$.

Definition 2.22. Suppose $p_0 \in D$. We say $f$ is continuous at $p_0$ if, for each neighborhood $V$ of $f(p_0)$, there exists a neighborhood $U$ of $p_0$ such that $p \in U \cap D \Longrightarrow f(p) \in V$ (i.e. $f(U) \subset V$). Equivalently, $f$ is continuous at $p_0$ if, for each $\epsilon > 0$, there is a $\delta > 0$ such that

$$p \in D \quad \text{and} \quad |p - p_0| < \delta \Longrightarrow |f(p) - f(p_0)| < \epsilon.$$

In general, $\delta = \delta(\epsilon, p_0)$.

Definition 2.23. The function $f$ is continuous on $D$ if it is continuous at each point $p_0$ of $D$.

Example 2.24 Let $f(x) = x^2$, $-1 \le x \le 1$. Then $D = [-1, 1]$, $f(D) = [0, 1]$, and

$$|f(x) - f(x_0)| = |x^2 - x_0^2| = |(x - x_0)(x + x_0)| \le |x - x_0|(|x| + |x_0|) \Longrightarrow |f(x) - f(x_0)| \le 2|x - x_0| \text{ for } x, x_0 \in [-1, 1].$$

If $\epsilon > 0$, take $\delta = \epsilon/2$; then $|x - x_0| < \delta$ and $x, x_0 \in [-1, 1]$ implies $|f(x) - f(x_0)| < \epsilon$, so $f$ is continuous on $[-1, 1]$.
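The choice $\delta = \epsilon/2$ in Example 2.24 can be tested on a grid of rational points in $[-1, 1]$. This is a sketch under stated assumptions: the names `f`, `delta`, and `delta_works`, the grid spacing, and the sample $\epsilon$ are illustrative, and exact fractions are used so the strict inequalities are checked without rounding.

```python
from fractions import Fraction


def f(x):
    """The function f(x) = x^2 of Example 2.24."""
    return x * x


def delta(eps):
    """The delta chosen in Example 2.24 (independent of x0 on [-1, 1])."""
    return eps / 2


def delta_works(eps, grid):
    """Check |x - x0| < delta(eps) implies |f(x) - f(x0)| < eps on the grid."""
    return all(
        abs(f(x) - f(x0)) < eps
        for x in grid
        for x0 in grid
        if abs(x - x0) < delta(eps)
    )


if __name__ == "__main__":
    grid = [Fraction(i, 50) for i in range(-50, 51)]
    print(delta_works(Fraction(1, 10), grid))
```

Note that here $\delta$ does not depend on $x_0$; this uniformity is special to the example and is not required by Definition 2.22.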
Figure 2.1. f is continuous at p0.

Theorem 2.25. Let $f : D \to \mathbb{R}^m$, $D \subset \mathbb{R}^n$. Then $f$ is continuous at $p_0 \in D$ if and only if for each sequence $\{p_k\}$ in $D$ such that $\lim_{k\to\infty} p_k = p_0$ we have $\lim_{k\to\infty} f(p_k) = f(p_0)$.

Proof. "$\Longrightarrow$" $f$ continuous at $p_0$ implies that for every $\epsilon > 0$, there is a $\delta > 0$ such that for $p \in D$ and $|p - p_0| < \delta$ we have $|f(p) - f(p_0)| < \epsilon$. Now, $\{p_k\} \subset D$ and $\lim_{k\to\infty} p_k = p_0$ implies the existence of $N$ such that $k \ge N \Longrightarrow |p_k - p_0| < \delta$. Hence, for this $N$ we have $k \ge N \Longrightarrow |f(p_k) - f(p_0)| < \epsilon$; that is, $\lim_{k\to\infty} f(p_k) = f(p_0)$.

"$\Longleftarrow$" Suppose $\lim_{k\to\infty} f(p_k) = f(p_0)$ for every sequence $\{p_k\}$ in $D$ with $\lim_{k\to\infty} p_k = p_0$. Assume $f$ is not continuous at $p_0$. Then, negating the definition of continuity, there exists an $\epsilon_0 > 0$ such that each neighborhood $U$ of $p_0$ contains a point $p_U \in D$ for which $|f(p_U) - f(p_0)| \ge \epsilon_0$. Consider $U = B(p_0, \frac{1}{k})$ and set $p_k = p_U$, for each $k = 1, 2, \ldots$. Then $\lim_{k\to\infty} p_k = p_0$, and $|f(p_k) - f(p_0)| \ge \epsilon_0$, so $f(p_k)$ does not converge to $f(p_0)$, contrary to our hypothesis. Hence, the assumption that $f$ is not continuous at $p_0$ is false.

Corollary 2.26. (i) If $f, g : \mathbb{R}^n \to \mathbb{R}^m$ are continuous at $p_0$, then $f + g$ and $f \cdot g$ are continuous at $p_0$. (ii) If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $\lambda : \mathbb{R}^n \to \mathbb{R}$ are continuous at $p_0$, then $\lambda f$ is continuous at $p_0$, and $\frac{1}{\lambda} f$ is continuous at $p_0$ when $\lambda(p_0) \neq 0$.

Proof. This follows immediately from Theorem 2.25 and the corresponding theorem for sequences, Theorem 2.13.

Corollary 2.27. A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $p_0$ if and only if each component of $f$ is continuous at $p_0$.
Proof.
If $f(p) = (f_1(p), \ldots, f_m(p))$ and $\lim_{k\to\infty} p_k = p_0$, then

$$\lim_{k\to\infty} f(p_k) = f(p_0) \iff \lim_{k\to\infty} f_i(p_k) = f_i(p_0), \quad i = 1, 2, \ldots, m,$$

by Theorem 2.14.

Example 2.28 Some examples of continuous and discontinuous functions:

(1) $D = [0, 1]$, $f(x) = 1$ for $0 < x \le 1$, and $f(0) = 0$. Then $f$ is discontinuous at $x = 0$. Indeed, if $\{x_k\}$ is any sequence in $(0, 1]$ with $\lim_{k\to\infty} x_k = 0$, then $f(x_k) = 1$, so $\lim_{k\to\infty} f(x_k) = 1 \neq 0 = f(0)$.

(2) Let $D = \mathbb{R}$, $f(x) = \sin(\frac{1}{x})$ for $x \neq 0$, and $f(0) = 0$. Then $f$ is discontinuous at $x = 0$ since

$$\lim_{k\to\infty} \frac{2}{(2k+1)\pi} = 0 \quad \text{and} \quad f\left(\frac{2}{(2k+1)\pi}\right) = (-1)^k,$$

and the latter sequence is not convergent.

Remark: The discontinuity in the first example is removable, i.e. the discontinuity at $x = 0$ can be removed by changing the value of the function at $0$. The discontinuity in the second example is essential since, no matter what value is assigned to the function at $x = 0$, the discontinuity cannot be removed.

(3) $D = \mathbb{R}^2$, $f(x, y) = \frac{xy}{x^2 + y^2}$ for $(x, y) \neq O$, and $f(0, 0) = 0$. The function $f$ is discontinuous at $(0, 0)$. Indeed, for the sequence of points

$$p_k = \left(\frac{1}{k}, \frac{1}{k}\right), \quad k \in \mathbb{N}, \qquad \lim_{k\to\infty} p_k = O \quad \text{and} \quad \lim_{k\to\infty} f(p_k) = \frac{1}{k^2} \Big/ \left(2\, \frac{1}{k^2}\right) = \frac{1}{2} \neq 0.$$

The discontinuity at $(0, 0)$ is essential since if $q_k = (\frac{1}{k}, \frac{1}{2k})$, then

$$\lim_{k\to\infty} q_k = O, \quad \text{but} \quad \lim_{k\to\infty} f(q_k) = \frac{1}{2k^2} \Big/ \left(\frac{1}{k^2} + \frac{1}{4k^2}\right) = \frac{2}{5}.$$

So we obtain two different limits from two different choices of sequences.

(4) $D = \mathbb{R}^2$, $f(x, y) = \frac{x^2 y^2}{x^2 + y^2}$ for $(x, y) \neq O$, and $f(0, 0) = 0$. This function $f$ is continuous on $\mathbb{R}^2$.

We first show continuity at $p_0 = (x_0, y_0) \neq O$. If $\lim_{k\to\infty} (x_k, y_k) = (x_0, y_0)$, then $\lim_{k\to\infty} x_k^2 y_k^2 = x_0^2 y_0^2$, and $\lim_{k\to\infty} (x_k^2 + y_k^2) = x_0^2 + y_0^2 \neq 0$. So, by Theorem 2.13, $\lim_{k\to\infty} f(x_k, y_k) = f(x_0, y_0)$, and $f$ is continuous at $p_0$ by Theorem 2.25. When $p_0 = O$, we have $x_0^2 + y_0^2 = 0$, so because the denominator becomes zero, we must prove continuity directly: if $(x, y) \neq (0, 0)$, then $x^2 + y^2 \neq 0$ and

$$|f(x, y) - f(0, 0)| = \frac{x^2 y^2}{x^2 + y^2} \le \frac{x^4 + 2x^2 y^2 + y^4}{2(x^2 + y^2)} = \frac{x^2 + y^2}{2} = \frac{1}{2} |(x, y)|^2.$$

Hence, if $|(x, y)| = |(x, y) - (0, 0)| < \sqrt{2\epsilon}$, we have $|f(x, y) - f(0, 0)| < \epsilon$.
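The two sequences used in part (3) of Example 2.28 can be evaluated exactly. In the sketch below, `f` is the function from part (3), implemented with exact rationals; the code simply confirms that $f$ is constant along each of the two sequences, with different constants.

```python
from fractions import Fraction


def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0 and y == 0:
        return Fraction(0)
    return Fraction(x * y) / Fraction(x * x + y * y)


if __name__ == "__main__":
    for k in (1, 10, 100):
        p_k = (Fraction(1, k), Fraction(1, k))
        q_k = (Fraction(1, k), Fraction(1, 2 * k))
        # f is 1/2 along p_k and 2/5 along q_k, for every k.
        print(k, f(*p_k), f(*q_k))
```

Since both sequences tend to the origin while the function values stay at $\frac{1}{2}$ and $\frac{2}{5}$ respectively, no value assigned at $(0, 0)$ can make $f$ continuous there.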
Recall that if $f : \mathbb{R}^n \to \mathbb{R}^\ell$ and $g : \mathbb{R}^\ell \to \mathbb{R}^m$, then $g \circ f$ is defined as $(g \circ f)(p) = g(f(p))$. The domain of $g \circ f$ is $\{p \in D_f : f(p) \in D_g\} = f^{-1}(D_g) \subset D_f$.

Theorem 2.29. Suppose $p_0 \in D_{g \circ f}$. Then $g \circ f$ is continuous at $p_0$ if (i) $f$ is continuous at $p_0$, and (ii) $g$ is continuous at $f(p_0)$.

Proof. If $\{p_k\}$ is a sequence in $D_{g \circ f} \subset D_f$ such that $\lim_{k\to\infty} p_k = p_0$, then $\{f(p_k)\}$ is a sequence in $D_g$ such that $\lim_{k\to\infty} f(p_k) = f(p_0)$, since $f$ is continuous at $p_0$. Since $g$ is continuous at $f(p_0)$, we then have $\lim_{k\to\infty} g(f(p_k)) = g(f(p_0))$. Thus, $\lim_{k\to\infty} (g \circ f)(p_k) = (g \circ f)(p_0)$, and so, by Theorem 2.25, $g \circ f$ is continuous at $p_0$.

Corollary 2.30. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $p_0$, then $|f| : \mathbb{R}^n \to \mathbb{R}$ is continuous at $p_0$.

Proof. If $g(q) = |q|$ for $q \in \mathbb{R}^m$, then $g$ is continuous on $\mathbb{R}^m$ since

$$\big||q| - |q_0|\big| \le |q - q_0|.$$

Thus, if $\epsilon > 0$ is given, $|q - q_0| < \epsilon \Longrightarrow \big||q| - |q_0|\big| < \epsilon$. Therefore, $g \circ f = |f|$ is continuous at $p_0$ since $g$ is continuous at $f(p_0)$.
2.3 Global Properties of Continuous Functions
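Again recall the notation that if $f : \mathbb{R}^n \to \mathbb{R}^m$ with domain $D \subset \mathbb{R}^n$ and range $f(D) \subset \mathbb{R}^m$, then for $A \subset \mathbb{R}^n$, $f(A) := \{f(p) : p \in A \cap D\}$, and for $B \subset \mathbb{R}^m$, $f^{-1}(B) := \{p \in D : f(p) \in B\}$.

Example 2.31 Some functions, with images and inverse images of some sets:

(1) $f(x) = x^2$, $-1 \le x \le 1$. Then

$$f([0, \tfrac{1}{2}]) = [0, \tfrac{1}{4}], \qquad f([\tfrac{1}{2}, 2]) = [\tfrac{1}{4}, 1], \qquad f^{-1}([\tfrac{1}{4}, 3]) = [-1, -\tfrac{1}{2}] \cup [\tfrac{1}{2}, 1].$$

(2) $f(x) = \sin(x)$, $-\infty < x < \infty$. Then

$$f^{-1}([0, 1)) = \bigcup_{k=-\infty}^{\infty} \Big( [2k\pi, (2k + 1)\pi] \setminus \{(4k + 1)\pi/2\} \Big).$$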
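[Figure 2.2. Figure for Example 2.31 (3): the level set $S \subset \mathbb{R}^2$ is mapped by $f$ to the single value $c = f(S)$ in $\mathbb{R}$.]

[Figure 2.3. Figure for Example 2.31 (4): the hyperbolas $S_1, S_2 \subset \mathbb{R}^2$ are mapped by $f$ to $1 = f(S_1)$ and $-1 = f(S_2)$ in $\mathbb{R}$.]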
(3) $f(x, y) = \frac{x^2}{4} + y^2$, $(x, y) \in \mathbb{R}^2$. Then $f(\mathbb{R}^2) = \{x \in \mathbb{R} : x \ge 0\}$. For any fixed $c > 0$, the set $S := \{(x, y) : f(x, y) = c\}$ is an ellipse in $\mathbb{R}^2$, and $f^{-1}(\{c\}) = S$.

(4) $f(x, y) = xy$ for $(x, y) \in \mathbb{R}^2$. Then $f(\mathbb{R}^2) = \mathbb{R}$. The inverse images of $1$ and $-1$ are the hyperbolas $S_1 := \{(x, y) : xy = 1\} = f^{-1}(1)$ and $S_2 := \{(x, y) : xy = -1\} = f^{-1}(-1)$, while $\{x\text{-axis}\} \cup \{y\text{-axis}\} = f^{-1}(0)$.
Theorem 2.32. If $f : \mathbb{R}^n \to \mathbb{R}^m$, then $f$ is continuous on its domain $D$ if and only if for each open set $V \subset \mathbb{R}^m$, there is an open set $U \subset \mathbb{R}^n$ such that $f^{-1}(V) = U \cap D$.

Proof. "$\Longrightarrow$" Let $f$ be continuous on $D$ and $V$ be an open set in $\mathbb{R}^m$. If $p_0 \in f^{-1}(V) \subset D$, then $f$ is continuous at $p_0$ and $f(p_0) \in V$. Since $V$ is open and $f$ is continuous at $p_0$, there is a $\delta(p_0) > 0$ such that
\[ p \in B(p_0, \delta(p_0)) \cap D \implies f(p) \in V, \]
which implies
\[ B(p_0, \delta(p_0)) \cap D \subset f^{-1}(V), \quad \forall\, p_0 \in f^{-1}(V). \tag{2.1} \]
Hence, if we set $U := \bigcup_{p_0 \in f^{-1}(V)} B(p_0, \delta(p_0))$, then $U$ is open and (2.1) implies
\[ U \cap D \subset f^{-1}(V). \tag{2.2} \]
However, from the definition of $U$, $p_0 \in f^{-1}(V)$ implies $p_0 \in U \cap D$, i.e.
\[ f^{-1}(V) \subset U \cap D. \tag{2.3} \]
So, $U$ is an open subset of $\mathbb{R}^n$ and, from (2.2) and (2.3), $U \cap D = f^{-1}(V)$.

"$\Longleftarrow$" Suppose that for each open $V \subset \mathbb{R}^m$, there exists an open $U \subset \mathbb{R}^n$ such that $U \cap D = f^{-1}(V)$. Let $p_0 \in D$ and $\epsilon > 0$ be given. Consider $V = B(f(p_0), \epsilon)$. Then there exists an open $U \subset \mathbb{R}^n$ with $U \cap D = f^{-1}(V)$. In particular, $p_0 \in U$, so $U$ is a neighborhood of $p_0$, and
\[ p \in U \cap D \implies f(p) \in V. \]
Thus, $f$ is continuous at $p_0$.

Theorem 2.33 (Preservation of Connectedness). Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$. If $f$ is continuous on $D$ and $D$ is connected, then $f(D)$ is connected.

Proof. Suppose $f(D)$ is not connected. Then there exist open sets $V_1$ and $V_2$ in $\mathbb{R}^m$ such that
(i) $V_1 \cap f(D) \neq \emptyset$ and $V_2 \cap f(D) \neq \emptyset$;
(ii) $(V_1 \cap f(D)) \cap (V_2 \cap f(D)) = \emptyset$; and
(iii) $(V_1 \cap f(D)) \cup (V_2 \cap f(D)) = f(D)$.
By the Global Continuity Theorem (Theorem 2.32), there exist open sets $U_1$ and $U_2$ in $\mathbb{R}^n$ such that
\[ U_1 \cap D = f^{-1}(V_1), \qquad U_2 \cap D = f^{-1}(V_2). \]
Then
(i)′ $U_1 \cap D \neq \emptyset$, $U_2 \cap D \neq \emptyset$, from (i),
(ii)′ $(U_1 \cap D) \cap (U_2 \cap D) = \emptyset$, from (ii),
(iii)′ $(U_1 \cap D) \cup (U_2 \cap D) = D$, from (iii).
Thus, $D$ is disconnected, contradicting our hypothesis that $D$ is connected. Therefore, our assumption that $f(D)$ is disconnected is false.

Corollary 2.34. Suppose $f : \mathbb{R}^n \to \mathbb{R}$. Let $D$ be a connected subset of $\mathbb{R}^n$, and $f$ be a continuous real-valued function on $D$. If $p, q \in D$ and $f(p) < k < f(q)$, then there is a $p_0 \in D$ such that $f(p_0) = k$.

Proof. If there is no such $p_0$, then the sets $V_1 = (-\infty, k)$ and $V_2 = (k, +\infty)$ provide a disconnection of $f(D)$, a contradiction to Theorem 2.33.

Example 2.35 At any time there are two antipodal points on the equator at the same temperature. Indeed, let $T(x)$ be the temperature at the point of the equator with angular coordinate $x$. Then $T(x + 2\pi) = T(x)$. We suppose $T$ is continuous and consider $f(x) = T(x + \pi) - T(x)$. Then $f(0) = T(\pi) - T(0)$, while $f(\pi) = T(2\pi) - T(\pi) = T(0) - T(\pi) = -f(0)$. If $f(0) = 0$, then $T(\pi) = T(0)$; otherwise, $f(0)$ and $f(\pi)$ are of opposite sign and $k = 0$ is a value between them. In the latter case, by Corollary 2.34, there is a point $x_0$ with $f(x_0) = 0 = T(x_0 + \pi) - T(x_0)$; i.e. $T(x_0 + \pi) = T(x_0)$.

Theorem 2.36 (Preservation of Compactness). Let $f : \mathbb{R}^n \to \mathbb{R}^m$. If $D$ is compact and $f$ is continuous on $D$, then $f(D)$ is compact.

Proof. Let $\mathcal{G}$ be an open covering of $f(D)$. By Theorem 2.32, for each $V \in \mathcal{G}$ there is an open set $U \subset \mathbb{R}^n$ such that
\[ U \cap D = f^{-1}(V). \tag{2.4} \]
Since $\{V : V \in \mathcal{G}\}$ covers $f(D)$, the collection $\{U : U \cap D = f^{-1}(V),\ V \in \mathcal{G}\}$ covers $D$. Since $D$ is a compact set, there must be a finite subcover $\{U_1, \ldots, U_k\}$. From (2.4), the $V$'s corresponding to the $U_i$ satisfy $f(U_i) \subset V_i$, so $\{V_1, \ldots, V_k\}$ covers $f(D)$. Thus, the cover $\mathcal{G}$ contains a finite subcover of $f(D)$. Since we began with an arbitrary open cover of $f(D)$, $f(D)$ must be compact.

Corollary 2.37. A continuous function on a compact set is bounded.

Proof. $f(D)$ is compact and therefore closed and bounded by the Heine-Borel Theorem.

Corollary 2.38. A continuous real-valued function $f : D \subset \mathbb{R}^n \to \mathbb{R}$ on a compact set $D$ achieves its supremum and infimum, that is, it has a maximum and a minimum value.

Proof. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}$ be continuous. Then $f(D)$ is a compact subset of $\mathbb{R}$ (Theorem 2.36). Let $M := \sup\{f(p) : p \in D\} = \sup f(D)$. We wish to show that
$M \in f(D)$, that is, that $M = f(p_0)$ for some $p_0 \in D$. Suppose $M \notin f(D)$, which implies $M > f(p)$ for all $p \in D$. Consider the function
\[ g(p) = \frac{1}{M - f(p)} > 0, \quad \text{for } p \in D. \]
Then $g$ is continuous on $D$. (Why?) Since $D$ is compact, $g$ is bounded by Corollary 2.37. Hence, there exists a finite number $A$ such that
\[ 0 < g(p) \le A,\ \forall\, p \in D \implies 0 < \frac{1}{M - f(p)} \le A,\ \forall\, p \in D \implies \frac{1}{A} \le M - f(p),\ \forall\, p \in D \implies f(p) \le M - \frac{1}{A} < M,\ \forall\, p \in D. \]
The last inequality contradicts the fact that $M$ is the least upper bound of $f(D)$. Hence, we must have $M \in f(D)$.
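Corollary 2.34 only asserts that an intermediate value is attained; for a function of one real variable, bisection actually locates such a point. The following is a minimal sketch applied to the antipodal-temperature argument of Example 2.35. The temperature profile `T` is a made-up $2\pi$-periodic function chosen for illustration, and `bisect_root` is a hypothetical helper, not something defined in these notes.

```python
import math

def bisect_root(f, a, b, tol=1e-12):
    """Find x in [a, b] with f(x) = 0 by bisection.

    Assumes f is continuous on [a, b] and f(a), f(b) have opposite signs
    (or one of them is zero), as in the Intermediate Value Theorem.
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0:
            return m
        if fa * fm < 0:      # sign change in [a, m]
            b, fb = m, fm
        else:                # sign change in [m, b]
            a, fa = m, fm
    return 0.5 * (a + b)

def T(x):
    """Hypothetical continuous, 2*pi-periodic temperature along the equator."""
    return 15 + 8 * math.cos(x) + 3 * math.sin(2 * x + 1) + 2 * math.sin(3 * x)

def g(x):
    """Temperature difference between antipodal points; g(pi) = -g(0)."""
    return T(x + math.pi) - T(x)

x0 = bisect_root(g, 0.0, math.pi)
print("antipodal points with equal temperature at angle", x0)
```

The halving step is where connectedness of the interval is used: at each stage the sign change is trapped in one of the two halves.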
Exercises
2.32. Prove that the two definitions of continuity are equivalent.
2.33. Let $f(x) = \sqrt{x}$, $x \ge 0$. Show that $f$ is continuous on $[0, \infty)$.
2.34. Show that every real polynomial of odd degree has at least one real root.
2.35. Show that $x^4 + 7x^3 - 9$ has at least two real roots.
2.36. Suppose $f$ and $g$ are continuous real valued functions on $[0, 1]$ such that $f(0) < g(0)$ and $f(1) > g(1)$. Prove that there exists $x \in (0, 1)$ such that $f(x) = g(x)$. Hint: Draw a picture.
2.37. Give an alternative proof of Theorem 2.36 by showing directly that $f(D)$ is closed and bounded if $f$ is continuous on $D$ and $D$ is closed and bounded. Hint: If $f(D)$ is not bounded, then there exists $p_k \in D$ such that $|f(p_k)| > k$, $k = 1, 2, \ldots$. If $f(D)$ is not closed, there is at least one cluster point $q$ of $f(D)$ with $q \notin f(D)$. The latter means there are points $p_k \in D$ so that $\lim_{k\to\infty} f(p_k) = \lim_{k\to\infty} q_k = q \notin f(D)$. Show that each case leads to a contradiction of the hypothesis.
2.38. Show that $f(x) = \frac{1}{x}$ is not uniformly continuous on $(0, 1]$.
2.4
Uniform Continuity
Definition 2.39. A function $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$ is uniformly continuous on $D$ if, for each $\epsilon > 0$, there is a $\delta > 0$ so that
\[ (p, q \in D \text{ and } |p - q| < \delta) \implies |f(p) - f(q)| < \epsilon. \]
Here the $\delta$ depends only on $\epsilon$ (and possibly $D$), but not on the choice of points.

Example 2.40 Some examples on uniform continuity:

(1) $f(x) = x$, $-\infty < x < +\infty$ ($D = \mathbb{R}$), is uniformly continuous on $\mathbb{R}$. Obviously, if $x, y \in \mathbb{R}$ and $|x - y| < \epsilon$, then $|f(x) - f(y)| < \epsilon$.

(2) $f(x) = x^2$, $0 \le x \le 1$ ($D = [0, 1]$), is uniformly continuous on $D$. Indeed, $|f(x) - f(y)| = |x^2 - y^2| = |x - y||x + y| \le 2|x - y|$ for $x, y \in [0, 1]$. Thus, if $x, y \in D$ and $|x - y| < \frac{\epsilon}{2}$, then $|f(x) - f(y)| < \epsilon$, so $f$ is uniformly continuous on $D = [0, 1]$.

(3) $f(x) = x^2$, $0 \le x < +\infty$ ($D = [0, +\infty)$), is not uniformly continuous on $D$. We will show the condition is not satisfied for $\epsilon = 1$ by any choice of $\delta$. Let $\delta > 0$ be given and choose $x = \frac{1}{\delta}$, $y = \frac{1}{\delta} + \frac{\delta}{2}$. Then $|x - y| = \frac{\delta}{2} < \delta$, but
\[ |f(x) - f(y)| = |x - y||x + y| = \frac{\delta}{2}\left(\frac{2}{\delta} + \frac{\delta}{2}\right) = 1 + \frac{\delta^2}{4} > 1. \]
Thus, $f$ is not uniformly continuous on $[0, \infty)$.

(4) $f(x) = \sqrt{x}$, $1 \le x < +\infty$ ($D = [1, +\infty)$). Then $f$ is uniformly continuous on $D$. Indeed,
\[ |f(x) - f(y)| = |\sqrt{x} - \sqrt{y}| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}} \le \frac{1}{2}|x - y|, \quad \forall\, x, y \in [1, +\infty). \]
Thus, if $|x - y| < 2\epsilon$ and $x, y \in [1, +\infty)$, we have $|f(x) - f(y)| < \epsilon$.

(5) $f(x) = \sqrt{x}$, $D = [0, +\infty)$, is uniformly continuous on $D$. We again use
\[ |f(x) - f(y)| = \frac{|x - y|}{\sqrt{x} + \sqrt{y}}. \]
We may assume that $x > y$ and suppose $|x - y| < \delta$. If $x \ge \delta$, then $|f(x) - f(y)| < \frac{\delta}{\sqrt{\delta}} = \sqrt{\delta}$. If $y < x < \delta$, then $|f(x) - f(y)| = \sqrt{x} - \sqrt{y} < \sqrt{\delta} - 0 = \sqrt{\delta}$. In either case, we see that $|x - y| < \delta = \epsilon^2$ and $x, y \in [0, +\infty) \implies |f(x) - f(y)| < \epsilon$.

Theorem 2.41. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$. If $f$ is continuous on $D$ and $D$ is compact, then $f$ is uniformly continuous on $D$.

Proof. Let $\epsilon > 0$ be given. By the continuity of $f$ on $D$, for each $p_0 \in D$, there exists a $\delta(\epsilon, p_0) > 0$ such that $p \in B(p_0, \delta(\epsilon, p_0)) \cap D \implies |f(p) - f(p_0)|$
0 for all k.
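The witnesses used in Example 2.40 (3) and the choice $\delta = \epsilon^2$ in Example 2.40 (5) can be checked mechanically. The sketch below uses exact rational arithmetic for the $x^2$ case so that rounding plays no role; the function names are illustrative only.

```python
import math
from fractions import Fraction

def square_witness(delta):
    """For f(x) = x^2 on [0, oo): the points x = 1/delta, y = 1/delta + delta/2.

    Returns (|x - y|, |f(x) - f(y)|); the first is below delta while the
    second exceeds 1, whatever delta > 0 is.
    """
    x = 1 / delta
    y = x + delta / 2
    return abs(x - y), abs(x * x - y * y)

def sqrt_gap(x, y):
    """|sqrt(x) - sqrt(y)| for x, y >= 0."""
    return abs(math.sqrt(x) - math.sqrt(y))

for delta in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    dist, gap = square_witness(delta)
    print(f"delta={delta}: |x-y|={dist}, |x^2-y^2|={gap}")

# For sqrt, delta = eps^2 works everywhere on [0, oo), near 0 and far from 0 alike.
eps = 0.1
print(sqrt_gap(0.0, 0.99 * eps**2) < eps, sqrt_gap(100.0, 100.0 + 0.99 * eps**2) < eps)
```

The contrast is the point of the definition: for $x^2$ no single $\delta$ survives as the points move out to infinity, while for $\sqrt{x}$ one $\delta$ serves the whole half-line.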
Exercises
2.38. If $f(x) = \frac{1}{x}$, $x \neq 0$, then $f$ is continuous on its domain.
2.39. Show that a polynomial
\[ f(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0 \quad (a_i \text{ constants}) \]
is continuous on $\mathbb{R}$. Hint: Show that $f(x) = \text{constant}$ and $g(x) = x$ are continuous on $\mathbb{R}$, and deduce the general result from Corollary 2.26.
2.40. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by
\[ f(x) = \begin{cases} 1 - x, & x \in \mathbb{Q}; \\ x, & x \notin \mathbb{Q}. \end{cases} \]
Show that $f$ is continuous at $x = \frac{1}{2}$ and discontinuous elsewhere.
2.41. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous on $\mathbb{R}$. Show that if $f(x) = 0$ for all $x \in \mathbb{Q}$, then $f(x) = 0$ for all $x \in \mathbb{R}$.
2.42. Use the inequalities $|\sin(x)| \le |x|$ and $|\cos(x)| \le 1$ and the formula
\[ \sin(x) - \sin(u) = 2 \sin\left(\frac{x - u}{2}\right) \cos\left(\frac{x + u}{2}\right) \]
to show that the sine function is uniformly continuous on $\mathbb{R}$.
2.43. Using the results of Exercises 2.38 and 2.42, show that the function defined by
\[ g(x) = \begin{cases} x \sin(\frac{1}{x}), & x \neq 0; \\ 0, & x = 0; \end{cases} \]
is continuous on $\mathbb{R}$.
2.44. Is it possible for $f$ and $g$ to be discontinuous and yet $g \circ f$ be continuous?
2.45. Let $f$ be continuous on $D$.
(a) Is it true that $D$ open implies $f(D)$ open? (This one is easy.)
(b) Is it true that $D$ closed implies $f(D)$ closed? Hint: Consider $f(x) = \frac{1}{1 + x^2}$ on $\mathbb{R}$.
2.46. If $f(x) = \frac{1}{1 + x^2}$, then $f$ is uniformly continuous on $\mathbb{R}$.
2.47. Let $f$ be a real-valued function with domain $D$, $D$ being an open subset of $\mathbb{R}^n$. Prove that $f$ is continuous if and only if the sets
\[ \{p : f(p) > \alpha\}, \qquad \{p : f(p) < \alpha\} \]
are open for each $\alpha \in \mathbb{R}$.
2.48. If $f(x, y) = xy + x^4 - y^4$, then the equation $f(x, y) = 0$ has at least four solutions on any circle $x^2 + y^2 = R^2 > 0$.
2.49. Prove the following facts about uniformly continuous functions:
(a) Let $f$ be uniformly continuous on $(0, 1]$ and $\{x_k\}$ be a sequence of real numbers such that $0 < x_k \le 1$ and $\lim_{k\to\infty} x_k = 0$. Prove that $\lim_{k\to\infty} f(x_k)$ exists. Hint: What was Cauchy's name?
(b) If $f$ is uniformly continuous on $(0, 1]$, then $f$ may be defined at $x = 0$ so that $f$ is continuous on $[0, 1]$.
(c) If $f : \mathbb{R}^n \to \mathbb{R}^m$, and $f$ is uniformly continuous on the domain $D \subset \mathbb{R}^n$, then $f$ may also be defined on $\overline{D} \setminus D$ so that $f$ is continuous on $\overline{D}$. ($\overline{D}$ denotes the closure of $D$.) See Exercises 1.39 and 1.41.
2.50. If $f : \mathbb{R}^n \to \mathbb{R}^m$ with domain $D \subset \mathbb{R}^n$, the set $G := \{(p, f(p)) : p \in D\} \subset \mathbb{R}^{n+m}$ is called the graph of $f$. Suppose $D$ is connected. Is it true that $f$ is continuous on $D$ if and only if $G$ is connected? What if "connected" is replaced by "closed"? "Compact"? Prove any statements you believe to be true and find a counterexample for any false statements.
2.51. If $f$ is real valued and $f$ is continuous at $p_0$ with $f(p_0) > 0$, then $f(p) > 0$ for each $p$ in a neighborhood of $p_0$. Hint: $\epsilon = \frac{1}{2} f(p_0)$.
2.52. Prove that $f$ is continuous at a point $p_0$ in its domain $D$ if and only if, for each sequence $\{p_k\}$ of points in $D$ such that $\lim_{k\to\infty} p_k = p_0$, we have $\{f(p_k)\}$ convergent. (We say nothing about $\lim_{k\to\infty} f(p_k)$.)
2.5
Limits
The notion of a limit, which was introduced for sequences, can be extended to any function, as you recall from first year Calculus.

Definition 2.42. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ with domain $D \subset \mathbb{R}^n$. Let $p_0$ be a cluster point of $D$ (it is not assumed that $p_0 \in D$). A point $L \in \mathbb{R}^m$ is the limit at $p_0$ of $f$ if, for each $\epsilon > 0$, there exists a $\delta > 0$ such that
\[ (p \in D \text{ and } 0 < |p - p_0| < \delta) \implies |f(p) - L| < \epsilon. \]
Write $\lim_{p \to p_0} f(p) = L$. If no such $L$ exists, we say the limit at $p_0$ of $f$ does not exist.

Example 2.43 (1) $f(x, y) = \dfrac{xy}{\sqrt{x^2 + y^2}}$ is defined for $(x, y) \neq O$, i.e. $D = \mathbb{R}^2 \setminus \{O\}$. We claim $\lim_{(x,y)\to(0,0)} f(x, y) = 0$. Indeed, using $(|a| - |b|)^2 \ge 0 \implies a^2 + b^2 \ge 2|ab|$, we see that
\[ \left| \frac{xy}{\sqrt{x^2 + y^2}} \right| \le \frac{1}{2} \frac{x^2 + y^2}{\sqrt{x^2 + y^2}} = \frac{1}{2}\sqrt{x^2 + y^2} = \frac{1}{2}|(x, y) - (0, 0)|. \]
If $\epsilon > 0$ is given, then $0 < |(x, y)| < 2\epsilon$ implies $|f(x, y)| < \epsilon$.

(2) The function defined on $\mathbb{R}^2$ by
\[ f(x, y) = \begin{cases} 0, & \text{if } y \neq x^2; \\ 1, & \text{if } y = x^2; \end{cases} \]
does not have a limit at $O = (0, 0)$ since each neighborhood of $(0, 0)$ contains points $(x, y) \neq (0, 0)$ at which $f(x, y) = 0$ and points at which $f(x, y) = 1$.
Just as in the case of sequences, there is a Cauchy criterion for the existence of $\lim_{p \to p_0} f(p)$.

Theorem 2.44 (Cauchy Criterion). The limit $\lim_{p \to p_0} f(p)$ exists if and only if for each $\epsilon > 0$, there is a $\delta > 0$ such that
\[ p, q \in D \cap B(p_0, \delta) \setminus \{p_0\} \implies |f(p) - f(q)| < \epsilon. \]

Proof. "$\Longrightarrow$" Suppose $\lim_{p \to p_0} f(p) = L$; thus, if $\epsilon > 0$ is given, there exists a $\delta > 0$ such that $p \in D$, $0 < |p - p_0| < \delta \implies |f(p) - L| < \epsilon/2$. Then, if $p, q \in D \cap B(p_0, \delta) \setminus \{p_0\}$, we have
\[ |f(p) - f(q)| \le |f(p) - L| + |f(q) - L| < \epsilon/2 + \epsilon/2 = \epsilon, \]
so Cauchy's Criterion is satisfied.

"$\Longleftarrow$" Let Cauchy's Criterion be satisfied, that is, for a given $\epsilon > 0$, there is a $\delta(\epsilon) > 0$ such that
\[ p, q \in D \cap B(p_0, \delta(\epsilon)) \setminus \{p_0\} \implies |f(p) - f(q)| < \epsilon. \]
Let $\{p_k\}$ be any sequence in $D$ for which $\lim_{k\to\infty} p_k = p_0$, with all $p_k \neq p_0$, $k = 1, 2, 3, \ldots$. Hence, for the aforementioned $\delta(\epsilon) > 0$, there is an $N$ such that $k \ge N \implies 0 < |p_k - p_0| < \delta(\epsilon)$. Therefore, if $k, m \ge N$, then $p_k, p_m \in D \cap B(p_0, \delta(\epsilon)) \setminus \{p_0\}$, and by the Cauchy Criterion, this implies $|f(p_k) - f(p_m)| < \epsilon$. Thus, $\{f(p_k)\}$ is a Cauchy sequence, and therefore has a limit; say $\lim_{k\to\infty} f(p_k) = L$.

We need to show that $\lim_{p \to p_0} f(p) = L$. From the facts that $\lim_{k\to\infty} p_k = p_0$ and $\lim_{k\to\infty} f(p_k) = L$, we can choose an $N_1$ so that $0 < |p_{N_1} - p_0| < \delta(\epsilon/2)$ and $|f(p_{N_1}) - L| < \epsilon/2$. Therefore, if $p \in D$ and $0 < |p - p_0| < \delta(\epsilon/2)$, we have
\[ |f(p) - L| \le |f(p) - f(p_{N_1})| + |f(p_{N_1}) - L| < \epsilon/2 + \epsilon/2 = \epsilon \]
(by the Cauchy Criterion and the choice of $N_1$). Thus, $\lim_{p \to p_0} f(p) = L$.

In fact, general limits are reduced to consideration of limits of sequences by the following result.

Theorem 2.45. The limit $\lim_{p \to p_0} f(p)$ exists and equals $L$ if and only if $\lim_{k\to\infty} f(p_k)$ exists and equals $L$ for each sequence $\{p_k\}$ in $D \setminus \{p_0\}$ such that $\lim_{k\to\infty} p_k = p_0$.

Proof. Exercise 2.55.
A point $p_0 \in D$ is called isolated if it is not a cluster point of $D$. Check back to the definition of continuity and observe that a function is always continuous at an isolated point of its domain.

Corollary 2.46. A function $f$ is continuous at a cluster point $p_0$ of its domain if and only if $\lim_{p \to p_0} f(p) = f(p_0)$.

Proof. Follows from Theorems 2.25 and 2.45.

The following corollaries follow immediately from the corresponding statements for sequences.

Corollary 2.47. Let $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$, $f = (f_1, \ldots, f_m)$, $L = (L_1, \ldots, L_m)$, and $p_0 \in D$. Then $\lim_{p \to p_0} f(p) = L$ if and only if $\lim_{p \to p_0} f_i(p) = L_i$ for all $i = 1, 2, \ldots, m$.

Corollary 2.48. If $f, g : \mathbb{R}^n \to \mathbb{R}$, $p_0 \in (D_f \cap D_g)$, and $\lim_{p \to p_0} f(p) = L$, $\lim_{p \to p_0} g(p) = M$, then
(i) $\lim_{p \to p_0} (f(p) + g(p)) = L + M$.
(ii) $\lim_{p \to p_0} (f(p) g(p)) = LM$.
(iii) $\lim_{p \to p_0} \dfrac{f(p)}{g(p)} = \dfrac{L}{M}$, if $M \neq 0$.
If $p_0$ is a cluster point of $D_0 \subset D$, we may talk about the limit with respect to $D_0$ at $p_0$, i.e., $(D_0)\lim_{p \to p_0} f(p) = L$ if, for each $\epsilon > 0$, there is a $\delta > 0$ such that $p \in D_0$ and $0 < |p - p_0| < \delta$ implies $|f(p) - L| < \epsilon$.

Special Notation: One-sided limits.
\[ D_0 = (x_0, +\infty): \quad (D_0)\lim_{x \to x_0} f(x) = \lim_{x \to x_0^+} f(x) \]
\[ D_0 = (-\infty, x_0): \quad (D_0)\lim_{x \to x_0} f(x) = \lim_{x \to x_0^-} f(x) \]

Example 2.49 Some one-sided limits:

(1) If $f(x) = 1$ for $x > 0$ and $f(x) = x$ for $x \le 0$, then
\[ \lim_{x \to 0^+} f(x) = 1, \qquad \lim_{x \to 0^-} f(x) = 0. \]

(2) Suppose $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by
\[ f(x, y) = \begin{cases} 0, & \text{if } y \neq x^2; \\ 1, & \text{if } y = x^2. \end{cases} \]
Evidently, the limit at $O = (0, 0)$ with respect to points on the parabola $y = x^2$ is 1, while the limit with respect to the complement of the parabola
is 0. Notice that in this case the limit along any line through the origin is 0 at $O$, i.e.
\[ \lim_{t \to 0} f(t x_0, t y_0) = 0 \quad \text{if } (x_0, y_0) \neq O. \]
However, the limit $\lim_{(x,y)\to(0,0)} f(x, y)$ does not exist.
The moral of this is that when the limit properties of functions of more than one variable are being considered, it is not enough to look at their behaviour on straight lines.
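The behaviour in Example 2.49 (2) is easy to tabulate. The following is a small sketch using exact rational arithmetic, so that the test $y = x^2$ is not disturbed by rounding; the helper names are illustrative.

```python
from fractions import Fraction

def f(x, y):
    """The function of Example 2.49 (2): 1 on the parabola y = x^2, 0 elsewhere."""
    return 1 if y == x * x else 0

def along_line(x0, y0, ts):
    """Values f(t*x0, t*y0) for the parameters t in ts (a line through the origin)."""
    return [f(t * x0, t * y0) for t in ts]

def along_parabola(xs):
    """Values f(x, x^2) for x in xs."""
    return [f(x, x * x) for x in xs]

# Parameters tending to 0, never equal to 0.
ts = [Fraction(1, 10 ** k) for k in range(3, 9)]

directions = [(1, 0), (0, 1), (1, 2), (3, -1), (1, Fraction(1, 100))]
for x0, y0 in directions:
    print((x0, y0), along_line(Fraction(x0), Fraction(y0), ts))
print("parabola", along_parabola(ts))
```

Each line through the origin meets the parabola in at most one point other than $O$ (at $t = y_0/x_0^2$ when $x_0 y_0 \neq 0$), so the values along the line are eventually 0, whereas along the parabola they are constantly 1.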
2.6
Differentiation of real valued functions of a real variable
We review the most important results on differentiation from first year calculus. Throughout this section $f : \mathbb{R} \to \mathbb{R}$.

Definition 2.50. If $f$ is defined in a neighborhood of $x_0$ and
\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \]
exists, then this limit is called the derivative of $f$ at $x_0$ and is denoted $f'(x_0)$.

Proposition 2.51. If $f'(x_0)$ exists, then $f$ is continuous at $x_0$.

Proof. Since $f'(x_0)$ exists, given $\epsilon = 1$, there is a $\delta > 0$ such that
\[ 0 < |x - x_0| < \delta \implies \left| \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right| < 1. \]
Hence, $|f(x) - f(x_0)| < [|f'(x_0)| + 1]\,|x - x_0|$. From this it follows that $\lim_{x \to x_0} f(x) = f(x_0)$.

Definition 2.52. The function $f$ has an interior relative maximum (respectively, minimum) at $c$ if there is a neighborhood $U$ of $c$ such that $f(x) \le f(c)$ $\forall\, x \in U$ (respectively, $f(x) \ge f(c)$ $\forall\, x \in U$).

Theorem 2.53. If $f$ has an interior relative maximum or minimum at $c$ and $f'(c)$ exists, then $f'(c) = 0$.

Proof. We are given that
\[ f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c}. \]
Suppose that $f$ has an interior relative maximum at $x = c$. Then there exists a $\delta > 0$ so that $|x - c| < \delta \implies f(x) \le f(c)$. Therefore,
\[ c < x < c + \delta \implies \frac{f(x) - f(c)}{x - c} \le 0 \implies \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \le 0. \]
Similarly,
\[ c - \delta < x < c \implies \frac{f(x) - f(c)}{x - c} \ge 0 \implies \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \ge 0. \]
The two inequalities together imply $f'(c) = 0$.

Corollary 2.54 (Rolle's Theorem). Suppose that
(i) $f$ is continuous on $[a, b]$,
(ii) $f'$ exists on $(a, b)$,
(iii) $f(b) = f(a)$.
Then there exists $c \in (a, b)$ such that $f'(c) = 0$.

Proof. First, if $f(x) = f(a) = f(b)$ for all $x \in (a, b)$, then $f'(x) = 0$ for all $x \in (a, b)$. If there is an $x_1 \in (a, b)$ with $f(x_1) > f(a) = f(b)$, then $f$, being continuous on the compact set $[a, b]$, achieves its maximum value $m = \sup\{f(x) : x \in [a, b]\}$ at some point $c \in (a, b)$. Thus, $f$ has an interior relative maximum at $c$ and $f'(c) = 0$. If there is an $x_1 \in (a, b)$ with $f(x_1) < f(a) = f(b)$, then $f$ has an interior relative minimum at some $c \in (a, b)$ and $f'(c) = 0$.

Corollary 2.55 (Mean Value Theorem). Suppose
(i) $f$ is continuous on $[a, b]$,
(ii) $f'$ exists on $(a, b)$.
Then there exists $c \in (a, b)$ such that $f'(c)(b - a) = f(b) - f(a)$.
Proof. Consider the function $\phi(x) = [f(x) - f(a)](b - a) - [f(b) - f(a)](x - a)$. The function $\phi$ is continuous on $[a, b]$, $\phi'$ exists on $(a, b)$ and $\phi(a) = \phi(b) = 0$. By Rolle's Theorem, there is a $c \in (a, b)$ such that $0 = \phi'(c) = f'(c)(b - a) - [f(b) - f(a)]$.

Corollary 2.56 (Cauchy Mean Value Theorem). Suppose that
(i) $f, g$ are continuous on $[a, b]$,
(ii) $f'$ and $g'$ exist on $(a, b)$.
Then there exists $c \in (a, b)$ such that $f'(c)[g(b) - g(a)] = g'(c)[f(b) - f(a)]$.

Proof. Consider the function $\phi(x) = [f(x) - f(a)][g(b) - g(a)] - [f(b) - f(a)][g(x) - g(a)]$.
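The Mean Value Theorem is an existence statement, but in simple cases the point $c$ can be located numerically. The sketch below takes the sample function $f(x) = x^3$ on $[0, 2]$ (chosen here for illustration, not taken from the notes) and applies bisection to $f'(x)(b - a) - [f(b) - f(a)]$, which is the derivative of the auxiliary function $\phi$ used in the proof. Bisection is a swapped-in technique that assumes this expression is continuous and changes sign on $[a, b]$, which holds for this $f$ but is not guaranteed in general.

```python
import math

def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Locate c in (a, b) with fprime(c) * (b - a) = f(b) - f(a) by bisection.

    Assumption (not part of the theorem): h(x) = fprime(x)*(b - a) - (f(b) - f(a))
    is continuous and has opposite signs at a and b.
    """
    def h(x):
        return fprime(x) * (b - a) - (f(b) - f(a))

    lo, hi = a, b
    h_lo, h_hi = h(lo), h(hi)
    if h_lo * h_hi >= 0:
        raise ValueError("h must change sign on [a, b] for this sketch")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if h_mid == 0:
            return mid
        if h_lo * h_mid < 0:
            hi, h_hi = mid, h_mid
        else:
            lo, h_lo = mid, h_mid
    return 0.5 * (lo + hi)

def cube(x):
    return x ** 3

def cube_prime(x):
    return 3 * x ** 2

c = mean_value_point(cube, cube_prime, 0.0, 2.0)
print("c =", c, " exact value 2/sqrt(3) =", 2 / math.sqrt(3))
```

Here $f(2) - f(0) = 8$, so $c$ solves $3c^2 \cdot 2 = 8$, i.e. $c = 2/\sqrt{3}$, which lies in $(0, 2)$ as the theorem promises.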
Theorem 2.57 (L'Hospital's Rule). If $f$ and $g$ are differentiable on $(a, b)$, $g'(x) \neq 0$ for all $x \in (a, b)$, and either
(i) $\lim_{x \to b^-} f(x) = 0$ and $\lim_{x \to b^-} g(x) = 0$; or
(ii) $\lim_{x \to b^-} f(x) = \infty$ and $\lim_{x \to b^-} g(x) = \infty$,
then
\[ \lim_{x \to b^-} \frac{f'(x)}{g'(x)} = L \implies \lim_{x \to b^-} \frac{f(x)}{g(x)} = L. \]
Note: $\lim_{x \to b^-} f(x) = \infty$ means that, for each real number $N$, there is a $\delta > 0$ such that $x \in (b - \delta, b) \implies f(x) > N$.

Proof. Case (i): Suppose $\lim_{x \to b^-} f(x) = \lim_{x \to b^-} g(x) = 0$ and $\lim_{x \to b^-} \frac{f'(x)}{g'(x)} = L$. If $\epsilon > 0$ is given, there is a $\delta > 0$ such that
\[ x \in (b - \delta, b) \implies \left| \frac{f'(x)}{g'(x)} - L \right| < \epsilon. \tag{2.8} \]
Let $f, g$ be defined at $b$ by $f(b) = g(b) = 0$. Then $f$ and $g$ are continuous on $[x, b]$ if $x > a$. From the Cauchy Mean Value Theorem,
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(b)}{g(x) - g(b)} = \frac{f'(c)}{g'(c)}, \quad b - \delta < x < c < b. \]
Therefore,
\[ \left| \frac{f(x)}{g(x)} - L \right| = \left| \frac{f'(c)}{g'(c)} - L \right| < \epsilon, \quad \text{for } x \in (b - \delta, b). \]
Thus, $\lim_{x \to b^-} \frac{f(x)}{g(x)} = L$.
Case (ii): Suppose $\lim_{x \to b^-} f(x) = \lim_{x \to b^-} g(x) = \infty$ and $\lim_{x \to b^-} \frac{f'(x)}{g'(x)} = L$. As before, (2.8) holds if $x \in (b - \delta, b)$ for some $\delta > 0$. However, we cannot define $f(b)$ and $g(b)$ in this case so that $f$ and $g$ are continuous at $b$. Let $x_0 = b - \delta$ and $x \in (b - \delta, b)$. Then
\[ \frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \frac{f'(c)}{g'(c)}, \quad \text{for some } c \in (b - \delta, b). \]
Thus, by (2.8), for x0 = b − δ and x ∈ (b − δ, b), f (x) − f (x0 ) f (x) =
1 . 2
(2.11)
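Thus, for $x \in (b - \delta_1, b)$,
\[ \frac{1}{2}\left| \frac{f(x)}{g(x)} - L \right| < \left| h(x)\frac{f(x)}{g(x)} - L\,h(x) \right|, \quad \text{from (2.11)} \]
\[ \implies \frac{1}{2}\left| \frac{f(x)}{g(x)} - L \right| < \left| h(x)\frac{f(x)}{g(x)} - L \right| + |L|\,|1 - h(x)| < \epsilon + |L|\epsilon, \quad \text{from (2.9) and (2.11)}. \]
Thus, if $\epsilon > 0$, there exists $\delta_1$ such that
\[ x \in (b - \delta_1, b) \implies \left| \frac{f(x)}{g(x)} - L \right| < 2(1 + |L|)\epsilon. \]
Therefore,
\[ \lim_{x \to b^-} \frac{f(x)}{g(x)} = L. \]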
Remark: Define $\lim_{x \to \infty} f(x) = L$ if, for each $\epsilon > 0$, there exists $N$ such that $x \ge N$ implies $|f(x) - L| < \epsilon$. L'Hospital's Rule is also valid if $\lim_{x \to b^-}$ is replaced by $\lim_{x \to \infty}$; only minor changes to the proof are required.
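A quick numerical illustration of case (i) of L'Hospital's Rule, with a pair of functions chosen here purely as an example: $f(x) = \log x$ and $g(x) = x - 1$ on $(0, 1)$, with $b = 1$. Both tend to 0 as $x \to 1^-$, $g'(x) = 1 \neq 0$, and $f'(x)/g'(x) = 1/x \to 1$.

```python
import math

def f(x):
    return math.log(x)

def g(x):
    return x - 1.0

def f_prime(x):
    return 1.0 / x

def g_prime(x):
    return 1.0

b = 1.0
rows = []
for k in range(1, 7):
    x = b - 10.0 ** (-k)          # x approaches b from the left
    rows.append((x, f(x) / g(x), f_prime(x) / g_prime(x)))

for x, ratio, deriv_ratio in rows:
    print(f"x={x:.6f}  f/g={ratio:.8f}  f'/g'={deriv_ratio:.8f}")
```

Both columns approach the common limit $L = 1$; the table illustrates the theorem for this one pair of functions and proves nothing in general.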
Exercises
2.54. If $f(x) = \dfrac{x^2 - a^2}{x - a}$ for $x \neq a$, show that $\lim_{x \to a} f(x) = 2a$.
2.55. Prove Theorem 2.45.
2.56. Show that the limit
\[ \lim_{(x,y)\to(0,0)} \frac{x y^2}{x^2 + y^4} \]
does not exist even though
\[ (D_0)\lim_{(x,y)\to(0,0)} \frac{x y^2}{x^2 + y^4}, \quad D_0 \text{ any straight line through } O, \]
exists and equals 0.
2.57. Recall the rules for differentiation of constants, sums, products, composites (chain rule) which you have learned from calculus. Prove one or two of them.
2.58. Draw the graph of the function $f(x) = |\sin(x)|$, $x \in \mathbb{R}$. Check that the derivative exists and is 0 at each relative maximum but does not exist at each relative minimum.
2.59. Prove that there is no real number $k$ such that the equation $x^3 - 3x + k = 0$ has two distinct solutions in $[0, 1]$.
2.60. If $c_0, \ldots, c_n$ are real numbers such that
\[ c_0 + \frac{c_1}{2} + \ldots + \frac{c_{n-1}}{n} + \frac{c_n}{n+1} = 0, \]
prove that $c_0 + c_1 x + \ldots + c_{n-1} x^{n-1} + c_n x^n = 0$ has at least one solution in $(0, 1)$.
2.61. If $f(x) = \sqrt{|x|}$, then $f'(x)$ exists if $x \neq 0$ and does not exist if $x = 0$.
2.62. Prove that $10.243 < \sqrt{105} < 10.250$ (no calculator!). Hint: By the Mean Value Theorem, $\sqrt{105} = \sqrt{100} + \dfrac{5}{2\sqrt{c}}$ where $100 < c < 105$.
2.63. Prove L'Hospital's Rule with $\lim_{x \to b^-}$ replaced by $\lim_{x \to \infty}$. Prove that $\lim_{x \to \infty} x e^{-x} = 0$.
2.64. Prove the following limits:
(i) $\lim_{x \to 0} \dfrac{\arctan(x)}{x} = 1$.
(ii) $\lim_{x \to 0} \dfrac{e^x - e^{-x} - 2\tan(x)}{\sin^3(x)} = -\dfrac{1}{3}$.
(iii) $\lim_{x \to 2} \dfrac{x^4 - 5x^3 + 6x^2 + 4x - 8}{x^4 - 7x^3 + 18x^2 - 20x + 8} = 3$.
(iv) $\lim_{x \to 0} \left( \cot(x) - \dfrac{1}{x} \right) = 0$.
(v) $\lim_{x \to 0} (1 + \tan(x))^{\csc(x)} = e$.
(vi) $\lim_{x \to \frac{\pi}{2}^+} \dfrac{\log\left(x - \frac{\pi}{2}\right)}{\tan(x)} = 0$.
2.65. If $f'(x)$ exists for all $x$ near $c$, $f$ is continuous at $c$ and $\lim_{x \to c} f'(x) = A$, then $f'(c)$ exists and equals $A$.
2.66. (Darboux Property of the Derivative) Let $f'(x)$ exist for each $x \in [a, b]$ and $f'(a) = \alpha$, $f'(b) = \beta$. Suppose $\alpha < \gamma < \beta$. Show that there is a $c \in (a, b)$ such that $f'(c) = \gamma$. Hint: Show that $g(x) = f(x) - \gamma x$ must achieve its minimum in $(a, b)$. This exercise shows that derivatives, like continuous functions, have the Intermediate Value Property. However, a derivative need not be continuous on its domain. See Exercise 2.70 (i) and (ii).
2.67. Let
\[ f(x) = \begin{cases} 0, & \text{if } -1 \le x \le 0; \\ 1, & \text{if } 0 < x \le 1. \end{cases} \]
Is there a function $g$ such that $g'(x) = f(x)$ on $-1 \le x \le 1$?
2.68. The function defined by
\[ g(x) = \begin{cases} x^2, & x \in \mathbb{Q}; \\ 0, & x \notin \mathbb{Q}; \end{cases} \]
is continuous at exactly one point. Is it differentiable there?
2.69. (Taylor's Theorem) Suppose that (i) $f^{(n-1)}$ exists and is continuous on $[\alpha, \beta]$, and (ii) $f^{(n)}$ exists on $[\alpha, \beta]$.
(a) Show that for $m > 0$
\[ f(\beta) = f(\alpha) + \frac{(\beta - \alpha)}{1!} f'(\alpha) + \ldots + \frac{(\beta - \alpha)^{n-1}}{(n-1)!} f^{(n-1)}(\alpha) + R_n(f; \alpha, \beta), \]
where
\[ R_n(f; \alpha, \beta) = \frac{(\beta - \alpha)^m (\beta - \gamma)^{n-m}}{m\,(n-1)!} f^{(n)}(\gamma) \]
for some point $\gamma \in (\alpha, \beta)$. This is called Schlömilch's form of the remainder. Special cases are given by taking
\[ m = n: \quad R_n(f; \alpha, \beta) = \frac{(\beta - \alpha)^n}{n!} f^{(n)}(\gamma) \quad \text{(Lagrange form)} \]
\[ m = 1: \quad R_n(f; \alpha, \beta) = \frac{(\beta - \alpha)(\beta - \gamma)^{n-1}}{(n-1)!} f^{(n)}(\gamma) \quad \text{(Cauchy form)} \]
The Lagrange form is the easiest to remember and is adequate for most purposes. Hint: Let the constant $C$ be defined by
\[ f(\beta) - f(\alpha) - \frac{(\beta - \alpha)}{1!} f'(\alpha) - \ldots - \frac{(\beta - \alpha)^{n-1}}{(n-1)!} f^{(n-1)}(\alpha) - (\beta - \alpha)^m C = 0, \]
and apply Rolle's Theorem to
\[ \phi(x) = f(\beta) - f(x) - \frac{(\beta - x)}{1!} f'(x) - \ldots - \frac{(\beta - x)^{n-1}}{(n-1)!} f^{(n-1)}(x) - (\beta - x)^m C. \]
(b) (i) If $f(x) = e^x$, show that $\lim_{n\to\infty} R_n(f; 0, x) = 0$ for all $x$. That is, show that
\[ \lim_{n\to\infty} \left[ 1 + \frac{x}{1!} + \frac{x^2}{2!} + \ldots + \frac{x^{n-1}}{(n-1)!} \right] = e^x, \quad \forall\, x. \]
(b) (ii) If $f(x) = \sin(x)$, then $\lim_{n\to\infty} R_n(f; 0, x) = 0$ for all $x$.
(b) (iii) If $f(x) = \frac{1}{1 - x}$, $x \neq 1$, then $\lim_{n\to\infty} R_n(f; 0, x) = 0$ for $-1 < x < 1$.
(c) (i) How large must I take $n$ to approximate $e$ to four decimal places by the expression
\[ 1 + \frac{1}{1!} + \frac{1}{2!} + \ldots + \frac{1}{(n-1)!}\,? \]
(c) (ii) Approximate $\sqrt[3]{e}$ to five decimal places.
(c) (iii) Prove that the closest integer to $\frac{n!}{e}$ is divisible by $(n - 1)$. Hint: Use Taylor's formula for $e^{-1}$.
(c) (iv) Compute $\sqrt{97}$ to four decimal places.
(d) Suppose $f''(x)$ exists and is non-negative (non-positive) in a neighborhood of $c$ and that $f'(c) = 0$. Show that $f$ has a relative minimum (maximum) at $c$.
(e) Suppose $f''(x)$ exists in a neighborhood of $c$ and is continuous at $c$. Show that $f$ has a relative minimum (maximum) at $c$ if $f'(c) = 0$ and $f''(c) > 0$ ($< 0$).
2.70. Let $f_0(x) = \sin(\frac{1}{x})$, $f_1(x) = x \sin(\frac{1}{x})$, $f_2(x) = x^2 \sin(\frac{1}{x})$, $f_3(x) = x^3 \sin(\frac{1}{x})$ for $x \neq 0$, and $f_i(0) = 0$ for $i = 0, 1, 2, 3$.
(i) The $f_i$ are differentiable at any point $x \neq 0$ (Chain Rule).
(ii) $f_0$ is discontinuous at $x = 0$.
(iii) $f_1$ is continuous at $x = 0$, but is not differentiable at $x = 0$.
(iv) $f_2$ is differentiable at $x = 0$ but $f_2'$ is discontinuous at $x = 0$.
(v) $f_3$ is differentiable at $x = 0$ and $f_3'$ is continuous at $x = 0$.
2.71. Let $p_n$ be defined by
\[ p_n(x) = \frac{d^n}{dx^n} (x^2 - 1)^n, \quad n = 1, 2, \ldots. \]
(i) Show that $p_n$ is a polynomial of degree $n$.
(ii) The equation $p_n(x) = 0$ has exactly $n$ roots in $(-1, 1)$.
2.72. Doesn't time fly when you're having fun?
Chapter 3
Riemann Integration
The definition of the Riemann integral is essentially the same in higher dimensions as in one dimension. You may have learned Riemann integration of functions of one variable by the lower and upper Riemann sums approach; the treatment adopted here is slightly different but equivalent to that approach (see Exercise 17).
3.1
Content and the Riemann Integral
Recall that a closed interval in $\mathbb{R}^n$ has the form $I = [a_1, b_1] \times [a_2, b_2] \times \ldots \times [a_n, b_n] := \{(x_1, x_2, \ldots, x_n) : a_i \le x_i \le b_i,\ i = 1, 2, \ldots, n\}$. The diameter of $I$ was defined to be
\[ \lambda(I) := \left[ (b_1 - a_1)^2 + \ldots + (b_n - a_n)^2 \right]^{1/2} = \sqrt{\sum_{k=1}^{n} (b_k - a_k)^2}. \]
If $p, q \in I$, then $|p - q| \le \lambda(I)$.

Definition 3.1. The content of $I$, or Jordan measure of $I$, is defined to be
\[ \mu(I) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n) = \prod_{k=1}^{n} (b_k - a_k). \]
In $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$, $\mu$ is length, area, and volume respectively. We write $\mu_n$ if the dimension of the space needs to be emphasized. A set $S \subset \mathbb{R}^n$ has content zero (Jordan measure zero) if, for each $\epsilon > 0$, there is a finite collection of intervals $I_i$, $i = 1, \ldots, k$, such that
\[ S \subset \bigcup_{i=1}^{k} I_i \quad \text{and} \quad \sum_{i=1}^{k} \mu(I_i) < \epsilon. \]
For such an $S$, we write simply $\mu(S) = 0$.
Example 3.2 Some sets with content zero:

(1) A set containing only a single point has content zero in $\mathbb{R}^n$.

(2) A set containing a finite number of points has content zero in $\mathbb{R}^n$.

(3) The set $\{(x, 0) : 0 \le x \le 1\}$ has content zero in $\mathbb{R}^2$. The proof is essentially given in Figure 3.1.

(4) $\{(x, y) : |x| + |y| = 1\}$ has content zero in $\mathbb{R}^2$. The proof is essentially given by Figure 3.1. It also will follow from (5) and (6).

[Figure 3.1. (a) The rectangle $P$ has height $\epsilon$ and length 1. (b) The rectangles are squares of side length $1/k$, so $\sum_i \mu(I_i) = 4/k < \epsilon$ if $k$ is sufficiently large.]

(5) If $f$ is a continuous real valued function on $[0, 1]$, then $\{(x, f(x)) : 0 \le x \le 1\}$, the graph of $f$, has content zero in $\mathbb{R}^2$.

Proof. Let $\epsilon > 0$. Since $f$ is uniformly continuous on $[0, 1]$ (Theorem 2.41), there exists $\delta > 0$ such that $x, y \in [0, 1]$ and $|x - y| \le \delta \implies |f(x) - f(y)| \le \epsilon/2$. Now let $m$ be a natural number so that $m\delta < 1$ and $(m + 1)\delta \ge 1$. Consider the following intervals in $\mathbb{R}^2$:
\[ I_k = [k\delta, (k+1)\delta] \times [f(k\delta) - \epsilon/2, f(k\delta) + \epsilon/2], \quad k = 0, \ldots, m - 1, \]
\[ I_m = [m\delta, 1] \times [f(m\delta) - \epsilon/2, f(m\delta) + \epsilon/2]. \]
If $x \in [k\delta, (k+1)\delta] \cap [0, 1]$, then $|x - k\delta| \le \delta$, so $|f(x) - f(k\delta)| \le \epsilon/2 \implies f(x) \in [f(k\delta) - \epsilon/2, f(k\delta) + \epsilon/2]$. Thus, $(x, f(x)) \in I_k$ if $x \in [k\delta, (k+1)\delta] \cap [0, 1]$. Hence,
\[ \{(x, f(x)) : 0 \le x \le 1\} \subset \bigcup_{k=0}^{m} I_k \quad \text{and} \quad \sum_{k=0}^{m} \mu(I_k) = (m\delta + 1 - m\delta)\epsilon = \epsilon, \]
so $\mu\{(x, f(x)) : 0 \le x \le 1\} = 0$.
[Figure 3.2. A covering of a graph of a continuous function on $[0, 1]$ by rectangles of width $\delta$ and height $\epsilon$.]

(6) The union of a finite collection of sets with zero content has zero content.

(7) A circle has zero content in $\mathbb{R}^2$. Indeed, the circle of radius $r$ can be written as the union of the two half circles which are the graphs of $f(x) = \pm\sqrt{r^2 - x^2}$, and thus has zero content by (5) and (6).
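The covering constructed in the proof of (5) can be built explicitly. The sketch below does so for the sample function $f(x) = x^2$ on $[0, 1]$, an illustrative choice for which $|f(x) - f(y)| \le 2|x - y|$, so that $\delta = \epsilon/4$ is an admissible modulus. Exact rational arithmetic is used so the total area can be compared with $\epsilon$ without rounding error. The name `cover_graph` is hypothetical.

```python
import math
from fractions import Fraction

def cover_graph(f, delta, eps):
    """Rectangles I_0, ..., I_m covering the graph of f over [0, 1].

    Follows the construction in the proof of Example 3.2 (5): m is chosen so that
    m*delta < 1 <= (m+1)*delta, and I_k has base [k*delta, min((k+1)*delta, 1)]
    and height eps centred at f(k*delta).
    Each rectangle is returned as (left, right, bottom, top).
    """
    m = math.ceil(1 / delta) - 1
    rects = []
    for k in range(m + 1):
        left = k * delta
        right = min((k + 1) * delta, Fraction(1))
        centre = f(left)
        rects.append((left, right, centre - eps / 2, centre + eps / 2))
    return rects

def square(x):
    return x * x

eps = Fraction(1, 10)
delta = eps / 4                     # valid for x^2 on [0, 1], where |f(x)-f(y)| <= 2|x-y|
rects = cover_graph(square, delta, eps)
total_area = sum((r - l) * (t - b) for l, r, b, t in rects)

def covered(x):
    """True if the graph point (x, f(x)) lies in at least one rectangle."""
    y = square(x)
    return any(l <= x <= r and b <= y <= t for l, r, b, t in rects)

print(len(rects), "rectangles, total area", total_area)
```

As in the text, the total area comes out equal to $\epsilon$; running the construction with $\epsilon/2$ in place of $\epsilon$ gives the strict inequality required by the definition of content zero.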
3.1.1
Partition of I
Let $I$ be the rectangle $I = [a_1, b_1] \times [a_2, b_2] \times \ldots \times [a_n, b_n]$. Let $P_i$ be finite sets of points
\[ P_i := \{t_{i,j} : j = 0, \ldots, m_i\}, \quad \text{with } a_i = t_{i,0} < \ldots < t_{i,m_i} = b_i, \quad i = 1, \ldots, n. \]
Then $P = P_1 \times P_2 \times \ldots \times P_n$ is said to be a partition of $I$. Notice that a partition generates a subdivision of $I$ into a finite collection of closed non-overlapping subintervals of the form
\[ [t_{1,j_1}, t_{1,j_1+1}] \times [t_{2,j_2}, t_{2,j_2+1}] \times \ldots \times [t_{n,j_n}, t_{n,j_n+1}], \quad 0 \le j_i < m_i. \]
The total number of subintervals generated by $P$ is $m_1 \cdot m_2 \cdots m_n$. A partition $Q$ is a refinement of $P$ if $P \subset Q$. Notice that a refinement of $P$ further subdivides the intervals $I_i$ generated by $P$.
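The subdivision generated by a partition is just a Cartesian product of consecutive pairs of partition points, one pair per coordinate. A minimal sketch follows, with illustrative names and a made-up partition of $[0, 3] \times [0, 5]$:

```python
from itertools import product

def subintervals(partition):
    """All closed subintervals generated by a partition P = P_1 x ... x P_n.

    `partition` is a list of increasing point lists, one per coordinate.
    Each subinterval is returned as a tuple of (lower, upper) pairs.
    """
    edges = [list(zip(points[:-1], points[1:])) for points in partition]
    return list(product(*edges))

def content(box):
    """Content (Jordan measure) of a box: the product of its side lengths."""
    c = 1
    for lower, upper in box:
        c *= upper - lower
    return c

# A partition of I = [0, 3] x [0, 5] with m_1 = 2 and m_2 = 3.
P = [[0, 1, 3], [0, 2, 3, 5]]
boxes = subintervals(P)
print(len(boxes), "subintervals, total content", sum(content(b) for b in boxes))
```

The count is $m_1 \cdot m_2 = 6$, and since the subintervals are non-overlapping and fill $I$, their contents add up to $\mu(I) = 15$.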
3.1.2
Riemann Sums
Let $I$ be a closed interval in $\mathbb{R}^n$ and $f : I \to \mathbb{R}^m$. If $P$ is a partition of $I$ which generates a subdivision of $I$ into subintervals $I_i$, then any sum of the form
\[ S(P, f) = \sum_i f(p_i)\,\mu(I_i), \quad p_i \in I_i, \]
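is a Riemann sum of $f$ corresponding to the partition $P$.

[Figure 3.3. A covering of a graph of a continuous function on [0, 1] by rectangles. The figure shows an interval $I$ and one of its subintervals $I_i$.]

Definition 3.3 (Riemann Integral). Let $f : I \to \mathbb{R}^m$ where $I$ is a closed interval in $\mathbb{R}^n$. Let $\alpha \in \mathbb{R}^m$. Suppose that, for each $\epsilon > 0$, there is a partition $P_\epsilon$ of $I$ such that if $P \supset P_\epsilon$ and $S(P, f)$ is any Riemann sum corresponding to $P$, then $|S(P, f) - \alpha| < \epsilon$. Then the function $f$ is said to be Riemann integrable on $I$ and to have Riemann integral $\alpha$. We write
\[ \alpha = \int_I f \quad \text{or} \quad \alpha = \int_I f \, d\mu. \]

Proposition 3.4. If $f$ is Riemann integrable on $I$, then the integral of $f$ on $I$ is unique.

Proof. Suppose $\alpha_1 = \int_I f$ and $\alpha_2 = \int_I f$. Let $\epsilon > 0$ be given. Then there are partitions $P_{1,\epsilon}$, $P_{2,\epsilon}$ of $I$ such that
\[ P \supset P_{i,\epsilon} \implies |S(P, f) - \alpha_i| < \epsilon/2, \quad i = 1, 2. \]
But the partition $P_{1,\epsilon} \cup P_{2,\epsilon}$ satisfies $P_{1,\epsilon} \cup P_{2,\epsilon} \supset P_{1,\epsilon}$ and $P_{1,\epsilon} \cup P_{2,\epsilon} \supset P_{2,\epsilon}$, so that
\[ |\alpha_1 - \alpha_2| \le |S(P_{1,\epsilon} \cup P_{2,\epsilon}, f) - \alpha_1| + |S(P_{1,\epsilon} \cup P_{2,\epsilon}, f) - \alpha_2| < \epsilon/2 + \epsilon/2 = \epsilon. \]
We have thus shown $|\alpha_1 - \alpha_2| < \epsilon$ for each $\epsilon > 0$. Hence, $|\alpha_1 - \alpha_2| = 0$, i.e. $\alpha_1 = \alpha_2$.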
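To see Definition 3.3 at work, one can compute Riemann sums with arbitrarily chosen tags $p_i \in I_i$ and watch them settle as the partition is refined. The sketch below uses the sample integrand $f(x, y) = xy$ on $I = [0, 1] \times [0, 2]$, whose integral is 1; the integrand, the uniform partitions, and the random choice of tags are all illustrative assumptions rather than anything prescribed by the notes.

```python
import random
from itertools import product

def uniform_partition(a, b, m):
    """Points a = t_0 < t_1 < ... < t_m = b, equally spaced."""
    return [a + (b - a) * j / m for j in range(m + 1)]

def riemann_sum(f, partition, rng):
    """A Riemann sum S(P, f): one randomly chosen tag p_i in each subinterval I_i."""
    edges = [list(zip(points[:-1], points[1:])) for points in partition]
    total = 0.0
    for box in product(*edges):
        tag = [rng.uniform(lower, upper) for lower, upper in box]
        volume = 1.0
        for lower, upper in box:
            volume *= upper - lower
        total += f(*tag) * volume
    return total

def integrand(x, y):
    return x * y

rng = random.Random(0)
coarse = riemann_sum(integrand, [uniform_partition(0, 1, 4), uniform_partition(0, 2, 4)], rng)
fine = riemann_sum(integrand, [uniform_partition(0, 1, 100), uniform_partition(0, 2, 100)], rng)
print("coarse partition:", coarse)
print("fine partition:  ", fine)
```

The second partition refines the first, since every point $j/4$ of the coarse grid also appears in the fine grid. Whatever tags are drawn, the fine sum lies within 0.1 of the integral: $|\nabla f| \le \sqrt{5}$ on $I$, each subinterval has diameter $\sqrt{5}/100$, and $\mu(I) = 2$.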
Exercises 3.1. If f is integrable on I, then f is bounded on I.
3.2. $f : I \to \mathbb{R}^m$ is integrable on $I$ if and only if each component function $f_i$, $i = 1, \ldots, m$, of $f$ is integrable on $I$. Moreover,
\[ \int_I f = \left( \int_I f_1, \ldots, \int_I f_m \right). \]
It therefore suffices to consider only real-valued functions, just as in the case of sequences.
3.2 Cauchy Criteria and Properties of Integrals
Theorem 3.5 (Cauchy Criterion). Let $f : I \mapsto \mathbb{R}^m$. $\int_I f$ exists if and only if for each $\epsilon > 0$ there exists a partition $P_\epsilon$ of $I$ such that if $P, Q \supset P_\epsilon$, then $|S(P, f) - S(Q, f)| < \epsilon$ for all Riemann sums $S(P, f)$, $S(Q, f)$ corresponding to $P$, $Q$.

Proof. “$\Longrightarrow$” If $\alpha = \int_I f$ exists, then for each $\epsilon > 0$ there is a partition $P_\epsilon$ of $I$ such that if $P \supset P_\epsilon$, then
\[ |S(P, f) - \alpha| < \epsilon/2. \tag{3.1} \]
Let $P, Q \supset P_\epsilon$. Then (3.1) implies
\[ |S(P, f) - S(Q, f)| \le |S(P, f) - \alpha| + |\alpha - S(Q, f)| \]
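0. Since $K$ has content zero, there exists a partition $P^o_\epsilon$ of $I$ such that if $I_i$ are the subintervals generated by $P^o_\epsilon$, then
\[ \sum_{i :\, K \cap I_i \neq \emptyset} \mu(I_i) < \frac{\epsilon}{4N}. \tag{3.8} \]
Let
\[ L_1 = \bigcup_{i :\, K \cap I_i \neq \emptyset} I_i, \qquad L_2 = \bigcup_{i :\, K \cap I_i = \emptyset} I_i. \]

Figure 3.4. The subintervals: $L_1$, those meeting $K$, and $L_2$, those not meeting $K$.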
Then f is continuous on L2 , which is a compact subset, and so f is uniformly continuous on L2 . There exists δ > 0 such that p, q ∈ L2 and |p − q| < δ =⇒ |f (p) − f (q)|
0, there is a finite collection of intervals $\{I_k\}$ such that
\[ K \subset \bigcup_k I_k \quad \text{and} \quad \sum_k \mu(I_k) \le \epsilon. \]
$K$ has Lebesgue measure zero if, for each $\epsilon > 0$, there is a countable collection of intervals $\{I_k\}$ satisfying the displayed relation. This simple extension of the concept of measure zero has profound consequences. We know that the set of rationals in $[0, 1]$ does not have Jordan content. However, if this set is enumerated as $\{r_k : k = 1, 2, 3, \dots\}$, then choosing $r_k \in I_k$ with $\mu(I_k) = \epsilon / 2^k$ yields
\[ \{r_k : k = 1, 2, 3, \dots\} \subset \bigcup_k I_k \quad \text{and} \quad \sum_k \mu(I_k) = \epsilon \sum_{k=1}^{\infty} \frac{1}{2^k} = \epsilon. \]
Thus the set of rationals in $[0, 1]$ has Lebesgue measure zero. In fact, the same argument shows that $\mathbb{Q}$, or any countable set of real numbers, has Lebesgue measure zero. Countable sets are not the only sets of Lebesgue measure zero, however; in fact, such sets can be quite complicated. The remarkable Cantor set, which you will study in Real Analysis, has Lebesgue measure zero but nonetheless has the same cardinality as $[0, 1]$, i.e. it has zero “length” but has the same “number” of points as $[0, 1]$. A simple discussion of the Cantor set can be found in Bartle’s book. An interesting theorem of Lebesgue states that, for a bounded real-valued function $f$, the Riemann integral $\int_I f$ exists if and only if the set of points in $I$ at which $f$ is discontinuous has Lebesgue measure zero. For example, of the functions in Exercises 3.15 and 3.16, the first is discontinuous at each point in $[0, 1]$, while the second is discontinuous only at the rational points in $[0, 1]$.
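The covering argument for the rationals can be made concrete. The sketch below is only an illustration under stated assumptions: the enumeration by increasing denominator and the helper names `rationals_in_unit_interval` and `cover` are choices made here, not taken from the notes. It builds the first 50 intervals $I_k$ centred at $r_k$ with $\mu(I_k) = \epsilon/2^k$ and checks in exact arithmetic that their total length stays below $\epsilon$.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` rationals in [0, 1], listed by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:          # skip repeats such as 2/4 = 1/2
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(count, eps):
    """Intervals I_k centred at r_k with mu(I_k) = eps / 2**k, k = 1..count."""
    eps = Fraction(eps)
    rs = rationals_in_unit_interval(count)
    return [(r - eps / 2 ** (k + 1), r + eps / 2 ** (k + 1))
            for k, r in enumerate(rs, start=1)]

eps = Fraction(1, 100)
intervals = cover(50, eps)
total_length = sum(b - a for a, b in intervals)
print(float(total_length))  # partial sum eps * (1 - 2**-50), below eps
```

Only finitely many intervals can be listed, so this shows the partial sums of the lengths, which is all the geometric series argument needs.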
Exercises

3.3. Suppose $f$ and $g$ are bounded on $I$ and $f(p) = g(p)$ for all $p \in I \setminus K$, where $K$ has content zero. Prove that
\[ \int_I f \text{ exists} \implies \int_I g \text{ exists and equals } \int_I f. \]

3.4. Prove Theorem 3.14.

3.5. Give an example of a countable set which has content and one which does not have content (contented and discontented sets??). Is there a countable set with positive content?

3.6. Prove Theorem 3.15 (b).

3.7. Prove Theorem 3.15 (c).

3.8. Prove Theorem 3.15 (e). Hint: Use parts (a) and (b).

3.9. Prove that $\{(\frac{1}{m}, \frac{1}{k}) : m, k = 1, 2, \dots\}$ has content zero in $\mathbb{R}^2$.

3.10. Prove that the set of points in $[0, 1] \times [0, 1]$ with rational coordinates does not have content in $\mathbb{R}^2$.

3.11. Let $D$ be a subset of $\mathbb{R}^n$ which has content zero and $f$ be any bounded function on $D$. Prove that $\int_D f$ exists and equals zero.

3.12. Let $D$ be any bounded subset of $\mathbb{R}^n$ and $f(p) = 0$ for all $p \in D$. Prove that $\int_D f$ exists and equals 0. Note that $D$ does not necessarily have content.

3.13. Show that $\int_0^1 \sin(\frac{1}{x})\,dx$ exists and is independent of the value assigned to the integrand at $x = 0$.

3.14. If $\int_D f$ exists and $D_1$ is any subset of $D$ with content, then $\int_{D_1} f$ exists ($D$ itself does not necessarily have content). Note that this exercise proves in particular that the existence of $\int_a^b f$ implies the existence of $\int_a^c f$ for any $c \in (a, b)$.
3.15. If $f(x) = 0$ for $x \notin \mathbb{Q}$ and $f(x) = 1$ for $x \in \mathbb{Q}$, then $f$ is not Riemann integrable on $[0, 1]$.

3.16. If
\[ f(x) = \begin{cases} 0, & \text{if } x \notin \mathbb{Q}, \\ \frac{1}{q}, & \text{if } x = \frac{p}{q} \text{ and } \gcd(p, q) = 1, \end{cases} \]
then $\int_0^1 f = 0$.

3.17. If $P$ is a partition of the interval $I$ and $f : I \mapsto \mathbb{R}$, $f$ bounded, define $\underline{S}(P, f)$ and $\overline{S}(P, f)$, the lower and upper Riemann sums corresponding to the partition $P$, to be $\inf\{S(P, f)\}$ and $\sup\{S(P, f)\}$ respectively, the inf and sup being taken over all Riemann sums corresponding to the partition $P$ and the function $f$. Let
\[ \underline{\int_I} f = \sup_P \{\underline{S}(P, f)\} \quad \text{and} \quad \overline{\int_I} f = \inf_P \{\overline{S}(P, f)\}, \]
the sup and inf here being taken over all partitions $P$ of $I$.

(i) Show that $Q \supset P$ implies $\underline{S}(P, f) \le \underline{S}(Q, f) \le \overline{S}(Q, f) \le \overline{S}(P, f)$.

(ii) Show that $\underline{\int_I} f \le \overline{\int_I} f$.

(iii) (Definition). $f$ is integrable on $I$ if $\underline{\int_I} f = \overline{\int_I} f$, and then we define
\[ \int_I f = \underline{\int_I} f = \overline{\int_I} f. \]

(iv) Show that $f$ is integrable in this sense if and only if there is exactly one number $\alpha$ such that $\underline{S}(P, f) \le \alpha \le \overline{S}(P, f)$ for every partition $P$ of $I$, in which case $\int_I f = \alpha$.

(v) (Cauchy Criterion). Show that $\int_I f$ exists in this sense if and only if, for each $\epsilon > 0$, there exists a partition $P_\epsilon$ of $I$ such that $|\overline{S}(P_\epsilon, f) - \underline{S}(P_\epsilon, f)| < \epsilon$.

(vi) Show that for a bounded $f$, $f$ is integrable on $I$ in the sense of (iii) if and only if $f$ is integrable in the sense adopted in these notes, and that both definitions give the same value for $\int_I f$.
3.3 Evaluation of Integrals

3.3.1 Real valued functions of a real variable
Theorem 3.16 (Fundamental Theorem of Calculus). If $f$ is continuous on $[a, b]$, then there is a function $F$ on $[a, b]$ such that
\[ F'(x) = f(x), \qquad a \le x \le b. \]

Proof. Consider the function $F(x) = \int_a^x f$, $a \le x \le b$, which exists by Exercise 3.14. From Theorem 3.15 (d) and (f) we have
\[ F(x + h) - F(x) = \int_x^{x+h} f = f(c_h)\,h, \]
where $c_h$ is in the interval between $x$ and $x + h$. Since $f$ is continuous on $[a, b]$,
\[ \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = \lim_{h \to 0} f(c_h) = f(x). \]
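The limit in the proof can be observed numerically. The following is a rough sketch under assumptions made here for illustration: $f = \cos$, $a = 0$, and a midpoint Riemann sum (the hypothetical helper `integral`) standing in for $\int_a^x f$.

```python
import math

def integral(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def F(x, f=math.cos, a=0.0):
    """Approximation of F(x) = integral of f from a to x."""
    return integral(f, a, x)

x, h = 1.0, 1e-3
quotient = (F(x + h) - F(x)) / h   # difference quotient of F at x
print(quotient, math.cos(x))       # the two values should be close
```

Shrinking `h` further brings the quotient closer to $f(x)$ until the error of the approximate integral starts to dominate.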
Any function $F$ with the property that $F'(x) = f(x)$, $a \le x \le b$, is called an antiderivative of $f$ on $[a, b]$.

Proposition 3.17. If $F_1$ and $F_2$ are antiderivatives of $f$, then $F_1 - F_2$ is a constant function.

Proof. $F_1'(x) - F_2'(x) = f(x) - f(x) = 0$ for all $x$. Therefore, $F_1 - F_2$ is constant.
Theorem 3.18. If $f$ is continuous on $[a, b]$ and $F$ is any antiderivative of $f$ on $[a, b]$, then
\[ \int_a^b f = F(b) - F(a). \]

Proof. By the preceding Proposition, there is a constant $C$ so that $\int_a^x f = F(x) - C$. When $x = a$, this yields $\int_a^a f = 0 = F(a) - C$, while using this when $x = b$ we find $\int_a^b f = F(b) - C = F(b) - F(a)$.

Corollary 3.19 (Change of Variable Formula). Suppose that

(i) $\phi'$ exists and is continuous on $[a, b]$.

(ii) $f$ is continuous on $\phi([a, b])$.

Then
\[ \int_{\phi(a)}^{\phi(b)} f = \int_a^b (f \circ \phi)\,\phi', \quad \text{or} \quad \int_{\phi(a)}^{\phi(b)} f(x)\,dx = \int_a^b f(\phi(u))\,\phi'(u)\,du. \]

Proof. If $F$ is an antiderivative of $f$, then
\[ \frac{d}{du} F(\phi(u)) = f(\phi(u))\,\phi'(u), \]
so that $F \circ \phi$ is an antiderivative of $(f \circ \phi)\,\phi'$. Therefore,
\[ \int_{\phi(a)}^{\phi(b)} f = F(\phi(b)) - F(\phi(a)) = \int_a^b (f \circ \phi)\,\phi'. \]
You will have noticed that we used the Chain Rule here even though it was not yet proven in these Notes. A proof will be given in the next chapter in a more general context.

Example 3.20 To evaluate $\int_0^{\pi/2} \sin^2(u) \cos(u)\,du$, we take $f(x) = x^2$, $\phi(u) = \sin(u)$, and $\phi'(u) = \cos(u)$. Then
\[ \int_0^{\pi/2} \sin^2(u) \cos(u)\,du = \int_{\phi(0)}^{\phi(\pi/2)} x^2\,dx = \int_0^1 x^2\,dx = \frac{1}{3}. \]
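As a sanity check on Example 3.20, both sides of the change of variable formula can be approximated independently. This is a sketch only; the midpoint rule and the helper name `midpoint` are assumptions made for illustration.

```python
import math

def midpoint(f, a, b, n=10000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Left side: integral over [0, pi/2] of f(phi(u)) * phi'(u) with phi = sin.
lhs = midpoint(lambda u: math.sin(u) ** 2 * math.cos(u), 0.0, math.pi / 2)
# Right side: integral over [phi(0), phi(pi/2)] = [0, 1] of f(x) = x**2.
rhs = midpoint(lambda x: x ** 2, 0.0, 1.0)
print(lhs, rhs)  # both should be close to 1/3
```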
Corollary 3.21 (Integration by Parts Formula). If $f'$ and $g'$ are continuous on $[a, b]$, then
\[ \int_a^b f g' + \int_a^b f' g = f(b)g(b) - f(a)g(a). \]

Proof. An antiderivative of $f g' + f' g$ is $f g$. Therefore, the formula follows immediately from Theorem 3.18.
Example 3.22
\[ \int_0^1 x e^x\,dx = x e^x \Big|_0^1 - \int_0^1 e^x\,dx = e - (e^x)\Big|_0^1 = e - (e - 1) = 1. \]
3.3.2 Real valued functions on $\mathbb{R}^2$
The important theorem of Fubini allows us to extend the use of the Fundamental Theorem of Calculus to functions of more than one variable.

Theorem 3.23 (Fubini’s Theorem). Let $f : I = [a, b] \times [c, d] \mapsto \mathbb{R}$. Suppose that

(i) $\int_I f$ exists,

(ii) $\int_c^d f(x, y)\,dy = F(x)$ exists for each $x \in [a, b]$.

Then $\int_a^b F(x)\,dx$ exists and
\[ \int_a^b F(x)\,dx = \int_a^b \left[ \int_c^d f(x, y)\,dy \right] dx = \int_I f. \]

Proof. The definition of $\int_I f$ states that if $\epsilon > 0$ is given, there is a partition $P_\epsilon$ of $I$ such that
\[ P \supset P_\epsilon \implies \left| S(P, f) - \int_I f \right| < \epsilon, \]
for any Riemann sum $S(P, f)$. Writing this in detail,
\[ P = \{x_0, x_1, \dots, x_m\} \times \{y_0, y_1, \dots, y_k\} \supset P_\epsilon \implies \left| \sum_{i=1}^m \sum_{j=1}^k f(x_i^*, y_j^*)(x_i - x_{i-1})(y_j - y_{j-1}) - \int_I f \right| < \epsilon, \]
where $x_{i-1} \le x_i^* \le x_i$ and $y_{j-1} \le y_j^* \le y_j$. This may be written as
\[ \left| \sum_{i=1}^m (x_i - x_{i-1}) \sum_{j=1}^k f(x_i^*, y_j^*)(y_j - y_{j-1}) - \int_I f \right| < \epsilon. \tag{3.10} \]
Condition (ii) implies that for any fixed set of numbers $x_i^*$, $i = 1, \dots, m$, the partition $\{y_j\}$ of $[c, d]$ may be chosen to be so fine that
\[ \left| \sum_{j=1}^k f(x_i^*, y_j^*)(y_j - y_{j-1}) - \int_c^d f(x_i^*, y)\,dy \right| < \frac{\epsilon}{b - a}, \qquad i = 1, \dots, m, \]
since each of the integrals $\int_c^d f(x_i^*, y)\,dy$ exists. Therefore, for the $x_i^*$ we have
\[ \left| \sum_{j=1}^k f(x_i^*, y_j^*)(y_j - y_{j-1}) - F(x_i^*) \right| < \frac{\epsilon}{b - a}, \qquad i = 1, \dots, m. \tag{3.11} \]
Then (3.11) implies
\[ \left| \sum_{i=1}^m (x_i - x_{i-1}) \sum_{j=1}^k f(x_i^*, y_j^*)(y_j - y_{j-1}) - \sum_{i=1}^m (x_i - x_{i-1}) F(x_i^*) \right| < \frac{\epsilon}{b - a} \sum_{i=1}^m (x_i - x_{i-1}) = \frac{\epsilon}{b - a}(b - a) = \epsilon. \tag{3.12} \]
The triangle inequality with (3.10) and (3.12) gives
\[ \left| \sum_{i=1}^m F(x_i^*)(x_i - x_{i-1}) - \int_I f \right| < 2\epsilon. \]
Thus, from the definition of the integral, $\int_a^b F$ exists and equals $\int_I f$.
Corollary 3.24 (Interchanging the order of integration). Let $I = [a, b] \times [c, d]$. If

(i) $\int_I f$ exists,

(ii) $\int_c^d f(x, y)\,dy$ exists for each $x \in [a, b]$,

(iii) $\int_a^b f(x, y)\,dx$ exists for each $y \in [c, d]$,

then
\[ \int_a^b \left[ \int_c^d f(x, y)\,dy \right] dx = \int_c^d \left[ \int_a^b f(x, y)\,dx \right] dy. \]
Corollary 3.25. Let $I = [a, b] \times [c, d]$. Suppose

(i) $f$ is bounded on $I$,

(ii) $f$ is continuous on $I \setminus K$ with $\mu_2(K) = 0$ ($\mathbb{R}^2$ content zero),

(iii) $\mu_1(K \cap \{(x, y) : c \le y \le d\}) = 0$ for each $x \in [a, b]$ ($\mathbb{R}^1$ content on vertical segments).

Then
\[ \int_I f = \int_a^b \left[ \int_c^d f(x, y)\,dy \right] dx. \]

Proof. Conditions (i) and (ii) imply that the integral $\int_I f$ exists by Theorem 3.9. Condition (iii) says that the intersection of $K$ with each vertical line (considered as a set in $\mathbb{R}$) has content zero, so that $\int_c^d f(x, y)\,dy$ exists for each $x \in [a, b]$, again by Theorem 3.9. Therefore, all the conditions of Fubini’s Theorem are satisfied.
Corollary 3.26. If $D := \{(x, y) : a \le x \le b,\ \phi(x) \le y \le \psi(x)\}$, where $\phi$ and $\psi$ are continuous on $[a, b]$ and $f : \mathbb{R}^2 \mapsto \mathbb{R}$ is continuous on $D$, then
\[ \int_D f = \int_a^b \left[ \int_{\phi(x)}^{\psi(x)} f(x, y)\,dy \right] dx. \]

Proof. Use Corollary 3.25. The graphs of $\phi$ and $\psi$ have zero content in $\mathbb{R}^2$. Each vertical line intersects each graph once, and these two points have 1-dimensional content zero. See Figure 3.5.

Figure 3.5. Area between two curves.

Example 3.27 Applications of Fubini’s Theorem:

(1) $f(x, y) = xy^2$, $I = [0, 1] \times [0, 1]$. Then
\[ \int_I f = \int_0^1 \left[ \int_0^1 xy^2\,dy \right] dx = \int_0^1 \frac{x}{3}\,dx = \frac{1}{6}. \]
Also,
\[ \int_I f = \int_0^1 \left[ \int_0^1 xy^2\,dx \right] dy = \int_0^1 \frac{y^2}{2}\,dy = \frac{1}{6}. \]
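(2) $f(x, y) = x^3 y^2$, $D = \{(x, y) : 0 \le x \le 1,\ x^2 \le y \le x\}$. Then (see Figure 3.6)
\[ \int_D f = \int_0^1 \left[ \int_{x^2}^{x} x^3 y^2\,dy \right] dx = \int_0^1 \frac{1}{3} x^3 y^3 \Big|_{y = x^2}^{y = x}\,dx = \int_0^1 \left( \frac{x^6}{3} - \frac{x^9}{3} \right) dx = \left( \frac{x^7}{21} - \frac{x^{10}}{30} \right) \Big|_0^1 = \frac{1}{70}. \]
Also,
\[ \int_D f = \int_0^1 \left[ \int_y^{\sqrt{y}} x^3 y^2\,dx \right] dy = \int_0^1 \frac{1}{4} x^4 y^2 \Big|_{x = y}^{x = \sqrt{y}}\,dy = \int_0^1 \left( \frac{y^4}{4} - \frac{y^6}{4} \right) dy = \left( \frac{y^5}{20} - \frac{y^7}{28} \right) \Big|_0^1 = \frac{1}{70}. \]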
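The two iterated integrals in part (2) of Example 3.27 can be compared numerically. The sketch below assumes nested midpoint sums (the helper `midpoint` is hypothetical, introduced only for this check); it is not a substitute for the antiderivative computation in the example.

```python
import math

def midpoint(f, a, b, n=400):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x, y: x ** 3 * y ** 2

# Integrate in y first: x**2 <= y <= x for each fixed x in [0, 1].
dy_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), x ** 2, x), 0.0, 1.0)
# Integrate in x first: y <= x <= sqrt(y) for each fixed y in [0, 1].
dx_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), y, math.sqrt(y)), 0.0, 1.0)
print(dy_dx, dx_dy, 1 / 70)  # the three values should nearly agree
```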
Figure 3.6. Second area in Example 3.27 (the region between $y = x^2$ and $y = x$).

The examples just given have iterated integrals which are all easily evaluated. However, one sometimes encounters iterated integrals where the antiderivatives cannot be found in terms of elementary functions. A simplification is sometimes achieved by using Fubini’s Theorem to reverse the order of the integration.

Example 3.28 (1) Let $\int_0^1 \left[ \int_y^1 e^{y/x}\,dx \right] dy = \int_D f$. By Fubini’s Theorem, this integral may be rewritten as
\[ \int_0^1 \left[ \int_0^x e^{y/x}\,dy \right] dx = \int_0^1 x e^{y/x} \Big|_{y = 0}^{y = x}\,dx = \int_0^1 x(e - 1)\,dx = (e - 1)/2. \]

(2) Suppose that
\[ \int_1^2 \left[ \int_0^x f(x, y)\,dy \right] dx = \int_D f = \int_0^2 \left[ \int_{\phi(y)}^2 f(x, y)\,dx \right] dy, \]
where
\[ \phi(y) = \begin{cases} 1, & \text{if } 0 \le y \le 1, \\ y, & \text{if } 1 \le y \le 2. \end{cases} \]
Hence, the last integral may be rewritten as
\[ \int_0^1 \left[ \int_1^2 f(x, y)\,dx \right] dy + \int_1^2 \left[ \int_y^2 f(x, y)\,dx \right] dy. \]

Further worked examples may be found in Buck, pp. 115–119.
3.3.3 Real valued functions on $\mathbb{R}^n$

Fubini’s Theorem may be stated for $n$-dimensional intervals as follows.

Figure 3.7. The figures for Example 3.28. (Labels in the figure: $y = x$, $y = 0$, $x = 1$, $x = 2$, $D$.)

Theorem 3.29 (Fubini’s Theorem). Let $I_\ell$ and $I_m$ be closed intervals in $\mathbb{R}^\ell$ and $\mathbb{R}^m$ respectively, with $n = \ell + m$. Thus, $I_n = I_\ell \times I_m = \{(p, q) : p \in I_\ell,\ q \in I_m\}$ is a closed interval in $\mathbb{R}^n$. Let $f : I_n \mapsto \mathbb{R}$ and suppose

(i) $\int_{I_n} f$ exists,

(ii) $F(p) = \int_{I_m} f(p, q)\,dq$ exists for each $p \in I_\ell$.

Then
\[ \int_{I_\ell} F = \int_{I_\ell} \left[ \int_{I_m} f(p, q)\,dq \right] dp \]
exists and equals $\int_{I_n} f$.

Notes:

(1) The proof of Theorem 3.29 is exactly the same as that given before when $n = 2$.

(2) The symbols “$dp$”, “$dq$” above are simply used as devices to indicate the spaces on which we are integrating.

(3) For a more general formulation of Fubini’s Theorem where the condition (ii) is dropped see Calculus on Manifolds by M. Spivak (p. 58). However, the above statement of the theorem is sufficient for our needs.

(4) We will prove a change of variables formula for integrals in higher dimensions in a later chapter.
Exercises

3.18. Let $f$ be a real-valued function on $[a, b]$ such that $f'(x)$ exists for each $x \in [a, b]$ and $\int_a^b f'$ exists. Prove that
\[ f(b) - f(a) = \int_a^b f'. \]
(You may not use the Fundamental Theorem of Calculus. Why?)
3.19. Let $f$ be a non-negative continuous function on $[a, b]$. Prove that $\int_a^b f$ is the content in $\mathbb{R}^2$ of the set
\[ D := \{(x, y) : a \le x \le b,\ 0 \le y \le f(x)\}. \]
That is, show $\int_D 1 = \int_a^b f$.

3.20. Let $D := \{(x, y) : 1 \le x \le 3,\ x^2 \le y \le x^2 + 1\}$. Show that
\[ \mu(D) = \int_1^3 \left[ \int_{x^2}^{x^2 + 1} dy \right] dx = 2. \]

3.21. Let $f(x, y) = x^2 + y^2$, $D = \{(x, y) : 0 \le x \le a,\ 0 \le y \le b\}$. Show
\[ \int_D f = \frac{1}{3} ab(a^2 + b^2). \]

3.22. Let $f(x, y) = xy$, $D$ be the region in the diagram. Show $\int_D f = \frac{131}{120}$. (Do this in two ways.)

Figure 3.8. Figure for Exercise 22. (Labels in the figure: $y = 1$, $y = x$, $y = (x - 3)^2$.)

3.23. Find $\int_D f$ where $f(x, y) = \sin(\frac{x}{a} + \frac{y}{b})$ and $D = \{(x, y) : 0 \le x \le \pi a/2,\ 0 \le y \le \pi b/2\}$.

3.24. Let $D$ be the region bounded by the curves $x^2 - y^2 = 1$, $x^2 + y^2 = 4$, which contains $(0, 0)$. Find $\int_D f$ where $f(x, y) = x^2$.

3.25. Let $f(x, y) = x$. Prove that $\int_D f = 77/4$, where $D$ is the region in $\mathbb{R}^2$ illustrated in Figure 3.9.

3.26. Let $f(x, y) = g(x)$ with $(x, y) \in [a, b] \times [c, d] = I$.

(i) Prove that if $\int_a^b g$ exists, then $\int_I f$ exists. Deduce from this that $\int_D f$ exists where $D$ is any subset of $I$ which has content.

(ii) Suppose $g$ is defined on $[0, 1]$ and $\int_0^1 g$ exists. Prove that
\[ \int_0^1 \left[ \int_x^1 g(t)\,dt \right] dx = \int_0^1 t\,g(t)\,dt. \]

3.27. Let $f : I \mapsto \mathbb{R}$ be a bounded function on the closed interval $I$.
Figure 3.9. Figure for Exercise 25. (Labels in the figure: $x = 0$, $x = 4$, $y = 4$, $y = (2x + 4)/3$, $y = x^2 - 4x + 5$, $y = x/2$.)

(i) If $\int_I f$ exists, then $\int_I f^2$ exists. [Hint:
\[ |f(p)^2 - f(q)^2| = |f(p) - f(q)|\,|f(p) + f(q)| \le 2M |f(p) - f(q)|, \]
where $M := \sup\{|f(p)| : p \in I\}$.]

(ii) Deduce from (i) that if $\int_I f$ and $\int_I g$ exist, then $\int_I fg$ exists. [Hint: $(f - g)^2 = f^2 - 2fg + g^2$.]
3.28. The Mean Value Theorem for Integrals (Theorem 3.15) implies the following: If $f$ is continuous on $[a, b]$, then there exists $c \in [a, b]$ such that
\[ \int_a^b f = f(c)(b - a). \]
Can you replace “continuous” by a less restrictive condition which still implies the result? [Hint: What was Darboux’s first name?]

3.29. (Cavalieri’s Principle) Let $A$ and $B$ be subsets of $\mathbb{R}^2$ with content. If $x \in \mathbb{R}$, define
\[ A_x := \{y : (x, y) \in A\}, \qquad B_x := \{y : (x, y) \in B\} \]
(“sections” of $A$ and $B$). Suppose that, for each $x$, $A_x$ and $B_x$ have content in $\mathbb{R}$ and $\mu_1(A_x) = \mu_1(B_x)$. Prove that $\mu_2(A) = \mu_2(B)$. [Hint: Spell Fubini.]

3.30. Let $f$ be a real-valued function on $[0, 1]$ such that $\int_0^1 f$ exists. Define $a_k$ by
\[ a_k := \frac{1}{k} \sum_{j=1}^k f\left( \frac{j}{k} \right), \qquad k = 1, 2, 3, \dots. \]
Show that $\{a_k\}$ is a convergent sequence and that
\[ \lim_{k \to \infty} a_k = \int_0^1 f. \]

3.31. Let $f, g$ be integrable on $[a, b]$ with $g(x) \ge 0$, $a \le x \le b$, and $f$ continuous on $[a, b]$. Prove that there exists $c \in [a, b]$ such that
\[ \int_a^b fg = f(c) \int_a^b g. \]
3.32. Let $D$ be a compact subset in $\mathbb{R}^n$ and $f : \mathbb{R}^n \mapsto \mathbb{R}^m$ be continuous on $D$. Show that $\{(p, f(p)) : p \in D\}$ has Jordan content zero in $\mathbb{R}^{n+m}$.

3.33. Let $f$ be a continuous real-valued function on $[0, 1]$.

(i) Prove that
\[ \int_0^\pi x f(\sin(x))\,dx = \pi \int_0^{\pi/2} f(\sin(x))\,dx. \]

(ii) From part (i) deduce that
\[ \int_0^\pi \frac{x \sin(x)}{2 - \sin^2(x)}\,dx = \frac{\pi^2}{4}. \]

3.34. Use Fubini’s Theorem to show that
\[ \int_0^1 \left[ \int_{\sqrt{y}}^{2 - \sqrt{y}} f(x, y)\,dx \right] dy = \int_0^1 \left[ \int_0^{x^2} f(x, y)\,dy \right] dx + \int_1^2 \left[ \int_0^{(x - 2)^2} f(x, y)\,dy \right] dx. \]

3.35. Show that $\int_D 1 = \frac{1}{6}$ where $D := \{(x, y, z) : x \ge 0,\ y \ge 0,\ z \ge 0,\ 0 \le x + y + z \le 1\}$.

3.36. Let $D$ be a subset of $\mathbb{R}^2$ with content and $f$ be a positive continuous function on $D$. Use Fubini’s Theorem to show that if $K := \{(x, y, z) : (x, y) \in D,\ 0 \le z \le f(x, y)\}$, then
\[ \mu_3(K) = \int_K 1 = \int_D f. \]
Deduce that $m\,\mu_2(D) \le \mu_3(K) \le M \mu_2(D)$ where $m$, $M$ are lower and upper bounds for $f$ on $D$.

3.37. Evaluate
\[ \int_0^2 \int_y^2 e^{x^2}\,dx\,dy \quad \text{and} \quad \int_0^\pi \int_x^\pi \frac{\sin(y)}{y}\,dy\,dx. \]

3.38. Show
\[ \int_0^1 \left[ \int_{-\sqrt{1 - y^2}}^{\sqrt{1 - y^2}} \sin\left( \frac{\pi y}{\sqrt{1 - x^2}} \right) dx \right] dy = 1. \]
Explain carefully why each integral you consider exists.

3.39. See also exercises pp. 122-124 in Buck.

References for Chapter III

R. G. Bartle: Chapter VI

R. C. Buck: Chapter III.
Chapter 4

Differentiation of Functions of Several Variables

4.1 Preliminaries

4.1.1 Linear Functions
Definition 4.1. A function $L : \mathbb{R}^n \mapsto \mathbb{R}^m$ is linear if, for all $p, q \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$, there holds

(i) $L(p + q) = L(p) + L(q)$ ($L$ is additive),

(ii) $L(\lambda p) = \lambda L(p)$ ($L$ is homogeneous).

If $v_0 \in \mathbb{R}^m$ and $L$ is linear, then the function $M : \mathbb{R}^n \mapsto \mathbb{R}^m$ defined by $M(p) = v_0 + L(p)$ is an affine function.

Linear algebra plays a prominent background role in this chapter, as the next theorem suggests.

Theorem 4.2. A function $L : \mathbb{R}^n \mapsto \mathbb{R}^m$ is linear if and only if there is a matrix $C := [C_{i,j}]$, $i = 1, \dots, m$, $j = 1, \dots, n$, such that if $p = (x_1, \dots, x_n) \in \mathbb{R}^n$ and $L(p) = q = (y_1, \dots, y_m) \in \mathbb{R}^m$, then
\[ \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} C_{1,1} & \dots & C_{1,n} \\ \vdots & & \vdots \\ C_{m,1} & \dots & C_{m,n} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad \text{or} \quad q^T = C p^T \quad (T \text{ is the transpose}). \]
In particular, $y_i = \sum_{j=1}^n C_{i,j} x_j$, $i = 1, \dots, m$.

Proof. “$\Longleftarrow$” If $L$ is representable by a matrix as above, then $L$ is linear by the nature of matrix multiplication from linear algebra.

“$\Longrightarrow$” Conversely, let $L$ be linear, $L(p) = (L_1(p), \dots, L_m(p)) = (y_1, \dots, y_m)$.
If $e_j$ is the vector with 1 in the $j$th coordinate and zeros elsewhere, $j = 1, \dots, n$, then
\[ p = (x_1, \dots, x_n) = x_1 e_1 + \dots + x_n e_n \implies L(p) = L(x_1 e_1 + \dots + x_n e_n) = x_1 L(e_1) + \dots + x_n L(e_n). \]
Thus, for the coordinate functions,
\[ L_i(p) = L_i(x_1 e_1 + \dots + x_n e_n) = x_1 L_i(e_1) + \dots + x_n L_i(e_n), \qquad i = 1, \dots, m. \]
Setting $C_{i,j} = L_i(e_j)$, we see that $y_i = \sum_{j=1}^n L_i(e_j) x_j$, and the matrix $C$ has the form
\[ C = [\, L(e_1)^T \ \dots \ L(e_n)^T \,]. \]

Recall from linear algebra that the rank of a matrix $C$ is the number of vectors in the largest linearly independent set which can be chosen from the columns of $C$. Also, if $L : \mathbb{R}^n \mapsto \mathbb{R}^m$ is linear with the matrix representation $C$, then $\operatorname{rank} C = \dim(L(\mathbb{R}^n))$, where
\[ L(\mathbb{R}^n) := \{x_1 L(e_1) + \dots + x_n L(e_n) : x_i \in \mathbb{R},\ i = 1, \dots, n\}. \]
So, the minimum number of vectors which it takes to span $L(\mathbb{R}^n)$ in $\mathbb{R}^m$ is precisely equal to $\operatorname{rank} C$. The range $L(\mathbb{R}^n)$ is a linear subspace of $\mathbb{R}^m$ (e.g., if $m = 3$, it is either the origin, or a line through the origin, or a plane containing the origin, or all of $\mathbb{R}^3$). If $v_0$ is a vector in $\mathbb{R}^m$, then
\[ v_0 + L(\mathbb{R}^n) := \{v_0 + v : v \in L(\mathbb{R}^n)\} = \{v_0 + L(u) : u \in \mathbb{R}^n\} \]
is called an affine space and is evidently $L(\mathbb{R}^n)$ translated by the vector $v_0$.
Figure 4.1. An affine plane in $\mathbb{R}^3$.

Example 4.3
\[ \operatorname{rank} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} = 2. \]
Thus, for the linear map $L : \mathbb{R}^3 \mapsto \mathbb{R}^3$ corresponding to this matrix, the range $L(\mathbb{R}^3)$ has dimension 2. In fact, it consists of all vectors of the form $(s, s, t)$, that is, the plane $x = y$. An example of an affine space in $\mathbb{R}^3$ is $(0, 1, 0) + L(\mathbb{R}^3)$, which is the parallel plane through the point $(0, 1, 0)$, i.e., the plane $y = x + 1$.

Theorem 4.4. If $L$ is linear, there is a constant $M$ such that
\[ |L(p)| \le M |p|, \qquad \forall\, p \in \mathbb{R}^n. \]

Proof. Using the matrix representation of a linear map, we have by the Cauchy-Schwarz inequality
\[ |y_i| = |L_i(p)| = \Big| \sum_{j=1}^n C_{i,j} x_j \Big| \le \sqrt{\sum_{j=1}^n C_{i,j}^2}\ \sqrt{\sum_{j=1}^n x_j^2} \]
\[ \implies |L(p)| = \sqrt{\sum_{i=1}^m y_i^2} \le \sqrt{\sum_{i=1}^m \sum_{j=1}^n C_{i,j}^2}\ \sqrt{\sum_{j=1}^n x_j^2}, \]
or $|L(p)| \le M |p|$ where
\[ M := \sqrt{\sum_{i=1}^m \sum_{j=1}^n C_{i,j}^2}. \]
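The bound of Theorem 4.4 is easy to test on random vectors. The sketch below is illustrative only: the matrix `C` is an arbitrary choice made here, and `apply` and `norm` are hypothetical helper names.

```python
import math
import random

# An arbitrary illustrative 3x3 matrix (an assumption, any matrix would do).
C = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]

def apply(C, p):
    """Matrix-vector product: the coordinates y_i = sum_j C[i][j] * p[j]."""
    return [sum(c * x for c, x in zip(row, p)) for row in C]

def norm(v):
    """Euclidean norm |v|."""
    return math.sqrt(sum(x * x for x in v))

# M from the proof: square root of the sum of the squared entries of C.
M = math.sqrt(sum(c * c for row in C for c in row))

random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1000)]
worst = max(norm(apply(C, p)) / norm(p) for p in samples)
print(worst, M)  # the largest observed ratio |L(p)|/|p| should not exceed M
```

The constant $M$ obtained this way is generally not the smallest possible one; the sampled ratios typically stay well below it.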
Corollary 4.5. A linear function $L : \mathbb{R}^n \mapsto \mathbb{R}^m$ is uniformly continuous on $\mathbb{R}^n$.

Proof. By the previous theorem,
\[ |L(p_1) - L(p_2)| = |L(p_1 - p_2)| \le M |p_1 - p_2|. \]
Thus, $|p_1 - p_2| < \epsilon / M \implies |L(p_1) - L(p_2)| < \epsilon$.

Definition 4.6. $L : \mathbb{R}^n \mapsto \mathbb{R}^m$ is one-to-one (1-1) if $L(p_1) = L(p_2)$ implies $p_1 = p_2$. Equivalently, $p_1 \neq p_2$ implies $L(p_1) \neq L(p_2)$.

The next theorem says that for one-to-one linear functions, the inequality in Theorem 4.4 can be reversed!

Theorem 4.7. Suppose $L : \mathbb{R}^n \mapsto \mathbb{R}^m$ is linear. Then $L$ is one-to-one if and only if there exists a constant $k > 0$ such that $|L(p)| \ge k |p|$ for all $p \in \mathbb{R}^n$.

Proof. “$\Longleftarrow$” If such a constant $k$ exists, then
\[ |L(p_1) - L(p_2)| = |L(p_1 - p_2)| \ge k |p_1 - p_2|. \]
Thus, $p_1 \neq p_2 \implies L(p_1) \neq L(p_2)$.
“$\Longrightarrow$” $L$ is continuous on $\mathbb{R}^n$ by Corollary 4.5, which implies that $|L|$ is continuous on $\mathbb{R}^n$, and in particular, $|L|$ is continuous on $S := \{p : |p| = 1\}$. Since $S$ is compact and $|L|$ is continuous on $S$, $|L|$ achieves its minimum on $S$ at some point, call it $p_0$. Thus, we define
\[ k := \min\{|L(p)| : p \in S\} = |L(p_0)|, \quad \text{and } |p_0| = 1. \]
Since $L(O) = O$ by linearity and $L$ is one-to-one, we have $k = |L(p_0)| > 0 = |L(O)|$. Finally, for an arbitrary $p \in \mathbb{R}^n \setminus \{O\}$, $p / |p| \in S$ and
\[ k \le \left| L\left( \frac{1}{|p|}\, p \right) \right| = \frac{1}{|p|} |L(p)|. \]
Exercises

4.1. Let $L : \mathbb{R}^2 \mapsto \mathbb{R}^3$ be linear with $L(e_1) = (2, 1, 0)$, $L(e_2) = (1, 0, -1)$, where $e_1 = (1, 0)$, $e_2 = (0, 1)$. Find $L(2, 0)$, $L(1, 1)$, $L(1, 3)$. Draw pictures.

4.2. Show that $L(\mathbb{R}^2) \neq \mathbb{R}^3$ for the function in the last exercise.

4.3. Show that if $L : \mathbb{R}^2 \mapsto \mathbb{R}^3$ is linear, then $L(\mathbb{R}^2) \neq \mathbb{R}^3$.

4.4. Let $L : \mathbb{R}^3 \mapsto \mathbb{R}^2$ be linear. Show that there are non-zero vectors $p \in \mathbb{R}^3$ such that $L(p) = O$.

4.5. If $L : \mathbb{R}^2 \mapsto \mathbb{R}^2$ has a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, show that

(i) $L(\mathbb{R}^2)$ is a point if and only if $a = b = c = d = 0$.

(ii) $L(\mathbb{R}^2)$ is a line if and only if $\Delta = ad - bc = 0$ and $a^2 + b^2 + c^2 + d^2 > 0$.

(iii) $L(\mathbb{R}^2) = \mathbb{R}^2$ if and only if $\Delta \neq 0$.

4.6. Show that in case (iii) of the last exercise (and only in that case) $L$ is one-to-one, and that the inverse function $L^{-1}$ is linear with matrix
\[ \begin{pmatrix} \frac{d}{\Delta} & \frac{-b}{\Delta} \\ \frac{-c}{\Delta} & \frac{a}{\Delta} \end{pmatrix}. \]

4.7. Show that the sum and composition of two linear functions are linear. What are the matrix representations of these?

4.8. If $L : \mathbb{R}^n \mapsto \mathbb{R}^m$ is linear and one-to-one, then $L^{-1}$ is linear on its domain.

4.9. Let $f : \mathbb{R}^n \mapsto \mathbb{R}^m$ be such that

(i) $f(p + q) = f(p) + f(q)$, for all $p, q \in \mathbb{R}^n$ (additive),

(ii) $f$ is continuous at $O$.

Prove that $f$ is a linear function.

4.10. Let $f : \mathbb{R} \mapsto \mathbb{R}^m$ be such that $f(\lambda x) = \lambda f(x)$ ($f$ is homogeneous).
Notice that $f$ must be linear since $f(x) = f(1)x$. Show that homogeneity does not imply linearity for functions of more than one variable. HINT: Consider the function
\[ f(x, y) = \begin{cases} \dfrac{x^3 + y^3}{x^2 + y^2}, & \text{if } (x, y) \neq O, \\ 0, & \text{otherwise.} \end{cases} \]
4.1.2 Straight Lines and Curves
If c and u are points in Rn , then {c + tu : t ∈ R} is the straight line through c in the direction of u. Recall that {p + t(q − p) : t ∈ R} is the straight line through p and q. It may also be considered as the straight line through p in the direction of p − q.
Figure 4.2. The straight line through $p$ in the direction of $q - p$.

Notice that this representation of a line is not unique; you may replace $u$ by any multiple $\lambda u$, $\lambda \neq 0$, and still get the same line. If $f : \mathbb{R} \mapsto \mathbb{R}^n$, then we say that the set $\{f(t) : t \in \mathbb{R}\}$ is a curve in $\mathbb{R}^n$ (a continuous curve if $f$ is continuous). The line through $f(\tau_0)$ and $f(\tau)$, when $f(\tau) \neq f(\tau_0)$, is
\[ \{f(\tau_0) + t(f(\tau_0) - f(\tau)) : t \in \mathbb{R}\}, \]
or equivalently,
\[ \left\{ f(\tau_0) + t\, \frac{f(\tau) - f(\tau_0)}{\tau - \tau_0} : t \in \mathbb{R} \right\}. \]
Thus, it is the line through $f(\tau_0)$ in the direction $\dfrac{f(\tau) - f(\tau_0)}{\tau - \tau_0}$.

Figure 4.3. The tangent line to a curve and a secant line.

We therefore define the tangent to the curve at $f(\tau_0)$ to be $\{f(\tau_0) + t f'(\tau_0) : t \in \mathbb{R}\}$, the line through $f(\tau_0)$ in the direction of $f'(\tau_0)$, provided
\[ f'(\tau_0) = \lim_{\tau \to \tau_0} \frac{f(\tau) - f(\tau_0)}{\tau - \tau_0} \]
exists. Evidently,
\[ f'(\tau_0) = \left( \lim_{\tau \to \tau_0} \frac{f_1(\tau) - f_1(\tau_0)}{\tau - \tau_0}, \dots, \lim_{\tau \to \tau_0} \frac{f_n(\tau) - f_n(\tau_0)}{\tau - \tau_0} \right) = (f_1'(\tau_0), \dots, f_n'(\tau_0)). \]

It is also convenient to think of $f(t)$ as being the position of a particle at time $t$. Then, $f'(t) = (f_1'(t), \dots, f_n'(t))$ is the velocity vector of the particle at time $t$. Notice also that the linear function $L(t) = t f'(\tau_0)$ is the best linear approximation to $f(\tau_0 + t) - f(\tau_0)$ in a neighborhood of $t = 0$ in the sense that
\[ \lim_{t \to 0} \frac{f(\tau_0 + t) - f(\tau_0) - t f'(\tau_0)}{t} = 0. \]
Example 4.8 The direction of the tangent to {(t, t2 ) : t ∈ R} = C in R2 at the origin is (1, 2t)|t=0 = (1, 0). So the tangent line to C at the origin is {O + t(1, 0) : t ∈ R} or {(t, 0) : t ∈ R}.
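The secant directions for the parabola of Example 4.8 can be tabulated to watch them approach the tangent direction $(1, 0)$ at the origin. This is a small sketch; the helper name `secant_direction` is hypothetical.

```python
def f(t):
    """The curve t -> (t, t**2) from Example 4.8."""
    return (t, t * t)

def secant_direction(f, tau0, tau):
    """Direction (f(tau) - f(tau0)) / (tau - tau0) of the secant line."""
    return tuple((a - b) / (tau - tau0) for a, b in zip(f(tau), f(tau0)))

for h in (1e-1, 1e-3, 1e-5):
    # At tau0 = 0 the secant direction is (1, h), which tends to (1, 0).
    print(h, secant_direction(f, 0.0, h))
```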
4.2 The Directional Derivative and the Differential
Definition 4.9. Let $f : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a function with domain $D \subset \mathbb{R}^n$.

(i) If $c$ is an interior point of $D$, $u \in \mathbb{R}^n$, and
\[ \lim_{t \to 0} \frac{f(c + tu) - f(c)}{t}, \qquad (t \in \mathbb{R}) \]
exists, then this limit is called the directional derivative of $f$ at $c$ in the direction of $u$, and is denoted $f_u(c)$.

(ii) If $e_i$ is the vector in $\mathbb{R}^n$ with 1 in the $i$-th coordinate and zeros elsewhere, then $f_{e_i}(c)$ is usually denoted by $\frac{\partial f}{\partial x_i}(c)$ and is called the partial derivative of $f$ at $c$ with respect to the variable $x_i$, $i = 1, \dots, n$.

When we compute
\[ \lim_{t \to 0} \frac{f(c + tu) - f(c)}{t} \]
we are in fact restricting our attention to the behaviour of the function $f$ at $c$ with respect to a straight line through $c$ in the direction of $u$. This straight line in $\mathbb{R}^n$ (at least the portion of it that lies in $D$) is mapped by $f$ into a curve in $f(D) \subset \mathbb{R}^m$.
Figure 4.4. The vector $f_u(c)$ as the direction of the tangent in $\mathbb{R}^m$.

The vector $f_u(c)$ is the direction of the tangent to this curve in $\mathbb{R}^m$. (See Figure 4.4.) It is also instructive to consider the graph $G(D) \subset \mathbb{R}^{n+m}$ of $f$, where $G : \mathbb{R}^n \mapsto \mathbb{R}^{n+m}$ is defined by $G(p) = (p, f(p))$, $p \in D$. Computing,
\[ G_u(c) = \lim_{t \to 0} \frac{G(c + tu) - G(c)}{t} = \lim_{t \to 0} \frac{(c + tu, f(c + tu)) - (c, f(c))}{t} = \lim_{t \to 0} \frac{(tu, f(c + tu) - f(c))}{t} = (u, f_u(c)). \]
Thus, $G_u(c) = (u, f_u(c))$ is the direction (in $\mathbb{R}^{n+m}$) of the tangent to the curve
\[ \{G(c + tu) : t \in \mathbb{R}\} = \{(c + tu, f(c + tu)) : t \in \mathbb{R}\} \]
at the point $(c, f(c))$. (See Figure 4.5.)

Figure 4.5. The directional derivative and the graph of a function.

Example 4.10 We give several examples illustrating properties of directional derivatives:

(1) Consider the function $f : \mathbb{R}^2 \mapsto \mathbb{R}$ defined by $f(x, y) = x^2 + y^2$. Then its graph is given by the function $G : \mathbb{R}^2 \mapsto \mathbb{R}^3$ defined by $G(x, y) = (x, y, x^2 + y^2)$. The directional derivative of $f$ at $(x, y)$ in the direction of $e_1 = (1, 0)$ is $\frac{\partial f}{\partial x} = 2x$. In particular, $\frac{\partial f}{\partial x}(0, 0) = 0$, which implies that the $x$-axis in $\mathbb{R}^2$ is mapped by $G$ onto a curve in $\mathbb{R}^3$ which has a horizontal tangent at $G(0, 0) = (0, 0, 0)$.
Figure 4.6. The figure for Example 4.10 (1).

(2) For the function $f(x_1, x_2) = (x_1, x_2, x_1^2 + x_2^2)$ and points $c = (c_1, c_2)$, $u = (u_1, u_2)$, we have
\[ \frac{f(c + tu) - f(c)}{t} = \frac{1}{t} \left[ (c_1 + tu_1,\ c_2 + tu_2,\ (c_1 + tu_1)^2 + (c_2 + tu_2)^2) - (c_1, c_2, c_1^2 + c_2^2) \right] \]
\[ = \frac{1}{t} (tu_1,\ tu_2,\ 2tu_1 c_1 + 2tu_2 c_2 + t^2 u_1^2 + t^2 u_2^2) = (u_1,\ u_2,\ 2u_1 c_1 + 2u_2 c_2 + tu_1^2 + tu_2^2). \]
Therefore,
\[ f_u(c) = \lim_{t \to 0} \frac{f(c + tu) - f(c)}{t} = (u_1, u_2, 2u_1 c_1 + 2u_2 c_2). \]
Notice that
\[ \begin{pmatrix} u_1 \\ u_2 \\ 2u_1 c_1 + 2u_2 c_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 2c_1 & 2c_2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \]
so that $f_u(c)$ is a linear function of $u$.

(3) $f : \mathbb{R} \mapsto \mathbb{R}$ such that $f'(c)$ exists. Then, for $u \neq 0$,
\[ \frac{f(c + tu) - f(c)}{t} = \frac{f(c + tu) - f(c)}{tu}\, u, \]
so that
\[ f_u(c) = \lim_{t \to 0} \frac{f(c + tu) - f(c)}{t} = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h}\, u = f'(c)u. \]
Again, $f_u(c)$ is linear in $u$ (with matrix $[f'(c)]$!).
(4) Now for the bad news. Define $f : \mathbb{R}^2 \mapsto \mathbb{R}$ by
\[ f(x, y) = \begin{cases} 0, & \text{when } y \neq x^2; \\ 1, & \text{when } y = x^2 \text{ and } (x, y) \neq O; \\ 0, & \text{when } (x, y) = O. \end{cases} \]
Then $f_u(0, 0) = 0$ for every $u = (u_1, u_2) \in \mathbb{R}^2$ and is linear in $u$, but $f$ is not continuous at $(0, 0)$. In other words, the directional derivative exists in every direction at the origin, but the function is not continuous there. What happened to the univariate “differentiable implies continuous”?
(5) This example shows that the directional derivative does not have to be linear in $u$. Define $f : \mathbb{R}^2 \mapsto \mathbb{R}$ by
\[ f(x, y) = \begin{cases} \dfrac{x^2 y}{x^3 - y^2}, & \text{when } x^3 \neq y^2; \\ 0, & \text{when } x^3 = y^2. \end{cases} \]
Then $f$ is not continuous at $(0, 0)$ (why?!). Computing $f_u(0, 0)$ for $u = (u, v) \in \mathbb{R}^2$, we find
\[ f_{(u,v)}(0, 0) = \begin{cases} -\dfrac{u^2}{v}, & \text{if } v \neq 0; \\ 0, & \text{if } v = 0. \end{cases} \]
In particular, $f_{(u,v)}$ is not linear in $u = (u, v)$.

The preceding examples show that when $f$ is a “nice” function at $c$, then $f_u(c)$ is linear in $u$, but in general it does not need to be linear even if it exists for all $u$. Observe, however, that all the $f_u(c)$ in the above examples are homogeneous in $u$, that is, $f_{\lambda u}(c) = \lambda f_u(c)$ for all $\lambda \in \mathbb{R}$. When they fail to be linear, it is the additive property that fails. The homogeneity in $u$ is always true when $f_u(c)$ exists.
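The failure of additivity in part (5) of Example 4.10 can be seen with a difference quotient at the origin. The sketch below is only a numerical illustration: the step size `t` and the helper name `directional` are assumptions, and a finite quotient approximates rather than computes the limit.

```python
def f(x, y):
    """The function of Example 4.10 (5)."""
    return 0.0 if x ** 3 == y ** 2 else x * x * y / (x ** 3 - y ** 2)

def directional(f, c, u, t=1e-6):
    """Difference quotient (f(c + t*u) - f(c)) / t approximating f_u(c)."""
    return (f(c[0] + t * u[0], c[1] + t * u[1]) - f(*c)) / t

d = lambda u: directional(f, (0.0, 0.0), u)
a, b = (1.0, 1.0), (1.0, 2.0)
a_plus_b = (a[0] + b[0], a[1] + b[1])
print(d(a), d(b), d(a_plus_b))  # roughly -1, -1/2, -4/3: not additive
print(d((2.0, 2.0)))            # roughly -2 = 2 * d(a): homogeneity holds
```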
Exercises

4.11. Prove that if $f_u(c)$ exists, then $f_{\lambda u}(c) = \lambda f_u(c)$.

4.12. Prove that the expressions given for the directional derivatives in (4) and (5) of Example 4.10 are correct.

4.13. Check that the following partial derivatives are correct:

(a) The function $f(x, y) = x^2 + y^3$ has partial derivatives
\[ \frac{\partial f}{\partial x}(x, y) = 2x, \qquad \frac{\partial f}{\partial y}(x, y) = 3y^2. \]

(b) The function $f(x, y) = (\sin(xy), e^x, \cos(y))$ has partial derivatives
\[ \frac{\partial f}{\partial x}(x, y) = (y \cos(xy), e^x, 0), \qquad \frac{\partial f}{\partial y}(x, y) = (x \cos(xy), 0, -\sin(y)). \]
4.14. For the function $f:\mathbb{R}^2 \mapsto \mathbb{R}^3$ defined by $f(x,y) = (x^2 + y^3, \sin(y), e^x)$, prove that $f_u(x,y)$ exists for each $(x,y) \in \mathbb{R}^2$ and each direction $u = (u,v)$. Show also that $f_u(x,y)$ is linear in $u = (u,v)$ with matrix
\[ \begin{bmatrix} 2x & 3y^2 \\ 0 & \cos(y) \\ e^x & 0 \end{bmatrix}. \]

Motivation: At first glance it seems that the directional derivative determines the properties of functions in the same way that the derivative in one variable does. Unfortunately, that is not quite the case. For example, in Example 4.10 (4), we have shown that a function may be discontinuous at a point where all directional derivatives exist. Therefore, a more rigorous approach must be used to extend the role of differentiation. By reformulating the definition, we can realize the "expected" properties, and hence the usefulness, of the derivative. To this end, we reformulate the definition of the derivative of a function in one variable to showcase the linearity found in Example 4.10 (3).

Definition 4.11 (Derivative in one variable reformulated). A function $f:\mathbb{R} \mapsto \mathbb{R}$ is differentiable at $c$ if there exists a number $f'(c)$ such that
\[ \lim_{x\to c} \frac{f(x)-f(c)}{x-c} = f'(c) \implies \lim_{x\to c} \left| \frac{f(x)-f(c)}{x-c} - f'(c) \right| = \lim_{x\to c} \frac{|f(x)-f(c)-f'(c)(x-c)|}{|x-c|} = 0. \]
Thus, the function $f:\mathbb{R} \mapsto \mathbb{R}$ is differentiable at $c$ if and only if there exists a linear function $L:\mathbb{R} \mapsto \mathbb{R}$ such that
\[ \lim_{x\to c} \frac{|f(x)-f(c)-L(x-c)|}{|x-c|} = 0 \tag{4.1} \]
and $L$ is given by $L(u) = f'(c)u$, for all $u$ in $\mathbb{R}$.
We can consider the linear function $L$ to be the best linear approximation, in the sense of (4.1), to $f(x) - f(c)$ in a neighborhood of $c$, i.e. $f(x) - f(c) \approx L(x-c)$, or $f(x) \approx L(x-c) + f(c) = f'(c)(x-c) + f(c)$. This is the approximation along the tangent line (an affine function) learned in calculus.

Definition 4.12 (Differential in several variables). Let $f: D \subset \mathbb{R}^n \mapsto \mathbb{R}^m$ with $c$ an interior point of $D$, the domain of $f$. Suppose $L:\mathbb{R}^n \mapsto \mathbb{R}^m$ is linear and
\[ \lim_{p\to c} \frac{|f(p)-f(c)-L(p-c)|}{|p-c|} = 0. \]
Then $f$ is said to be differentiable at $c$, while $L$ is called the differential of $f$ at $c$ and is denoted $Df(c)$, that is, $L(u) = Df(c)u$ for all $u \in \mathbb{R}^n$.

[Figure 4.7. Derivative as best linear approximation. The figure marks the points $(x, f(x))$, $(x, f(c) + L(x-c))$ and $(c, f(c))$.]

Example 4.13 If $f:\mathbb{R} \mapsto \mathbb{R}$ is given by $f(x) = x^3$, then $f'(c) = 3c^2$, so $L(u) = 3c^2 u$ for all $u \in \mathbb{R}$. Indeed, computing
\[ |f(p)-f(c)-L(p-c)| = |p^3 - c^3 - 3c^2(p-c)| = |p-c|\,|p^2 + pc + c^2 - 3c^2| = |p-c|\,|p^2 + pc - 2c^2| \]
\[ \implies \lim_{p\to c} \frac{|f(p)-f(c)-L(p-c)|}{|p-c|} = \lim_{p\to c} |p^2 + pc - 2c^2| = 0. \]
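The limit in Example 4.13 can be watched numerically. The following is a minimal sketch, assuming the particular base point $c = 2$ and a helper name `quotient` that are chosen here for illustration only; it evaluates the quotient from Definition 4.12 for $f(x) = x^3$ as $p$ approaches $c$.

```python
def f(x):
    return x ** 3


def L(u, c):
    # Differential of f at c applied to u: L(u) = 3 c^2 u.
    return 3 * c ** 2 * u


def quotient(p, c):
    # |f(p) - f(c) - L(p - c)| / |p - c|, the quantity whose limit must be 0.
    return abs(f(p) - f(c) - L(p - c, c)) / abs(p - c)


c = 2.0  # illustrative base point (an assumption)
ratios = [quotient(c + h, c) for h in (1e-1, 1e-2, 1e-3)]
for h, r in zip((1e-1, 1e-2, 1e-3), ratios):
    print(f"h = {h:g}: quotient = {r:.6f}")
```

By the factorization in the example, the quotient equals $|p-c|\,|p+2c|$, so it shrinks roughly in proportion to $|p - c|$.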
Theorem 4.14. For a function $f: D \subset \mathbb{R}^n \mapsto \mathbb{R}^m$ and a point $c$ interior to $D$, we have
\[ Df(c)(u) = L(u) \iff Df_i(c)(u) = L_i(u), \qquad \forall\, u \in \mathbb{R}^n \text{ and } i = 1,\dots,m, \]
where $f(p) = (f_1(p),\dots,f_m(p))$ and $L(u) = (L_1(u),\dots,L_m(u))$.

Proof. This follows because the limit exists if and only if the limit of each component exists (this uses the equivalence of $d(p,c)$ and $d_\infty(p,c)$):
\[ \lim_{p\to c} \frac{|f(p)-f(c)-L(p-c)|}{|p-c|} = 0 \iff \lim_{p\to c} \frac{|f_i(p)-f_i(c)-L_i(p-c)|}{|p-c|} = 0, \qquad i = 1,\dots,m. \]

The matrix for the differential must have the form provided by the next theorem.

Theorem 4.15. If $f: D \subset \mathbb{R}^n \mapsto \mathbb{R}^m$ and $f$ is differentiable at an interior point $c$ of $D$, then
(i) $f_u(c)$ exists for each $u \in \mathbb{R}^n$ and $Df(c)(u) = f_u(c)$.

(ii) The matrix of the linear transformation $Df(c)$ is
\[ \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(c) & \dots & \dfrac{\partial f_1}{\partial x_n}(c) \\ \vdots & & \vdots \\ \dfrac{\partial f_m}{\partial x_1}(c) & \dots & \dfrac{\partial f_m}{\partial x_n}(c) \end{bmatrix}. \]
This matrix is the Jacobian matrix of $f$ at $c$, and can also be denoted $\left[\dfrac{\partial f}{\partial x}(c)\right]$, or simply $f'(c)$.

Proof.

(i) If $Df(c)$ exists, then for $u \in \mathbb{R}^n \setminus \{O\}$ and $p = c + tu$,
\[ \lim_{t\to 0} \frac{|f(c+tu)-f(c)-Df(c)(tu)|}{|tu|} = 0 \]
\[ \implies \frac{1}{|u|} \lim_{t\to 0} \left| \frac{f(c+tu)-f(c)}{t} - Df(c)(u) \right| = 0 \]
\[ \implies \lim_{t\to 0} \frac{f(c+tu)-f(c)}{t} = Df(c)(u). \]
Here we used the homogeneity property of linear maps, $Df(c)(tu) = t\,Df(c)(u)$, for the second step.

(ii) Recall from Theorem 4.2 that the matrix of $L$ is $[C_{i,j}] = [L_i(e_j)]$. Here
\[ L_i(e_j) = Df_i(c)(e_j) = (f_i)_{e_j}(c) \ \text{(by part (i))} \ = \frac{\partial f_i}{\partial x_j}(c), \qquad i = 1,\dots,m,\ j = 1,\dots,n. \]

Notes: In the case when $n = m$, the determinant $\det f'(c)$ is called the Jacobian of $f$ at $c$ and is often denoted by
\[ J_f(c) \qquad \text{or} \qquad \frac{\partial(f_1,\dots,f_n)}{\partial(x_1,\dots,x_n)}(c). \]
Part (ii) of the Theorem means that if $u = (u_1,\dots,u_n) \in \mathbb{R}^n$, then for $i = 1,\dots,m$,
\[ L_i(u) = Df_i(c)(u) = \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(c)\,u_j. \]
Corollary 4.16. If the differential exists, it is unique.

Proof. $f_u(c)$ is unique (from the uniqueness of limits) for each $u \in \mathbb{R}^n$. Alternatively, the partials $\dfrac{\partial f_i}{\partial x_j}(c)$ are unique and hence the matrix $f'(c) = \left[\dfrac{\partial f_i}{\partial x_j}(c)\right]$ is unique.
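As a rough numerical illustration of Theorem 4.15 (ii), the sketch below compares a finite-difference Jacobian with the matrix of partial derivatives for the function of Exercise 4.14, $f(x,y) = (x^2 + y^3, \sin y, e^x)$. The helper names `jacobian_fd` and `jacobian_exact`, the step size and the evaluation point are assumptions made for this example.

```python
import math


def f(x, y):
    # The function from Exercise 4.14.
    return (x ** 2 + y ** 3, math.sin(y), math.exp(x))


def jacobian_fd(func, x, y, h=1e-6):
    """Central-difference approximation of the 3 x 2 Jacobian (hypothetical helper)."""
    fxp, fxm = func(x + h, y), func(x - h, y)
    fyp, fym = func(x, y + h), func(x, y - h)
    return [
        [(fxp[i] - fxm[i]) / (2 * h), (fyp[i] - fym[i]) / (2 * h)]
        for i in range(3)
    ]


def jacobian_exact(x, y):
    # Matrix of partial derivatives, row i = (d f_i / dx, d f_i / dy).
    return [[2 * x, 3 * y ** 2], [0.0, math.cos(y)], [math.exp(x), 0.0]]


point = (1.0, 2.0)  # illustrative point (an assumption)
approx = jacobian_fd(f, *point)
exact = jacobian_exact(*point)
max_error = max(
    abs(approx[i][j] - exact[i][j]) for i in range(3) for j in range(2)
)
print("max entrywise error:", max_error)
```

Agreement at one point is only a sanity check, not a proof that the differential exists there.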
Theorem 4.17. If $f$ is differentiable at $c$, then $f$ is continuous at $c$.

Proof. Let $L = Df(c)$. From the definition of $Df(c)$, given $\epsilon > 0$, there is a $\delta(\epsilon) > 0$ such that
\[ 0 < |p-c| < \delta(\epsilon) \implies \frac{|f(p)-f(c)-L(p-c)|}{|p-c|} < \epsilon, \]
i.e. $|f(p)-f(c)-L(p-c)| < \epsilon|p-c|$. In particular, with $\epsilon = 1$, we have
\[ 0 < |p-c| < \delta(1) \implies |f(p)-f(c)-L(p-c)| < |p-c| \]
\[ \implies |f(p)-f(c)| \le (1+M)|p-c| \quad \text{(Theorem 4.4)} \]
\[ \implies \lim_{p\to c} f(p) = f(c). \]

Theorem 4.18 (Important). Let $f:\mathbb{R}^n \mapsto \mathbb{R}^m$. If the partial derivatives
\[ \frac{\partial f_i}{\partial x_j}, \qquad i = 1,\dots,m, \quad j = 1,\dots,n, \]
exist in a neighborhood of $c$ and are continuous at $c$, then $f$ is differentiable at $c$.

Proof. By Theorem 4.14, it is sufficient to prove the case $m = 1$ (i.e. to assume $f$ is a real-valued function). By the hypothesis, for each $\epsilon > 0$, there exists a $\delta > 0$ such that
\[ |p-c| < \delta \implies \frac{\partial f}{\partial x_i}(p) \text{ exists and } \left| \frac{\partial f}{\partial x_i}(p) - \frac{\partial f}{\partial x_i}(c) \right| < \epsilon. \]
Let $p = (x_1,\dots,x_n)$, $c = (\gamma_1,\dots,\gamma_n)$, and $|p-c| < \delta$. Define
\[ p = p_0, \quad p_1 = (\gamma_1, x_2, \dots, x_n), \quad p_2 = (\gamma_1, \gamma_2, x_3, \dots, x_n), \quad \dots, \quad p_n = c. \]
(See Figure 4.8.) Note that
\[ |p_k - c| \le |p-c| < \delta \tag{4.2} \]
and
\[ f(p) - f(c) = f(p_0) - f(p_n) = \sum_{k=1}^{n} \left[ f(p_{k-1}) - f(p_k) \right]. \tag{4.3} \]

[Figure 4.8. Moving in successive coordinate directions, from $p_0 = (x_1, x_2)$ to $p_1 = (\gamma_1, x_2)$ to $p_2 = (\gamma_1, \gamma_2)$.]
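Now on each line segment between $p_{k-1}$ and $p_k$, $k = 1,\dots,n$, we are really considering a real-valued differentiable function of a real variable, so we may use the Mean Value Theorem:
\[ f(p_{k-1}) - f(p_k) = \frac{\partial f}{\partial x_k}(p_k^*)(x_k - \gamma_k), \tag{4.4} \]
where $p_k^*$ is on the line segment from $p_{k-1}$ to $p_k$, $k = 1,\dots,n$. Clearly, $p_k^* \in \{p : |p-c| < \delta\}$ by (4.2), since this set is convex. From (4.3) and (4.4), we find
\[ f(p) - f(c) = \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}(p_k^*)(x_k - \gamma_k). \]
If $u = (u_1,\dots,u_n)$, let $L(u)$ be defined by
\[ L(u) = \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}(c)\,u_k. \]
Then since $|p_k^* - c| < \delta$ when $|p-c| < \delta$, we have
\[ f(p) - f(c) - L(p-c) = \sum_{k=1}^{n} \left[ \frac{\partial f}{\partial x_k}(p_k^*) - \frac{\partial f}{\partial x_k}(c) \right](x_k - \gamma_k), \]
so that, by the Cauchy-Schwarz inequality,
\[ |f(p) - f(c) - L(p-c)| \le \left[ \sum_{k=1}^{n} \left| \frac{\partial f}{\partial x_k}(p_k^*) - \frac{\partial f}{\partial x_k}(c) \right|^2 \right]^{1/2} \left[ \sum_{k=1}^{n} (x_k - \gamma_k)^2 \right]^{1/2} \le \sqrt{n}\,\epsilon\,|p-c| \]
\[ \implies \frac{|f(p)-f(c)-L(p-c)|}{|p-c|} \le \sqrt{n}\,\epsilon \quad \text{for } 0 < |p-c| < \delta. \]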
[...] $\epsilon > 0$ be given. Since $\phi$ is differentiable at $c$, there is a neighborhood $U$ of $c$ and a constant $K > 0$ such that if $p \in U$, then $|\phi(p) - \phi(c)| \le K|p-c|$ (see the proof of Theorem 4.17). Since $L_\psi$ is the differential of $\psi$ at $b = \phi(c)$, there exists a $\delta > 0$ such that
\[ |q-b| < \delta \implies |\psi(q) - \psi(b) - L_\psi(q-b)| \le \frac{\epsilon}{K}|q-b|. \]
Thus, from these two inequalities, if $p$ is sufficiently close to $c$, we have
\[ |\psi(\phi(p)) - \psi(\phi(c)) - L_\psi(\phi(p) - \phi(c))| \le \frac{\epsilon}{K}|\phi(p) - \phi(c)| \le \epsilon|p-c|. \]
Therefore, assertion (1) is proved. For the proof of assertion (2), from the linearity of $L_\psi$, there is a constant $M > 0$ such that $|L_\psi(u)| \le M|u|$ for all $u \in \mathbb{R}^n$. Thus,
\[ \frac{|L_\psi(\phi(p) - \phi(c) - L_\phi(p-c))|}{|p-c|} \le M\,\frac{|\phi(p) - \phi(c) - L_\phi(p-c)|}{|p-c|} \to 0 \]
as $p \to c$, by definition of $L_\phi$.

The Chain Rule in higher dimensions is more interesting than in one dimension; for example, it includes the rules for differentiating sums and products as special cases. Thus, Theorem 4.23 is a corollary to Theorem 4.24. This can be seen by considering the functions
\[ F:\mathbb{R}^n \mapsto \mathbb{R}^{2m}, \quad F(p) = (\phi(p), \psi(p)), \]
\[ G:\mathbb{R}^{2m} \mapsto \mathbb{R}^m, \quad G(q_1, q_2) = \alpha q_1 + \beta q_2, \]
\[ H:\mathbb{R}^{2m} \mapsto \mathbb{R}, \quad H(q_1, q_2) = q_1 \bullet q_2, \qquad q_1, q_2 \in \mathbb{R}^m. \]
Then $G \circ F = \alpha\phi + \beta\psi$ and $H \circ F = \phi \bullet \psi$.

Theorem 4.29 (Mean Value Theorem). Suppose $f:\mathbb{R}^n \mapsto \mathbb{R}$. If $a, b \in \mathbb{R}^n$ and $f$ is differentiable at each point of the line segment $S$ between $a$ and $b$, then there is a point $c \in S$, with $c \ne a$ and $c \ne b$, such that
\[ f(b) - f(a) = Df(c)(b-a). \]
In the notation of Exercise 23, this may be written as
\[ f(b) - f(a) = \nabla f(c) \bullet (b-a) = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}(c)(b_j - a_j), \]
where $b = (b_1,\dots,b_n)$ and $a = (a_1,\dots,a_n)$.

Proof. We reduce the problem to one variable by introducing the function $F:[0,1] \mapsto \mathbb{R}$ defined by
\[ F(t) = f(a + t(b-a)) = f(\lambda(t)), \qquad \lambda(t) = a + t(b-a), \quad 0 \le t \le 1. \]
By the Chain Rule, $F'(t)$ exists for $0 \le t \le 1$ and
\[ F'(t) = DF(t)(1) = Df(\lambda(t)) \circ D\lambda(t)(1) = Df(\lambda(t))\,\lambda'(t) = Df(\lambda(t))(b-a). \]
By the Mean Value Theorem for $F(t)$,
\[ F(1) - F(0) = F'(t_0)(1-0) = F'(t_0), \quad \text{for some } t_0 \in (0,1). \]
Thus,
\[ f(b) - f(a) = Df(\lambda(t_0))(b-a) = Df(c)(b-a), \qquad c = \lambda(t_0). \]
The Mean Value Theorem does not hold for functions $f:\mathbb{R}^n \mapsto \mathbb{R}^m$ if $m > 1$. Can you tell why? If you cannot, try to carry out a proof with $m > 1$. See Exercise 30 for what can be said.
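One way to see the failure concretely is the following sketch, which uses the curve $f(t) = (\cos t, \sin t)$ on $[0, 2\pi]$ (so $n = 1$, $m = 2$). This particular curve is an illustrative assumption, not an example taken from the notes. Here $f(b) - f(a) = 0$, while $Df(c)(b-a)$ has length $2\pi$ at every $c$, so no single $c$ can satisfy the Mean Value formula.

```python
import math


def f(t):
    # Illustrative vector-valued function R -> R^2 (an assumption).
    return (math.cos(t), math.sin(t))


def df(t):
    # Derivative of f, computed by hand.
    return (-math.sin(t), math.cos(t))


a, b = 0.0, 2 * math.pi

# Left-hand side of the would-be Mean Value formula: f(b) - f(a).
lhs = tuple(fb - fa for fb, fa in zip(f(b), f(a)))
lhs_norm = math.hypot(lhs[0], lhs[1])

# Right-hand side Df(c)(b - a), sampled at many points c of [a, b].
rhs_norms = []
for k in range(1001):
    c = a + k * (b - a) / 1000
    d = df(c)
    rhs_norms.append(math.hypot(d[0] * (b - a), d[1] * (b - a)))

print("|f(b) - f(a)|          =", lhs_norm)
print("min over c |Df(c)(b-a)| =", min(rhs_norms))
```

Each component of $f$ separately satisfies the one-variable Mean Value Theorem, but at different points $c$, which is the obstruction.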
Exercises

4.26. Let $f:\mathbb{R} \mapsto \mathbb{R}$ be differentiable. If $F:\mathbb{R}^2 \mapsto \mathbb{R}$ is defined by

(a) $F(x,y) = f(xy)$, then $x\dfrac{\partial F}{\partial x} = y\dfrac{\partial F}{\partial y}$,

(b) $F(x,y) = f(ax + by)$, then $b\dfrac{\partial F}{\partial x} = a\dfrac{\partial F}{\partial y}$,

(c) $F(x,y) = f(x^2 + y^2)$, then $y\dfrac{\partial F}{\partial x} = x\dfrac{\partial F}{\partial y}$.

4.27. Define $f:\mathbb{R}^2 \mapsto \mathbb{R}$ by
\[ f(x,y) = \begin{cases} (x^2 + y^2)\sin\dfrac{1}{x^2 + y^2}, & \text{if } (x,y) \ne (0,0), \\ 0, & \text{if } (x,y) = (0,0). \end{cases} \]
Show that $f$ is differentiable at $(0,0)$, but $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ are not continuous at $(0,0)$.

4.28. Let $f:\mathbb{R}^n \mapsto \mathbb{R}$ be a differentiable function and $C$ be a smooth curve in $\mathbb{R}^n$ on which the function is constant. Prove that for any point $c \in C$, the tangent to $C$ at $c$ is perpendicular to
\[ \nabla f(c) = \left( \frac{\partial f}{\partial x_1}(c), \dots, \frac{\partial f}{\partial x_n}(c) \right). \]
4.29. Suppose $f:\mathbb{R}^n \mapsto \mathbb{R}^n$ is differentiable. If for all collections of $n$ collinear points $\{p_1,\dots,p_n\}$ in $\mathbb{R}^n$ the determinant
\[ \begin{vmatrix} \dfrac{\partial f_1}{\partial x_1}(p_1) & \dots & \dfrac{\partial f_1}{\partial x_n}(p_1) \\ \vdots & & \vdots \\ \dfrac{\partial f_n}{\partial x_1}(p_n) & \dots & \dfrac{\partial f_n}{\partial x_n}(p_n) \end{vmatrix} \]
is non-zero, then show that $f$ is 1-to-1 on $\mathbb{R}^n$. Thus, if $n = 1$ and $f'(x) \ne 0$ for each $x \in \mathbb{R}$, then $f$ is 1-to-1 on $\mathbb{R}$. Show that the function $f(x,y) = (x^3 - y, e^{x+y})$ is 1-to-1 on $\mathbb{R}^2$.

4.30. If $f:\mathbb{R}^n \mapsto \mathbb{R}^m$ is differentiable at each point $c$ in the line segment between $a$ and $b$ and satisfies $|Df(c)(u)| \le M|u|$ for each $u \in \mathbb{R}^n$, then $|f(b) - f(a)| \le M|b-a|$.

4.31. (Euler's Theorem) The real-valued function $f(x) = f(x_1,\dots,x_n)$ is homogeneous of degree $m$ if
\[ f(tx) = f(tx_1,\dots,tx_n) = t^m f(x_1,\dots,x_n) = t^m f(x), \qquad \forall\, t \in \mathbb{R}. \]
For example, the functions
\[ x^3 + y^3 + 3x^2 y, \qquad \frac{x^5 + y^5 + z^5}{(x+y+z)^5}, \qquad x^2 y^7 + 2z^4 x^5 \]
are homogeneous of degree 3, 0, 9 respectively. Prove that if $f$ is differentiable and homogeneous of degree $m$, then
\[ x \bullet \nabla f = x_1\frac{\partial f}{\partial x_1} + \dots + x_n\frac{\partial f}{\partial x_n} = mf. \]
[Hint: Differentiate the formula defining the concept of homogeneity with respect to $t$ and set $t = 1$. Was Euler an Edmonton hockey player???]
4.32. (Project) Carry out the following:

(i) Suppose $f$ is continuous on $[a,b] \times [c,d] = I$ to $\mathbb{R}$. Prove that if
\[ F(x) = \int_c^d f(x,y)\,dy, \]
then $F$ is continuous on $[a,b]$. [Hint: $f$ is uniformly continuous on $I$.]

(ii) Suppose $f$ and $\dfrac{\partial f}{\partial x}$ are continuous on $I$. Let $F$ be as in part (i). Prove that $F'$ exists, is continuous on $[a,b]$, and
\[ F'(x) = \int_c^d \frac{\partial f}{\partial x}(x,y)\,dy. \]
[Hint: Let
\[ \phi(x) = \int_c^d \frac{\partial f}{\partial x}(x,y)\,dy; \]
then $\phi$ is continuous by part (i). Use the Fubini Theorem to show that $\int_a^x \phi = F(x) - F(a)$. Hence, $F$ is an antiderivative of $\phi$, so $F'$ exists and equals $\phi$.]

(iii) Suppose $\alpha(x)$ and $\beta(x)$ have continuous derivatives on $[a,b]$, and $f$ and $\dfrac{\partial f}{\partial x}$ are continuous on $I$. Show
\[ \frac{d}{dx} \int_{\alpha(x)}^{\beta(x)} f(x,t)\,dt = f(x,\beta(x))\beta'(x) - f(x,\alpha(x))\alpha'(x) + \int_{\alpha(x)}^{\beta(x)} \frac{\partial f}{\partial x}(x,t)\,dt. \]
[Hint: Apply the Chain Rule and (ii) to
\[ F(x,y,z) = \int_y^z f(x,t)\,dt, \quad \text{at } (x,y,z) = (x,\alpha(x),\beta(x)).] \]

(iv) From the formula
\[ \int_0^\pi \frac{dx}{a + b\cos(x)} = \frac{\pi}{(a^2 - b^2)^{1/2}}, \qquad a > 0, \quad |b| < a, \]
establish the results
\[ \int_0^\pi \frac{dx}{(a + b\cos(x))^2} = \frac{a\pi}{(a^2 - b^2)^{3/2}}, \qquad \int_0^\pi \frac{\cos(x)\,dx}{(a + b\cos(x))^2} = \frac{-b\pi}{(a^2 - b^2)^{3/2}}. \]
(v) Evaluate $\displaystyle\int_0^b \frac{dx}{x^2 + a^2}$, $a > 0$, and from your result deduce that
\[ \int_0^b \frac{dx}{(x^2 + a^2)^2} = \frac{1}{2a^3}\arctan\frac{b}{a} + \frac{b}{2a^2(b^2 + a^2)}, \qquad a > 0. \]
Show also that
\[ \int_0^\infty \frac{dx}{(x^2 + a^2)^2} = \frac{\pi}{4a^3}, \qquad a > 0. \]
[Hint: $\int_0^\infty f = \lim_{T\to\infty} \int_0^T f$ if this limit exists.]

4.3 Partial Derivatives of Higher Order
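Definition 4.30. Let $f:\mathbb{R}^n \mapsto \mathbb{R}$ be such that $\dfrac{\partial f}{\partial x_i}$ exists for some point. If $\dfrac{\partial}{\partial x_j}\left(\dfrac{\partial f}{\partial x_i}\right)$ also exists at this point, then we write
\[ \frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right), \]
and, in particular,
\[ \frac{\partial^2 f}{\partial x_i^2} = \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_i}\right). \]
These are the partial derivatives of second order. Partial derivatives of third and higher order are similarly defined.

Example 4.31 Let $f(x,y) = x^3 + 3y^2 + 2xy$. Then
\[ \frac{\partial f}{\partial x} = 3x^2 + 2y, \qquad \frac{\partial f}{\partial y} = 6y + 2x, \]
\[ \frac{\partial^2 f}{\partial x^2} = 6x, \qquad \frac{\partial^2 f}{\partial x \partial y} = 2 = \frac{\partial^2 f}{\partial y \partial x}, \qquad \frac{\partial^2 f}{\partial y^2} = 6. \]

It is usually the case (but not always; see Exercise 13, page 349 in Buck) that the successive partial derivatives may be taken in any order we please. For example, in the above example we have seen
\[ \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}. \]
The following two theorems give sufficient conditions for this. There is no loss of generality in the fact that these theorems are proved in $\mathbb{R}^2$ only; we are only concerned with the behaviour of $f$ with respect to two of the variables in any case.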
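Both situations can be explored numerically. The sketch below approximates the two mixed partials by nested central differences for the polynomial of Example 4.31, and for the function $xy(x^2 - y^2)/(x^2 + y^2)$ (set to $0$ at the origin). The second function is a standard counterexample and is an assumption here; it is not necessarily the one in the exercise from Buck. The step sizes `INNER` and `OUTER` and the helper names are also illustrative choices.

```python
def d_dx(g, x, y, h):
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def d_dy(g, x, y, h):
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


INNER, OUTER = 1e-7, 1e-3  # inner step much smaller than outer step


def mixed_partials(g, x, y):
    """Return (d/dx of dg/dy, d/dy of dg/dx) at (x, y), approximately."""
    gx = lambda s, t: d_dx(g, s, t, INNER)
    gy = lambda s, t: d_dy(g, s, t, INNER)
    return d_dx(gy, x, y, OUTER), d_dy(gx, x, y, OUTER)


def f_poly(x, y):
    # The function from Example 4.31.
    return x ** 3 + 3 * y ** 2 + 2 * x * y


def f_counter(x, y):
    # Standard counterexample (an assumption, see the lead-in).
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


poly = mixed_partials(f_poly, 1.0, 2.0)
counter = mixed_partials(f_counter, 0.0, 0.0)
print("Example 4.31 at (1, 2):", poly)
print("counterexample at (0, 0):", counter)
```

For the polynomial both values should be close to 2, while for the second function they should be close to $1$ and $-1$, so its mixed partials at the origin cannot both be continuous there.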
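Theorem 4.32. Let $f:\mathbb{R}^2 \mapsto \mathbb{R}$. If the mixed partials
\[ \frac{\partial^2 f}{\partial x \partial y}, \qquad \frac{\partial^2 f}{\partial y \partial x} \]
exist and are continuous on an open set $U \subset \mathbb{R}^2$, then they are equal at each point of $U$.

Proof. Let $I = [a,b] \times [c,d] \subset U$. By the Fubini Theorem,
\[ J_1 = \int_I \frac{\partial^2 f}{\partial x \partial y} = \int_c^d \left( \int_a^b \frac{\partial^2 f}{\partial x \partial y}\,dx \right) dy = \int_c^d \left[ \frac{\partial f}{\partial y}(b,y) - \frac{\partial f}{\partial y}(a,y) \right] dy = f(b,d) - f(b,c) - f(a,d) + f(a,c), \]
\[ J_2 = \int_I \frac{\partial^2 f}{\partial y \partial x} = \int_a^b \left( \int_c^d \frac{\partial^2 f}{\partial y \partial x}\,dy \right) dx = \int_a^b \left[ \frac{\partial f}{\partial x}(x,d) - \frac{\partial f}{\partial x}(x,c) \right] dx = f(b,d) - f(a,d) - f(b,c) + f(a,c). \]
Therefore, $J_1 = J_2$ for each interval $I \subset U$. Now, suppose that
\[ \frac{\partial^2 f}{\partial y \partial x}(x_0,y_0) \ne \frac{\partial^2 f}{\partial x \partial y}(x_0,y_0) \]
for some $(x_0,y_0) \in U$. We may even say that one is larger than the other, say,
\[ \frac{\partial^2 f}{\partial x \partial y}(x,y) - \frac{\partial^2 f}{\partial y \partial x}(x,y) > 0 \]
in a small interval $I$ containing $(x_0,y_0)$, by continuity of the mixed partials. But then
\[ J_1 - J_2 = \int_I \left( \frac{\partial^2 f}{\partial x \partial y} - \frac{\partial^2 f}{\partial y \partial x} \right) > 0, \]
by Theorem 3.15(a); a contradiction. Hence, we must have equality throughout $U$.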
The next theorem is more general than the one just proved, but is also more difficult to prove.

Theorem 4.33. If $f:\mathbb{R}^2 \to \mathbb{R}$ satisfies

(i) $\dfrac{\partial f}{\partial x}$, $\dfrac{\partial f}{\partial y}$, $\dfrac{\partial^2 f}{\partial x \partial y}$ exist in a neighborhood $U$ of $(x_0,y_0)$, and

(ii) $\dfrac{\partial^2 f}{\partial x \partial y}$ is continuous at $(x_0,y_0)$,

then $\dfrac{\partial^2 f}{\partial y \partial x}(x_0,y_0)$ exists and equals $\dfrac{\partial^2 f}{\partial x \partial y}(x_0,y_0)$.

Proof. We want to show that
\[ \frac{\partial^2 f}{\partial y \partial x}(x_0,y_0) = \lim_{k\to 0} \frac{\frac{\partial f}{\partial x}(x_0,y_0+k) - \frac{\partial f}{\partial x}(x_0,y_0)}{k} \quad \text{exists and equals} \quad \frac{\partial^2 f}{\partial x \partial y}(x_0,y_0), \tag{4.5} \]
or equivalently,
\[ \lim_{k\to 0} \left\{ \frac{\frac{\partial f}{\partial x}(x_0,y_0+k) - \frac{\partial f}{\partial x}(x_0,y_0)}{k} - \frac{\partial^2 f}{\partial x \partial y}(x_0,y_0) \right\} = 0. \]
The idea of the proof is to show that
\[ \frac{\frac{\partial f}{\partial x}(x_0,y_0+k) - \frac{\partial f}{\partial x}(x_0,y_0)}{k} = \frac{\partial^2 f}{\partial x \partial y}(x^*(k),y^*(k)) + \text{terms going to zero with } k \tag{4.6} \]
with $(x^*(k),y^*(k)) \to (x_0,y_0)$ as $k \to 0$, so that the continuity of $\dfrac{\partial^2 f}{\partial x \partial y}$ at $(x_0,y_0)$ may be used. To that end, given $\epsilon > 0$, choose $\delta = \delta_\epsilon > 0$ so that
\[ \max\{|x_0 - x^*|, |y_0 - y^*|\} < \delta \implies \left| \frac{\partial^2 f}{\partial x \partial y}(x^*,y^*) - \frac{\partial^2 f}{\partial x \partial y}(x_0,y_0) \right| < \epsilon/2. \tag{4.7} \]
Fix $k$ with $0 < |k| < \delta$. We rewrite the numerator of the quotient in the limit of (4.5) using (a) the definition of the partial derivative, (b) the fact that the Mean Value Theorem for one variable can be applied to $g(y) = f(x_0+h,y) - f(x_0,y)$ for any fixed $h$ (why?), and (c) that the Mean Value Theorem for one variable can be applied to $h(x) = \dfrac{\partial f}{\partial y}(x,y_1)$ for any fixed $y_1$ (why?), as follows:
\[ \frac{\partial f}{\partial x}(x_0,y_0+k) - \frac{\partial f}{\partial x}(x_0,y_0) = \frac{\partial}{\partial x}\left[ f(x,y_0+k) - f(x,y_0) \right]_{x=x_0} \]
\[ = \frac{[f(x_0+h,y_0+k) - f(x_0+h,y_0)] - [f(x_0,y_0+k) - f(x_0,y_0)]}{h} + \eta(h), \quad \text{with } \eta(h) \to 0 \text{ as } h \to 0, \text{ by (a)} \]
\[ = \frac{g(y_0+k) - g(y_0)}{h} + \eta(h) = \frac{k}{h}\,\frac{dg}{dy}(y_0 + \theta_{h,k}k) + \eta(h) \]
\[ = k\,\frac{\frac{\partial f}{\partial y}(x_0+h,\,y_0+\theta_{h,k}k) - \frac{\partial f}{\partial y}(x_0,\,y_0+\theta_{h,k}k)}{h} + \eta(h), \quad \text{where } |\theta_{h,k}| < 1, \text{ by (b)} \]
\[ = k\,\frac{\partial^2 f}{\partial x \partial y}(x_0+\gamma_{h,k}h,\,y_0+\theta_{h,k}k) + \eta(h), \quad \text{where } |\gamma_{h,k}| < 1, \text{ by (c)}. \]
If $h = h(k)$ is chosen to be small enough that $|h| < |k|$ and $|\eta(h)| < |k|^2$, then, after dividing by $k$, we obtain (4.6) with $x^* = x^*(k) = x_0 + \gamma_{h,k}h$, $y^* = y^*(k) = y_0 + \theta_{h,k}k$, and $\eta(h)/k$ as the term that goes to zero with $k$. Since
\[ \max\{|x^*(k) - x_0|, |y^*(k) - y_0|\} = \max\{|\gamma_{h,k}h|, |\theta_{h,k}k|\} \le |k| < \delta, \]
all the requirements are met to achieve
\[ \left| \frac{\frac{\partial f}{\partial x}(x_0,y_0+k) - \frac{\partial f}{\partial x}(x_0,y_0)}{k} - \frac{\partial^2 f}{\partial x \partial y}(x_0,y_0) \right| \le \left| \frac{\partial^2 f}{\partial x \partial y}(x^*,y^*) - \frac{\partial^2 f}{\partial x \partial y}(x_0,y_0) \right| + \left| \frac{\eta(h)}{k} \right| < \epsilon, \]
using (4.7), provided $|k| < \epsilon/2$, for then $|\eta(h)/k| < |k| < \epsilon/2$.
Notation: Let $D \subset \mathbb{R}^n$. By $C(D)$ we will denote the set of all continuous functions on $D$. By $C^k(D)$, we mean all functions defined on $D$ having all $k$-th order partial derivatives continuous on $D$. The range of the functions will be clear from the context in which the notation is used.

We first give Taylor's Theorem in an expanded form emphasizing two variables. The Theorem can be stated cleanly in a form similar to the univariate form even in higher dimensions, which we do in a second pass.

Theorem 4.34 (Taylor's Theorem). Suppose $f: U \subset \mathbb{R}^n \mapsto \mathbb{R}$, $f \in C^k(U)$, where $U$ is a convex open subset of $\mathbb{R}^n$, and $a, b \in U$.

(i) For $n = 2$, with $b = (b_1,b_2)$ and $a = (a_1,a_2)$, we can write $f(b)$ as
\[ f(b) = \sum_{j=0}^{k-1} \frac{1}{j!} \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + (b_2-a_2)\frac{\partial}{\partial x_2} \right]^j f(a) + R_k, \]
where, for some point $c$ on the line segment joining $b$ and $a$,
\[ R_k = \frac{1}{k!} \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + (b_2-a_2)\frac{\partial}{\partial x_2} \right]^k f(c). \]

(ii) For an arbitrary $n$, with $b = (b_1,b_2,\dots,b_n)$ and $a = (a_1,a_2,\dots,a_n)$, we have
\[ f(b) = \sum_{j=0}^{k-1} \frac{1}{j!} \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + \dots + (b_n-a_n)\frac{\partial}{\partial x_n} \right]^j f(a) + R_k, \]
where, for some point $c$ on the line segment joining $b$ and $a$,
\[ R_k = \frac{1}{k!} \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + \dots + (b_n-a_n)\frac{\partial}{\partial x_n} \right]^k f(c). \]
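Proof. As in the proof of the Mean Value Theorem, let
\[ F(t) = f(\lambda(t)), \qquad \lambda(t) = a + t(b-a). \]
By the Chain Rule,
\[ F'(t) = Df(\lambda(t))(b-a) = \left[ \frac{\partial f}{\partial x_1}(\lambda(t)) \quad \frac{\partial f}{\partial x_2}(\lambda(t)) \right] \begin{bmatrix} b_1 - a_1 \\ b_2 - a_2 \end{bmatrix} \]
\[ = (b_1-a_1)\frac{\partial f}{\partial x_1}(\lambda(t)) + (b_2-a_2)\frac{\partial f}{\partial x_2}(\lambda(t)) = \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + (b_2-a_2)\frac{\partial}{\partial x_2} \right] f(\lambda(t)). \]
The last expression is again a differentiable function of $t$ (if $k \ge 2$), so we can apply the same argument to it as we did for $f$ to get inductively
\[ F''(t) = \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + (b_2-a_2)\frac{\partial}{\partial x_2} \right] \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + (b_2-a_2)\frac{\partial}{\partial x_2} \right] f(\lambda(t)), \]
\[ F^{(j)}(t) = \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + (b_2-a_2)\frac{\partial}{\partial x_2} \right]^j f(\lambda(t)). \]
Applying Taylor's Theorem in one variable on $0 \le t \le 1$ to obtain $F(1)$ in terms of the derivatives of $F$ at zero, we find
\[ F(1) = \sum_{j=0}^{k-1} \frac{1}{j!} F^{(j)}(0) + R_k, \qquad R_k = \frac{1}{k!} F^{(k)}(\xi), \quad \text{for some } 0 < \xi < 1. \]
Part (i) of the Theorem follows by substituting
\[ F(1) = f(b), \qquad F^{(j)}(0) = \left[ (b_1-a_1)\frac{\partial}{\partial x_1} + (b_2-a_2)\frac{\partial}{\partial x_2} \right]^j f(a), \qquad c = \lambda(\xi). \]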
Part (ii) is exactly the same, except more terms are used in the sum of partials because there are more variables.

To view Taylor's Theorem somewhat differently, we define the gradient (see Exercise 23, which you may have done already), or the "operator" $\nabla$ defined on real-valued differentiable functions on $\mathbb{R}^n$ with values in the real-valued functions on $\mathbb{R}^n$:
\[ \nabla := \left( \frac{\partial}{\partial x_1}, \dots, \frac{\partial}{\partial x_n} \right), \qquad \nabla : f \mapsto \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right). \tag{4.8} \]
The gradient vector has many important applications (see Exercise 23). One is that the directional derivative in the direction $u \in \mathbb{R}^n$ at a point $c$ can be given by the inner product with the gradient vector at the point:
\[ f_u(c) = Df(c)(u) = \sum_{j=1}^{n} u_j \frac{\partial f}{\partial x_j}(c) = \nabla f(c) \bullet u = u \bullet \nabla f(c) =: \langle \nabla f(c), u \rangle. \tag{4.9} \]
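Formula (4.9) is easy to test numerically. The sketch below compares a difference quotient along $u$ with the inner product $\nabla f(c) \bullet u$ for the polynomial $f(x,y) = x^3 + 3y^2 + 2xy$ of Example 4.31; the point `c`, the direction `u` and the step `t` are illustrative assumptions.

```python
def f(x, y):
    return x ** 3 + 3 * y ** 2 + 2 * x * y


def grad_f(x, y):
    # Gradient computed by hand: (3x^2 + 2y, 6y + 2x).
    return (3 * x ** 2 + 2 * y, 6 * y + 2 * x)


c = (1.0, 2.0)   # illustrative point (an assumption)
u = (3.0, -1.0)  # illustrative direction (an assumption)
t = 1e-6

# Directional derivative via a symmetric difference quotient along u.
numeric = (f(c[0] + t * u[0], c[1] + t * u[1])
           - f(c[0] - t * u[0], c[1] - t * u[1])) / (2 * t)

# Directional derivative via the inner product in (4.9).
g = grad_f(*c)
via_gradient = g[0] * u[0] + g[1] * u[1]

print(numeric, via_gradient)
```

Here $\nabla f(1,2) = (7, 14)$, so the inner product with $u = (3,-1)$ is $7$.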
Below we phrase Taylor's Theorem using the gradient notation, which is simpler in form but richer in meaning. When properly viewed it allows the concept to expand to several variables without unnecessary clutter. (Unfortunately, our minds sometimes demand a struggle with the clutter before we can fully conceptualize. Don't be afraid to get dirty with this.)

Before stating Taylor's Theorem, we do some house-keeping on notation and look at iterations of the particular operator $u \bullet \nabla$. Look at how the gradient operator is used to define, for a fixed direction $u$, the directional derivative as an "operator" on differentiable functions. Given $u$ in $\mathbb{R}^n$:
\[ \nabla \bullet u = u \bullet \nabla : f \mapsto \sum_{j=1}^{n} u_j \frac{\partial f}{\partial x_j}. \]
If the resulting function $\sum_{j=1}^{n} u_j \dfrac{\partial f}{\partial x_j} : \mathbb{R}^n \to \mathbb{R}$ is differentiable, the operator may be applied once again. Quite literally, this gives
\[ \sum_{\ell=1}^{n} u_\ell \frac{\partial}{\partial x_\ell} \sum_{j=1}^{n} u_j \frac{\partial f}{\partial x_j} = \sum_{\ell=1}^{n} \sum_{j=1}^{n} u_\ell u_j \frac{\partial^2 f}{\partial x_\ell \partial x_j} = \sum_{\ell,j=1}^{n} u_\ell u_j \frac{\partial^2 f}{\partial x_\ell \partial x_j}. \tag{4.10} \]
This is the operator
\[ (u \bullet \nabla)^2 f := (u \bullet \nabla)\big( (u \bullet \nabla) f \big). \]
As long as the resulting function (the function defined as the last sum in (4.10)) is differentiable, we can apply the operator again. In this way, we define inductively the operator
\[ (u \bullet \nabla)^k f := (u \bullet \nabla)\big( (u \bullet \nabla)^{k-1} f \big). \]
This can be written out in long form (in the same manner as the binomial formula) as a $k$-fold sum
\[ (u \bullet \nabla)^k f = \sum_{1 \le j_1,\dots,j_k \le n} u_{j_1} \cdots u_{j_k} \frac{\partial^k f}{\partial x_{j_1} \cdots \partial x_{j_k}}. \]
Since we can only apply this operator again if all the partial derivatives of each
\[ \frac{\partial^k f}{\partial x_{j_1} \cdots \partial x_{j_k}} \tag{4.11} \]
exist, it is defined on the class of functions $C^N(U)$, the space of functions defined on an open set $U \subset \mathbb{R}^n$ with all partial derivatives (4.11) continuous up to order $k = N$.

Now we are ready to restate and prove Taylor's Theorem:

Theorem 4.35 (Taylor's Theorem). Let $f \in C^N(U)$ where $U$ is an open convex set in $\mathbb{R}^n$. If $a$ is any point in $U$, then the value of the function $f$ at any point $b \in U$ is given by
\[ f(b) = f(a) + \sum_{k=1}^{N-1} \frac{1}{k!} \big( (b-a) \bullet \nabla \big)^k f(a) + R_N, \]
where the remainder term $R_N$ is given by
\[ R_N = \frac{1}{N!} \big( (b-a) \bullet \nabla \big)^N f(c) \]
for some point $c$ on the line segment joining $a$ to $b$.
Proof. We will reduce the theorem to the univariate Taylor theorem. In fact, one can see directly the analogy when one views $\big( (b-a) \bullet \nabla \big)^k f$ as the "$k$-th directional derivative in the direction $b-a$". Let $\lambda(t) := a + t(b-a)$, and define
\[ F(t) := f(\lambda(t)), \qquad 0 \le t \le 1. \]
Then by the Chain Rule for several variables, we have
\[ F'(t) = Df(\lambda(t))(b-a) = \left[ \frac{\partial f}{\partial x_1}(\lambda(t)), \dots, \frac{\partial f}{\partial x_n}(\lambda(t)) \right] \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix} \]
\[ = (b_1-a_1)\frac{\partial f}{\partial x_1}(\lambda(t)) + \dots + (b_n-a_n)\frac{\partial f}{\partial x_n}(\lambda(t)) = \big( (b-a) \bullet \nabla \big) f(\lambda(t)). \]
Applying this formula inductively to the result, we find
\[ F^{(k)}(t) = \big( (b-a) \bullet \nabla \big) \big( (b-a) \bullet \nabla \big)^{k-1} f(\lambda(t)) = \big( (b-a) \bullet \nabla \big)^k f(\lambda(t)). \tag{4.12} \]
Therefore, applying the univariate Taylor formula to $F(t)$, we obtain
\[ F(1) = \sum_{k=0}^{N-1} \frac{F^{(k)}(0)}{k!} + R_N, \qquad \text{where} \quad R_N = \frac{F^{(N)}(t_0)}{N!}, \quad \text{for some } t_0,\ 0 < t_0 < 1. \]
Substituting (4.12) into this formula and noting $\lambda(0) = a$ and $\lambda(1) = b$, we obtain the Theorem with $c = a + t_0(b-a)$.
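The remainder formula can be made concrete with a small computation. The sketch below takes $N = 2$ and the cubic $f(x,y) = x^3 + 3y^2 + 2xy$ from Example 4.31, with points $a$ and $b$ chosen here as assumptions. For this particular $f$ a short hand calculation gives $t_0 = 1/3$, which the code only verifies; in general $t_0$ is not known explicitly.

```python
def f(x, y):
    return x ** 3 + 3 * y ** 2 + 2 * x * y


def grad(x, y):
    return (3 * x ** 2 + 2 * y, 6 * y + 2 * x)


def hessian(x, y):
    # Matrix of second order partial derivatives of f.
    return ((6 * x, 2.0), (2.0, 6.0))


def quad_form(A, u):
    # u^T A u, i.e. ((u . grad)^2 f) when A is the matrix of second partials.
    return sum(u[i] * A[i][j] * u[j] for i in range(2) for j in range(2))


a = (1.0, 2.0)   # illustrative points (assumptions)
b = (1.5, 1.0)
u = (b[0] - a[0], b[1] - a[1])

first_order = grad(*a)[0] * u[0] + grad(*a)[1] * u[1]
R2 = f(*b) - f(*a) - first_order          # remainder defined by Taylor's formula

t0 = 1.0 / 3.0                            # found by hand for this cubic
c = (a[0] + t0 * u[0], a[1] + t0 * u[1])  # point on the segment from a to b
R2_formula = 0.5 * quad_form(hessian(*c), u)

print(R2, R2_formula)
```

Both printed values should be $2.875$ up to rounding, confirming that the remainder is the second order term evaluated at an interior point of the segment.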
4.3.1 Min-Max Theory

We use Taylor's Theorem to look for tests for relative maximums and minimums, as was done with the second derivative tests for functions of one variable. Suppose $f$ has continuous second order partial derivatives. Then
\[ f(b) = f(a) + (b-a) \bullet \nabla f(a) + \big( (b-a) \bullet \nabla \big)^2 f(c)/2! \]
for some $c$ on the line segment from $a$ to $b$. If $f$ has a relative max or min at $x = a$, then $f$ restricted to any coordinate direction is a real-valued function of one variable with a relative max or relative min at $a$. Therefore (see Exercise 18), we have the following.
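Theorem 4.36. If the function $f:\mathbb{R}^n \mapsto \mathbb{R}$ has a relative maximum or relative minimum at an interior point $a$ of its domain where its partial derivatives exist, then
\[ \frac{\partial f}{\partial x_j}(a) = 0, \qquad \text{for all } j = 1,\dots,n. \tag{4.13} \]

Points that satisfy (4.13) are called stationary or critical points. These are the potential points for relative max and mins. At a stationary point $a$, Taylor's theorem of order 2 reads
\[ f(b) = f(a) + \big( (b-a) \bullet \nabla \big)^2 f(c)/2. \]
Thus, we find
\[ \big( (b-a) \bullet \nabla \big)^2 f(c) > 0 \implies f(b) > f(a), \qquad \text{and} \qquad \big( (b-a) \bullet \nabla \big)^2 f(c) < 0 \implies f(b) < f(a). \]
Hence the behavior of the operator $\big( (b-a) \bullet \nabla \big)^2 f$ is important for determining whether the critical point is a max or min. We will view this a little differently, using matrix notation. Let $u \in \mathbb{R}^n$, and consider $(u \bullet \nabla)^2 f$ written in some imaginative ways:
\[
\begin{aligned}
(u \bullet \nabla)^2 f &= (u \bullet \nabla)\big( (u \bullet \nabla) f \big) = u^T \nabla^T \nabla f\, u \\
&= [u_1,\dots,u_n] \begin{bmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{bmatrix} \left[ \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right] \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \\
&= [u_1,\dots,u_n] \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix} \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \\
&= [u_1,\dots,u_n] \begin{bmatrix} \sum_{\ell=1}^{n} \frac{\partial^2 f}{\partial x_1 \partial x_\ell} u_\ell \\ \vdots \\ \sum_{\ell=1}^{n} \frac{\partial^2 f}{\partial x_n \partial x_\ell} u_\ell \end{bmatrix} \\
&= \sum_{j=1}^{n} \sum_{\ell=1}^{n} u_j u_\ell \frac{\partial^2 f}{\partial x_j \partial x_\ell}.
\end{aligned} \tag{4.14}
\]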
If we define the $n \times n$ matrix $A_c$ by
\[ A_c := \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(c) & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(c) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(c) & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(c) \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(c) & \dots & \frac{\partial^2 f}{\partial x_n \partial x_n}(c) \end{bmatrix}, \]
then the above equation (4.14) may be written as
\[ (u \bullet \nabla)^2 f(c) = u^T A_c u. \tag{4.15} \]
We make the following observations. If $f$ has continuous partial derivatives of second order, then

1. $A_c$ is symmetric (Theorem 4.32),

2. if $(b-a)^T A_c (b-a) = ((b-a) \bullet \nabla)^2 f(c) > 0$ for all $b$ and $c$ close to $a$, then $f(a)$ is a relative minimum,

3. if $(b-a)^T A_c (b-a) = ((b-a) \bullet \nabla)^2 f(c) < 0$ for all $b$ and $c$ close to $a$, then $f(a)$ is a relative maximum,

4. if in any small neighborhood of $a$ there are $b$ and $b^*$ with
\[ (b-a)^T A_c (b-a) = ((b-a) \bullet \nabla)^2 f(c) > 0 \quad \text{and} \quad (b^*-a)^T A_c (b^*-a) = ((b^*-a) \bullet \nabla)^2 f(c) < 0, \]
then $f(a)$ is neither a max nor a min, and $a$ is called a saddle point because $f(b) > f(a)$ while $f(b^*) < f(a)$.

5. Since the second order derivatives are continuous, the function in (4.14) is continuous in $u = b - c$ and therefore, (1)-(4) will be true for $A_c$ replaced by $A_a$ if $b$ is sufficiently close to $a$.

Thus, we need to consider the properties of the mapping $u \mapsto u^T A u$ on $\mathbb{R}^n$ for a given $n \times n$ matrix $A$. These mappings are called quadratic forms, since the values $u^T A u$ are homogeneous multivariate polynomials of degree 2 in the variables $u_1,\dots,u_n$, that is, they have the form
\[ \sum_{1 \le \ell, j \le n} a_{\ell,j} u_\ell u_j. \]
Definition 4.37. For an $n \times n$ matrix $A$, we say

(i) $A$ is positive definite (respectively, positive semi-definite) if $u^T A u > 0$ (respectively, $u^T A u \ge 0$) for all $u \in \mathbb{R}^n \setminus \{O\}$;

(ii) $A$ is negative definite (respectively, negative semi-definite) if $u^T A u < 0$ (respectively, $u^T A u \le 0$) for all $u \in \mathbb{R}^n \setminus \{O\}$; and

(iii) indefinite otherwise.

Some of the importance of positive definite matrices can be seen from the following remark.

Remark: A symmetric real matrix is positive definite if and only if its eigenvalues are positive. (Result from linear algebra which we will take as known.)

Theorem 4.38. Suppose $f:\mathbb{R}^n \to \mathbb{R}$, $f \in C^2(U)$ in an open set $U$ containing $a$, and suppose further that

(i) $a$ is a stationary point of $f$, that is,
\[ \frac{\partial f}{\partial x_j}(a) = 0, \qquad j = 1,\dots,n, \]
and

(ii) the symmetric matrix $A(a)$ is given by
\[ A(a) = \nabla^T \nabla f(a) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(a) & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(a) & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(a) & \dots & \frac{\partial^2 f}{\partial x_n \partial x_n}(a) \end{bmatrix}. \]

Then $f$ has

(a) a relative maximum at $a$ if $A$ is negative definite;

(b) a relative minimum at $a$ if $A$ is positive definite;

(c) a saddle point at $a$ if $A$ is indefinite; and

(d) unknown properties (test doesn't work) at $a$ if $A$ is semi-definite.

Proof. The proof follows from the discussion above.
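Before going further, we return to $\mathbb{R}^2$ and look at these ideas more concretely. A $2 \times 2$ symmetric matrix being positive (negative) semidefinite means
\[ Q(x,y) = [\,x \ \ y\,] \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = ax^2 + 2bxy + cy^2 \ge 0 \ (\le 0), \qquad \forall\, (x,y) \in \mathbb{R}^2. \]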
It is said to be positive (negative) definite if $Q(x,y) > 0$ ($< 0$) for all $(x,y) \ne O$. For two by two matrices this expression reminds one of the quadratic formula, and we have the following easily checked criterion.

Lemma 4.39. Let the $2 \times 2$ symmetric matrix be as in the preceding paragraph.

(i) The matrix is positive (negative) semidefinite if and only if
\[ ac - b^2 \ge 0 \quad \text{and} \quad a \ge 0 \ (\le 0), \quad c \ge 0 \ (\le 0). \]

(ii) The matrix is positive (negative) definite if and only if
\[ ac - b^2 > 0 \quad \text{and} \quad a > 0 \ (< 0), \quad c > 0 \ (< 0). \]

(iii) The matrix is indefinite if and only if $ac - b^2 < 0$.

Proof. The lemma will follow from three observations:

(a) $Q(x,0) = ax^2$ has the same sign as $a$.

(b) $Q(0,y) = cy^2$ has the same sign as $c$.

(c) $aQ(x,y) = (ax + by)^2 + (ac - b^2)y^2$.

For example, if $ac - b^2 < 0$ and $a > 0$, then $Q(x,0) > 0$ for any $x \ne 0$, but the choice $x_0 = -by/a$ gives $Q(x_0,y) < 0$ for any $y \ne 0$.

Before restating Theorem 4.38 for two variables, we introduce another commonly used notation for partial derivatives:
\[ f_x := \frac{\partial f}{\partial x}, \qquad f_{xy} := \frac{\partial^2 f}{\partial y \partial x}, \qquad f_{xx} := \frac{\partial^2 f}{\partial x \partial x}, \]
\[ f_y := \frac{\partial f}{\partial y}, \qquad f_{yx} := \frac{\partial^2 f}{\partial x \partial y}, \qquad f_{yy} := \frac{\partial^2 f}{\partial y \partial y}. \]
Theorem 4.40. Let $f:\mathbb{R}^2 \mapsto \mathbb{R}$ be in $C^2(U)$ for some neighborhood $U$ of $(x_0,y_0)$. Suppose $(x_0,y_0)$ is a stationary point for $f$; that is, $f_x(x_0,y_0) = f_y(x_0,y_0) = 0$, and let
\[ A(x,y) = \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{bmatrix}. \]
Then at the point $(x_0,y_0)$, the function $f$ has

(a) a relative maximum if $A(x_0,y_0)$ is negative definite, i.e. if
\[ f_{xx}f_{yy} - (f_{xy})^2 > 0, \quad f_{xx} < 0 \quad \text{at } (x_0,y_0); \]

(b) a relative minimum if $A(x_0,y_0)$ is positive definite, i.e. if
\[ f_{xx}f_{yy} - (f_{xy})^2 > 0, \quad f_{xx} > 0 \quad \text{at } (x_0,y_0); \]

(c) a saddle point or Minimax if $A(x_0,y_0)$ is indefinite, i.e. if
\[ f_{xx}f_{yy} - (f_{xy})^2 < 0 \quad \text{at } (x_0,y_0); \]

(d) all possibilities at $(x_0,y_0)$, including none-of-the-above, if $A(x_0,y_0)$ is properly semi-definite:
\[ \det A(x_0,y_0) = f_{xx}f_{yy} - (f_{xy})^2 = 0 \quad \text{at } (x_0,y_0). \]

[Figure 4.12. Graphical illustrations of a relative maximum (top), a relative minimum (middle), and a saddle point (bottom). Each panel shows the map $G(p) = (p, f(p))$ from $U \subset \mathbb{R}^n$ onto $G(U) \subset \mathbb{R}^{n+1}$, the point $(c, f(c))$, and the value $f(c)$ in $f(U) \subset \mathbb{R}$.]
Example 4.41 For the function $f(x,y) = x^2 + xy + y^2$, from the equations $f_x = 2x + y = 0$ and $f_y = x + 2y = 0$, we find that $(x,y) = (0,0)$ is the only stationary point. Now
\[ \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]
satisfies $f_{xx}(0,0) = 2 > 0$ and $2 \cdot 2 - 1^2 = 3 > 0$, so that the matrix is positive definite and $f$ has a relative minimum at $(0,0)$. In fact, it is a global minimum since the matrix above is positive definite for all $(x,y)$, so that the remainder in Taylor's Theorem satisfies $R_2(x,y) > 0$ for any $(x,y) \ne (0,0)$.

Example 4.42 For the function $f(x,y) = x^2 + 4xy + y^2$, from the equations $f_x = 2x + 4y = 0$ and $f_y = 4x + 2y = 0$, we find that $(x,y) = (0,0)$ is the only stationary point, but in this case
\[ \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 4 & 2 \end{bmatrix} \quad \text{satisfies} \quad 2 \cdot 2 - 4^2 = -12 < 0, \]
so the matrix is indefinite. Hence, $f$ has a saddle point at $(0,0)$. Alternately, one can see this from the observation that $f(x,y) > 0$ on the coordinate axes except at the origin, but $f$ is negative on the line $y = -x$ when $x \ne 0$.

Example 4.43 For the function $f(x,y) = 3x^2 - y^2 + x^3$, from the equations
\[ f_x = 6x + 3x^2 = 3x(2 + x) = 0 \quad \text{and} \quad f_y = -2y = 0, \]
we see that the stationary points are $(0,0)$ and $(-2,0)$. We observe
\[ \begin{bmatrix} 6 & 0 \\ 0 & -2 \end{bmatrix} \text{ for } (0,0) \text{ is indefinite} \implies \text{saddle point}, \]
\[ \begin{bmatrix} -6 & 0 \\ 0 & -2 \end{bmatrix} \text{ for } (-2,0) \text{ is negative definite} \implies \text{relative maximum}. \]

Example 4.44 For the function $f(x,y) = x^4 + y^4$, from the equations
\[ f_x = 4x^3 = 0 \quad \text{and} \quad f_y = 4y^3 = 0, \]
we see that the stationary point is $(0,0)$. We observe
\[ \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{bmatrix} = \begin{bmatrix} 12x^2 & 0 \\ 0 & 12y^2 \end{bmatrix} \quad \text{has zero determinant at } (0,0). \]
Thus, we have no information from the theorem. However, $12x^2 \ge 0$, $12y^2 \ge 0$ and $144x^2y^2 \ge 0$, so that the matrix is positive semidefinite for all $(x,y)$ and positive definite if $x \ne 0$ and $y \ne 0$. Thus, the remainder in Taylor's theorem satisfies $R_2(x,y) \ge 0$ for all $(x,y)$ and $f$ has a minimum at $(0,0)$. Of course, one can easily observe that $f(x,y) > 0$ when $(x,y) \ne (0,0)$ without resorting to second derivatives.
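The test of Theorem 4.40 is mechanical enough to write as a few lines of code. The sketch below is one possible rendering, with the function name `classify` and the returned labels chosen here for illustration; it takes the values of $f_{xx}$, $f_{xy}$, $f_{yy}$ at a stationary point and applies the determinant criterion of Lemma 4.39.

```python
def classify(fxx, fxy, fyy):
    """Second derivative test at a stationary point (hypothetical helper).

    Inputs are the second order partials evaluated at the stationary point.
    """
    det = fxx * fyy - fxy ** 2
    if det > 0:
        # Definite case: the sign of fxx decides between min and max.
        return "relative minimum" if fxx > 0 else "relative maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"  # semidefinite case: the test gives no information


results = {
    "4.41 at (0,0)": classify(2, 1, 2),
    "4.42 at (0,0)": classify(2, 4, 2),
    "4.43 at (0,0)": classify(6, 0, -2),
    "4.43 at (-2,0)": classify(-6, 0, -2),
    "4.44 at (0,0)": classify(0, 0, 0),
}
for name, verdict in results.items():
    print(name, "->", verdict)
```

The inconclusive verdict for Example 4.44 matches the text: the minimum there has to be detected by other means.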
Example 4.45 For the function $f(x,y) = x^4 - y^4$, from the equations
\[ f_x = 4x^3 = 0 \quad \text{and} \quad f_y = -4y^3 = 0, \]
we see that the stationary point is $(0,0)$. We observe
\[ \begin{bmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{bmatrix} = \begin{bmatrix} 12x^2 & 0 \\ 0 & -12y^2 \end{bmatrix} \quad \text{has zero determinant at } (0,0). \]
However, the matrix is positive semidefinite on the line $y = 0$ and negative semidefinite on $x = 0$, so that $(0,0)$ is a saddle point. In this case, it is also obvious without considering second order partials that
\[ f(x,0) > f(0,0) \text{ if } x \ne 0 \quad \text{and} \quad f(0,y) < f(0,0) \text{ if } y \ne 0. \]

The last two examples illustrate the fact that anything can happen in the case that $f_{xx}f_{yy} - (f_{xy})^2 = 0$ at a stationary point. Our next example looks at a function on $\mathbb{R}^3$. However, we need some more applicable criteria for the definiteness of a matrix. The following criteria may be found, say, in the book by Gantmacher, Matrix Theory.

Theorem 4.46. Let $A$ be a symmetric $n \times n$ matrix. Then

(a) $A$ is positive semidefinite if and only if the determinants of all the $k \times k$ submatrices of $A$ symmetric about the main diagonal are non-negative ($\ge 0$).

(b) $A$ is negative semidefinite if and only if the determinants of all the $k \times k$ submatrices of $A$ symmetric about the main diagonal have sign $(-1)^k$ or are zero.

(c) $A$ is positive definite if and only if the determinants
\[ \det \begin{bmatrix} a_{1,1} & \dots & a_{1,k} \\ \vdots & & \vdots \\ a_{k,1} & \dots & a_{k,k} \end{bmatrix} > 0, \qquad k = 1,2,\dots,n. \]

(d) $A$ is negative definite if and only if the determinants
\[ (-1)^k \det \begin{bmatrix} a_{1,1} & \dots & a_{1,k} \\ \vdots & & \vdots \\ a_{k,1} & \dots & a_{k,k} \end{bmatrix} > 0, \qquad k = 1,2,\dots,n. \]

(e) $A$ is indefinite if and only if it satisfies none of (a), (b), (c), or (d).

The determinants in parts (c) and (d) are sometimes called the principal minors.
4.3. Partial Derivatives of Higher Order
Example 4.47 For the function $f(x, y, z) = x^2 + y^2 + z^2 + 2xyz$, solving the equations
\[
f_x = 2x + 2yz = 0, \quad f_y = 2y + 2zx = 0, \quad\text{and}\quad f_z = 2z + 2xy = 0
\]
yields stationary points at
\[
(0, 0, 0), \quad (-1, 1, 1), \quad (1, -1, 1), \quad (1, 1, -1), \quad (-1, -1, -1).
\]
The matrix $A(x, y, z)$ is
\[
\begin{bmatrix} f_{xx} & f_{xy} & f_{xz} \\ f_{yx} & f_{yy} & f_{yz} \\ f_{zx} & f_{zy} & f_{zz} \end{bmatrix} = \begin{bmatrix} 2 & 2z & 2y \\ 2z & 2 & 2x \\ 2y & 2x & 2 \end{bmatrix}.
\]
At $(0, 0, 0)$, $A(0, 0, 0) = 2I_{3 \times 3}$, which is positive definite; hence $f$ has a relative minimum at $(0, 0, 0)$. At the other stationary points, $|x| = |y| = |z| = 1$ and $xyz = -1$, so the principal minors of $A$ satisfy
\[
\det[2] > 0, \qquad \det \begin{bmatrix} 2 & 2z \\ 2z & 2 \end{bmatrix} = 4 - 4z^2 = 0,
\]
\[
\det \begin{bmatrix} 2 & 2z & 2y \\ 2z & 2 & 2x \\ 2y & 2x & 2 \end{bmatrix} = 8(1 - x^2 - y^2 - z^2 + 2xyz) < 0.
\]
Thus, the matrix is indefinite at all the other critical points, so that these are all saddle points.

In the following exercises, assume all the differentiability you need unless otherwise specified.
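As a rough numerical companion to Example 4.47, the sketch below evaluates the leading principal minors from parts (c) and (d) of Theorem 4.46 at each stationary point. The helper names (`det`, `leading_minors`, `hessian`, `classify`) are hypothetical and chosen only for this illustration. The code checks definiteness only, so it reports "not definite" where the example goes on to conclude indefiniteness.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def leading_minors(a):
    """Determinants of the upper-left k x k blocks, k = 1, ..., n."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def hessian(x, y, z):
    """Second partials of f(x, y, z) = x^2 + y^2 + z^2 + 2xyz."""
    return [[2, 2 * z, 2 * y],
            [2 * z, 2, 2 * x],
            [2 * y, 2 * x, 2]]


def classify(a):
    """Apply only parts (c) and (d) of Theorem 4.46."""
    minors = leading_minors(a)
    if all(d > 0 for d in minors):
        return "positive definite"
    if all((-1) ** k * d > 0 for k, d in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"


points = [(0, 0, 0), (-1, 1, 1), (1, -1, 1), (1, 1, -1), (-1, -1, -1)]
for p in points:
    a = hessian(*p)
    print(p, leading_minors(a), classify(a))
```

At the four points other than the origin the minors come out as 2, 0 and -32, in agreement with the signs computed in the example.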
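Exercises

4.32. Find $V_x$, $V_y$, $V_z$, $V_u$, $V_{xy}$, $V_{xyz}$, $V_{xyzu}$ for the following functions $V$:
(i) $\dfrac{x^2y^2u}{a^2 - z^2}$,
(ii) $\dfrac{x}{y} + \dfrac{y}{z} + \dfrac{z}{u} + \dfrac{u}{x} + \dfrac{xy}{zu}$.

Solution for (i):
\[
V_x = \frac{2xy^2u}{a^2 - z^2}, \quad V_y = \frac{2x^2yu}{a^2 - z^2}, \quad V_z = \frac{2x^2y^2uz}{(a^2 - z^2)^2}, \quad V_u = \frac{x^2y^2}{a^2 - z^2},
\]
\[
V_{xy} = \frac{4xyu}{a^2 - z^2}, \quad V_{xyz} = \frac{8xyzu}{(a^2 - z^2)^2}, \quad V_{xyzu} = \frac{8xyz}{(a^2 - z^2)^2}.
\]

Solution for (ii):
\[
V_x = \frac{1}{y} - \frac{u}{x^2} + \frac{y}{zu}, \quad V_y = -\frac{x}{y^2} + \frac{1}{z} + \frac{x}{zu}, \quad V_z = -\frac{y}{z^2} + \frac{1}{u} - \frac{xy}{z^2u}, \quad V_u = -\frac{z}{u^2} + \frac{1}{x} - \frac{xy}{u^2z},
\]
\[
V_{xy} = -\frac{1}{y^2} + \frac{1}{zu}, \quad V_{xyz} = -\frac{1}{uz^2}, \quad V_{xyzu} = \frac{1}{u^2z^2}.
\]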
4.33. If $V = x^2 - y^2$, $x = 3u - 4v + 2$, $y = 2u + 3v - 1$, show that $V_u = 6x - 4y$, $V_v = -8x - 6y$.

4.34. If $V = x^3 + 8y^3 - 18xy$, find the points $(x, y)$ at which $V_x = V_y = 0$ and investigate the nature of $V$ at these points. [Solution: $(0, 0)$ is a saddle point, $(3, 3/2)$ is a minimum.]

4.35. If $P = x^2y - y^3 - y^2z$, $Q = xy^2 - x^3 - x^2z$, $R = xy^2 + x^2y$, show that
\[
P(Q_z - R_y) + Q(R_x - P_z) + R(P_y - Q_x) = 0.
\]

4.36. If $u = x + y + z$, $v = x^2 + y^2 + z^2$, $w = x^3 + y^3 + z^3 - 3xyz$, show
\[
\frac{\partial(u, v, w)}{\partial(x, y, z)} = 0.
\]

4.37. If $V, P, Q, R$ are $C^1$ functions of $(x, y, z)$ satisfying the relations
\[
V_x = \mu P, \quad V_y = \mu Q, \quad V_z = \mu R, \quad\text{for some } \mu \neq 0,
\]
show that
\[
P(Q_z - R_y) + Q(R_x - P_z) + R(P_y - Q_x) = 0.
\]

4.38. Let $P, Q : \mathbb{R}^2 \to \mathbb{R}$, with $P$, $Q$, $\frac{\partial P}{\partial y}$, $\frac{\partial Q}{\partial x}$ continuous.
(i) Prove that there is a real valued function $f$ on $\mathbb{R}^2$ such that $f'(x, y) = [\, P(x, y) \;\; Q(x, y) \,]$ if and only if $\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}$. [HINT: "Only if" is easy. For "if", consider
\[
f(x, y) = \int_{x_0}^{x} P(t, y_0)\, dt + \int_{y_0}^{y} Q(x, t)\, dt.
\]
See Exercise 32 (ii).]
(ii) State what you would consider to be a generalization of part (i) for functions of three variables.
(iii) Verify that each of the following is the Jacobian matrix $f'(x, y)$ of some function $f(x, y)$ and find $f(x, y)$:
(a) $[\, 2xy \;\; x^2 + 3y^2 \,]$,
(b) $[\, 2ye^{2x} + 2x\cos(y) \;\; e^{2x} - x^2\sin(y) \,]$.

4.39. If $F(x, y) = x/(x^2 + y^2)$ and $G(x, y) = y/(x^2 + y^2)$, show that
(i) $F_y = G_x$, $F_x = -G_y$,
(ii) $\nabla^2 F = \nabla^2 G = 0$, where $\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$.

4.40. Show the following:
(i) If $V = V(x, y)$, $x = x(u, v)$, and $y = y(u, v)$, then
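(a)
\[
[\, V_u \;\; V_v \,] = [\, V_x \;\; V_y \,] \begin{bmatrix} x_u & x_v \\ y_u & y_v \end{bmatrix},
\]
(b)
\[
\begin{bmatrix} V_{uu} & V_{uv} \\ V_{vu} & V_{vv} \end{bmatrix} = \begin{bmatrix} x_u & y_u \\ x_v & y_v \end{bmatrix} \begin{bmatrix} V_{xx} & V_{xy} \\ V_{yx} & V_{yy} \end{bmatrix} \begin{bmatrix} x_u & x_v \\ y_u & y_v \end{bmatrix} + V_x \begin{bmatrix} x_{uu} & x_{uv} \\ x_{vu} & x_{vv} \end{bmatrix} + V_y \begin{bmatrix} y_{uu} & y_{uv} \\ y_{vu} & y_{vv} \end{bmatrix}.
\]
(ii) If $U = U(x, y)$, $V = V(x, y)$, $x = x(u, v)$, $y = y(u, v)$, show that
\[
\frac{\partial(U, V)}{\partial(u, v)} = \frac{\partial(U, V)}{\partial(x, y)} \cdot \frac{\partial(x, y)}{\partial(u, v)}.
\]
[Hint: Use (i)(a).]
(iii) If $x = r\cos(\theta)$, $y = r\sin(\theta)$, show
\[
\frac{\partial(U, V)}{\partial(x, y)} = \frac{1}{r}\, \frac{\partial(U, V)}{\partial(r, \theta)}.
\]

4.41. If $V = 3x^2 + 2y^2 + \phi(x^2 - y^2)$, prove that $yV_x + xV_y = 10xy$.

4.42. If $V = x^2 + y^2 + \phi(xy) + \psi(y/x)$, prove that
\[
x^2V_{xx} - y^2V_{yy} + xV_x - yV_y = 4(x^2 - y^2).
\]

4.43. If $V = V(x, y)$, $x = \rho\cos(\phi)$, $y = \rho\sin(\phi)$, show that
\[
\nabla^2 V := V_{xx} + V_{yy} = V_{\rho\rho} + \frac{1}{\rho} V_\rho + \frac{1}{\rho^2} V_{\phi\phi}.
\]
[Hint: $\rho = \sqrt{x^2 + y^2}$, $\phi = \arctan(y/x)$.]

4.44. If $V = V(r, \theta)$, $\rho = c^2/r$, $\phi = 2\pi - \theta$, show that
\[
\rho^2 V_{\rho\rho} + \rho V_\rho + V_{\phi\phi} = r^2 V_{rr} + rV_r + V_{\theta\theta}.
\]

4.45. If $V = V(x, y)$, $x = x(u, v)$, $y = y(u, v)$, and $x_u = y_v$, $x_v = -y_u$, show
\[
\frac{V_{uu} + V_{vv}}{V_{xx} + V_{yy}} = x_u^2 + x_v^2 = y_u^2 + y_v^2.
\]

4.46. (Euler's Theorem continued, see Exercise 31) If $V$ is a homogeneous function of $(x, y, z)$ of the $m$th degree, show that
\[
x^2V_{xx} + y^2V_{yy} + z^2V_{zz} + 2xyV_{xy} + 2yzV_{yz} + 2zxV_{zx} = m(m - 1)V.
\]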
4.47. If $z_y = F(z_x)$, $F' \neq 0$, then $z_{xx}z_{yy} = z_{xy}^2$.

4.48. If $z = xF(x + y) + G(x + y)$, show that
\[
z_{xx} - 2z_{xy} + z_{yy} = 0.
\]

4.49. Do the following:
(a) If $z = F(y + m_1x) + G(y + m_2x)$ and $m_1$, $m_2$ are the roots of the quadratic equation $am^2 + 2hm + b = 0$, then verify that $az_{xx} + 2hz_{xy} + bz_{yy} = 0$. Show this equation is satisfied by $z = xF(y + mx) + G(y + mx)$ if $m = m_1 = m_2$ is a double root of the quadratic equation.
(b) The equation of lateral vibration of a taut string is $z_{tt} - c^2z_{xx} = 0$. Deduce from (a) that $z = F(x + ct) + G(x - ct)$ is a solution.
(c) A vibrating string for which initial displacement and velocity are specified is governed by relations of the form
\[
z_{tt} - c^2z_{xx} = 0, \quad z(x, 0) = f(x), \quad z_t(x, 0) = g(x).
\]
Show that
\[
z(x, t) = \frac{1}{2}\{f(x + ct) + f(x - ct)\} + \frac{1}{2c}\int_{x - ct}^{x + ct} g
\]
is a solution to this problem.

4.50. Do the following:
(a) If $z = x\phi(y/x) + \psi(y/x)$, then $x^2z_{xx} + 2xyz_{xy} + y^2z_{yy} = 0$.
(b) If $z = x^3\phi(y/x) + \psi(y/x)/x^3$, then $x^2z_{xx} + 2xyz_{xy} + y^2z_{yy} + xz_x + yz_y = 9z$.
[Hint: If you are observant you don't have to do all that differentiation.]

4.51. Discuss the nature of the stationary points of the functions
(a) $34x^2 - 24xy + 41y^2$. [Solution: $(0, 0)$ is a minimum.]
(b) $3x^2 + 4xy - 4y^2$. [Solution: $(0, 0)$ is a saddle.]
(c) $x^2y - 4x^2 - y^2$. [Solution: $(0, 0)$ is a max; $(\pm 2\sqrt{2}, 4)$ are saddles.]
(d) $x^2y + 2x^2 - 2xy + 3y^2 - 4x + 7y$. [Solution: $(1, -1)$ is a min; $(1 \pm \sqrt{6}, -2)$ are saddles.]
(e) $x^2yz - 2xyz + x^2z + x^2 + y^2 + z^2 + yz - 2xz - 2x + 2y + z$. [Solution: $(1, -1, 0)$ is a min; $(1 \pm \sqrt{2}, -2, 1)$ and $(1 \pm \sqrt{2}, 0, -1)$ are saddles.]
4.52. Show that the function $f(x, y) = (y - x^2)(y - 2x^2)$ does not have a relative extremum at $(0, 0)$ even though it has a relative minimum at this point when the domain is restricted to any line $x = t$, $y = \alpha t$. [In the $(x, y)$-plane sketch the curves $f(x, y) = 0$, then determine the regions $f(x, y) < 0$, $f(x, y) > 0$; check that on each line through $(0, 0)$, $f(x, y) > 0 = f(0, 0)$ if $(x, y) \neq (0, 0)$ is close enough to $(0, 0)$.]

4.53. Show that the box of maximum volume which can be inscribed in the ellipsoid
\[
\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1
\]
has sides of length $2a/\sqrt{3}$, $2b/\sqrt{3}$, $2c/\sqrt{3}$.

4.54. Suppose we are given $n$ points $(x_j, y_j) \in \mathbb{R}^2$ and desire to find a function $F(x) = Ax + B$ for which the quantity
\[
\sum_{j=1}^{n} \bigl(F(x_j) - y_j\bigr)^2
\]
is minimized. Show that this problem leads to the equations
\[
A\sum_{j=1}^{n} x_j^2 + B\sum_{j=1}^{n} x_j = \sum_{j=1}^{n} x_jy_j, \qquad A\sum_{j=1}^{n} x_j + nB = \sum_{j=1}^{n} y_j,
\]
which are easily solved for $A$ and $B$. The line $y = Ax + B$ is the line which best fits the given set of points in the sense of least squares.

4.55. Let $\{\phi_k\}$ be a sequence of real-valued continuous functions on $[a, b]$ such that
\[
\int_a^b \phi_k\phi_m = \begin{cases} 1, & \text{if } k = m; \\ 0, & \text{if } k \neq m. \end{cases}
\]
Let $f$ be a real-valued continuous function on $[a, b]$. Prove that the choice of constants $\gamma_1, \dots, \gamma_\ell$ which minimizes the quantity
\[
\int_a^b \Bigl(f - \sum_{k=1}^{\ell} \gamma_k\phi_k\Bigr)^2
\]
for a given $\ell$ is
\[
\gamma_k = \int_a^b f\phi_k, \quad k = 1, \dots, \ell.
\]
This problem arises in the theory of Fourier Series.

4.56. The capacity of a condenser formed by two concentric spherical conductors of radii $a$ and $b$, $0 < a < b$, is
\[
C(a, b) = \frac{ab}{b - a}.
\]
Suppose that in measuring the radii $a$ and $b$ of the spheres the measurements are subject to errors of amount $\delta a$ and $\delta b$ respectively. Given that products of the errors are negligible relative to the other quantities involved, derive the following approximation for the error in $C$:
\[
\frac{\delta C}{C} \approx \frac{\delta a}{a} \cdot \frac{b}{b - a} - \frac{\delta b}{b} \cdot \frac{a}{b - a}.
\]

4.57. The breaking weight $W$ of a cantilever beam is given by the formula $W\ell = kbd^2$, where $b$ is breadth, $\ell$ is length, $d$ is depth, and $k$ is a constant depending on the material of the beam. If the breadth is increased by 2% and the depth by 5%, show that the length should be increased by about 12% if the breaking weight is to remain unchanged.

4.58. In a triangle $ABC$, the area is calculated from the elements $a$, $B$, $C$, the measurements being subject to errors $\delta a$, $\delta B$, $\delta C$. Show that the error $\delta S$ in the area is approximately given by
\[
\frac{\delta S}{S} \approx 2\frac{\delta a}{a} + \frac{c\,\delta B}{a\sin(B)} + \frac{b\,\delta C}{a\sin(C)}.
\]

4.59. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be of class $C^2$ on a convex subset $D$ of $\mathbb{R}^2$. Suppose that at each point $p \in D$, the matrix
\[
\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}
\]
is positive semidefinite. Prove that if $p, q \in D$, then
\[
f\Bigl(\frac{p + q}{2}\Bigr) \leq \frac{1}{2}\bigl(f(p) + f(q)\bigr).
\]
What does this result mean geometrically?
4.4 Local Properties of $C^1$ Functions
Recall that if $f : \mathbb{R}^n \to \mathbb{R}^n$, that is $f(p) = (f_1(p), \dots, f_n(p))$, $p = (x_1, \dots, x_n)$, is differentiable at $p_0$, then the Jacobian of $f$ at $p_0$ is
\[
\det f'(p_0) = \det \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \dots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}_{p_0} = \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)} = J_f(p_0).
\]

Definition 4.48. A function $f$ is locally one-to-one on a subset $D$ of its domain if each point $p \in D$ has a neighborhood $U$ on which $f$ is one-to-one.

Lemma 4.49. If $f : \mathbb{R}^n \to \mathbb{R}^n$ is of class $C^1$ at $p_0$ and $J_f(p_0) \neq 0$, then there is a neighborhood $U$ of $p_0$ on which $f$ is one-to-one.
Proof. Since the partials of $f$ are continuous at $p_0$ and $J_f(p_0) \neq 0$, there is a convex neighborhood $U$ of $p_0$ such that if $p_i \in U$, $i = 1, \dots, n$, then
\[
\begin{vmatrix} \frac{\partial f_1}{\partial x_1}(p_1) & \dots & \frac{\partial f_1}{\partial x_n}(p_1) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1}(p_n) & \dots & \frac{\partial f_n}{\partial x_n}(p_n) \end{vmatrix} \neq 0. \tag{4.16}
\]
If $a, b \in U$, then, applying the Mean-Value Theorem to each component of $f$, we find
\[
\begin{aligned}
f_1(b) - f_1(a) &= \sum_{i=1}^{n} (b_i - a_i)\frac{\partial f_1}{\partial x_i}(p_1) \\
&\;\;\vdots \\
f_n(b) - f_n(a) &= \sum_{i=1}^{n} (b_i - a_i)\frac{\partial f_n}{\partial x_i}(p_n)
\end{aligned} \tag{4.17}
\]
where $p_i$, $i = 1, \dots, n$, are some points on the line segment between $a$ and $b$. But (4.16) and (4.17) imply
\[
\bigl(f_i(b) - f_i(a) = 0, \; i = 1, \dots, n\bigr) \iff b - a = 0
\]
since the system of equations has full rank. Thus, $f(a) = f(b)$ if and only if $a = b$, and $f$ is one-to-one on $U$.

The following is an immediate consequence of this lemma:

Theorem 4.50. Let $f : \mathbb{R}^n \to \mathbb{R}^m$, $m \geq n$. Suppose $D$ is an open subset of $\mathbb{R}^n$ and
(i) $f \in C^1(D)$,
(ii) $\operatorname{rank} f'(p) = n$, for all $p \in D$;
then $f$ is locally one-to-one on $D$.

Proof. Suppose $f = (f_1, \dots, f_m)$, $m \geq n$. We wish to show that $f$ is one-to-one on a neighborhood $U$ of each point $p_0 \in D$. Since $\operatorname{rank} f'(p_0) = n$, we may assume without loss of generality that
\[
\frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}(p_0) \neq 0.
\]
(This can be achieved by relabelling the $f$'s.) Thus, if $\tilde{f} : \mathbb{R}^n \to \mathbb{R}^n$ is defined by $\tilde{f} = (f_1, \dots, f_n)$, then $\tilde{f} \in C^1(D)$ and $J_{\tilde{f}}(p_0) \neq 0$, so by the Lemma, $\tilde{f}$ is one-to-one on a neighborhood of $p_0$. But since $f(p) = f(q)$ implies $\tilde{f}(p) = \tilde{f}(q)$, $\tilde{f}$ being one-to-one implies $f$ is one-to-one.

Example 4.51 This example shows that in the preceding theorem, $f$ need not be globally one-to-one. Take $f(x, y) = (e^x\cos(y), e^x\sin(y))$. Then
\[
J_f(x, y) = \begin{vmatrix} e^x\cos(y) & -e^x\sin(y) \\ e^x\sin(y) & e^x\cos(y) \end{vmatrix} = e^{2x} \neq 0, \quad\text{for all } (x, y) \in \mathbb{R}^2.
\]
Thus, from Theorem 4.50, $f$ is locally one-to-one on $\mathbb{R}^2$. However, $f$ is not globally one-to-one on $\mathbb{R}^2$ since
\[
f(x, y + 2\pi) = f(x, y), \quad\text{for all } (x, y) \in \mathbb{R}^2.
\]
Note: The strip $\{(x, y) : x \in \mathbb{R}, \; y \in [-\pi, \pi)\}$ is mapped onto $\mathbb{R}^2 \setminus O$. In fact, the vertical lines $x = c$ are mapped to circles $u^2 + v^2 = e^{2c}$, the horizontal lines $y = k$, $k \neq \pm\pi/2$, are mapped into the lines $v = u\tan(k)$, $y = \pi/2$ is mapped to the positive $v$ axis and $y = -\pi/2$ to the negative $v$ axis. (See Figure 4.13.)

Figure 4.13. The image of a strip for Example 4.51.
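The two facts in Example 4.51, a Jacobian that never vanishes and the period $2\pi$ in $y$, can be checked numerically with a minimal sketch such as the following. The sample point and the helper name `jacobian_det` are arbitrary choices made for this illustration.

```python
import math


def f(x, y):
    """The map f(x, y) = (e^x cos y, e^x sin y) of Example 4.51."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def jacobian_det(x, y):
    """Determinant of the 2 x 2 matrix of first partials of f."""
    ex = math.exp(x)
    a, b = ex * math.cos(y), -ex * math.sin(y)
    c, d = ex * math.sin(y), ex * math.cos(y)
    return a * d - b * c


p = (0.3, 1.1)
q = (0.3, 1.1 + 2 * math.pi)   # a different point with the same image

print(jacobian_det(*p), math.exp(2 * p[0]))  # both values agree with e^{2x}
print(f(*p))
print(f(*q))                                  # same image up to rounding
```

The nonzero determinant gives local injectivity near each point, while the distinct points `p` and `q` share an image, so the map is not globally one-to-one.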
Lemma 4.52. Suppose $f : \mathbb{R}^n \to \mathbb{R}^n$ is in $C^1(D)$ for an open set $D \subset \mathbb{R}^n$. If $J_f(p) \neq 0$ for all $p \in D$, then $f(D)$ is an open set of $\mathbb{R}^n$.

Proof. In order to prove $f(D)$ is open, we must show that for any $q_0 \in f(D)$, there is a $\delta_0 = \delta(q_0) > 0$ such that $B(q_0, \delta_0) \subset f(D)$. Let $q_0 = f(p_0)$ for some $p_0 \in D$. We may choose $\eta_0 > 0$ so that both
(a) the closed ball $\overline{B}(p_0, \eta_0) \subset D$, since $D$ is open, and
(b) $f$ is one-to-one on $\overline{B}(p_0, \eta_0)$ by Theorem 4.50.
Let $C = \partial B(p_0, \eta_0) = \{p : |p - p_0| = \eta_0\}$. Then $f(C)$ is compact (why?) and $q_0 = f(p_0) \notin f(C)$. Note that
\[
\delta = \inf\{|q - q_0| : q \in f(C)\} \implies \delta > 0 \quad\text{(Why?)}.
\]
(See Figure 4.14.) We claim that $B(q_0, \delta/3) \subset f(D)$ and therefore $f(D)$ is open. To prove this claim, for an arbitrary $q \in B(q_0, \delta/3)$, we will show that $q \in f(D)$. To this end, consider the function $\phi : \overline{B}(p_0, \eta_0) \subset \mathbb{R}^n \to \mathbb{R}$ defined by
\[
\phi(p) = |f(p) - q|^2, \quad p \in \overline{B}(p_0, \eta_0).
\]
Now $\phi$ is continuous and $\overline{B}(p_0, \eta_0)$ is compact, so there exists
\[
p^* \in \overline{B}(p_0, \eta_0) \quad\text{such that}\quad \phi(p^*) \leq \phi(p), \quad\text{for all } p \in \overline{B}(p_0, \eta_0).
\]
Figure 4.14. The point $q_0$ is away from the image of the boundary.

Now $p^* \notin C$ since otherwise we would have
\[
p^* \in C \implies \sqrt{\phi(p^*)} \geq \frac{2\delta}{3} > \frac{\delta}{3} > \sqrt{\phi(p_0)},
\]
contradicting that $\phi$ has a minimum at $p^*$ over $\overline{B}(p_0, \eta_0)$. Therefore, $p^*$ is an interior minimum, so the partial derivatives of $\phi$ are all zero at $p^*$ (Theorem 4.36). Thus, from the definition of $\phi$,
\[
\phi(p) = |f(p) - q|^2 = \sum_{i=1}^{n} (f_i(p) - q_i)^2
\]
\[
\implies \frac{\partial\phi}{\partial x_j}(p^*) = 0 = 2\sum_{i=1}^{n} \frac{\partial f_i}{\partial x_j}(p^*)\,(f_i(p^*) - q_i), \quad j = 1, \dots, n.
\]
But the last line can be viewed as a homogeneous system of linear equations in the unknowns $f_i(p^*) - q_i$, with coefficient matrix having determinant $J_f(p^*) \neq 0$. Consequently, we must have
\[
\bigl\{ f_i(p^*) = q_i, \; i = 1, \dots, n \bigr\} \implies f(p^*) = q \implies q \in f(D).
\]
The proof is complete, and $f(D)$ is open.
Remark: The condition that $J_f(p) \neq 0$ cannot be dropped even at a single point in $D$. Indeed, take $D = \mathbb{R}$ and $f(x) = x^2$. Then $f(\mathbb{R}) = [0, \infty)$ is not open even though $f'(x) \neq 0$ except at the single point $x = 0$.

Theorem 4.53. Suppose $f : D \subset \mathbb{R}^n \to \mathbb{R}^m$, $m \leq n$, for some open set $D$. If $f \in C^1(D)$ and the Jacobian matrix has full rank, i.e. $\operatorname{rank} f'(p) = m$, for every $p \in D$, then $f(D)$ is open.

Proof. Given $c \in D$, we must prove that $f(c)$ belongs to the interior of $f(D)$. By relabeling the $x$'s if necessary, we may assume that
\[
\frac{\partial(f_1, \dots, f_m)}{\partial(x_1, \dots, x_m)}(c) \neq 0.
\]
Since $f \in C^1(D)$, there is an open neighborhood $U$ of $c$ such that
\[
\frac{\partial(f_1, \dots, f_m)}{\partial(x_1, \dots, x_m)}(p) \neq 0, \quad \forall\, p \in U.
\]
Define $\tilde{f} : \mathbb{R}^m \to \mathbb{R}^m$ on the open set
\[
\tilde{U} := \{(x_1, \dots, x_m) : (x_1, \dots, x_m, c_{m+1}, \dots, c_n) \in U\}
\]
by
\[
\tilde{f}(x_1, \dots, x_m) = f(x_1, \dots, x_m, c_{m+1}, \dots, c_n).
\]
Then $\tilde{f} \in C^1(\tilde{U})$ and $J_{\tilde{f}}(x_1, \dots, x_m) \neq 0$, for all $(x_1, \dots, x_m) \in \tilde{U}$. From Lemma 4.52, $\tilde{f}(\tilde{U})$ is an open set in $\mathbb{R}^m$. Hence
\[
f(c) = \tilde{f}(c_1, \dots, c_m) \in \tilde{f}(\tilde{U}) \subset f(U) \subset f(D) \implies f(c) \in f(D)^\circ, \quad \forall\, c \in D,
\]
and so $f(D)$ is open.

Question: In the proof it was stated that $\tilde{U}$ was open. Why is that true?
Lemma 4.54. If $f$ is continuous and one-to-one on a compact set $S$, then the inverse function $f^{-1}$ is continuous on $f(S)$.

Proof. Let $q_0 \in f(S)$. We wish to show $f^{-1}$ is continuous at $q_0$. If $f^{-1}$ were not continuous, then for some $\epsilon_0 > 0$, there is a sequence of points $\{q_k\} \subset f(S)$ for which
\[
\lim_{k \to \infty} q_k = q_0 \quad\text{and}\quad \bigl|f^{-1}(q_k) - f^{-1}(q_0)\bigr| \geq \epsilon_0. \tag{4.18}
\]
Now the image of the sequence, $\{f^{-1}(q_k)\}$, is a sequence in the compact set $S$, hence there is a subsequence $\{f^{-1}(q_{k_j})\}$ such that
\[
\lim_{j \to \infty} f^{-1}(q_{k_j}) = p_1 \in S. \tag{4.19}
\]
By (4.18), $p_1 \neq f^{-1}(q_0)$. But $f$ is continuous at $p_1$ and $f$ is one-to-one so, from (4.19),
\[
\lim_{j \to \infty} q_{k_j} = \lim_{j \to \infty} f\bigl(f^{-1}(q_{k_j})\bigr) = f(p_1) \neq f\bigl(f^{-1}(q_0)\bigr) = q_0,
\]
a contradiction. Thus, $f^{-1}$ must be continuous at $q_0$.

In general, the requirement that $S$ be compact cannot be dropped. It cannot even be replaced by the requirement that $f(S)$ be compact, as the next example shows:

Example 4.55 Let $f : [0, 2\pi) \to \{x^2 + y^2 = 1\}$ be defined by $f(\theta) = (\cos(\theta), \sin(\theta))$. Then $f \in C([0, 2\pi))$, $f$ is one-to-one and $f([0, 2\pi))$ is compact in $\mathbb{R}^2$. However, $f^{-1}$ is discontinuous at $(1, 0)$ (nearby points on the circle on either side of this point are sent to opposite ends of the interval by $f^{-1}$).
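The jump of $f^{-1}$ at $(1, 0)$ in Example 4.55 is easy to see numerically. The sketch below builds the inverse from `math.atan2`, reduced modulo $2\pi$ so that its values lie in $[0, 2\pi)$; this particular construction of the inverse is an assumption made for illustration.

```python
import math


def f(theta):
    """f(theta) = (cos theta, sin theta) on [0, 2*pi)."""
    return (math.cos(theta), math.sin(theta))


def f_inv(u, v):
    """Inverse of f on the unit circle, with values in [0, 2*pi)."""
    return math.atan2(v, u) % (2 * math.pi)


eps = 1e-6
above = f(eps)                 # point on the circle just above (1, 0)
below = f(2 * math.pi - eps)   # point on the circle just below (1, 0)

# The two points are very close together on the circle ...
print(math.dist(above, below))
# ... but their preimages sit at opposite ends of [0, 2*pi).
print(f_inv(*above), f_inv(*below))
```

Shrinking `eps` brings the two circle points together while the gap between their preimages stays close to $2\pi$.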
Lemma 4.56. Let $f : \mathbb{R}^n \to \mathbb{R}^n$. If $f \in C^1$ in a neighborhood of $p_0$ and $J_f(p_0) \neq 0$, then there is a neighborhood $U$ of $p_0$ and a constant $m > 0$ such that
\[
p \in U \implies |f(p) - f(p_0)| \geq m\,|p - p_0|.
\]

Proof. Since $J_f(p_0) \neq 0$, the differential $Df(p_0) : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one. Therefore, there exists $m > 0$ such that
\[
|Df(p_0)(u)| \geq 2m|u|, \quad\text{for all } u \in \mathbb{R}^n \text{ (Theorem 4.7)},
\]
\[
\implies |Df(p_0)(p - p_0)| \geq 2m|p - p_0|, \quad\text{for all } p. \tag{4.20}
\]
But from the definition of $Df(p_0)$, there is a neighborhood $U$ of $p_0$ such that
\[
p \in U \implies |f(p) - f(p_0) - Df(p_0)(p - p_0)| \leq m|p - p_0|. \tag{4.21}
\]
But then, for $p \in U$,
\[
\begin{aligned}
m|p - p_0| &\geq |f(p) - f(p_0) - Df(p_0)(p - p_0)| && \text{by (4.21)} \\
&\geq |Df(p_0)(p - p_0)| - |f(p) - f(p_0)| \\
&\geq 2m|p - p_0| - |f(p) - f(p_0)| && \text{by (4.20)}
\end{aligned}
\]
\[
\implies m|p - p_0| \leq |f(p) - f(p_0)|.
\]

We are now in a position to say something about the differentiability of the inverse function.

Theorem 4.57. Let $f : \mathbb{R}^n \to \mathbb{R}^n$, and suppose
(i) $D$ is an open set of $\mathbb{R}^n$, $f \in C^1(D)$, and $J_f(p) \neq 0$ at every $p \in D$, and
(ii) $f$ is one-to-one on $D$.
Then $f^{-1} \in C^1(f(D))$ and $Df^{-1} = (Df)^{-1}$ (the inverse matrix).

Proof.
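Fix $p_0 \in D$ and let $g : \mathbb{R}^n \to \mathbb{R}^n$ be defined by
\[
g(p) := \frac{1}{|p - p_0|}\bigl[f(p) - f(p_0) - Df(p_0)(p - p_0)\bigr] \tag{4.22}
\]
\[
\implies \lim_{p \to p_0} g(p) = O \quad\text{(by definition of } Df(p_0)\text{)}. \tag{4.23}
\]
Since $J_f(p_0) \neq 0$, the inverse matrix $[Df(p_0)]^{-1}$ exists and from (4.22),
\[
\begin{aligned}
|p - p_0|\,[Df(p_0)]^{-1} g(p) &= [Df(p_0)]^{-1}\bigl[f(p) - f(p_0) - Df(p_0)(p - p_0)\bigr] \\
&= [Df(p_0)]^{-1}\bigl[f(p) - f(p_0)\bigr] - (p - p_0).
\end{aligned} \tag{4.24}
\]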
By Theorem 4.53, $f(D)$ is open and, in particular, it is a neighborhood of $f(p_0)$. Hence, with $q_0 = f(p_0)$ and $q = f(p)$, we may deduce from (4.24) and Lemma 4.56 that
\[
\frac{1}{m}\,|q - q_0|\,\bigl|[Df(p_0)]^{-1}(g(p))\bigr| \geq \bigl|f^{-1}(q) - f^{-1}(q_0) - [Df(p_0)]^{-1}(q - q_0)\bigr| \tag{4.25}
\]
if $p$ is in some neighborhood of $p_0$. Furthermore, from Lemma 4.54, $f$ and $f^{-1}$ are continuous at $p_0$ and $q_0$ respectively, so that
\[
q \to q_0 \implies f^{-1}(q) \to f^{-1}(q_0) \implies p \to p_0.
\]
Therefore, from (4.25) and (4.23) we deduce
\[
\lim_{q \to q_0} \frac{\bigl|f^{-1}(q) - f^{-1}(q_0) - [Df(p_0)]^{-1}(q - q_0)\bigr|}{|q - q_0|} = 0,
\]
since
\[
\lim_{q \to q_0} [Df(p_0)]^{-1}(g(p)) = \lim_{p \to p_0} [Df(p_0)]^{-1}(g(p)) = 0.
\]
Thus, since $[Df(p_0)]^{-1}$ is linear, $Df^{-1}(q_0)$ exists and equals $[Df(p_0)]^{-1}$. From the fact that $Df^{-1}(f(p_0)) = [Df(p_0)]^{-1}$, it follows that $f^{-1} \in C^1(f(D))$, since the partial derivatives of $f^{-1}$ are rational functions of the partial derivatives of $f$ in which the denominator $J_f(p)$ is not zero.

Example 4.58 We have seen that $f(x, y) = (e^x\cos(y), e^x\sin(y))$ is one-to-one on the strip $0 \leq y < 2\pi$. Recall
\[
\det f'(x, y) = \det \begin{bmatrix} e^x\cos(y) & -e^x\sin(y) \\ e^x\sin(y) & e^x\cos(y) \end{bmatrix} = e^{2x} \neq 0,
\]
\[
\implies [f'(x, y)]^{-1} = e^{-2x} \begin{bmatrix} e^x\cos(y) & e^x\sin(y) \\ -e^x\sin(y) & e^x\cos(y) \end{bmatrix}.
\]
To find $f^{-1}$, solve $(u, v) = (e^x\cos(y), e^x\sin(y))$ for $x$ and $y$:
\[
u^2 + v^2 = e^{2x}, \quad \tan(y) = \frac{v}{u} \implies x = \frac{1}{2}\log(u^2 + v^2), \quad y = \arctan\Bigl(\frac{v}{u}\Bigr)
\]
\[
\implies f^{-1}(u, v) = \Bigl(\frac{1}{2}\log(u^2 + v^2), \; \arctan\Bigl(\frac{v}{u}\Bigr)\Bigr)
\]
and
\[
(f^{-1})'(u, v) = \begin{bmatrix} \frac{u}{u^2 + v^2} & \frac{v}{u^2 + v^2} \\ \frac{-v}{u^2 + v^2} & \frac{u}{u^2 + v^2} \end{bmatrix} = \frac{1}{u^2 + v^2} \begin{bmatrix} u & v \\ -v & u \end{bmatrix} = e^{-2x} \begin{bmatrix} e^x\cos(y) & e^x\sin(y) \\ -e^x\sin(y) & e^x\cos(y) \end{bmatrix}.
\]
Thus, we see that $Df^{-1}(u, v) = [Df(x, y)]^{-1}$.
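The identity $Df^{-1}(u, v) = [Df(x, y)]^{-1}$ from Example 4.58 can be spot-checked by multiplying the two matrices at a sample point and comparing with the identity. The following is a minimal sketch; the function names and the sample point are illustrative assumptions.

```python
import math


def df(x, y):
    """Jacobian matrix of f(x, y) = (e^x cos y, e^x sin y)."""
    ex = math.exp(x)
    return [[ex * math.cos(y), -ex * math.sin(y)],
            [ex * math.sin(y), ex * math.cos(y)]]


def df_inverse_map(u, v):
    """Jacobian matrix of f^{-1}(u, v) = (log(u^2 + v^2)/2, arctan(v/u))."""
    r2 = u * u + v * v
    return [[u / r2, v / r2],
            [-v / r2, u / r2]]


def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


x, y = 0.4, 0.9
u, v = math.exp(x) * math.cos(y), math.exp(x) * math.sin(y)

# (f^{-1})'(u, v) times f'(x, y) should be the identity matrix up to rounding.
product = matmul2(df_inverse_map(u, v), df(x, y))
print(product)
```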
Exercises

4.60. In Exercise 29 a sufficient condition was given for $f : \mathbb{R}^n \to \mathbb{R}^n$ to be globally one-to-one. Verify that this condition is not satisfied by Example 4.51. (If you did not do Exercise 29, a proof is given in Lemma 4.49.)

4.61. Show that if $f : \mathbb{R} \to \mathbb{R}$ and $f'(x) \neq 0$ for each $x \in \mathbb{R}$, then $f$ is one-to-one globally on $\mathbb{R}$.

4.62. Let $f : \mathbb{R} \to \mathbb{R}$. Show that if $f$ is continuous and one-to-one on a connected subset $S$ of $\mathbb{R}$, then $f^{-1}$ is continuous on $f(S)$.

4.63. Let $y = y(x)$, that is $y_i = y_i(x_1, \dots, x_n)$ for $i = 1, \dots, n$, be $C^1$. Show that
\[
\frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)} \neq 0 \implies \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)} \cdot \frac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)} = 1.
\]

4.64. Let $f(x, y) = (x^2, y/x)$ when $x > 0$. Find $f'(x, y)$. Show that $f$ is one-to-one on its domain by finding $f^{-1}$. Check that $[f']^{-1} = (f^{-1})'$.

4.65. Let
\[
f(x, y) = \Bigl(\frac{x}{\sqrt{x^2 + y^2}}, \frac{y}{\sqrt{x^2 + y^2}}\Bigr).
\]
Show that $J_f(x, y) = 0$ for all $(x, y) \in \mathbb{R}^2 \setminus O$ and $f$ is not locally one-to-one anywhere on its domain. Show that the range of $f$ is the circle $u^2 + v^2 = 1$ and thus contains no open subset.
4.5 Implicit Function Theorem
Question: If we are given a system of $n$ equations in $m + n$ unknowns, when and how can we choose $n$ of the variables, call them $y_1, \dots, y_n$, to be defined implicitly as functions of the remaining $m$ variables, call them $x_1, \dots, x_m$? Writing the "independent" variables $x_1, \dots, x_m$ first and the "dependent" variables $y_1, \dots, y_n$ second, we are asking: When does the system of equations
\[
f_i(x_1, \dots, x_m, y_1, \dots, y_n) = 0, \quad i = 1, 2, \dots, n \tag{4.26}
\]
uniquely define $y_1, \dots, y_n$ implicitly as functions of $x_1, \dots, x_m$, i.e. when are there functions $\phi_i$ so that
\[
y_i = \phi_i(x_1, \dots, x_m), \quad i = 1, 2, \dots, n? \tag{4.27}
\]
Of course, the way the question was originally stated, part of the problem is to find which are the independent variables and which are the dependent variables. The equations will not be written for you with the independent variable so conveniently
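labelled. However, since we can always rearrange variables (once we identify them), for the sake of the statement of the Theorem, we assume the given order.

If the equations (4.26) are linear in the variables $y_i$, then we know we have a solution if the coefficient matrix is nonsingular. But observe, the coefficient matrix of linear functions is nothing more than the matrix of partial derivatives of those functions with respect to the linear variables. This suggests that the answer lies in whether or not the Jacobian of the functions with respect to the dependent variables is nonzero.

Theorem 4.59 (Implicit Function Theorem). Suppose that
(i) $f : \mathbb{R}^{m+n} \to \mathbb{R}^n$ is $C^1(D)$ for some open domain $D \subset \mathbb{R}^{m+n}$, and
(ii) there is a point $(p_0, q_0)$ in $D$ for which
\[
f(p_0, q_0) = O \quad\text{and}\quad \frac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_n)}(p_0, q_0) \neq 0, \tag{4.28}
\]
where $p = (x_1, \dots, x_m)$ denotes points in $\mathbb{R}^m$, $q = (y_1, \dots, y_n)$ denotes points in $\mathbb{R}^n$, and $(p, q)$ denotes points in $\mathbb{R}^{m+n}$.
Then there is a unique function $\phi : \mathbb{R}^m \to \mathbb{R}^n$ defined in a neighborhood $U$ of $p_0$ such that $\phi \in C^1(U)$ and
\[
\phi(p_0) = q_0 \quad\text{and}\quad f(p, \phi(p)) = O, \quad \forall p \in U. \tag{4.29}
\]
Moreover, if $y_i = \phi_i(p)$, $i = 1, \dots, n$, we have
\[
\begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \dots & \frac{\partial f_n}{\partial x_m} \end{bmatrix} = - \begin{bmatrix} \frac{\partial f_1}{\partial y_1} & \dots & \frac{\partial f_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial y_1} & \dots & \frac{\partial f_n}{\partial y_n} \end{bmatrix} \begin{bmatrix} \frac{\partial \phi_1}{\partial x_1} & \dots & \frac{\partial \phi_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial \phi_n}{\partial x_1} & \dots & \frac{\partial \phi_n}{\partial x_m} \end{bmatrix}, \qquad \Bigl(\Bigl[\frac{\partial f}{\partial x}\Bigr] = -\Bigl[\frac{\partial f}{\partial y}\Bigr]\Bigl[\frac{\partial \phi}{\partial x}\Bigr]\Bigr) \tag{4.30}
\]
and
\[
\frac{\partial y_i}{\partial x_j} = \frac{\partial \phi_i}{\partial x_j} = - \frac{\dfrac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_{i-1}, x_j, y_{i+1}, \dots, y_n)}}{\dfrac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_n)}}, \quad i = 1, \dots, n, \; j = 1, \dots, m. \tag{4.31}
\]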
Before we proceed to the proof, we illustrate the theorem by several examples:

Example 4.60 Take $f(x, y) = x^2 - y$ and note $f(0, 0) = 0$, $(\partial f/\partial y)(0, 0) = -1$. Hence, $f(x, y) = 0$ can be solved by $y = \phi(x) = x^2$ in a neighborhood of $x = 0$. Notice further that
\[
\frac{dy}{dx} = 2x = \frac{-2x}{-1} = -\frac{\partial f/\partial x}{\partial f/\partial y}.
\]
What about solving for $x$ in terms of $y$? Now $f(0, 0) = 0$ and $(\partial f/\partial x)(0, 0) = 0$ means that the condition (4.28) is not satisfied. The equation $f(x, y) = 0$ cannot be solved in the form $x = \psi(y)$ with $\psi$ defined in a full neighborhood of $y = 0$. Notice there are two solutions of the form $x = \pm\sqrt{y}$ defined on $[0, \infty)$, but even these are not $C^1$ at $y = 0$. There are infinitely many discontinuous solutions defined on $[0, \infty)$; for example, one is $\psi(y) = \sqrt{y}$, $y \in \mathbb{Q}$, $\psi(y) = -\sqrt{y}$, $y \notin \mathbb{Q}$.

Example 4.61 Given the equations
\[
\begin{aligned}
x + y + z &= 0 && (f_1(x, y, z) = 0) \\
x^2 + y^2 + z^2 - 2xz - 1 &= 0 && (f_2(x, y, z) = 0)
\end{aligned}
\]
which of the variables can be solved in terms of the others and at what points? Basically, we want to test when (4.28) holds. Therefore, we choose pairs of variables and look at the Jacobian:
\[
\frac{\partial(f_1, f_2)}{\partial(x, y)} = \begin{vmatrix} 1 & 1 \\ 2x - 2z & 2y \end{vmatrix} = 2(y - x + z).
\]
Therefore, we can solve for $x, y$ as functions of $z$ in a neighborhood of any point except those lying on the plane $x - z - y = 0$. Substituting $x = -z - y$ into the second equation gives
\[
4z^2 + 4zy + 2y^2 - 1 = 0, \quad\text{or}\quad z^2 + (z + y)^2 - \frac{1}{2} = 0,
\]
which implies $y = -z \pm \sqrt{\tfrac{1}{2} - z^2}$. Substituting this in the first equation gives $x$ as a function of $z$. Testing another pair of variables, we find
\[
\frac{\partial(f_1, f_2)}{\partial(x, z)} = \begin{vmatrix} 1 & 1 \\ 2x - 2z & 2z - 2x \end{vmatrix} = 4(z - x).
\]
Hence, we can solve for $x, z$ as functions of $y$ in a neighborhood of any point except those lying on the plane $x = z$. One can check this directly: the second equation reads $(x - z)^2 + y^2 - 1 = 0$, so $x - z = \pm\sqrt{1 - y^2}$, which together with $x + z = -y$ from the first equation determines $x$ and $z$; the two branches meet exactly where $x = z$. Since the equations are symmetric in $x$ and $z$, it should be no surprise to discover that we can solve for $z, y$ as functions of $x$ in a neighborhood of any point except those lying on the plane $z - x - y = 0$.

Example 4.62 Consider the equations
\[
\begin{aligned}
2x^2 - u + v &= 0 && (f_1(x, y, z, u, v) = 0) \\
2y^3 - u - v &= 0 && (f_2(x, y, z, u, v) = 0) \\
z + u - v^2 &= 0 && (f_3(x, y, z, u, v) = 0)
\end{aligned}
\]
Checking several possibilities, we find
\[
\frac{\partial(f_1, f_2, f_3)}{\partial(u, v, z)} = \begin{vmatrix} -1 & 1 & 0 \\ -1 & -1 & 0 \\ 1 & -2v & 1 \end{vmatrix} = 2,
\]
\[
\frac{\partial(f_1, f_2, f_3)}{\partial(u, v, x)} = \begin{vmatrix} -1 & 1 & 4x \\ -1 & -1 & 0 \\ 1 & -2v & 0 \end{vmatrix} = 4x(2v + 1),
\]
\[
\frac{\partial(f_1, f_2, f_3)}{\partial(u, v, y)} = \begin{vmatrix} -1 & 1 & 0 \\ -1 & -1 & 6y^2 \\ 1 & -2v & 0 \end{vmatrix} = -6y^2(2v - 1),
\]
\[
\frac{\partial(f_1, f_2, f_3)}{\partial(x, y, z)} = \begin{vmatrix} 4x & 0 & 0 \\ 0 & 6y^2 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 24xy^2.
\]
This shows that we can always solve for $u, v, z$ in terms of $x$ and $y$, but for the other combinations tested, there are points at which we cannot apply the theorem. The first case leads to the solution
\[
v = y^3 - x^2, \quad u = x^2 + y^3, \quad z = -x^2 - y^3 + y^6 - 2y^3x^2 + x^4.
\]
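The explicit solution in Example 4.62 can be substituted back into the three equations as a sanity check. The sketch below does this with integer inputs so that the arithmetic is exact, and also evaluates the Jacobian with respect to $(u, v, z)$; the function names are hypothetical and used only for this check.

```python
def solve_uvz(x, y):
    """Explicit solution (u, v, z) in terms of (x, y) from Example 4.62."""
    u = x**2 + y**3
    v = y**3 - x**2
    z = -x**2 - y**3 + y**6 - 2 * y**3 * x**2 + x**4
    return u, v, z


def residuals(x, y):
    """Values of f1, f2, f3 after substituting the explicit solution."""
    u, v, z = solve_uvz(x, y)
    return (2 * x**2 - u + v, 2 * y**3 - u - v, z + u - v**2)


def jacobian_uvz(v):
    """det of d(f1, f2, f3)/d(u, v, z), expanded along the first row."""
    rows = [[-1, 1, 0],
            [-1, -1, 0],
            [1, -2 * v, 1]]
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


for point in [(2, -3), (0, 0), (5, 1)]:
    u, v, z = solve_uvz(*point)
    print(point, (u, v, z), residuals(*point), jacobian_uvz(v))
```

Each residual triple is zero and the determinant is 2 regardless of $v$, consistent with the claim that $u, v, z$ can always be solved for in terms of $x$ and $y$.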
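Example 4.63 Consider the equations
\[
\begin{aligned}
x^2 - yu &= 0 \\
xy + uv &= 0.
\end{aligned} \tag{4.32}
\]
We will view this as $f(x, y, u, v) = (x^2 - yu, \; xy + uv) = O$ and define implicitly $u, v$ as functions of $x$ and $y$ in a neighborhood of $(x_0, y_0)$. This requires from (4.28) that we have a point $(x_0, y_0, u_0, v_0)$ such that
\[
x_0^2 - y_0u_0 = 0, \quad x_0y_0 + u_0v_0 = 0, \quad\text{and}\quad \frac{\partial(f_1, f_2)}{\partial(u, v)} = \begin{vmatrix} -y_0 & 0 \\ v_0 & u_0 \end{vmatrix} = -y_0u_0 \neq 0.
\]
Now, the last equation implies $y_0 \neq 0$ and $u_0 \neq 0$, and putting this in (4.32) indicates that $x_0 \neq 0$ as well. The equations are easy to solve as
\[
u = \frac{x^2}{y}, \quad v = \frac{-y^2}{x}.
\]
According to (4.31), we should have
\[
\frac{\partial u}{\partial x} = -\frac{\begin{vmatrix} 2x & 0 \\ y & u \end{vmatrix}}{-yu} = \frac{2x}{y}, \qquad \frac{\partial u}{\partial y} = -\frac{\begin{vmatrix} -u & 0 \\ x & u \end{vmatrix}}{-yu} = -\frac{u}{y} = \frac{-x^2}{y^2},
\]
\[
\frac{\partial v}{\partial x} = -\frac{\begin{vmatrix} -y & 2x \\ v & y \end{vmatrix}}{-yu} = -\frac{y}{u} - \frac{2xv}{yu} = -\frac{y^2}{x^2} + \frac{2y^2}{x^2} = \frac{y^2}{x^2},
\]
\[
\frac{\partial v}{\partial y} = -\frac{\begin{vmatrix} -y & -u \\ v & x \end{vmatrix}}{-yu} = -\frac{x}{u} + \frac{v}{y} = -\frac{y}{x} - \frac{y}{x} = \frac{-2y}{x}.
\]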
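The partial derivatives that (4.31) gives in Example 4.63 can be compared with central finite differences of the explicit solution $u = x^2/y$, $v = -y^2/x$. This is a rough numerical sketch; the step size `h`, the sample point and the function names are arbitrary choices for illustration.

```python
def u(x, y):
    return x**2 / y


def v(x, y):
    return -y**2 / x


def partials_from_theorem(x, y):
    """The four partial derivatives as given by formula (4.31) in Example 4.63."""
    uu, vv = u(x, y), v(x, y)
    du_dx = 2 * x / y
    du_dy = -uu / y
    dv_dx = -y / uu - 2 * x * vv / (y * uu)
    dv_dy = -x / uu + vv / y
    return du_dx, du_dy, dv_dx, dv_dy


def central_diff(g, x, y, h=1e-6):
    """Central-difference approximations to dg/dx and dg/dy at (x, y)."""
    gx = (g(x + h, y) - g(x - h, y)) / (2 * h)
    gy = (g(x, y + h) - g(x, y - h)) / (2 * h)
    return gx, gy


x0, y0 = 1.5, 2.0
exact = partials_from_theorem(x0, y0)
approx = central_diff(u, x0, y0) + central_diff(v, x0, y0)
print(exact)
print(approx)
```

The two printed tuples agree to within the accuracy of the finite-difference approximation.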
You will see from the simple exercises for this section that when the basic condition on the Jacobian in (4.28) is not satisfied there may be no solution, or no $C^1$ solution, or indeed infinitely many solutions.

Proof of Theorem 4.59. Consider the function $F : \mathbb{R}^{n+m} \to \mathbb{R}^{n+m}$ defined by
\[
F(p, q) = (p, f(p, q)).
\]
Then clearly $f \in C^1(D)$ implies $F \in C^1(D)$ and
\[
J_F(p_0, q_0) = \begin{vmatrix} I_{m \times m} & O_{m \times n} \\[4pt] \begin{matrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \dots & \frac{\partial f_n}{\partial x_m} \end{matrix} & \begin{matrix} \frac{\partial f_1}{\partial y_1} & \dots & \frac{\partial f_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial y_1} & \dots & \frac{\partial f_n}{\partial y_n} \end{matrix} \end{vmatrix}_{(p_0, q_0)} = \frac{\partial(f_1, \dots, f_n)}{\partial(y_1, \dots, y_n)}(p_0, q_0) \neq 0.
\]
By Theorem 4.50, $F$ has an inverse function in a neighborhood $W$ of $(p_0, q_0)$ and by Theorem 4.57, $F^{-1} \in C^1(F(W))$, and $F(W)$ is a neighborhood (Theorem 4.53) of $(p_0, O) = F(p_0, q_0)$.

Figure 4.15. Illustrating the proof of the Implicit Function Theorem.

Now setting
\[
\begin{aligned}
F(p, q) &= (p, f(p, q)) = (u, v) && \text{(notice } u = p\text{)} \\
\implies (p, q) &= F^{-1}(u, v) = (u, \theta(u, v))
\end{aligned} \tag{4.33}
\]
\[
\implies p = u \quad\text{and}\quad q = \theta(u, v)
\]
\[
\implies (u, v) = F(F^{-1}(u, v)) = F(u, \theta(u, v)) = (u, f(u, \theta(u, v)))
\]
for all $(u, v)$ in the neighborhood $F(W)$ of $(p_0, O)$. In particular,
\[
v = f(u, \theta(u, v)) \quad (u = p)
\]
and thus, $O = f(p, \theta(p, O))$ for all $p$ in a neighborhood $U$ of $p_0$. Therefore,
\[
O = f(p, \phi(p)) \quad\text{if}\quad \phi(p) = \theta(p, O), \quad\text{and}\quad \phi(p_0) = \theta(p_0, O) = q_0
\]
so that (4.29) holds. Also $\phi \in C^1(U)$ follows from (4.33) since $F^{-1} \in C^1(F(W))$. To prove (4.31), we apply the Chain Rule to $O = f(p, \phi(p))$ to obtain
\[
O = Df(p, \phi(p)) \begin{bmatrix} I_{m \times m} \\ D\phi(p) \end{bmatrix}
\]
\[
\implies 0 = \frac{\partial f_i}{\partial x_j} + \sum_{k=1}^{n} \frac{\partial f_i}{\partial y_k}\frac{\partial \phi_k}{\partial x_j}, \quad i = 1, \dots, n, \; j = 1, \dots, m,
\]
which gives (4.30). Viewing that as a linear system of equations for the unknowns $\partial\phi_k/\partial x_j$, $k = 1, \dots, n$, Cramer's Rule gives (4.31).

Remark: The function $\phi$ is the unique solution to $f(p, \phi(p)) = O$ with $\phi(p_0) = q_0$, since if to the contrary we had two solutions
\[
f(p, \phi_1(p)) = f(p, \phi_2(p)) = O \quad\text{and}\quad \phi_1(p) \neq \phi_2(p)
\]
for some $p \in U$, then
\[
F(p, \phi_1(p)) = F(p, \phi_2(p)) = (p, O),
\]
contradicting the fact that $F$ is one-to-one on $W$. This means that the graph of $\phi$, the set $\{(p, \phi(p)) : p \in U\}$, is the whole set $f^{-1}(O) \cap W$.

Corollary 4.64. Let $f : \mathbb{R}^2 \to \mathbb{R}$. If $f$ is of class $C^1$ in a neighborhood of $(x_0, y_0)$ with
\[
f(x_0, y_0) = 0 \quad\text{and}\quad \frac{\partial f}{\partial y}(x_0, y_0) \neq 0,
\]
then the equation $f(x, y) = 0$ has a unique solution $y = \phi(x)$ with $\phi(x_0) = y_0$, which exists and is continuously differentiable in a neighborhood $U$ of $x_0$ and
\[
\phi'(x) = \frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y}.
\]

Corollary 4.65. Let $f : \mathbb{R}^3 \to \mathbb{R}$. If $f$ is of class $C^1$ in a neighborhood of $(x_0, y_0, z_0)$ with
\[
f(x_0, y_0, z_0) = 0 \quad\text{and}\quad \frac{\partial f}{\partial z}(x_0, y_0, z_0) \neq 0,
\]
then the equation $f(x, y, z) = 0$ has a unique solution $z = \phi(x, y)$ with $\phi(x_0, y_0) = z_0$, which exists and is continuously differentiable in a neighborhood $U$ of $(x_0, y_0)$ and
\[
\frac{\partial z}{\partial x} = -\frac{\partial f/\partial x}{\partial f/\partial z}, \quad \frac{\partial z}{\partial y} = -\frac{\partial f/\partial y}{\partial f/\partial z}.
\]

Corollary 4.66. Let $f : \mathbb{R}^5 \to \mathbb{R}^2$. If $f = (f_1, f_2)$ is of class $C^1$ in a neighborhood of $(x_0, y_0, z_0, u_0, v_0)$ with
\[
f_1(x_0, y_0, z_0, u_0, v_0) = 0, \quad f_2(x_0, y_0, z_0, u_0, v_0) = 0, \quad\text{and}\quad \frac{\partial(f_1, f_2)}{\partial(u, v)}(x_0, y_0, z_0, u_0, v_0) \neq 0,
\]
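then the equations
\[
f_1(x, y, z, u, v) = 0 \quad\text{and}\quad f_2(x, y, z, u, v) = 0
\]
have a unique solution
\[
u = \phi_1(x, y, z), \quad v = \phi_2(x, y, z) \quad\text{with}\quad u_0 = \phi_1(x_0, y_0, z_0), \quad v_0 = \phi_2(x_0, y_0, z_0),
\]
which exist and are continuously differentiable in a neighborhood $U$ of $(x_0, y_0, z_0)$ and
\[
\frac{\partial u}{\partial x} = \frac{\partial \phi_1}{\partial x} = -\frac{\partial(f_1, f_2)/\partial(x, v)}{\partial(f_1, f_2)/\partial(u, v)}, \qquad \frac{\partial v}{\partial x} = \frac{\partial \phi_2}{\partial x} = -\frac{\partial(f_1, f_2)/\partial(u, x)}{\partial(f_1, f_2)/\partial(u, v)},
\]
\[
\frac{\partial u}{\partial y} = \frac{\partial \phi_1}{\partial y} = -\frac{\partial(f_1, f_2)/\partial(y, v)}{\partial(f_1, f_2)/\partial(u, v)}, \qquad \frac{\partial v}{\partial y} = \frac{\partial \phi_2}{\partial y} = -\frac{\partial(f_1, f_2)/\partial(u, y)}{\partial(f_1, f_2)/\partial(u, v)},
\]
\[
\frac{\partial u}{\partial z} = \frac{\partial \phi_1}{\partial z} = -\frac{\partial(f_1, f_2)/\partial(z, v)}{\partial(f_1, f_2)/\partial(u, v)}, \qquad \frac{\partial v}{\partial z} = \frac{\partial \phi_2}{\partial z} = -\frac{\partial(f_1, f_2)/\partial(u, z)}{\partial(f_1, f_2)/\partial(u, v)}.
\]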
Exercises

4.66. Prove Corollary 4.64. That is, work through the proof of Theorem 4.59 in this special case.

4.67. The equation $y^2 - x^2 = 0$ has two $C^1$ solutions, $y = \pm x$, in a neighborhood of $x = 0$; this shows uniqueness may not hold. What condition of the Implicit Function Theorem does not hold? Check that there are four solutions of class $C$ and infinitely many real-valued solutions.

4.68. Show that the equations
\[ x^2 - yu = 0, \qquad xy + uv = 0 \]
define $u$, $v$ implicitly as functions of $x$ and $y$, with $u = u_0$, $v = v_0$ at $(x_0, y_0)$, in a neighborhood of any point $(x_0, y_0)$ if $y_0 u_0 \neq 0$ and $x_0^2 - y_0 u_0 = 0$, $x_0 y_0 + u_0 v_0 = 0$. Check that the Implicit Function Theorem gives
\[ \frac{\partial u}{\partial x} = \frac{2x}{y}, \qquad \frac{\partial u}{\partial y} = \frac{-u}{y}, \qquad \frac{\partial v}{\partial x} = -\frac{y}{u} - \frac{2vx}{uy}, \qquad \frac{\partial v}{\partial y} = -\frac{x}{u} + \frac{v}{y}, \]
and verify these results by solving the equations directly.

4.69. Consider $f(x, y, u, v) = (x^3 + yu + v,\ xv + y^3 - u)$. At what points $(x, y, u, v)$ can one solve $f(x, y, u, v) = (0, 0)$ in the form $(x, y) = (\varphi_1(u, v), \varphi_2(u, v))$? Find the differential matrix
\[ \begin{bmatrix} \dfrac{\partial \varphi_1}{\partial u} & \dfrac{\partial \varphi_1}{\partial v} \\[2mm] \dfrac{\partial \varphi_2}{\partial u} & \dfrac{\partial \varphi_2}{\partial v} \end{bmatrix}. \]
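4.70. Suppose $\varphi_i(u_1, \dots, u_n, x_1, \dots, x_n) = 0$, $i = 1, \dots, n$, is satisfied by $u_j = u_j(x_1, \dots, x_n)$, the $\varphi$'s and $u$'s being $C^1$ functions. Prove that
\[ \frac{\partial(\varphi_1, \dots, \varphi_n)}{\partial(u_1, \dots, u_n)} \cdot \frac{\partial(u_1, \dots, u_n)}{\partial(x_1, \dots, x_n)} = (-1)^n \frac{\partial(\varphi_1, \dots, \varphi_n)}{\partial(x_1, \dots, x_n)}. \]
[Hint: Use the Chain Rule and think matrices.]

4.71. If $u^2 + v^2 + 2xuv + y = 0$ and $uv + (u + v)y + x^2 = 0$, prove that
\[ \frac{\partial(u, v)}{\partial(x, y)} = \frac{uv(u + v) - x}{(u - v)[(u + v) + y(1 - x)]}. \]

4.72. If $u_1 = x_1 + x_2 + x_3 + x_4$, $u_1 u_2 = x_2 + x_3 + x_4$, $u_1 u_2 u_3 = x_3 + x_4$, and $u_1 u_2 u_3 u_4 = x_4$, show that
\[ \frac{\partial(x_1, x_2, x_3, x_4)}{\partial(u_1, u_2, u_3, u_4)} = u_1^3 u_2^2 u_3. \]

4.73. Do the following:
(a) If $V = \psi(u, v)$, $\varphi(u, v) = E(x, y)$, $\chi(u, v) = F(x, y)$, and $\partial(\varphi, \chi)/\partial(u, v) \neq 0$, prove
\[ \frac{\partial V}{\partial x} \cdot \frac{\partial(\varphi, \chi)}{\partial(u, v)} = \frac{\partial E}{\partial x} \cdot \frac{\partial(\psi, \chi)}{\partial(u, v)} + \frac{\partial F}{\partial x} \cdot \frac{\partial(\varphi, \psi)}{\partial(u, v)}. \]
(b) If $V = u^2 + v^2 + uv$, $u + v = x^2 + y^2$, $u^3 + v^3 = 2xy$, prove that
\[ 3\,\frac{\partial V}{\partial y} = \frac{2x(x^2 - y^2)}{(x^2 + y^2)^2} + 8y(x^2 + y^2). \]

4.74. If $V = \psi(u_1, \dots, u_n)$, $\varphi_k(u_1, \dots, u_n) = f_k(x_1, \dots, x_m)$, $k = 1, \dots, n$, and $\partial(\varphi_1, \dots, \varphi_n)/\partial(u_1, \dots, u_n) \neq 0$, show that
\[ \frac{\partial V}{\partial x_j} \cdot \frac{\partial(\varphi_1, \dots, \varphi_n)}{\partial(u_1, \dots, u_n)} = \sum_{k=1}^{n} \frac{\partial f_k}{\partial x_j} \cdot \frac{\partial(\varphi_1, \dots, \varphi_{k-1}, \psi, \varphi_{k+1}, \dots, \varphi_n)}{\partial(u_1, \dots, u_n)}. \]

4.75. If $F \in C^1$, show that the equation $F(F(x, y), y) = 0$ may be solved for $y$ as a function of $x$ near $(0, 0)$ provided $F(0, 0) = 0$, $F_x(0, 0) \neq -1$ and $F_y(0, 0) \neq 0$.

4.76. Suppose $f(x, y) = 0$ where $f \in C^2$ and $\partial f/\partial y \neq 0$. Prove that
\[ \frac{d^2 y}{dx^2} = \frac{1}{\left(\dfrac{\partial f}{\partial y}\right)^{3}} \begin{vmatrix} 0 & \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \\[2mm] \dfrac{\partial f}{\partial x} & \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2mm] \dfrac{\partial f}{\partial y} & \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{vmatrix}. \]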
4.5.1 Dimension
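We have a notion of dimension for linear objects, specifically for vector spaces, namely, the number of vectors in a basis. If $L : \mathbb{R}^k \to \mathbb{R}^n$ is linear, $L(x) = Ax$, then the dimension of $L(\mathbb{R}^k) \subset \mathbb{R}^n$ is the rank of $A$. In particular, if rank $A = k$, then $L(\mathbb{R}^k) \subset \mathbb{R}^n$ is $k$-dimensional, the same dimension as $\mathbb{R}^k$.

Definition 4.67. A subset $S$ of $\mathbb{R}^n$ is a $k$-dimensional segment, $k > 0$, if there is an open connected set $D \subset \mathbb{R}^k$ and a function $f : \mathbb{R}^k \to \mathbb{R}^n$ such that
(a) $f \in C^1(D)$ and $f : D \to S$ is one-to-one and onto (i.e. $f(D) = S$),
(b) rank $f'(p) = k$ for all $p \in D$.
A 0-dimensional segment in $\mathbb{R}^n$ is a single point in $\mathbb{R}^n$. A subset $S$ of $\mathbb{R}^n$ is a $k$-dimensional manifold if for every $q \in S$ there exists an open neighborhood $V \subset \mathbb{R}^n$ of $q$ such that $V \cap S$ is a $k$-dimensional segment.

Remarks: Note that $k \le n$ always. A $k$-dimensional segment is automatically a $k$-dimensional manifold. If the condition of one-to-one is dropped in (a), then $f$ is still locally one-to-one by Theorem 4.50, and so $f(U)$ is a $k$-dimensional segment if $U$ is a sufficiently small open subset of $D$.

Example 4.68 The set $S := \{(x, y) : y = 3x,\ 0 < x < 1\}$ is a 1-dimensional manifold (segment) in $\mathbb{R}^2$. Indeed, let $D = (0, 1) \subset \mathbb{R}$ and define $f(t) = (t, 3t)$, $0 < t < 1$. Then $f'(t) = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and rank $f' = 1$.

Example 4.69

Example 4.70 The set $S := \{(x, y, z) : x^2 + y^2 + z^2 = 1,\ x > 0,\ y > 0,\ z > 0\}$ is a 2-dimensional segment in $\mathbb{R}^3$. In this case, consider
\[ f(\theta, \phi) := (\cos\theta \sin\phi,\ \sin\theta \sin\phi,\ \cos\phi) = (x, y, z), \qquad 0 < \theta < \tfrac{\pi}{2},\ 0 < \phi < \tfrac{\pi}{2}. \]
Then $f \in C^1\big((0, \pi/2) \times (0, \pi/2)\big)$, $f$ is one-to-one, and
\[ f'(\theta, \phi) = \begin{bmatrix} -\sin\theta \sin\phi & \cos\theta \cos\phi \\ \cos\theta \sin\phi & \sin\theta \cos\phi \\ 0 & -\sin\phi \end{bmatrix}. \]
Now rank $f'(\theta, \phi) = 2$ since, for example, $\dfrac{\partial(y, z)}{\partial(\theta, \phi)} = -\cos\theta \sin^2\phi \neq 0$ for $0 < \theta < \tfrac{\pi}{2}$ and $0 < \phi < \tfrac{\pi}{2}$.

… 0, at which the equations
\[ y\,\frac{\partial f}{\partial x} - x\,\frac{\partial f}{\partial y} = 0, \qquad z\,\frac{\partial f}{\partial y} - y\,\frac{\partial f}{\partial z} = 0, \qquad x\,\frac{\partial f}{\partial z} - z\,\frac{\partial f}{\partial x} = 0 \]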
are satisfied.

4.93. (The Hölder and Minkowski inequalities.) Let $p > 1$, $q > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$.
(a) Show that the minimum of $f(x, y) = \dfrac{x^p}{p} + \dfrac{y^q}{q}$ subject to the constraints $x > 0$, $y > 0$, and $xy = 1$ is 1.
(b) Show that $a \ge 0$, $b \ge 0$ implies $ab \le \dfrac{a^p}{p} + \dfrac{b^q}{q}$.
(c) Show that $a_k \ge 0$, $b_k \ge 0$, $k = 1, \dots, n$, implies
\[ \sum_{k=1}^{n} a_k b_k \le \left( \sum_{k=1}^{n} a_k^p \right)^{1/p} \left( \sum_{k=1}^{n} b_k^q \right)^{1/q} \qquad \text{(Hölder's Inequality)}. \]
[Hint: Let $A = \left( \sum_{k=1}^{n} a_k^p \right)^{1/p}$ and $B = \left( \sum_{k=1}^{n} b_k^q \right)^{1/q}$ and consider $a = a_k/A$, $b = b_k/B$.]
(d) If $a_k$, $b_k$, $k = 1, \dots, n$, are real numbers and $p \ge 1$, then
\[ \left( \sum_{k=1}^{n} |a_k + b_k|^p \right)^{1/p} \le \left( \sum_{k=1}^{n} |a_k|^p \right)^{1/p} + \left( \sum_{k=1}^{n} |b_k|^p \right)^{1/p} \qquad \text{(Minkowski's inequality)}. \]
[Hint: $|a + b|^p = |a + b|\,|a + b|^{p/q} \le |a|\,|a + b|^{p/q} + |b|\,|a + b|^{p/q}$ if $p > 1$. Use Hölder's inequality.]

References for Chapter IV
R. G. Bartle: Chapter V.
R. C. Buck: Chapter V.
Chapter 5
Further Topics in Integration
5.1 Changes of Variables in Integrals
Every $n \times n$ matrix $A$ can be expressed as a product of a finite number of matrices of the (block) form
\[ A_1 = \begin{bmatrix} I & O & O \\ O & a & O \\ O & O & I \end{bmatrix}, \qquad A_2 = \begin{bmatrix} I & O & O & O & O \\ O & 1 & O & 1 & O \\ O & O & I & O & O \\ O & 0 & O & 1 & O \\ O & O & O & O & I \end{bmatrix}, \qquad A_3 = \begin{bmatrix} I & O & O & O & O \\ O & 0 & O & 1 & O \\ O & O & I & O & O \\ O & 1 & O & 0 & O \\ O & O & O & O & I \end{bmatrix}, \]
where the scalar row of $A_1$ is the $k$-row, and in $A_2$ and $A_3$ the second block row is the $k$-row and the fourth block row is the $j$-row.

That is, every linear function from $\mathbb{R}^n$ to $\mathbb{R}^n$ can be expressed as a composition of linear functions of the form
\[ L_1(x_1, \dots, x_{k-1}, x_k, x_{k+1}, \dots, x_n) = (x_1, \dots, x_{k-1}, a x_k, x_{k+1}, \dots, x_n), \]
\[ L_2(x_1, \dots, x_{k-1}, x_k, x_{k+1}, \dots, x_n) = (x_1, \dots, x_{k-1}, x_k + x_j, x_{k+1}, \dots, x_n), \]
\[ L_3(x_1, \dots, x_{k-1}, x_k, \dots, x_{j-1}, x_j, \dots, x_n) = (x_1, \dots, x_{k-1}, x_j, \dots, x_{j-1}, x_k, \dots, x_n). \]
We shall not prove this statement. You should prove it yourself as an exercise; it is obviously true for $n = 1$, so use induction on $n$. But first prove it for $n = 2$ to see how the general proof should go.
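To make the three elementary matrices concrete, here is a small sketch in plain Python. The function names (`scale`, `shear`, `swap`, `matmul`, `det`), the choice $n = 3$, and the particular factors are illustrative assumptions, not notation from the notes. It builds a few matrices of types $A_1$, $A_2$, $A_3$ and checks that the determinant of their product is the product of their determinants, which is the fact used when passing from elementary maps to general linear maps.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def scale(n, k, a):
    """Type A1: multiply coordinate k by a."""
    m = identity(n)
    m[k][k] = a
    return m

def shear(n, k, j):
    """Type A2: replace x_k by x_k + x_j (k != j)."""
    m = identity(n)
    m[k][j] = 1.0
    return m

def swap(n, k, j):
    """Type A3: interchange x_k and x_j."""
    m = identity(n)
    m[k][k] = m[j][j] = 0.0
    m[k][j] = m[j][k] = 1.0
    return m

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total

n = 3
factors = [scale(n, 0, 2.5), shear(n, 1, 2), swap(n, 0, 2), scale(n, 2, -4.0)]
product = identity(n)
for e in factors:
    product = matmul(product, e)

dets = [det(e) for e in factors]   # one determinant per elementary factor
print(dets, det(product))
```

Only the scaling factors change the absolute value of the determinant; the shear has determinant 1 and the swap has determinant $-1$.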
Recall the concepts of an interval in $\mathbb{R}^n$, $I = \prod_{j=1}^{n} [a_j, b_j]$; the Jordan content of an interval in $\mathbb{R}^n$, $\mu(I) = \prod_{j=1}^{n} (b_j - a_j)$; and the diameter of an interval $I$ in $\mathbb{R}^n$, $\lambda(I) = \left( \sum_{j=1}^{n} (b_j - a_j)^2 \right)^{1/2}$. A special case of an interval is an $n$-cube.

Definition 5.1. If for the interval $I$, $b_j - a_j = a$, $j = 1, \dots, n$, then $I$ is an $n$-cube of side $a$ and center $\left( \frac{a_1 + b_1}{2}, \dots, \frac{a_n + b_n}{2} \right)$.

Our immediate goal is to look at sets with content and to investigate what happens to the content after mapping by nice functions. We begin by investigating the content of intervals under linear functions, and get progressively more sophisticated. This is background for change of variables in integration.

Lemma 5.2. If $I$ is an interval in $\mathbb{R}^n$ and $\varphi : \mathbb{R}^n \to \mathbb{R}^n$ is linear, then $\mu(\varphi(I)) = |J_\varphi|\,\mu(I)$.
Proof. The matrices $A_j$ are the Jacobian matrices for the linear functions $L_j$, $j = 1, 2, 3$, respectively. It is easy to determine what these matrices do to intervals $I$. Only the linear function $L_1$ changes the content, by changing the length of one side. Indeed, both $L_1(I)$ and $L_3(I)$ are again intervals. For $L_1(I)$, the $k$-th interval becomes either $[a a_k, a b_k]$ ($a > 0$) or $[a b_k, a a_k]$ ($a < 0$), and the content is $\mu(L_1(I)) = |a|\,\mu(I)$. For $L_3(I)$, the intervals of the $k$-th and $j$-th coordinates are swapped, and thus the overall content, namely the product of the lengths of the intervals, remains unchanged. Since $L_2(I)$ is no longer an interval, we need to use the theory developed in Chapter 3. In passing from $I$ to $L_2(I)$ only the description of the $k$-th coordinate changes:
\[ L_2(I) = \big\{ p = (x_1, \dots, x_n) : a_m \le x_m \le b_m,\ m = 1, \dots, k-1, k+1, \dots, n,\ \ a_k + x_j \le x_k \le b_k + x_j \big\}. \]
Let $\chi_{L_2(I)}$ denote the characteristic function of $L_2(I)$. Choose an interval $K \subset \mathbb{R}^n$ so that $L_2(I) \subset K$. Let $K_2$ be the 2-dimensional subinterval for the variables $x_j$ and $x_k$, and let $K_{n-2}$ be the subinterval for the remaining variables. Then by the definition of content and Theorem 3.29, we have
\[ \mu(L_2(I)) = \int_K \chi_{L_2(I)} = \int_{K_{n-2}} \int_{K_2} \chi_{L_2(I)} = \int_{I_{n-2}} \left( \int_{a_j}^{b_j} \left( \int_{a_k + x_j}^{b_k + x_j} 1 \, dx_k \right) dx_j \right) \qquad \text{(Corollary 3.26)} \]
\[ = (b_j - a_j)(b_k - a_k) \int_{I_{n-2}} 1 = \mu(I). \]
Thus, the Lemma holds for the elementary linear functions. Since any linear function is a composition of finitely many elementary linear functions, the differential is the product of the differentials, and hence the Jacobian is the product of the Jacobians, the general case follows.

The next three lemmas and Theorem 5.7 that follows are technical in nature. In view of the last lemma, you will find it easy to accept Theorem 5.7, so on your first reading you may proceed directly to the important change of variables theorem, Theorem 5.8; however, read the statement of Theorem 5.7 first.

Lemma 5.3. Suppose that $\varphi : G \subset \mathbb{R}^n \to \mathbb{R}^n$ is $C^1(G)$ on the open set $G$. Then
\[ D \subset G,\ D \text{ compact and } \mu(D) = 0 \implies \mu(\varphi(D)) = 0. \]
In particular, this holds if $\varphi$ is a linear map of rank $n$.

Proof. If $\epsilon > 0$, there is a finite collection of $n$-cubes $I_j$, $j = 1, \dots, m$, such that $D \subset \bigcup_{j=1}^{m} I_j \subset G$ and $\sum_{j=1}^{m} \mu(I_j) < \epsilon$ (Why?). Now
\[ \varphi \in C^1(G) \implies \varphi \in C^1\Big( \bigcup_{j=1}^{m} I_j \Big). \]
Thus, the partial derivatives of $\varphi$ are continuous on the compact set $\bigcup_{j=1}^{m} I_j$. Therefore, there exists $M > 0$ such that
\[ |D\varphi(p)(u)| \le M|u|, \qquad \forall\, p \in \bigcup_{j=1}^{m} I_j,\ \ \forall\, u \in \mathbb{R}^n. \]
From the Mean Value Theorem 4.29 applied to $\varphi$, we find
\[ |\varphi(p) - \varphi(q)| \le M|p - q| \le M \lambda(I_j) = \sqrt{n}\, M \mu(I_j)^{1/n}, \qquad \forall\, p, q \in I_j,\ j = 1, \dots, m \quad (I_j \text{ is an } n\text{-cube}). \]
Therefore, the diameter of $\varphi(I_j)$ is at most $\sqrt{n}\, M \mu(I_j)^{1/n}$, and so we can find an $n$-cube $K_j$ with
\[ \varphi(I_j) \subset K_j \quad\text{and}\quad \mu(K_j) \le \left( 2\sqrt{n}\, M \right)^n \mu(I_j), \qquad j = 1, \dots, m. \]
Consequently,
\[ \varphi(D) \subset \bigcup_{j=1}^{m} K_j \quad\text{and}\quad \mu(\varphi(D)) \le \sum_{j=1}^{m} \mu(K_j) \le \left( 2\sqrt{n}\, M \right)^n \epsilon. \]
It follows that $\varphi(D)$ has content zero.

Lemma 5.4. Suppose $G \subset \mathbb{R}^n$ is an open set and $\varphi : G \to \mathbb{R}^n$ satisfies
\[ \varphi \in C^1(G) \quad\text{and}\quad J_\varphi(p) \neq 0, \quad \forall\, p \in G. \]
If $D$ is a compact subset of $G$ and $D$ has content, then $\varphi(D)$ is a compact set with content.

Proof. We know that $\varphi \in C(D)$ and $D$ compact implies $\varphi(D)$ is compact (Theorem 2.36). In particular, $\partial D \subset D$ implies that the boundary of $\varphi(D)$ is contained in $\varphi(D)$. By Theorem 4.50 and Theorem 4.53, $\varphi$ is locally one-to-one on $G$ and maps open sets onto open sets. Therefore, both $\varphi(G)$ and $\varphi(G) \setminus \varphi(D)$ are open sets.

Now, if $\varphi(p) = q$ is on the boundary of $\varphi(D)$, then each neighborhood $V$ of $q$ contains a point $q_1 \in \varphi(G) \setminus \varphi(D)$, which implies that each neighborhood $U$ of $p$ contains a point in $G \setminus D$. Therefore, $p$ is in the boundary of $D$ and we have shown
\[ \partial(\varphi(D)) \subset \varphi(\partial(D)). \tag{5.1} \]
The set $D$ has content if and only if the content of its boundary is zero (Theorem 3.14). Therefore
\[ \mu(\partial D) = 0 \implies \mu(\varphi(\partial(D))) = 0 \quad \text{(Lemma 5.3)} \]
\[ \implies \mu(\partial(\varphi(D))) = 0 \quad \text{(by (5.1))} \]
\[ \implies \varphi(D) \text{ has content} \quad \text{(Theorem 3.14)}. \]

Lemma 5.5. Suppose that $K$ is the $n$-cube of side $2r$ centered at $O$, i.e.
\[ K = \underbrace{[-r, r] \times \dots \times [-r, r]}_{n \text{ times}}. \]
Let $G$ be an open set in $\mathbb{R}^n$ containing $K$ and suppose we have a function $\psi : G \to \mathbb{R}^n$ such that $\psi \in C^1(G)$ and $J_\psi(p) \neq 0$, $\forall\, p \in K$. Then for $\alpha$, $0 < \alpha < 1/\sqrt{n}$,
\[ |\psi(p) - p| < \alpha |p|, \ \ \forall\, p \in K \implies \left( 1 - \alpha\sqrt{n} \right)^n < \frac{\mu(\psi(K))}{\mu(K)} < \left( 1 + \alpha\sqrt{n} \right)^n. \]

Proof. From the proof of Lemma 5.4, (5.1), we know that $\partial(\psi(K)) \subset \psi(\partial(K))$. If $p = (x_1, \dots, x_n) \in \partial(K)$, then $r \le |p| \le \sqrt{n}\, r$, and consequently
\[ |\psi(p) - p| < \alpha |p| \le \alpha \sqrt{n}\, r \]
\[ \implies r(1 - \alpha\sqrt{n}) \le |x_j| - \alpha\sqrt{n}\, r \le |\psi_j(p)| \le |x_j| + \alpha\sqrt{n}\, r \le r(1 + \alpha\sqrt{n}), \]
where the leftmost inequality is true for at least one $j$. Therefore, the boundary of $\psi(K)$ lies outside an $n$-cube of side $2(1 - \alpha\sqrt{n})r$ with center $O$ and inside an $n$-cube of side $2(1 + \alpha\sqrt{n})r$ with center $O$. Since $\mu(K) = (2r)^n$, this implies
\[ \left( 1 - \alpha\sqrt{n} \right)^n < \frac{\mu(\psi(K))}{\mu(K)} < \left( 1 + \alpha\sqrt{n} \right)^n. \]
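Figure 5.1. The cube $K$ is mapped to a region $\varphi(K)$ whose boundary is captured between two cubes of side lengths $1 - \alpha\sqrt{n}$ and $1 + \alpha\sqrt{n}$.

We next need a stronger version of Lemma 5.2.

Lemma 5.6. Suppose that $L : \mathbb{R}^n \to \mathbb{R}^n$ is a linear map of full rank, and that $\psi : G \subset \mathbb{R}^n \to \mathbb{R}^n$, with $G$ open, is in $C^1(G)$ and $J_\psi(p) \neq 0$ for all $p \in G$. Then for any interval $I \subset G$,
\[ \mu(L(\psi(I))) = |J_L|\, \mu(\psi(I)). \]

Proof. By Lemma 5.4, $\psi(I)$ has content. Therefore, for any interval $K$ with $\psi(I) \subset K$,
\[ \mu(\psi(I)) = \int_K \chi_{\psi(I)}. \]
By the Cauchy Criterion for integrals, Corollary 3.7, and by the proper choice of Riemann sums, given any $\epsilon > 0$, there are finitely many intervals with nonoverlapping interiors, $I_1, \dots, I_m, I_{m+1}, \dots, I_N$, such that
\[ \bigcup_{j=1}^{m} I_j \subset \psi(I) \subset \bigcup_{j=1}^{N} I_j \]
and
\[ \mu(\psi(I)) - \epsilon \le \sum_{j=1}^{m} \mu(I_j) \le \mu(\psi(I)) \le \sum_{j=1}^{N} \mu(I_j) \le \mu(\psi(I)) + \epsilon. \]
Then
\[ L\Big( \bigcup_{j=1}^{m} I_j \Big) \subset L(\psi(I)) \subset L\Big( \bigcup_{j=1}^{N} I_j \Big) \]
and we have from Lemma 5.2
\[ \mu\Big( L\Big( \bigcup_{j=1}^{m} I_j \Big) \Big) \le \mu(L(\psi(I))) \le \mu\Big( L\Big( \bigcup_{j=1}^{N} I_j \Big) \Big) \]
\[ \implies \sum_{j=1}^{m} |J_L|\, \mu(I_j) \le \mu(L(\psi(I))) \le \sum_{j=1}^{N} |J_L|\, \mu(I_j) \]
\[ \implies |J_L| \left( \mu(\psi(I)) - \epsilon \right) \le \mu(L(\psi(I))) \le |J_L| \left( \mu(\psi(I)) + \epsilon \right). \]
Since $\epsilon > 0$ is arbitrary, the lemma follows.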
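Theorem 5.7 (The Jacobian Theorem). Suppose $D$ is a compact subset of an open set $G \subset \mathbb{R}^n$ and
\[ \varphi : G \to \mathbb{R}^n, \quad \varphi \in C^1(G) \quad\text{with}\quad J_\varphi(p) \neq 0, \ \ \forall\, p \in G. \]
Then, given $\epsilon > 0$, there is a $\gamma(\epsilon) > 0$ such that for any $n$-cube $K$ with center $p \in D$ and side length less than $2\gamma$,
\[ |J_\varphi(p)|\,(1 - \epsilon)^n \le \frac{\mu(\varphi(K))}{\mu(K)} \le |J_\varphi(p)|\,(1 + \epsilon)^n. \]

Proof. Since $J_\varphi(p) \neq 0$, $(D\varphi(p))^{-1} =: L_p$ exists and is linear. Moreover, $\det[L_p] = 1/J_\varphi(p)$. Since $D$ is compact and $\varphi \in C^1(D)$, the partial derivatives of $\varphi$ are uniformly continuous on $D$ and we can deduce the following:
(a) There is an $M > 0$ such that
\[ |L_p(u)| \le M|u|, \qquad \forall\, p \in D \text{ and } \forall\, u \in \mathbb{R}^n. \quad \text{(Why?)} \]
(b) If $\epsilon > 0$ is given, there is a $\delta = \delta(\epsilon) > 0$ (independent of $p$) such that
\[ (p \in D \text{ and } |u| < \delta) \implies |\varphi(p + u) - \varphi(p) - D\varphi(p)(u)| \le \frac{\epsilon}{M\sqrt{n}}\, |u|. \]
Indeed, for any $p + u \in B(p, \delta) \subset G$ centered on $p$, by the Mean Value Theorem (Theorem 4.29), $\varphi(p + u) - \varphi(p) = D\varphi(c)(u)$ for some $c$ on the line segment joining $p + u$ and $p$. The inequality follows for suitable $\delta$ by the uniform continuity of the partial derivatives when $|p - c| < \delta$.

For fixed $p$, define $\psi(u) = L_p(\varphi(p + u) - \varphi(p))$. Then
\[ |\psi(u) - u| = |\psi(u) - L_p(D\varphi(p))(u)| \le \frac{\epsilon}{\sqrt{n}}\, |u|, \qquad \text{by (a) and (b)}. \]
Applying Lemma 5.5 with $\alpha = \epsilon/\sqrt{n}$ to $\psi$, we find
\[ (1 - \epsilon)^n \le \frac{\mu(\psi(K))}{\mu(K)} \le (1 + \epsilon)^n. \]
But, by Lemma 5.6 applied to $\tilde{\varphi}$ and $K - p$, where $\tilde{\varphi}(u) = \varphi(p + u) - \varphi(p)$, and from the fact that content is invariant under translation, we find
\[ \mu(\psi(K)) = \mu(L_p(\tilde{\varphi}(K - p))) = |\det[L_p]|\, \mu(\tilde{\varphi}(K - p)) = \frac{\mu(\varphi(K) - \varphi(p))}{|J_\varphi(p)|} = \frac{\mu(\varphi(K))}{|J_\varphi(p)|} \]
\[ \implies |J_\varphi(p)|\,(1 - \epsilon)^n \le \frac{\mu(\varphi(K))}{\mu(K)} \le |J_\varphi(p)|\,(1 + \epsilon)^n. \]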
Theorem 5.8 (Change of Variable Theorem). Suppose that
(i) $G$ is an open subset of $\mathbb{R}^n$;
(ii) $\varphi : G \to \mathbb{R}^n$, $\varphi \in C^1(G)$, $J_\varphi(p) \neq 0$ for all $p \in G$;
(iii) $D$ is a compact, connected subset of $G$, $D$ has content, and $\varphi$ is one-to-one on $D$.
Then $\varphi(D)$ has content and for any $f : \varphi(D) \to \mathbb{R}$, $f \in C(\varphi(D))$, we have
\[ \int_{\varphi(D)} f = \int_D (f \circ \varphi)\, |J_\varphi|. \]

Proof. The set $\varphi(D)$ has content by Lemma 5.4. Without loss of generality, we may assume both $f \ge 0$ and $J_\varphi \ge 0$ (WHY?). By Theorem 3.12 both of the integrals in the theorem exist. Therefore, it only remains to show that they are equal. For any $\epsilon$, $1 > \epsilon > 0$, we may choose a partition on $D$ consisting of $n$-cubes $K_j$, $j = 1, \dots, m$, which are sufficiently small so that
(a) $\left| \int_D (f \circ \varphi) J_\varphi - \sum_{j=1}^{m} (f \circ \varphi)(q_j)\, J_\varphi(p_j)\, \mu(K_j) \right| < \epsilon$, for any $q_j \in K_j$ and $p_j$ being the center of $K_j$;
(b) $J_\varphi(p_j)(1 - \epsilon)^n \le \dfrac{\mu(\varphi(K_j))}{\mu(K_j)} \le J_\varphi(p_j)(1 + \epsilon)^n$,
(item (a) by the definition of the integral and the continuity of $J_\varphi(p)$, and item (b) by Theorem 5.7).

Figure 5.2. A partition of $D$ and the distortion under the mapping $\varphi$.

Now on the other hand
\[ \int_{\varphi(D)} f = \sum_{j=1}^{m} \int_{\varphi(K_j)} f \qquad \text{(since } \varphi \text{ is one-to-one)} \]
\[ = \sum_{j=1}^{m} f(p_j^*)\, \mu(\varphi(K_j)), \quad p_j^* \in \varphi(K_j) \qquad \text{(Theorem 3.15(e))} \]
\[ = \sum_{j=1}^{m} f(\varphi(q_j))\, \mu(\varphi(K_j)), \quad \text{where } q_j \in K_j. \]
Therefore from item (b) above, we have
(c) $\displaystyle \sum_{j=1}^{m} f(\varphi(q_j))\, J_\varphi(p_j)\, \mu(K_j)(1 - \epsilon)^n \le \int_{\varphi(D)} f \le \sum_{j=1}^{m} f(\varphi(q_j))\, J_\varphi(p_j)\, \mu(K_j)(1 + \epsilon)^n$.
Since $\epsilon > 0$ is arbitrary, (a) and (c) together imply
\[ \int_D (f \circ \varphi) J_\varphi = \int_{\varphi(D)} f. \]

Remarks: Strictly speaking, the Mean Value Theorem for Integrals, Theorem 3.15(e), is not applicable to those $K_j$ which intersect the boundary of $D$. But this presents no problem (why?). You will notice that Theorem 5.8 is much more restrictive than the corresponding result in $\mathbb{R}^1$, Corollary 3.19, which states that
\[ \int_{\varphi(a)}^{\varphi(b)} f = \int_a^b (f \circ \varphi)\, \varphi' \]
without the restrictions that $\varphi' \neq 0$ and $\varphi$ being one-to-one. These restrictions are necessary in $\mathbb{R}^n$ because of considerations derived from the notion of "orientation", which will be discussed later.

Example 5.9 Compute the area of the region $\{(x, y) : 0 \le y,\ 0 < r^2 \le x^2 + y^2 \le R^2\} \subset \mathbb{R}^2$.
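Let $(x, y) = \varphi(u, v) = (v\cos(u), v\sin(u))$. Then
\[ J_\varphi = \frac{\partial(x, y)}{\partial(u, v)} = \det \begin{bmatrix} -v\sin(u) & \cos(u) \\ v\cos(u) & \sin(u) \end{bmatrix}. \]
Therefore, $|J_\varphi| = v$ and by Theorem 5.8
\[ \mu(\varphi(D)) = \int_{\varphi(D)} \chi_{\varphi(D)} = \int_D (\chi_{\varphi(D)} \circ \varphi)\, |J_\varphi| = \int_D \chi_D\, |J_\varphi| = \int_r^R \int_0^\pi v \, du \, dv = \int_r^R \pi v \, dv = \frac{1}{2}\pi \left( R^2 - r^2 \right). \]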
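As a rough numerical check of Example 5.9, the sketch below approximates $\int_D |J_\varphi|$ over $D = [0, \pi] \times [r, R]$ by a midpoint Riemann sum and compares it with $\frac{1}{2}\pi(R^2 - r^2)$. The helper name `midpoint_double_integral`, the grid size, and the sample radii $r = 1$, $R = 2$ are illustrative assumptions.

```python
import math

def midpoint_double_integral(g, u_range, v_range, n=200):
    """Midpoint Riemann sum of g(u, v) over a rectangle, n cells per axis."""
    (u0, u1), (v0, v1) = u_range, v_range
    du = (u1 - u0) / n
    dv = (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            total += g(u, v)
    return total * du * dv

r, R = 1.0, 2.0                       # sample radii, chosen for illustration

def abs_jacobian(u, v):
    # |J_phi| for phi(u, v) = (v cos u, v sin u); the determinant is -v.
    return abs(-v * math.sin(u) ** 2 - v * math.cos(u) ** 2)

area = midpoint_double_integral(abs_jacobian, (0.0, math.pi), (r, R))
exact = 0.5 * math.pi * (R * R - r * r)
print(area, exact)
```

Because the integrand is linear in $v$ and independent of $u$, the midpoint rule reproduces the exact value up to rounding.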
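Figure 5.3. The regions $D$ (in the $(u, v)$-plane) and $\varphi(D)$ (in the $(x, y)$-plane) for Example 5.9.

Alternatively, you may write this as
\[ \int_{\varphi(D)} 1 \, dx \, dy = \int_D 1 \left| \frac{\partial(x, y)}{\partial(u, v)} \right| du \, dv. \]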
Remark: The conditions "$J_\varphi \neq 0$" and "$\varphi$ is one-to-one" etc. may fail to hold on a set $S \subset D$ without affecting the result of Theorem 5.8 provided $\mu(S) = \mu(\varphi(S)) = 0$. All the integrals involved may be approximated as closely as we please by integrals over regions for which the conditions do hold. For instance, in Example 5.9, we may take $r = 0$ and get the correct formula for the area of the semicircle even though $J_\varphi = 0$ on the $u$-axis and $\varphi$ maps any segment of that axis onto the point $(0, 0)$ in the $(x, y)$-plane. The details are assigned as Exercise 5.4.

Example 5.10 Find the area of the region in the $(x, y)$-plane bounded by the curve $r = a + b\cos(\theta)$, $0 < b < a$, where $x = r\cos(\theta)$, $y = r\sin(\theta)$.

Figure 5.4. The region $D$ in the $(\theta, r)$-plane and its image, the inside of a cardioid in the $(x, y)$-plane, for Example 5.10.

Now
\[ (x, y) = \varphi(r, \theta) = (r\cos(\theta), r\sin(\theta)), \qquad J_\varphi = \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta) \end{vmatrix} = r. \]
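Therefore,
\[ \int_{\varphi(D)} dx \, dy = \int_D \left| \frac{\partial(x, y)}{\partial(r, \theta)} \right| dr \, d\theta = \int_0^{2\pi} \int_0^{a + b\cos(\theta)} r \, dr \, d\theta \]
\[ = \frac{1}{2} \int_0^{2\pi} (a + b\cos(\theta))^2 \, d\theta = \frac{1}{2} \int_0^{2\pi} \left( a^2 + 2ab\cos(\theta) + b^2\cos^2(\theta) \right) d\theta \]
\[ = \frac{1}{2} \int_0^{2\pi} \left( a^2 + \frac{1}{2} b^2 (1 + \cos(2\theta)) \right) d\theta = \pi \left( a^2 + \frac{1}{2} b^2 \right). \]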
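The value $\pi(a^2 + \frac{1}{2}b^2)$ can be cross-checked without using the change of variables at all. The sketch below, an illustration only, samples the boundary curve $r = a + b\cos(\theta)$, treats the samples as a polygon in the $(x, y)$-plane, and computes its area with the shoelace formula. The function name `limacon_area_shoelace`, the number of sample points, and the values $a = 2$, $b = 1$ are assumptions made for the example.

```python
import math

def limacon_area_shoelace(a, b, n=20000):
    """Area of the polygon through n boundary points of r = a + b cos(theta)."""
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        r = a + b * math.cos(t)
        pts.append((r * math.cos(t), r * math.sin(t)))
    s = 0.0
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]     # wrap around to close the polygon
        s += x0 * y1 - x1 * y0
    return 0.5 * abs(s)

a, b = 2.0, 1.0                       # sample values with 0 < b < a
approx = limacon_area_shoelace(a, b)
exact = math.pi * (a * a + 0.5 * b * b)
print(approx, exact)
```

The condition $0 < b < a$ keeps $r > 0$, so the boundary is a simple closed curve and the polygon area converges to the enclosed area as the number of sample points grows.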
Here again $J_\varphi = 0$ on the $\theta$-axis ($r = 0$) and $\varphi$ is not one-to-one on the lines $\theta = 0$, $\theta = 2\pi$, and $r = 0$. However, $\int_{\varphi(D)}$ and $\int_D$ both exist and Theorem 5.8 is applicable to regions which differ from $\varphi(D)$ and $D$ respectively by regions which have arbitrarily small content.

Example 5.11 The function $(x, y) = \varphi(u, v) = (u^{2/3} v^{1/3}, u^{1/3} v^{2/3})$ maps the triangle $D := \{(u, v) : 0 \le u,\ 0 \le v,\ u + v \le 1\}$ onto the area bounded by the loop of the curve $x^3 + y^3 = xy$. Observe that $J_\varphi = \frac{1}{3}$ and that $\varphi$ maps the $u$ and $v$ axes onto $(0, 0)$.

Figure 5.5. The triangle $D$ in the $(u, v)$-plane is transformed to the inside of the loop of the curve $x^3 + y^3 = xy$ in Example 5.11.

If $f$ is continuous on $\varphi(D)$, then
\[ \int_{\varphi(D)} f = \int_D (f \circ \varphi)\, |J_\varphi|. \]
For example, if $f(x, y) = xy$, then $(f \circ \varphi)(u, v) = (u^{2/3} v^{1/3})(u^{1/3} v^{2/3}) = uv$; hence
\[ \int_{\varphi(D)} f = \frac{1}{3} \int_0^1 \int_0^{1 - v} uv \, du \, dv = \frac{1}{6} \int_0^1 v(1 - v)^2 \, dv = \frac{1}{72}. \]
Why is this valid? $\varphi$ is not $C^1$ at $(0, 0)$ and $\varphi$ is not one-to-one on the $u$ and $v$ axes.

The first three examples illustrated how the change of variable formula may be used to simplify the region of integration. It may also be used to simplify the integrand, which was its basic role in one dimension.
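The three computational claims of Example 5.11 (the Jacobian equals $\frac{1}{3}$, the edge $u + v = 1$ lands on the curve $x^3 + y^3 = xy$, and the integral equals $\frac{1}{72}$) can be spot-checked numerically. This is a sketch under stated assumptions: the function names, the sample points $(0.3, 0.5)$ and $(0.25, 0.75)$, and the grid size are illustrative choices, and the Jacobian is only estimated by central differences.

```python
def phi(u, v):
    """The map of Example 5.11: (u, v) -> (u^(2/3) v^(1/3), u^(1/3) v^(2/3))."""
    return (u ** (2.0 / 3.0) * v ** (1.0 / 3.0),
            u ** (1.0 / 3.0) * v ** (2.0 / 3.0))

def jacobian(u, v, h=1e-6):
    """Central-difference estimate of det d(x, y)/d(u, v)."""
    xu = (phi(u + h, v)[0] - phi(u - h, v)[0]) / (2 * h)
    xv = (phi(u, v + h)[0] - phi(u, v - h)[0]) / (2 * h)
    yu = (phi(u + h, v)[1] - phi(u - h, v)[1]) / (2 * h)
    yv = (phi(u, v + h)[1] - phi(u, v - h)[1]) / (2 * h)
    return xu * yv - xv * yu

def integral_over_triangle(n=400):
    """Midpoint sum of (f o phi)|J| = u*v/3 over D = {u, v >= 0, u + v <= 1}."""
    total = 0.0
    hv = 1.0 / n
    for i in range(n):
        v = (i + 0.5) * hv
        hu = (1.0 - v) / n            # the row 0 <= u <= 1 - v, split into n cells
        for j in range(n):
            u = (j + 0.5) * hu
            total += (u * v / 3.0) * hu * hv
    return total

jac = jacobian(0.3, 0.5)
value = integral_over_triangle()

# A point on the edge u + v = 1 should map onto the curve x^3 + y^3 = x*y.
x, y = phi(0.25, 0.75)
loop_residual = x ** 3 + y ** 3 - x * y
print(jac, value, 1.0 / 72.0, loop_residual)
```

The check deliberately avoids the axes $u = 0$ and $v = 0$, where $\varphi$ fails to be $C^1$; the question posed in the example about why the formula is still valid there is not addressed by this computation.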
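Example 5.12 Find
\[ \int_{D^*} \exp\left( \frac{x - y}{x + y} \right) dx \, dy \]
when $D^* := \{(x, y) : 0 \le x,\ 0 \le y,\ x + y \le 1\}$.

Let $(u, v) = (x - y, x + y) = \varphi^{-1}(x, y)$. Then $D = \varphi^{-1}(D^*)$, $D^* = \varphi(D)$,
\[ (x, y) = \left( \frac{u + v}{2}, \frac{v - u}{2} \right) = \varphi(u, v), \qquad J_\varphi = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{vmatrix} = \frac{1}{2}. \]

Figure 5.6. The tip-down triangle $D$ in the $(u, v)$-plane, bounded by $u = v$, $u = -v$ and $v = 1$, is transformed to the right triangle $D^*$, bounded by the axes and $x + y = 1$, in Example 5.12.

Alternatively,
\[ J_{\varphi^{-1}} = \frac{\partial(u, v)}{\partial(x, y)} = \begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix} = 2 \implies J_\varphi = \frac{1}{2}, \]
so in fact it was not necessary to find $\varphi$. Then
\[ \int_{D^*} \exp\left( \frac{x - y}{x + y} \right) dx \, dy = \int_D \exp\left( \frac{u}{v} \right) \frac{1}{2} \, du \, dv = \frac{1}{2} \int_0^1 \int_{-v}^{v} \exp\left( \frac{u}{v} \right) du \, dv \]
\[ = \frac{1}{2} \int_0^1 \Big[ v e^{u/v} \Big]_{u = -v}^{u = v} dv = \frac{1}{2} \int_0^1 v \left( e - \frac{1}{e} \right) dv = \frac{1}{4} \left( e - \frac{1}{e} \right) = \frac{1}{2} \sinh(1). \]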
See also the examples in Buck pp 306-311 and the exercises on p311-313. The particular exercises 10-12 indicate a proof of Theorem 5.8 based on the Implicit Function Theorem when n = 2.
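The result of Example 5.12 can be checked from both sides of the change of variables. The sketch below, with hypothetical helper names and an arbitrarily chosen grid size, approximates the original integral over the triangle $D^*$ in the $(x, y)$-plane and the transformed integral over $D$ in the $(u, v)$-plane by midpoint sums, and compares both with $\frac{1}{2}\sinh(1)$. The integrand in $(x, y)$ is bounded but has no limit at the origin, so the direct sum is expected to converge more slowly than the transformed one.

```python
import math

def integrate_triangle_xy(n=400):
    """Midpoint sum of exp((x-y)/(x+y)) over D* = {x >= 0, y >= 0, x + y <= 1}."""
    total = 0.0
    hx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * hx
        hy = (1.0 - x) / n            # the column 0 <= y <= 1 - x, split into n cells
        for j in range(n):
            y = (j + 0.5) * hy
            total += math.exp((x - y) / (x + y)) * hx * hy
    return total

def integrate_triangle_uv(n=400):
    """Midpoint sum of (1/2) exp(u/v) over D = {0 <= v <= 1, -v <= u <= v}."""
    total = 0.0
    hv = 1.0 / n
    for i in range(n):
        v = (i + 0.5) * hv
        hu = 2.0 * v / n              # the row -v <= u <= v, split into n cells
        for j in range(n):
            u = -v + (j + 0.5) * hu
            total += 0.5 * math.exp(u / v) * hu * hv
    return total

direct = integrate_triangle_xy()
transformed = integrate_triangle_uv()
exact = 0.5 * math.sinh(1.0)
print(direct, transformed, exact)
```

All sample points are cell midpoints, so neither $x + y = 0$ nor $v = 0$ is ever evaluated.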
Exercises

5.1. For $f_1(x, y) = \dfrac{x^2}{a^2} + \dfrac{y^2}{b^2}$ and $f_2(x, y) = x^2 + y^2$, find
\[ \int_D f_1 \quad\text{and}\quad \int_D f_2 \quad\text{where}\quad D = \left\{ (x, y) : \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1 \right\}. \]
[Solution: $\pi ab/2$, $\pi ab(a^2 + b^2)/4$.]

5.2. Let $f$ be a real valued continuous function on $\mathbb{R}^2$. Show that
\[ \int_0^1 \int_0^x f(x, y) \, dy \, dx = 2 \int_0^{1/2} \int_\eta^{1 - \eta} f(\xi + \eta, \xi - \eta) \, d\xi \, d\eta. \]

5.3. Do the following:
(i) Show that the ball $S = \{(x, y, z) : x^2 + y^2 + z^2 \le a^2\}$ has content $M = 4\pi a^3/3$ in $\mathbb{R}^3$. [Hint: Let $x = r\sin(\phi)\cos(\theta)$, $y = r\sin(\phi)\sin(\theta)$, and $z = r\cos(\phi)$, for $0 \le r \le a$, $0 \le \theta \le 2\pi$, $0 \le \phi \le \pi$.]
(ii) Let $S$ be as in part (i) and show
\[ I_z := \int_S (x^2 + y^2) \, dx \, dy \, dz = \frac{2Ma^2}{5}. \]
[$I_z$ is the moment of inertia of a uniform spherical mass distribution (total mass $= M$) about a diameter. The quantity $\sqrt{2a^2/5}$ is called the radius of gyration about a diameter.]

5.4. (The case of the vanishing Jacobian) Suppose $\varphi : \mathbb{R}^n \to \mathbb{R}^n$ is of class $C^1$ on an open set $G \subset \mathbb{R}^n$ and that $J_\varphi \neq 0$ and $\varphi$ is one-to-one except on a set $K$ with content zero. Suppose $D \subset G$ is compact and has content, and $f$ is a real valued continuous function on $\varphi(D)$. Show that $\varphi(D)$ has content and
\[ \int_{\varphi(D)} f = \int_D (f \circ \varphi)\, |J_\varphi|. \]
[Hint: For a given $\epsilon$, enclose $K$ in the union $U$ of a finite number of $n$-cubes such that $\mu(U) < \epsilon$. Apply Theorem 5.8 to $D \setminus U$.]

5.5. Show that
\[ \int_K \left( (x - y)^2 + 2(x + y) + 1 \right)^{-1/2} dx \, dy = 2\log(2) - \frac{1}{2} \]
where $K$ is the triangle bounded by $x = 0$, $y = 2$, and $x = y$. [Hint: $x = u(1 + v)$, $y = v(1 + u)$.]

5.6. Evaluate $\int_{D_1} f$ and $\int_{D_2} f$ where
\[ f(x, y) = \frac{1}{(1 + x^2 + y^2)^2} \]
and (i) $D_1$ is the region bounded by one loop of the lemniscate $(x^2 + y^2)^2 - (x^2 - y^2) = 0$; and (ii) $D_2$ is the triangle with vertices $(0, 0)$, $(2, 0)$, $(1, \sqrt{3})$. [Solution: $\frac{\pi}{4} - \frac{1}{2}$ and $\frac{\sqrt{3}}{2} \arctan(1/2)$.]

5.7. Show that
\[ \int_K |xyz| \, dx \, dy \, dz = \frac{a^2 b^2 c^2}{6} \quad\text{where}\quad K := \left\{ (x, y, z) : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1 \right\}. \]

5.8. Let $\psi(x, y) = Ax^2 + 2Hxy + By^2 + 2Gx + 2Fy + C$.
(i) Show
\[ \int_K \psi = \frac{1}{4} \pi ab (Aa^2 + Bb^2 + 4C) \quad\text{where}\quad K := \left\{ (x, y) : \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1 \right\}. \]
Save yourself some work by using symmetry to show
\[ \int_K x \, dx \, dy = \int_K xy \, dx \, dy = \int_K y \, dx \, dy = 0. \]
(ii) Show
\[ \int_H \psi = \frac{1}{4} \pi ab \left( Aa^2 + Bb^2 + 4\psi(x_0, y_0) \right) \quad\text{where}\quad H := \left\{ (x, y) : \frac{(x - x_0)^2}{a^2} + \frac{(y - y_0)^2}{b^2} \le 1 \right\}. \]
(iii) Show
\[ \int_S (2x^2 + y^2 + 3x - 2y + 4) \, dx \, dy = \frac{57\pi}{2} \]
where $S$ is inside $x^2 + 4y^2 - 2x + 8y + 1 = 0$.
(iv) If the plane $\ell x + my + nz = p$ intersects the elliptic paraboloid
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 2\,\frac{z}{c}, \]
show that the volume of the solid bounded by these two surfaces is
\[ \frac{\pi ab}{4c^3 n^4} \left( a^2 \ell^2 + b^2 m^2 + 2pcn \right)^2. \]

5.9. Find
\[ \int_D \log(x^2 + y^2) \, dx \, dy \quad\text{where}\quad D := \{(x, y) : b^2 \le x^2 + y^2 \le a^2\} \]
and show that the limit as $b \to 0$ of this expression is $2\pi a^2 \left( \log(a) - \frac{1}{2} \right)$.
5.10. Show that $\int_D x^3 y^2 \, dx \, dy = 248\tfrac{2}{5}$, where $D$ is the region bounded by the lines $y = \pm 3(x - 2)$, $y = \pm 3(x - 4)$. [Hint: Move the origin to the center of the region and use symmetry as much as you can.]

5.11. Show that the area of the region in the first quadrant bounded by
\[ y^3 = a_1 x^2, \quad y^3 = a_2 x^2, \quad a_1 > a_2 > 0, \quad\text{and}\quad xy^2 = b_1^3, \quad xy^2 = b_2^3, \quad b_1 > b_2 \ge 0, \]
is
\[ \text{Area} = \frac{7}{5} \left( a_2^{-1/7} - a_1^{-1/7} \right) \left( b_1^{15/7} - b_2^{15/7} \right). \]

5.12. Show that the area of the region bounded by the loop of the curve $x^3 + y^3 = 3axy$ is $3a^2/2$.

5.13. The transformation $x = u^m \varphi(v)$, $y = u^n \psi(v)$ maps the rectangle $[u_1, u_2] \times [v_1, v_2]$ into a region in the $(x, y)$-plane. The area of this region is
\[ \left( u_1^{m+n} - u_2^{m+n} \right) \int_{v_1}^{v_2} \frac{m\varphi\psi' - n\psi\varphi'}{m + n}, \quad \text{if } m \neq -n, \]
and
\[ \left| m \left( \varphi(v_1)\psi(v_1) - \varphi(v_2)\psi(v_2) \right) \log(u_1/u_2) \right|, \quad \text{if } m = -n. \]
What conditions must $\varphi$ and $\psi$ satisfy if this statement is correct? Prove the statement under your conditions.

5.14. Find $\int_D xyz \, dx \, dy \, dz$ where $D$ is the region in the first octant bounded by the cylinder $x^2 + y^2 = 16$ and the plane $z = 3$, that is,
\[ D = \{(x, y, z) : 0 \le x,\ 0 \le y,\ 0 \le z \le 3,\ x^2 + y^2 \le 16\}. \]

5.15. Let $G$ be the region in the first quadrant which is bounded by the curves $xy = 1$, $xy = 3$, $x^2 - y^2 = 1$, $x^2 - y^2 = 4$. Find $\int_G F$ where $F(x, y) = xy$.

5.16. Show
\[ \int_D \varphi = \frac{1}{3} pq (ap^2 + bq^2) + 2pq\,\varphi(x_0, y_0) \]
where $D$ is the region bounded by the four straight lines
\[ \frac{x - x_0}{p} \pm \frac{y - y_0}{q} = \pm 1 \qquad (p, q > 0) \]
and $\varphi(x, y) = ax^2 + 2hxy + by^2 + 2gx + 2fy + c$.

5.17. Show the following:
(i) If $X = x + y + z + u$, $XY = y + z + u$, $XYZ = z + u$, $XYZU = u$, show that
\[ \frac{\partial(x, y, z, u)}{\partial(X, Y, Z, U)} = X^3 Y^2 Z. \]
(ii) Show that
\[ \int_0^1 \int_0^{1 - u} \int_0^{1 - z - u} \int_0^{1 - y - z - u} (x + y + z + u)^n\, xyzu \, dx \, dy \, dz \, du \]
\[ = \int_0^1 X^{n+7} \, dX \int_0^1 Y^5 (1 - Y) \, dY \int_0^1 Z^3 (1 - Z) \, dZ \int_0^1 U (1 - U) \, dU = \frac{1}{(n + 8)\, 7!}. \]

5.18. Which is larger, $\int_0^1 x^x \, dx$ or $\int_0^1 \int_0^1 (xy)^{xy} \, dx \, dy$?

5.2 Integration on Curves and Surfaces
To extend the notion of integration on subsets of $\mathbb{R}^n$ as introduced in Chapter III to subsets of $k$-dimensional objects in $\mathbb{R}^n$, it is necessary to introduce a definition of content for these objects. Rather than deal with integration on general $k$-dimensional objects, we will restrict our attention to 1- and 2-dimensional objects, i.e. curves and surfaces. First, a manifold segment is too general a concept for which to hope to define content; for example, the graph of $f(x) = \sin(1/x)$, $x \in (0, 1)$, or the graph on the right in Figure 5.7, could not be expected to have finite length. So we should probably restrict our attention to the images of fairly well-behaved compact subsets of the parameter space. On the other hand, the smoothness requirement on manifold segments is somewhat too restrictive, in the sense that we should be able to assign a length to objects like those in the picture on the left in Figure 5.7. We shall therefore consider objects which behave like "nice" manifold segments "piecewise".

Figure 5.7. The function on the right has a graph made of triangles with bases on $[1/(k + 1), 1/k]$, $k = 1, 2, \dots$, alternating up and down. Since the height is one, the length of the graph of each triangle is greater than 2. Hence, the total length cannot be finite. The graph on the left is piecewise smooth and intuitively does have length.

5.2.1 Curves (1-surfaces)

Definition 5.13. Let $U$ be an open connected subset of $\mathbb{R}$ and
\[ \gamma : U \to \mathbb{R}^n, \qquad \gamma(t) = (x_1(t), \dots, x_n(t)), \qquad n \ge 1. \]
(i) If $\gamma \in C(U)$, then $\gamma$ is a curve on $U$ in $\mathbb{R}^n$.
(ii) If $\gamma \in C^1(U)$ and rank $\gamma'(t) = 1$ for all $t \in U$, then $\gamma$ is a smooth curve on $U$ in $\mathbb{R}^n$:
\[ \gamma'(t) = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \qquad \gamma \text{ smooth} \iff \sum_{i=1}^{n} x_i'(t)^2 > 0, \ \ \forall\, t \in U. \]
(iii) Two smooth curves $\gamma$ and $\gamma^*$ on $U$ and $U^*$ respectively are called parametrically equivalent if there is a function $f : U^* \to U$, $f \in C^1(U^*)$, such that $f'(t) > 0$ and $\gamma^*(t) = \gamma(f(t))$ for all $t \in U^*$.
(iv) $\gamma(U)$ is the trace of the curve $\gamma$ in $\mathbb{R}^n$.
(v) A curve $\gamma : U \to \mathbb{R}^n$ is piecewise smooth if
(a) $\gamma$ is continuous on $U$;
(b) each compact subinterval of $U$ is the union of a finite number of intervals $I$ such that $\gamma : I^\circ \to \mathbb{R}^n$ is a smooth curve and $|\gamma'|$ is Riemann integrable on $I$.

We shall refer informally to the "curve" and its "trace" as "the curve". We will see that this is unambiguous in the integration theory for parametrically equivalent curves. It is perhaps even more precise to consider curves as equivalence classes of functions $\gamma : \mathbb{R} \to \mathbb{R}^n$, the equivalence relation being parametric equivalence. However, you might consider this idea of a curve as too eccentric on your first encounter.

Example 5.14 A line in $\mathbb{R}^3$ through a point $p_0 = (x_0, y_0, z_0)$ may be parametrized by
\[ \gamma : \begin{cases} x = x_0 + at, \\ y = y_0 + bt, \\ z = z_0 + ct, \end{cases} \quad t \in \mathbb{R}, \qquad \gamma'(t) = \begin{bmatrix} a \\ b \\ c \end{bmatrix}. \]
If $a^2 + b^2 + c^2 > 0$ this is a smooth curve. The numbers $(a, b, c)$ are called the direction numbers of $\gamma$. They are not unique, in the sense that $\lambda(a, b, c) = (\lambda a, \lambda b, \lambda c)$ define a parametrically equivalent line if $\lambda > 0$. The values
\[ (\ell, m, n) = \frac{1}{\sqrt{a^2 + b^2 + c^2}}\, (a, b, c), \quad\text{with}\quad \ell^2 + m^2 + n^2 = 1, \]
are the direction cosines of $\gamma$. The entries of the triple $(\ell, m, n)$ are the cosines of the angles determined by the line and the directions $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$ respectively. A line is determined by (i) a point $(x_0, y_0, z_0)$ and a direction $(a, b, c)$, or (ii) two points $(x_0, y_0, z_0)$ and $(x_1, y_1, z_1)$. In the latter case, a direction is determined as $(a, b, c) = (x_1 - x_0, y_1 - y_0, z_1 - z_0)$.
Figure 5.8. The line through $p_0 = (x_0, y_0, z_0)$ in the direction $(a, b, c)$.

The direction $(a, b, c)$ is orthogonal to $(\alpha, \beta, \xi)$ if $0 = (a, b, c) \bullet (\alpha, \beta, \xi) = a\alpha + b\beta + c\xi$. The set of points $(x, y, z)$ in $\mathbb{R}^3$ such that $(x - x_0, y - y_0, z - z_0)$ is orthogonal to $(a, b, c)$ is a plane through $(x_0, y_0, z_0)$ with normal $(a, b, c)$. The equation of this plane is $0 = (x - x_0, y - y_0, z - z_0) \bullet (a, b, c)$, or
\[ a(x - x_0) + b(y - y_0) + c(z - z_0) = 0. \]

Figure 5.9. The normal to the plane through $p_0 = (x_0, y_0, z_0)$.

Example 5.15 A portion of the parabola $y = x^2$ can be parametrized by
\[ \gamma : \begin{cases} x = t, \\ y = t^2, \end{cases} \quad 0 < t < 1, \qquad \gamma'(t) = \begin{bmatrix} 1 \\ 2t \end{bmatrix}. \]
Then $x'(t)^2 + y'(t)^2 = 1 + 4t^2 > 0$ so $\gamma$ is a smooth curve.
Figure 5.10. A parametrization of the parabola in Example 5.15.

Example 5.16 Another example we've seen before is the helix:
\[ \gamma : \begin{cases} x = \cos(t), \\ y = \sin(t), \\ z = t, \end{cases} \quad t \in \mathbb{R}, \qquad \gamma'(t) = \begin{bmatrix} -\sin(t) \\ \cos(t) \\ 1 \end{bmatrix}. \]
Again $x'(t)^2 + y'(t)^2 + z'(t)^2 = 2 > 0$, so $\gamma$ is a smooth curve.

Figure 5.11. The spiral $(\cos(t), \sin(t), t)$ in Example 5.16.

We know that if $\gamma \in C^1$, then $\gamma(t_0) + D\gamma(t_0)(t - t_0)$ is the best affine approximation to $\gamma(t)$ near $t_0$. This motivates the following definition.

Definition 5.17. Let $\gamma : U \subset \mathbb{R} \to \mathbb{R}^n$ be a smooth curve.
(i) The tangent line to the smooth curve $\gamma$ at a point $t_0$ is $p(t) = \gamma(t_0) + D\gamma(t_0)(t - t_0)$. For example, in $\mathbb{R}^3$, the tangent to the curve
\[ \gamma : \begin{cases} x = x(t), \\ y = y(t), \\ z = z(t), \end{cases} \qquad\text{is}\qquad \begin{cases} x = x_0 + x'(t_0)(t - t_0), \\ y = y_0 + y'(t_0)(t - t_0), \\ z = z_0 + z'(t_0)(t - t_0), \end{cases} \]
where $x_0 = x(t_0)$, $y_0 = y(t_0)$, $z_0 = z(t_0)$.

(ii) The vector $v = \gamma'(t_0)$ is the tangent vector of a smooth curve $\gamma$ at a point $t_0$. The direction of a smooth curve $\gamma$ at the point $t_0$ is
\[ \frac{v}{|v|} = \frac{\gamma'(t_0)}{|\gamma'(t_0)|}. \]
For example, in $\mathbb{R}^3$, the direction is
\[ \frac{1}{\sqrt{x'(t_0)^2 + y'(t_0)^2 + z'(t_0)^2}} \, \bigl(x'(t_0), y'(t_0), z'(t_0)\bigr). \]

(iii) The normal plane at the point $t_0$ for a smooth curve in $\mathbb{R}^3$ is
\[ x'(t_0)(x - x_0) + y'(t_0)(y - y_0) + z'(t_0)(z - z_0) = 0 \quad\text{or}\quad \begin{pmatrix} x'(t_0) & y'(t_0) & z'(t_0) \end{pmatrix} \begin{pmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{pmatrix} = 0. \]
Question: What corresponds to this normal plane for a smooth curve in R2 ?
Figure 5.12. The plane normal to a curve $\gamma$ as determined by the tangent vector $\gamma'$.
Remark: Two parametrically equivalent curves have the same direction. That is, if $\gamma^*(t) = \gamma(f(t))$ with $f' > 0$, then
\[ \frac{\gamma^{*\prime}(t)}{|\gamma^{*\prime}(t)|} = \frac{\gamma'(f(t)) f'(t)}{|\gamma'(f(t)) f'(t)|} = \frac{\gamma'(f(t))}{|\gamma'(f(t))|}. \]

Definition 5.18. Let $[a, b] \subset U \subset \mathbb{R}$ and let $\gamma : U \mapsto \mathbb{R}^n$ be a curve.

(i) If $\gamma$ is smooth, the length of $\gamma([a, b])$ is defined as the quantity
\[ \ell(\gamma([a, b])) := \int_a^b |\gamma'(t)| \, dt = \int_a^b \sqrt{x_1'(t)^2 + \dots + x_n'(t)^2} \, dt. \]
Symbolically, we shall write $d\ell = |\gamma'(t)| \, dt$.
(ii) The curve $\gamma$ is piecewise smooth on $U$ if for any $[a, b] \subset U$ there are finitely many nonoverlapping subintervals $[a_i, b_i]$ of $[a, b]$, with union $[a, b]$, such that $\gamma$ is smooth on $(a_i, b_i)$. In this case,
\[ \ell(\gamma([a, b])) = \sum_i \ell(\gamma([a_i, b_i])). \]
Remark: If $\gamma^*$ and $\gamma$ are parametrically equivalent curves with $f(a^*) = a$ and $f(b^*) = b$, then $\ell(\gamma^*([a^*, b^*])) = \ell(\gamma([a, b]))$ since
\[ \ell(\gamma^*) = \int_{a^*}^{b^*} |\gamma^{*\prime}(t)| \, dt = \int_{a^*}^{b^*} |\gamma'(f(t))| \, f'(t) \, dt = \int_a^b |\gamma'(t)| \, dt. \]
Example 5.19 The curve $y = f(x)$ can be parametrized simply by
\[ \gamma : \begin{cases} x = t \\ y = f(t) \end{cases} \quad\text{with}\quad \gamma'(t) = (1, f'(t)). \]
Then
\[ \ell(\gamma([a, b])) = \int_a^b \sqrt{1 + f'(t)^2} \, dt. \]
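As an illustrative check of the formula in Example 5.19, the following Python sketch approximates the length of the graph of $f(x) = x^2$ over $[0, 1]$. The helper name `arc_length`, the midpoint rule, and the number of subintervals are choices made here for illustration and are not part of the notes; the closed form used for comparison comes from the standard antiderivative of $\sqrt{1 + 4t^2}$.

```python
import math

def arc_length(f_prime, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of sqrt(1 + f'(t)^2) over [a, b].

    f_prime is the derivative of f; n is an assumed number of subintervals.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h          # midpoint of the k-th subinterval
        total += math.sqrt(1.0 + f_prime(t) ** 2)
    return total * h

# Graph of f(x) = x^2 on [0, 1], so f'(t) = 2t.
approx = arc_length(lambda t: 2.0 * t, 0.0, 1.0)

# Closed form of the same integral: sqrt(5)/2 + asinh(2)/4.
exact = math.sqrt(5.0) / 2.0 + math.asinh(2.0) / 4.0

print(approx, exact)
```

The two printed values should agree to several decimal places; refining the partition (larger `n`) tightens the agreement.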
Example 5.20 The circumference of a circle of radius $a$ is the length of a smooth curve:
\[ \gamma : \begin{cases} x = a\cos(t) \\ y = a\sin(t) \end{cases}, \qquad \ell(\gamma([0, 2\pi])) = \int_0^{2\pi} \sqrt{a^2 \sin^2(t) + a^2 \cos^2(t)} \, dt = 2\pi a. \]
NOTE: $\ell(\gamma([0, 4\pi])) = 2\ell(\gamma([0, 2\pi]))$ even though these two curves have the same trace.
Motivation and Remarks: By way of motivation for the definition of the length of a curve, we offer the following:

(a) The length of the line segment $p(t) = p_0 + q_0 t$ for $t \in [t_1, t_2]$ is
\[ |p(t_2) - p(t_1)| = |q_0 (t_2 - t_1)| = \int_{t_1}^{t_2} |p'(t)| \, dt. \]
(b) Now consider the segment $\gamma[a, b]$ and a partition $(t_k)_{k=0}^m$ of $[a, b]$. Suppose $\tau_k \in [t_{k-1}, t_k]$, $k = 1, \dots, m$. One expects the sum of the lengths of the line segments
\[ p(t) = \gamma(\tau_k) + \gamma'(\tau_k)(t - \tau_k), \quad t \in [t_{k-1}, t_k], \quad k = 1, \dots, m, \]
to approximate the 'length' of $\gamma[a, b]$ as closely as one pleases provided only that the partition is fine enough. This sum is
\[ \sum_{k=1}^m |\gamma'(\tau_k)| (t_k - t_{k-1}), \]
which is a Riemann sum for $\int_a^b |\gamma'(t)| \, dt$.
Alternatively, the length of $\gamma[a, b]$ is often defined to be
\[ \sup \Bigl\{ \sum_{k=1}^m |\gamma(t_k) - \gamma(t_{k-1})| : (t_k) \text{ partitions of } [a, b] \Bigr\}. \]
It is not difficult to see by the Mean Value Theorem that this is equivalent to the definition $\ell(\gamma[a, b]) = \int_a^b |\gamma'(t)| \, dt$ if $\gamma$ is $C^1$. However, this alternative definition is more general in that it pertains to any curve for which the supremum exists as a real number; in particular, $\gamma$ need not be $C^1$.

Figure 5.13. The two means to approximate length: lengths of tangent lines (top) and lengths of secant lines (bottom).
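The secant-line definition can be explored numerically. The sketch below sums chord lengths over uniform partitions for the circle of Example 5.20 with $a = 1$. The function names `gamma` and `secant_length` and the partition sizes are assumptions made for this illustration; uniform partitions are only one family among those over which the supremum is taken.

```python
import math

def gamma(t, a=1.0):
    """Circle of radius a, parametrized as in Example 5.20."""
    return (a * math.cos(t), a * math.sin(t))

def secant_length(curve, t0, t1, m):
    """Sum of |curve(t_k) - curve(t_{k-1})| over a uniform partition of [t0, t1] into m pieces."""
    total = 0.0
    prev = curve(t0)
    for k in range(1, m + 1):
        cur = curve(t0 + (t1 - t0) * k / m)
        total += math.dist(prev, cur)   # Euclidean length of one secant segment
        prev = cur
    return total

coarse = secant_length(gamma, 0.0, 2.0 * math.pi, 8)      # inscribed octagon
fine = secant_length(gamma, 0.0, 2.0 * math.pi, 1024)     # much finer partition
twice = secant_length(gamma, 0.0, 4.0 * math.pi, 2048)    # the circle traversed twice

print(coarse, fine, 2.0 * math.pi)
print(twice, 4.0 * math.pi)
```

The secant sums increase toward $2\pi$ from below as the partition is refined, and the doubly traversed circle has twice the length although its trace is unchanged.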
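5.2.2 Surfaces (2-surfaces)

Definition 5.21. Let $U$ be an open connected subset of $\mathbb{R}^2$ and $\sigma : U \mapsto \mathbb{R}^n$,
\[ \sigma(u, v) = (x_1(u, v), \dots, x_n(u, v)), \quad n \ge 2. \]

(i) If $\sigma \in C(U)$, then $\sigma$ is a surface (2-surface) on $U$ in $\mathbb{R}^n$.

(ii) If $\sigma \in C^1(U)$ and $\operatorname{rank} \sigma'(u, v) = 2$ for all $(u, v) \in U$, then $\sigma$ is a smooth surface on $U$ in $\mathbb{R}^n$:
\[ \sigma' = \begin{pmatrix} \dfrac{\partial x_1}{\partial u} & \dfrac{\partial x_1}{\partial v} \\ \vdots & \vdots \\ \dfrac{\partial x_n}{\partial u} & \dfrac{\partial x_n}{\partial v} \end{pmatrix} \quad\text{and}\quad \sigma \text{ is smooth} \iff \sum_{i,j=1}^n \left( \frac{\partial(x_i, x_j)}{\partial(u, v)} \right)^2 > 0. \]

(iii) Two smooth surfaces $\sigma$ and $\sigma^*$ on $U$ and $U^*$ respectively are called parametrically equivalent if there is a function $f : U^* \mapsto U$ with $f \in C^1(U^*)$, $J_f(p) > 0$ for all $p \in U^*$, $f$ is one-to-one from $U^*$ onto $U$, and $\sigma^*(p) = \sigma(f(p))$ for all $p \in U^*$.

(iv) $\sigma(U)$ is called the trace of the surface $\sigma$ in $\mathbb{R}^n$. Again, we will not worry excessively about distinguishing between a surface and its trace.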
(v) A surface $\sigma : U \mapsto \mathbb{R}^n$ is piecewise smooth if

(a) $\sigma$ is continuous on $U$;

(b) each compact Jordan measurable (has content) subset of $U$ is the union of finitely many Jordan measurable subsets $D$ such that $D = \overline{D^\circ}$, $\sigma : D^\circ \mapsto \mathbb{R}^n$ is a smooth surface, and the quantity
\[ \left( \sum_{i,j=1}^n \left( \frac{\partial(x_i, x_j)}{\partial(u, v)} \right)^2 \right)^{1/2} \]
is Riemann integrable on $D$.
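Example 5.22 The following surface is a plane in $\mathbb{R}^3$. The parametrization
\[ \sigma : \begin{cases} x = u + v \\ y = u - v \\ z = u \end{cases}, \quad (u, v) \in \mathbb{R}^2, \qquad \sigma' = \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 0 \end{pmatrix}, \]
describes a smooth surface since $\operatorname{rank} \sigma' = 2$. Its trace may also be described as the solution set of
\[ x + y - 2z = 0, \quad\text{or}\quad \begin{pmatrix} 1 & 1 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0. \]
From the latter equation, we see that the direction numbers of the normal vector are $(1, 1, -2)$. Notice also that these direction numbers are in fact given by
\[ \left( \frac{\partial(y, z)}{\partial(u, v)}, \frac{\partial(z, x)}{\partial(u, v)}, \frac{\partial(x, y)}{\partial(u, v)} \right). \]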
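The direction numbers in Example 5.22 can be checked by evaluating the three $2 \times 2$ determinants directly. In the sketch below, the helper names `minor`, `dot`, and `sigma` and the sample points are hypothetical choices for illustration; only the rows of $\sigma'$ come from the example.

```python
def minor(r1, r2):
    """2x2 determinant built from two rows of sigma'; each row is (d/du, d/dv)."""
    return r1[0] * r2[1] - r1[1] * r2[0]

def dot(p, q):
    """Euclidean inner product of two triples."""
    return sum(a * b for a, b in zip(p, q))

def sigma(u, v):
    """The parametrization x = u + v, y = u - v, z = u of Example 5.22."""
    return (u + v, u - v, u)

# Rows of sigma' for x, y, z respectively.
dx, dy, dz = (1, 1), (1, -1), (1, 0)

# (d(y,z)/d(u,v), d(z,x)/d(u,v), d(x,y)/d(u,v))
normal = (minor(dy, dz), minor(dz, dx), minor(dx, dy))

# Integer sample parameters, so the orthogonality check is exact.
samples = [(0, 0), (1, 2), (-3, 5), (7, -4)]
residuals = [dot(normal, sigma(u, v)) for u, v in samples]

print(normal, residuals)
```

Every point of the trace satisfies `dot(normal, sigma(u, v)) == 0`, which is the equation $x + y - 2z = 0$ read off from the computed normal.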
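Example 5.23 The unit sphere in $\mathbb{R}^3$, $x^2 + y^2 + z^2 = 1$, may be parametrized by
\[ \sigma : \begin{cases} x = \cos(\theta)\sin(\varphi) \\ y = \sin(\theta)\sin(\varphi) \\ z = \cos(\varphi) \end{cases}, \quad (\theta, \varphi) \in \mathbb{R}^2, \qquad \sigma' = \begin{pmatrix} -\sin(\theta)\sin(\varphi) & \cos(\theta)\cos(\varphi) \\ \cos(\theta)\sin(\varphi) & \sin(\theta)\cos(\varphi) \\ 0 & -\sin(\varphi) \end{pmatrix}, \]
for which $\operatorname{rank} \sigma' = 2$ unless $\varphi$ is an integer multiple of $\pi$. Hence, $\sigma$ is smooth on regions that contain no points with $\varphi = m\pi$, $m \in \mathbb{Z}$.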
Example 5.24 In Example 5.22 we had a particular plane through the origin in R3 . In this example, we take a longer look at the general form for a plane. Let
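$(x_0, y_0, z_0)$, $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ be given triples of real numbers, and consider the parametrization
\[ \sigma : \begin{cases} x = x_0 + a_1 u + b_1 v \\ y = y_0 + a_2 u + b_2 v \\ z = z_0 + a_3 u + b_3 v \end{cases}, \quad (u, v) \in \mathbb{R}^2, \]
or
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} + \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \\ a_3 & b_3 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}. \]
In vector form, $(x, y, z) = (x_0, y_0, z_0) + u(a_1, a_2, a_3) + v(b_1, b_2, b_3)$, this is a plane through the point $(x_0, y_0, z_0)$ provided the vectors $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ are linearly independent. Looking at the determinants of certain $2 \times 2$ submatrices of $\sigma'$, we define
\[ \frac{\partial(y, z)}{\partial(u, v)} = \begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix} = A, \qquad \frac{\partial(z, x)}{\partial(u, v)} = \begin{vmatrix} a_3 & b_3 \\ a_1 & b_1 \end{vmatrix} = B, \qquad \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix} = C. \]
Therefore, $\sigma$ is a smooth surface if and only if
\[ A^2 + B^2 + C^2 > 0 \iff a = (a_1, a_2, a_3) \text{ and } b = (b_1, b_2, b_3) \text{ are linearly independent,} \]
that is, neither vector is a constant multiple of the other. To identify the plane in the form given in Example 5.14, consider
\[ A(x - x_0) + B(y - y_0) + C(z - z_0) = A(a_1 u + b_1 v) + B(a_2 u + b_2 v) + C(a_3 u + b_3 v) \]
\[ = \begin{vmatrix} a_1 u + b_1 v & a_1 & b_1 \\ a_2 u + b_2 v & a_2 & b_2 \\ a_3 u + b_3 v & a_3 & b_3 \end{vmatrix} = u \begin{vmatrix} a_1 & a_1 & b_1 \\ a_2 & a_2 & b_2 \\ a_3 & a_3 & b_3 \end{vmatrix} + v \begin{vmatrix} b_1 & a_1 & b_1 \\ b_2 & a_2 & b_2 \\ b_3 & a_3 & b_3 \end{vmatrix} = 0. \]
This is the plane through $(x_0, y_0, z_0)$ with normal
\[ (A, B, C) = \left( \frac{\partial(y, z)}{\partial(u, v)}, \frac{\partial(z, x)}{\partial(u, v)}, \frac{\partial(x, y)}{\partial(u, v)} \right). \]
Notice that the normal vector is given by
\[ a \times b := \begin{vmatrix} e_1 & e_2 & e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = e_1 \begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix} + e_2 \begin{vmatrix} a_3 & b_3 \\ a_1 & b_1 \end{vmatrix} + e_3 \begin{vmatrix} a_1 & b_1 \\ a_2 & b_2 \end{vmatrix}. \]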
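The best affine approximation to any surface $\sigma(u, v)$ for $(u, v)$ near $(u_0, v_0)$ is $\sigma(u_0, v_0) + D\sigma(u_0, v_0)(u - u_0, v - v_0)$; hence the following definition.

Definition 5.25. For 2-surfaces

(i) the tangent plane for a smooth surface $\sigma$ at the point $(u_0, v_0)$ is $p(u, v) = \sigma(u_0, v_0) + D\sigma(u_0, v_0)(u - u_0, v - v_0)$. For example, in $\mathbb{R}^3$, the tangent to the surface
\[ \sigma : \begin{cases} x = x(u, v) \\ y = y(u, v) \\ z = z(u, v) \end{cases} \quad\text{is}\quad \begin{cases} x = x_0 + \dfrac{\partial x}{\partial u}\Big|_{p_0} (u - u_0) + \dfrac{\partial x}{\partial v}\Big|_{p_0} (v - v_0) \\[1ex] y = y_0 + \dfrac{\partial y}{\partial u}\Big|_{p_0} (u - u_0) + \dfrac{\partial y}{\partial v}\Big|_{p_0} (v - v_0) \\[1ex] z = z_0 + \dfrac{\partial z}{\partial u}\Big|_{p_0} (u - u_0) + \dfrac{\partial z}{\partial v}\Big|_{p_0} (v - v_0), \end{cases} \]
where $p_0 = (u_0, v_0)$, $x_0 = x(u_0, v_0)$, $y_0 = y(u_0, v_0)$, $z_0 = z(u_0, v_0)$. In the notation of Example 5.24, this is the plane
\[ \frac{\partial(y, z)}{\partial(u, v)}\Big|_{p_0} (x - x_0) + \frac{\partial(z, x)}{\partial(u, v)}\Big|_{p_0} (y - y_0) + \frac{\partial(x, y)}{\partial(u, v)}\Big|_{p_0} (z - z_0) = 0. \]

(ii) the normal line to the surface $\sigma$ at $p_0$ in $\mathbb{R}^3$ has the direction numbers
\[ n = \left( \frac{\partial(y, z)}{\partial(u, v)}\Big|_{p_0}, \frac{\partial(z, x)}{\partial(u, v)}\Big|_{p_0}, \frac{\partial(x, y)}{\partial(u, v)}\Big|_{p_0} \right). \]
The unit normal direction is $n_1 = n/|n|$.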
Question: In R4 , what corresponds to the normal line discussed above for the surface in R3 ? If you cannot figure it out, ask.
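Exercises

5.19. Show that two parametrically equivalent surfaces in $\mathbb{R}^3$ have the same unit normal $n_1$.

5.20. Let $\sigma : \mathbb{R}^2 \mapsto \mathbb{R}^n$, $(x_i = x_i(u, v),\ i = 1, \dots, n)$, be a smooth surface. Show that the vectors
\[ u = \left( \frac{\partial x_1}{\partial u}, \dots, \frac{\partial x_n}{\partial u} \right)\Big|_{p_0}, \qquad v = \left( \frac{\partial x_1}{\partial v}, \dots, \frac{\partial x_n}{\partial v} \right)\Big|_{p_0} \]
are tangent vectors to certain smooth curves in $\sigma$. Check that the condition $\operatorname{rank} \sigma'(p_0) = 2$ simply requires that $u$ and $v$ be linearly independent. Check that in the case $n = 3$, the normal is $n = u \times v$.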
We discuss the concept of surface area only for 2-surfaces in $\mathbb{R}^3$. It is hoped that the treatment of the problem of surface area in $\mathbb{R}^n$ should be clear from this. It should further indicate the procedure to be adopted in dealing with content of $k$-surfaces in $\mathbb{R}^n$.

Definition 5.26. (i) If $D$ is a compact Jordan measurable (has content) subset of $U \subset \mathbb{R}^2$ and $\sigma$ is a smooth surface on $U$ in $\mathbb{R}^3$,
\[ \sigma : \begin{cases} x = x(u, v) \\ y = y(u, v) \\ z = z(u, v) \end{cases}, \quad (u, v) \in U, \]
then
\[ A(\sigma(D)) := \int_D |n| \, du \, dv = \int_D \sqrt{ \left( \frac{\partial(y, z)}{\partial(u, v)} \right)^2 + \left( \frac{\partial(z, x)}{\partial(u, v)} \right)^2 + \left( \frac{\partial(x, y)}{\partial(u, v)} \right)^2 } \, du \, dv. \]
$A(\sigma(D))$ is called the area of $\sigma(D)$. Symbolically, $dA = |n| \, du \, dv$.

(ii) If $\sigma$ is a piecewise smooth surface on $U$, then
\[ A(\sigma(D)) := \sum_i A(\sigma(D_i)), \]
where the $D_i$ are nonoverlapping Jordan measurable subsets of $D$, with union $D$, such that $\sigma$ is smooth on $D_i^\circ$.

Remark: Two parametrically equivalent surfaces have the same area (more on this later).

Example 5.27 Here we compute the surface area for a surface given as the graph of a function of two variables, namely $z = f(x, y)$, $(x, y) \in D$, with $f \in C^1(D)$:
\[ \sigma : \begin{cases} x = u \\ y = v \\ z = f(u, v) \end{cases}, \quad\text{that is,}\quad z = f(x, y). \]
Then
\[ \sigma' = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ f_u & f_v \end{pmatrix} \quad\text{and}\quad \operatorname{rank} \sigma' = 2, \]
so $\sigma$ is smooth. The area formula reads
\[ A(\sigma(D)) = \int_D \sqrt{f_u^2 + f_v^2 + 1} \, du \, dv = \int_D \sqrt{f_x^2 + f_y^2 + 1} \, dx \, dy. \]
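To see the graph formula of Example 5.27 in action, the following sketch approximates $\int_D \sqrt{f_x^2 + f_y^2 + 1}\, dx\, dy$ over a rectangle. The function name `graph_area`, the midpoint rule, the central-difference step `h`, and the test surface $z = x + y$ over the unit square are all assumptions chosen for illustration; for that plane the integrand is the constant $\sqrt{3}$, so the exact area is $\sqrt{3}$.

```python
import math

def graph_area(f, x0, x1, y0, y1, n=200, h=1e-5):
    """Midpoint-rule approximation of the area of the graph z = f(x, y)
    over the rectangle [x0, x1] x [y0, y1].

    Partial derivatives are estimated by central differences with step h.
    """
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            fx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
            fy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
            total += math.sqrt(fx * fx + fy * fy + 1.0)
    return total * dx * dy

# Plane z = x + y over the unit square: f_x = f_y = 1, so the area is sqrt(3).
plane_area = graph_area(lambda x, y: x + y, 0.0, 1.0, 0.0, 1.0)

print(plane_area, math.sqrt(3.0))
```

Any other $C^1$ function can be passed as `f`; the plane is used here only because its area is known exactly.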
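Example 5.28 The surface area of a sphere of radius $a$ is easily computed when the sphere is described as the surface
\[ \sigma : \begin{cases} x = a\sin(\varphi)\cos(\theta) \\ y = a\sin(\varphi)\sin(\theta) \\ z = a\cos(\varphi) \end{cases} \quad\text{on}\quad D : \{ (\theta, \varphi) : 0 \le \theta \le 2\pi,\ 0 \le \varphi \le \pi \}. \]
Then $x^2 + y^2 + z^2 = a^2$ can easily be checked, and
\[ \sigma' = \begin{pmatrix} -a\sin(\varphi)\sin(\theta) & a\cos(\varphi)\cos(\theta) \\ a\sin(\varphi)\cos(\theta) & a\cos(\varphi)\sin(\theta) \\ 0 & -a\sin(\varphi) \end{pmatrix}. \]
It follows that
\[ |n|^2 = a^4 \sin^4(\varphi)\cos^2(\theta) + a^4 \sin^4(\varphi)\sin^2(\theta) + a^4 \bigl( \sin(\varphi)\cos(\varphi)\sin^2(\theta) + \cos(\varphi)\sin(\varphi)\cos^2(\theta) \bigr)^2, \]
and
\[ A(\sigma(D)) = \int_D |n| = \int_0^\pi \int_0^{2\pi} \sqrt{a^4 \sin^2(\varphi)\cos^2(\varphi) + a^4 \sin^4(\varphi)} \, d\theta \, d\varphi = \int_0^\pi \int_0^{2\pi} a^2 \sin(\varphi) \, d\theta \, d\varphi = -2\pi a^2 \cos(\varphi) \Big|_0^\pi = 4\pi a^2. \]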
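The simplification $|n| = a^2 \sin(\varphi)$ and the value $4\pi a^2$ in Example 5.28 can be checked numerically from the entries of $\sigma'$. In this sketch the names `n_norm` and `sphere_area`, the radius `a = 2.0`, and the grid sizes are illustrative assumptions; the partial derivatives are copied from the matrix in the example.

```python
import math

def n_norm(theta, phi, a):
    """|n| computed from the entries of sigma' in Example 5.28."""
    s, c = math.sin, math.cos
    # Partial derivatives with respect to theta (first column) and phi (second column).
    xt, xp = -a * s(phi) * s(theta), a * c(phi) * c(theta)
    yt, yp = a * s(phi) * c(theta), a * c(phi) * s(theta)
    zt, zp = 0.0, -a * s(phi)
    # (d(y,z)/d(theta,phi), d(z,x)/d(theta,phi), d(x,y)/d(theta,phi))
    n = (yt * zp - yp * zt, zt * xp - zp * xt, xt * yp - xp * yt)
    return math.sqrt(sum(comp * comp for comp in n))

def sphere_area(a, n_theta=64, n_phi=400):
    """Midpoint Riemann sum of |n| over 0 <= theta <= 2*pi, 0 <= phi <= pi."""
    d_theta = 2.0 * math.pi / n_theta
    d_phi = math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += n_norm(theta, phi, a)
    return total * d_theta * d_phi

a = 2.0
pointwise_gap = abs(n_norm(0.7, 1.1, a) - a * a * math.sin(1.1))
area = sphere_area(a)

print(pointwise_gap, area, 4.0 * math.pi * a * a)
```

The pointwise gap should be at the level of rounding error, and the Riemann sum should be close to $4\pi a^2$, with the remaining difference due to the finite grid.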
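Figure 5.14. The sphere in Example 5.28.

Example 5.29 The surface area of a torus. A torus can be parametrized by
\[ \sigma : \begin{cases} x = (R - a\cos(\varphi))\cos(\theta) \\ y = (R - a\cos(\varphi))\sin(\theta) \\ z = a\sin(\varphi) \end{cases} \quad\text{on}\quad D : \begin{cases} 0 \le \theta \le 2\pi \\ 0 \le \varphi \le 2\pi \end{cases}, \quad 0 < a < R, \]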
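and
\[ \sigma' = \begin{pmatrix} -(R - a\cos(\varphi))\sin(\theta) & a\sin(\varphi)\cos(\theta) \\ (R - a\cos(\varphi))\cos(\theta) & a\sin(\varphi)\sin(\theta) \\ 0 & a\cos(\varphi) \end{pmatrix}. \]

Figure 5.15. The torus in Example 5.29.

Then
\[ |n|^2 = a^2 \bigl( (R - a\cos(\varphi))\cos(\theta)\cos(\varphi) \bigr)^2 + a^2 \bigl( (R - a\cos(\varphi))\sin(\theta)\cos(\varphi) \bigr)^2 + a^2 \bigl( (R - a\cos(\varphi)) (\sin^2(\theta)\sin(\varphi) + \cos^2(\theta)\sin(\varphi)) \bigr)^2 = a^2 (R - a\cos(\varphi))^2. \]
Therefore,
\[ A(\sigma(D)) = \int_D |n| = \int_0^{2\pi} \int_0^{2\pi} a(R - a\cos(\varphi)) \, d\theta \, d\varphi = 2\pi a \int_0^{2\pi} (R - a\cos(\varphi)) \, d\varphi = (2\pi)^2 a R. \]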
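The torus area can also be estimated without differentiating by hand. The sketch below approximates $n = \sigma_\theta \times \sigma_\varphi$ by central differences and sums $|n|$ over a midpoint grid. The names `surface_area` and `torus`, the values `R_MAJOR = 3.0` and `A_MINOR = 1.0`, the step `h`, and the grid size are assumptions made for this illustration; the routine is a generic numerical stand-in, not the method of the notes.

```python
import math

def surface_area(sigma, u0, u1, v0, v1, nu=100, nv=100, h=1e-5):
    """Midpoint Riemann sum of |n| over [u0, u1] x [v0, v1] for sigma(u, v) in R^3.

    n = sigma_u x sigma_v, with both partial derivatives estimated by
    central differences of step h.
    """
    du = (u1 - u0) / nu
    dv = (v1 - v0) / nv
    total = 0.0
    for i in range(nu):
        u = u0 + (i + 0.5) * du
        for j in range(nv):
            v = v0 + (j + 0.5) * dv
            su = [(p - q) / (2.0 * h) for p, q in zip(sigma(u + h, v), sigma(u - h, v))]
            sv = [(p - q) / (2.0 * h) for p, q in zip(sigma(u, v + h), sigma(u, v - h))]
            # Components d(y,z)/d(u,v), d(z,x)/d(u,v), d(x,y)/d(u,v).
            n = (su[1] * sv[2] - su[2] * sv[1],
                 su[2] * sv[0] - su[0] * sv[2],
                 su[0] * sv[1] - su[1] * sv[0])
            total += math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    return total * du * dv

R_MAJOR, A_MINOR = 3.0, 1.0   # assumed radii with 0 < a < R

def torus(theta, phi):
    """The parametrization of Example 5.29."""
    r = R_MAJOR - A_MINOR * math.cos(phi)
    return (r * math.cos(theta), r * math.sin(theta), A_MINOR * math.sin(phi))

area = surface_area(torus, 0.0, 2.0 * math.pi, 0.0, 2.0 * math.pi)
expected = (2.0 * math.pi) ** 2 * A_MINOR * R_MAJOR

print(area, expected)
```

Because the integrand $a(R - a\cos(\varphi))$ is a trigonometric polynomial over a full period, the uniform midpoint sum is essentially exact here, so the two printed values should agree closely.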
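As motivation for the definition of $A(\sigma)$ we offer the following two ideas.

(a) The area of a plane segment (see the figure). With $(\ell, m, n)$ the unit normal to the plane segment,
\[ A_{yz} = A\cos(\theta_1) = A\ell, \qquad A_{zx} = A\cos(\theta_2) = Am, \qquad A_{xy} = A\cos(\theta_3) = An \]
\[ \implies A^2 = A^2(\ell^2 + m^2 + n^2) = A_{yz}^2 + A_{zx}^2 + A_{xy}^2. \tag{5.2} \]
Thus, $A^2$ is the sum of the squares of the areas of the projections onto the coordinate planes (yeah Pythagoras!). Consider the affine function $\sigma : \mathbb{R}^2 \mapsto \mathbb{R}^3$ and the projections $\sigma_{yz}$, $\sigma_{zx}$, $\sigma_{xy}$ onto the coordinate planes in $\mathbb{R}^3$:
\[ \sigma : \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} + \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \\ a_3 & b_3 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}, \]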
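Figure 5.16. The plane segment area and its projection onto the $xy$-plane.

\[ \sigma_{yz} : \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} y_0 \\ z_0 \end{pmatrix} + \begin{pmatrix} a_2 & b_2 \\ a_3 & b_3 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}, \]
\[ \sigma_{zx} : \begin{pmatrix} z \\ x \end{pmatrix} = \begin{pmatrix} z_0 \\ x_0 \end{pmatrix} + \begin{pmatrix} a_3 & b_3 \\ a_1 & b_1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}, \]
\[ \sigma_{xy} : \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} + \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}. \]
If $D$ is a subset of $\mathbb{R}^2$ with content, then from Lemma 5.2
\[ A(\sigma_{yz}(D)) = \left| \det \begin{pmatrix} a_2 & b_2 \\ a_3 & b_3 \end{pmatrix} \right| A(D) = \left| \frac{\partial(y, z)}{\partial(u, v)} \right| A(D), \quad\text{etc.,} \]
so, by (5.2) above,
\[ A(\sigma(D)) = \sqrt{A(\sigma_{yz}(D))^2 + A(\sigma_{zx}(D))^2 + A(\sigma_{xy}(D))^2} = \sqrt{ \left( \frac{\partial(y, z)}{\partial(u, v)} \right)^2 + \left( \frac{\partial(z, x)}{\partial(u, v)} \right)^2 + \left( \frac{\partial(x, y)}{\partial(u, v)} \right)^2 } \; A(D) \]
\[ = |n| A(D) = \int_D |n| \, du \, dv \quad (\text{since } n \text{ is constant here}). \]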
(b) The area of a smooth surface. Let $\sigma : D \subset \mathbb{R}^2 \mapsto \mathbb{R}^3$ be a smooth surface. If $D$ is partitioned into subsets $D_i$, then the sum of the areas of tangent plane segments
\[ \sum_i |n(p_i)| A(D_i), \quad p_i \in D_i, \]
should be expected to approximate the "area" of $\sigma(D)$ very closely, provided that the partition is fine enough. But $\sum_i |n(p_i)| A(D_i)$ is a Riemann sum for $\int_D |n| \, du \, dv$.
Now that the notion of content or measure has been introduced for curves and surfaces, it is a simple matter to extend the idea of integration to such objects. For example, in $\mathbb{R}^3$: if $\gamma : \mathbb{R} \mapsto \mathbb{R}^3$ is a smooth curve and $I$ is a closed interval in $\mathbb{R}$, then
a partition of $\gamma(I)$ is induced by a partition of $I$ into subintervals $\{I_i\}$. So, if $f$ is a real valued function on $\gamma(I)$, we may define Riemann sums
\[ \sum_i f(p_i) \, \ell(\gamma(I_i)) = \sum_i (f \circ \gamma)(t_i) \, \ell(\gamma(I_i)), \quad p_i \in \gamma(I_i), \quad t_i \in I_i, \quad \gamma(t_i) = p_i. \]
The resulting integral will be
\[ \int_{\gamma(I)} f = \int_{\gamma(I)} f \, d\ell := \int_I (f \circ \gamma) |\gamma'| \, dt = \int_I f\bigl(x(t), y(t), z(t)\bigr) \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2} \, dt. \]

Figure 5.17. The area of a surface as approximated by tangent plane areas.
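As a small worked instance of $\int_{\gamma(I)} f \, d\ell$, the sketch below integrates $f(x, y, z) = z$ along one turn of the helix from Example 5.16. The names `line_integral`, `helix`, `d_helix`, and `height`, the choice of $f$, and the midpoint rule are assumptions made for illustration. Since $|\gamma'| = \sqrt{2}$ on the helix, the exact value is $\sqrt{2} \int_0^{2\pi} t \, dt = 2\sqrt{2}\,\pi^2$.

```python
import math

def line_integral(f, gamma, dgamma, t0, t1, n=10_000):
    """Midpoint-rule approximation of the integral of (f o gamma) * |gamma'| over [t0, t1]."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        speed = math.sqrt(sum(comp * comp for comp in dgamma(t)))   # |gamma'(t)|
        total += f(*gamma(t)) * speed
    return total * h

def helix(t):
    """The helix of Example 5.16."""
    return (math.cos(t), math.sin(t), t)

def d_helix(t):
    """Its derivative gamma'(t)."""
    return (-math.sin(t), math.cos(t), 1.0)

def height(x, y, z):
    """Illustrative integrand f(x, y, z) = z."""
    return z

value = line_integral(height, helix, d_helix, 0.0, 2.0 * math.pi)
exact = 2.0 * math.sqrt(2.0) * math.pi ** 2

print(value, exact)
```

Passing the constant function 1 as `f` recovers the length $\ell(\gamma(I))$, which is the special case the definition of $d\ell$ was built from.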