VIGRE Working Group Symbolic Dynamics and Applications Winter 2004

Participants:
Faculty: Vitaly Bergelson, Gerald Edgar, Alexander Leibman, Björn Sandstede
Postdoctoral Fellows: Jeffrey Humpherys, Larry Lindsay, Steven Miller, Landon Proctor, Tamar Ziegler
Graduate Students: Bruce Adcock, John Griesmer, Adam Hammett, Joon-Ku Im, Craig Lennon, William Mance, Ronnie Pavlov, Rafal Pikula, Man Tsoi
Undergraduate Students: Dean Eiger, Chris Hammond, Alex Ustian

Content:
• Alexander Leibman: Symbolic Dynamics and van der Waerden's theorem on arithmetic progressions
• Joon-Ku Im: Symbolic Dynamics of the Pedal Mapping
• Steven Miller: Notes from all meetings

Symbolic Dynamics and van der Waerden's theorem on arithmetic progressions

The symbolic space is the following object: you take a finite alphabet (= a finite set) A = {a1, . . . , ad} and let Ω be the set of all infinite sequences of elements of A, Ω = {x1 x2 x3 . . . : xi ∈ A}. In other words, Ω is the set of mappings N → A, Ω = A^N. A metric on Ω may be introduced in different ways; all that matters to us is that two sequences from Ω are considered close to each other if their large initial intervals coincide. For x = (x1 x2 x3 . . .) and y = (y1 y2 y3 . . .), let us simply put dist(x, y) = 1/k, where k is the minimal integer for which xk ≠ yk. Equipped with this metric, Ω becomes a compact space homeomorphic to the Cantor set. (Check this. Actually, any totally disconnected compact metric space without isolated points is homeomorphic to the Cantor set.) The shift T is the transformation of Ω defined by T(x1 x2 x3 . . .) = (x2 x3 x4 . . .). T may increase the distance between points, but it is a continuous mapping. (Check this.) The space Ω with the transformation T is what is called the symbolic dynamical system. One may also consider the space Ω̃ = A^Z of two-sided sequences of elements of A; the shift T̃ defined by T̃(. . . x−1 x0 x1 . . .) = (. . . x0 x1 x2 . . .) is then an invertible self-homeomorphism of Ω̃.
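To make the definitions concrete, here is a small Python sketch (not part of the original notes; the helper names are mine) that computes the metric dist(x, y) = 1/k on finite prefixes of sequences and applies the shift T; it also illustrates that the shift can increase distances.

    # Sketch of the metric and the shift on A^N, using finite prefixes of
    # sequences as stand-ins for infinite ones. Names are illustrative only.

    def dist(x, y):
        """dist(x, y) = 1/k, where k is the first position (1-based) at which x and y differ."""
        for k, (a, b) in enumerate(zip(x, y), start=1):
            if a != b:
                return 1.0 / k
        return 0.0  # the prefixes agree everywhere we can see

    def shift(x):
        """The shift T drops the first symbol: T(x1 x2 x3 ...) = (x2 x3 x4 ...)."""
        return x[1:]

    x = "abaababaab"
    y = "abaabbabab"
    print(dist(x, y))                 # they first differ at position 6, so distance 1/6
    print(shift(x))                   # "baababaab"
    print(dist(shift(x), shift(y)))   # shifting can increase the distance (here 1/5)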

Actually, the symbolic dynamical system (Ω, T) which we just described is nothing new or exotic. Though the space Ω is not homeomorphic to the unit interval [0, 1] (Ω is disconnected whereas [0, 1] is connected), it is almost the same. Namely, if we identify the sequence (x1 x2 x3 . . .) ∈ Ω with the point of [0, 1] whose expansion in the number system base d is 0.x1 x2 x3 . . . (pretending that the elements a1, . . . , ad of A are the digits of our number system), we obtain a continuous mapping from Ω onto [0, 1]. Were this mapping one-to-one, it would be a homeomorphism; it is not one-to-one since some points of [0, 1] have two distinct expansions and so two preimages in Ω. However, such points form a countable set, which can often be ignored. (Think about the Cantor function C → [0, 1] on the standard Cantor set C.) The natural product measure on Ω (the measure which takes the value d^{−k} on any set of the form {∗ . . . ∗ x_{i1} ∗ . . . ∗ x_{i2} ∗ . . . ∗ x_{ik} ∗ ∗ ∗ . . .}, where the entries at the positions i1, . . . , ik are fixed and the others are arbitrary) is also the same as the usual Lebesgue measure on [0, 1]. If we interpret the elements of Ω as numbers in [0, 1], our transformation T is nothing more than the mapping x ↦ dx mod 1. (The two-sided shift on the space A^Z does not have such a simple interpretation.) We are not, however, interested in the whole system (Ω, T). We usually deal with a T-invariant closed subset of Ω (a closed subset X of Ω with the property T(X) ⊆ X), and the restriction of T to such a set. Ω has plenty of T-invariant closed subsets; here is how they usually arise. Take a point x ∈ Ω and consider its orbit {T^n x : n ≥ 0} = {x, Tx, T²x, . . .} under the action of T. Let Ω_x be the closure of this orbit. Then Ω_x is a closed T-invariant subset of Ω. For almost all points x ∈ Ω, Ω_x = Ω. (Check that Ω_x = Ω means that the expansion x1 x2 x3 . . . of x contains all possible finite words in the alphabet A.) But there are also points x ∈ Ω with Ω_x ≠ Ω (think about examples), and the dynamical system (Ω_x, T|_{Ω_x}) may be quite interesting and have unpredictable properties. In some sense, (Ω, T) is a universal dynamical system, which contains all other dynamical systems as subsystems.

Van der Waerden's theorem on arithmetic progressions is a "static" result and its formulation has nothing in common with any dynamics: For any coloring of the set of positive integers with a finite number of colors one can find arbitrarily long arithmetic progressions in one color. This theorem was proved in 1927 by the (at that time) young mathematician B. L. van der Waerden; its direct combinatorial proof is long and complicated. (Try to prove it!) A symbolic dynamical approach to this (and similar) results was suggested by H. Furstenberg in the late 70s and has become the basis of what is now called Ergodic Ramsey Theory. Here is Furstenberg's idea. Assume that A = {a1, . . . , ad} is a set of "colors" and that z1 z2 z3 . . ., with zi ∈ A, is a coloring of N. Consider z = z1 z2 z3 . . . as a point of Ω = A^N. Thus, Ω is the space of all colorings of N with colors from A; two colorings are considered close to each other if they coincide on a large initial interval of N. In particular, for two colorings x = x1 x2 x3 . . . and y = y1 y2 y3 . . ., dist(x, y) < 1 if and only if x1 = y1. The shift T acts on Ω so that, for any x = x1 x2 x3 . . . ∈ Ω and any m ∈ N, T^m(x1 x2 x3 . . .) = x_{m+1} x_{m+2} x_{m+3} . . .. Hence, for two colorings x = x1 x2 x3 . . . and y = y1 y2 y3 . . . and nonnegative integers m, l, dist(T^m x, T^l y) < 1 if and only if x_{m+1} = y_{l+1}. Under the coloring z, the terms of a (k + 1)-term arithmetic progression m, m + n, m + 2n, . . . , m + kn have the same color if z_m = z_{m+n} = z_{m+2n} = . . . = z_{m+kn}, that is, if dist(T^{m−1} z, T^{m−1+in} z) = dist(T^{m−1} z, T^{in}(T^{m−1} z)) < 1 for i = 1, . . . , k. Let X = Ω_z, the closure of {T^n z : n ≥ 0}; then X is a compact metric space on which T acts as a continuous transformation, and Z = {T^m z : m ≥ 0} is a dense subset of X. Now van der Waerden's theorem follows from the following very general result:
Topological Multiple Recurrence Theorem (Furstenberg and Weiss, 1978). Let T be a continuous mapping from a compact metric space X into itself. For any k ∈ N and any ε > 0 there exist a point x ∈ X and an integer n ∈ N such that dist(T^{in} x, x) < ε for all i = 1, . . . , k. If Z is a dense subset of X, then such x can be found in Z.
Proof. Note that the last statement is clear: if, for some k and ε, the first part of the theorem holds for a point x ∈ X, then it holds for all points in a neighborhood of x, and so for a point from Z. Next, with the help of Zorn's lemma, one easily proves that any compact space X with a self-mapping T has a minimal closed nonempty T-invariant subset. Let us replace X by such a minimal subset and assume that X is minimal, that is, has no nonempty proper closed T-invariant subsets. Then the orbit {T^n x : n ≥ 0} of any x ∈ X is dense in X; otherwise its closure would be a nonempty proper closed T-invariant subset of X. After this reduction is done, we proceed by induction on k. For k = 1 the statement is clear: choose any x ∈ X; since X is minimal, the orbit {T^n(Tx) : n ≥ 0} = {T^n x : n ≥ 1} of Tx is dense in X, and so there exists n ∈ N such that dist(T^n x, x) < ε. Now assume that the statement is true for some k ≥ 1, that is, that for any ε > 0 there exists a point x ∈ X with dist(T^{in} x, x) < ε for some n ∈ N and i = 1, . . . , k. We then claim that the set of points with this property is dense in X. Indeed, let U be an open set in X, let B ⊆ U be an open ball of radius < ε in U, and for each m ∈ N let B_m = (T^m)^{−1}(B).
Then {B_m, m ∈ N} is an open cover of X; otherwise X \ ∪_{m≥1} B_m would be a nonempty proper closed T-invariant subset of X. Since X is compact, we may choose a finite subcover {B_{m1}, . . . , B_{mr}} of this cover. Let δ > 0 be such that any ball of radius δ in X is contained in one of B_{m1}, . . . , B_{mr}. (A number δ with this property exists for any finite open cover of any compact metric space (prove this), and is called a Lebesgue number of the cover.) Now, let x ∈ X and n ∈ N be such that dist(T^{in} x, x) < δ for all i = 1, . . . , k. This means that the points T^n x, T^{2n} x, . . . , T^{kn} x are all contained in the ball of radius δ with center at x; denote this ball by D. Let j be such that D ⊆ B_{mj}. Then T^{mj}(D) ⊆ B, which implies that the points T^n(T^{mj} x), T^{2n}(T^{mj} x), . . . , T^{kn}(T^{mj} x) are all contained in the ball of radius < ε centered at the point T^{mj} x ∈ U.
We are now ready to start the proof of the inductive step. Fix an ε > 0 and find a point x0 ∈ X and an integer n0 ∈ N such that dist(T^{i n0} x0, x0) < ε/2, i = 1, . . . , k. Choose x1 ∈ (T^{n0})^{−1}(x0); then T^{n0} x1 = x0 and dist(T^{(i+1) n0} x1, x0) = dist(T^{i n0} x0, x0) < ε/2, i = 1, . . . , k. Thus, dist(T^{i n0} x1, x0) < ε/2, i = 1, . . . , k + 1. Since T is continuous, there exists a positive ε1 < ε such that dist(T^{i n0} y, x0) < ε/2, i = 1, . . . , k + 1, for all y in the ε1-neighborhood of x1. Using the induction hypothesis, find a point y1 in the ε1/2-neighborhood of x1 and n1 ∈ N such that dist(T^{i n1} y1, y1) < ε1/2, i = 1, . . . , k. Then the points y1 and T^{i n1} y1, i = 1, . . . , k, are in the ε1-neighborhood of x1, which implies dist(T^{i n0}(T^{(i−1) n1} y1), x0) < ε/2, i = 1, . . . , k + 1. Take any point x2 ∈ (T^{n1})^{−1}(y1); then dist(T^{i n1} x2, x1) < ε1/2 < ε/2 and
    dist(T^{i(n1+n0)} x2, x0) < ε/2,   i = 1, . . . , k + 1.
Proceeding in this way, we find points x3, x4, . . . ∈ X and integers n2, n3, . . . ∈ N such that, for any l,
    dist(T^{i n_{l−1}} x_l, x_{l−1}) < ε/2,
    dist(T^{i(n_{l−1}+n_{l−2})} x_l, x_{l−2}) < ε/2,
    . . .
    dist(T^{i(n_{l−1}+···+n_0)} x_l, x_0) < ε/2,   i = 1, . . . , k + 1.
Since X is compact, there exist integers m and l > m such that dist(x_l, x_m) < ε/2. For such m and l we have dist(T^{i(n_{l−1}+···+n_m)} x_l, x_l) < ε, i = 1, . . . , k + 1, and we put x = x_l and n = n_{l−1} + · · · + n_m.
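Van der Waerden's theorem can be checked by brute force for small parameters. The Python sketch below (my own illustration, not part of the original notes) verifies the classical fact that the van der Waerden number W(3, 2) equals 9: every 2-coloring of {1, . . . , 9} contains a monochromatic 3-term arithmetic progression, while some 2-coloring of {1, . . . , 8} does not.

    from itertools import product

    def has_mono_ap(coloring, k):
        """True if the coloring (a tuple indexed by 1..N) has a monochromatic k-term AP."""
        n = len(coloring)
        for start in range(1, n + 1):
            for step in range(1, n):
                terms = [start + i * step for i in range(k)]
                if terms[-1] > n:
                    break
                if len({coloring[t - 1] for t in terms}) == 1:
                    return True
        return False

    def every_coloring_has_ap(n, k=3, colors=2):
        return all(has_mono_ap(c, k) for c in product(range(colors), repeat=n))

    print(every_coloring_has_ap(8))   # False: some 2-coloring of {1..8} avoids 3-term APs
    print(every_coloring_has_ap(9))   # True: W(3,2) = 9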


Symbolic Dynamics of the Pedal Mapping

March 4, 2004

Seemingly complicated problems can sometimes be easily solved if we investigate them via symbolic dynamics. For example, we can get the maximal invariant set of Smale's horseshoe by considering its isomorphic symbolic dynamical system. Here we discuss a simple but interesting dynamical system, the 'pedal mapping.' Starting from elementary plane geometry, we define the pedal mapping of triangles, and then encode each triangle as an infinite obtuseness word in {0, 1, 2, 3}^N. We then discuss its geometric and algebraic properties, and its ergodicity.

1 Pedal Triangles and Pedal Mapping

Definition 1.1 (Pedal Triangle). Let T and T′ be triangles. We call T′ the Pedal Triangle of T if T′ is formed from the feet of the altitudes of T. We label the angles ∠1′, ∠2′ and ∠3′ of T′ so that their vertices are the feet of the altitudes from ∠1, ∠2 and ∠3 respectively. The mapping P from a triangle to its pedal triangle is called the Pedal Mapping. We also have the sequence of pedal triangles of T, that is, T1 := T, T2 := P(T), T3 := P²(T), . . . , Tn := P^{n−1}(T), and so on.
Theorem 1.2. Let T be a triangle with angles ∠1, ∠2, ∠3 and T′ be one with angles ∠1′, ∠2′, ∠3′. Then
(1) If ∠1, ∠2, ∠3 ≤ π/2, that is, T is an acute triangle, then ∠1′ = π − 2·∠1, ∠2′ = π − 2·∠2, and ∠3′ = π − 2·∠3.
(2) If ∠1 > π/2, that is, T is an obtuse triangle, then ∠1′ = 2·∠1 − π, ∠2′ = 2·∠2, ∠3′ = 2·∠3. The same holds for obtuse ∠2 and obtuse ∠3.
Proof. Use Cartesian coordinates.
Note. There can be at most one angle greater than π/2, since ∠1 + ∠2 + ∠3 = π and ∠1, ∠2, ∠3 ≥ 0.
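The angle recursion in Theorem 1.2 is easy to iterate numerically. The Python sketch below (my own illustration, not from the write-up) applies the pedal mapping to the normalized angles (x, y, z) = (∠1/π, ∠2/π, ∠3/π) and records which angle, if any, is obtuse at each step; this is exactly the obtuseness word introduced in the next definition.

    # Pedal mapping on normalized angles, following Theorem 1.2:
    # acute case (x, y, z) -> (1-2x, 1-2y, 1-2z); if angle i is obtuse,
    # coordinate i -> 2*coordinate - 1 and the other coordinates double.

    def pedal(t):
        for i, v in enumerate(t):
            if v > 0.5:                                   # angle i+1 is obtuse
                return tuple(2*w - 1 if j == i else 2*w for j, w in enumerate(t))
        return tuple(1 - 2*w for w in t)                  # acute (or right) triangle

    def label(t):
        """0 if acute, otherwise the index (1, 2 or 3) of the obtuse angle."""
        for i, v in enumerate(t):
            if v > 0.5:
                return i + 1
        return 0

    T = (0.20, 0.35, 0.45)                                # an acute triangle
    word = []
    for _ in range(10):
        word.append(label(T))
        T = pedal(T)
    print(word)                                           # first ten symbols of the obtuseness word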

Definition 1.3 (Degenerate). (1) A triangle T is called degenerate if it has one or two angles equal to 0. (2) A triangle T is called eventually degenerate if there exists an integer n ≥ 0 such that the n-th pedal triangle of T is degenerate. For example, the pedal triangle of a right triangle is degenerate.
Definition 1.4 (Obtuseness word). To any triangle T with angles ∠1, ∠2 and ∠3, assign the obtuseness label a if its only obtuse angle is ∠a (otherwise assign 0). The obtuseness word a1 a2 a3 · · · of a triangle T is the sequence of obtuseness labels of its consecutive pedal triangles T1, T2, T3, · · ·.
Now consider a map E from the set M of equivalence classes of similar triangles to X, a subset of {0, 1, 2, 3}^N, sending a triangle T to its obtuseness word. We want E to be an isomorphism, that is, E is bijective and the following diagram commutes:

         P
    M ------> M
    |E        |E
    v         v
    X ------> X
         P*

where P* is the shift map induced from P.

If this is guaranteed, then we can answer the following questions almost trivially.
Question 1.5. (1) Which triangles have all their pedal triangles acute? How many such triangles are there? (2) Which triangles have all their pedal triangles obtuse? How many such triangles are there? (3) Are there triangles whose sequence of pedal triangles comes arbitrarily close to every triangle?

2 Geometric Understanding of the Pedal Mapping

Definition 2.1 (Geometric Representation). If T is any triangle with angles ∠1, ∠2, ∠3, write (x, y, z) = (∠1/π, ∠2/π, ∠3/π). Since x + y + z = 1 and x, y, z ≥ 0, we can represent any equivalence class of similar triangles as a point of the equilateral triangle M = {(x, y, z) : x + y + z = 1, x, y, z ≥ 0}. (We call this M a Moduli Space.) This is the Geometric Representation of triangles. We usually identify a triangle T with its geometric representation.

Note. Divide M into 4 subdivisions (again equilateral triangles): M1 := M ∩ {x > 1/2}, M2 := M ∩ {y > 1/2}, M3 := M ∩ {z > 1/2}, and M0 := M \ (M1 ∪ M2 ∪ M3). We can easily see that if T = (x, y, z) ∈ M0, then T is the geometric representation of an acute triangle, and that if T ∈ Mi for i = 1, 2 or 3, then T is that of an obtuse triangle with obtuse ∠i.
Remark 2.2. (1) Under the pedal mapping, each of the three triangles M1, M2, M3 is dilated by a factor of 2 and laid back over M, while M0 is dilated by a factor of −2. (2) If P^{n−1}(T) lies in Mi, then the n-th symbol of E(T) is i. (3) Obviously the obtuseness words are not base-4 expansions of numbers, but just words in the symbols 0, 1, 2 and 3. For example, 03 ≠ 10, 01 ≠ 10, 03 ≠ 30. (4) The pedal mapping does not appear continuous in some sense, even though we have not yet defined any topology on M. (5) We can see visually that there must be prohibited points in order to guarantee the 1-1 correspondence between triangles and points of {0, 1, 2, 3}^N: namely words terminating in i0i (i = 1, 2, 3) or in ia (i = 1, 2, 3 and a any word, not necessarily repeating, consisting of the two nonzero symbols j, k other than i), since i0i and ia should be replaced by 00i and ia^c respectively, where ia^c is the word obtained from ia by interchanging j and k.

Definition 2.3 (Set of Non-prohibited Obtuseness Words). We call the subset X of {0, 1, 2, 3}^N consisting of all obtuseness words that remain after throwing away the prohibited words mentioned above the set of non-prohibited obtuseness words.
Note. By throwing all the prohibited points away, we can restrict the range to X and achieve a 1-1 correspondence with similar triangles. We now verify this by constructing explicit encoding and decoding algorithms via the algebraic description below.

3 Algebraic Understanding of the Pedal Mapping

Definition 3.1 (Binary Expansion of the Angle Matrix). For any triangle T with angles ∠1, ∠2 and ∠3, represent T by

        ( x )           ( ∠1 )     ( .α1 α2 α3 · · · )
    T = ( y ) = 1/π ·   ( ∠2 )  =  ( .β1 β2 β3 · · · ),
        ( z )           ( ∠3 )     ( .γ1 γ2 γ3 · · · )

where the entries on the right are binary expansions; we call this the Binary Expansion of the Angle Matrix of T. We may identify the binary expansion of the angle matrix of a triangle T with T itself. Then we have two identified representations of a triangle: the geometric representation on the moduli space and the binary expansion of the angle matrix, and we need to identify these with the set of non-prohibited obtuseness words.

Remark 3.2. (1) The first digits of the three rows of the angle matrix determine in which Mi the triangle T lies, that is, they determine the first digit of the obtuseness word of T. That is to say, if the first digit of the i-th row is 1 and the others are 0, then T has the angle ∠i obtuse, i.e. T is contained in Mi. (2) If the first digit of the i-th row is 1, then T has obtuse ∠i (that row is .1. . . in binary, i.e. at least 1/2). In this case the pedal mapping acts on the matrix just like a shift: the i-th row transforms as r ↦ 2r − 1 and the other rows as r ↦ 2r, because ∠i′ = 2·∠i − π and ∠j′ = 2·∠j for j ≠ i. Otherwise, that is, if all the first digits are 0, the mapping is 1 − (the result of the shift map) on each row of the matrix. (3) If all three rows of the matrix have 1 in their first digits, the sum 1/π·(∠1 + ∠2 + ∠3) exceeds 1. If two of them have 1 in their first digits, the remaining row must have all places 0; the only cases are of the form (.01, .10, .0)^T, which can be identified with (.01, .01, .0)^T, (.10, .10, .0)^T and (.10, .01, .0)^T. The first two matrices represent the obtuseness words 12 and 21 respectively, but these are prohibited and should be replaced by 03. Therefore each angle matrix has at most one row with 1 in its first digit.

Algorithm 3.3. We now construct explicit encoding and decoding algorithms between a triangle T and a non-prohibited obtuseness word E(T) in X ⊂ {0, 1, 2, 3}^N. This guarantees the 1-1 correspondence and shows explicitly that the diagram above is well-defined and commutative.
(1) Encoding E. If we repeatedly do what was described in (1) and (2) of the preceding Remark, we obtain a non-prohibited obtuseness word E(T), and EP = P*E.
(2) Decoding D := E^{−1}. Given a non-prohibited word a = a1 a2 a3 . . ., we want to associate an angle matrix with rows .α1 α2 α3 · · ·, .β1 β2 β3 · · ·, .γ1 γ2 γ3 · · ·.
Step 1. Find the first digit in each row of the angle matrix, that is, define
    a_i ↦ (α_i^{(1)}, β_i^{(1)}, γ_i^{(1)})^T  by  0 ↦ (0, 0, 0)^T,  1 ↦ (1, 0, 0)^T,  2 ↦ (0, 1, 0)^T,  3 ↦ (0, 0, 1)^T.
Concatenate these columns to form an infinite 3-row matrix P1 with rows .α1^{(1)} α2^{(1)} α3^{(1)} · · ·, .β1^{(1)} β2^{(1)} β3^{(1)} · · ·, .γ1^{(1)} γ2^{(1)} γ3^{(1)} · · ·.
Step 2. Find P_i for every i > 0. Suppose we have P_i, with rows .α1^{(i)} α2^{(i)} α3^{(i)} · · · and so on. To get P_{i+1}, keep the columns up to the i-th column and look at the i-th column. If α_i^{(i)}, β_i^{(i)} and γ_i^{(i)} are all 0 or all 1, then define α_j^{(i+1)} = 1 − α_j^{(i)}, β_j^{(i+1)} = 1 − β_j^{(i)}, and γ_j^{(i+1)} = 1 − γ_j^{(i)} for every j > i. Otherwise, define P_{i+1} := P_i.
Step 3. Since the i-th iteration does not change the columns up to the i-th column, we have α_j^{(i+1)} = α_j^{(i)}, and so on, for every i ≥ j. Therefore define α_j := α_j^{(i)} for any i ≥ j. So we define D(a) to be the matrix with rows .α1 α2 α3 · · ·, .β1 β2 β3 · · ·, .γ1 γ2 γ3 · · ·, which represents a triangle in M.
After we check that EP = P*E, and that E and D are inverses of each other, we get the following theorem.

Theorem 3.4. The correspondence between a triangle and a non-prohibited obtuseness word is an isomorphism between the dynamical system of the pedal mapping and that of the shift on four symbols.

Answer 3.5. Now we can answer the questions posed in the first section. (1) The only triangle having all its pedal triangles acute is the equilateral triangle. (2) The triangles represented on the Sierpinski Gasket, viewed as a subset of M, have all their pedal triangles obtuse; so there are uncountably many such triangles. (3) Every triangle whose obtuseness word terminates in 0 1 2 3 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33 000 001 · · · (the concatenation of all finite words in the four symbols) has a sequence of pedal triangles coming arbitrarily close to every triangle. We call such a triangle dense in the moduli space M.


Proof. (1) Such a triangle must have encoded word 000. . ., and thus the binary expansion of its angle matrix must be (.010101 . . . , .010101 . . . , .010101 . . .)^T, which equals (1/3, 1/3, 1/3)^T.
(2) Such triangles must not have 0 in their encoded words.
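A small Python sketch (mine, not from the write-up) of the decoding algorithm D from Section 3, applied to a finite prefix of a word; decoding the all-zero word reproduces the angles (1/3, 1/3, 1/3) computed in part (1) above.

    # Decoding a finite prefix of a non-prohibited obtuseness word.
    # Step 1 builds the columns of first digits; Step 2 complements all later
    # columns whenever the current column is all 0 or all 1 (the acute case).
    # A prefix of length n determines the angles to about n binary digits.

    FIRST_DIGITS = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}

    def decode(word):
        cols = [list(FIRST_DIGITS[a]) for a in word]           # Step 1
        for i in range(len(cols)):                              # Step 2
            if sum(cols[i]) in (0, 3):                          # column all 0 or all 1
                for j in range(i + 1, len(cols)):
                    cols[j] = [1 - d for d in cols[j]]
        # Step 3: read off the binary expansions .a1 a2 a3 ...
        return tuple(sum(cols[j][r] / 2 ** (j + 1) for j in range(len(cols))) for r in range(3))

    print(decode([0] * 20))   # approximately (1/3, 1/3, 1/3): the equilateral triangle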

4 Ergodic Property of the Pedal Mapping

(This section is less rigorous than the previous ones.)
Theorem 4.1 (Measure Preserving Property). Let m be the Lebesgue measure on the plane region M. The pedal mapping is measure-preserving, that is, for any measurable subset G of M, m(G) = m(P^{−1}(G)).
Proof. The inverse image P^{−1}(G) of G consists of 4 pieces Gi := P^{−1}(G) ∩ Mi for i = 0, 1, 2, 3, and each Gi is congruent to G shrunk by a factor of 2. Therefore m(Gi) = 1/4 · m(G), so m(P^{−1}(G)) = Σ_{i=0}^{3} m(Gi) = m(G).
Theorem 4.2 (Ergodic Property). The pedal mapping is ergodic, that is, any measurable subset of M that is invariant under P has measure 0 or full measure.
Theorem 4.3 (Mixing Property). The pedal mapping has the mixing property. In other words, for any subset S of M and for every point T ∈ M, denote by N(T, S, n) the number of the points T, P(T), P²(T), · · ·, P^{n−1}(T) that lie in S. Then for almost all T and every S,
    lim_{n→∞} N(T, S, n)/n = m(S)/m(M).
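Theorem 4.3 can be illustrated numerically. The sketch below (my own, not from the write-up) estimates the fraction of time an orbit spends in the acute region M0, which should come out near m(M0)/m(M) = 1/4. It uses exact rational arithmetic to avoid the rapid loss of binary digits that floating-point doubling would cause; a rational starting point with a large odd denominator is only a heuristic stand-in for a "typical" triangle T.

    from fractions import Fraction
    import random

    def pedal(t):
        # pedal mapping on normalized angles, as in Theorem 1.2
        for i, v in enumerate(t):
            if v > Fraction(1, 2):
                return tuple(2*w - 1 if j == i else 2*w for j, w in enumerate(t))
        return tuple(1 - 2*w for w in t)

    q = 2_000_000_011                       # large odd denominator: the orbit stays exact
    a, b = sorted(random.sample(range(1, q), 2))
    T = (Fraction(a, q), Fraction(b - a, q), Fraction(q - b, q))

    n, acute = 5000, 0
    for _ in range(n):
        if max(T) <= Fraction(1, 2):
            acute += 1
        T = pedal(T)
    print(acute / n)                        # typically close to 1/4 = m(M0)/m(M)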

5 Reference

J.C. Alexander. The Symbolic Dynamics of the Sequence of Pedal Triangles. Mathematics Magazine 66 (1993), 147–158.

Symbolic Dynamics and Applications
Vitaly Bergelson, Alexander Leibman, Björn Sandstede
Notes LaTeX-ed in real-time by Steven Miller
Ohio State, Winter 2004, Tuesdays 3:30 - 5:30

Abstract Possible topics include One-Dimensional Maps (Sharkovskii’s Theorem, Feigenbaum constant, Kneading sequences), Shift Dynamics in Dynamical Systems (Smale’s horseshoe, Smale-Birkhoff Theorem, Melnikov theory and periodically forced ODEs), Closure of Orbits with Prescribed Ergodic Properties, Number Theory and Symbolic Dynamics (automorphisms on intervals and tori, orbits and closures, normal numbers), Ramsey Theory, Billiards, Complexity and Entropy. All notes were taken in real-time; all mistakes should be attributed to the typist, not to the lecturer.

Contents

1 First Lecture: Jan 6th
  1.1 Introduction
  1.2 Example on the Torus
  1.3 Van der Waerden Example
  1.4 Shift Dynamics and Dynamical Systems
  1.5 Continued Fractions
  1.6 Billiards

2 Second Lecture: Tuesday, Jan 13th
  2.1 Introduction
  2.2 Complexity

3 Third Lecture: Jan 20th
  3.1 Measure Zero Examples
    3.1.1 Zero Measure on R
    3.1.2 Smooth Dynamics and Cantor: Logistic Family
    3.1.3 Measure Zero in R^2
  3.2 Measure Preserving Maps
  3.3 Measure Zero in {0, 1}^N
  3.4 Another Problem
  3.5 Minimal System

4 Fourth Lecture: Jan 27th
  4.1 Horseshoes
  4.2 Pre-images
  4.3 Backwards Time
  4.4 Dynamics on the Maximal Invariant Set

5 Fifth Lecture: February 3rd, 2004: Period 3 Implies Chaos
  5.1 Preliminary Lemmas
  5.2 Proof of the Theorem of Li and Yorke
  5.3 General Result
  5.4 Handout from Bruce

6 Sixth Lecture: February 10th, 2004
  6.1 Subshifts and Languages
  6.2 Generating Subshifts
  6.3 Minimal Subsets
  6.4 Substitutions
  6.5 Example: Morse Substitution

7 Seventh Lecture: February 17th, 2004: An Introduction to Perron-Frobenius Theory
  7.1 Shift Spaces and Types
  7.2 Edge Shifts
  7.3 Complexity
  7.4 Irreducibility and Aperiodicity
  7.5 Perron-Frobenius
  7.6 Applications of the Perron-Frobenius Theorem
  7.7 Entropy of Edge Shifts
  7.8 Example from Vitaly
  7.9 Handout from Ronnie Pavlov

8 Eighth Lecture: February 24th, 2004
  8.1 Symbolic Dynamics and Fractals: Introduction
  8.2 Notation
  8.3 Example: Eisenstein Fractal
  8.4 Example: McWartor's Pentigree
  8.5 Example: Cantor Set
  8.6 Dimension
  8.7 Heighway's Dragon
  8.8 References for the Talk

9 Ninth Lecture: March 2nd, 2004: Symbolic Dynamics of the Pedal Mapping
  9.1 Pedal Triangles and Pedal Mapping
  9.2 Geometric Understanding of the Pedal Mapping
  9.3 Algebraic Understanding of the Pedal Mapping
  9.4 Ergodic Property of the Pedal Mapping

A Cantor Set Review
  A.1 Cantor Set
    A.1.1 Construction
    A.1.2 Non-Trivial Point in Cantor Set
    A.1.3 Alternate Formulation of C
    A.1.4 Another Formulation of the Cantor Set
    A.1.5 non-Cantor Sets
  A.2 Uniqueness of sets under such constructions

B Algebraic and Transcendental Numbers
  B.1 Definitions and Cardinalities of Sets
    B.1.1 Definitions
    B.1.2 Countable Sets
    B.1.3 Algebraic Numbers
    B.1.4 Transcendental Numbers
    B.1.5 Continuum Hypothesis

C Introduction to Continued Fractions
  C.1 Decimal Expansions
  C.2 Definition of Continued Fractions
    C.2.1 Uses of Continued Fractions
    C.2.2 Definition
    C.2.3 Calculating Continued Fractions
    C.2.4 Dynamical Interpretation of Continued Fractions

D Properties of Continued Fractions
  D.1 Representation of Numbers by Continued Fractions
    D.1.1 Elementary Properties of Continued Fractions
    D.1.2 Convergents to a Continued Fraction
    D.1.3 Observation
    D.1.4 Continued Fractions with Positive Terms
  D.2 Uniqueness of Continued Fraction Expansions
  D.3 Positive, Simple Convergents
  D.4 Convergence
  D.5 Uniqueness of Infinite Continued Fractions
  D.6 Periodic Continued Fractions and Quadratic Irrationals
    D.6.1 Periodic Continued Fractions
    D.6.2 Quadratic Irrationals

E Continued Fractions and Approximations
  E.1 Convergents Give the Best Approximations
  E.2 Measures of Sets with Given Continued Fraction Approximations
    E.2.1 |x − p/q| ≤ C/q^(2+ε) Infinitely Often
    E.2.2 |x − p/q| ≤ 1/(q² √5)
  E.3 Convergents are the Best Rational Approximations
  E.4 Weaker Approximation Properties of Convergents
  E.5 Exponent (or Order) of Approximation

F Liouville's Theorem Constructing Transcendentals
  F.1 Review of Approximating by Rationals
  F.2 Liouville's Theorem
  F.3 Constructing Transcendental Numbers
    F.3.1 Σ_m 10^(−m!)
    F.3.2 [10^(1!), 10^(2!), . . . ]

G Distribution of Digits of Continued Fractions
  G.1 Introduction
  G.2 Distribution of a_1(α) = k
  G.3 Bounds for a_{n+1}(α) = k
    G.3.1 Prob(A_{1,...,n}(a_1, . . . , a_n))
    G.3.2 Prob(A_{n+1}(k) ∩ A_{1,...,n}(a_1, . . . , a_n))
    G.3.3 Prob(A_{n+1}(k))
  G.4 Distribution of a_n(α) = k
    G.4.1 Statement of Kuzmin's Theorem
    G.4.2 Sketch of the Proof of Theorem G.4.1
    G.4.3 Preliminary Lemmas
    G.4.4 Proof of Theorem G.4.4
  G.5 Distribution of a_{n+1}(α) = k_2, Given a_n(α) = k_1
    G.5.1 Simple Example
    G.5.2 Sketch of Proof of Kuzmin's Theorem
  G.6 Kuzmin Experiments
    G.6.1 Direct Solution
    G.6.2 Solution via Linearity of Expected Values
    G.6.3 Generalization
    G.6.4 General Comments
  G.7 Research Problems

Chapter 1
First Lecture: Jan 6th

1.1 Introduction

What is symbolic dynamics? What is dynamics? We have a space X, and a map T : X → X. We often impose different structures on X (topological space, metric space, measure space, and so on). We look at orbits of points x ∈ X under T : T n x, n ∈ N or n ∈ Z. What makes it symbolic? This presupposes some coding using an alphabet (maybe 0-1). Partition space into pieces. Look at some x0 . Then T n x0 jumps from partition to partition. The orbit of this point becomes an infinite string of symbols. If we had four partitions, say {a, b, c, d}, it would be an infinite word in these four letters. In principle, the hope is that this information, the track of the travel, is useful. We hope that, given such strings, we can understand something about the system. We may gain the following: if a point is "typical", its behavior could give information about many other points. Could be a dull system where all points have similar behavior. Thus, we have a correspondence between T n x0 (n = 1, 2, . . . ) and that cell of a partition to which T n x0 belongs.

1.2 Example on the Torus

Consider the map T x = 2x mod 1. We study this on the interval [0, 1), a one-dimensional torus. The orbit of x is {2^n x mod 1}_{n≥0}. Think of binary sequences. Every x ∈ [0, 1) is representable as Σ_{n≥0} a_n/2^{n+1}. For all but certain points (certain rationals), the representation is unique, each a_n ∈ {0, 1}. Thus T x = 2x mod 1 sends Σ_{n≥0} a_n/2^{n+1} to Σ_{n≥0} a_{n+1}/2^{n+1} = a_1/2 + a_2/2² + a_3/2³ + · · ·. So, if we start with x ∼ (a_0, a_1, . . . ), we now have 2x ∼ (a_1, a_2, . . . ). Thus, this map is really the same as a shift. We took a nice map, but by using this partition (the binary expansion of numbers), for this faithful representation, our transformation becomes nothing but a shift! In many books, symbolic dynamics is defined as the study of shifts on symbols. Formally, we can write that (T¹, 2x mod 1) is isomorphic to the space ({0, 1}^N, σ), where σ is the shift operator σ(a_0, a_1, . . . ) = (a_1, a_2, . . . ).
Exercise 1.2.1. Prove {0, 1}^N is a compact space. To do this, we need to introduce what it means for two elements to be close. If a, b ∈ {0, 1}^N, then
    dist(a, b) = Σ_{n≥0} |a_n − b_n| / 2^n.   (1.1)

If a or b is a certain rational, one needs to show the above is well defined, as a_n and b_n are not.
Exercise 1.2.2. What is the orbital closure of (0, 1, 0, 1, 0, 1, . . . )? Hint: it is just two points. Can one have elements with countably many points in their orbital closure?
Exercise 1.2.3. Can you give a point whose orbital closure is everything? Will a typical sequence do?
Why are (T¹, 2x mod 1) and ({0, 1}^N, σ) not quite the same? If two elements have a property in one, then their associates must have the same property in the other. We have several problems. First, we have some problems with non-uniqueness (but we can handle this with some book-keeping). More seriously, the two spaces are different topologically. The second is topologically like the Cantor set, which is very different from the torus. Consider

    X --T--> X
    φ|       |φ
    v        v
    Y --S--> Y

We want the operations to commute. In our example, our sets are clopen. Consider all possible sequences starting with 0 (C0) and all sequences starting with 1 (C1). Consider the Cantor Set (see Appendix A). Let Σ = {0, 1}^N. These are clopen sets. We can relate these to the Cantor set by mapping Σ_n a_n/2^n to Σ_n 2a_n/3^n.
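The correspondence between the doubling map and the shift can be checked numerically. The sketch below (mine, not from the lecture) extracts binary digits of a point and confirms that the digits of 2x mod 1 are the digits of x shifted left by one.

    def binary_digits(x, n):
        """First n binary digits a_0, a_1, ... of x in [0, 1)."""
        digits = []
        for _ in range(n):
            x *= 2
            d = int(x)        # 0 or 1
            digits.append(d)
            x -= d
        return digits

    x = 0.3141592653589793
    print(binary_digits(x, 10))             # digits of x
    print(binary_digits((2 * x) % 1, 9))    # digits of 2x mod 1: the same list shifted left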

1.3 Van der Waerden Example

Suppose we have a partition
    N = ∪_{i=1}^{r} C_i.   (1.2)

We have the following
Theorem 1.3.1 (Van der Waerden). For any r and any partition of N into r cells, one of the C_i contains arbitrarily long arithmetic progressions.
How is this related to symbolic dynamics? Start moving through this partition. As we move through N, partitioned into r pieces, to every integer we can assign a number from 0 to r − 1. Thus, the coloring of the integers is coded by an element of {0, . . . , r − 1}^N. Take a partition. If we are looking for something monochrome, we are looking for some place in the sequence of integers in N such that, if we jump say d units, we get the same color, and then if we jump d units again, we get the same color again, and so on.

1.4 Shift Dynamics and Dynamical Systems

We've seen that, whatever we want to know about (T¹, 2x mod 1), we can study the isomorphic space ({0, 1}^N, σ). In general, we try to map systems to shift spaces. Start with a pendulum hanging from the ceiling with a point mass weight. The pendulum can swing up and down, and can go above the point. It has two equilibrium points: directly down (stable equilibrium) and directly up (unstable equilibrium). By unstable we mean that, if we tip the pendulum infinitesimally in either direction, it will undergo very strong motion, returning in infinite time; in the stable case, infinitesimal perturbations cause small motions. Assume the pendulum is moving up and down with period equal to 1 second. Suppose we look at the position every 10 or 20 seconds (some multiple of the period). What can happen? If it is down, it stays down. If it is exactly up, it will stay up. If the pendulum is in another position, we can see interesting behavior (we'll add some small damping). Now slightly perturb either up or down. Put a little box up near the top (and one down near the bottom). If the pendulum is in the top box at an observation time, write 1; if in the bottom box, write 0. Assume we have motion so that, at every time we look (a fixed multiple of the period of oscillation), we are either in the top or bottom box. Thus, to each angle θ (satisfying this condition), we have a sequence of 0s and 1s. Thus, we have a map to symbols, and we can encode some information about the pendulum in this sequence. If we want the pendulum to stay down for the first 100 seconds, then up for the next 20 seconds, and so on, we can find a sequence which will have this property. We claim any such sequence can be realized by this system.

1.5 Continued Fractions
Consider
    0 + 1/(a_1 + 1/(a_2 + 1/(a_3 + · · ·))),   a_i ∈ N.   (1.3)

This is a continued fraction. Any number in (0, 1) can be so expressed. How are the digits distributed as we pass through different numbers? For example, in base 10, most numbers are normal. This means that in their decimal expansion, the frequency of 1s equals the frequency of 2s and so on, each digit occurring with frequency 1/10. More generally, every pair of digits has frequency 1/100, and so on.
Exercise 1.5.1. Give an example of a normal number (in base 10).
What can we say about digits of continued fractions? For example,
    1 + 1/(1 + 1/(1 + 1/(1 + · · ·)))   (1.4)
is the golden mean, and
    1 + 1/(2 + 1/(2 + 1/(2 + · · ·)))   (1.5)
is √2. A nice result is that a number x has an (eventually) periodic continued fraction if and only if x is a quadratic irrational.

So, for x ∈ (0, 1), we consider its continued fraction expansion
    x = a_0(x) + 1/(a_1(x) + 1/(a_2(x) + · · ·)).   (1.6)
If we looked instead at its decimal expansion,
    x = Σ_n d_n(x)/10^n.   (1.7)

For decimal expansions, we can figure out what (d_1(x) + · · · + d_n(x))/n tends to as n → ∞ (it is finite). However, for almost all x (in the sense of Lebesgue measure), (a_1(x) + · · · + a_n(x))/n tends to infinity. If we study (a_1(x) a_2(x) · · · a_n(x))^{1/n}, this converges to Khinchin's constant, 2.685 . . . .
Returning to T¹ and the map T x = 2x mod 1: the image of an interval is twice its original length, so in this sense length is not preserved. We are looking at things the wrong way; look at pre-images. The pre-image of an interval (on the y-axis) is two intervals on the x-axis, each of half the length. Thus, from this point of view, length is preserved. Consider the map T x = 1/x mod 1. If we look at this from the continued fraction point of view:
    T^n([a_1, a_2, . . . ]) = [a_{n+1}, a_{n+2}, . . . ].   (1.8)
With respect to the measure (1/ln 2) ∫_a^b dx/(1 + x) (this is the measure of [a, b]), this map is measure preserving.
Exercise 1.5.2. What would be the definition of a normal continued fraction? Most numbers' continued fractions should be normal.
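A small Python sketch (mine, not from the notes) that computes continued fraction digits via the Gauss map x ↦ 1/x mod 1 and looks at the geometric mean of the digits; floating-point precision only supports the first dozen or so digits, and the drift toward Khinchin's constant is slow, so the output is indicative only.

    import math

    def cf_digits(x, n):
        """First n continued fraction digits a_1, a_2, ... of x in (0, 1) via the Gauss map."""
        digits = []
        for _ in range(n):
            if x == 0:
                break
            a = int(1 / x)           # the digit
            digits.append(a)
            x = 1 / x - a            # Gauss map: 1/x mod 1
        return digits

    x = math.pi - 3                  # fractional part of pi
    a = cf_digits(x, 15)
    print(a)                         # starts 7, 15, 1, 292, ...
    gm = math.exp(sum(math.log(k) for k in a) / len(a))
    print(gm)                        # geometric mean of the digits; for typical x this
                                     # drifts toward Khinchin's constant 2.685... as n grows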

1.6 Billiards
Consider some closed polygon. A billiard travels with elastic collisions (angle of incidence equals angle of reflection). By irrational billiards we mean the angles are irrational (with respect to π); for rational billiards, each angle is a rational multiple of π. Much is known about rational billiards. The trajectory is not interesting between walls. If the polygon has finitely many sides, we can index each side with a number. Thus, similar to before, we have a sequence of numbers.

Consider two different trajectories and their orbital closures. When are those the same in the sense of Equation 1.2? Consider a triangle. We could have a periodic trajectory (because of some geometric theorem; see the works of Fagnano). If we perturb the orbit, we can get families of orbits. Consider x → x + α mod 1. Many things involving Diophantine approximations are related to this. Every point will have similar behavior. Look (again) at x → 2x mod 1. Certain points will be periodic (which ones?). Other points will have dense orbits (points whose orbital closure is everything). If you look at a triangle, people thought that if an orbit wasn't periodic, it would be dense. It turns out this need not be true: for special triangles with certain angles, there are trajectories which are dense in only parts of the triangle. What is chaos? When is a dynamical system chaotic? One belief is that if a system is chaotic, it has two types of points: a dense set of periodic points (hard to study by computer – will that point show the complexity of the system?), and other points (where slight differences quickly (exponentially) lead to very different behavior).


Chapter 2
Second Lecture: Tuesday, Jan 13th

2.1 Introduction

Lecturer: Vitaly Bergelson
On the face of it, symbolic dynamics only involves topology, and is suitable for minimal backgrounds. We have a finite alphabet Λ = {0, 1, 2, . . . , r − 1} (r symbols starting at 0). We often consider Λ^Z (two-sided sequences with values in our alphabet) and Λ^N (one-sided sequences with values in our alphabet). For probabilistic reasons, we denote our space (our Outcome Space) as Ω. For example, say we are in the coin-flipping business, and we flip the coin infinitely many times. We may rename and call the outcomes of each toss 0 and 1; thus all possible outcomes are Ω = {0, 1}^N. These are in a 1-1 correspondence with numbers in [0, 1). Thus, there is a connection between probabilistic trials and the numbers in [0, 1).
Ramsey Theory: if we finitely sub-divide the integers into sets, then at least one set is rich (in terms of patterns). The integers are a lattice – in at least one of the finitely many sets, we will still have some rich properties. We've talked about numbers in arithmetic progression. Another example is x + y = z. Finitely color the integers:
    N = ∪_{i=0}^{r−1} C_i.   (2.1)
Then we have
Lemma 2.1.1 (Schur 1916). For any finite partition of N (see above), at least one of the C_i contains x, y, z such that x + y = z.
What is the connection to symbolic dynamics? This is applicable to many problems of number theory (Roth's theorem on arithmetic progressions, Euler partition problems, and so on).
Claim 2.1.2. The set of all partitions of N into r cells is {0, 1, . . . , r − 1}^N.
Proof. Take any number. It is in one of the r cells. Color it with the appropriate color. The other direction is similar: given any sequence from such an alphabet, we can get a partition.
Symbolic space is encoding. The next step is seeing how the dynamics helps you get to what you wanted to understand. For example, consider
    (x_n)_{n∈Z} ∈ {0, . . . , r − 1}^Z = Ω_r.   (2.2)
Thus, we have a two-sided infinite sequence. We look at the shift-to-the-left operator. Starting with
    . . . x_{−2} x_{−1} x_0 x_1 x_2 . . .   (2.3)
we have
    . . . x_{−1} x_0 x_1 x_2 x_3 . . . .   (2.4)
This is σx, where x = (x_n)_{n∈Z}. The orbit closure is the closure of {σ^n x : n ∈ Z} ⊂ Ω_r.
Consider two different spaces Ω_2 (the space of all binary sequences) and Ω_3 (ternary), each with its own shift operator. Are these the same system? Are they isomorphic? Suppose we do coin tossing with a fair coin (could be a fair two-sided or three-sided coin). Maybe instead of Ω_3 we use Ω_6. Are these systems the same? In what sense? The only way to answer is to use entropy, which is a measure-theoretic creature.
We want to emphasize some facts about these spaces. Why is Ω = {0, . . . , r − 1}^N the same as the Cantor Set (which is the space of all numbers of the form Σ a_n/3^n with a_n ∈ {0, 2})? One needs to show the two metrics give homeomorphic spaces. What is less obvious, another exercise, is that
    ∀ r_1, r_2 : Ω_{r_1} ≈ Ω_{r_2} ≈ Cantor Set.   (2.5)
Recall the metric was Σ_n |a_n − b_n| / r^n.

2.2 Complexity

Lecturer: Ronnie Pavlov
Given a sequence u = (u_n)_{n∈N}, we define a word of length n to be a string of n symbols from our alphabet (which in this case is Λ = {0, 1}). A word w = w1 . . . wn is a subword of a sequence u if and only if there exists an N such that u_N u_{N+1} . . . u_{N+n−1} = w1 w2 . . . wn.
Definition 2.2.1 (Complexity).
    c_n(u) = # of distinct n-letter subwords of u.   (2.6)
Trivially,
    1 ≤ c_n(u) ≤ 2^n.   (2.7)
This follows from the fact that there must be at least one word (whatever starts the sequence), and there are at most 2^n words of length n.
Remark 2.2.2. When we say complexity, we often mean knowing all the c_n's. Note that c_n(u) is non-decreasing: consider any n-letter word; there is a letter after it, so there is at least one (n+1)-letter word, and we can do this for each of the c_n(u) words. Similarly, each n-letter word has at most two completions (our alphabet has just two letters). Thus, c_{n+1}(u) ≤ 2 c_n(u).
Example 2.2.3. Suppose u is quasi-normal (this means u contains every finite word). We can form such a word by just listing all the n-letter words: start with 1 letter, then 2 letters, and so on. (A normal word would mean each n-letter word occurs with the same frequency.) Here, c_n(u) = 2^n.
Example 2.2.4. Suppose u is periodic, so u = wwww . . . for some word w (say of period n). Computing the complexity for word lengths up to the period is hard. What if we have m ≥ n? An m-letter subword only depends on where we start within a copy of w. Thus, as there are at most n places to start, the maximum number of words is n; thus, for m ≥ n, c_m(u) ≤ n. Thus, these have bounded complexity. In fact,
Theorem 2.2.5 (Hedlund-Morse). If c_n(u) ≤ n for some n, then u is eventually periodic.


Note this is stronger than the previous example (which only gives c_m(u) ≤ n for m ≥ n).
Proof. Assume that c_{n+1}(u) = c_n(u). Thus, every n-letter word can be completed in exactly one way (or it would violate the monotonicity in Remark 2.2.2). This means that for every n-letter subword w of u, there exists a unique letter f(w) ∈ {0, 1} such that whenever w occurs in u, it is followed by f(w). This proves that u is eventually periodic. There are only finitely many n-letter words, so there are only finitely many subwords, so some n-letter word occurs at least twice (pigeonhole principle). Thus, there exists a w of n letters appearing twice in u. We have [w] . . . [w]. Since the sequence is infinitely long, we can assume the two w's do not overlap. But the one on the left must be followed by f(w), which is then followed by a fixed letter, and so on. They are all uniquely determined. Now the pattern continues, and we get uniquely determined letters between the two w's. Once we hit the second w, it is followed by f(w), and so on, and we get that same unique sequence of letters again, which is then followed by w, and so on. Thus, from this point onward, we are periodic.
So we have (eventual) periodicity if and only if the complexity is bounded. We can also do this for c_n(u) ≤ n for n large.
Question 2.2.6. What is the minimum complexity function for an aperiodic u? Clearly c_1(u) = 2, c_2(u) ≥ 3, and so on, and c_n(u) ≥ n + 1. The best would be objects that have c_n(u) = n + 1.
Definition 2.2.7 (Sturmian). A sequence u is Sturmian if c_n(u) = n + 1 for all n.
We have yet to show these exist. We have many equivalent definitions (some of these proofs are hard).
Definition 2.2.8 (Rotational Sequence). Fix an α ∈ (0, 1), α ∉ Q. Consider the unit circle, the one-dimensional torus. This is the same as the wrapped-around unit interval [0, 1). Go up the circle to α, and take the interval I0 = [0, α); let I1 = [α, 1). Continually rotate 0 by α. If after n steps the point lies in I0, then define u_n = 0; otherwise we are in I1 and set u_n = 1.
Definition 2.2.9 (Cutting Sequence). Take a grid of the two-dimensional plane, where the grid lines are at the integers. Thus, we have an infinite set of unit squares. Take an angle α that is not commensurable with 2π. Draw a line starting at the origin, going out at an angle of α. Every time we hit a vertical line write 1; every time we hit a horizontal line write a 0.

Claim 2.2.10. Cutting Sequences are Sturmian.
We simply state that cutting sequences cannot be periodic. We just need to show the complexity is bounded by c_n(u) ≤ n + 1.
Definition 2.2.11. A subword w of a sequence u is called right special if w0 and w1 are both subwords of u (i.e., you can complete the word w with both options).
To prove our claim, it is sufficient to show that there is only one right special word of any length n. Why? List the n-level subwords of u, say 001, 010, 100, 000 for some u. What letters can we follow these words with? If you can only complete these one way, then the sequence is periodic. So at least one can be completed two ways. If two could be completed two ways, there would be at least two more four-letter words than three-letter words; thus, the complexity would increase by 2 rather than by 1. The existence of one such right special word is guaranteed by the sequence being non-periodic; we must just show there is only one.
To any word w, associate a square path. For example, say we have 10010. Whenever we have a 1, put a square to the right; whenever we have a 0, put a square above. Why is this useful? This tells us when a word is a subword. As we go through this example, we cross a vertical line, then a horizontal line, then another horizontal, then a vertical, then a horizontal. This is the same as saying we can draw a line with slope α going through all of these boxes. So, a word w is in a cutting sequence with angle α if and only if there exists a line with angle α passing through its square path (entering in the lower left square, exiting in the upper right, never leaving the square path).
We have a similar definition of left special (w is left special if both 0w and 1w are subwords). The only way for a word to be left special is if its square path may be entered (or traced out) by a line passing through the bottom left corner. The angle of the line is fixed. The square path is dictated as soon as we start, and then we can go backwards as well. There is one left special word of any length. This does show that any sequence we create in this way will be Sturmian. We will not discuss the other direction.
Sturmian sequences are very interesting objects and can be generalized in many ways.
1. c_n(u) = n + 2. Very close to minimal. These end up not being too interesting.
2. Rather than tiling the plane with unit squares, we can consider n-dimensional cutting sequences, where we cover the cubes in three-space, say. We have to worry about intersecting the xy, yz and xz planes. For three dimensions, no matter how we choose the line (as long as it is irrational), we get c_n(u) = n² + n + 1 (note here the alphabet has 3 symbols and not 2).
3. We can take the circle rotation: take an n-dimensional torus and do a similar coding. Generalized rotation sequences.
Question 2.2.12. Why should we care about complexity? There are applications with computers. We can code any irrational number by a Sturmian sequence. Every real can also be represented by a binary expansion. Maybe Sturmians have advantages in transmission: if we send digits in 1000-digit blocks, we have to remember 2^1000 blocks, but Sturmian numbers have only 1001 different words. We can also define entropy, which is
    lim_{n→∞} (log c_n(u)) / n.   (2.8)

Two topological systems that are isomorphic to each other must have the same entropy. Thus, this can help us find out when two systems are not isomorphic. Note all of the systems we are looking at today have zero entropy!
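The rotational-sequence definition is easy to test numerically. The sketch below (my own, not from the lecture) builds a rotation sequence for an irrational α and counts distinct subwords in a long prefix; the counts should come out as c_n(u) = n + 1.

    import math

    def rotation_sequence(alpha, length):
        """u_n = 0 if n*alpha mod 1 lies in [0, alpha), else 1."""
        return [0 if (n * alpha) % 1 < alpha else 1 for n in range(length)]

    def complexity(u, n):
        """Number of distinct length-n subwords of the finite word u."""
        return len({tuple(u[i:i + n]) for i in range(len(u) - n + 1)})

    alpha = math.sqrt(2) - 1
    u = rotation_sequence(alpha, 20000)
    print([complexity(u, n) for n in range(1, 9)])   # expect 2, 3, 4, ..., 9 (i.e. n + 1)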


Chapter 3
Third Lecture: Jan 20th

3.1 Measure Zero Examples

Lecturer: Vitaly Bergelson
Symbolic dynamics: we encode from a finite alphabet, allowing us to make statements about the original system, which had no discreteness/dynamics of this kind. We will use a little measure theory.

3.1.1 Zero Measure on R
Definition 3.1.1 (Sets of Measure Zero). A set A ⊂ R has measure zero if for all ε > 0 there exists a finite or countable family of intervals I_n = (a_n, b_n) such that
1. A ⊂ ∪_n I_n;
2. Σ_n |I_n| = Σ_n |b_n − a_n| ≤ ε.
Example 3.1.2. The set A = {1, 2, . . . , 17} has measure zero. For any ε, consider I_n = (n − ε/(2·17), n + ε/(2·17)).
Example 3.1.3. The set A = {1, 2, 3, 4, . . . } has measure zero. About each integer n, make an interval of size ε/2^{n+1} centered at n.
Example 3.1.4. The set of rational numbers Q has measure zero. For we can order the rationals, say by {x_1, x_2, . . . }, and then argue as before (drawing balls of radius ε/2^{n+1} about x_n).

Question 3.1.5. Is there an uncountable set A ⊂ R with measure zero? Yes: the classical Cantor Set (see Appendix A). Any number in the Cantor Set can be written as x = Σ_n a_n/3^n, a_n ∈ {0, 2}.
Exercise 3.1.6. Prove the Cantor Set has measure zero.
Question 3.1.7. Consider decimal expansions which don't have the digit 7. Is this a set of measure zero? Yes, and clearly such a set is uncountable (it is the set of all words coming from an alphabet with 9 letters, which clearly contains all words coming from an alphabet with two letters). Let S7 be the set in the previous question. We claim S7 ≈ C, where C is the standard Cantor Set. By ≈ we mean there is a continuous map f : S7 → C that is 1-1 and onto, and f^{−1} is also continuous. (Note: do we need to assume f is continuous, or does this follow from the other assumptions?) For S7, note that if we construct the set along the lines of the Cantor Set construction, then the first level has length 9/10 (we lose 1/10); the next level will have length 9/10 of the first level, and so on. Thus, after n cuts, we will have something of size (9/10)^n, which shows its length tends to zero.
Definition 3.1.8 (A Cantor Set). Anything that is homeomorphic to the Cantor Set is called a Cantor Set.
Example 3.1.9. Sierpinski's Carpet: Consider a square. Split it into 9 equal squares. Remove the middle one. We now have 8 squares. Remove the middle squares from each of the 8 remaining, and so on. What we have here is basically C × C, where C is the Cantor Set. Thus, as each C ≈ {0, 1}^N, what we have is like {0, 1}^N × {0, 1}^N ≈ {0, 1, 2, 3}^N. Of course, we need to be careful in the above arguments, as I × I is not homeomorphic to I. One way to see this easily is topologically: remove one point from I, call this I′. This will look topologically different than removing a point from I × I (one is connected, one isn't).
Question 3.1.10. Similarly, for finite n we have C × · · · × C = C^n ≈ C. What about an infinite product? Is C ≈ Π_{i=1}^{∞} C_i, where each C_i is the Cantor Set?


3.1.2 Smooth Dynamics and Cantor: Logistic Family

Cantor Sets appear naturally when one encodes numbers (sequences of binary expansions, ternary expansions, and so on). They also occur in smooth dynamics. Dynamics is a set and a map that we iterate. Smooth dynamics will be a nice, smooth subset of R^n and a diffeomorphism which we iterate. Consider a map f_λ(x) = λx(1 − x). This is called the logistic family. It occurs naturally in mathematical biology: two types of fish, one eats the other, what are the populations as time passes? Fix the parameter for a moment, say λ = 5, and iterate the map. See which points stay in [0, 1]. In the first iteration, only points in the middle leave [0, 1]. Continuing this process, looking for what stays, we will find a Cantor Set. Denote such sets by C_λ.
Exercise 3.1.11. Show the above C_λ are homeomorphic to the Cantor Set.
Exercise 3.1.12. For λ > 4, what is the measure of the C_λ's? Hint: it is always of zero measure! It is easier to first try λ ≥ 10.
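A quick numerical illustration (my own sketch, not from the lecture) of the escape behind Exercises 3.1.11–3.1.12: sample points of [0, 1], iterate f_λ with λ = 5, and measure the fraction that has not yet escaped after n steps; it decays with n, consistent with C_λ having measure zero.

    def survives(x, lam, n):
        """True if the first n iterates of f_lam(x) = lam*x*(1-x) stay in [0, 1]."""
        for _ in range(n):
            if not 0.0 <= x <= 1.0:
                return False
            x = lam * x * (1 - x)
        return 0.0 <= x <= 1.0

    lam, samples = 5.0, 100000
    for n in (1, 2, 4, 8, 16):
        frac = sum(survives(i / samples, lam, n) for i in range(samples)) / samples
        print(n, frac)    # the surviving fraction shrinks with n, suggesting measure zero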

3.1.3 Measure Zero in R²
Consider one-dimensional Lebesgue measure. We defined what it means to be of zero measure, λ_1(A) = 0. What would it mean to be of zero measure in R²? How does an interval (in R) generalize to R²? We could use squares, rectangles, disks, even triangles!
Definition 3.1.13 (Measure Zero). A set A ⊂ R² has measure zero if for all ε > 0 there exists a finite or countable family of rectangles R_n = (a_n, b_n) × (c_n, d_n) such that
1. A ⊂ ∪_n R_n;
2. Σ_n |R_n| = Σ_n |b_n − a_n| · |d_n − c_n| ≤ ε.
Exercise 3.1.14. Show that if we used disks instead of rectangles we would have the same sets having measure zero.
Exercise 3.1.15. Show the Sierpinski Carpet has measure zero in R².


Exercise 3.1.16. Must any nice curve in the plane have measure zero in the plane? Let φ, ψ be continuous functions on [0, 1], and consider the curve (φ(t), ψ(t)). If both functions are continuously differentiable, then the curve has measure zero in the plane. What if it is just differentiable, but not continuously differentiable? Is being Lipschitz enough?

3.2

Measure Preserving Maps

Definition 3.2.1 (Measure Preserving). A map f : [0, 1] → [0, 1] (or R → R) is measure preserving if for all (a, b) ⊂ [0, 1] (or a subset of R), then |f −1 (a, b)| = |a − b|. Here |f −1 (a, b)| denotes the length of the pre-image of (a, b). We could also consider the torus, where we identify 0 and 1. Example 3.2.2. Consider x → x + α mod 1. Is this measure preserving? If we look at it on the square, it is clear what the pre-images are. If we look at it as the torus, we are rigidly rotating by α. This is a translation of an Abelian group – does this preserve measure, and if so, what measure? Answer: Haar measure. Example 3.2.3. Consider x → 17x mod 1 or x → 2x mod 1? These are all measure preserving, but these are 17 to 1 or 2 to 1. It is important to take preimages, as the pre-image (for 17) has 17 pieces, each piece 171th the size of the original. Example 3.2.4. Consider x → (2x + 1) mod 1, y → x + y mod 1.

(3.1)

Is this measure ¶preserving? We can regard this by using a matrix action with µ 2 1 , which has determinant 1. Does this map preserve operations matrix 1 1 on a torus? If this map is φ, we have φ(a + b) = φ(a) + φ(b) mod 1. Example 3.2.5. Consider the P Hilbert Space l2 , the space of all infinite sequences (a1 , a2 , . . . ) with ai ∈ R and i a2i < ∞. We define addition of two elements by pointwise addition. There is no invariant measure (that is not identically zero). Consider x → 2x mod 1. The point x =

1 2

quickly µ becomes ¶ trivial as we 2 1 iterate (and periodic, of course). For the map from , can we find a 1 1 periodic point? If the point has rational coordinates, must it have a periodic orbit? 21

µ

2 1 Exercise 3.2.6. Describe all periodic points arising from 1 1 We will find periodic and non-periodic points here are dense.

¶ . Generalize!

Definition 3.2.7 (Ergodic). A dynamical system is ergodic if it has a dense set of periodic points. µ ¶ 2 1+² What if we ²-perturb the map to . The map is no longer measure 1 1 preserving, and it is no longer continuous. It is still piecewise continuous. Consider the map x → x − x1 , which maps R → R. Boole (from Boolian Algebras)was interested in the following formula: ¶ Z µ Z 1 dx = f (x)dx. (3.2) f x− x R R Take any interval, see what the pre-images are. This map is measure preserving?

3.3

Measure Zero in {0, 1}N

Equivalently, think of measure zero on the Cantor Set – how would we define measure zero? From Lebesgue point of view, the Cantor Set has zero measure, so any subset has measure zero (a la Lebesgue). It would be great to be able to put a measure on the Cantor Set, so we can have a more refined measure of subsets. Consider C0 = {0, x1 , x2 , . . . } and C1 = {1, x1 , x2 , . . . }. It is natural to suppose each set has measure 21 . If we consider the first five entries, there are 25 possibilities. And so on. Let us call such sets cylinders. These are clopen sets (both open and closed: if C1 is open, C2 is closed, and vice versa).

3.4

Another Problem

Consider the unit interval I and the unit square I × I. We can identify I with {0, 1}N . Under such an identification, the unit square becomes {0, 1}N × {0, 1}N , which we know can be identified with {0, 1}N . From a symbolic point of view, these two sets are now the same (might need to choose the terminating or nonterminating expansion for certain numbers). So, what did we neglect when we encoded? Dynamics in the two cases might be the same when we deal with the core of the two sets. 22

Exercise 3.4.1. Construct a measure preserving (pre-image of interval has the same length or area) between two different objects. We want a measure preserving map from the square to the interval. If you succeed, not only are topological dynamics the same between the two spaces, but now measurable dynamics would also be the same. In measure preserving ergodic theory, we may think of our system as the interval [0, 1] (if we can show a measure preserving map from the square to the unit interval, should be able to find a similar one from a cube to a square.

3.5 Minimal System Definition 3.5.1 (Minimal System). Assume that X is a compact metric space and T : X → X is continuous. We do not assume T is invertible (ie, inverse continuous), though it often helps to think of T invertible. The pair (X, T ) is called a minimal system if for all x ∈ X, the orbit closure (T n x) = X. Example 3.5.2. Consider three points a → b → c, and c → a. This is a minimal system. Also can modify to having n points. Example 3.5.3. Consider rotation on the unit circle by an irrational angle α 6∈ Q (or not in 2πQ, depending on how we have normalized the circle). Consider two sided sequences in {0, 1}Z , say a = (ai )i∈Z . Consider the map σ : (ai ) → (ai+1 ) (shifting by one digit). What is (σ n a)n∈Z ⊂ {0, 1}. If we just take some rich sequence (that has all finite configurations).... Consider instead x = . . . 010101010101 . . . . This has an orbital closure of just two points. What is a criteria such that an x has its orbit minimal. Exercise 3.5.4. Give a criterion for a ∈ {0, 1}Z such that (σ n a)n∈Z is minimal. Exercise 3.5.5. Can you put in this space finitely many disjoint minimal closures? Countable many? Uncountable many? Remark 3.5.6. To put in uncountably many Cantor Sets in the unit interval, do it on the plane, and use the equivalence.

23

Chapter 4 Fourth Lecture: Jan 27th Lecturer: Bjorn

4.1

Horseshoes

S. Smale, 1967 (approx). He considers a hyperbolic map with complicated dynamics. In the 60s, when dynamics started, people were interested in what happens with small perturbations, what kind of assumptions one needs on the maps, what is the structure of dynamics (what persists under small perturbation). This map exhibits very complicated dynamics that exist under small perturbations (one of the first maps to have this property). Start with a square Q. Stretch it vertically and compress horizontally. Let the sides be ABCD, like DC AB

(4.1)

Now we have a long, thin rectangle. Now bend into a large horseshoe, ∩, with the ends A0 B 0 on the first piece, and C 0 D0 on the second arm. Put this back inside the original square. Let λ be the compression factor going from the square to the rectangle. Similarly, let µ be the expansion factor in going from the square to the thin rectangle. Let 0 < λ < 1 < µ. Consider a small region in the middle of the the thin rod; we bend that to make the horseshoe – everything else is linear (expand / contract). 24

We want to iterate this map (forward, or maybe also backwards). Very important: since the horseshoe is inside the square ABCD, some points are not available to apply f again. For example, part of it will look like

/ / / | | | | | | | | 0 A − B0

D A

\ \ \ | | | | | | | | 0 C − D0

C B (4.2)

Here, all these points are inside the square; but the bend of the horseshoe, or the end of the two arms in ∩ will be outside the square. Definition 4.1.1 (Maximal Invariant Set). The maximal invariant set is the set of all points p ∈ Q such that f n (p) ∈ Q for all n ∈ Z: S = {p ∈ Q : ∀n ∈ Z, f n (p) ∈ Q}. Question 4.1.2. What does S look like?

4.2

Pre-images f −1 (Q) = {p ∈ Q : f (p) ∈ Q}.

(4.3)

Thus, we need the points in the square which give the two parts of ∩ inside the square. Call these pieces v0 and v1 (v0 is the left part of ∩). Each of these comes from a horizontal piece in the unit square, thus we have two connected components, ui = f −1 (vi ), and f −1 (Q) = u1 ∪ u2 . Say we want f −2 (Q). We need the pre-images of the strips u0 and u1 ; in other words, f −2 (Q) = f −1 (u0 ) ∪ f −1 (u1 ). First, we need to intersect the horseshoe. As the ui are two horizontal lines, we end up with four pieces. Now we do the same procedure as before. What we 25

have is four pieces on the unfolded, thin vertical strip. Rectangles on the vertical strip come from horizontal strips in the square. Thus, if instead of four boxes we had two boxes (as before), we’d have two horizontal strips. Now, we have four boxes – we end up with four horizontal strips. These are in the strips from before, with some of the middle removed. The thickness of these horizontal strips is like 1 . Call these pieces u10 , u11 , u01 , u00 . µ We took something that was connected, and to get f −1 (Q) we had two connected pieces; to get f −2 (Q), we have four connected pieces. As we continue, each horizontal strip is split in two. By induction, one can show that ua0 ,a1 ,...,an = ua0 ∩ f −1 (ua1 ) ∩ f −2 (ua2 ) · · · ∩ f −n (uan ), aj ∈ {0, 1}.

(4.4)

We have 2n+1 sets, horizontal strips of height approximately µ1n . For a specific label (as n → ∞), we get a horizontal line. Thus, there is an equivalence between these horizontal lines and the base two expansion of numbers (modulo writing certain numbers two ways), or a Cantor Set. Choose a sequence (an )n≥0 with each an ∈ {0, 1}. We have U(an )n≥0

=

\

f −1 (uan )

n≥0

= =

{p ∈ Q : ∀n ≥ 0, f n (p) ∈ uQn } a horizontal line.

(4.5)

Thus, the union over all sequences gives us a Cantor Set crossed with a horizontal line.

4.3 Backwards Time We just did n ≥ 0, the future. Now we want to look at the past. We will now get vertical strips, basically Cantor cross Vertical Sets. If we want to prescribe both past and future, it would look like vertical lines intersecting horizontal lines. These will be unique points of intersection. We now have Q ∈ {0, 1}Z , (Qn )n∈Z , Qn ∈ {0, 1}. We have a map τ Q 7→ τ (Q) ∈ Q, 26

(4.6)

where τ (Q) is the unique point p ∈ Q with f n (p) ∈ uQn for all n ∈ Z. Thus, τ maps {0, 1}Z into the maximal invariant set. A little work shows that we have found all the points in the maximal invariant set. Thus, τ is onto, one-to-one. Is it continuous? If two sequences are close, they will agree for the first n symbols forward and backward. Consider ua0 a1 ...an ; since a and b agree for the first n symbols (n + 1?), this equals ub0 b1 ...bn ; this gives a 1 ; similarly, look at negative times, and get a vertical strip of strip of thickness µn+1 1 thickness µn+1 . Thus, the map is continuous. To be safe, let us see if the inverse is continuous. Since we have a compact metric space, the inverse is automatically continuous (a 1-1 onto map from a compact metric space to a metric space has a continuous inverse). Thus, as τ −1 is continuous, the maximal invariant set is actually a Cantor Set, and is parametrized by {0, 1}Z . What do the dynamics on the maximal invariant set look like?

4.4 Dynamics on the Maximal Invariant Set On the space S (the maximal invariant set, equivalent to {0, 1}Z ), we have the shift operator σ. If a = (an )n∈Z , then σa = (an+1 )n∈Z . Are there periodic orbits? Yes. To get a periodic orbit of period 2, just look for any sequence periodic of period 2, and so on. We have complicated dynamics. Check and see that the shift operator commutes with f : {0, 1}Z ↓τ S

−→σ −→f

{0, 1}Z ↓τ S.

(4.7)

The above is a commuting diagram; we have f (τ (a)) = τ (σ(a)).

(4.8)

Writing everything carefully will lead to the above. Initially we had a very complicated set which we didn’t know much about. Now we know exactly what it is doing – action of shifts on sequences. Anything we want to know about S can be gleaned from studying shifts on {0, 1}Z . If we make a small perturbation in a C 1 way (change f to a new function fe which is close to f , and whose derivative is 27

close to f 0 ), then the new set S 0 will be close to S. We need to control contractions, which is why we need information on the derivatives.

28

Chapter 5 Fifth Lecture: February 3rd, 2004: Period 3 Implies Chaos 5.1

Preliminary Lemmas

We will prove on part of the theorem of Li and Yorke from their paper Period Three Implies Chaos, namely, that under a few fairly weak conditions, for any n ∈ N , there exists a periodic point (depending on n) that has period n. As it turns out, this is a special case of a more general theorem of Sharkovsky, and this too will be mentioned. Also, as an example of a function having a periodic point of period 5 but not of period 3 will be given. Theorem 5.1.1 (Li and York). Let J be an interval and let F : J → J be continuous. Assume there exists a point a ∈ J for which the points b = F (a), c = F 2 (a) and d = F 3 (a) satisfy d ≤ a < b < c or d ≥ a > b > c,

(5.1)

then for every k = 1, 2, . . . there is a periodic point in J having period k. Lemma 5.1.2. Let G : I → R be continuous, where I is an interval. For any compact interval I1 ⊂ G(I) there is a compact interval Q ⊂ I such that G(Q) = I. Proof. Let I1 = [G(p), G(q)], p, q ∈ I, suppose p < q. Let r ∈ I be such that G(r) = G(p) for all t ∈ I such that G(p) = G(t), t ≤ r. Let s ∈ I be such that G(s) = G(q) for all u ∈ I such that r ≤ u, G(u) = G(q), s ≤ u. Let Q = [r, s], then G(Q) = I1 . 29

Lemma 5.1.3. Let F : J → J be continuous and let {In }n≥0 be a sequence of compact intervals with In ⊂ J and In+1 ⊂ F (In ), and In+1 ⊂ F (In ) for all n. Then there exists a sequence of compact intervals Qn such that Qn+1 ⊂ Qn ⊂ I0 and F n (Qn ) = In for all n ≥ 0. For any x ∈ Q = ∩Qn we have Fn (x) ∈ In for all n. Proof. Set Q0 = I0 ; this proves the basis case. We proceed by induction. Assume Qn−1 is already defined. Then F n−1 (Qn−1 ) = In−1 , In ⊂ F (In−1 ) = F n (Qn−1 ). Define G : Qn−1 → R by G = F n . By the previous lemma, there is a compact interval Qn ⊂ Qn−1 such that F n (Qn ) = G(Qn ) = In . Lemma 5.1.4. Let G : J → R be continuous. Let I ⊂ J be a compact interval. Then there is a point p ∈ I such that G(p) = p. Proof. Assume I = [β0 , β1 ]. Let β0 = G(α0 ) and β1 = G(α1 ). As α0 ∈ I, we have α0 ≥ β0 = G(α0 ). This implies α0 − G(α0 ) ≥ 0. Similarly, α1 ≤ β1 = G(α1 ). Thus, α1 − G(α1 ) ≤ 0. By the intermediate value theorem, there exists a p ∈ I such that p − G(p) = 0, or G(p) = p.

5.2

Proof of the Theorem of Li and Yorke

Assume F 3 (a) = d



a


c is similar. We have a cycle x → F (x) → F 2 (x) → x → F (x) · · · . We have x < F 2 (x) < F (x) or x < F (x) < F 2 (x). Define the following sequence of intervals. Fix an integer k ≥ 1. For k > 1, let In Ik−1 In+k

= = =

L, n ∈ {0, 1, . . . , k − 2} K In ∀n > k − 1,

(5.3)

where K = [a, b] and L = [b, c].

(5.4)

For k = 1, take In = L for all n. We want to apply the previous lemmas; thus, we need to make sure the hypotheses are satisfied. Look at the square with d < a < b < c. Since F is 30

continuous, F (L) ⊃ K, L and F (K) ⊃ L. Hence {In }n≥0 satisfy In ⊂ F (In−1 ) for all n. Let Qn be as in the previous lemmas. Then F n (Qn ) = In for all n, Q0 = I0 = L, Qn+1 ⊂ Qn ⊂ Q0 = I0 for all n. In particular, F k (Qk ) = Ik = L = Q0 ⊃ Qk .

(5.5)

Thus the previous lemmas implies (since Fk is continuous, intervals compact) that there exists a pk ∈ Qk ⊂ L such that F k (pk ) = pk . We claim that pk point is not fixed by anything less than k; this will imply pk is periodic of period exactly k: Claim 5.2.1. F k−1 (pk ) 6= pk for all i ∈ {1, . . . , k − 1}. Proof.

( F k−i (Qk−i ) = Ik−i =

L K

if i < k − 1 if i = k − 1

(5.6)

Thus, pk ∈ Qk ⊂ Qk−i ⊂ L, i ∈ {1, . . . , k − 1}.

(5.7)

pk , F (pk ), F 2 (pk ), . . . , F k−2 (pk ) ∈ L.

(5.8)

Therefore, We have F k−1 (pq ) ∈ K and F k (pk ) ∈ L. If the period of pk is less than k, one of the earlier iterates would be in K; ie, F k−i (pk ) ∈ K for some i < k − 1. This implies F k−1 (pk ) = b = K ∩ L. This implies that F k (pk ) = F (b) = c.

(5.9)

F k+1 (pk ) = d < c.

(5.10)

Therefore, Now, d ∈ L, but clearly this point isn’t in L as L = [b, c]; contradiction. Hence pk has period exactly k.

5.3

General Result

Let f : I → R be a continuous function. Define an order on the natural numbers by 3 ¤ 5 ¤ 7 ¤ · · · . Then the next round is ¤2 · 3 ¤ 2 · 5 ¤ · · · . Then ¤22 · 3 ¤ 22 · 3 ¤ 22 · 5 ¤ · · · , and so on. If f has a point of period n ¤ k, then f has a point of order period k. 31

We won’t prove this, but we will describe how the proof goes. See Dynamical Systems by Clark Robinson (pages 66 to 70). We have an interval I. Within it, let x ∈ I be a period n point; we will assume n is odd. Define the interval J = [min{O(x)}, max{O(x)}],

(5.11)

where {O(x)} is the orbit of x. Thus, we have an ordering on the real-axis like f n−1 (x) < · · · < f 4 (x) < f 2 (x) < x < f (x) < f 3 (x) < · · · < f n−2 (x). (5.12) Why is f 2 (x) to the right of x? We choose a point in the orbit (from the set of n points, the iterates) with this property. Thus, we have a sequence of intervals I1 − [x, f (x)], I2 = [f 2 (x), x], I3 = [f (x), f 3 (x)], and so on. By I1 → I2 we mean that I2 ⊂ f (I1 ) (and so on). We have I1 covers itself, and then I1 → I2 → I3 , and so on, to In−2 → In−1 → I1 . We also have In−1 → In−2 , . . . , In−1 → I5 , In−1 → I3 . So suppose we have a setup like I1 → I2 → · · · → Ik → Im → I1 , such that no shorter loop works. Then there exists an x0 with period m. Suppose now want a period smaller than n and even. For period 2, use a loop starting at In−2 . For period n − 1, go from In−1 to I3 and then continue along. To get larger, start at I1 , go around to In−1 , then to I1 , then go through I1 as many times as needed. n For other numbers, we can look at f 2 or more generally f 2 instead of n.

5.4

Handout from Bruce

Lemma 5.4.1. Let F :J→J be continuous and let {In }∞ n=0 be a sequence of compact intervals with In ⊆ J and In+1 ⊆ F (In ), for all n. Then there is a sequence of compact intervals Qn such that Qn+1 ⊆ Qn ⊆ I0 and F n (Qn ) = In for n ≥ 0. n ∀x ∈ Q = ∩∞ n=0 Qn we have F (x) ∈ In for all n. Proof. Set Q0 = I0 , so that F 0 (Q0 ) = I0 . Now assume that Qn−1 has been defined, with F n−1 (Qn−1 ) = In=1 . Therefore, In ⊆ F (In−1 ) = F n (Qn−1 ). Define G:Qn → R by G = F n , and so by Lemma 0 there is a compact interval Qn ⊆ Qn−1 such that F n (Qn ) = In . Lemma 5.4.2. Let G:J→R be continuous. Let I ⊆ J be a compact interval. Assume I ⊆ G(I). Then there is a point p ∈ I such that G(p) = p. 32

Proof. Assume I = [β0 , β1 ]. Pick α0 , α1 ∈ I, such that G(α0 ) = β0 and G(α1 ) = β1 . Thusly, α0 − G(α0 ) ≥ 0, and α1 − G(α1 ) ≤ 0. So by the Intermediate Value Theorem, G(β) − β must be 0 for some β ∈ I. Example: Period 5 does not imply period 3 Define F :[1, 5]→[1, 5] be defined such that F (1) = 3, F (2) = 5, F (3) = 4, F (4) = 2, and F (5) = 1, and on each interval [n, n + 1] assume F is linear, n ∈ {1, 2, 3, 4}. As F 0 (1) = 1, F 1 (1) = 3, F 2 (1) = 4, F 3 (1) = 2, F 4 (1) = 5, and F 5 (1) = 1, we have period exactly 5 (ie, no smaller number works for the period of 1). r r r r r

We show there cannot be any points of period 3. We have F 3 ([1, 2]) = F 2 ([3, 5]) = F 1 ([1, 4]) = [2, 5],

(5.13)

and so there are no fixed points on [1, 2]. Similarly, F 3 ([2, 3]) = F 2 ([4, 5]) = F ([1, 2]) = [3, 5]

(5.14)

F 3 ([4, 5]) = F 2 ([1, 2]) = F ([3, 5]) = [1, 4],

(5.15)

and so there are no fixed points in these intervals. However, F 3 ([3, 4]) = F 2 ([2, 4]) = F ([2, 5]) = [1, 5],

(5.16)

and as [3, 4] ⊆ [1, 5] there must be a fixed point on [3, 4]. To see that we do not in fact have a point of period three, we will show that there is a single fixed point p, and it is also a fixed point of F (a no-no, as a fixed point is not a point of period 3). 33

Let p ∈ [3, 4] be a fixed point of F 3 . We will first show that F (p) and F 2 (p) are also in [3, 4]. Note that F (p) ∈ [2, 4]. If F (p) ∈ [2, 3] then F 3 (p) ∈ [1, 2], thus defeating the definition of “fixed point”. Ergo F (p) ∈ [3, 4]. From that, we know F 2 (p) ∈ [2, 4]. If F 2 (p) ∈ [2, 3], then F 3 (p) ∈ [4, 5], which again makes no sense. We defined F linearly on [3, 4], and in fact restricted to [3, 4] we see that F (x) = 10 − 2x. Looking at F 3 (x) restricted to the set of all fixed points, we get F 3 (x) = 30 − 8x. Solving for the fixed point here we get a unique fixed point of 10 . However, when we look at F itself we get a unique fixed point of 10 also. 3 3 Indubitably, we cannot have period three.

34

Chapter 6 Sixth Lecture: February 10th, 2004 Lecturer: Sasha Leibman

6.1

Subshifts and Languages

Let A Ω

= =

{a1 , . . . , ar } AN = {x : x = x1 x2 x3 . . . , xi ∈ A}.

(6.1)

Let

1 , k smallest integer such that xk 6= yk . (6.2) k For example, let A = {a, b, c}. Consider three circles xa , xb , xc . Inside each have three more circles (in xa would have xaa , xab , xac , and so on), and we have maps from circles to circles. Let dist(x, y) =

T (x1 x2 x3 . . . ) = x2 x3 . . .

(6.3)

This is a shift operator. Definition 6.1.1 (Subshift). A subshift is a closed T -invariant subset X of Ω: T (X) ⊂ X, or T |X : X → X. Let A+ = {finite words over A}.

35

(6.4)

For w ∈ A+ , let w = x1 x2 . . . xk ; we say the length of w is k, denoted |w| = k. For x ∈ Ω, let L(x) = {all subwords of x} ⊂ A+ . (6.5) Definition 6.1.2 (Language of X). For a subshift X, denote the Language of X by [ L(X) = L(x). (6.6) x∈X

Claim 6.1.3. X is determined by its language, L(X). Proof. Thus, x ∈ X if and only if L(x) ⊂ L(X). One direction is trivially: clearly if x ∈ X, then L(x) ⊂ L(X). For the other direction, assume L(x) ⊂ L(X). Let x = x1 x2 . . . xi . . . . For w ∈ L(x), there exists a y ∈ X such that w ⊂ y (by w ⊂ y we mean w is a subword of y). Thus, if w = x1 . . . xi , we can write y = y1 . . . yk x1 . . . xi yk+i+1 . . . . Then T k y = x1 . . . xk yk+i+1 . . . and 1 dist(x, T k y) ≤ i+1 . As X is a subshift, T k y ∈ X, so x ∈ X.

6.2

Generating Subshifts

Let x ∈ X, let Orb(x) = {T n x}∞ n=0 , X = Orb(x).

(6.7)

We have L(X) = L(x) in this case. Thus, y ∈ X if and only if L(y) ⊂ L(x). Corollary 6.2.1. If X is a subshift, and x ∈ X, then Orb(x) is dense in X if and only if L(x) = L(X). Let x = abaabaaabaaaab . . .

(6.8)

What is X = Orb(x)? Clearly, the sequence of all as is in it, all as and one b, and the shifts of the original sequence. If there are two bs, we must be a shift of the original x. Thus, X = Orb(x) equals Orb(x) union all sequences with exactly one b.

36

6.3

Minimal Subsets

Let X be a topological space that is a compact space as well, with operator T as before. Consider X1 = Orb(x) for x ∈ X. Then X1 is a closed, T -invariant subset. It may be the case that X1 contains a smaller closed T -invariant subspace X2 , and perhaps this process continues infinitely: · · · X3 ⊂ X2 ⊂ X1 ⊂ X.

(6.9)

As the sets are compact, ∩Xi 6 φ. There thus exists a minimal closed T -invariant subset Z ⊂ X. It is easier to deal with minimal sets, as they have good properties. If X is minimal (no proper closed T -invariant subsets), then for any x ∈ X, Orb(x) = X. In our situation, how can we determine if its subshift is minimal? We can look at its language. Lemma 6.3.1. X is minimal if for any word w ∈ L(X), there exists k such that for all u ∈ L(X) with |u| ≥ k, w ⊂ u; ie, if we fix a word w, any long word u contains w somewhere. Alternatively, if x ∈ X, then the word w occurs in x with bounded gaps between occurrences. Proof. Call the above condition (∗). If (∗) holds, let w ∈ L(X). Then for all x ∈ X, w ∈ L(x). So, L(x) = L(X). If (∗) doesn’t hold, then there exists w ∈ L(X) such that ∃u1 , u2 , u3 , · · · ∈ L(X) with |un | → ∞ such that w 6⊂ un for all n. We use a diagonal process: u1 u2 u3

... ... ...

(6.10)

and so on. Choose a subsequence unk → x such that w 6∈ L(x)

6.4

Substitutions

We would like a method to produce in a natural way minimal subshifts. We can try to construct a language satisfying this property, but the construction will not be natural; there is, however, a natural way to construct. Definition 6.4.1 (Substitutions). A substitution is a mapping S : A → A+ such that s(a) = wa . 37

For example, if A = {a, b, c}, we consider maps a b c

7→ 7→ 7→

ab caa bcb.

(6.11)

We can get a map S : A+ → A+ by S(x1 , . . . , xk ) = wx1 wx2 . . . wxk .

(6.12)

This map interacts nicely with distance: the image of two words, under the substitution map, are at least as far apart as the initial two words. Assume for some a ∈ A, wa > 1 (otherwise the substitution map is just a permutation). Consider a, S(a), S 2 (a), S 3 (a), . . .

(6.13)

This gives a language L. It is enough to consider irreducible substitutions: Definition 6.4.2 (Irreducible Substitutions). S is an irreducible substitution if for all a, b ∈ A, there exists an n such that b ∈ S n (a). If S is irreducible, then the language L doesn’t depend on a (the choice of the first letter). Assume: There exists an a ∈ A such that S(a) = wa = aai1 . . . aik ; in other words, that there is a letter such that wa starts with a. Consider a letter b, say we have a substitution such that b →S c . . . →S d · · · →S · · ·

(6.14)

As we have only finitely many letters, eventually get some letter twice (say S k is the first that has one letter occurring twice as starting letter). Then we study S k . So, we assume S is an irreducible substitution and there exists an a such that S(a) = aw0 , with w0 a finite word. Then there is a language L that gives us X (substitution subshift). Claim 6.4.3. S has a fixed point in X. Proof. We have operators T : X → X (shift) and S : X → X (substitution). Consider a, we have the chain a, aw, S(aw) = S(a)S(w) = awS(w), then S(S(aw)) = awS(w)S 2 (w). The sequence stabilizes, and we have a fixed point p = awS(w)S 2 (w)S 3 (w) . . . ∈ X. Applying S we see S(p) = p. 38

(6.15)

6.5

Example: Morse Substitution

Example 6.5.1 (Morse Substitution). Let S(a) = ab, S(b) = ba. Then we have a, ab, abba, abbabaab, and so on. Let the limit point be p. Claim 6.5.2. From the definition of our language, X = Orb(p). We claim X is minimal. Proof. We need to show any word occurs uniformly. If a ∈ S k (b) for b ∈ A, then a ∈ S(a) ⊂ S k+1 (b). So, there exists k such that a ∈ S k (b) for all b ∈ A. Consider p = S k (p) = S k (p1 )S k (p2 ) . . . , pi ∈ A, (6.16) where p = p1 p2 p3 . . . and in each S k (pi ), we have a ∈ S k (pi ). Thus, a occurs in S k (p) with bounded gaps (syndetically). Take any w ∈ L(X). Then w ⊂ S m (a) for some m, and we have p = S m (p) = S m (p1 )S m (p2 ) . . .

(6.17)

Therefore, w also occurs with bounded gaps. Claim 6.5.3. X is uniquely ergodic. Proof. For all w ∈ L(X), there exists a dw ∈ [0, 1] such that #(occurrences of w in u) → dw as u ∈ L(X), |u| → ∞. |u|

(6.18)

Can we produce all examples (minimal ergodic) this way? The entropy 1 1 log (#{w ∈ L(X) : |w| = n}) = lim log pn (X). n→∞ n n→∞ n (6.19) For all substitution subshifts, there exists a c such that pn (X) < cn; thus, h(X, T ) = 0. h(X, T ) = lim

Example 6.5.4. Consider the Morse substitution a 7→ ab,

b 7→ ba.

(6.20)

This has fixed point p = abbabaabbaababba . . . How does S act on X? 39

(6.21)

Consider a word w. We have S(w) = w0 , where w0 is a word starting in an even position of p (we assume the first position of p is labeled as the zeroth). We have S(X) = X0 , the even points (all start at an even position). Then T (X0 ) = X1 is the odd points, and T (X1 ) = X0 . Here, X1 6= X0 – if we know the first four points, can determine which: even sequences have ab or ba in the first four; odd sequences have aa or bb. In general, compact minimal system X with map T . Consider action of T 2 . There are two possibilities. Either T 2 is minimal on X or X = X0 ∪ X1 such that T (X0 ) = X1 and T (X1 ) = X0 . Here, X0 , X1 are minimal subsets for T 2 . Consider T 2 : X0 → X0 . Replace ab 7→ c and ba 7→ d. Then we can write p = cddcdccddccdcddc . . .

(6.22)

It is the same sequence, the action of T 2 is the same as the action of T on the initial space. We have a commutative diagram X →T ↓S X0 →T 2 ↓S X00 →T 4

X ↓S X0 ↓S X00

(6.23)

and so on, where X0 ↓T X1

=

X00 ∪ X01

=

X10 ∪ X11 .

(6.24)

What is X0 ∩ X00 ∩ X000 ∩ · · · = S(x) ∩ S 2 (x) ∩ S 3 (x) · · ·

(6.25)

This equals {fixed points of S}, there are two fixed points, p and q = baababba . . .

(6.26)

We have X → Y , with two points in X fibered over one point in Y . On Y , S acts as a right shift: y1 y2 y3 . . . 7→ 0y1 y2 y3 . . . (6.27) 40

Considering y as a two-adic number (sequence of 0s and 1s), then T adds 1 to the sequence. For example, if we had 1110101 . . . , we would now have 0001101 . . . (carrying things to the right). We have X = Y × {a, b} (words starting with a, words starting with b). We can write any point in X as a pair x = (y, e) with e ∈ {a, b}. The action of T is T (y, e) = (T y, e + φ(y)),

(6.28)

where we are assuming a + 1 = b and b + 1 = a, φ : Y → {0, 1} is measurable.

41

Chapter 7 Seventh Lecture: February 17th, 2004: An Introduction to Perron-Frobenius Theory Lecturer Ronnie Pavlov

7.1

Shift Spaces and Types

Definition 7.1.1 (Shift Space). A shift space over A is a closed, shift-invariant set in AN . Example 7.1.2 (Golden Mean Shift). The golden mean shift space is the set of all 0 − 1 sequences without two consecutive 1s. This is an example of a shift of finite type (generalization of this idea. Given a set of finite words, we define our space to be all sequences which don’t include any of the forbidden words anywhere). Here, the forbidden word is 11; in general, we specify the forbidden words to describe a shift of finite type. It’s possible that many different words induce the same space. Consider instead the set of forbidden words F = {110, 111}. A little work shows this is the same as the golden mean shift. For any finite type shift, look at all possible sets of forbidden words. For each, look at the length of the largest word, and take the minimum over all sets of forbidden words that generate the same finite type shift. We call this number the

42

type of the finite type shift. For example, for the golden mean shift, we see it is of type 2. For a shift of type 2, for every letter we define what letters may follow it – we’ll primarily talk about shifts of type 2 today.

7.2 Edge Shifts Definition 7.2.1 (Edge Shift). Take a directed graph (set of vertices and edges, each edge has a direction from one vertex to another). The edge shift is the set of all words starting at some vertex and following the graph to a legal vertex. For example, the following is the golden mean shift: −→0 ·

· ª0 ←−1

(7.1)

Thus, we can get from one vertex to another, but will never have two ones in a row. Definition 7.2.2 (Topologically Conjugate). Two topological systems (X, T ) and (Y, S) (the first is the space, the second is a continuous self map, T : X → X and S : Y → Y ) are said to be topologically conjugate conjugate if there exists a φ : X → Y homomorphism onto such that the following diagram commutes X ↓φ Y

−→T −→S

X ↓φ Y

(7.2)

Claim 7.2.3. Any shift of finite type is conjugate to a Markov shift. Let F = {111}. It induces XF of finite type. Look at the three letters words

43

in XF , which we will label 000 001 010 011 100 101 110

= = = = = =

a b c d e f g.

(7.3)

Given word 000110010100 . . . , we don’t break into blocks of length 3, but rather do rolling. Thus, we get the word abdgebcf . . . . If we shift the first 000110010100 . . . , its the same as shifting abdgebcf . . . . Thus, we see it is sufficient to look at words of two symbols / shifts of type two.

7.3 Complexity Definition 7.3.1 (Complexity). The complexity of a shift space X is the number of n-letter words in L(X), the language of X. We denote the complexity by {cn (x)}n∈N . Definition 7.3.2 (Entropy). The entropy of a shift space X is log cn (X) . n→∞ n

h(X) = lim

(7.4)

This is an invariant of topological dynamical systems (ie, two topologically conjugate spaces have the same entropy). This measures the rate of growth of a dynamical system (measures chaosisity). We will discuss other notions of entropy later (measure theoretic). Suppose X is on some alphabet with k letters {1, . . . , k}. Then 1 ≤ cn (X) ≤ n k . Therefore, the entropy is in [0, log k]. Given a Markov shift (X a shift of finite type 2) on an alphabet A = {1, 2, . . . , n}, define its transition matrix B as a 0 − 1 matrix such that ( 1 if ij ∈ L(X) bij = (7.5) 0 otherwise 44

For the golden mean shift, we have B =

µ

1 1 1 0

¶ ,

(7.6)

which we recognize from studying Fibonacci numbers. Let bnij denote the entry in the ith row and j th column of B n . This equals the number of ways to get from i to j in n steps, or explicitly the number of n + 1 words in the langauge L(X) that begin with i and end with j. We find cn+1 (X) =

k X

bnij .

(7.7)

i,j=1

7.4

Irreducibility and Aperiodicity

Definition 7.4.1 (Irreducible). A matrix B ≥ 0 (non-negative) if for all i, j, there exists an n (which may depend on i, j) such that bnij > 0. This means that, starting from some letter i, we can get to some letter j. Consider the following transition matrix   0 1 0 0  1 0 0 0    (7.8)  0 0 0 1  0 0 1 0 A little work by looking at powers shows this is not irreducible. Definition 7.4.2 (Aperiodic). A transition matrix B ≥ 0 is aperiodic if for all i, gcd{n : bnii > 0} = 1. Claim 7.4.3. If B is aperiodic AND irreducible, then there exists an n such that bnij > 0 for all i, j in the alphabet and for all N > n. We don’t lose much by dealing with aperiodic and irreducible matrices. Why can we assume this? Suppose we are given a matrix which isn’t irreducible, for example,   1 1 1 0 1  1 1 0 1 1     0 0 1 1 0  (7.9)    0 0 1 1 0  1 1 1 0 1 45

Starting from states 3 or 4, can only go to states 3 or 4. This has classes like {1, 2, 5} (a class is a maximal set such that can get from any one to any other one by taking high powers). Given any matrix like this, can look at classes, look at entropy of each class. Consider µ ¶ µ ¶ 0 1 1 0 2 B = , B = . (7.10) 1 0 0 1 This is irreducible but not aperiodic. We are cycling back and forth between two vertices, like the following graph: −→ ·

· ←−

If the matrix is periodic, why doesn’t it have entropy zero? Consider   0 0 1 1  0 0 1 1     1 1 0 0  1 1 0 0

(7.11)

(7.12)

Here is a block periodic matrix – states 1 and 2 are forced to go to states 3 and 4, and states 3 and 4 are forced to go to states 1 and 2 (but have choice as to how you go to these).

7.5

Perron-Frobenius

Theorem 7.5.1 (Perron-Frobenius). If B is a non-negative, irreducible, aperiodic square matrix, then B has a real positive eigenvalue λ (called the PerronFrobenius eigenvalue) with the following properties: − → − 1. λ has strictly positive left and right eigenvectors l and → r. 2. |λ| > |µ| for any other eigenvalue µ of B. − → → r. 3. Any positive eigenvector of B is a multiple of l or − 4. For any matrix B 0 ≥ B coordinatewise, where B 0 has Perron eigenvalue λ0 , then λ0 ≥ λ with equality if and only if B 0 = B. 46

5. limn→∞

1 Bn λn

− → − →− → =− r l , where the vectors are normalized so that l · → r = 1.

λ is algebraically and geometrically simple. For a proof, see Introduction to Dynamical Systems by Brin and Stuck, or Symbolic Dynamics by Kitchens or An Introduction to Symbolic Dynamics and Coding by Lind and Marcus

7.6

Applications of the Perron-Frobenius Theorem

We will compute the entropy of shifts of finite type. We can compute the complexity of the golden mean shift by using elementary means. We have cn (X) is the n + 2 Fibonacci number (which is why we call this the golden mean shift). √ From the definition of the entropy, we find h(X) = log 1+2 5 . Consider the golden mean shift, X. It has the transition matrix µ ¶ 1 1 B = . (7.13) 1 0 The complexity is cn+1 (X) =

X

bnij .

(7.14)

By the Perron-Frobenius theorem, we know how these numbers behave. The √ 1± 5 eigenvalues of B satisfy √ det(B − xI) = 0, which gives 2 . Thus, the Per1+ 5 ron eigenvalue is λ = 2 . We can find the µ left and ¶ right eigenvectors as well a b (but won’t do in this case). Thus, λ1n B n → . Thus, c d cn+1 (X) Ã (a + b + c + d)λn ,

(7.15)

which gives us that n log λ + log(a + b + c + d) = log λ. n→∞ n+1

h(X) = lim

(7.16)

All we need here is that a + b + c + d > 0. Consider the substitution that takes 0 → 0010 and 1 → 1010. Starting with 0, consider iterates: 0, 0010, 00100001010100010, and so on. This string is 0, sigma(0), sigma2 (0), and so on. We won’t justify that the limit has the following property, but we will prove something about the outcome at each finite 47

stage. 0 is balanced towards 0s, and 1 is mapped to something with as many 0s as 1s. Thus, in the limit we expect there to be more 0s than 1s. We have µ ¶ 3 1 B = , (7.17) 2 2 where the 3 is the number of 0s in what 0 is mapped to; the 1 is the number of 1s in what zero is mapped to, and so on. We have bij is the number of occurrences of j in σ(i). − If we look at → v B n = (ab), then a is the number of 0s in σ n (0 and b is the n number of 1s in σ (0). We look at the characteristic polynomial, det(B − xI) = 0, or ¯ ¯ ¯ 3−x ¯ 1 ¯ = 0. ¯ (7.18) ¯ 2 2−x ¯ We find the eigenvalues are 4 and 1 (so the Perron eigenvalue is λ = 4). The ¡ ¢ − → → eigenvectors are l = ( 32 13 ) and − r = 11 (we normalized the first eigenvector so that the dot product is 1). We now have µ 2 1 ¶ Bn 3 3 = lim . (7.19) 2 1 n→∞ 4 3 3 Thus, we have

B 21 → lim − v · n = ( ), n→∞ 4 33 so about two-thirds of the entries are 0.

7.7

(7.20)

Entropy of Edge Shifts

Say all the edges have different labels – finite type, not a problem. What is far more interesting is to ask something slightly weaker. Say it is right-resolving (all the edges leaving each vertex must have different labels). The entropy will be the logarithm of the Perron eigenvalue.

7.8

Example from Vitaly µ

¶ 1 1 Consider the map from the 2-torus to itself given by . In this case, we get 1 0 the same value as the golden mean system. Are these two systems in some sense 48

isomorphic? They have similar matrices, same entropy. The 2-torus is a connected space, but the golden mean shift is on a Cantor set. Earlier in the year we talked about symbolic dynamics and encoding. Measure Theoretical formulations would allow us to have the two spaces equivalent up to sets of measure zero. Thus, we might find these two systems are so equivalent upon µ removing ¶measure zero sets. 2 1 Consider now the square of the previous matrix, . Can we establish 1 1 a correspondence with this on the 2-torus and the square of the shift operator in the golden mean shift space?

7.9

Handout from Ronnie Pavlov An Introduction to Perron-Frobenius Theory

Definition 1: A subshift or shift space on an alphabet A = {0, 1, . . . , n − 1} is a shift-invariant, closed subset of the full shift space AN . Example 1: One example of a subshift is the so-called golden mean subshift X on the alphabet {0, 1}, which is defined to be the set of all sequences which do not contain two consecutive ones. Definition 2: The language of a subshift X, denoted by L(X), is defined to be the set of finite words which appear as subwords of elements of X. Definition 3: A shift of finite type is any subshift on an alphabet A defined by specifying a finite collection of words on A, call this collection F , and then defining X to be the set of sequences on A which do not contain any member of F . Different collections F could induce the same shift of finite type X; the minimum over all F which induce X of the maximum length of an element of F is said to be the type of X. Example 2: Any of the collections F1 = {1100, 1101, 1110, 1111}, F2 = {011, 110}, or F3 = {11} would induce the golden mean shift defined in Example 1, implying that the golden mean shift is a shift of finite type. However, one sees with a moment’s thought that any collection F with this property must 49

have some word of length at least 2, and F3 provides a collection in which all words have length at most 2. This shows that in fact the golden mean shift is a shift of type 2. Definition 4: Given a directed graph G with edges taking labels in the alphabet A, we define the edge shift induced by G to be the set of all sequences e1 e2 e3 . . . where ei is the label of the ith edge traversed in some infinite walk on G, beginning at any vertex.

50

Example 3: One sees after a moment that the edge shift induced by the following directed labeled graph G is again the golden mean shift:

Exercise 1: Check that any edge shift is in fact a shift space. Exercise 2: Describe the edge shift induced by the following directed labeled graph. Is it a shift of finite type? Definition 5: The complexity of a shift space X is the set of numbers c1 (X), c2 (X), . . ., where ci (X) is the number of n-letter words in L(X). Definition 6: The topological entropy (hereafter just referred to as entropy) of a shift space X, denoted by h(X), is defined by log cn (X) n→∞ n That this limit always exists will not be proven here, see Peter Walters’ “An Introduction to Ergodic Theory” or Karl Petersen’s “Ergodic Theory” for a proof. h(X) = lim

Example 4: The entropy of any shift space X on an alphabet A of size k will be at most log k: there are only k n possible words of length n on A, so cn (X) ≤ k n . n) = log k for all n, and so that h(X) ≤ log k. This implies that log cnn (X) ≤ log(k n How does one go about computing the entropy of a particular shift of finite type? For this, we introduce some linear algebra. Definition 7: The transition matrix B of a shift X of finite type 2 on an alpabet A is defined as follows: if |A| = n, then B is an n × n matrix, where ½ 1 if ij ∈ L(X) bij = 0 otherwise Example 5: For the golden mean shift on the alphabet {0, 1}, the transition matrix is " # 1 1 B= 1 0 51

Exercise 3: Check that for a shift X of finite type 2, the number of words of length n beginning with i and ending with j is just the coordinate in the ith row and jth column of An . As a corollary, you get that cn (X) is the sum of the entries of An . We will momentarily state the main theorem in this talk, the Perron-Frobenius theorem on nonnegative matrices. First, we need two more definitions. Definition 8: A shift of finite type 2 is called irreducible if for any letters i and j in A, there exists some word w ∈ L(X) beginning with i and ending with j. Alternately, X is irreducible if for any i, j ∈ A there exists n such that the entry in the ith row and jth column of B n is nonzero. We call any nonnegative matrix B with this property irreducible as well.

52

Example 6: The matrix 

1 1 0 0 1



   1 0 1 0 0      B= 0 0 1 1 0     0 0 1 1 0    1 1 0 1 0 is not irreducible. Definition 9: A shift X of finite type 2 is called aperiodic if for any letter i in the alphabet A, gcd{n : there exists an n + 1-letter word beginning and ending in i} = 1. Alternately, one can define this in terms of the transition matrix: the matrix B is aperiodic if for any i, gcd{n : the entry in the ith row and ith column of B n is positive} = 1. Example 7: The matrix

" B=

0 1

#

1 0

is irreducible, but not aperiodic. Theorem 1: (Perron-Frobenius) If B is a nonnegative, irreducible, aperiodic, square matrix, then B has a real positive eigenvalue λ (called the Perron eigenvalue of B) with the following properties: (i) λ has strictly positive left and right eigenvectors l and r. (ii) |λ| > |µ| for any other eigenvalue µ of B. (iii) Any positive eigenvector of B is a multiple of ~l or ~r. (iv) For any matrix B 0 ≥ B coordinatewise, where B 0 has Perron eigenvalue λ0 , λ0 ≥ λ, with equality if and only if B 0 = B. n (v) limn→∞ Bλn = ~r ~l.

53

Applications of the Perron-Frobenius Theorem to symbolic dynamics: One of the main applications of the Perron-Frobenius theorem is computing the entropy of shifts of finite type. By Exercise 3 and (v) of the Perron-Frobenius theorem, one can easily see that the topological entropy of any irreducible aperiodic shift of finite type is just log λ, where λ is the Perron eigenvalue of the transition matrix. Example 8: Suppose that we want to compute the entropy of the golden mean shift. We have seen that the transition matrix for the golden mean shift is # " 1 1 B= , 1 0 √

which has Perron eigenvalue equal to the golden ratio γ = 1+2 5 . Therefore, we see immediately that the entropy of the golden mean shift is log γ. Example 9: This example deals with substitutions. Suppose I am given a substitution, such as σ:

0 −→ 0010 , 1 −→ 1010

and we wish to examine the words 0, σ(0), σ 2 (0), . . ., which approach a fixed point u of σ. One might want to know the frequency of zeroes and ones in the word σ n (0) for large n. Do they approach a limit? This question is also answerable by Perron-Frobenius. We create a matrix B where Bij is just the number of occurrences of the letter j in σ(i). Here, " # 3 1 B= . 2 2 With a moment’s thought, one sees that if we define the row vector ~v = (1 0), that the vector ~v B n has as its entries the number of zeroes and ones in σ n (0), respectively. However, we know by the Perron-Frobenius theorem that the asymptotic behavior of B n is easily determinable. Some computation shows that for the ma¡1¢ ~ trix B, λ = 4, l = (2 1), ~r = 1 .

54

So, by (v) of the Perron-Frobenius theorem, we see that limn→∞ " # 2 1 B= , 2 1

Bn 4n

=

and so that limn→∞ 4−n~v B n = (2 1), meaning that asymptotically, there are twice as many zeroes as ones in σ n (0). Example 10: We return to the edge shifts described by Definition 4. It will not be shown here, but the class of edge shifts strictly contains the class of shifts of finite type: every shift of finite type may be realized as an edge shift, but not every edge shift is a shift of finite type. The Perron-Frobenius theorem can prove helpful for calculating the entropy of an edge shift as well. Firstly, we note that if all of the labels of edges of our directed graph G are distinct, then the edge shift is just a shift of finite type 2, and our previous analysis will suffice to calculate the entropy. However, more is true: Theorem 2: If an edge shift has the property that all edges leaving each vertex have different labels (such an edge shift is called right resolving), then if we construct a transition matrix B where the states are the vertices of G, (in other words, if we label the vertices by 1, 2, . . . , n, then bij is just the number of edges from i to j), and if this matrix is irreducible and aperiodic, then the entropy of the edge shift is again just the logarithm of the Perron eigenvalue of B. Exercise 4: Using Theorem 2, calculate the entropy of the edge shift given in Exercise 2.

55

Chapter 8 Eighth Lecture: February 24th, 2004 Lecturer: Gerald Edgar

8.1

Symbolic Dynamics and Fractals: Introduction

Consider a Julia set. We have a transformation of C given by f (z) = z 2 + c, with c in the main cardiod of the Mandelbroit set. We can consider other quadratic polynomials, but they are conjugate to this. We can iterate, and see what happens. When c is close to 0, there will be an attracting fixed point (that is what it means to be in the main cardiod of the Mandelbroit set). There is a simple closed curve around the fixed point: if you are inside, you iterate to the fixed point; outside, you escape to infinity. If you are on the simply closed curve (the Julia set), interesting things happen. If we do f (z) = z 2 , the attracting point is the origin, boundary is a circle (and well understood!). For general c, what happens on the boundary is similar to what happens for f (z) = z 2 . If on the boundary (the Julia set), in any neighborhood you’ll have some points that iterate to the attractor, others that iterate to infinity. Can color all points going to infinity one color. Another scheme (less computer√intensive) is to consider the inverse map. Instead of f (z) = z 2 + c, look at ± z − c. Thus, we have two inverse maps (depending on how we choose the square-root). The situation is now reversed. If you are inside the Julia set, you should get closer to the boundary as you iterate. 56

We know a point on the Julia set – it has a repelling fixed point on the Julia set. Once we have a point, we choose one of the two inverse maps at random, and apply to the repelling fixed point. We keep iterating. Another thing we can do is to consider the symbolic dynamics. If we choose the square-root properly, we can get the dynamics we want. Take one square-root to be in the upper-half plane, the other in the lower-half plane (sometimes do right and left plane). Consider the circle. Start with a point, square, stay on circle. Can ask if still on top half or on bottom half, encode as 0 (top) and 1 (bottom), gives a symbolic dynamics. Similarly, for Julia sets we have a symmetric set. If we are in the upper half plane, encode with 0 else encode with 1. Again, get a string of 0s and 1s. When we have one map that we iterate, we call this a dynamical system. Using the two inverses gives what we call an Iterated Function System. For many purposes, can work with the map or the IFS interchangeably, though there are often advantages of working with one over the other.

8.2

Notation

More generally, consider a finite alphabet E. If n is a positive integer, let E (n) denote the words in the language of length n, and let E (ω) be the infinite onesided words in the language. E (0) = {Λ}, the empty word. Let E (∗) =

∞ [

E (n) ,

(8.1)

n=0

the set of all finite words (we do not include E (ω) . If we have a finite word α, we let |α| denote the number of letters of α. If we have a word of at least n letters, let α|n be the restriction (the first n letters). We have an ordering: α ≤ β if and only if α is an initial segment of β. E (∗) becomes a tree. For example, we would have a binary tree if E = {0, 1}. As we go down, we add more letters. In this case, E (ω) is made of two parts, the part below 0 and the part below 1; the part below 0 is made up of two parts, the part below 00 and the part below 01; the part below 01 is made up of two parts, the part below . . . . We have cylinders [α] = {ασ : σ ∈ E (ω) } for α ∈ E (∗) . This is a base for the topology. We have a metric ρ(σ, τ ) = 2−k where |σ ∧ τ | = k (the first place they disagree). We have a shift T which corresponds to T (eσ) = σ where e ∈ E is one letter, σ ∈ E (ω) . We have a family of right shifts Se (σ) = eσ, again e ∈ E a letter and σ ∈ E (ω) . 57

We have an Iterated Function System: we have a complete metric space X (often it will be a Euclidean space), with maps fe : X → X for each letter e; this collection of maps is called an IFS with respect to X. We will have a contraction map ρ with ρ(f (x), g(x)) ≤ rρ(x, y) with r ∈ (0, 1); it is a similarity if instead we have ρ(f (x), g(x)) = rρ(x, y) with r ∈ (0, 1). Let [ F (E) = fe (E), (8.2) e∈E

where F : K(X) → K(X). We have an attractor or invariant set [ K = fe (K), e∈E

which is non-empty and compact. This attractor set exists and is unique.

58

(8.3)

8.3

Example: Eisenstein Fractal

Eisenstein Fractions: four maps from the plane to itself. Below is the unique attractor set. This is symbolic dynamics on four symbols: doing an expansion in base −2 with four digits (0, 1, a, b). Some points will have three different expansions in this base (similar to real numbers where we have some numbers with two expansions, in the plane will have some numbers with three). The translates of this tile the plane (but with a fractal boundary!).

59

8.4

Example: McWartor’s Pentigree

Another example is McWarter’s Pentigree. Need to shrink by just enough so that they touch. Empty spaces, doesn’t tile the plane. Can define six maps of the plane to itself, this is the unique non-empty compact set such that equal to the union.

8.5

Example: Cantor Set

The Cantor set is a subset of the real line. We’ll consider subsets of [0, 1]: [0]−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−[1] Will look at a sequence of approximations to the Cantor set. Let C0 = [0, 1]. Remove the middle third, and get C1 = [0, 13 ]∪[ 23 , 1]. Is a closed interval, keep endpoints.

60

[0] − − − − − − − −[1/3]

[2/3] − − − − − − − −[1]

Continue the process. Next is · ¸ · ¸ · ¸ · ¸ 1 2 1 2 7 8 C2 = 0, ∪ , ∪ , ∪ ,1 . (8.4) 9 9 3 3 9 9 In general, Cn is the union of 2n closed intervals, each of size 3−n . Note C0 ⊃ C1 ⊃ C2 ⊃ · · ·

P∞

(8.5)

a We have two maps: x3 and x+2 . Consider a sequence (aj 0 ↔ j=1 3jj with 3 aj ∈ {0, 2}. For each aj , we either take the left map x3 or the right map x+2 . 3 . Every number has an expansion, only a few points Consider now x2 and x+1 2

have two expansions. A square in the plane can be thought of as the union of four parts, each shrunk by a factor of 12 . The square is the unique compact set, image of the four sets. We have a language with four symbols to represent the IFS. Another example is the Sierpinski gasket.

We have three maps, with the mapping not well defined at some points. The IFS is fine, but the corresponding dynamical system requires some adjustment. Let J = E (ω) . we h ave a map from J →θ K, where for α = e1 e2 · · · en we have the composition fα = fe1 ◦ fe2 ◦ · · · ◦ fen , and we have θ(σ) =

∞ \ n=0

61

fσ|n (K).

(8.6)

Thus, we have the situation (commutative diagram) σ∈J

−→Se

↓θ

J ↓θ

θ(σ) ∈ K

−→fe

(8.7)

K

We call K the attractor or invariant set of the IFS. We can call the string the address of the point. Some points may have more than one address (similar to how some numbers have multiple decimal expansions). Note T ◦ Se is the identity (we return to where we started).

8.6

Dimension

There are two types of dimensions that we will discuss; the IFS are so nice that these two will be the same. For more detail, see the Fractal Working Group (Ohio State, Fall 2003). The first is the box dimension. Let N² (K) be the number of boxes in an ²-grid that meet K. We define the Box Dimension by log N² (K) , ²→0 log 1²

lim

(8.8)

when the limit exists of course. More generally, for any ² > 0, let N² (K) = minimum number of sets of diameter ≤ ² required to cover K. (8.9) As ² decreases, this will usually tend to infinity. We consider the same limit as before (when it exists), and call this the box dimension. Note that, as K is compact, we always have such a covering for each ². See also the works of Bouligand. Hausdorff Dimension Consider a set E ⊂ Rd . We will consider k ≥ 0 (not necessarily an integer). Given an ² > 0, we want E ⊂

∞ [

Ai ,

(8.10)

i=1

where each set Ai has diameter at most ². Consider ∞ X

(diam(A))k .

i=1

62

(8.11)

We let H²k (E) be the infimum. We are interested in the case ² → 0, and call this the k-dimensional Hausdorff measure. If k is too small, get ∞; if k is too big, get 0. There should be some critical value where it jumps from 0 to ∞, and we call this the Hausdorff dimension. This k-dimensional measure should be good if we have a k-dimensional set. Consider the following: for Eisenstein fractions, we have µ ¶d 1 4· = 1. (8.12) 2 Then d = 2, tiles the plane. This comes from four maps, each having ratio 12 . We want an exponent d such that we have the equality above, and in this sense (similarity dimension), we have dimension 2. Anything that tiles the plane should be 2-dimensional. If it has dimension less than 2, countably many copies shouldn’t tile. For the McWorter pentigree, we have à √ !d 3− 5 6 = 1. (8.13) 2 We get d = 1.86 . . . . The space J is the union of Se (J) for each e ∈ E: [ J = Se (J).

(8.14)

e∈E

If we have an IFS with ratio r, instead we should use rk : ρ(σ, τ ) = rk . So, shift multiplies all distances by multiple r (putting same letter in front, if before agreed for first m digits, now agree for the first m + 1 digits). The point is the space J can be covered by cylanders [α] each of diameter rn . If E has s letters, and |α| = n, how many cylanders do we need to cover all of J? It is sn , so for the box dimension we have log sn log s lim = = d. (8.15) −n log r log 1r We now have a compatible system (commutative diagram) J

−→Se

↓θ K

J ↓θ

−→fe 63

K

(8.16)

We now want to solve the similarity dimension problem, X res = 1.

(8.17)

e∈E

The two calculations could be different, but we get the same answer this time (for box dimension and Hausdorff dimension). If we have a finite string α = e1 · · · en , we get r(α) = re1 · · · ren . Let α be the longest common prefix of two strings σ and τ : α = σ ∧ τ , with ρ(σ, τ ) = r(α). Go down binary tree, get a set of cylinders that cover space, see how many cylinders of size ² are needed. This will give us the box dimension. To get the Hausdorff dimension, can use sets of different dimension. A row in the binary tree gives us a way to cover with cylinders. Will get 1 for any row (the way the diameters work: diam(αe) = diam(α) · re .

64

8.7

Heighway’s Dragon

We can analyze this in a way like self-similar. Think of the boundary as being made up of a left (A) and a right (B) part. The boundary is a closed curve, but not simple closed curve (necks). We can find the dimensions of A and B.

65

For B, it turns out that it is made up of two shrunken As, rotated, shrunk by a half. Looked at in another way, can get that B is made up of a B and an A, each shrunk by √12 . Can look as a multi-graph: Have nodes at A and B, have an arrow from A to itself of weight √12 ; an arrow from A to B of weight √12 , and then arrows from B to A of weight 12 . Not self-similar in the usual sense; often called graph self-similar. Our matrix has two rows and two columns:  ³ ´ ³ ´  d d A(d) = 

√1 ¡ 21 ¢d 2 2

√1 2



(8.18)

0

here A(d) is the adjacency matrix, dimension d; do not confuse A(d) with node A. In the above, 0 < d. In some sense, we want the above matrix to equal 1. This is a non-negative matrix. To such matrices, there is a nice theory due to Perron-Frobenius, which describes how the eigenvalues of such a matrix can behave. We have a positive eigenvalue large that the absolute value of everything else. Often the largest eigenvalue is called the spectral radius; all the eigenvalues are often called the spectrum. We want to find d so that the spectral radius is 1; thus, the other eigenvalues will be less than 1. The similarity dimension is always greater than or equal to the Hausdorff dimension For the Barnsley Wreath, the self-similar dimension is 1.9227 (approx), and the Hausdorff is 1.8459 (approx).

66

We have a directed multi-graph (V, E) with V the vertices and E the edges (n) Euv (the edges from u to v). We have paths Euv will be paths of length n in the directed graph (not all possible strings of length n) that start at u and end at (n) v. If we don’t care where we end, only that we start at u, we write Eu . Let (0) Eu = {Λu }. We will have a forest E (ω) , where we have one tree for each node, at the root of each tree is the empty Λu . We have right shifts: for e ∈ Euv (from (ω) (ω) the right node to the right node) by Se : Ev → Eu by Se (σ) = eσ. This is an IFS directed by the graph (V, E). For each node u, we have a complete metric space Xu (for each u ∈ V ; often have all spaces the same, but not required). We have maps fe : Xv → Xu for e ∈ Euv . We have invariant lists (Ku )u∈V of nonempty compact sets (one for each node), but up of images of these sets in the list according to the edges in the graphs, with [ [ fe (Kv ). Ku = (8.19) u∈V e∈Euv (∗)

We have string models Ju = Eu , with address maps θu : Ju → Ku and a commutative diagram Jv −→Se Ju ↓θv Kv

↓θu −→fe

(8.20)

Ku

with u, v ∈ V , e ∈ Euv .

8.8 References for the Talk Below are references for today’s lecture. Additionally, many lectures from The Ohio State VIGRE Working Group (Fall 2003) are relevant. Those are available online at http://www.math.ohio-state.edu/∼sjmiller/math/classes/fractal.pdf as well as http://www.math.ohio-state.edu/vigre/pdf/activity-38.pdf

67

1. M. F. Barnsley, Fractals Everwhere! Academic Press, 1988. 2. L. M. Blumenthal & K. Menger, Studies in Geometry. Freeman, 1970. 3. R. L. Devaney, An Introduction to Chaotic Dynamical Systems. Benjamin/Cummings, 1986. 4. V. Drobot & J. Turner, “Hausdorff dimension and Perron–Frobenius Theory.” Illinois J. Math. 33 (1989) 1–9. 5. G. Edgar, Measure, Topology, and Fractal Geometry. Springer-Verlag New York, 1990. 6. K. Falconer, The Geometry of Fractal Sets. Cambridge Univ. Pr., 1985. 7. F. Hausdorff, “Dimension und äußeres Maß.” Math. Ann. 79 (1918) 157–179 8. B. B. Mandelbrot, The Fractal Geometry of Nature. Freeman, 1982. 9. R. D. Mauldin & S. C. Williams, “Hausdorff dimension in graph directed constructions.” Trans. Amer. Math. Soc. 309 (1988) 811–829. 10. H. Minc, Nonnegative Matrices. Wiley, 1988. 11. P. A. P. Moran, “Additive functions of intervals and Hausdorff measure.” Proc. Cambridge Phil. Soc. 42 (1946) 15–23. 13. S. J. Taylor & C. Tricot, “Packing measure, and its evaluation for a Brownian path.” Trans. Amer. Math. Soc. 288 (1985) 679–699.

68

Chapter 9 Ninth Lecture: March 2nd, 2004: Symbolic Dynamics of the Pedal Mappin Lecturer and typist: Joon-Ku Im Seemingly complicated problems can sometimes be easily solved if we investigate them via symbolic dynamics. For example, we can get the maximal invariant set in the dynamical system of Smale’s horseshoe by considering its isomorphic symbolic dynamical system. Here we discuss a simple, but interesting dynamic system of ’Pedal Mapping.’ From elementary plane geometry, we will define the pedal mapping of triangles, and then encode each triangle to an infinite obtuseness word in {0, 1, 2, 3}N . And then we will discuss its geometric, algebraic properties, and ergodicity.

9.1

Pedal Triangles and Pedal Mapping

Definition 9.1.1. (Pedal Triangle) Let T and T 0 be triangles. We call T 0 the Pedal Triangle of T if T 0 is formed from the feet of the altitudes of T . We label each angle ∠10 , ∠20 and ∠30 of T 0 to be the feet of the altitudes from ∠1, ∠2 and ∠3 respectably. And the mapping P from a tringle to its pedal tringle is called the Pedal Mapping. Also we have the sequence of pedal triangles of T , that is, T1 := T , T2 := P (T ), T3 := P 2 (T ), . . . , Tn := P n−1 (T ) and so on. Theorem 1. . 69

Let T be a triangle with its angles ∠1, ∠2, ∠3 and T 0 be one with ∠10 , ∠20 , ∠30 . Then (1) If ∠1, ∠2, ∠3 ≤ π/2, that is, T is an acute triangle, then ∠10 = π −2· ∠1, ∠20 = π − 2 · ∠2, and ∠30 = π − 2 · ∠3. (2) If ∠1 π/2, that is, T is an obtuse triangle, then ∠10 = 2 · ∠1 − π, ∠20 = 2 · ∠2, ∠30 = 2 · ∠3. The same things hold for obtuse ∠2 and obtuse ∠3. Proof. Using Cartesian coordinate. Note. There can be at most one angle which is π/2 since ∠1 + ∠2 + ∠3 = π, and ∠1, ∠2, ∠3 ≥ 0. Definition 9.1.2. (Degenerate). (1) A triangle T is called degenerate if it has one or two angles of 0. (2) A triangle T is called eventually degenerate if there exists an integer n ≥ 0 such that n-th pedal triangle of T is degenerate. For example. The pedal triangles of right triangles are degenerate. Definition 9.1.3. (Obtuseness word). To any triangle T with its angles ∠1, ∠2 and ∠3, assign its obtuseness label a to T if there is its only obtuse angle ∠a. (otherwise assign 0) The obtuseness word a1 a2 a3 · · · of a triangle T is the sequence of obtuseness labels of its consecutive pedal triangles T1 , T2 , T3 , · · · . Now consider a map E from the set M of equivalence classes of all similar triangles to X, some subset of {0, 1, 2, 3}N corresponding a triangle T to its obtuseness word. Then we want E is an isomorphism, that is, E is bijective and the following diagram is commutative : P

M −−−→ M

 

 

Ey

Ey P∗

X −−−→ X ∗ where P is a shift map induced from P . If this is guaranteed, then we can answer the following questions almost trivially. Question 9.1.4. . (1) Which triangle has all its pedal triangles acute? How many of such triangles are there? 70

(2) Which triangle has all its pedal triangles obtuse? How many of such triangles are there? (3) Are there any triangles whose sequence of pedal triangles come arbitrarily close to all triangles?

9.2

Geometric Understanding of the Pedal Mapping

Definition 9.2.1. (Geometric Representation). If T is any triangle with angles ∠1, ∠2, ∠3, then write (x, y, z) = (∠1/π, ∠2/π, ∠3/π). Since x + y + z = 1 and x, y, z ≥ 0, we can represent any equivalence class of similar triangles as a point on the equilateral triangle M = {(x, y, z) : x + y + z = 1, x, y, z ≥ 0}. (We call this M a Moduli Space.) This is the Geometric Representation of triangles. We usually identify a triangle T with its geometric representation. Note. If we divide M into 4 subdivision (again equilateral triangles), label each of those Mi := M ∩ {(x, y, z) : x 1/2} for i = 1, 2, 3 and M0 := M \ (M1 ∪ M2 ∪ M3 ) naturally. Here we can easily see that T := (x, y, z) ∈ M0 , then T is the geometric representation of an acute triangle, and that if T ∈ Mi for any i = 1, 2 or 3, then T is that of an obtuse triangle with obtuse ∠i. Remark 9.2.2. . (1) By the pedal mapping, each of the three triangles M1 , M2 , M3 is dilated by a factor of 2 and laid back over M . But M0 is dilated by a factor of -2. (2) If P n−1 (T ) is in Mi , then n-th symbol of E(T ) = i. (3) Obviously the obtuseness words are not 4-nary expansions of numbers, but just words of 0,1,2 and 3. For example, 03 6= 10, 01 6= 10, 03 6= 30. (4) The pedal mapping seems not continuous in some sense, even if we didn’t yet define any topology on M. (5) We can visually understand that there must be prohibited points to guarantee the 1-1 correspondence between triangles and points on {0, 1, 2, 3}N , which terminate in i0i(i = 1, 2, 3) or ia (i = 1, 2, 3 and a is any word, not necessarily repeating, consisting of two nonzero symbols j, k other than i) since i0i and ia should be replaced by 00i and iac respectively where iac is a word obtained by interchanging j and k in ia.

71

Definition 9.2.3. (Set of Non-prohibited Obtuseness Words). Now we call the subset X of {0, 1, 2, 3}N the set of non-prohibited obtuseness words consisting of all obtuseness words in {0, 1, 2, 3}N after throwing the prohibited points mentioned above. Note. By throwing all the prohibited points away, we can restrict our range to X to achieve 1-1 correspondence to similar triangles. Now we verify it by constructing encoding and decoding algorithm explicitly by algebraic understanding.

9.3

Algebraic Understanding of the Pedal Mapping

Definition 9.3.1. (Binary Expansion of Angle Matrix). For any triangle T with its angles ∠1, ∠2 and ∠3, represent T by       x ∠1 .α1 α2 α3 · · · T =  y =1/π ·  ∠2  =  .β1 β2 β3 · · · , z ∠3 .γ1 γ2 γ3 · · · which we call the Binary Expansion of the Angle Matrix of T . We may identify the binary expansion of the angle matrix of a triangle T with T itself. Then we have two identified representations of a triangle : the geometric representation on the moduli space and the binary expansion of angle matrix, and need to identify these to the set of non-prohibited obtuseness words. Remark 9.3.2. . (1) The first digits in three rows of the angle matrix determines in which Mi the triangle T lies, that is, determines the first digit of its obtuseness word of T . That is to say, if the first digit of i-th row is 1 and the others are 0, then T has the angle ∠i obtuse, i.e. T is contained in Mi . (2) If there is 1 in the first digit of i-th row, then T has obtuse ∠i. ( .1(mod 2), i.e. 1/2) Therefore the pedal mapping is just like a shift mapping from each component of the matrix because ∠i0 = 2 · ∠i − 1 and ∠j 0 = 2 · ∠j for j 6= i.) Otherwise, that is, if all the first digits are 0, the mapping is just like .1 - (the result of the shift map) from each component of the matrix. (3) If all three componets of the matrix have 1 in the first digits, the sum 1/π · (∠1 + ∠2 + ∠3) exceeds 1. If two of them have the first digits, the lower  1 in  .10 places should have all 0. The only cases are like  .10  which can be indetified .0 72

     .10 .01 .01 by  .01 ,  .10  and  .01 . Then we see the first two matrices rep.0 .0 .0 resent the obtuseness words 12 and 21 respectively, but these are prohibited and should be replaced by 03. Therefore each angle matrix has at most one component with 1 in its first digits. 

Algorithm 9.3.3. . Now we will construct an algorithm of encoding and decoding between a triangle T to an non-prohibited obtuseness word E(T ) in X ⊂ {0, 1, 2, 3}N explicitly. Then this guarantees the 1-1 correspondence of each other and we see the diagram above is well-defined and commutative explicitly. (1) Encoding E If we do the same thing as we mentioned in (1) and (2) in the preceding Remark repeatedly, then we get an non-prohibited obtuseness word E(T ) and EP = P ∗ E. (2) Decoding D := E −1 Given a non-prohibited word a = a1 a2 a3 . . . we want to associate an angle matrix   .α1 α2 α3 · · ·  .β1 β2 β3 · · · . .γ1 γ2 γ3 · · · Step 1. Find the first digit in each row of angle matrix, that is,  (1)        αi 0 1 0  (1)       1 , or 0 , 2 7→ 0 , 1 7→ define ai 7→  βi  by 0 7→ (1) 0 0 0 γi   0  0 . 3 7→ 1   (1) (1) (1) .α1 α2 α3 · · ·   Then concatenate together to form an infinite 3-row matrix P1 :=  .β1(1) β2(1) β3(1) · · · . (1) (1) (1) .γ1 γ2 γ3 · · ·

73

 (i) (i) (i) .α1 α2 α3 · · ·   Step 2. Find Pi =  .β1(i) β2(i) β3(i) · · ·  for every i > 0. (i) (i) (i) .γ1 γ2 γ3 · · ·   (i) (i) (i) .α1 α2 α3 · · ·   Suppose we have Pi =  .β1(i) β2(i) β3(i) · · · . (i) (i) (i) .γ1 γ2 γ3 · · · 

Then for Pi+1 , let columns until i-th column remained, and look at the i-th column. (i) (i) (i) If αi , βi , andγi are all 0 or all 1, (i+1) (i) (i+1) (i) (i+1) (i) then define αj = 1 − αj , βj = 1 − βj , and γj = 1 − γj for every j > i. Otherwise, Define Pi+1 := Pi . Step 3. Since the i-th iteration does not change the columns until i-th column, (i+1) (i) we can say αj = αj , and so on for every i ≥ j. Therefore define   .α1 α2 α3 · · · (i) αj := αj for any i ≥ j. So we define D(a) :=  .β1 β2 β3 · · ·  which .γ1 γ2 γ3 · · · represent a triangle in M. After we check that P E = EP ∗ , and E and D are inverses to each other, we get the following theorem Theorem 2. . The correspondence between a triangle and a non-prohibited obtuseness word is an isomorphism between the dynamical system of the pedal mapping and that of shift on four symbols. Answer 9.3.4. . Now we can answer the questions given at the first section. (1) The only triangle having all its pedal triangles acute is just the equilateral triangle. (2) The triangles represented on the Sierpinski Gasket as a subset of M have all their pedal triangles obtuse. So the cardinality of such triangles is uncountable. (3) Every triangle terminating in 0 1 2 3 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33 000 001 · · · has a sequence of pedal triangles arbitrarily close to all triangles. We call such a triangle is dense in the moduli space M .

74

Proof. (1) Such a triangle must have word 0, and thus its  binary expansion of the  its encoded  .01 1/3 angle matrix must be  .01  which is equal to  1/3  .01 1/3 (2) Such triangles must not have 0 in their encoded words.

9.4

Ergodic Property of the Pedal Mapping

(This section would be quite less rigorous than the previous ones.) Theorem 3. (Measure Preserving Property). Let m be the Lebesgue measure on the plane M . The pedal mapping is measurepreserving, that is, for any measurable subset G of M , m(G) = m(P −1 (G)). Proof. Since the inverse image P −1 (G) of G consists of 4 pieces Gi := P −1 (G) ∩ Mi for i = 0, 1, 2, 3, and Gi ’s are P congruent to G diminished by a factor of 2. Therefore m(Gi ) = 1/4 · m(G). So 3i=0 m(Gi ) = m(P −1 (G)) = m(G). Theorem 4. (Ergodic Property). The pedal mapping is ergodic, that is, any measurable subset of M that is invariant under P has measure 0 or full measure. Theorem 5. (Mixing Property). The pedal mapping has mixing property. In other words, for any subset S of M and for every point T ∈ M , denote by N (T, S, n) the number of points T, P (T ), P 2 (T ), · · · , P n−1 (T ) that lie in S. Then for almost all T and every S, N (T, S, n) m(S) lim = . n→∞ n m(M )

75

Appendix A Cantor Set Review This appendix is from the VIGRE Fractal Working Group, The Ohio State, Autumn 2003. Cantor Set (1880); Henry Smith is sometimes said to have written about this before Cantor, but unclear if he had it. Before Cantor, notion of countable / uncountable wasn’t clear, and what Smith talked about wasn’t clear. Smith was interested in Riemann Integral. If you want to tell if a function is integrable, we nowadays say it must be continuous except on a set of measure 0. Back then, didn’t have measure (which dates from around 1900). They came up with examples of sets that were really small such that you could still be integrable. Some of his examples looked like Cantor sets.

A.1

Cantor Set

A.1.1

Construction

The Cantor set is a subset of the real line. We’ll consider subsets of [0, 1]: [0]−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−[1] Will look at a sequence of approximations to the Cantor set. Let C0 = [0, 1]. Remove the middle third, and get C1 = [0, 13 ] ∪ [ 23 , 1]. Is a closed interval, keep endpoints.

76

[0] − − − − − − − −[1/3]

[2/3] − − − − − − − −[1]

Continue the process. Next is · ¸ · ¸ · ¸ · ¸ 1 2 1 2 7 8 C2 = 0, ∪ , ∪ , ∪ ,1 . 9 9 3 3 9 9 n In general, Cn is the union of 2 closed intervals, each of size 3−n . Note C0 ⊃ C1 ⊃ C2 ⊃ · · ·

(A.1)

(A.2)

Definition A.1.1 (Cantor Set). The Cantor set C is define by C =

∞ \

Cn = {x ∈ R : ∀n, x ∈ Cn }.

(A.3)

n=1

Note that 0, 1 ∈ C. In fact, once we find and endpoint, we never remove the endpoint, thus all the endpoints are in C. One might first think that only the endpoints are left, but not the case. In fact, we’ll see C is uncountable later.

A.1.2

Non-Trivial Point in Cantor Set

Example A.1.2.

1 4

∈ C, but

1 4

is not an endpoint.

The endpoints are always of the form 3mn , m ∈ N. By unique factorization of integers, cannot write 14 as an integer divided by a power of 3. Must show 14 ∈ Cn for all n; infinitely many things to check, checking one at a time won’t be useful. Need to do a more clever job of checking. Will proceed by induction on n. Will proceed by Induction, showing that 14 and 34 are in Cn for all n. Clearly, both points are in C0 , and the base case holds. We now assume that 14 , 34 ∈ Cn , and show they are in Cn+1 . How do we go from Cn to Cn+1 ? We remove the middle third of sets. We take an interval, remove the middle third, and what is left for each sub-interval looks like the union of two pieces, each one-third the length of the previous. Thus, we have shrinking maps fixing the left and right parts L, R : R → R given by L(x)

=

R(x)

= 77

x 3 x+2 . 3

(A.4)

Exercise A.1.3. Prove that Cn+1 = L(Cn ) + R(Cn ).

(A.5)

Thus, each step is related to the previous step. The maps L and R are nice in that the two images of [0, 1] are disjoint, so all future subintervals will be disjoint and it will be easy to count. What happens to 14 and 34 ? Note µ ¶ 3 L 4 µ ¶ 1 R 4

= =

1 4 3 . 4

(A.6)

We now have the inductive step: If 14 , 34 ∈ Cn , then 14 , 43 ∈ Cn+1 . Proof. Clearly, 1 = L 4

µ ¶ 3 ∈ L(Cn ) ⊂ Cn+1 4

(A.7)

3 = R 4

µ ¶ 1 ∈ R(Cn ) ⊂ Cn+1 , 4

(A.8)

and

which completes the proof of the inductive claim. Remark A.1.4. Note that the Induction was easier by working with both and not just 41 .

A.1.3

3 4

and

1 4

Alternate Formulation of C

Note we have proved that C = R(C) ∪ L(C). Thus, 78

(A.9)

C =

∞ \

Cn =

n=0

∞ \

Cn+1 =

n=0

∞ \

(R(Cn ) ∪ L(Cn )) .

(A.10)

n=0

Therefore, we find

C

=

∞ \

R(Cn ) ∪

n=0

=

R(

∞ \

∞ \

Cn ) ∪ L(

n=0

=

L(Cn )

n=0 ∞ \

Cn )

n=0

R(C) ∪ L(C).

(A.11)

This is what we mean by C being self-similar. See is self-similar under shrinking by 1 . 3

A.1.4

Another Formulation of the Cantor Set

Let x ∈ [0, 1], we may write x in base 3. In other words, we can write x =

3 X ai i=1

3i

, ai ∈ {0, 1, 2}.

(A.12)

Note we are not claiming each number has a unique representation. Consider the string .122222222222 · · · and .2. Note that C1 = {x ∈ [0, 1] : a1 6= 1}. Continuing, we find C2 = {x ∈ [0, 1] : a1 6= 1, a2 6= 1}, and in general Cn C

= =

{x ∈ [0, 1] : a1 , . . . , an ∈ {0, 2}} {x ∈ [0, 1] : a1 , a2 , · · · ∈ {0, 2}} .

(A.13)

There are problems, however, As remarked, numbers need not have a unique base 3 expansion. The example given shows that we may replace a number with repeating block with a terminating set, and thus these numbers are rationals. If a number can be written in two ways, one way using 1s and one way not, then it is in the Cantor set (as it can be written in base 3 without using any 1s). Remark A.1.5. The Cantor Set is uncountable. 79

This follows from the fact that C is equivalent to numbers in base 3 without 1Pas a P ai bi digit. Formally, one could map any such x = , a , i ∈ {0, 2}, to y = 3i 2i where bi = 0 if ai is 0 and bi = 1 if ai is 2. Thus, C has as many points as all of [0, 1] (consider base 2 expansions of real numbers).

A.1.5

non-Cantor Sets

Let A0

=

A1

=

A2

=

A3

=

{0}

µ

¶ 2 A0 ∪ A0 + 3 µ ¶ 2 A1 ∪ A1 + 9 µ ¶ 2 A2 ∪ A2 + 27

.. . An+1 A

= =

µ An ∪ An + ∞ [

2



3n+1

An .

(A.14)

n=0

This is an increasing sequence of sets, its union is not the complete Cantor Set, but on the computer, cannot tell the difference between this and the Cantor set. Similarly, we have An+1 = L(An ) ∪ R(An ).

(A.15)

Therefore, the union of these sets, A, just like the Cantor Set, satisfies A = L(A) ∪ R(A).

(A.16)

Remark A.1.6. Note A is not the Cantor Set! A is a countable set, the Cantor Set is uncountable. We do have, however, that C = A, namely, C is the closure of A, and A approximates C as well as we want. 80

(A.17)

A.2

Uniqueness of sets under such constructions

Question A.2.1. Consider the maps L and R. Are there any other sets X such that X = R(X) ∪ L(X),

(A.18)

maybe if we want the sets to be disjoint, and not the empty set? Take 14 and 34 and keep applying these operators. This will generate a countable example. There are lots of examples of sets satisfying this relation. Theorem A.2.2 (Characterization of the Cantor Set). Let X be a closed, bounded, non-empty set such that X = L(X) ∪ R(X). Then X is the Cantor set.

81

Appendix B Algebraic and Transcendental Numbers The following is from An Invitation to Modern Number Theory, by Steven J. Miller and Ramin Takloo-Bighash (Princeton University Press, to appear). There are probably typos and unclear expositions below – please send comments to [email protected]. Definition B.0.3 (Algebraic Number). α ∈ C is an algebraic number if it is a root of a polynomial with finite degree and integer coefficients. Definition B.0.4 (Transcendental Number). α ∈ C is a transcendental number if it is not algebraic. Thus, a transcendental number is a number that does not satisfy any polynomial equation with integer coefficients. Fortunately primitive man must have thought that every number is algebraic otherwise the development of mathematics would have suffered greatly. But transcendental numbers do exist. The mere existence of such numbers was a puzzling problem for hundreds of years. Remember that back in the Pythagorean era the existence of irrational numbers was quite a devastating event. The existence of transcendental numbers, however, must have brought a sense of relief to the mathematical psyche. For one, the transcendence of a certain number, π, settled the long-standing problem of proving the impossibility of squaring a circle. Also, it showed that the theory of equations is simply not enough, and hence it opened the door for the development of other branches of mathematics. The purpose of this chapter is to prove the existence of transcendental numbers. While it is possible to write down explicit examples of transcendental numbers (e, π, etc!), we prefer to show the existence using a different method. 82

Here we will use Cantor’s ingenious counting argument. The basic idea is to show that there are a lot more real numbers than there are algebraic numbers. This will then show that there must be a left-over set, entirely consisting of transcendental numbers. We will see from the proof, that are a lot more transcendental numbers than there are algebraic ones; in fact, if one chooses a random number, the chance of it being transcendental is effectively one hundred percent!

B.1 Definitions and Cardinalities of Sets B.1.1 Definitions A function f : A → B is one-to-one (or injective) if f (x) = f (y) implies x = y; f is onto (or surjective) if given any b ∈ B, ∃a ∈ A with f (a) = b. A bijection is a one-to-one and onto function. We say two sets A and B have the same cardinality (ie, are the same size) if there is a bijection f : A → B. We denote the common cardinality by |A| = |B|. If A has finitely many elements (say n elements), A is finite and |A| = n < ∞. Exercise B.1.1. Show two finite sets have the same cardinality if and only if they have the same number of elements. Exercise B.1.2. If f is a bijection from A to B, prove there is a bijection g = f −1 from B to A. Exercise B.1.3. Suppose A and B are two sets, and suppose we have two onto maps f : A → B and g : B → A. Then show that |A| = |B|. NOT AS EASY AS IT SEEMS Exercise B.1.4. A set A is called infinite if there is a one-to-one map f : A → A which is not onto. Using this definition, show that the sets N and Z are infinite sets. In other words, prove that an infinite set has infinitely many elements. Exercise B.1.5. Show that the cardinality of the even integers is the same as the cardinality of the integers. Remark B.1.6. The above example is surprising to many. MAYBE ADD REMARK HERE ABOUT COUNTING INTEGERS UP TO X, AND LOOKING AT LIMITS. A is countable if there is a bijection between A and the integers Z. A is at most countable if A is either finite or countable. 83

Exercise B.1.7. Let x, y, z be subsets of X (for example, X = Q, R, C, Rn , et cetera). Define R(x, y) to be true if |x| = |y| (the two sets have the same cardinality), and false otherwise. Prove R is an equivalence relation.

B.1.2

Countable Sets

We show that several common sets are countable. Consider the set of whole numbers W = {1, 2, 3, . . . }. Define f : W → Z by f (2n) = n − 1, f (2n + 1) = −n − 1. By inspection, we see f gives the desired bijection between W and Z. Similarly, we can construct a bijection from N to Z, where N = {0, 1, 2, . . . }. Thus, we have proved Lemma B.1.8. To show a set S is countable, it is sufficient to find a bijection from S to either W or N. We need the intuitively plausible Lemma B.1.9. If A ⊂ B, then |A| ≤ |B|. Definition B.1.10. If f : A → C is a one-to-one function (not necessarily onto), then |A| ≤ |C|. Further, if C ⊂ A, then |A| = |C|. Exercise B.1.11. Prove Lemmas B.1.9 and B.1.10. If A and B are sets, the cartesian product A × B is {(a, b) : a ∈ A, b ∈ B}. Theorem B.1.12. If A and B are countable, so is A ∪ B and A × B. Proof. We have bijections f : N → A and g : N → B. Thus, we can label the elements of A and B by A B

= =

{a0 , a1 , a2 , a3 , . . . } {b0 , b1 , b2 , b3 , . . . }.

(B.1)

Assume A ∩ B is empty. Define h : N → A ∪ B by h(2n) = an and h(2n + 1) = bn . We leave to the reader the case when A ∩ B is not empty. To prove the second claim, consider the following function h : N → A × B:

84

h(1) = (a0 , b0 ) h(2) = (a1 , b0 ), h(3) = (a1 , b1 ), h(4) = (a0 , b1 ) h(5) = (a2 , b0 ), h(6) = (a2 , b1 ), h(7) = (a2 , b2 ), h(8) = (a1 , b2 ), h(9) = (a0 , b2 ) .. . 2 h(n + 1) = (an , b0 ), h(n2 + 2) = (an , bn−1 ), . . . , h(n2 + n + 1) = (an , bn ), h(n2 + n + 2) = (an−1 , bn ), . . . , h((n + 1)2 ) = (a0 , bn ) .. . (B.2) Basically, look at all pairs of integers in the first quadrant (including those on the axes). Thus, we have pairs (ax , by ). The above function h starts at (0, 0), and then moves through the first quadrant, hitting each pair once and only once, by going up and over. Draw the picture! Corollary B.1.13. Let Ai be countable ∀i ∈ N. Then for any n, A1 ∪ · · · ∪ An and A1 × · · · × An are countable, where the last set is all n-tuples (a1 , . . . , an ), ai ∈ Ai . Further, ∪∞ i=0 Ai is countable. If each Ai is at most countable, then ∪∞ A is at most countable. i i=0 Exercise B.1.14. Prove Corollary B.1.13. Hint: for ∪∞ i=0 Ai , mimic the proof used to show A × B is countable. As the natural numbers, integers and rationals are countable, by taking each Ai = N, Z or Q we immediately obtain Corollary B.1.15. Nn , Zn and Qn are countable. Hint: proceed by induction. For example write Qn+1 as Qn × Q. Exercise B.1.16. Prove that there are countably many rationals in the interval [0, 1].

B.1.3

Algebraic Numbers

Consider a polynomial f (x) with rational coefficients. By multiplying by the least common multiple of the denominators, we can clear the fractions. Thus, without loss of generality it is sufficient to consider polynomials with integer coefficients. 85

The set of algebraic numbers, A, is the set of all x ∈ C such that there is a polynomial of finite degree and integer coefficients (depending on x, of course!) such that f (x) = 0. The remaining complex numbers are the transcendentals. The set of algebraic numbers of degree n, An , is the set of all x ∈ A such that 1. there exists a polynomial with integer coefficients of degree n such that f (x) = 0 2. there is no polynomial g with integer coefficients and degree less than n with g(x) = 0. Thus, An is the subset of algebraic numbers x where for each x ∈ An , the degree of the smallest polynomial f with integer coefficients and f (x) = 0 is n. Exercise B.1.17. Show the following are algebraic: any rational number, thep square-root of any rational p number, the cube-root of any rational number, r q √ √ where r, p, q ∈ Q, i = −1, 3 2 − 5. Theorem B.1.18. The algebraic numbers are countable. Proof. If we show each An is at most countable, then as A = ∪∞ n=1 An , by Corollary B.1.13 A is at most countable. Recall the Fundamental Theorem of Algebra (FTA): Let f (x) be a polynomial of degree n with complex coefficients. Then f (x) has n (not necessarily distinct) roots. Of course, we will only need a weaker version, namely that the Fundamental Theorem of Algebra holds for polynomials with integer coefficients. Fix an n ∈ N. We now show An is at most countable. We can represent every integral polynomial f (x) = an xn +· · ·+a0 by an (n+1)-tuple (a0 , . . . , an ). By Corollary B.1.15, the set of all (n + 1)-tuples with integer coefficients (Zn+1 ) is countable. Thus, there is a bijection from N to Zn+1 , and we can index each (n + 1)-tuple a ∈ Zn+1 : {a : a ∈ Z

n+1

} =

∞ [

{αi },

(B.3)

i=1

where each αi ∈ Zn+1 . For each tuple αi (or a ∈ Zn+1 ), there are n roots. Let Rαi be the roots of the integer polynomial associated to αi . The roots in Rαi need not be distinct, and the roots may solve an integer polynomial of smaller degree. For example, f (x) = (x2 −1)4 86

is a degree 8 polynomial. It has two roots, x = 1 with multiplicity 4 and x = −1 with multiplicity 4, and each root is a root of a degree 1 polynomial. Let Rn = {x ∈ C : x is a root of a degree n polynomial}. One can show that Rn =

∞ [

Rαi ⊃ An .

(B.4)

i=1

By Lemma B.1.13, Rn is countable. Thus, by Lemma B.1.9, as Rn is at most countable, An is at most countable. Therefore, each An is at most countable, so by Corollary B.1.13 A is at most countable. As A1 ⊃ Q (given pq ∈ Q, consider qx − p = 0), A1 is at least countable. As we’ve shown A1 is at most countable, this implies A1 is countable. Thus, A is countable. Exercise B.1.19. Show the full force of the Fundamental Theorem of Algebra is not needed in the above proof; namely, that it is enough that every polynomial have finitely many roots. Exercise B.1.20. Prove Rn ⊃ An .

B.1.4

Transcendental Numbers

A set is uncountable if there is no bijection between it and the rationals (or the integers, or any countable set). The aim of this paragraph is to prove the following fundamental theorem: Theorem B.1.21. The set of all real numbers is uncountable. We first state and prove a lemma. Lemma B.1.22. Let S be the set of all sequences (yi )i∈N with yi ∈ {0, 1}. Then S is uncountable. Proof. We proceed by contradiction. Suppose there is a bijection f : S → N. It

87

is clear that this is equivalent to giving a list of the elements of S: x1 x2 x3

= = = .. .

x11 x12 x13 x14 · · · x21 x22 x23 x24 · · · x31 x32 x33 x34 · · ·

xn

= .. .

xn1 xn2 xn3 xn4 · · · xnn · · · (B.5)

Define an element ξ = (ξi )i∈N ∈ S by ξi = xii , and another element ξ¯ = (1 − ξi )i∈N . Now the element ξ¯ cannot be in the list; it is not xN because 1 − xN N 6= xN N ! Proof of the theorem. Consider all those numbers in the interval [0, 1] whose decimal expansion consists entirely of numbers 0, 1. Clearly, there is a bijection between this subset of R and the set S. We have established that S is uncountable. Consequently R has an uncountable subset. This gives the theorem. The above proof is due to Cantor (1873 − 1874), and is known as Cantor’s Diagonalization Argument. Note Cantor’s proof shows that most numbers are transcendental, though it doesn’t tell us are transcendental. We can easily pwhich3numbers √ show many numbers (such as 3 + 2 5 7) are algebraic. What of other numbers, such as π and e? Lambert (1761), Legendre (1794), Hermite (1873) and others proved π irrational. In 1882 Lindemann proved π transcendental. What about e? Euler (1737) proved that e and e2 are irrational, Liouville (1844) proved e is not an algebraic number of degree 2, and Hermite (1873) proved e is transcendental. Liouville (1851) gave a construction for an infinite (in fact uncountable) family of transcendental numbers; we will discuss his construction later.

B.1.5 Continuum Hypothesis We have shown that there are more transcendental numbers than algebraic numbers. Does there exist a subset of [0, 1] which is strictly larger than the rationals, yet strictly smaller than the transcendentals? Cantor’s Continuum Hypothesis says that there are no subsets of intermediate size. The standard axioms of set theory are known as the Zermelo-Fraenkel axioms (note to the expert: often the Axiom of Choice is assumed, and we talk of ZF + Choice). 88

Kurt Gödel showed that if the standard axioms of set theory are consistent, so too are the resulting axioms where the Continuum Hypothesis is assumed true; Paul Cohen showed that the same is true if the negation of the Continuum Hypothesis is assumed. These two results imply that the Continuum Hypothesis is independent of the other standard assumptions of set theory!

89

Appendix C Introduction to Continued Fractions The following chapters on continued fractions are from An Invitation to Modern Number Theory, by Steven J. Miller and Ramin Takloo-Bighash (Princeton University Press, to appear). There are probably typos and unclear expositions below – please send comments to [email protected]. For good introductions to Continued Fractions, see [HW] and [La1].

C.1

Decimal Expansions

Idea: there are various ways of representing numbers. There are decimal expansions, binary expansions, et cetera. If you have something complicated, one way to express it is to write it in terms of something simpler. Continued fractions are an example of this. Decimal expansion is very simple: x = xn 10n + xn−1 10n−1 + · · · + x1 101 + x0 + x−1 10−1 + x−2 10−2 + · · · xi ∈ {0, 1, . . . , 9}. (C.1) Exercise C.1.1. Let x have a periodic decimal expansion. For example, assume ∃N0 ∈ N and a1 , . . . , an ∈ {0, . . . , 9} such that x = xm xm−1 · · · x1 x0 .x−1 · · · xN0 +1 xN0 a1 · · · an a1 · · · an a1 · · · an · · · = xm xm−1 · · · x1 x0 .x−1 · · · xN0 +1 xN0 a1 · · · an (C.2) Prove that x is rational, and bound the size of the denominator. 90

Recall [x] is the largest integer less than or equal to x. Exercise C.1.2. Find [x] for x = −2, 2.9, 3, 3.1, 3.14, π, 3.15 and y] = [x] + [y]? Does [xy] = [x] · [y]?

29 . 5

Does [x +

For example, we calculate the decimal expansion of x = 9.75. [x] = [9.75] = 9. Call this x1 : x1 = [x]. How do we retrieve the next digit, 7? Look at x − x1 . This will be .75; if we multiply by 10, we get 7.5, and we note that the greatest integer less than or equal to 7.5 is 7. Thus, look at [10(x − x1 )] = 7, and define x2 = 10(x − x1 ) = 7.5. Iterating the above procedure yields the base ten expansion. Exercise C.1.3. Formally write down the procedure to find the base ten expansion of a positive number x. Discuss the modifications needed if x is negative.

C.2

Definition of Continued Fractions

C.2.1

Uses of Continued Fractions

Continued Fractions are a much more sophisticated machine than decimal expansion. Any finite continued fraction (with integer components) will be a rational number, and vice versa. This is a lot cleaner than something that goes on to infinity and is periodic. A periodic Continued Fraction is actually the solution of a quadratic equation with integer coefficients, which is very different than a periodic decimal expansion. A lot of very complicated numbers (for example, e), have very simple Continued Fraction expansions. Using Continued Fractions of numbers, you can get very interesting results on how to approximate numbers by rationals. For example, if you have the decimal expansion of a number, if you truncate the decimal expansion at some point, you get a rational approximation (some integer divided by a power of ten). You can do this with a continued fraction: you can cut it at some point and get a rational number, and use that rational number to approximate the number we started with. We will see that this is the best approximation you can have; we will, of course, quantify what we mean by best approximation. What does this remind us of? Fourier Series or Taylor Series: for a given expansion, the first n terms of a Fourier Series (or Taylor Series) give the best approximation of a certain order to the given function. 91

A finite continued fraction has this type of power: it is a very sophisticated machine. Given x ∈ R, how does one calculate the continued fraction expansion? We first describe the algorithm for determining the decimal expansion, and then we give an algorithm for finding the continued fraction expansion.

C.2.2

Definition

A Finite Continued Fraction is a number of the form a0 +

1 a1 +

1 a2 +

..

(C.3) 1

. + a1n

One does not want to write something like this every time, so we introduce the following shorthand notations. The first is a0 +

1 1 1 ··· a1 + a2 + an

(C.4)

A better notation is [a0 , a1 , . . . , an ]. Exercise C.2.1. Show [a0 ] = a0 , [a0 , a1 ] = a0 + a0 (a1 a2 +1)+a2 . a1 a2 +1

(C.5) 1 a1

=

a0 a1 +1 , a1

and [a0 , a1 , a2 ] =

Definition C.2.2 (Positive Continued Fraction). A continued fraction [a0 , . . . , an ] is positive if each ai > 0. Definition C.2.3 (Digits). If α = [a0 , . . . , an ], we call the ai the digits of the continued fraction. Definition C.2.4 (Simple Continued Fraction). A continued fraction is simple if each ai is a positive integer. Definition C.2.5 (Quotients or Convergents). Let x = [a0 , a1 , . . . ]. If xm = m m , then pqm is the mth quotient (or convergent) of x. [a0 , . . . , am ] = pqm

92

C.2.3

Calculating Continued Fractions

We expect to get something like x = a0 +

1 1 , a1 + a2 +···

(C.6)

where the ai are positive integers. Obviously, a0 = [x], the greatest integer at most x. Then x − [x] =

1 1 . a1 + a2 +···

(C.7)

and the inverse is x1 =

1 1 = a1 + 1 . x − [x] a2 + a3 +···

(C.8)

Therefore, the second digit of the continued fraction expansion is [x1 ] = a1 . 1 and iterate. Let x2 = x1 −[x 1] Exercise C.2.6. Formally write down the procedure to find the continued fraction expansion of a positive number x. Discuss the modifications needed if x is negative. Exercise √ √ C.2.7. Find the first few terms in the continued fraction expansions of 2, 3, π and e. In Theorem G.4.1, we prove that there exist positive constants A and B such that ¯ µ ¶¯ √ ¯ ¯ 1 A −B n−1 ¯Prob (an (α) = k) − log2 1 + ¯ ≤ . (C.9) e ¯ k(k + 2) ¯ k(k + 1)

C.2.4

Dynamical Interpretation of Continued Fractions

We are defining a map f (x) =

1 , x > 1. x − [x]

(C.10)

If x > 1, then f (x) > 1 (and is infinite only if x ∈ N). As f (x) > 1, we can apply f to f (x) and get f (f (x)). As long as the initial value is greater than 1, we can keep

93

iterating. The results will always be greater than one (and finite for non-integer input). If we start with x ∈ [0, 1), then x − [x] = x. Thus, for x ∈ [0, 1), f (x) = x1 . If x > 1, then f (x) 6= x1 : it will be shifted. Exercise C.2.8. Graph f (x), f (f (x)), and f (f (f (x))) = (f ◦ f ◦ f )(x). Draw a diagonal map g(x)³= x. Given , look at f (x0 ),´ and find x such that g(x) = ´ x0 ³ f (x0 ). Thus, go from x0 , f (x0 ) to f (x0 ), f (x0 ) . Then, project this point to ³ ´ f (x0 ), f (f (x0 )) , and continue the process indefinitely. Exercise C.2.9. Find all points in [0, 1] such that when you iterate infinitely often the above, it converges to a fixed point on the curve. What are the conditions on points in [0, 1] that lead to interesting behavior? (Extremely hard!) See the papers of S. Zakeri at the University of Pennsylvania. FROM REVIEWER: WHAT IS THE CURVE, WHAT ARE THE PAPERS? Exercise C.2.10. Fact: the continued fraction expansion of a rational number is finite. Prove this implies that if x ∈ Q, then eventually you must land on a singular point (ie, you are eventually sent to infinity). Observation: if you start with a rational number, there are finite numbers taken before the process explodes; if you start with a number x which satisfies a degree 2 equation, the process is periodic.

94

Appendix D Properties of Continued Fractions D.1 Representation of Numbers by Continued Fractions Lemma D.1.1. Given x = [a0 , . . . , aN ]. If N is odd, there is another continued fraction which also equals x, but with an even number of terms (and vice-versa). This is equivalent to the non-uniqueness in decimal expansions. For example, 3.499999999 · · · = 3.50. We make the convention that we throw away any decimal expansion ending with all 9s and replace it with the appropriate expansion ending in 0. Where does the ambiguity come from? Consider two continued fractions such that [a0 , . . . , aN ] = [a0 , . . . , aN − 1, 1]. For example, a1 +

1 1 = a1 + . a2 (a2 − 1) + 11

(D.1)

The only caveat is that we cannot have a zero in a continued fraction expansion. Thus, the above is a correct proof only if aN 6= 1; in the example given, we need a2 6= 1. If aN = 1, we consider a slight modification. For example, if a4 = 1, we have a1 +

1 1 = a1 + , 1 a2 + a3 + 1 a2 + a31+1

(D.2)

1

which completes the proof. 2 Consider [a0 , a1 , a2 , . . . , aN ]. Define a0n = [an , . . . , aN ], the tail of the continued fraction. Then [a0 , . . . , aN ] = [a0 , . . . , an−1 , a0n ]; however, the second continued fraction is positive but not necessarily simple (as a0n need not be an integer). 95

Theorem D.1.2. Suppose [a0 , . . . , aN ] is positive and simple. Then [a0n ] = an except when both n = N − 1 and aN = 1, in which case aN −1 = [a0N −1 ] + 1. Proof. a0n is a continued fraction given by a0n = an +

1 . an+1 + .1 ..

(D.3)

We just need to make sure that 1 < 1. (D.4) an+1 + .1 .. How could this equal 1 or more? The only possibility is if an+1 = 1 and the sum of the remaining terms is 0. This happens only if both n = N − 1 and aN = 1, proving the theorem. Uniqueness Assumption (Notation): whenever we write a finite continued fraction, we assume aN 6= 1, where N corresponds to the last term. Again, this is similar to notation from base ten expansion.

D.1.1 Elementary Properties of Continued Fractions Lemma D.1.3. Let [a0 , . . . , an ] be a Continued Fraction. Then h i h i 1. a0 , . . . , an = a0 , . . . , an−2 , an−1 + a1n . h i h i 2. a0 , . . . , an = a0 , . . . , am−1 , [am , . . . , an ] . These are the most basic properties of Continued Fractions, and will be used constantly below. Exercise D.1.4. Prove Lemma D.1.3.

D.1.2 Convergents to a Continued Fraction Direct calculation shows that [a0 ] = a0 , [a0 , a1 ] = a0 aa11+1 , and [a0 , a1 , a2 ] = a0 (a1 a2 +1)+a2 . In general, when we simplify everything (if the continued fraction a1 a2 +1 has finitely many terms), we get the ratio of two numbers. We denote this by (a0 ,...,an ) pn = pqnn(a , where pn and qn are polynomials with integer coefficients of a0 , qn 0 ,...,an ) a 1 , . . . , an . 96

Theorem D.1.5. For any m ∈ {2, . . . , n} we have 1. p0 = a0 , p1 = a0 a1 + 1, and pm = am pm−1 + pm−2 . 2. q0 = 1, q1 = a1 , and qm = am qm−1 + qm−2 . Proof. We proceed by induction. First, we check the basis case; we actually need to check n = 0 and n = 1 for the induction; we will check n = 2 as well to elucidate the pattern. By definition, [a0 ] = a10 , which is pq00 . [a0 , a1 ] = a0 aa11+1 , which agrees with pq11 . 0 [a0 , a1 , a2 ] should be aa22pq11 +p . As p1 = a0 a1 + 1 and q1 = a1 , direct substitution gives +q0 a2 (a0 a1 +1)+a0 . a2 a1 +1

We now show that if [a0 , . . . , am ] =

pm am pm−1 + pm−2 = qm am qm−1 + qm−2

(D.5)

pm+1 am+1 pm + pm−1 = . qm+1 am+1 qm + qm−1

(D.6)

then [a0 , . . . , am+1 ] =

We calculate the continued fraction of x = [a0 , . . . , am , am+1 ]. By Lemma D.1.3, this 1 is the same as the continued fraction of y = [a0 , . . . , am−1 , am + am+1 ]. Note, of course, that x = y; we use a different letter to emphasize that x has a continued fraction expansion with m + 2 terms, and y has a continued fraction expansion with m + 1 terms (remember we start counting with a0 ). We consider the Continued Fraction of y; it will have its own expansion with numerator Pm and denominator Qm . By induction (we are assuming we know the Pm theorem for all continued fractions with m terms), y = Q . m Therefore, ³ ´ 1 a + Pm−1 + Pm−2 m am+1 Pm ´ =³ . (D.7) Qm +Q a + 1 Q m

am+1

m−1

m−2

But the first m terms terms of y are the same as those of x. Thus, Pm−1 = pm−1 , and similarly for Qm−1 , Pm−2 , and Qm−2 . Substituting gives

97

³

Pm Qm

´ 1 am + am+1 pm−1 + pm−2 ´ = ³ . 1 am + am+1 qm−1 + qm−2

(D.8)

Standard algebra gives Pm (am am+1 + 1)pm−1 + pm−2 am+1 = . Qm (am am+1 + 1)qm−1 + qm−2 am+1

(D.9)

am+1 pm + pm−1 am+1 (am pm−1 + pm−2 ) + pm−1 = , am+1 (am qm−1 + qm−2 ) + qm−1 am+1 qm + qm−1

(D.10)

This is the same as

where the last step (substituting in with pm and qm ) follows from the inductive assumption. This completes the proof. A cute example is [1, 1, . . . , 1] = 1 +

1 1+

1 1+

..

= 1

pn . qn

(D.11)

. + 11

where we have n + 1 ones. What are the pi ’s and the qi ’s? p0 = 1, p1 = 2, pm = pm−1 + pm−2 . Similarly, we get q0 = 1, q1 = 1, and qm = qm−1 + qm−2 . Let Fm be the mth Fibonacci number: F0 = 1, F1 = 1, F2 = 2, F3 = 5, and Fm = Fm−1 + Fm−2 . Thus, [1, 1, · · · , 1] = FFn+1 . As we let the number of ones go to infinity, we can show n √

this will converge to the golden ratio (also called the golden mean), 1+2 5 . Notice how beautiful Continued Fractions are. A simple expression like this captures the golden ratio, which has many deep, interesting properties. In base ten, .111111 . . . is just 19 . Exercise D.1.6. Let rn = FFn+1 . Show that the even terms, r2m , are increasing and n the odd terms, r2m+1 , are decreasing. r for the Fibonacci numbers. Show rn conExercise D.1.7. Investigate lim √ n→∞ n 1+ 5 verges to the golden ratio, 2 .

98

D.1.3 Observation +pn−2 . Consider the difference pn qn−1 − pn−1 qn . We know pqnn = aannpqn−1 n−1 +qn−2 Using the recursion relations, this difference also equals

(an pn−1 + pn−2 )qn−1 − pn−1 (an qn−1 + qn−2 ).

(D.12)

This is the same (expand and cancel) as pn−2 qn−1 − pn−1 qn−2 . The key observation is as follows: pn qn−1 − pn−1 qn = −(pn−1 qn−2 − pn−2 qn−1 ). The index has reduced by one, and there has been a sign change. Repeat, and we get pn−2 qn−3 − pn−3 qn−2 . Doing n − 1 times in total, we get (−1)n−1 (p1 q0 − p0 q1 ). Substituting p1 = a0 a1 + 1, q1 = a1 , p0 = a0 and q0 = 1 gives Lemma D.1.8. pn qn−1 − pn−1 qn = (−1)n−1 .

(D.13)

So, even though a priori this difference should depend on a0 through an , it is in fact just −1 to a power. Similarly, one can show Lemma D.1.9. pn qn−2 − pn−2 qn = (−1)n an . Notice that the consecutive convergents to the continued fraction, satisfy Lemma D.1.10.

(D.14) pn pn−1 , qn qn−1

and

pn−2 , qn−2

pn pn−1 (−1)n−1 − = qn qn−1 qn qn−1

(D.15)

pn pn−2 (−1)n an − = qn qn−2 qn qn−2

(D.16)

and

To prove this, divide the previous relations by qn qn−1 and qn qn−2 .

D.1.4 Continued Fractions with Positive Terms REVIWERER: ALREADY DEFINED DIGITS Let x = pqnn be the continued fraction of [a0 , . . . , an ]. We call a0 , a1 , . . . , an the digits m . of the continued fraction. Let xm = pqm 99

Theorem D.1.11. If the digits a0 to an are positive, then the sequence x2m is an increasing sequence, the sequence x2m+1 is a decreasing sequence, and for every m, x2m < x < x2m+1 (if n 6= 2m or 2m + 1). Proof. x2m increasing means x0 < x2 < x4 < . . . . By Lemma D.1.10, x2(m+1) − x2m =

(−1)2m a2m . q2m q2(m+1)

(D.17)

Everything on the right hand side is positive, so x2(m+1) > x2m . The result for the odd terms is proved similarly; there we will have (−1)2m+1 instead of (−1)2m , and we will see the odd terms are decreasing. We know x0 < x2 < x4 < . . . and · · · < x5 < x3 < x1 . We know xn , the last guy, is either an x2m or an x2m+1 (depending on whether n is odd or even). It must be sandwiched somewhere in the middle. We will verify that x2m+1 −x2m is positive. Thus, xn must be between the two. We want to see how x2m , x2m+1 , x2m+2 and x2m+3 should be ordered. We claim the ordering should be x2m < x2m+2 < x2m+3 < x2m+1

(D.18)

Clearly, as the even terms are increasing and the odd terms are decreasing, x2m < x2m+2 and x2m+3 < x2m+1 . Thus, we need only show that x2m+3 is greater than x2m+2 . This follows immediately from Lemma D.1.10 (take n = 2m + 3 in the lemma). If n is even, xn is greater than all the other even terms; if n is odd, xn is less than all the other odd terms. Collecting the results now yields the theorem.

D.2 Uniqueness of Continued Fraction Expansions Theorem D.2.1 (Uniqueness of Continued Fraction Expansion). Let x = [a0 , . . . , aN ] = [b0 , . . . , bM ] be continued fractions with aN , bM > 1. Then N = M and ai = bi for i = 0 to N = M . Proof. We proceed by induction. a0 = [x], b0 = [x]. If [a0 , . . . , aN ] = [b0 , . . . , bN ], then h i h i [x], [a1 , . . . , aN ] = [x], [b1 , . . . , bM ] . (D.19) Then 100

[x] +

1 1 = [x] + . [a1 , . . . , aN ] [b1 , . . . , bM ]

(D.20)

Thus, [a1 , . . . , aN ] = [b1 , . . . , bM ]. We now have one fewer component, and the proof follows by induction. Given x, we can associate a continued fraction to x. a0 = [x], x = a0 +

1 1 = a0 + 0 a1 a1 +

1 a02

,

(D.21)

and so on. We write a prime over the last component to signify it need not be an integer; ie, it is the real number (greater than or equal to 1) that gives an equality. Note the previous components are integer, and the last is like a remainder. If ξ0 6= 0, ξ10 = a1 + ξ1 . If ξ1 6= 0, ξ11 = a2 + ξ2 , et cetera, where in general a0i = ξi−1 . If at some point ξi = 0, the process terminates. This means we have something like x = a0 +

1 a1 +

1 a2 +

...

.

(D.22)

1 + a1 N

Theorem D.2.2. A number is rational if and only if its continued fraction expansion is finite. Proof. Clearly, if the continued fraction expansion is finite, then the number is rational. The other direction is much harder. Let x = hk , (h, k) = 1, k > 0. Then as h = a0 + ξ0 , h = a0 k + ξ0 k, 0 ≤ ξ0 < 1, k

(D.23)

0 ≤ ξ0 k < k.

(D.24)

this implies that

Basically, ξ0 k is the remainder of the division of h by k. We continue this process. k1 = ξ0 k, and 1 k k = = = a1 + ξ1 (D.25) ξ0 kξ0 k1 We now have k1 ξ1 < k1 , and we define k2 = k1 ξ1 . We started with k and now have k1 , k2 , et cetera, a decreasing sequence of positive numbers k > k1 > k2 > · · · . a00 =

101

The sequence must eventually terminate, as each iteration gives us a smaller nonnegative number. We now have 1 k1 k1 = = = a2 + ξ2 , ξ1 k1 ξ1 k2

(D.26)

where k2 > k2 ξ2 = k3 . Exercise D.2.3. Let x have a periodic decimal expansion. Prove that x is rational. Exercise D.2.4. Let x be rational. What can you say about its decimal expansion?

D.3 Positive, Simple Convergents Recall Theorem D.1.11: If the quotients are positive, then x2n is an increasing sequence, x2n+1 is a decreasing sequence, and for all n, x2n < x < x2n+1 . The proof followed from looking at successive quotients, Lemma D.1.10. What is the goal? A decimal expansion of a number converges to the given number, even if the decimal expansion is infinite. We want to prove an analogous property for continued fractions. We described a process which associates a continued fraction to each number. We now show this process is well-defined, namely, that the continued fraction does equal the initial number. Looking at Theorem D.1.11, we show the even (odd) quotients converge to x from below (above). Theorem D.3.1. Let [a0 , . . . , an ] be a positive, simple continued fraction. Then 1. qn ≥ qn−1 ∀n ≥ 1, and qn > qn−1 if n > 1. 2. qn ≥ n, with strict inequality if n > 3. Proof: Recall q0 = 1, q1 = a1 ≥ 1, qn = an qn−1 + qn−2 . Each an > 0 and is an integer. Thus, an ≥ 1 and an qn−1 + qn−2 ≥ qn−1 , yielding qn ≥ qn−1 . If n > 1, qn−2 > 0, giving a strict inequality. We prove the other claim by induction. Suppose qn−1 ≥ n − 1. Then qn = an qn−1 + qn−2 ≥ qn−1 + qn−2 ≥ (n − 1) + 1 = n. If at one point the inequality is strict, it is strict from that point onward. 2

102

Exercise D.3.2. What can one prove about the pn s? Theorem D.3.3. Given a continued fraction expansion [a0 , . . . , an ] with quotient pn . Then pqnn is reduced. qn Proof. Assume not, and let d|pn and d|qn . Then d|(pn qn−1 − qn pn−1 ). By Lemma D.1.8, pn qn−1 − qn pn−1 = (−1)n−1 . Thus, d|(−1)n−1 , which implies d = ±1 and pn is reduced. qn We can sharpen Part Two of Theorem D.3.1: Theorem D.3.4. Let [a0 , . . . , an ] be a positive, simple continued fraction. Then qn2 > 2n−1 for n > 1. Proof. We proceed by induction, recalling that q0 = 1, q1 = a1 , and qm = am qm−1 + qm−2 . Thus, q2 ≥ 1q1 + q0 ≥ 2, proving the basis case. 2 Assume qn−1 > 2n−1 . As qn+1 = an+1 qn + qn−1 ≥ qn + qn−1 > 2qn−1 , we have 2 qn+1 ≥ (2qn−1 )2 > 22 2(n−1)−1 = 2n ,

(D.27)

completing the proof.

D.4 Convergence How well do continued fractions converge to a given number? Recall x = [a0 , a1 , . . . , an , a0n+1 ]. Then x= ¯ ¯ How large is ¯x −

¯

pn ¯ , qn ¯

a0n+1 pn + pn−1 . a0n+1 qn + qn−1

(D.28)

the difference between x and the nth convergent? ¯ ¯ ¯ a0n+1 pn + pn−1 pn pn ¯¯ ¯ − ¯x − ¯ = ¯ qn ¯ a0n+1 qn + qn−1 qn pn−1 qn − pn qn−1 = qn (a0n+1 qn + qn−1 ) (−1)n = 0 qn qn+1 103

(D.29)

as q10 = a01 , qn0 = a0n qn−1 + qn−2 , and by Lemma D.1.8, pn−1 qn − pn qn−1 = (−1)n . 0 How large can qn+1 be? How small? 0 Note an+1 < an+1 < an+1 + 1. Well, they could be equal, but only if we have a finite continued fraction. For simplicity, we are assuming we have an infinite continued fraction, so we don’t need to worry about trivial modifications at the last component. Thus, we are assuming x 6∈ Q. Note a0n is what we need to truncate an infinite continued fraction. Thus, we initially have x = [a0 , . . . , an , an+1 , . . . ] = [a0 , . . . , a0n ]. Thus, 0 qn+1 = a0n+1 qn + qn−1 > an+1 qn + qn−1 = qn+1

(D.30)

0 qn+1 < (an+1 + 1)qn + qn−1 = an+1 qn + qn−1 + qn = qn+1 + qn ≤ an+2 qn+1 + qn = qn+2 ,

(D.31)

and

as an+2 is a positive integer. We have proven Theorem D.4.1.

¯ ¯ ¯ 1 pn ¯¯ 1 ¯ < ¯x − ¯ < , qn qn+2 ¯ qn ¯ qn qn+1

(D.32)

or 1 qn+2

¯ ¯ ¯ ¯ < ¯pn − qn x¯
1. This will give qn+1 ≥ 3. Thus, pqnn is close to x. Consider the ball of radius 3q1n about pqnn . Then x is within this ball; however, pq is not within this ball. pq is at least q1n units from pqnn . Therefore, the closest x can be to pq is 3q2n , or |x − pq | ≥ 3q2n . But |x − pqnn | ≤ 3q1n . Therefore, ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ p p ¯ ¯ ¯ n¯ (E.6) ¯x − ¯ < ¯x − ¯, ¯ ¯ qn ¯ q¯ as was to be proved. Case 2: qn−1 < q < qn By our assumptions on q, pq 6= We will find µ and ν such that

pn qn

or

pn−1 . qn−1

µpn + νpn−1 = p, µqn + νqn−1 = q.

(E.7)

Assume relations of the above form. Multiplying the first by qn−1 and the second by pn−1 yields µ(pn qn−1 − pn−1 qn ) µ

= =

pqn−1 − qpn−1 ±(pqn−1 − qpn−1 ).

(E.8)

Similarly we find that ν = ±(pqn − qpn−1 ), where we use Lemma D.1.8 to get pn qn−1 − pn−1 qn = ±1. Thus, we can find integers µ and ν such that Equation E.7 is true. 112

(E.9)

As q = µqn + νqn−1 < qn , µ and ν must have opposite signs. Further, we know pn −qn x and pn−1 −qn−1 x have opposite signs (the even convergents are increasing, the odd convergents are decreasing: see Exercise D.5.2). Therefore, µ(pn − qn x) and ν(pn−1 − qn−1 x) have the same sign. But p − qx = µ(pn − qn x) + ν(pn−1 − qn−1 x).

(E.10)

|p − qx| > |pn−1 − qn−1 x| > |pn − qn x|.

(E.11)

Thus

The above is the desired inequality. Exercise E.1.3. Show that qn ≥ 3 for n ≥ 2.

E.2 E.2.1

Measures of Sets with Given Continued Fraction Approximations ¯ ¯ ¯ p¯ ¯x − q ¯ ≤

C q 2+²

Infinitely Often

Theorem E.2.1. Let C, ² be positive constants. Let S be the set of all points x ∈ [0, 1] such that there are infinitely many coprime integers p, q with ¯ ¯ ¯ ¯ ¯x − p ¯ ≤ C . (E.12) ¯ q ¯ q 2+² Then the length of S, |S|, equals 0. Proof. Let N > 0. Let SN be the set of all points x ∈ [0, 1] such that there are p, q ∈ Z, q > N , for which ¯ ¯ ¯ ¯ ¯x − p ¯ ≤ C . (E.13) ¯ q ¯ q 2+² If x ∈ S, then x ∈ SN for every N . Thus, if we can show that the measure of the sets SN becomes arbitrarily small as N → ∞, then the measure of S must be zero. How big can SN be? For a given q, there are at most q choices for p. Given a pair C of pq ? Clearly, the set of such points is (p, q), how many x’s are there within q2+² the interval 113

Ip,q =

³p q

p C ´ + 2+² . q q

C



q

, 2+²

(E.14)

Note that the measure of Ip,q is q2C 2+² . Let Iq be the set of all x in [0, 1] that are within C of a rational number with denominator q. Then q 2+² Iq ⊂

q [

Ip,q

(E.15)

p=0

and therefore

|Iq |



q X

|Ip,q |

p=0

= =

2C q 2+² q + 1 2C 4C < 1+² . q q 1+² q

(q + 1) ·

(E.16)

Then |SN | ≤

X

|Iq |

q>N

=

X 4C q 1+² q>N






We now show that every irrational $x$ satisfies $\left|x - \frac{p_{i-1}}{q_{i-1}}\right| < \frac{1}{\sqrt{5}\, q_{i-1}^2}$ for infinitely many $i$ (this is restated as Theorem E.4.1 below). Write $b_i = \frac{q_{i-2}}{q_{i-1}}$; since $x = \frac{a'_i p_{i-1} + p_{i-2}}{a'_i q_{i-1} + q_{i-2}}$, we have $\left|x - \frac{p_{i-1}}{q_{i-1}}\right| = \frac{1}{q_{i-1}^2 (a'_i + b_i)}$, so it suffices to show that
\[
a'_i + b_i \;>\; \sqrt{5} \tag{E.20}
\]
for at least one of any three consecutive values $m-1, m, m+1$ of $i$. Assume
\[
a'_{n-1} + b_{n-1} \;\le\; \sqrt{5}, \qquad a'_n + b_n \;\le\; \sqrt{5}. \tag{E.21}
\]
By definition
\[
a'_{n-1} \;=\; a_{n-1} + \frac{1}{a'_n} \tag{E.22}
\]
and
\[
\frac{1}{b_n} \;=\; \frac{q_{n-1}}{q_{n-2}} \;=\; \frac{a_{n-1} q_{n-2} + q_{n-3}}{q_{n-2}} \;=\; a_{n-1} + \frac{q_{n-3}}{q_{n-2}} \;=\; a_{n-1} + b_{n-1}. \tag{E.23}
\]
Hence
\[
\frac{1}{a'_n} + \frac{1}{b_n} \;=\; a'_{n-1} + b_{n-1} \;\le\; \sqrt{5}. \tag{E.24}
\]

Therefore $\frac{1}{a'_n} \le \sqrt{5} - \frac{1}{b_n}$ and, by assumption, $a'_n \le \sqrt{5} - b_n$. Multiplying,
\[
1 \;=\; a'_n \cdot \frac{1}{a'_n} \;\le\; \left( \sqrt{5} - b_n \right)\left( \sqrt{5} - \frac{1}{b_n} \right) \;=\; 6 - \sqrt{5}\left( b_n + \frac{1}{b_n} \right). \tag{E.25}
\]
In other words,
\[
b_n + \frac{1}{b_n} \;\le\; \sqrt{5}. \tag{E.26}
\]
Since $b_n$ is rational, the inequality must be strict. Completing the square (equivalently, solving the quadratic inequality $b_n^2 - \sqrt{5}\,b_n + 1 < 0$) we obtain
\[
b_n \;>\; \frac{1}{2}\left( \sqrt{5} - 1 \right). \tag{E.27}
\]

Now suppose
\[
a'_{m-1} + b_{m-1} \;\le\; \sqrt{5}, \qquad a'_m + b_m \;\le\; \sqrt{5}, \qquad a'_{m+1} + b_{m+1} \;\le\; \sqrt{5}. \tag{E.28}
\]
Applying the above reasoning to $n = m$ and $n = m+1$, we obtain
\[
b_m \;>\; \frac{1}{2}\left(\sqrt{5} - 1\right), \qquad b_{m+1} \;>\; \frac{1}{2}\left(\sqrt{5} - 1\right). \tag{E.29}
\]
By (E.22) with $n = m+1$, and (E.23) and (E.24) with $n = m$,
\[
a_m \;=\; \frac{1}{b_{m+1}} - b_m \;<\; \frac{1}{b_{m+1}} - \frac{1}{2}\left(\sqrt{5}-1\right) \;<\; \frac{2}{\sqrt{5}-1} - \frac{1}{2}\left(\sqrt{5}-1\right) \;=\; \frac{1}{2}\left(\sqrt{5}+1\right) - \frac{1}{2}\left(\sqrt{5}-1\right) \;=\; 1. \tag{E.30}
\]

However, $a_m$ is a positive integer, and there are no positive integers less than $1$: contradiction.

From the above, we see that the approximation is often better than $\frac{1}{\sqrt{5}\,q^2}$. For example, if our continued fraction expansion has infinitely many 3's, we can do at least as well as $\frac{1}{3q^2}$ infinitely often.

Exercise E.2.4. Show that $\frac{1}{\sqrt{5}\,q^2}$ is the best one can have for all irrationals by studying the golden mean, $\frac{1+\sqrt{5}}{2} = [1, 1, 1, \dots]$.

Exercise E.2.5. Let $x$ be any irrational other than the golden mean. How well can $x$ be approximated? See, for example, [HW].
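A short Python sketch related to Exercise E.2.4 (illustrative only; the helper `convergents` and the choice of test points are ours): for the golden mean the quantities $q_n^2 \left|x - p_n/q_n\right|$ approach $\frac{1}{\sqrt{5}} \approx 0.447$, showing that the constant $\sqrt{5}$ cannot be improved, while for $\sqrt{2}$ they approach the smaller value $\frac{1}{2\sqrt{2}} \approx 0.354$.
\begin{verbatim}
# Compare q_n^2 * |x - p_n/q_n| for the golden mean [1;1,1,...] and for sqrt(2) = [1;2,2,...].
import math

def convergents(a, N):
    p_prev, p, q_prev, q = 1, a[0], 0, 1
    out = [(p, q)]
    for n in range(1, N + 1):
        p, p_prev = a[n] * p + p_prev, p
        q, q_prev = a[n] * q + q_prev, q
        out.append((p, q))
    return out

golden = (1 + math.sqrt(5)) / 2
for x, digits in [(golden, [1] * 20), (math.sqrt(2), [1] + [2] * 20)]:
    p, q = convergents(digits, 12)[-1]
    print(round(x, 6), round(q * q * abs(x - p / q), 4))
\end{verbatim}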

E.3 Convergents are the Best Rational Approximations

Theorem E.3.1. Let $x = [a_0, a_1, \dots]$ have $n$th convergent $\frac{p_n}{q_n}$. Suppose $n > 1$, $0 < q \le q_n$ and $\frac{p}{q} \ne \frac{p_n}{q_n}$. Then
\[
\left| \frac{p_n}{q_n} - x \right| \;<\; \left| \frac{p}{q} - x \right|. \tag{E.31}
\]

Proof. Simplify $p$ and $q$ so that $(p,q) = 1$. It is sufficient to show
\[
|p_n - q_n x| \;<\; |p_{n-1} - q_{n-1} x|, \tag{E.32}
\]
and to prove the theorem for $q_{n-1} < q \le q_n$: if $q \le q_{n-1}$, the claim follows by applying the result at a smaller index together with (E.32).

Case 1: $q = q_n$. We handled this case in Section E.1 (Case 1 of the argument preceding Exercise E.1.3).

Case 2: $q_{n-1} < q < q_n$. Thus
\[
\frac{p}{q} \;\ne\; \frac{p_n}{q_n},\ \frac{p_{n-1}}{q_{n-1}}. \tag{E.33}
\]
Find $\mu, \nu$ such that
\[
\mu p_n + \nu p_{n-1} \;=\; p, \qquad \mu q_n + \nu q_{n-1} \;=\; q. \tag{E.34}
\]
As
\[
\begin{vmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{vmatrix} \;=\; (-1)^{n-1}, \tag{E.35}
\]
by Cramer's rule we can find such $\mu$ and $\nu$; explicitly,
\[
\mu \;=\; \pm(p q_{n-1} - q p_{n-1}), \qquad \nu \;=\; \pm(p q_n - q p_n). \tag{E.36}
\]

Since
\[
q \;=\; \mu q_n + \nu q_{n-1} \;<\; q_n, \tag{E.37}
\]
we find that $\mu$ and $\nu$ have opposite signs. On the other hand,
\[
p_n - q_n x \quad \text{and} \quad p_{n-1} - q_{n-1} x \tag{E.38}
\]
have opposite signs; therefore
\[
\mu(p_n - q_n x) \quad \text{and} \quad \nu(p_{n-1} - q_{n-1} x) \tag{E.39}
\]
have the same sign. This implies that in
\[
p - qx \;=\; \mu(p_n - q_n x) + \nu(p_{n-1} - q_{n-1} x) \tag{E.40}
\]
there is no cancellation (the two terms on the right have the same sign). As $\nu$ is a nonzero integer, we find
\[
|p - qx| \;>\; |\nu(p_{n-1} - q_{n-1} x)| \;\ge\; |p_{n-1} - q_{n-1} x| \;>\; |p_n - q_n x|. \tag{E.41}
\]
This completes the proof. Thus, the convergents provide the best rational approximations to a given number. Now that we know this, we investigate how well they approximate.

E.4 Weaker Approximation Properties of Convergents

In our proof that every irrational can be approximated (infinitely often) as well as $\frac{1}{\sqrt{5}\,q^2}$ (Section E.2), we proved

Theorem E.4.1. Of any three consecutive convergents to a continued fraction, at least one satisfies
\[
\left| x - \frac{p}{q} \right| \;<\; \frac{1}{\sqrt{5}\,q^2}. \tag{E.42}
\]

One can show

Theorem E.4.2. If $x \notin \mathbb{Q}$ and $\frac{p}{q}$, with $p$ and $q$ relatively prime, satisfies $\left| x - \frac{p}{q} \right| < \frac{1}{2q^2}$, then $\frac{p}{q}$ is a convergent of $x$: for some $n$, $p = p_n$ and $q = q_n$.

Exercise E.4.3. Prove the above theorem. Is it still true under the weaker hypothesis $\left| x - \frac{p}{q} \right| < \frac{1}{q^2}$?
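An empirical Python sketch of Theorem E.4.2 (not a proof, and the search range and the target $x = \sqrt{3}$ are arbitrary illustrative choices): every reduced fraction $p/q$ with $q \le 500$ and $|x - p/q| < \frac{1}{2q^2}$ indeed turns out to be a convergent.
\begin{verbatim}
# Check that "good" approximations to sqrt(3) = [1; 1, 2, 1, 2, ...] are convergents.
import math
from fractions import Fraction

x = math.sqrt(3)
digits = [1] + [1, 2] * 12
convs = set()
p_prev, p, q_prev, q = 1, digits[0], 0, 1
convs.add(Fraction(p, q))
for a in digits[1:]:
    p, p_prev = a * p + p_prev, p
    q, q_prev = a * q + q_prev, q
    convs.add(Fraction(p, q))

for q in range(1, 501):
    p = round(x * q)               # only the nearest p can satisfy the bound for this q
    if abs(x - p / q) < 1 / (2 * q * q):
        assert Fraction(p, q) in convs, (p, q)
print("every p/q with q <= 500 and |x - p/q| < 1/(2q^2) is a convergent")
\end{verbatim}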

Theorem E.4.4. Of any two consecutive convergents, one will satisfy
\[
\left| x - \frac{p}{q} \right| \;<\; \frac{1}{2q^2}. \tag{E.43}
\]

Proof. (The following argument in fact establishes Theorem E.4.2, whose proof is asked for in Exercise E.4.3.) Let $x$ be irrational and suppose
\[
\frac{p}{q} - x \;=\; \frac{\theta \epsilon}{q^2}, \qquad \epsilon = \pm 1, \quad 0 < \theta < \frac{1}{2}. \tag{E.44}
\]
Expand $\frac{p}{q}$ as a finite continued fraction $[a_0, \dots, a_n]$. There is non-uniqueness in finite continued fractions (we may take either an even or an odd number of terms); choose the parity so that $\epsilon = (-1)^{n-1}$. Find $w$ such that
\[
x \;=\; \frac{w p_n + p_{n-1}}{w q_n + q_{n-1}}, \qquad \frac{p_n}{q_n} \;=\; \frac{p}{q}. \tag{E.45}
\]
We will consider $\frac{p_n}{q_n}$ and $\frac{p_{n-1}}{q_{n-1}}$.

Claim E.4.5. $\frac{p_{n-1}}{q_{n-1}}$ and $\frac{p_n}{q_n}$ are in fact convergents to $x$.

Choose $w$ as follows: from (E.45),
\[
w q_n x + q_{n-1} x \;=\; w p_n + p_{n-1}, \qquad \text{so} \qquad w(q_n x - p_n) \;=\; p_{n-1} - q_{n-1} x. \tag{E.46}
\]
Therefore
\[
w \;=\; \frac{p_{n-1} - q_{n-1} x}{q_n x - p_n}. \tag{E.47}
\]

Lemma E.4.6. If $x = \frac{P\zeta + R}{Q\zeta + S}$ with $\zeta > 1$ and $P, Q, R, S$ integers such that $Q > S > 0$ and $PS - QR = \pm 1$, then $\frac{R}{S}$ and $\frac{P}{Q}$ are two consecutive convergents to $x$.

How does Lemma E.4.6 imply the theorem? In order to use the lemma, we need to show that $\zeta > 1$, which translates into showing $w > 1$. Now,
\[
\frac{\epsilon\theta}{q^2} \;=\; \frac{\epsilon\theta}{q_n^2} \;=\; \frac{p_n}{q_n} - x \;=\; \frac{p_n}{q_n} - \frac{w p_n + p_{n-1}}{w q_n + q_{n-1}} \;=\; \frac{p_n q_{n-1} - p_{n-1} q_n}{q_n (w q_n + q_{n-1})} \;=\; \frac{(-1)^{n-1}}{q_n (w q_n + q_{n-1})}. \tag{E.48}
\]
Therefore, as $\epsilon = (-1)^{n-1}$,
\[
\theta \;=\; \frac{q_n}{w q_n + q_{n-1}}, \tag{E.49}
\]
which gives
\[
w \;=\; \frac{1}{\theta} - \frac{q_{n-1}}{q_n} \;>\; 2 - 1 \;=\; 1, \tag{E.50}
\]
which, given Claim E.4.5 and Lemma E.4.6, completes the proof of the theorem.

We must now prove Lemma E.4.6.

Proof. Let
\[
\frac{P}{Q} \;=\; [a_0, \dots, a_n] \;=\; \frac{p_n}{q_n}. \tag{E.51}
\]
We must have $P = p_n$ and $Q = q_n$, as both fractions are reduced ($(P,Q)$ relatively prime, $Q > 0$). Choose the parity of $n$ such that $PS - QR = \pm 1 = (-1)^{n-1}$. In particular, we have
\[
p_n S - q_n R \;=\; (-1)^{n-1} \;=\; p_n q_{n-1} - p_{n-1} q_n. \tag{E.52}
\]
Rewriting gives
\[
p_n (S - q_{n-1}) \;=\; q_n (R - p_{n-1}). \tag{E.53}
\]
As $(p_n, q_n) = 1$ and $q_n \mid p_n(S - q_{n-1})$, we get $q_n \mid S - q_{n-1}$. Since $q_n = Q > S > 0$ and $q_n \ge q_{n-1} > 0$, we must have
\[
|S - q_{n-1}| \;<\; q_n. \tag{E.54}
\]
As $q_n \mid S - q_{n-1}$, this forces $S = q_{n-1}$, and then $R = p_{n-1}$. Hence
\[
x \;=\; \frac{p_n \zeta + p_{n-1}}{q_n \zeta + q_{n-1}}, \tag{E.55}
\]
which implies (since $\zeta > 1$)
\[
x \;=\; [a_0, \dots, a_n, \zeta], \tag{E.56}
\]
so $\frac{R}{S} = \frac{p_{n-1}}{q_{n-1}}$ and $\frac{P}{Q} = \frac{p_n}{q_n}$ are consecutive convergents of $x$, proving the lemma.

E.5 Exponent (or Order) of Approximation

Definition E.5.1 (approximated to order n). $\xi$ is approximated by rationals to order $n$ ($n$ need not be an integer) if there exists $k = k(\xi)$ such that
\[
\left| \frac{p}{q} - \xi \right| \;<\; \frac{k(\xi)}{q^n} \tag{E.57}
\]
has infinitely many solutions $\frac{p}{q}$.

Equivalently, one can package this as follows.

Definition E.5.2 (approximation exponent). $\xi$ has order (or exponent) $\tau(\xi)$ if $\tau(\xi)$ is the smallest number such that for every $e > \tau(\xi)$ the inequality
\[
\left| \frac{p}{q} - \xi \right| \;<\; \frac{1}{q^e} \tag{E.58}
\]
has only finitely many solutions.

Example E.5.3. A rational number has approximation exponent 1 and no more.

Why? If $\xi = \frac{a}{b}$ and $r = \frac{s}{t} \ne \frac{a}{b}$, then $sb - at \ne 0$; thus $|sb - at| \ge 1$ (as it is a nonzero integer). This implies
\[
\left| \xi - \frac{s}{t} \right| \;=\; \left| \frac{a}{b} - \frac{s}{t} \right| \;=\; \frac{|sb - at|}{bt} \;\ge\; \frac{1}{bt}. \tag{E.59}
\]

If the rational $\xi$ had approximation exponent $e > 1$, we would have infinitely many $\frac{s}{t}$ with $\left| \xi - \frac{s}{t} \right| < \frac{1}{t^e}$; combined with (E.59) this forces $\frac{1}{t^e} > \frac{1}{bt}$, i.e. $t^{e-1} < b$. Since $b$ is fixed, there are only finitely many such $t$, a contradiction.
Write the $n$th partial sum as $\frac{p_n}{q_n}$ with $q_n > 0$ and $(p_n, q_n) = 1$. Then $\left\{ \frac{p_n}{q_n} \right\}_{n \ge 1}$ is a monotone increasing sequence converging to $x$; in particular, all these rational numbers are distinct. Note also that $q_n$ must divide $10^{n!}$, which implies $q_n \le 10^{n!}$.

Using this, we get
\[
0 \;<\; x - \frac{p_n}{q_n} \;=\; \sum_{k > n} \frac{1}{10^{k!}} \;<\; \frac{2}{10^{(n+1)!}} \;\le\; \frac{2}{q_n^{\,n+1}}, \tag{F.22}
\]
so $x$ is approximated by rationals to arbitrarily high order and is therefore transcendental by Liouville's theorem.
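Here is a small exact-arithmetic Python sketch of the estimate (F.22); it is illustrative only, and the infinite sum is replaced by a much longer truncation, which does not affect the inequalities being checked.
\begin{verbatim}
# Truncations p_n/q_n of x = sum_k 10^(-k!) satisfy 0 < x - p_n/q_n < 2/q_n^(n+1).
from fractions import Fraction
from math import factorial

def truncation(n):
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, n + 1))

x = truncation(8)                       # stands in for the infinite sum
for n in range(1, 6):
    q_n = 10 ** factorial(n)
    err = x - truncation(n)             # exact rational arithmetic
    assert 0 < err < Fraction(2, q_n ** (n + 1))
print("approximation to order n+1 verified for n = 1, ..., 5")
\end{verbatim}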
Now consider instead the continued fraction $y = [0; a_1, a_2, \dots]$ with $a_k = 10^{k!}$, and let $\frac{p_n}{q_n}$ denote its convergents. Since $q_{k+1} = a_{k+1} q_k + q_{k-1}$, we get
\[
\frac{q_{k+1}}{q_k} \;=\; a_{k+1} + \frac{q_{k-1}}{q_k} \;<\; a_{k+1} + 1. \tag{F.26}
\]
Hence, writing this inequality for $k = 1, \dots, n-1$ and multiplying, we obtain
\[
q_n \;=\; q_1 \cdot \frac{q_2}{q_1} \cdots \frac{q_n}{q_{n-1}} \;<\; (a_1+1)(a_2+1)\cdots(a_n+1) \;=\; \left(1 + \tfrac{1}{a_1}\right)\cdots\left(1 + \tfrac{1}{a_n}\right) a_1 \cdots a_n \;<\; 2^n a_1 \cdots a_n \;=\; 2^n\, 10^{1! + \cdots + n!} \;<\; 10^{2 \cdot n!} \;=\; a_n^2. \tag{F.27}
\]

Combining equations F.25 and F.27 we get
\[
\left| y - \frac{p_n}{q_n} \right| \;<\; \frac{1}{a_{n+1}} \;=\; \frac{1}{a_n^{\,n+1}} \;\le\; \left( \frac{1}{a_n^2} \right)^{n/2} \;<\; \left( \frac{1}{q_n} \right)^{n/2} \;=\; \frac{1}{q_n^{\,n/2}}. \tag{F.28}
\]
In this way we get, just as in the previous theorem, an approximation of $y$ by rationals to arbitrarily high order. This proves that $y$ is transcendental.


Appendix G: Distribution of Digits of Continued Fractions

G.1 Introduction

Given $\alpha \in \mathbb{R}$, we calculate its continued fraction expansion. Without loss of generality, we may assume $\alpha \in (0,1)$, as this shift changes only the zeroth digit. Thus,
\[
\alpha \;=\; [0, a_1, a_2, a_3, a_4, \dots]. \tag{G.1}
\]
Given any sequence of positive integers $a_i$, we can construct a number $\alpha$ with these as its digits. However, for a generic $\alpha$ chosen uniformly in $(0,1)$, how often do we expect to observe the $n$th digit of the continued fraction expansion equal to 1? To 2? To 3? And so on.

If $\alpha \in \mathbb{Q}$, then $\alpha$ has a finite continued fraction expansion; if $\alpha$ is a quadratic irrational, then its continued fraction expansion is periodic. In both of these cases there are really only finitely many digits; however, if we stay away from rationals and quadratic irrationals, then $\alpha$ will have a bona fide infinite continued fraction expansion, and it makes sense to ask the above questions.

We recall some notation: we can truncate the continued fraction expansion after the $n$th digit, obtaining an approximation
\[
\frac{p_n}{q_n} \;=\; [0, a_1, \dots, a_n] \tag{G.2}
\]
to $\alpha$, with each $a_i \in \mathbb{N}$. Further, there exists $r_{n+1} \in \mathbb{R}$, $r_{n+1} \in [1, \infty)$, such that
\[
\alpha \;=\; [0, a_1, \dots, a_n, r_{n+1}]. \tag{G.3}
\]

G.2 Distribution of $a_1(\alpha) = k$

What is the measure of $\alpha \in (0,1)$ such that $a_1(\alpha) = 7$? Such an $\alpha$ has
\[
\alpha \;=\; \cfrac{1}{7 + \cfrac{1}{a_2 + \cdots}}. \tag{G.4}
\]
Clearly, if $\alpha \le \frac{1}{8}$ then $a_1 \ge 8$. A little thought shows that if $\frac{1}{8} < \alpha \le \frac{1}{7}$, then $a_1(\alpha) = 7$, because $a_1 = \lfloor \frac{1}{\alpha} \rfloor$ (here $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$).

So the measure of $\alpha \in (0,1)$ such that $a_1 = 7$ is $\frac{1}{7} - \frac{1}{8} = \frac{1}{7 \cdot 8}$, which is approximately $\frac{1}{7^2}$. More generally, we find that the measure of $\alpha \in (0,1)$ such that $a_1(\alpha) = k$ is $\frac{1}{k(k+1)} \approx \frac{1}{k^2}$.
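A quick Monte Carlo sketch in Python of this first-digit law (purely illustrative; the sample size and the displayed values of $k$ are arbitrary choices): the empirical frequencies sit close to $\frac{1}{k(k+1)}$.
\begin{verbatim}
# Estimate Prob(a_1 = k) by sampling; a_1(alpha) = floor(1/alpha) for alpha in (0,1).
import random

random.seed(0)
trials = 10**6
counts = {}
for _ in range(trials):
    alpha = random.random()
    if alpha == 0.0:
        continue
    a1 = int(1 / alpha)
    counts[a1] = counts.get(a1, 0) + 1

for k in (1, 2, 3, 7):
    print(k, round(counts.get(k, 0) / trials, 5), round(1 / (k * (k + 1)), 5))
\end{verbatim}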

G.3 Bounds for $a_{n+1}(\alpha) = k$

Suppose one already has digits $a_1, \dots, a_n$. The set of $\alpha \in (0,1)$ whose first $n$ digits are these numbers is a subinterval of $(0,1)$. We want the sub-interval where $a_{n+1} = k$; thus we want to find
\[
\frac{\bigl|\{\alpha \in (0,1) : a_i(\alpha) = a_i \text{ for } i \le n,\ a_{n+1}(\alpha) = k\}\bigr|}{\bigl|\{\alpha \in (0,1) : a_i(\alpha) = a_i \text{ for } i \le n\}\bigr|}. \tag{G.5}
\]
This is the conditional probability of observing $a_{n+1}(\alpha) = k$, given that the first $n$ digits have the prescribed values $a_1, \dots, a_n$:
\[
\mathrm{Prob}\bigl(a_{n+1}(\alpha) = k \mid a_i(\alpha) = a_i \text{ for } i \le n\bigr)
 \;=\; \frac{\mathrm{Prob}\bigl(a_{n+1}(\alpha) = k,\ a_i(\alpha) = a_i\ (i \le n)\bigr)}{\mathrm{Prob}\bigl(a_i(\alpha) = a_i\ (i \le n)\bigr)}
 \;=\; \frac{\mathrm{Prob}\bigl(A_{n+1}(k) \cap A_{1,\dots,n}(a_1, \dots, a_n)\bigr)}{\mathrm{Prob}\bigl(A_{1,\dots,n}(a_1, \dots, a_n)\bigr)}. \tag{G.6}
\]

G.3.1 $\mathrm{Prob}(A_{1,\dots,n}(a_1, \dots, a_n))$

We calculate
\[
\mathrm{Prob}\bigl(a_i(\alpha) = a_i\ (i \le n)\bigr). \tag{G.7}
\]
Thus, we want all $\alpha \in (0,1)$ whose continued fraction begins with $[0, a_1, \dots, a_n]$. We have
\[
\frac{p_n}{q_n} \;=\; [0, a_1, \dots, a_n], \qquad \alpha \;=\; [0, a_1, \dots, a_n, r_{n+1}], \quad 1 \le r_{n+1} < \infty. \tag{G.8}
\]

The recursion relations for the $p_n$'s and $q_n$'s are
\[
p_{n+1} \;=\; p_n a_{n+1} + p_{n-1}, \qquad q_{n+1} \;=\; q_n a_{n+1} + q_{n-1}, \qquad p_n q_{n-1} - p_{n-1} q_n \;=\; (-1)^{n-1}. \tag{G.9}
\]
Therefore, the interval in question is
\[
\left[ \frac{p_n}{q_n},\ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right]. \tag{G.10}
\]
A straightforward calculation shows this has length
\[
\frac{1}{q_n^2 \left( 1 + \frac{q_{n-1}}{q_n} \right)}. \tag{G.11}
\]

G.3.2 $\mathrm{Prob}(A_{n+1}(k) \cap A_{1,\dots,n}(a_1, \dots, a_n))$

We can write
\[
\frac{p_n}{q_n} \;=\; [0, a_1, \dots, a_n], \qquad \alpha \;=\; [0, a_1, \dots, a_n, r_{n+1}], \quad k \le r_{n+1} \le k+1. \tag{G.12}
\]
Thus, we need to find the length of the interval going from $[0, a_1, \dots, a_n, k]$ to $[0, a_1, \dots, a_n, k+1]$.

The recursion relations for the $p$'s and $q$'s are
\[
p_{n+1} \;=\; p_n a_{n+1} + p_{n-1}, \qquad q_{n+1} \;=\; q_n a_{n+1} + q_{n-1}, \qquad p_n q_{n-1} - p_{n-1} q_n \;=\; (-1)^{n-1}. \tag{G.13}
\]
Therefore, the interval is
\[
\left[ \frac{p_n k + p_{n-1}}{q_n k + q_{n-1}},\ \frac{p_n (k+1) + p_{n-1}}{q_n (k+1) + q_{n-1}} \right]. \tag{G.14}
\]
A straightforward calculation shows the length equals
\[
\frac{1}{q_n^2 k^2 \left( 1 + \frac{q_{n-1}}{k q_n} \right)\left( 1 + \frac{1}{k} + \frac{q_{n-1}}{k q_n} \right)}. \tag{G.15}
\]

G.3.3 $\mathrm{Prob}(A_{n+1}(k))$

Putting the pieces together, we find the conditional probability that $a_{n+1}(\alpha) = k$, given that $a_i(\alpha) = a_i$ for $i \le n$, is (after simple algebra)
\[
\frac{1}{k^2} \cdot \frac{1 + \frac{q_{n-1}}{q_n}}{\left(1 + \frac{q_{n-1}}{k q_n}\right)\left(1 + \frac{1}{k} + \frac{q_{n-1}}{k q_n}\right)}. \tag{G.16}
\]

We bound the second factor from above and below, independently of $k$. As $q_{n-1} \le q_n$, the second factor is at most $2$ (the denominator is greater than 1 and the numerator is at most 2). For the lower bound, note the second factor is monotonically increasing in $k$; thus its smallest value is attained at $k = 1$, where it equals $\frac{1}{2 + q_{n-1}/q_n} \ge \frac{1}{3}$. We find
\[
\frac{1}{3k^2} \;\le\; \frac{\mathrm{Prob}(A_{n+1}(k) \cap A_{1,\dots,n})}{\mathrm{Prob}(A_{1,\dots,n}(a_1, \dots, a_n))} \;\le\; \frac{2}{k^2}. \tag{G.17}
\]
Thus the conditional probability is proportional to $\frac{1}{k^2}$, and the bounds are independent of $n$ and of the prescribed digits $a_1, \dots, a_n$. Therefore,
\[
\mathrm{Prob}(a_{n+1}(\alpha) = k) \;=\; \sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} \frac{\mathrm{Prob}(A_{n+1}(k) \cap A_{1,\dots,n}(a_1, \dots, a_n))}{\mathrm{Prob}(A_{1,\dots,n}(a_1, \dots, a_n))} \cdot \mathrm{Prob}(A_{1,\dots,n}(a_1, \dots, a_n)). \tag{G.18}
\]
As each conditional probability is at least $\frac{1}{3k^2}$ and at most $\frac{2}{k^2}$, and
\[
\sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} \mathrm{Prob}(A_{1,\dots,n}(a_1, \dots, a_n)) \;=\; 1 \tag{G.19}
\]
(because every $\alpha \in (0,1)$ lies in exactly one of the above intervals, and the intervals are disjoint), we have
\[
\frac{1}{3k^2} \;\le\; \mathrm{Prob}(a_{n+1}(\alpha) = k) \;\le\; \frac{2}{k^2}. \tag{G.20}
\]
Note the above bounds are independent of $n+1$. Later we will derive a more exact value for $\mathrm{Prob}(a_{n+1}(\alpha) = k)$.

Corollary G.3.1. There exist constants $0 < C_1 < C_2 < \infty$ such that
\[
\frac{C_1}{k} \;<\; \mathrm{Prob}(a_{n+1}(\alpha) \ge k) \;<\; \frac{C_2}{k}. \tag{G.21}
\]

Proof. Sum the bounds in (G.20) over all digits at least $k$: $\sum_{j \ge k} \frac{1}{j^2} \approx \frac{1}{k}$.

Corollary G.3.2. Consider all $\alpha \in (0,1)$ such that $a_n(\alpha) \le K$ for all $n$, for some fixed constant $K$. The set of such $\alpha$ has length 0. In other words,
\[
\mathrm{Prob}(\forall n,\ a_n(\alpha) \le K) \;=\; 0. \tag{G.22}
\]

Proof. We look at what fraction of the sub-intervals we keep losing. Let $\beta = 1 - \frac{C_1}{K}$, where $C_1$ is as in Corollary G.3.1. The probability that the first $n$ digits are all at most $K$ is at most $\beta^n$, as the requirement that each $a_i$ be at most $K$ lets us keep at most a fraction $\beta$ of the sub-intervals we still have. As $n \to \infty$, $\beta^n$ tends to 0.

If $a_{n+1}(\alpha) > N$, then $\left|\alpha - \frac{p_n}{q_n}\right| \le \frac{1}{N q_n^2}$. Letting $\epsilon = \frac{1}{N}$, we can approximate to within $\frac{\epsilon}{q_n^2}$. As almost all $\alpha$ have infinitely many digits exceeding $N$, for almost all $\alpha$ we can find infinitely many $\frac{p_n}{q_n}$ such that the approximation is as good as $\frac{\epsilon}{q_n^2}$.
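The following Python sketch puts the formulas of Section G.3 to work (it is illustrative only; the digit string and the test values of $k$ are arbitrary choices). It computes the fundamental interval for a prescribed digit string, the sub-interval on which $a_{n+1} = k$, and checks that their ratio, i.e. the conditional probability in (G.16), lies between $\frac{1}{3k^2}$ and $\frac{2}{k^2}$.
\begin{verbatim}
# Exact check of the bounds (G.20) on the conditional probability of the next digit.
from fractions import Fraction

def interval_for(digits):
    """Return (p_n, q_n, p_{n-1}, q_{n-1}) for [0; a_1, ..., a_n]."""
    p_prev, p, q_prev, q = 1, 0, 0, 1      # (p_{-1}, p_0, q_{-1}, q_0)
    for a in digits:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return p, q, p_prev, q_prev

digits = (3, 1, 4, 1, 5)
p, q, pp, qp = interval_for(digits)
total = abs(Fraction(p, q) - Fraction(p + pp, q + qp))      # length of the fundamental interval
for k in (1, 2, 5, 10):
    left = Fraction(p * k + pp, q * k + qp)
    right = Fraction(p * (k + 1) + pp, q * (k + 1) + qp)
    cond = abs(right - left) / total
    assert Fraction(1, 3 * k * k) <= cond <= Fraction(2, k * k)
print("conditional probabilities lie within the bounds of (G.20)")
\end{verbatim}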

G.4 Distribution of $a_n(\alpha) = k$

G.4.1 Statement of Kuzmin's Theorem

We have shown that, independently of $n$, the probability that the $n$th digit is $k$ is at least $\frac{1}{3k^2}$ and at most $\frac{2}{k^2}$.

Gauss conjectured (in an 1812 letter to Laplace) that, as $n \to \infty$, the probability that the $n$th digit equals $k$ converges to $\log_2\left(1 + \frac{1}{k(k+2)}\right)$. In 1928, Kuzmin [?] proved this, with an explicit error term. Namely,

Theorem G.4.1 (Kuzmin). There exist positive constants $A$ and $B$ such that
\[
\left| \mathrm{Prob}\bigl(a_n(\alpha) = k\bigr) - \log_2\left(1 + \frac{1}{k(k+2)}\right) \right| \;\le\; \frac{A}{k(k+1)}\, e^{-B\sqrt{n-1}}. \tag{G.23}
\]
The error term has been improved by Lévy [?] to $A e^{-Cn}$. We will follow Khinchin's exposition (see [Kh]).

As we are interested in the limit $n \to \infty$, it suffices to investigate the digits of $\alpha \in (0,1)$, since changing the zeroth digit does not affect the later ones. The general formulation is
\[
\alpha = [0, a_1, a_2, \dots, a_{n-1}, a_n, \dots] \in (0,1), \qquad
\alpha = [0, a_1, a_2, \dots, a_{n-1}, r_n(\alpha)], \qquad
r_n(\alpha) = [a_n; a_{n+1}, \dots],
\]
\[
z_n(\alpha) \;=\; r_n(\alpha) - a_n \in (0,1), \qquad
m_n(x) \;=\; \bigl|\{\alpha \in (0,1) : z_n(\alpha) < x\}\bigr|. \tag{G.24}
\]

Thus, $m_n(x)$ is the measure (length, proportion, or probability) of $\alpha \in (0,1)$ with $z_n(\alpha) = [0; a_{n+1}, a_{n+2}, \dots] < x$.

Remark G.4.2. Note $a_n(\alpha) = k$ is equivalent to $\frac{1}{k+1} < z_{n-1}(\alpha) \le \frac{1}{k}$. Thus, to calculate the probability that $a_n(\alpha) = k$, it is sufficient to evaluate $m_{n-1}\!\left(\frac{1}{k}\right) - m_{n-1}\!\left(\frac{1}{k+1}\right)$.

Remark G.4.3. Note
\[
\mathrm{Prob}\bigl(a_n(\alpha) = k\bigr) \;=\; \bigl|\{\alpha \in (0,1) : a_n(\alpha) = k\}\bigr|. \tag{G.25}
\]
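Before the proof, a Monte Carlo sketch of the Gauss-Kuzmin prediction (illustrative only; the choice $n = 10$, the sample size, and the use of floating point are ours): the empirical frequencies of the 10th digit already track $\log_2\left(1 + \frac{1}{k(k+2)}\right)$ well.
\begin{verbatim}
# Empirical distribution of the 10th continued fraction digit of a random alpha.
import math, random

random.seed(1)
n, trials = 10, 10**5
counts = {}
for _ in range(trials):
    x = random.random()
    digit = None
    for _ in range(n):                 # iterate the Gauss map x -> 1/x - floor(1/x)
        if x == 0.0:
            break
        digit = int(1 / x)
        x = 1 / x - digit
    if digit is not None:
        counts[digit] = counts.get(digit, 0) + 1

for k in (1, 2, 3, 4):
    predicted = math.log2(1 + 1 / (k * (k + 2)))
    print(k, round(counts.get(k, 0) / trials, 4), round(predicted, 4))
\end{verbatim}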

G.4.2 Sketch of the Proof of Theorem G.4.1

Kuzmin's theorem follows from the following:

Theorem G.4.4. Let $f_0(x), f_1(x), \dots$ be a sequence of real-valued functions on $(0,1)$ such that for all $n \ge 0$,
\[
f_{n+1}(x) \;=\; \sum_{k=1}^{\infty} \frac{1}{(k+x)^2}\, f_n\!\left( \frac{1}{k+x} \right). \tag{G.26}
\]
If $0 < f_0(x) < M$ and $|f_0'(x)| < \mu$ for some $M$ and $\mu$, then for all $x$,
\[
\left| f_n(x) - \frac{a}{1+x} \right| \;\le\; A e^{-B\sqrt{n}}, \qquad a \;=\; \frac{1}{\ln 2} \int_0^1 f_0(x)\,dx, \tag{G.27}
\]

with $A = A(M, \mu) > 0$ and $B > 0$ absolute positive constants.

We first show how Theorem G.4.4 implies Kuzmin's theorem. As
\[
z_n(\alpha) \;=\; \frac{1}{a_{n+1} + z_{n+1}(\alpha)}, \tag{G.28}
\]
we see $z_{n+1}(\alpha) < x$ if and only if, for some $k$, $\frac{1}{k+x} < z_n(\alpha) \le \frac{1}{k}$; this set has measure $m_n\!\left(\frac{1}{k}\right) - m_n\!\left(\frac{1}{k+x}\right)$. Summing over $k$ yields
\[
m_{n+1}(x) \;=\; \sum_{k=1}^{\infty} \left[ m_n\!\left(\frac{1}{k}\right) - m_n\!\left(\frac{1}{k+x}\right) \right]. \tag{G.29}
\]
Differentiating term by term gives
\[
m_{n+1}'(x) \;=\; \sum_{k=1}^{\infty} \frac{1}{(k+x)^2}\, m_n'\!\left( \frac{1}{k+x} \right). \tag{G.30}
\]

Exercise G.4.5. Justify the term-by-term differentiation (by induction; if $m_n(x)$ is continuous and bounded, the series converges uniformly on $(0,1)$).

Note $m_0(x) = x$, so $m_0'(x) = 1$. The functions $\{m_n'(x)\}$ therefore satisfy the conditions of Theorem G.4.4 for any $M > 1$ and any $\mu > 0$ (and here $a = \frac{1}{\ln 2}$). Therefore
\[
\left| m_n'(x) - \frac{1}{\ln 2} \cdot \frac{1}{1+x} \right| \;\le\; A e^{-B\sqrt{n}}. \tag{G.31}
\]
Integrating (recall the length of $(0,1)$ is just 1) gives
\[
\left| m_n(x) - \frac{\ln(1+x)}{\ln 2} \right| \;\le\; A e^{-B\sqrt{n}}. \tag{G.32}
\]
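The convergence asserted in Theorem G.4.4 can be watched numerically. The Python sketch below is illustrative only: it tabulates $f_n$ on a grid, truncates the infinite sum over $k$, and uses linear interpolation, all of which are our own simplifications; starting from $f_0 \equiv 1$ (the density $m_0'$), a few applications of (G.26) already land close to $\frac{1}{\ln 2}\,\frac{1}{1+x}$.
\begin{verbatim}
# Iterate f_{n+1}(x) = sum_k f_n(1/(k+x)) / (k+x)^2 on a grid, starting from f_0 = 1.
import math

M = 400
xs = [i / M for i in range(M + 1)]
f = [1.0] * (M + 1)

def interp(f, t):                        # linear interpolation of the tabulated function
    i = min(int(t * M), M - 1)
    return f[i] + (f[i + 1] - f[i]) * (t - i / M) * M

for _ in range(5):                       # five applications of the operator, sum truncated at k = 2000
    f = [sum(interp(f, 1 / (k + x)) / (k + x) ** 2 for k in range(1, 2001)) for x in xs]

for x in (0.0, 0.5, 1.0):
    print(x, round(interp(f, x), 4), round(1 / (math.log(2) * (1 + x)), 4))
\end{verbatim}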

To have $a_n(\alpha) = k$, we must have
\[
\frac{1}{k+1} \;<\; z_{n-1}(\alpha) \;\le\; \frac{1}{k}. \tag{G.33}
\]
Thus,
\[
\bigl|\{\alpha \in (0,1) : a_n(\alpha) = k\}\bigr| \;=\; m_{n-1}\!\left(\frac{1}{k}\right) - m_{n-1}\!\left(\frac{1}{k+1}\right) \;=\; \int_{1/(k+1)}^{1/k} m_{n-1}'(x)\,dx. \tag{G.34}
\]
The integral is
\[
\frac{\ln(1+x)}{\ln 2} \,\Bigg|_{1/(k+1)}^{1/k} \;=\; \frac{\ln\left(1 + \frac{1}{k(k+2)}\right)}{\ln 2} \;=\; \log_2\left(1 + \frac{1}{k(k+2)}\right), \tag{G.35}
\]
plus an error of size at most $A e^{-B\sqrt{n-1}}$, which is what we set out to prove. $\Box$

G.4.3 Preliminary Lemmas

We now state some preliminary results needed in the proof of Theorem G.4.4. As always,
\[
\frac{p_n}{q_n} \;=\; [0, a_1, \dots, a_n] \tag{G.36}
\]
(we should really write $\frac{p_n(\alpha)}{q_n(\alpha)}$). The recursion relations for the $p$'s and $q$'s are
\[
p_{n+1} \;=\; p_n a_{n+1} + p_{n-1}, \qquad q_{n+1} \;=\; q_n a_{n+1} + q_{n-1}, \qquad p_n q_{n-1} - p_{n-1} q_n \;=\; (-1)^{n-1}. \tag{G.37}
\]
For each $n$-tuple $(a_1, \dots, a_n) \in \mathbb{N}^n$, we look at all $\alpha \in (0,1)$ whose continued fraction expansion begins with $[0, a_1, \dots, a_n]$. As $p_{n+1} = p_n a_{n+1} + p_{n-1}$, the numbers beginning with $[0, a_1, \dots, a_n]$ are given by the following union of intervals:
\[
\bigcup_{k=1}^{\infty} \left[ \frac{p_n (k+1) + p_{n-1}}{q_n (k+1) + q_{n-1}},\ \frac{p_n k + p_{n-1}}{q_n k + q_{n-1}} \right] \;=\; \left[ \frac{p_n}{q_n},\ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right], \tag{G.38}
\]

as the extreme endpoints are given by $k = \infty$ and $k = 1$. We have thus shown

Lemma G.4.6. The set of $\alpha \in (0,1)$ such that the first $n$ digits are $a_1, \dots, a_n$ is the interval $\left[ \frac{p_n}{q_n},\ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right]$ (endpoints possibly reversed), of length $\left| \frac{p_n}{q_n} - \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right| = \frac{1}{q_n(q_n + q_{n-1})}$.

Note each $\frac{p_n}{q_n} = [0, a_1, \dots, a_n]$ corresponds to a unique subinterval. As each string $(a_1, \dots, a_n)$ leads to a different subinterval, and the subintervals are disjoint, we have
\[
[0,1] \;=\; \bigcup_{(a_1, \dots, a_n) \in \mathbb{N}^n} \left[ \frac{p_n}{q_n},\ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right], \qquad
1 \;=\; \sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} \left| \frac{p_n}{q_n} - \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right|. \tag{G.39}
\]
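Lemma G.4.6 and the partition (G.39) can be checked numerically for small $n$. The Python sketch below is illustrative only (the truncation of each digit at $A$ is our choice): the lengths $\frac{1}{q_n(q_n+q_{n-1})}$ over all digit strings with entries at most $A$ add up to a number just below 1, approaching 1 as $A$ grows.
\begin{verbatim}
# Sum the lengths of the rank-n fundamental intervals with digits truncated at A.
from itertools import product

def q_pair(digits):
    q_prev, q = 0, 1
    for a in digits:
        q, q_prev = a * q + q_prev, q
    return q, q_prev

n, A = 2, 1000
total = 0.0
for digits in product(range(1, A + 1), repeat=n):
    q, q_prev = q_pair(digits)
    total += 1.0 / (q * (q + q_prev))
print(total)        # close to 1; the missing mass comes from digits larger than A
\end{verbatim}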

Henceforth we shall assume the functions $f_n(x)$ satisfy the conditions of Theorem G.4.4.

Lemma G.4.7. For all $n \ge 0$,
\[
f_n(x) \;=\; \sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} f_0\!\left( \frac{p_n + x p_{n-1}}{q_n + x q_{n-1}} \right) \cdot \frac{1}{(q_n + x q_{n-1})^2}. \tag{G.40}
\]
Note $(a_1, \dots, a_n) \in \mathbb{N}^n$ means we consider all $n$-tuples in which each $a_i$ is a positive integer; summing over such $n$-tuples is the same as summing over all possible length-$n$ beginnings of continued fractions.

Exercise G.4.8. Prove the above lemma by induction. For $n = 0$, remember we have $p_0 = 0$, $q_0 = 1$, $p_{-1} = 1$ and $q_{-1} = 0$. Assuming the result is true for $f_n(x)$, substituting into Equation G.26 yields the claim for $f_{n+1}(x)$. A little care is needed, as we pass from $n$-tuples $(a_1, \dots, a_n) \in \mathbb{N}^n$ to $(n+1)$-tuples $(a_1, \dots, a_{n+1}) \in \mathbb{N}^{n+1}$: in the substitution we evaluate $f_0$ at $\frac{(p_n k + p_{n-1}) + x p_n}{(q_n k + q_{n-1}) + x q_n}$, which by the recursion relations is just $\frac{p_{n+1} + x p_n}{q_{n+1} + x q_n}$, and $\frac{p_{n+1}}{q_{n+1}}$ now runs over arbitrary $(n+1)$-tuples $[0, a_1, \dots, a_{n+1}]$.

Lemma G.4.9. For all $n \ge 0$, $|f_n'(x)|$ is bounded by $\frac{\mu}{2^{n-1}}$ plus an absolute multiple of $M$; in particular the $f_n'$ are uniformly bounded in $n$. (Proof sketch: differentiate (G.40) term by term; since $q_n^2 > 2^{n-1}$ (see Theorem D.3.4) and $\sum_{(a_1,\dots,a_n)} \frac{1}{q_n(q_n+q_{n-1})} = \sum_{(a_1,\dots,a_n)} \left| \frac{p_n}{q_n} - \frac{p_n+p_{n-1}}{q_n+q_{n-1}} \right| = 1$, the bound follows; the last equality is Lemma G.4.6.)

Lemma G.4.10. For $x \in (0,1)$, if
\[
\frac{t}{1+x} \;<\; f_n(x) \;<\; \frac{T}{1+x}, \tag{G.43}
\]
then
\[
\frac{t}{1+x} \;<\; f_{n+1}(x) \;<\; \frac{T}{1+x}. \tag{G.44}
\]

Exercise G.4.11. Prove the above: substitute the assumed bounds for $f_n(x)$ into Equation G.26.

Lemma G.4.12. For all $n \ge 0$,
\[
\int_0^1 f_n(x)\,dx \;=\; \int_0^1 f_0(x)\,dx. \tag{G.45}
\]

Exercise G.4.13. Prove the above lemma by induction, using Equation G.26.


G.4.4 Proof of Theorem G.4.4

Choose $m$ such that for all $x \in [0,1]$, $m \le f_0(x) < M$. (For us, $f_0(x) = m_0'(x) = 1$, and we can take $m = 1$ and $M$ any number greater than 1.) Letting $g = \frac{m}{2}$ and $G = 2M$, we immediately see that
\[
\frac{g}{1+x} \;<\; f_0(x) \;<\; \frac{G}{1+x}, \qquad x \in [0,1]. \tag{G.46}
\]
For each $n \ge 0$, let
\[
\varphi_n(x) \;=\; f_n(x) - \frac{g}{1+x}. \tag{G.47}
\]

Note that for any $C$, the function $h(x) = C \ln(1+x)$ satisfies
\[
h(x) \;=\; \sum_{k=1}^{\infty} \left[ h\!\left(\frac{1}{k}\right) - h\!\left(\frac{1}{k+x}\right) \right]. \tag{G.48}
\]
Differentiating term by term (justify this!) gives
\[
h'(x) \;=\; \sum_{k=1}^{\infty} \frac{1}{(k+x)^2}\, h'\!\left(\frac{1}{k+x}\right). \tag{G.49}
\]
As $h'(x) = \frac{C}{1+x}$, this together with the assumptions on $\{f_n(x)\}$ implies that the sequence of functions $\{\varphi_n(x)\}_{n=0}^{\infty}$ satisfies Equation G.26. In particular, all consequences we derived for families satisfying Equation G.26 hold for the family $\{\varphi_n(x)\}$.

As before, for notational convenience let $u_n = \frac{p_n + x p_{n-1}}{q_n + x q_{n-1}}$. Then
\[
\varphi_n(x) \;=\; \sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} \varphi_0(u_n)\, \frac{1}{(q_n + x q_{n-1})^2}. \tag{G.50}
\]

As $\varphi_0 > 0$ and the denominator above is less than $2 q_n (q_n + q_{n-1})$, we obtain the lower bound
\[
\varphi_n(x) \;>\; \sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} \varphi_0(u_n)\, \frac{1}{2 q_n (q_n + q_{n-1})}. \tag{G.51}
\]

For fixed $(a_1, \dots, a_n) \in \mathbb{N}^n$, by Lemma G.4.6 we are working with the subinterval $I_{(a_1, \dots, a_n)} = \left[ \frac{p_n}{q_n},\ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right]$, of length $\frac{1}{q_n(q_n + q_{n-1})}$; summing over all $n$-tuples these subintervals cover $[0,1]$. On each subinterval, by the Mean Value Theorem (for integrals) we have
\[
\int_{p_n/q_n}^{(p_n + p_{n-1})/(q_n + q_{n-1})} \varphi_0(z)\,dz \;=\; \varphi_0(u_n')\, \frac{1}{q_n(q_n + q_{n-1})}, \qquad u_n' \in \left[ \frac{p_n}{q_n},\ \frac{p_n + p_{n-1}}{q_n + q_{n-1}} \right]. \tag{G.52}
\]
Summing over all $n$-tuples and multiplying by $\frac{1}{2}$ yields
\[
\frac{1}{2} \int_0^1 \varphi_0(z)\,dz \;=\; \frac{1}{2} \sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} \varphi_0(u_n')\, \frac{1}{q_n(q_n + q_{n-1})}. \tag{G.53}
\]
Combining the above gives
\[
\varphi_n(x) - \frac{1}{2} \int_0^1 \varphi_0(z)\,dz \;>\; \frac{1}{2} \sum_{(a_1, \dots, a_n) \in \mathbb{N}^n} \left[ \varphi_0(u_n) - \varphi_0(u_n') \right] \frac{1}{q_n(q_n + q_{n-1})}. \tag{G.54}
\]

g 1+x

1 . (G.54) qn (qn + qn−1 )

is less than µ + g,

|φ0 (un ) − φ0 (u0n )| < (µ + g)|un − u0n |.

(G.55)

1 As un , u0n ∈ I(a1 ,...,an ) , |un −u0n | < qn (qn +q . As qn2 < 2n−1 , we obtain |φ0 (un ) − φ0 (u0n )| < n−1 ) µ+g . 2n−1 Substituting into Equation G.54 gives Z 1 1 µ+g µ+g φn (x) > φ0 (z)dz − n = L − n . (G.56) 2 0 2 2

Recalling the definition of $\varphi$ gives
\[
f_n(x) \;>\; \frac{g}{1+x} + L - \frac{\mu + g}{2^n} \;>\; \frac{g + L - 2^{-n+1}(\mu + g)}{1+x} \;=\; \frac{g_1}{1+x}, \tag{G.57}
\]
where $g_1 = g + L - 2^{-n+1}(\mu + g)$ and $L = \frac{1}{2}\int_0^1 \varphi_0(z)\,dz > 0$.

A similar argument applied to $\psi_n(x) = \frac{G}{1+x} - f_n(x)$ gives, for $n$ large,
\[
f_n(x) \;<\; \frac{G_1}{1+x}, \qquad G_1 \;=\; G - L' + 2^{-n+1}(\mu + G), \tag{G.58}
\]
where $L' = \frac{1}{2}\int_0^1 \psi_0(z)\,dz > 0$. Note that
\[
L + L' \;=\; \frac{(G - g)\ln 2}{2}, \tag{G.59}
\]
that $g < g_1 < G_1 < G$ for $n$ large, and that
\[
G_1 - g_1 \;<\; G - g - (L + L') + 2^{-n+2}(\mu + G). \tag{G.60}
\]
Letting $\delta = 1 - \frac{\ln 2}{2}$, we find
\[
G_1 - g_1 \;<\; (G - g)\,\delta + 2^{-n+2}(\mu + G). \tag{G.61}
\]
We have proved

Lemma G.4.14. Assume

1. $\frac{g}{1+x} < f_0(x)$