Synthesis and Applications of Chaotic Maps

by Alan R. Rogers, BE, MEngSc
Department of Mathematical Physics, NUI Maynooth

PhD Thesis submitted to the National University of Ireland for the degree of Doctor of Philosophy, November 2005

Supervisors: Professor Daniel M. Heffernan and Professor Robert N. Shorten

Contents

Abstract
List of Figures
List of Publications
Acknowledgments

1 Introduction
  1.1 Motivations and Objectives
  1.2 Structure and Contributions

2 A History of Applied Chaos
  2.1 Introduction
  2.2 Chaos: A nonlinear history
    2.2.1 The Three-body problem
    2.2.2 Horseshoes and Butterflies
    2.2.3 Period Three Implies Chaos
    2.2.4 Feigenbaum Numbers
    2.2.5 Chaos under control
  2.3 Characteristics of Chaotic Maps
  2.4 Chaos as an engineering tool
    2.4.1 Communicating with Chaos
    2.4.2 Chaotic Information Storage and Retrieval
    2.4.3 Chaos and Cryptography
    2.4.4 Chaos and Computing
    2.4.5 Digital Watermarking
    2.4.6 Other Applications
  2.5 Conclusions

3 Application of Chaotic Maps to Pattern Classification
  3.1 Introduction
  3.2 Pattern Classification
    3.2.1 The Pattern Classification Problem
    3.2.2 Traditional Pattern Classification Techniques
    3.2.3 Pattern Classification using Chaos in Neural Networks
  3.3 The Baker's Map: A natural XOR gate
    3.3.1 The Generalized Baker's map
    3.3.2 Solving the XOR problem
  3.4 Training the Pattern Classifier
    3.4.1 Simulated Annealing
    3.4.2 Training and Testing the Classifying Lines
  3.5 Summary and Conclusions

4 Synthesis of Chaotic Maps with Arbitrary Invariant Densities
  4.1 Introduction
  4.2 Background: The Inverse Frobenius-Perron Problem
    4.2.1 The Frobenius-Perron equation
    4.2.2 The Frobenius-Perron operator in explicit form
    4.2.3 The Inverse Frobenius-Perron Problem and the FPO as a Markov operator
    4.2.4 Other work on the IFPP and applications
  4.3 A useful Matrix from TCP congestion control
  4.4 Synthesizing 1-D maps with arbitrary piecewise-constant invariant densities
    4.4.1 Synthesis Procedure
    4.4.2 Examples
    4.4.3 The role of the β's
    4.4.4 Lyapunov Exponents
    4.4.5 Nonuniform Partitions
  4.5 Results on switching between maps
  4.6 Comparison with other methods
    4.6.1 3-band matrix solution to the IFPP
  4.7 Conclusions

5 Synthesis of Higher-Dimensional Maps and Parameter-Space Structures, and Potential Applications
  5.1 Introduction
  5.2 Pseudo 2-D map from 1-D map
    5.2.1 N-Dimensional maps
  5.3 Bollt's Affine Function Method
    5.3.1 Statistical Image Generation
    5.3.2 Modification of Bollt's Method
  5.4 Synthesising Maps with Arbitrary Chaotic Regions in Parameter Space
    5.4.1 Dynamics of the Tent map
    5.4.2 Candidate functions for various parameter-space structures
    5.4.3 The Spiral Problem
  5.5 System Identification and Modelling
    5.5.1 System Identification Examples
    5.5.2 Modelling Time-Series Data
  5.6 Adaptive control of chaos
    5.6.1 Example
  5.7 Encryption using Time-Varying Switched Chaotic Maps
  5.8 Encryption using Symbolic Dynamics
  5.9 Conclusions

6 Conclusions
  6.1 Thesis Conclusions
  6.2 Ideas for the Future

Bibliography

Abstract

Chaos theory is at the stage where people are actively seeking applications of the diverse properties of nonlinear dynamical and chaotic systems. Up to this point, most chaotic maps have been derived in an ad hoc manner, often as models of physical phenomena. Although there are many existing chaos applications, we believe that for new applications to flourish, a systematic approach for creating chaotic systems with desired properties is necessary. In this thesis, we shall address the synthesis problem of how to create chaotic maps with arbitrary invariant densities. This problem is known as the Inverse Frobenius-Perron problem (IFPP). As our main result, we present an elegant new solution to this problem based on the theory of positive matrices. We show how to create one-dimensional maps with piecewise-constant invariant densities, and also extend this result to n-dimensional maps. In addition, we derive some results on invariant densities of time-varying maps. We view the main result as a useful enabling method, bridging the gap between chaos theory and its applications.

The rest of the thesis contextualizes the result. First, we look at the origins and development of chaos theory. We also consider the properties of chaos which make it potentially useful, and the variety of ways in which chaos has been applied. We apply these properties in a new way to the problem of pattern classification. Following the result on the IFPP, we introduce some new ways of applying chaotic maps to the problems of system identification, modelling, and encryption. We also show how images can be encoded in chaotic maps, and give a method for synthesizing maps with arbitrary regions of chaos in their parameter-space.


List of Figures

2.1 Chaotic orbit in restricted three-body problem
2.2 The use of Poincaré sections in distinguishing different types of motion: (a) Stable fixed point (b) Period-two motion (c) Cycle (d) Chaos
2.3 Homoclinic tangle
2.4 Action of Smale's Horseshoe Map
2.5 Van der Pol's neon bulb relaxation oscillator
2.6 Bifurcation diagram of the logistic map
2.7 Stabilizing an unstable fixed point of the logistic map at r = 3.9 using the OGY method
2.8 The logistic map (Equation 2.10)
2.9 The tent map (Equation 2.11)
2.10 Bifurcation Diagram and variation of Lyapunov exponent of the Logistic map
2.11 Definition of a Lyapunov exponent
2.12 Henon attractor: neither a line nor a plane
2.13 Middle-thirds Cantor set
2.14 Bernoulli shift map with two diverging trajectories. Initial conditions are x0 = 0.31415 and x0 = 0.31416
2.15 The fixed points of the third iterate of the Bernoulli map σ³(x) are shown at intersections with the identity line (circles). Squares are fixed points of the Bernoulli map σ(x)
2.16 Broadband power spectrum from 100000 iterates of the Bernoulli map
2.17 System identification with periodic windows
2.18 Period-3 window from the logistic map
2.19 Route to chaos: (a), (b): Before and after a period-doubling bifurcation; (c), (d): Before and after a tangent bifurcation
2.20 Sketch of first few Arnold tongues, or mode-locking regions of the sine-circle map
2.21 Critical point and branches of the Logistic map
2.22 Communication System
2.23 Storing the word 'face' in a 1-d map
2.24 Public-key Encryption System
2.25 Action of the Baker's Map on the unit square
3.1 Feature space of twin-engined aircraft families
3.2 The nearest-neighbour rule creates a polyhedral tessellation of the feature space
3.3 Model of a neuron
3.4 Perceptron as a signal-flow graph
3.5 Hyperplane as a decision boundary for a two-class, two-dimensional pattern classification problem
3.6 Action of Baker's Map on unit square
3.7 Lyapunov numbers characterise the average stretching factors of some small circle of radius δ. Here, Λ1 > 1 and Λ2 < 1
3.8 Variation of Lyapunov dimension with R and S
3.9 Variation of Lyapunov dimension with R and with S as a parameter
3.10 A more general way of solving the XOR problem: Draw a straight line through the two points belonging to Class A and find where the line intersects the y-axis
3.11 Four points selected to illustrate the chaotic XOR system
3.12 Output from chaotic XOR gate with inputs as in Table 3.2. Contiguous sets of 200 points are averaged. Class A corresponds to a modified Lyapunov Dimension DM ≈ 1.75. Note that we cycle through the points (1), (4), (2), (3) and (1) respectively
3.13 Three possible class configurations (out of a possible 16)
3.14 Distances from each pattern cluster to the separating line. + symbols belong to Class A and o symbols belong to Class B
3.15 Tanh functions: Solid line is tanh(d) and dotted line is tanh(d²)
3.16 Illustrative example of applying simulated annealing to an AND gate (Class B corresponding to (1,1) gives a TRUE output): the initial line has slope m = 2 and y-intercept = −3. The desired line should have slope m = −1 and y-intercept > 1
3.17 Energy landscape for patterns in Figure 3.16. Minimum energy = −3.84, when m = −1 and y0 = 1.55
3.18 Plots of energy (cost) as temperature is decreased for four runs of the simulated annealing algorithm
4.1 Smooth and Fractal Invariant Densities of Logistic Map (a) r = 4 (b) r = 3.67
4.2 (i) Logistic and Tent maps (ii) Conjugating Function (iii) Invariant Densities of both maps
4.3 Evolution of Congestion Window
4.4 Illustration of the construction of a 1-D map from a Markov matrix
4.5 One-dimensional chaotic map with partition on unit-interval shown
4.6 Invariant density of map in Figure 4.5
4.7 Time-series of chaotic map in Figure 4.5
4.8 One-dimensional chaotic map with sinusoidal invariant density
4.9 Invariant density of map in Figure 4.8
4.10 Time-series of chaotic map in Figure 4.8
4.11 Gaussian-shaped invariant density generated from a 121×121 transition matrix
4.12 Time-series of chaotic map from example (c), corresponding to invariant density in Figure 4.11
4.13 Illustration of a vector evolving towards the Perron eigenvector
4.14 Detailed plot of a 1-D map with large β values: trajectories become trapped in subintervals and transitions occur infrequently
4.15 (a), (b) 1-D map and state-space plot with βi = 0.9; (c), (d) βi = 0.1
4.16 Variation in Lyapunov Exponent with β, and α1 = α2 = 0.5
4.17 Construction of 1-D map with a nonuniform partition
4.18 Synthesized non-uniform map
4.19 Time-series of map, with different interval densities visible
4.20 Piecewise constant distribution of initial conditions being applied to chaotic map derived from a rank 1 transition matrix
4.21 Chaotic map synthesized using the matrix method of Gora and Boyarsky
4.22 Chaotic map with same invariant density as that in Figure 4.21
5.1 Synthesized 2-D map with Chequer-board Invariant Density
5.2 Iterates of a synthesized map with ρ = 1, 2, 3, 4 showing self-similar structure
5.3 Bollt's piecewise affine function designed as an Anosov diffeomorphism
5.4 Steps involved in encoding an image
5.5 Part of the Lena image (30 × 30 pixels)
5.6 Lena emerges as the map is iterated
5.7 80000 iterations of the map
5.8 Non-invertible affine mapping with no self-similarity
5.9 Tent map with r < 1
5.10 Tent map with r = 1.5 - chaotic region
5.11 Tent map with r = 2.1 - unstable
5.12 Bifurcation diagram of the tent map
5.13 Parameter-space plot of Equation 5.10 with g(a, b) = a + b
5.14 Parameter-space plot of Equation 5.10 with g(a, b) = ab
5.15 Parameter-space plot of Equation 5.10 with g(a, b) = a² + b²
5.16 Periodic structure created with the modulo function
5.17 Concentric chaotic circles in parameter-space
5.18 Chaotic regions created using Gaussian functions
5.19 A chaotic spiral in parameter space
5.20 The minimum of the total least squared error corresponds to the correct value of β, 0.2
5.21 Variation of total squared error with matrix dimension
5.22 Time-series from the logistic map (black) and a model of the logistic map (red) having the same Lyapunov exponent
5.23 A better model of the logistic map (red) which has a Lyapunov exponent of 2.1
5.24 Block diagram of an adaptive controller which maintains a constant invariant density for a time-varying chaotic system
5.25 Controller off: Blue lines are the elements of the controller invariant density (constant). Coloured lines are the elements of the (uncontrolled) invariant density which are varying because of the time-varying map. The desired invariant density is the black line corresponding to ρ = [.25, .25, .25, .25]
5.26 Controller on: Blue lines are the elements of the controller invariant density. These are varying and are enabling the overall invariant density to be controlled (coloured lines)
5.27 Hiding a message using time-varying chaotic maps
5.28 Recovered message ('data': d=4, a=1, etc.) from a time-varying chaotic map
5.29 (Bernoulli) Shift map, with S = 0.5
5.30 Size of encoded data file
5.31 Time taken to encode data
5.32 Original Lena Image
5.33 Decoded Lena Image, with initial condition error of 1 × 10⁻¹⁷

List of Publications

This thesis has so far given rise to the following three publications:

1. Alan R. Rogers, John G. Keating, Robert N. Shorten, and Daniel M. Heffernan, 'Chaotic Maps and Pattern Recognition - the XOR problem', Chaos, Solitons and Fractals, vol. 14, pp. 57-70, 2002.

2. Alan R. Rogers, John G. Keating, and Robert N. Shorten, 'A novel pattern classification scheme using the Baker's Map', Neurocomputing, vol. 55, pp. 779-786, 2003.

3. Alan R. Rogers, Robert N. Shorten, and Daniel M. Heffernan, 'Synthesizing chaotic maps with prescribed invariant densities', Physics Letters A, vol. 330, pp. 435-441, 2004.


Acknowledgments

Special thanks go to my supervisors for all of their help in finally bringing this work to a conclusion: Danny Heffernan for regaling us with tales of Feynman and Feigenbaum; and especially Bob Shorten for keeping the whole thing on the rails for the past five years, for his insights into the third-level system, and for helping me with the CV. Thanks also to John Keating for help with the pattern classification work.

My friends and colleagues in the Electronic Engineering department have also been a great source of help in varying capacities during the last few years. Thanks especially to John Ringwood for his leadership and for giving me a job, and Tomas Ward for telling me about it in the first place, and to all those along the corridor who made NUIM such a great place to work over the last few years, including John, Seamus, Sean (×2), Rudi, Frank, Dave, Karl, Mark, Bob, Ronan, Eoin, Joanne, and Orla.

Although not directly involved in the actual writing of the thesis, thanks must go to my mam and dad, and the rest of the family, for being there, for baby-sitting, and for dropping me a few quid when I needed it. And finally, a special thank-you to my wife Jean, the sunshine of my life, for putting up with me and this work since 2001. No more will I be able to avoid emptying the dishwasher because of vague PhD-related 'stuff'! Greetings to the young lad Evan, who appeared out of nowhere in 2004, throwing our lives into more chaos. The thesis is dedicated to them.

The thesis was typeset using LaTeX. Graphs and charts were done using MATLAB and Visio.


Chapter 1

Introduction

Determinism, like the Queen of England, reigns, but does not govern.
Sir Michael Berry

1.1 Motivations and Objectives

Chaos is everywhere: in the stars and planets, in atoms, in weather systems, in man-made and mathematical systems of all kinds. It is synonymous with randomness and yet is not random. It can be both predictable and unpredictable. It can look like noise, but gives rise to beautiful patterns. And it has disciples who study it with zeal, and detractors who wonder what all the fuss is about.

The motivation behind this thesis is the search for new applications of chaos. The scientific field of chaos theory has matured since the heady days of the 1960s and 70s when the pioneers of chaos were at work. A lot of the major analytical questions have been answered, and the nature of scientific enquiry, as so often before, is turning from an analytic mode to a synthetic mode: what can chaos be used for?

There is no such thing as Chaos Theory, in the sense that there is no actual theory [Holmes, 1990]. Chaos Theory is really an umbrella term for a number of different results from dynamical systems theory, fused with elements of symbolic dynamics, bifurcation theory, and ergodic theory. This theory says that strange behaviour can occur in apparently very simple deterministic systems: trajectories that look periodic suddenly become aperiodic; minuscule changes in parameter values or initial conditions lead to huge changes in long-term behaviour; in short, simple systems can be unpredictable. And yet the behaviour is not random or noisy. Chaotic trajectories have a lot of structure: the attractor is often aesthetically pleasing (accounting for the enormous public interest in chaos), and when considered statistically, chaotic systems are entirely predictable.

Over the past century, through the work of Poincaré, van der Pol, Smale and Lorenz, it has become clear that chaos manifests itself in an array of physical systems. In the past thirty years, many of the mechanisms that give rise to chaos have been explained. Forced second-order and autonomous third-order differential equations, the bread and butter of so many scientists, engineers, and economists, admit chaotic solutions when some nonlinearity is present. Chaotic systems are not in short supply.

More recently still, the area of chaos applications has apparently opened up. The discovery of a chaos control scheme by Ott and his co-workers, and the discovery by Pecora and Carroll that two chaotic systems could be synchronized, put the scientific community on notice that perhaps chaos was more than a mathematical novelty, but could be put to some use. The idea of chaotic communication schemes, where a message is hidden in a noisy chaotic signal, has received a great deal of attention, but the results are mixed. Other applications have been slower to appear, and when they do, it has been on a piecemeal basis. Researchers appear to pick chaotic systems or maps almost randomly to fit their would-be application.

The purpose of this work is to make some small steps towards correcting this situation. In this thesis, we present a new method for synthesising chaotic maps with desired invariant densities. This method is based on very recent results from positive matrix theory which were derived in a communications theory context. The synthesis method presented here is a great deal simpler than any other synthesis method appearing in the literature, and shows great promise in helping to develop chaos applications. We also present some novel applications of chaos, and ideas for applications which might merit further study.

1.2 Structure and Contributions

The rest of the thesis is structured as follows: Chapter 2 gives a history of chaos and its applications. We begin with Henri Poincaré's entry to a mathematical competition, in which he considered solutions to the three-body problem, and realised that the solutions could become unimaginably complex. Moving through the twentieth century, we look at the development of chaos theory and discrete mappings, and the various applications that have been proposed. From Chapter 2, we learn that chaotic systems possess many interesting and potentially useful properties, but that many of these are not being exploited.

Chapter 3 suggests a way of exploiting the variation in Lyapunov dimension of the generalized baker's map to achieve pattern classification. This application serves as a motivational example of what can be done with chaotic systems when some lateral thinking is employed. Although the method for pattern classification suggested is novel, it is still rather ad hoc. The method doesn't work for all maps or all patterns, though it's particularly suited to the XOR problem, a benchmark problem in pattern recognition. If anything, Chapter 3 emphasises the need for a more systematic approach to developing applications of chaos, leading us to Chapter 4, where we present the main results of the thesis.

In Chapter 4, we present a new method of synthesizing chaotic maps with arbitrary piecewise-constant invariant densities. Essentially, the method allows us to create maps with desired statistical properties: the invariant density of the map tells us where iterates go in the state-space on average. This is not an application of chaos in itself, but it is a significant enabling method which could give rise to new applications. The problem of synthesizing maps is called the Inverse Frobenius-Perron problem (IFPP), and background information on this and approaches to the problem also appear in Chapter 4. We also consider what happens to the invariant density when the mapping is switched between one of several alternatives at each iterate, and prove some theorems on this topic.

Chapter 5 extends the synthesis method to two dimensions and higher. We show how images can be encoded in the invariant density of a two-dimensional map, and illustrate this by encoding the well-known Lena image, which then emerges from a haze of chaotic points. In this chapter, we also give a simple method of creating chaotic regions in the parameter-space of a map. We then present some novel applications of chaotic maps in encryption, modelling and control.

Chapter 6 offers conclusions to be drawn from the thesis, and ideas for expanding and deepening the work presented here.

This thesis has given rise to three journal publications so far, with three more journal papers in preparation. Two of the papers concern the work in Chapter 3 on pattern classification [Rogers et al., 2002], [Rogers et al., 2003]. The third paper presents the bare bones of the synthesis procedure in Chapter 4 [Rogers et al., 2004]. It is hoped that future publications will appear on switching between chaotic maps (Chapter 4), new methods of encoding data using chaos (Chapter 5), and a review article on chaos and its applications (Chapter 2).

Chapter 2

A History of Applied Chaos

...it may happen that small differences in the initial conditions produce very great ones in the final phenomena. A small error in the former will produce an enormous error in the latter. Prediction becomes impossible and we have the fortuitous phenomenon.
Jules Henri Poincaré, 1908

2.1 Introduction

In his popularizing account of chaos theory, 'Does God Play Dice?', the mathematician Ian Stewart claims that there is usually a time-lag of about 100 years before mathematical concepts find any real application [Stewart, 1990]. Surprisingly enough, a century after Henri Poincaré analysed the restricted three-body problem to find chaotic orbits lurking in the shadows, there is widespread interest in applications of chaos. A case could possibly be made that chaos theory in the modern era didn't really begin until 1961, with Edward Lorenz's now famous simulations of the Navier-Stokes equations. Hopefully in this century, a real application of chaos will be found.

What is a real application, though? Textbooks on chaos and nonlinear dynamics usually have a chapter or two on 'applications'. These so-called applications are invariably nothing more than examples of chaos occurring in physical systems, ranging from Chua's circuit to El Niño. A real application should use the chaos, or properties of it, to do something useful. Furthermore, it should be at least as good, if not better, than existing ways of doing that task, otherwise it won't be adopted. The phenomenon of natural selection is as ruthless in the engineering world as it is in the biological.

In this chapter, we will look at the history of chaos, and how it developed into a mature scientific field over the course of the twentieth century. We will also consider the various characteristics of chaotic systems which are useful in real-world applications, and we will comprehensively review the applications of chaos that have been proposed.

2.2 Chaos: A nonlinear history

In this section, we sketch the history of chaos theory, from Poincaré's discovery of it around 1890, through to the development of chaos control by Ott and others in the 1990s. There are a number of key sources on the history of chaos which we have drawn upon, including Stewart [Stewart, 1990], Barrow-Green [Barrow-Green, 1997], and Abraham [Abraham and Ueda, 2000].

2.2.1 The Three-body problem

In 1885, Henri Poincaré entered a mathematics competition in honour of the 60th birthday of King Oscar II of Sweden and Norway. The King was well-known as a patron of the sciences, and the competition, proposed by Gösta Mittag-Leffler, created great interest across Europe. Four questions were proposed by Karl Weierstrass and Charles Hermite, one of which concerned the coordinates of several point masses orbiting each other according to Newton's laws, and whether the coordinates could be written down explicitly for all time. Poincaré's initial submission was deemed the winner, even though he hadn't quite solved the problem (in his solution, he effectively invented the field of topology and revolutionized the study of dynamical systems). While writing a paper for Mittag-Leffler's journal Acta Mathematica, he realized that he had made a serious mistake regarding the stability of orbits in the three-body problem. The revised paper, extended by an extra 100 pages, incorporated many of the ideas we now use in chaos theory, including Poincaré sections and homoclinic points. Interestingly, Isaac Newton had also studied the three-body problem by looking at the effect of the sun on the moon's orbit around the earth, and had remarked to the astronomer John Machin "that his head never ached but with his studies on the moon".

In the restricted three-body problem which Poincaré studied, two bodies orbit their centre of mass under the influence of gravitation, and a third point mass orbits in the same plane as the main bodies. Figure 2.1 shows a typical orbit where a point mass is influenced by the gravity of two bodies (marked in black). The American mathematician G. W. Hill laid much of the groundwork for Poincaré in that he derived a new class of periodic solutions for the problem, and proposed several new lines of attack. It was realised by Hill and others that a closed-form solution would never be found. Rather, the differential equations had to be solved using infinite series, which then had to be convergent for a valid solution. (Many of the fascinating details of the three-body problem and Poincaré's competition entry can be found in Barrow-Green's historical treatise [Barrow-Green, 1997].)

[Figure 2.1: Chaotic orbit in restricted three-body problem]

One of Poincaré's key inventions was the use of a transverse surface cutting through the orbits, which allowed him to convert a continuous-time system into a discrete mapping. Figure 2.2 shows some two-dimensional Poincaré sections of three-dimensional flows.

[Figure 2.2: The use of Poincaré sections in distinguishing different types of motion: (a) Stable fixed point (b) Period-two motion (c) Cycle (d) Chaos]

A flow approaching a stable fixed point appears as a sequence of points getting closer together as they approach the fixed point. A stable circular orbit would appear as a single

point on this surface, as the body revisited the same point in space again and again. An unstable orbit would appear as a sequence of points moving away from the initial point on the surface. A period-2 orbit would appear as a mapping between two points, jumping back and forth. In a chaotic orbit, a given point on the Poincaré surface would never be revisited (but might be approached infinitely closely).

Poincaré then discovered what he called doubly-asymptotic solutions to the three-body problem, where the orbits would approach a stable solution along one direction (stable manifold) as $t \to \infty$, and would approach the same solution along a different direction (unstable manifold) as $t \to -\infty$. Poincaré called the intersection of the stable and unstable manifolds a homoclinic point, and was able to show that if one homoclinic point existed, then an infinite number existed. In order for this to happen, the manifolds had to fold back on themselves in a very complicated web. Figure 2.3 shows the stable and unstable manifolds of a homoclinic point of the two-dimensional map

$$x_{n+1} = x_n + y_{n+1}, \qquad y_{n+1} = y_n + kx_n(x_n - 1)$$

(studied by Arrowsmith and Place [Arrowsmith and Place, 1992]). Poincaré wrote in his Méthodes Nouvelles [Barrow-Green, 1997] that

    When one tries to depict the figure formed by these two curves and their infinity of intersections, each corresponding to a doubly asymptotic solution, these intersections form a kind of web, or infinitely tight mesh; neither of the two curves can ever intersect itself, but must fold back on itself in a very complex way in order to intersect all the links of the mesh infinitely often. One is struck by the complexity of this figure that I am not even attempting to draw. Nothing can give us a better idea of the complexity of the three-body problem and all of the problems of dynamics in general where there is no single-valued integral and Bohlin's series diverge.

[Figure 2.3: Homoclinic tangle, showing the stable manifold Ws, the unstable manifold Wu, an elliptic fixed point, and a homoclinic point]

This was chaos in action. Unstable manifolds were pulling nearby initial conditions apart, and stable manifolds were reinjecting them back from whence they came: the stretching and folding mechanism essential for chaos. The lack of computer power killed off any chance of these doubly-asymptotic solutions being investigated further, and so they became a footnote in dynamical systems history. Poincaré's other work on dynamical systems was continued by others, notably George Birkhoff. Poincaré had also corresponded with Alexander Lyapunov in the 1890s on notions of stability, with their respective definitions differing wildly. Poincaré's rather flexible definition was that the orbit returned to the same vicinity as the initial point, and did so as closely as one wished, while Lyapunov gave a very precise definition, arguing that solutions should remain arbitrarily close to a given solution for all time.


2.2.2 Horseshoes and Butterflies

There was not much development of Poincaré's results on homoclinic and heteroclinic points and tangles for many years. Physicists had moved on from the three-body problem: the era of general relativity and quantum mechanics had arrived. Quietly though, mathematicians were developing results in Topology and Dynamical Systems. Some key developments came in the early 1960s from the leading American topologist Stephen Smale, who was considering higher-dimension analogues of the Poincaré-Bendixson theorem. It was clear that in a two-dimensional phase-space, the four typical behaviours were sources, sinks, saddles and limit cycles, but was this true in three-dimensional systems? Smale showed that other types of motion were indeed possible. He introduced the idea of the horseshoe map as an abstraction of the effect of a transversal homoclinic point on a set of initial conditions. Figure 2.4 shows the action of the map F on a region consisting of the union of a square S and two semi-circular regions D1 and D2. The map F stretches and folds the regions into a horseshoe shape. Of great interest is the invariant set Λ, the set of points that remain in the square S under repeated iteration of the map. The two horizontal rectangles H0 and H1 are F(S) ∩ S. But the preimages of H0 and H1 are the vertical rectangles V0 and V1. Since points within Λ do not leave S, they must be contained in H0 ∪ H1, and therefore must be contained in V0 ∪ V1. Therefore, Λ ⊂ (H0 ∪ H1) ∩ (V0 ∪ V1), which is four squares. Repeated application of the map refines each of these squares into four smaller squares, and on eventually to a Cantor set. In [Abraham and Ueda, 2000], Smale argues that an algebraic description of the horseshoe would not lead to the insights that the geometric view does. Many of the key concepts of dynamical systems, such as contractive mappings, sensitive dependence, stable and unstable sets, and homoclinic points, can all be explained using the horseshoe map, without recourse to algebraic analysis. The essential point is that any system possessing a homoclinic point will also have horseshoe-type dynamics. Both Devaney [L.Devaney, 1989] and Ott [Ott, 2002] give detailed descriptions of the horseshoe map's dynamics by imposing a symbolic dynamics scheme to label the various horizontal and vertical strips.

[Figure 2.4: Action of Smale's Horseshoe Map]

This work on horseshoe maps was prompted by a letter to Smale from Norman Levinson, who had attempted in 1949 to simplify Cartwright and Littlewood's analysis of the Van der Pol equation. The electrical engineer Balthasar Van der Pol had noticed unstable nonperiodic behaviour in an electrical oscillator in 1927. In hindsight, he had almost certainly witnessed chaos in the circuit, but was not in a position to explain it. He wrote of hearing an "irregular noise" as he varied the capacitance, and the oscillator jumped to different frequencies (a loosely-coupled telephone circuit was used to listen to the oscillator) [Kennedy and Chua, 1986].

[Figure 2.5: Van der Pol's neon bulb relaxation oscillator (neon bulb Ne, capacitor C, resistor R, battery E, source E0 sin ωt)]

The behaviour of the Van der Pol oscillator can be described with an equation of the following form:

$$\ddot{x} + \varepsilon(x^2 - 1)\dot{x} + x = E \sin \omega t \qquad (2.1)$$

Forced harmonic oscillator equations such as Equation 2.1 occur frequently in various physical systems, such as mechanical spring-damper systems, electrical systems and quantum mechanical systems.

During the same period that Smale discovered the horseshoe map, Edward Lorenz, a meteorologist at MIT, was carrying out simulations of atmospheric convection on his primitive Royal McBee computer. He was simulating a set of twelve differential equations in twelve unknowns in an attempt to recreate a weather system that behaved in a vaguely realistic way [Lorenz, 1993]. He spent a long time adjusting the parameters in order to get non-periodic behaviour. Of course, he wasn't looking for chaos. The purpose of his research was to move away from the linear predictive weather forecasting that was so prevalent then by studying real nonlinear models. Eventually he found a solution to the equations which was a non-periodic flow, and appeared to flip randomly between the two lobes of an attractor. By accident, Lorenz found that the equations exhibited sensitive dependence on initial conditions when he entered a truncated starting point for his simulation, not expecting it to make a difference. He used the image of a butterfly beating its wings to illustrate that microscopic initial differences will eventually lead to different outcomes at the macroscopic scale in chaotic systems. Later in 1961, Lorenz reduced a set of seven equations that another meteorologist, Barry Saltzman, had been studying, to the three equations now known as the Lorenz equations [Lorenz, 1993]:

$$\dot{X} = \sigma(Y - X)$$
$$\dot{Y} = -XZ + r_a X - Y \qquad (2.2)$$
$$\dot{Z} = XY - bZ$$
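As a concrete illustration of this sensitive dependence, the following minimal Python sketch integrates Equation 2.2 with a fourth-order Runge-Kutta step and tracks two trajectories started $10^{-8}$ apart. The parameter values σ = 10, r_a = 28 and b = 8/3 are the classic choices associated with Lorenz's attractor, assumed here for illustration rather than quoted from the text.

```python
import numpy as np

def lorenz(s, sigma=10.0, ra=28.0, b=8.0 / 3.0):
    # Right-hand side of Equation 2.2
    X, Y, Z = s
    return np.array([sigma * (Y - X), -X * Z + ra * X - Y, X * Y - b * Z])

def rk4_step(f, s, dt):
    # One fourth-order Runge-Kutta step
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

dt = 0.01
a = np.array([1.0, 1.0, 1.0])
b2 = a + np.array([1e-8, 0.0, 0.0])      # slightly perturbed start
for n in range(3001):                    # 30 time units
    if n % 500 == 0:
        print(round(n * dt, 1), np.linalg.norm(a - b2))
    a, b2 = rk4_step(lorenz, a, dt), rk4_step(lorenz, b2, dt)
# The separation grows roughly exponentially before saturating at
# the diameter of the attractor: Lorenz's butterfly effect.
```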

2.2.3 Period Three Implies Chaos

Robert May was a theoretical physicist who became interested in simple ecological population models while working in Princeton. He was studying the logistic map, $x_{n+1} = rx_n(1 - x_n)$, in particular, and noticed the period-doubling sequence that occurred as the control parameter r was increased. May was at a loss to explain what happened beyond the accumulation point where the system goes chaotic. He wrote a landmark review article in 1976 in Nature detailing his explorations of the logistic map and containing an exhortation for more studies of simple nonlinear systems [May, 1976].

James Yorke in the University of Maryland also did pioneering work on chaos theory and is widely regarded as having come up with the term Chaos. He had encountered Lorenz's paper, Deterministic Nonperiodic Flow [Lorenz, 1963], and had widely circulated it to people like May, and to Stephen Smale in Berkeley. Performing a rigorous analysis on the logistic map, Yorke showed that if a period-3 cycle existed at some parameter value, then periods of all cycle lengths would also be present. His paper Period Three Implies Chaos, written with Li in the American Mathematical Monthly, introduced chaos to a whole new audience of mathematicians [Li and Yorke, 1975].

It should also be noted that in parallel with all the developments in nonlinear dynamics happening in the US, Russian scientists had also discovered many of the same results, mostly published in Russian. A. N. Sarkovskii, for instance, had proved a theorem in 1964, pre-dating the work of Yorke, in which he ordered the natural numbers into the sequence of periods that occur in chaotic systems, with period 3 at the start of the sequence:

$$3 \succ 5 \succ 7 \succ 9 \succ \ldots$$
$$3 \cdot 2 \succ 5 \cdot 2 \succ 7 \cdot 2 \succ 9 \cdot 2 \succ \ldots$$
$$3 \cdot 2^2 \succ 5 \cdot 2^2 \succ 7 \cdot 2^2 \succ 9 \cdot 2^2 \succ \ldots$$
$$\vdots$$
$$3 \cdot 2^m \succ 5 \cdot 2^m \succ 7 \cdot 2^m \succ 9 \cdot 2^m \succ \ldots$$
$$\ldots \succ 2^m \succ \ldots \succ 16 \succ 8 \succ 4 \succ 2 \succ 1$$

Sarkovskii's Theorem does not mention the stability of orbits. Thus, if a stable period-three window occurs at a particular parameter value of a map, orbits of all periods will be present at that parameter value, but will be unstable.

2.2.4 Feigenbaum Numbers

After witnessing a lecture by Smale, the physicist Mitchell Feigenbaum started thinking about maps. He was interested in nonlinear systems and phase transitions, and by 1975, it was known that strange things happened in the logistic map when you vary its parameter. Robert May had studied the map as a nonlinear population model, and witnessed the period-doubling sequence. At the same time, Metropolis, Stein, and Stein showed that the logistic map's behaviour was extremely complicated, and could not be explained using traditional methods [Metropolis et al., 1973]. Using a pocket calculator, Feigenbaum started investigating the map, and found that in the period-doubling sequence, the successive branches of the bifurcation tree were related to previous branches by the factor 4.669, and this number cropped up again when he looked at other unimodal maps such as the sine map. Using the recently developed renormalization group methods of Wilson [Wilson, 1971], Feigenbaum showed that the scaling behaviour at the transition to chaos is governed by universal constants, Feigenbaum's α and δ [Feigenbaum, 1980]. These constants appear in the bifurcation sequence of continuous maps with quadratic maxima [Schuster, 1989], such as the logistic map (Figure 2.6).

[Figure 2.6: Bifurcation diagram of the logistic map]

Feigenbaum's δ gives the ratio of differences in parameter values ($r_1, r_2, \ldots$ in the figure) at which the bifurcations occur:

$$\delta_n = \frac{r_n - r_{n-1}}{r_{n+1} - r_n} \to 4.66920161\ldots \text{ as } n \to \infty \qquad (2.3)$$

Feigenbaum's α gives the size scaling-ratio of the period-$2^n$ forked branches just before bifurcation to period-$2^{n+1}$:

$$\alpha = \lim_{n \to \infty} \frac{d_n}{d_{n+1}} = 2.5029\ldots \qquad (2.4)$$
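Equation 2.3 can be checked with a few lines of arithmetic. In the sketch below, the bifurcation parameter values $r_n$ are standard published estimates for the logistic map (assumed from the general literature, not computed in this thesis), and the successive ratios already approach 4.669:

```python
# r_n: parameter values where the logistic map's period doubles to 2^(n+1)
# (standard literature estimates, assumed here for illustration).
r = [3.0, 3.449490, 3.544090, 3.564407, 3.568759]

for n in range(1, len(r) - 1):
    delta_n = (r[n] - r[n - 1]) / (r[n + 1] - r[n])   # Equation 2.3
    print(n, round(delta_n, 4))
# Prints approximately 4.7515, 4.6562, 4.6684 -- tending towards 4.66920...
```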

2.2.5 Chaos under control

In 1990, Ott, Grebogi and Yorke (OGY) [Ott et al., 1990] introduced a method for stabilizing a chaotic trajectory. The OGY method enables a chaotic trajectory to be manoeuvred onto some desired periodic orbit by applying small perturbations to a control parameter. A chaotic trajectory is itself made up of an infinite number of unstable periodic orbits. A trajectory might follow an unstable period-13 orbit for a while, then end up on a period-125 orbit for another while, and so on. Any chaotic trajectory will approach the constituent points of any desired period-m trajectory arbitrarily closely due to the ergodic property of chaos. In the OGY method, the controller waits until the trajectory is in the vicinity of the desired orbit, and then small perturbations are applied to keep the trajectory on the desired orbit. Figure 2.7 illustrates the principle (see [Gauthier, 2003]), where iterates of the logistic map are stabilized onto the unstable fixed point located at $x^* = 1 - 1/r$ by applying small perturbations to the control parameter, $r \to r + \delta r$, where δr is proportional to the deviation from the unstable fixed point, $\delta r = -\gamma(x_n - x^*)$. When iterates are close to the unstable fixed point, the map may be linearly approximated as:

$$x_{n+1} = x^* + \alpha(x_n - x^*) + \beta\,\delta r \qquad (2.5)$$

The parameters α and β are the partial derivatives of the map at the unstable fixed point:

$$\alpha = \left.\frac{\partial f(x, r)}{\partial x}\right|_{x = x^*} = r(1 - 2x^*) \qquad (2.6)$$

$$\beta = \left.\frac{\partial f(x, r)}{\partial r}\right|_{x = x^*} = x^*(1 - x^*) \qquad (2.7)$$


[Figure 2.7: Stabilizing an unstable fixed point of the logistic map at r = 3.9 using the OGY method.]

The deviation from the unstable fixed point can be written as $y_n = x_n - x^*$, allowing the linearized equation to be recast as:

$$y_{n+1} = (\alpha - \beta\gamma)y_n \qquad (2.8)$$

It is clear from Equation 2.8 that for control, we simply require that the following condition holds, and that γ be chosen accordingly:

$$|\alpha - \beta\gamma| < 1 \qquad (2.9)$$
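Equations 2.5-2.9 translate into a very short simulation. The Python sketch below is a minimal illustration of the scheme just described, for the logistic map at r = 3.9; the activation window of 0.02 and the deadbeat gain choice γ = α/β (which makes α − βγ = 0) are illustrative assumptions rather than values from the text.

```python
r0 = 3.9
xstar = 1.0 - 1.0 / r0                # unstable fixed point x* = 1 - 1/r
alpha = r0 * (1.0 - 2.0 * xstar)      # Equation 2.6: df/dx at x*
beta = xstar * (1.0 - xstar)          # Equation 2.7: df/dr at x*
gamma = alpha / beta                  # deadbeat gain: alpha - beta*gamma = 0

x = 0.1
for n in range(400):
    dr = 0.0
    control_on = 100 <= n < 200 or n >= 300    # on/off phases as in Figure 2.7
    if control_on and abs(x - xstar) < 0.02:   # perturb only near x*
        dr = -gamma * (x - xstar)              # delta_r = -gamma * y_n
    x = (r0 + dr) * x * (1.0 - x)              # iterate the perturbed map
    if n % 25 == 0:
        print(n, round(x, 5))
```

While control is on, the trajectory wanders chaotically until it enters the small window around x*, after which the parameter perturbations pin it to the unstable fixed point; switching control off releases it back into chaos.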

Another type of chaos control is called targeting [Shinbrot et al., 1993], in which it is desired to get some initial condition to some desired target point on the attractor in the shortest possible time. Targeting is often used in conjunction with the OGY method to get the trajectory onto the desired UPO (unstable periodic orbit) quickly. Targeting works by determining a tree of paths linking the target back to the initial condition. Small perturbations are applied at steps along the path to keep the trajectory moving towards the target. Since 1990, many different chaos control schemes have been proposed, and the interested reader should consult the books by Chen [Chen and Dong, 1997] and Schuster [Schuster, 1989]. Chaos control has been shown to work experimentally by many different groups. One particularly exciting area is the control of laser output as the pumping is increased. Often, a laser will enter a chaotic regime as the power is increased. The chaos manifests itself as a broadening of the output light spectrum. By controlling the chaos, one can get a very pure emission at very high powers [Roy et al., 1992].

Later in this thesis (Chapter 4) we discuss the Inverse Frobenius-Perron problem (IFPP), which some researchers, specifically Bollt, have regarded as a control problem. In [Bollt, 2000], Bollt applies perturbations to a map to give some desired invariant density. Our approach is not control-based, as we synthesize or create the map from scratch; however, some authors, notably Chen [Chen and Dong, 1997], refer to the IFPP as a type of chaos control.

2.3 Characteristics of Chaotic Maps

The properties of chaotic systems are described in copious detail in many different sources, including the books by Schuster [Schuster, 1989], Ott [Ott, 2002], Devaney [L.Devaney, 1989], and Hilborn [Hilborn, 1994], and collections of reprints of classic papers such as Cvitanović [Cvitanović, 1989] or Bai-Lin [Bai-Lin, 1988]. In this section, we shall give an overview of the behaviour and properties of chaotic maps as background material for what is to follow in Chapters 3, 4 and 5. Chaotic maps may be either continuous or piecewise-linear, as exemplified by the logistic map (Equation 2.10) and the tent map (Equation 2.11) respectively:

$$x_{n+1} = rx_n(1 - x_n) \qquad (2.10)$$


[Figure 2.8: The logistic map (Equation 2.10)]

$$x_{n+1} = 1 - 2|x_n - 0.5| \qquad (2.11)$$

[Figure 2.9: The tent map (Equation 2.11)]

Maps are essentially feedback systems: the output at iterate n becomes the input at iterate n + 1. One-dimensional maps such as the logistic map are often represented using cobweb diagrams, such as Figure 2.8. Take an initial condition $x_0$, and find its image, $f(x_0) = x_1$, by drawing a vertical line from $x_0$ up to meet the function. Drawing a horizontal line from $x_1$ to the identity line allows the output to become the input, and the process repeats.

Chaotic systems possess certain properties that we list here, and describe in subsequent sections:

1. Chaos results from deterministic nonlinear processes which possess a stretching and folding mechanism.

2. Chaotic trajectories remain bounded within the state-space.

3. Chaotic trajectories exhibit exponential sensitivity to initial conditions in the short-term.


4. Chaotic trajectories have a broadband power spectrum, because a chaotic trajectory is made up of unstable orbits of all periods.

5. Chaotic attractors often have fractal properties.

6. Chaotic systems generally follow certain routes to chaos as a control parameter is varied, e.g. the period-doubling route or the intermittency route.

Analysis of chaotic systems often starts with the fixed points, determining their stability for different parameter values. The fixed points $x^*$ of a map f are those points that satisfy

$$x^* = f(x^*) \qquad (2.12)$$

Period-N points are those points that satisfy

$$x^* = f^N(x^*) \qquad (2.13)$$

where the superscript N denotes N-times composition. For a one-dimensional map, such as in Figure 2.8, the fixed points are where the graph intersects


the identity line. A fixed point is stable if $|df/dx|_{x^*} < 1$. It is easy to show that the logistic map has fixed points at $x^* = 0$ and $x^* = 1 - \frac{1}{r}$, and that the derivatives at these fixed points are $r$ and $2 - r$ respectively. The fixed point at 0 is thus stable for 0 < r < 1, and all trajectories are attracted to it. The other fixed point is stable for 1 < r < 3. Beyond r = 3, a pair of period-2 points become stable, then period-4 points, and so on through the famous period-doubling cascade, illustrated in Figure 2.10.

[Figure 2.10: Bifurcation Diagram and variation of Lyapunov exponent of the Logistic map]

Lyapunov Exponents and Fractal Dimensions

The most useful quantitative measure of chaos is the Lyapunov exponent, which is a measure of the rate of divergence of nearby trajectories, averaged over the chaotic attractor [Ott, 2002], [Schuster, 1989]. There is one Lyapunov exponent per dimension of the system. In a bounded system, a finite


positive Lyapunov exponent indicates that the system is chaotic. If the exponent is zero or negative, then the system or map is either area-preserving, or contractive. Two-dimensional maps will often possess one positive and one negative exponent indicating that exponential divergence may only occur along a certain direction. Noise is characterised by an unbounded exponent. Chaotic flows governed by three differential equations generally have a positive, a negative, and a zero exponent. A hyperchaotic system is one with two or more positive exponents. Following Schuster’s notation [Schuster, 1989], the Lyapunov exponent may be represented graphically as follows:

[Figure 2.11: Definition of a Lyapunov exponent: two initial conditions $x_0$ and $x_0 + \epsilon$ are separated by $\epsilon e^{N\lambda(x_0)}$ after N iterations]

It follows from Figure 2.11 that

$$e^{N\lambda(x_0)} = |f^N(x_0 + \epsilon) - f^N(x_0)|/\epsilon$$

$$\lambda(x_0) = \lim_{N \to \infty} \lim_{\epsilon \to 0} \frac{1}{N} \log\left|\frac{f^N(x_0 + \epsilon) - f^N(x_0)}{\epsilon}\right| = \lim_{N \to \infty} \frac{1}{N} \log\left|\frac{df^N(x_0)}{dx_0}\right| \qquad (2.14)$$

By using the chain rule for derivatives, Equation 2.14 may be recast as:

$$\lambda(x_0) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=0}^{N-1} \log|f'(x_i)| \qquad (2.15)$$
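Equation 2.15 is also the practical recipe for computing λ numerically. The following minimal sketch (an illustration, not code from the thesis) estimates the exponent of the logistic map, for which $f'(x) = r(1 - 2x)$:

```python
import math

def lyapunov_logistic(r, x0=0.3, n_iter=100000, n_discard=1000):
    """Estimate the Lyapunov exponent of the logistic map via Equation 2.15."""
    x = x0
    for _ in range(n_discard):                       # skip the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))  # log |f'(x_i)|
        x = r * x * (1.0 - x)
    return total / n_iter

print(lyapunov_logistic(4.0))   # approx 0.693 = log 2: chaotic
print(lyapunov_logistic(3.2))   # negative: stable period-2 orbit
```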

In Figure 2.10, the Lyapunov exponent of the Logistic map is plotted against the parameter r, together with a bifurcation diagram. The Lyapunov exponent is negative when the fixed point at 0 is stable, but increases to zero when that fixed point becomes marginally stable at r = 1. The exponent also rises to 0 at the various period-doubling bifurcation points. When the system becomes chaotic beyond r = 3.57, the exponent goes positive, but returns to negative values within the (infinitely) many periodic windows. It reaches its maximum value of log 2 when r = 4.

Lyapunov exponents are also very useful in estimating the fractal dimension of an attractor. Fractal dimensions (Mandelbrot's corruption of fractional) are a measure of the space-filling properties of an attractor. For example, the attractor of a two-dimensional map such as the Henon map does not fill an entire planar region in the state-space; rather, it resembles a line which is folded in a convoluted way so as to fill up only part of the space (Figure 2.12). Its dimension is thus between one and two; it has been calculated to be about 1.08, making it closer to a line than a plane, as is clear from Figure 2.12. Some fractals may have integer dimension, such as the famous Sierpinski Sponge, which appears to be three-dimensional, but whose fractal dimension is 2. Chaotic attractors which are fractals are called strange attractors by Ruelle and Takens [Ruelle and Takens, 1971].

There are several different definitions of dimension which need to be explained. The Topological dimension, $D_T$, takes only integer values, and corresponds with our innate notion of dimension (a point is zero-dimensional, a line one-dimensional, and so on). The Box-counting dimension, or the Capacity dimension, was introduced by Kolmogorov as a practical method of determining dimension. (The Hausdorff dimension, which pre-existed the box-counting dimension, is not practical to use in some circumstances [Kapitaniak, 2000].) It is calculated by covering the attractor or geometric object in boxes of side R, and letting N(R) be the number of boxes required to cover the object. Letting the size of the boxes go to zero, it is assumed that the following scaling relationship holds:

$$N(R) = kR^{-D_B} \qquad (2.16)$$


[Figure 2.12: Henon attractor: neither a line nor a plane]

where k is a proportionality constant. Taking logs, we get:

$$D_B = -\lim_{R \to 0} \frac{\log N(R)}{\log R} \qquad (2.17)$$

For example, if we consider the famous middle-thirds Cantor set in Figure 2.13, at stage M, $2^M$ boxes are required, each of length $R = (1/3)^M$. The box-counting dimension is thus $D_B = \log 2 / \log 3 \simeq 0.631$. The information dimension is defined as:

$$D_i = -\lim_{R \to 0} \frac{S(R)}{\log R} \qquad (2.18)$$

where $S(R) = -\sum_{k=1}^{N(R)} p_k \log p_k$ and $p_k$ is the fraction of points in the kth box.

C(R) log R

where C(R) is called the correlation sum and is defined as: 1 X C(R) = 2 θ(b − |Xi − Xj |) N i6=j

26

(2.19)
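As a quick numerical illustration of the box-counting scaling (a sketch; the stage numbers are arbitrary choices), the middle-thirds Cantor set of Figure 2.13 gives the same ratio at every construction stage:

```python
import math

# middle-thirds Cantor set: at stage M, N(R) = 2**M boxes of side R = 3**-M
for M in (2, 6, 10):
    N, R = 2**M, 3.0**-M
    print(M, -math.log(N) / math.log(R))   # always log 2 / log 3 ~ 0.6309
```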


Figure 2.13: Middle-thirds Cantor set

Hentschel and Procaccia [Hentschel and Procaccia, 1983], and Grassberger [Grassberger, 1983], have shown that there exists an infinite number of generalized dimensions that can characterize an attractor, and that the types of dimension mentioned above are all special cases of these. Of most interest to us from an applications viewpoint is the Lyapunov dimension, defined in the well-known Kaplan-Yorke conjecture: if the Lyapunov exponents (of an attractor) are ordered $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, and j is the largest integer for which $\sum_{i=1}^{j}\lambda_i > 0$, then the Lyapunov dimension, which is the same as the information dimension for typical attractors, is given by:

$$D_L = j + \frac{\sum_{i=1}^{j}\lambda_i}{|\lambda_{j+1}|} \qquad (2.20)$$
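A direct transcription of the conjecture into code is useful later, when we work with maps characterized by a handful of exponents. The following Python sketch (an illustration, not part of the original text) computes Equation 2.20 from a list of exponents:

```python
def lyapunov_dimension(exponents):
    """Kaplan-Yorke estimate (Equation 2.20) from a list of Lyapunov exponents."""
    lams = sorted(exponents, reverse=True)
    total, j = 0.0, 0
    while j < len(lams) and total + lams[j] > 0:   # largest j with positive partial sum
        total += lams[j]
        j += 1
    if j == len(lams):
        return float(j)        # all partial sums positive: dimension equals j
    return j + total / abs(lams[j])

print(lyapunov_dimension([0.69, -1.20]))   # ~1.58 for a hypothetical 2-d map
```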

In the next chapter, when we consider the generalized Baker's map, which is characterized by two Lyapunov exponents, Equation 2.20 reduces to:

$$D_L = 1 + \frac{\lambda_1}{|\lambda_2|}$$



Figure 2.14: Bernoulli shift map with two diverging trajectories. Initial conditions are x0 = 0.31415 and x0 = 0.31416

Sensitive dependence on initial conditions

This is the most recognisable property of chaos: points which are arbitrarily close together will move apart exponentially quickly as the map is iterated. It is a consequence of the stretching and folding mechanism that gives rise to chaos. We illustrate this mechanism using the Bernoulli map:

$$\sigma(x):\quad x_{n+1} = 2x_n \bmod 1 \qquad (2.21)$$

The action of the map can be viewed as a Bernoulli shift if we express the initial condition in binary number form, where $a_p \in \{0, 1\}$:

$$x_0 = \sum_{p=1}^{\infty} a_p 2^{-p} \equiv (0.a_1a_2a_3\ldots) \qquad (2.22)$$

Application of the map to the initial condition shifts all the digits to the left, and deletes the first digit:

$$\sigma(x_0) = 0.a_2a_3a_4\ldots, \qquad \sigma^2(x_0) = 0.a_3a_4a_5\ldots$$

It can be readily seen that a difference in initial conditions at the nth digit will be amplified, so that after only n iterations, the two trajectories will lie in different halves of the state-space. In fact, we can label the points that lie in the interval [0, 0.5] or the interval (0.5, 1] as L or R respectively. Then, an arbitrary initial condition will give rise to a random sequence, . . . R, L, L, R, L, R, . . ., which is equivalent to the tossing of a coin [Ford, 1988], [Schuster, 1989]. This comes about because the non-periodic points (corresponding to irrational initial conditions) form an uncountably infinite set: the probability of choosing a periodic point when drawing a random number from the set [0, 1] is zero [Ott, 2002]. As the sequence of digits in a random number is also random, an arbitrary initial condition should give rise to a random sequence of L's and R's. From an applications viewpoint, the Bernoulli map (and the tent map) thus present a straightforward way of simulating the tossing of a coin. Of course, the limited numerical precision of computers must also be accounted for. A typical computer using double precision can store numbers as large or as small as about $1 \times 10^{\pm 308}$; however, the precision with which it can carry out simple operations such as addition is only about $1 \times 10^{-16}$. And, as is pointed out by Press et al. in their well-known book Numerical Recipes [Press et al., 2002], a rigorous analysis of roundoff error in many practical algorithms has never been made, by us, or anyone.
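To make the precision caveat concrete, the following Python sketch simulates coin tossing with the Bernoulli map while sidestepping the double-precision problem: iterating 2x mod 1 on ordinary doubles shifts all 53 mantissa bits out within 53 steps, so the sketch keeps the state as an exact integer numerator instead (the precision constant is an arbitrary choice):

```python
import random

# x0 = num / 2**PREC, a random binary expansion held exactly as an integer
PREC = 1024
num = random.getrandbits(PREC)

tosses = []
for _ in range(PREC - 1):
    num = (num << 1) % (1 << PREC)      # x -> 2x mod 1, done exactly on the bits
    tosses.append('R' if num > (1 << (PREC - 1)) else 'L')   # R when x > 0.5

print(''.join(tosses[:40]))
```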


The sensitive dependence on initial conditions (and on parameter values) also has an obvious application in cryptography, as the initial condition can be thought of as a unique key which gives rise to a unique trajectory. If the trajectory conveys information, then that information may only be retrieved with the correct key. This is discussed in the next section of the thesis, where we will see that studies have shown chaotic cryptographic systems do not perform particularly well.

Broadband Power Spectrum

A chaotic system may possess an infinite number of unstable periodic orbits for any given parameter value. Thus, an arbitrary chaotic trajectory will lie close to different orbits for different lengths of time. A Fourier analysis of the trajectory will then reveal these different periodicities in the form of a broadband power spectrum. We will use the Bernoulli map again to consider the notion of unstable periodic orbits. An initial condition of the form $x_0 = 0.a_1a_2\ldots a_n000\ldots$ will map to the origin after n iterations, and will remain there; the origin is called a fixed point of the map. An initial condition of the form $x_0 = 0.a_1a_2\ldots a_ma_1a_2\ldots a_ma_1\ldots$ will be mapped onto a period-m trajectory. A point on the period-m trajectory can be viewed as a fixed point of the map $f(x) = \sigma^m(x)$, where the power m denotes m-fold composition. Ott [Ott, 2002] shows that the number of fixed points of $\sigma^m(x)$ which are not fixed points of $\sigma(x)$ is $2^m - 2$. If m is a prime number, then the number of periodic orbits of period m is $(2^m - 2)/m$, and for large non-prime m the number of periodic orbits of period m is approximately $2^m/m$. All of the periodic points are unstable. This can be shown using a Taylor expansion, or by observing graphically (Figure 2.15) that the slopes of the graph at the fixed points of the composed map are greater than one. Although the periodic points of the Bernoulli map form a countably infinite set, the probability (as was mentioned earlier) of choosing a periodic point randomly from the set [0, 1] is zero, because of the uncountable infinity of real numbers in the interval [0, 1].
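These counting claims are easy to check numerically: the fixed points of $\sigma^m(x) = 2^m x \bmod 1$ on [0, 1) are exactly $x = k/(2^m - 1)$. A short Python check (an illustration only) for m = 3:

```python
# fixed points of sigma^m solve 2^m x = x + k, i.e. x = k / (2^m - 1)
m = 3
fixed = [k / (2**m - 1) for k in range(2**m - 1)]    # 2^m - 1 fixed points in [0, 1)
new = [x for x in fixed if x > 0]                    # remove the fixed point of sigma
print(len(new), "points =", len(new) // m, "period-3 orbits")   # 6 points = 2 orbits
```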



Figure 2.15: The fixed points of the third iterate of the Bernoulli map σ³(x) are shown at intersections with the identity line (circles). Squares are fixed points of the Bernoulli map σ(x).

The Fourier spectrum reflects this abundance of periodic orbits by showing frequency components at all periodicities (see Figure 2.16). This property of chaotic trajectories has found application in secure communications [Kennedy et al., 2000], [Kolumban, 2002]. In conventional communication systems, information is modulated onto a high-frequency carrier wave, which is usually easy to detect. It is possible to create communication systems which use the noise-like properties of chaos to let the signal fade unseen into the background atmospheric noise; only those with the requisite reception equipment can detect the signal. For a receiver, one often needs to synchronize a chaotic system to the chaos in the signal, a result shown to be possible by Pecora and Carroll [Pecora and Carroll, 1990].



Figure 2.16: Broadband power spectrum from 100000 iterates of the Bernoulli map.
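A spectrum like Figure 2.16 can be reproduced in a few lines. In the Python sketch below (an illustration; the noise amplitude is an arbitrary choice), a tiny amount of noise is injected at each step since, as noted earlier, finite precision would otherwise drive the doubling map to 0:

```python
import numpy as np

rng = np.random.default_rng(7)
x, xs = rng.random(), []
for _ in range(100000):
    x = (2.0 * x) % 1.0
    x = (x + 1e-10 * rng.random()) % 1.0   # tiny noise keeps the orbit alive
    xs.append(x)

spectrum = np.abs(np.fft.rfft(np.array(xs) - np.mean(xs)))**2
print(spectrum[1:].mean(), spectrum[1:].std())   # roughly flat: broadband
```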


Another application of the broadband power spectrum is as a dedicated white noise generator [Kautz, 1999]. Noise itself has many applications, including the generation of random numbers, and as a source of uncertainty when testing the behaviour of electronic devices [Gupta, 1975].

Periodic Windows

An interesting characteristic of continuous chaotic maps is the presence of periodic windows which appear amidst the chaos as a parameter is varied. In the well-known maps such as the logistic map, we can find the positions and widths of these windows accurately. One potential application of periodic windows is in system identification. We can use the period-3 window (see Figure 2.18) of the logistic map to identify a system which consists only of a gain, K. This can be done by placing the system to be identified in a loop with the logistic map, $x_{n+1} = rx_n(1 - x_n)$ (see Figure 2.17). The resultant difference equation is then $x_{n+1} = Krx_n(1 - x_n)$. By adjusting the value of r in the logistic map until the system output is period-3, we can identify the value of the gain K to a high degree of accuracy. From Figure 2.18, it can be seen that the width of the period-3 window is about 0.02, and so this is the accuracy to which we could estimate the gain K. Alternatively, we could utilize one of the other periodic windows, which are narrower still, leading to higher accuracy in the estimation of K; a sketch of this identification loop in code is given after Figure 2.18.


Figure 2.17: System identification with periodic windows.

Figure 2.18: Period-3 window from the logistic map.
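The identification loop can be sketched as follows (illustrative only; the gain value, tolerances and iteration counts are arbitrary assumptions, and the period-3 window is taken to open near $Kr \approx 1 + \sqrt{8} \approx 3.8284$). The accuracy of the estimate is limited by the window width, as discussed above:

```python
import numpy as np

def closed_loop_period(K, r, n_settle=5000, n_check=60, tol=1e-6):
    """Apparent period of x -> K*r*x*(1-x) after transients (0 if none found)."""
    x = 0.3
    for _ in range(n_settle):
        x = K * r * x * (1 - x)
    orbit = [x]
    for _ in range(n_check):
        x = K * r * x * (1 - x)
        orbit.append(x)
    for p in (1, 2, 3, 4, 6, 8):
        if all(abs(orbit[i] - orbit[i + p]) < tol for i in range(len(orbit) - p)):
            return p
    return 0

K_true = 1.10   # the unknown gain (a hypothetical value for this demonstration)
for r in np.arange(3.0, 4.0, 0.001):
    if closed_loop_period(K_true, r) == 3:
        # the period-3 window opens near K*r = 1 + sqrt(8) ~ 3.8284
        print(f"period-3 at r = {r:.3f}, so K ~ {3.8284 / r:.3f}")
        break
```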



Transitions to Chaos

Another property of chaotic systems is that they usually undergo a series of bifurcations as the trajectory moves from a steady-state to a chaotic regime. Common routes to chaos are the period-doubling route (pitchfork bifurcation), the intermittency route (tangent bifurcation), and the quasi-periodic route. In the period-doubling route, a change in parameter alters the map so that a stable fixed point becomes unstable, but two new stable period-2 points are created (see Figure 2.19 (a) and (b)). Further changes in parameter lead to instability of the period-2 points, and the birth of a stable period-4 orbit. The period-doublings then recur faster and faster as the parameter changes, until there are stable and unstable periods of infinite length, indicating a chaotic regime [Ott, 2002].

Figure 2.19: Route to chaos: (a), (b): Before and after a period-doubling bifurcation; (c), (d): Before and after a tangent bifurcation

In the intermittent route to chaos, a change in parameter may cause fixed points (or periodic points) to suddenly appear or disappear (Figure 2.19 (c) and (d)), depending on whether the map function intersects the identity line or not. If the map function is almost tangent to the identity line, iterates can


spend a long time almost trapped near the fixed point before escaping. Intermittency is characterised by almost periodic behaviour interspersed with chaotic bursts [Hilborn, 1994]. In the quasi-periodic route to chaos, which is particularly associated with nonlinear oscillators and circle maps such as Equation 2.23, the mode-locking regions of the map start to overlap as the degree of nonlinear interaction K is increased. When K > 1 the map becomes non-invertible and chaos can occur. The first few mode-locking regions (also called Arnold tongues, after the Russian mathematician V. I. Arnold) are shown in Figure 2.20. The parameter Ω is called the frequency ratio or, sometimes, the bare winding number, and the Arnold tongues are each labelled p : q, where q is the periodicity and p is the number of times the modulo function is invoked before the cycle repeats.

$$\theta_{n+1} = \theta_n + \Omega - \frac{K}{2\pi}\sin(2\pi\theta_n) \bmod 1 \qquad (2.23)$$
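Mode locking in Equation 2.23 can be detected numerically from the winding number, the average phase advance per iterate of the unwrapped map. A minimal Python sketch (the parameter values are arbitrary choices):

```python
import math

def winding_number(omega, K, n=10000):
    """Average phase advance per iterate of the (unwrapped) sine-circle map."""
    theta = 0.0
    for _ in range(n):
        theta = theta + omega - (K / (2 * math.pi)) * math.sin(2 * math.pi * theta)
    return theta / n   # converges to p/q inside a p:q Arnold tongue

print(winding_number(0.5, 0.9))   # locks to 0.5 inside the 1:2 tongue
```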

Invariant Densities

When a chaotic map is iterated, different initial conditions exponentially diverge, leading to completely different long-run behaviour. However, when looked at statistically, many chaotic maps possess a single physically relevant invariant density, which remains stable when random noise is added to the process [Schuster, 1989]. Given an arbitrary initial condition, the invariant density describes where iterates end up on average. For simple maps such as the tent map or the Bernoulli map, iterates have an equal probability of landing anywhere in the state-space: the natural invariant density ρ is constant, and equals one [Ott, 2002]. Many maps, however, do not possess a simple invariant measure. The invariant density of the logistic map is a complicated fractal for most values of the control parameter. A large



Figure 2.20: Sketch of first few Arnold tongues, or mode-locking regions of the sine-circle map


portion of the thesis (Chapters 4 and 5) centres on the creation of maps with arbitrary invariant densities. Such maps could be used to generate random numbers with specific distributions, in order to simulate a loaded die or a normal distribution. The invariant density could also be used to transmit covert messages or pictures by associating codewords with the density of the iterates. This intriguing possibility will be developed further in Chapter 5.

Symbolic Dynamics

A coarse-grained description of a system's dynamical evolution, where symbols replace entire numerical intervals, is called a symbolic dynamics. A great deal of information about the trajectory is discarded, and yet a symbolic dynamics description can contain the essence of a trajectory's character [Bai-Lin, 1989]. In a symbolic dynamics description, the entire phase-space is divided into labelled regions. All iterates entering a particular region are given that region's label. Thus many different trajectories can have the same symbolic description, allowing numerical sequences to be classified. It is usual to divide a map into intervals corresponding to monotonic branches. These branches are separated by the critical points of the map, where the derivative is zero. The branches of the logistic map and tent map are usually labelled L and R, corresponding to xn < 0.5 and xn > 0.5; the critical point at xn = 0.5 is labelled C (see Figure 2.21). If we consider the logistic map, it can be shown using symbolic dynamics that there exists one superstable period-2 orbit, denoted RC, one period-3 orbit, denoted RLC, and three period-5 orbits, denoted RLRRC, RLLRC and RLLLC. These letter strings are known as words, and one can use symbolic dynamics to determine which words can occur (called admissible words); other sequences can never occur and are called inadmissible words. Symbolic dynamics may also be used to determine the parameter values at which certain words occur. The details are quite involved and are given in Bai-Lin's book [Bai-Lin, 1989].
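A symbolic itinerary is trivial to generate numerically. The Python sketch below is an illustration only (the tolerance and the approximate superstable parameter value are assumptions): it labels logistic-map iterates L, C or R, and near $r \approx 3.83187$ it should print the repeating period-3 word RLC.

```python
def itinerary(r, x0, n, eps=1e-3):
    """Symbolic itinerary of the logistic map: L below 0.5, R above, C near it."""
    x, symbols = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        symbols.append('C' if abs(x - 0.5) < eps else ('L' if x < 0.5 else 'R'))
    return ''.join(symbols)

print(itinerary(3.83187, 0.5, 12))   # -> RLCRLCRLCRLC near the superstable orbit
```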


Figure 2.21: Critical point and branches of the Logistic map

2.4 Chaos as an engineering tool

In this section, we review the various ways in which chaos has been applied. There are three main areas in which researchers are active: chaos-based communications, chaos-based cryptography, and chaos-based computation. Indeed, the term chaos-based information processing could be used to cover about 95% of the research activity. We have not touched on the area of chaos control because it mostly centres on control theory, or on ways of quenching the chaos; the chaos itself is not used in any way. A number of recent books also


comprehensively cover the entire field [Chen and Dong, 1997].

2.4.1 Communicating with Chaos

Communication systems are used to transmit information from a source to a receiver over a channel. The information or message must be encoded into a suitable form for transmission. If security is paramount, the encoded data may also be encrypted prior to transmission. There are several aspects of chaos which can be exploited in communication systems:

1. Chaotic signals are noise-like and possess a broadband power spectrum. This is useful in spread-spectrum (SS) communications. In SS communications, the power of the signal is spread out over a wide frequency band in order to avoid problems with fading and narrowband interference, and to provide additional security. The enhanced security comes about because the carrier signal is no longer a single spike in the frequency spectrum, easy to detect, monitor or jam.

2. Chaotic systems exhibit sensitive parameter dependence, allowing one chaotic system to provide a whole selection of different chaotic attractors to be used as basis functions.

3. The random aspect of chaotic signals means they have a vanishing autocorrelation function, and hence are orthogonal. This is useful for code-division multiple access (CDMA) applications, where multiple users can share the same channel, as long as the codes being used are orthogonal.

4. Chaos communications provide some immunity against multi-path propagation effects, which can cause signal dropout [Kolumban, 1997]. Recent work has shown that differential chaos-shift keying (see below) can out-perform traditional communication systems under certain conditions [Kolumban, 2002].



Figure 2.22: Communication System

In analog communication systems, such as radio and television, a sinusoidal carrier wave is modulated in some way to convey information, the most common modulation techniques being amplitude modulation (AM) and frequency modulation (FM). The pioneering work of Pecora and Carroll [Pecora and Carroll, 1990] on synchronization of chaotic systems led to much interest in using chaos for analog communications. One early application was chaotic masking, where a message signal is added to a chaotic carrier wave, effectively hiding the message. The message is retrieved by synchronising a similar chaotic system to the first one, so that the chaos can be subtracted, leaving the original message [Chen and Dong, 1997]. At the moment, the consensus is that this method is not robust enough to be of practical use, and needs more work. Digital communication systems are more complex and involve three encoding stages:

• Source encoding / decoding
• Channel encoding / decoding


• Modulation / Demodulation

Say we wish to transmit a picture over some channel. Assuming the picture is represented digitally (as a bitmap, for instance), the source encoding stage removes redundant information from the picture data. The picture data is then encoded via the channel encoder; this introduces controlled redundancy as an error-correction mechanism. Finally, the encoded data is mapped onto a set of analog waveforms for transmission across the channel. This is called pulse code modulation (PCM). There are a variety of PCM techniques. In pulse amplitude modulation (PAM), the analog signal takes on a finite number of amplitudes, corresponding to different codewords; this is also called amplitude-shift keying (ASK). Phase-shift keying (PSK) conveys the information by shifting the phase of the analog signal, and frequency-shift keying (FSK) achieves this through shifting of the analog signal frequency according to each codeword. The most intuitively obvious way of using chaos is a system called chaos-shift keying (CSK). In CSK, different chaotic signals are associated with different bits or codewords, to convey the information across the channel. This is easily accomplished by varying the parameters of a chaotic system, such as Chua's circuit [Kennedy and Dedieu, 1993]. A variant of CSK, chaotic on-off keying (COOK), uses a chaotic signal to represent a 1, and absence of chaos to represent a 0. Another variant is called differential chaos-shift keying (DCSK). In DCSK, a chaotic reference signal is transmitted, followed by the same signal again, to indicate a 1. To transmit a 0, the chaotic reference signal is again transmitted, followed by an inverted copy. DCSK is more robust to channel noise and imperfections than CSK or COOK [Kennedy et al., 1998]. Note that CSK and DCSK are examples of noncoherent communication techniques, i.e. they do not involve chaos synchronisation. A number of excellent review articles and books on chaos communications have appeared recently, and the interested reader is referred to


them [Abel and Schwarz, 2002], [Kolumban, 2002], [Kennedy et al., 2000]. As well as CSK approaches to secure communications, Abarbanel and Linsay [Abarbanel and Linsay, 1993] have suggested using unstable periodic orbits of chaotic attractors to store information, although no one appears to have followed up this idea. More recently, Lai and Bollt suggested using a symbolic dynamics approach to secure communications [Lai et al., 1999]. Again, like so many ideas on using chaos, it looks good on paper, but the practical details often present insurmountable obstacles to experimental implementation.
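To make the DCSK idea concrete, here is a minimal Python sketch. It is not taken from the cited literature; the chunk length, noise level, and the use of the Ulam map $x_{n+1} = 1 - 2x_n^2$ as the chaotic source are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dcsk_encode(bits, chunk=64):
    """DCSK sketch: a chaotic reference chunk, then the same chunk for a 1
    or its inverted copy for a 0."""
    signal = []
    for b in bits:
        x, ref = rng.random(), []
        for _ in range(chunk):
            x = 1 - 2 * x * x              # Ulam map, chaotic on [-1, 1]
            ref.append(x)
        signal.extend(ref)
        signal.extend(ref if b else [-v for v in ref])
    return np.array(signal)

def dcsk_decode(signal, chunk=64):
    """Correlate each reference chunk with the chunk that follows it."""
    bits = []
    for k in range(0, len(signal), 2 * chunk):
        ref, data = signal[k:k + chunk], signal[k + chunk:k + 2 * chunk]
        bits.append(1 if np.dot(ref, data) > 0 else 0)
    return bits

tx = dcsk_encode([1, 0, 1, 1])
rx = tx + 0.2 * rng.normal(size=tx.shape)   # noisy channel
print(dcsk_decode(rx))                      # -> [1, 0, 1, 1]
```

Note that the receiver needs no synchronised copy of the chaotic system: the reference travels with the data, which is what makes DCSK noncoherent.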

2.4.2 Chaotic Information Storage and Retrieval

In modern computers, information is stored in a set of sequential locations called registers, and to retrieve that information, the address of each register must be known [Lawrence and Mauch, 1988]. This memory is of a passive type. In recent years, a way of encoding information using dynamical systems has been suggested. The basic idea is that nonlinear maps may be synthesized incorporating information-bearing regions. When the map is iterated, these information-bearing regions form a stable cycle, and the stored information can be read by looking at the output of the map. The idea was first presented by Dmitriev in [Dmitriev et al., 1991], and was expanded upon in [Andreyev et al., 1996] and [Andreyev et al., 1997]. We give an outline of the method below. One begins by dividing the state-space into a number of intervals, depending on how many different symbols we wish to use to represent the information. Each of these intervals is subdivided by the same number again. Following the terminology in [Andreyev et al., 1997], we will use a set of six symbols a, b, c, d, e, f. The cardinality of the set is denoted $N_A$. We divide the state-space (the unit interval I = [0, 1]) into $N_A$ intervals, and then divide each of these intervals into $N_A$ subintervals, each of length $1/N_A^2 \approx 0.0278$.



Figure 2.23: Storing the word 'face' in a 1-d map

Say we wish to store the word 'face'. It is necessary to construct a function g such that g(f) = a, g(a) = c, g(c) = e, g(e) = f. In practice, we consider the word fragments fa, ac, ce, ef, and place line segments such that g(fa) = ac, g(ac) = ce, etc. These line segments are then connected by straight lines. If the information-bearing line segments have slopes of less than one, the cycle formed will be stable, as illustrated in Fig. 2.23. Many different pieces of information can be stored in one map, as long as they do not overlap. This system can also form an image recognition system: if all the information regions have a slope of one, and an element of a stored image is presented to the map, iteration of the map will reproduce the image.
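A minimal Python sketch of the construction (illustrative only; interpolating between anchor points is a simplification of the full piecewise-linear map of Figure 2.23) shows the stored cycle being traversed:

```python
import numpy as np

symbols = 'abcdef'
NA = len(symbols)

def cell(frag):
    """Midpoint of the subinterval labelled by a two-symbol fragment, e.g. 'fa'."""
    i, j = symbols.index(frag[0]), symbols.index(frag[1])
    return (i + (j + 0.5) / NA) / NA

word = 'face'
frags = [word[k] + word[(k + 1) % len(word)] for k in range(len(word))]  # fa ac ce ef
xs = [cell(f) for f in frags]
ys = [cell(frags[(k + 1) % len(frags)]) for k in range(len(frags))]      # g(fa) = ac etc.

order = np.argsort(xs)
gx, gy = np.array(xs)[order], np.array(ys)[order]

x = cell('fa')
for _ in range(8):
    x = np.interp(x, gx, gy)   # each anchor maps exactly onto the next one
    print(round(x, 4))         # the visited subintervals spell out the fragments
```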



Figure 2.24: Public-key Encryption System

2.4.3 Chaos and Cryptography

The need for strong cryptographic systems has never been greater, as the world wide web becomes the new medium for doing business. The impetus for the renewed interest in this field came from Diffie and Hellman's famous paper on public-key cryptographic systems [Diffie and Hellman, 1976]. In these systems, a pair of public and private keys is generated, with the public key being stored in a public directory. If you wish to send a message P to Mr. X, you encrypt the message P using his public key, to get the encrypted message $\tilde{P}$. Mr. X then uses his private key to decrypt the message. A cryptanalyst eavesdropping on the public channel should find it computationally infeasible to decode the message without the private key. As with so many other fields, it was Shannon who laid the groundwork for the entire area in [Shannon, 1949] (reprinted in [Sloane and Wyner, 1993]). It is often assumed that chaotic systems have excellent properties for cryptographic applications (sensitivity to initial conditions, noise-like random properties, the simple nature of the maps), but the reality is somewhat different. The problem is that much of the research in chaos-based cryptography has not taken into account the basics of cryptographic attacks. One of the key cryptographic tests is called the known-plaintext attack, where the attacker knows the message being encrypted (plaintext), the corresponding encrypted message (ciphertext), and the system upon which it


is based [Buchmann, 2000]. The only thing hidden is the key. Given this information, most chaos-based encryption schemes do not perform particularly well. That is not to say that all such schemes are inherently flawed. In fact, good non-chaotic cryptographic algorithms display all the characteristics of chaotic systems, viz., exponential sensitivity to changes in key and message data, and diffusion and mixing properties. It could well be that modern cryptographic algorithms are innately chaotic, and there may be no need to make the chaos explicit. A recent, highly readable article by Kocarev [Kocarev, 2001] explores at length the similarities and differences between chaotic maps and cryptographic algorithms. The big difference between the two is that chaotic maps operate on a set of real numbers, whereas cryptographic algorithms operate on a set of integers. Kocarev has shown that by extending the domain of certain cryptographic algorithms to the real numbers, one can obtain chaotic maps. It would seem that chaos is a necessary condition for good encryption algorithms, but its sufficiency needs to be proven in each case. Many novel uses of chaotic systems for cryptography have been suggested. In [Scharinger, 1998], Scharinger proposes an image encryption system based on discrete Kolmogorov flows (which are used to model turbulence). The system possesses both confusion and diffusion properties, and is resistant to differential cryptanalysis attacks. Baptista [Baptista, 1998] suggested a stream encryption scheme based on the logistic map, where ε-intervals of the attractor are associated with each character to be encrypted. The character is encrypted as the number of iterations required to get from some initial condition to the ε-interval of that character. In [Alvarez et al., 1999], Alvarez suggests searching a chaotic sequence of real numbers (which has been thresholded onto the set {0, 1}) for a binary word. If and when the word is found, the initial condition, the thresholding value and the number of iterations required to find the word are stored. Both


of these encryption schemes are easily broken using known-plaintext attacks [Jakimoski and Kocarev, 2001]. In [Kocarev and Jakimoski, 2001], Kocarev presents a new block-encryption algorithm based on the logistic map. Although the chaos in the logistic map $y_{n+1} = 4y_n(1 - y_n)$ is not robust, it can be made so by replacing y with $y = (\hat{y} + p) \bmod 1$, where $\hat{y} \in [0, 1]$ and p is the key (a real number). By carrying out a cryptanalysis, it is shown that the algorithm is resistant to known-plaintext attack. Some open problems in chaotic encryption are posed in [Li et al., 2003], along with cryptanalyses and reviews of some other chaotic encryption schemes.
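For concreteness, a toy version of the Baptista scheme can be written in a few lines. The Python sketch below is an illustration only, with arbitrary parameter choices, and, as the cryptanalyses above make clear, it is not a secure cipher:

```python
def baptista_encrypt(msg, x0=0.43, r=3.99, n_min=250):
    """Each character becomes the number of iterations needed to land in its
    epsilon-interval of the logistic attractor; x0 acts as the secret key."""
    lo, hi = 0.2, 0.8                     # portion of [0,1] carved into 256 slices
    width = (hi - lo) / 256
    x, cipher = x0, []
    for ch in msg.encode():
        target, count = lo + ch * width, 0
        while True:
            x = r * x * (1 - x)
            count += 1
            if count >= n_min and target <= x < target + width:
                cipher.append(count)      # decryption replays the same orbit
                break
    return cipher

print(baptista_encrypt("hi"))
```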

2.4.4 Chaos and Computing

Can chaos be used in computation? There is not a great deal of literature on this topic, but Moore has addressed the question in [Moore, 1990] and [Moore, 1998] (in [Calude et al., 1998]). He shows that the Smale horseshoe map and the Baker's map can be used to form the basis of a Turing machine. A Turing machine consists of a bi-infinite tape with symbols written on it. The tape head operates on these symbols, and then moves the tape to the left or right. If the tape consists of binary digits, $\ldots a_{-2}a_{-1}a_0a_1a_2a_3\ldots$, this can represent a point (x, y) in the unit square, where $x = 0.a_{-1}a_{-2}\ldots$ and $y = 0.a_0a_1a_2\ldots$. The action of the Baker's map is equivalent to halving x and doubling y to get $x = 0.a_0a_{-1}a_{-2}\ldots$ and $y = 0.a_1a_2\ldots$ (see Figure 2.25). In other words, the tape has shifted to the right. It is important conceptually that a chaotic map can do this, but nobody would ever actually build such a Turing machine. A different computation paradigm has been mooted by Sinha and Ditto [Sinha and Ditto, 1998], [Sinha and Ditto, 1999]. They have used a lattice of chaotic maps, coupled together, to form simple adders, multipliers and logic gates.

They use the logistic map, with r = 4, throughout their system.



Figure 2.25: Action of the Baker's Map on the unit square

The coupling is carried out via a thresholding mechanism. If the value of a map x(n) exceeds some critical value x*, then the excess value x(n) − x* avalanches onto the next member of the lattice. When this relaxation has been carried out for all the elements, each chaotic map member of the lattice is iterated once. The system evolves dynamically by iteration → relaxation → iteration → . . .. The system output is just the state x(n) of the last member of the lattice. For different values of the critical value x*, the system gives rise to outputs of different periodicities. The critical values of the lattice elements are essentially 'programmed' to give the desired output response. The chaos is needed in order to yield the different periodicities required, and also to ensure that the system does not become locked in one region of the state-space (the ergodic property of chaos).
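The iteration-relaxation dynamics are easy to sketch. The following Python fragment is an illustration only: a single uniform threshold is used, whereas the cited work programs a threshold per element:

```python
def lattice_step(x, threshold):
    """One iteration-relaxation step of a threshold-coupled logistic lattice."""
    x = [4 * v * (1 - v) for v in x]          # chaotic update, r = 4
    for i in range(len(x) - 1):               # relaxation: excess avalanches rightward
        if x[i] > threshold:
            x[i + 1] += x[i] - threshold
            x[i] = threshold
    if x[-1] > threshold:                     # excess at the last site leaves the lattice
        x[-1] = threshold
    return x

x = [0.23, 0.57, 0.81, 0.42]
for _ in range(20):
    x = lattice_step(x, threshold=0.75)
print(x[-1])   # output is read from the last element; its periodicity depends on the threshold
```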

2.4.5 Digital Watermarking

Digital watermarking refers to the embedding of a watermark into data for security purposes [Voyatzis and Pitas, 1996], [Voyatzis and Pitas, 1999]. The watermark can be something like a company logo, or a code, and may either be visible or imperceptible. The data is often in the form of an image. If you were a photographer or artist working in the digital domain and wished to protect your work, a watermark could be added to your work


and only you would have the key that would verify whether the watermark was present or absent. There should be almost no difference between the watermarked image and the original. Watermarking may take place in the spatial domain (i.e. direct modification of the image), or in a transform domain (such as the Discrete Cosine Transform) [Giovanardi and Mazzini, 2001]. In one common technique, the original image I and the watermark W are combined to give a watermarked image I′:

$$I' = I + aIW \qquad (2.24)$$

The constant a is chosen so that $I' \approx I$. Common watermarks include corporate logos and pseudo-random number sequences. There is increasing interest in using chaotic sequences as watermarks, as an efficient alternative to pseudo-random sequences. The general idea is that a chaotic map is iterated and, using a thresholding function, a sequence of 1's and 0's is produced, which forms the watermark W.
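A thresholded chaotic watermark generator is only a few lines long. In this Python sketch (an illustration; the logistic parameters act as the watermark key and are arbitrary choices):

```python
def chaotic_watermark_bits(x0, mu, n, thresh=0.5):
    """Binary watermark sequence from a thresholded logistic orbit."""
    bits, x = [], x0
    for _ in range(n):
        x = mu * x * (1 - x)
        bits.append(1 if x > thresh else 0)   # thresholding turns iterates into bits
    return bits

print(chaotic_watermark_bits(0.3141, 3.99, 16))
```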

2.4.6 Other Applications

Confusion often arises between the phrases 'Applications of Chaos' and 'Applications of Chaos Theory'. We try to emphasize the former meaning in this thesis. The difference is exemplified by the question 'Is the chaos actually doing something, or merely explaining something?'. Some examples of both types of application are mentioned below.

Applications of Chaos

• Bollt and Meiss [Bollt and Meiss, 1995] show how chaotic orbits in the Earth-Moon system can be used to achieve transport with a 38% fuel saving over a conventional transfer from a parked Earth orbit to a parked Moon orbit. Unfortunately, the duration of the transfer is significantly longer than with conventional methods.


• Wang has shown that chaotic systems can be used to detect weak signals [Wang et al., 1999]. The driven Duffing chaotic oscillator is placed in an operating region near a bifurcation point. A weak signal is added to the system, which sends the system into an intermittently chaotic state, where the dynamics alternate between chaos and non-chaos. The length of time spent in the laminar (non-chaotic) region is used to estimate the weak added signal.

• Cellular Neural/Nonlinear Networks (CNNs), devised by Chua (see e.g. [Chua, 1999]), are networks of chaotic circuits in which complicated and emergent phenomena arise. They can be used to solve partial differential equations, among other things.

Applications of Chaos Theory

• In [Stewart, 1990], ideas from chaos theory are used in an industrial spring factory to test whether springs are good or bad. The spring coil-spacings are represented as a time-series, and the Ruelle-Takens theorem for attractor reconstruction is applied. The shape of the reconstructed attractor is a good indicator of how good the spring is.

• Digital phase-locked loops are very important electronic components used for frequency synthesis, frequency tracking, and FM demodulation. The occurrence of nonlinear effects such as phase jitter, instability and cycle-slipping in these devices is explained by Teplinsky in [Teplinsky et al., 1999] by modelling the system as a 2-D map and applying various techniques from chaos theory to the problem.

• Voltage collapse in power systems occurs when the demand for reactive power by loads and by transmission lines becomes too great. In the past it has led to widespread blackouts, and can be very costly for everyone concerned. Chaos theory is being applied in the


study of voltage collapse, as it is an inherently nonlinear phenomenon [Wang et al., 1994], [Various Authors, 1995]. Voltage collapse is often preceded by a series of bifurcations and chaotic crises. Knowledge of these precursors to the collapse can lead to new contingency strategies for power system operators.

2.5 Conclusions

In this chapter, we have traced the history of chaos theory from the work of Poincaré on the three-body problem in the late nineteenth century through to its most recent applications in encryption and information processing. We have outlined the key developments in the theory by visionaries such as Smale and Feigenbaum. We have introduced the key characteristics of chaotic systems that will be of relevance in the coming chapters. And we have attempted to summarize the main work on applications of chaos. Having considered the diverse areas in which chaos has been applied, two things stand out. First is the diversity of the applications, from communication systems to chaotic moon orbits. This is a consequence of the many facets of chaos: one application might use the noise-like property of chaos, whereas another might use the sensitive dependence property. Another noteworthy point is that there is no coherence to the applications; they run off in all directions without any sense of unity. If a non-specialist asked what chaos is mainly used for, it would be impossible to give a simple answer (assuming that the question is valid). By way of contrast, the phenomenon of resonance, which also occurs in many different fields, has many concrete applications, ranging from band-pass filters to microwave ovens. In the next chapter, we introduce a new application of chaos by applying the Baker's map to a benchmark problem in pattern classification.


Chapter 3

Application of Chaotic Maps to Pattern Classification

If the doors of perception were cleansed, everything would appear to man as it is, infinite.
William Blake, 1790

3.1 Introduction

In this chapter, we present a novel way of performing simple pattern classification tasks by utilizing the variation in Lyapunov dimension of the generalized Baker's map. There are several ways in which this example of applied chaos can be viewed. First of all, it serves as a motivating example of what can be done with chaotic maps. It is not obvious that the Lyapunov dimension of the Baker's map varies in a way which creates a natural exclusive-OR gate, or that this map can be trained as a rudimentary pattern classification system. Secondly (and disappointingly), our method illustrates that using chaos just for the sake of it may not be quite as good as traditional methods. This is the main criticism that is often levelled at chaos applications in telecommunications and encryption (although the chaos may prove its worth


in the future with further research and refinement). More important than these two viewpoints is that the example is typical of the ad-hoc approach that pervades current applications of chaos. While it may be a testament to the skills of physicists and engineers that communications and computational systems can be fashioned out of the most unlikely of components, the entire field is crying out for a systematic approach where maps can be synthesized to a given specification. We present some small steps in that direction in Chapters 4 and 5. In this chapter, we describe the pattern classification problem and traditional approaches to it. We then describe the generalized Baker’s map and show how it can be used to solve the XOR problem, a benchmark problem in pattern classification. We also show that our classification system can be trained to recognize new patterns. As an illustration, we use a simulated annealing algorithm to perform the training. This work has been published in [Rogers et al., 2002] and [Rogers et al., 2004].

3.2 Pattern Classification

Pattern classification is the act of recognising patterns as belonging to a particular class or group. A pattern can be an abstract representation of an object (e.g. a letter of the alphabet, an aircraft, or a fish), or it can simply be a set of numbers, a vector. In the former case, distinctive features of the object are selected and represented by numbers. For example, an aircraft might be represented by its length, number of engines, and wing-span. The pattern representing the aircraft is then a three-dimensional vector in feature-space. A simple classifier would be able to easily distinguish different aircraft types, such as Concordes and jumbo jets. One would also find that similar planes, such as the Boeing 737 family, would be clustered in feature-space, as they all have similar properties. (Note that the terms pattern classification and pattern recognition are often used interchangeably in the literature.)



Figure 3.1: Feature space of twin-engined aircraft families

3.2.1 The Pattern Classification Problem

The purpose of a pattern classifier is to assign a feature vector to its correct class [Duda et al., 2001]. The central problem of pattern classification is to construct a system which does this with as few classification errors as possible, while keeping the classification system as simple as possible. As is pointed out in Duda [Duda et al., 2001], it would be perfectly possible in theory to generate a lookup table of patterns for optical character recognition. But if the character were represented as a matrix of, say, 20 × 20 binary pixels, which corresponds to about $10^{120}$ patterns, the resources for storage and for searching through the table would be quite prohibitive; hence simpler ways are sought.

3.2.2 Traditional Pattern Classification Techniques

We outline here some of the more traditional pattern classification techniques that are used. Note that supervised learning, which is the type our pattern classifier uses, means that the classifier is given training data in order to optimise the classifier before use. The theory is that if the classifier


can recognise the test data correctly, then it should perform well when new patterns are presented to the classifier. There is another technique called unsupervised learning, where the classifier attempts to form groups or natural clusters of the input data, but is not told whether these are right or wrong. Reinforcement learning is also called learning with a critic: the classifier is told merely that the classification is right or wrong, but not the degree of the error. The schemes outlined below are supervised learning algorithms. Further details may be found in the classic texts by Duda et al. [Duda et al., 2001] and Ripley [Ripley, 1996], and in [Hastie et al., 2001] and [Theodoridis and Koutroumbas, 1999].

Linear Discriminant Functions and Nearest Neighbours

The simplest way to do pattern classification is to use a linear discriminant function, which is written as

$$g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0 \qquad (3.1)$$

where $\mathbf{x}$ is an input vector representing the pattern, $\mathbf{w}$ is the weight vector, and $w_0$ is called the bias [Duda et al., 2001]. If we have two classes of patterns, $C_1$ and $C_2$, then a typical classifier implements the following rule: pattern $\mathbf{x} \in C_1$ if $g(\mathbf{x}) > 0$, and pattern $\mathbf{x} \in C_2$ if $g(\mathbf{x}) < 0$. The equation $g(\mathbf{x}) = 0$ represents the decision surface that separates the two classes. If g is linear, then the decision surface is a hyperplane in the space of patterns. If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two vectors lying in the hyperplane, then from Equation 3.1 we get $\mathbf{w}^T\mathbf{x}_1 + w_0 = \mathbf{w}^T\mathbf{x}_2 + w_0$, and so $\mathbf{w}^T(\mathbf{x}_1 - \mathbf{x}_2) = 0$, which shows that the weight vector is normal to the hyperplane. The task in pattern classification is to determine the values of the weight vector to give optimal classification.



Figure 3.2: The nearest-neighbour rule creates a polyhedral tessellation of the feature space

There are several methods used to do this, including linear regression and gradient descent methods [Duda et al., 2001], [Hastie et al., 2001]. Perceptrons and neural networks, described in the next section, can be used to implement linear discriminant functions. Another conceptually simple method of pattern classification is called the nearest-neighbour rule (often generalised as k-nearest neighbours (KNN)). In this algorithm, the training data set elements are all classified into the various classes $C_1, \ldots, C_n$. We then take a test point $\mathbf{x}$, find the nearest training element $\mathbf{x}'$ to the test point, and assign the class of the nearest neighbour to the test point. In the KNN algorithm, a hypersphere is expanded around the test point until the k nearest neighbours are enclosed, and a majority vote is taken as to the class of the test point. The nearest-neighbour rule leads to a partitioning of the feature space into a grid of polyhedral cells called a Voronoi tessellation (see Figure 3.2) [Duda et al., 2001].
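The KNN rule itself fits in a few lines of Python. The sketch below (an illustration with made-up training points) classifies a test point by majority vote among its k nearest training elements:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """k-nearest-neighbour rule: majority vote among the k closest training points."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = ['C1', 'C1', 'C2', 'C2']
print(knn_classify(X, y, np.array([0.85, 0.75])))   # -> 'C2'
```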



Figure 3.3: Model of a neuron

Perceptrons and Neural Networks

The term artificial neural network is used to describe a wide class of statistical pattern recognition and function approximation paradigms [Bishop, 1995], [Haykin, 1999], [Ripley, 1996]. Originally motivated by the perceived structure of the human brain, neural networks consist of a large number of individual units, or neurons, connected together in some way. Information is usually stored in the artificial neural network by modifying the connection strength between neurons, and by modifying the non-linear activation characteristics of the individual neurons. Perhaps the most widely used artificial neural network is the multi-layer perceptron (MLP). Originally derived as an extension of the single-layer perceptron, the MLP has found widespread academic application in pattern recognition problems, and in problems that require non-linear static and dynamic function approximation [Narendra, 1996]. The popularity of this network type stems directly from the availability of an algorithm for adapting the parameters of the network based upon training data. In Figure 3.3, we show a typical neuron, with its constituent parts.



Figure 3.4: Perceptron as a signal-flow graph

Here, we have m inputs, which are modified by the synaptic weights and summed at the junction. The activation function determines the output from the neuron. Typically, a sigmoid function or a Heaviside function is used as the activation function, although there are many other possibilities. The MLP is constructed by connecting together many neurons. A single-layer perceptron (introduced by Rosenblatt [Rosenblatt, 1962]), or Adaline (introduced by Widrow and Hoff [Widrow and Hoff, 1960]), is a particular type of neuron, based around the model shown in Figure 3.3, with a hard limiter as the activation function. That is, when the hard limiter input is positive, the neuron produces a +1 output, and it produces a −1 output when the input is negative. For some pattern recognition problems, single-layer perceptrons can be used to separate a set of inputs into two classes. We shall now describe how this is achieved. First of all, it is customary to show the single-layer perceptron as a signal-flow graph. As can be seen from Figure 3.4, the inputs are given by $(x_1, x_2, \ldots, x_m)$. The externally applied bias b can be used to modify the activation potential v (also called the local induced field). The hard limiter input is given by:

$$v = \sum_{i=1}^{m} w_i x_i + b \qquad (3.2)$$


As we saw earlier with the linear discriminant function, the task of the perceptron is to classify a set of inputs $(x_1, x_2, \ldots, x_m)$ into one of two classes, $C_1$ or $C_2$. In other words, if the inputs belong to class $C_1$, then the neuronal output y is +1 (say), and the output y is −1 if the inputs belong to class $C_2$. If we look at Equation 3.2, we can see how this is done. A hyperplane in m-dimensional space is given by the following relation:

$$\sum_{i=1}^{m} w_i x_i + b = 0 \qquad (3.3)$$

This hyperplane forms the decision boundary between the two classes $C_1$ and $C_2$. If v is positive, then the points lie on one side of the decision boundary, and they lie on the other side if v is negative. This is illustrated in Figure 3.5. For simplicity, we have only illustrated the two-dimensional case (i.e. two inputs $x_1, x_2$). It is also possible to have several perceptrons in parallel, which allows the classification of points into more than two classes. Clearly, if we had two perceptrons, then we would have two hyperplanes, allowing the classification of points into one of four classes. It should also be clear, though, that because the hyperplane acts as a linear boundary, the classes must be linearly separable, i.e., you must be able to draw a line (hyperplane) down the middle separating the two sets of points. Otherwise, the perceptron may give an incorrect classification for some points. Based upon the above discussion, it is clearly not possible to solve the Exclusive OR (XOR) problem (see Section 3.3) using a single-layer perceptron, since the members of each pattern class are not linearly separable. It is possible to solve this problem using an MLP, or more specifically, using a two-layer perceptron; the first layer transforms the problem into a linearly separable problem, which the second layer then solves.
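Both observations are easy to demonstrate in code. The Python sketch below (illustrative; the learning rate and epoch count are arbitrary choices) trains a hard-limiter perceptron with the classic error-correction rule; it converges for the linearly separable AND labels but never settles for XOR:

```python
import numpy as np

def perceptron_train(X, y, epochs=50, eta=0.1):
    """Rosenblatt perceptron with a hard limiter; labels are +1 or -1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # absorb the bias into the weights
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            out = 1 if w @ xi > 0 else -1
            w += eta * (yi - out) * xi          # update only on a misclassification
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w_and = perceptron_train(X, np.array([-1, -1, -1, 1]))   # AND: separable, converges
w_xor = perceptron_train(X, np.array([-1, 1, 1, -1]))    # XOR: keeps misclassifying
```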


Figure 3.5: Hyperplane as a decision boundary for a two-class, two-dimensional pattern classification problem. The decision boundary is $w_1x_1 + w_2x_2 + b = 0$.


3.2.3 Pattern Classification using Chaos in Neural Networks

The use of chaotic neural networks for pattern classification tasks traces its origins to Freeman's discovery of chaos in the olfactory bulb of rabbits [Skarda and Freeman, 1987], [Freeman, 1994]. Freeman contends that chaos makes perception possible (through sensitivity to initial conditions), and that the presence of chaos means that the brain is in an 'always on' mode, and is a source of novel brain activity experienced by us as new ideas [Freeman, 1991]. This key discovery that the brain is always in a chaotic state has motivated many groups to create neural networks which mimic the action of the brain, albeit in a limited way, in order to perform learning and classification tasks. There has been some debate over the precise role of chaos, with some arguing that the sensitive dependence property is ill-suited to pattern classification because it tends to amplify minor differences in patterns. Other suggestions include chaos as the brain's novelty filter, chaos as a memory searcher, and chaos as a self-referential logic generator [Tsuda, 1992], [Freeman, 1995]. The neural networks used to model brain activity are called recurrent neural networks. They differ from perceptrons in that there are feedback connections between different layers. It is possible to arrange things so that each node in the network has an associated chaotic attractor, with memories being stored in the connection weights between the different nodes. Most of the literature on this topic has involved variations on this neural network theme, and one senses that a lot of the work has been done on an ad-hoc basis: the networks are difficult to understand and harder to interpret. More recently, and based on real-world data, Freeman has created a robust pattern classification system based on networks which have co-existing global chaotic attractors and which appear to more closely model real brain activity [Kozma and Freeman, 2001]. He has overcome the problem of sensitive dependence by adding noise to the system.


The papers of Kojima and Ito are representative of much of the work combining neural networks, chaos, and pattern classification [Kojima, 1998], [Kojima and Ito, 1999b], [Kojima and Ito, 1999a]. They set up a neural network in which each neuron models the Lorenz equations, and then patterns are stored as synaptic coupling weights between different neurons. When no pattern is presented as an input to the system, the output of the system roams through the stored patterns (a process akin to dreaming). If part of a stored pattern is presented as an input, the whole pattern will be reproduced. Without being over-critical, it is hard to see the precise role of the chaos in these systems. Ordinary neural networks can store many patterns without recourse to chaos. Also, the inner-workings of neural networks can be rather opaque to the uninitiated, and adding chaos to the mix does not help.

3.3 The Baker's Map: A natural XOR gate

In this section, we will show how the Generalised Baker's map can be used to solve the Exclusive OR (XOR) problem. This is a fundamental problem of pattern recognition, and involves telling at a single glance whether a point belongs to one of two classes: class A or NOT class A (class B), where class A consists of two diagonally opposite corners of a unit square, and class B consists of the other two corners. The inability of a single-layer artificial neural network (ANN) to solve this problem was considered to be a severe drawback for ANNs as a mechanism for nonlinear problem-solving, but the advent of multi-layer networks overcame these problems. We will look at the XOR problem in more depth later. The Generalised Baker's map is a two-dimensional, three-parameter, nonlinear mapping, which is chaotic for virtually all parameter values. We use it here because it is one of the best-understood chaotic maps, and is particularly suited to rigorous analysis (see Farmer [Farmer et al., 1983]).


It also has the useful property that its Lyapunov dimension is monotonically increasing for a wide range of parameter values, and we shall utilise this when we develop the XOR gate. Neither the Baker’s map, nor any other chaotic map, has been previously used to solve the XOR problem in this way.

3.3.1 The Generalized Baker's map

In their classic study of fractal dimensions, Farmer et al. introduced the Generalised Baker's Map in order to obtain rigorous results on the dimension of strange attractors [Farmer et al., 1983]. It is a transformation of the unit square [0, 1] × [0, 1], and has three parameters, $R_1$, $R_2$ and S:

$$x_{n+1} = \begin{cases} R_1 x_n & \text{if } y_n < S \\ 1/2 + R_2 x_n & \text{if } y_n \geq S \end{cases} \qquad
y_{n+1} = \begin{cases} y_n/S & \text{if } y_n < S \\ (y_n - S)/(1 - S) & \text{if } y_n \geq S \end{cases} \qquad (3.4)$$

63

3.3 The Baker’s Map: A natural XOR gate

y


(3.5)

Since the Baker’s map is two-dimensional, it will have two Lyapunov numbers, characterising the average stretching/compression factors in the x and y directions (see Figure 3.7). Note that the Lyapunov exponents are simply the logarithms of the Lyapunov numbers. It is customary to order the Lyapunov numbers, so that Λ1 > Λ2 > · · · > Λn . As was mentioned in Chapter 2, the Lyapunov dimension was introduced by Kaplan and Yorke [Kaplan and Yorke, 1979] in the so-called KaplanYorke Conjecture: that the Lyapunov dimension DL is the same as the information dimension for ‘typical’ attractors. For the Baker’s map, DL = 1 −

64

log Λ1 log Λ2

(3.6)

(Λ1 )n δ

3.3 The Baker’s Map: A natural XOR gate

(Λ2 )n δ

Figure 3.7: Lyapunov numbers chracterise the average stretching factors of some small circle of radius δ. Here, Λ1 > 1 and Λ2 < 1. The Jacobian of Equation 3.4 can be  L2 (y) J = 0   1/S L1 (y) =  1/(1 − S)   R1 L2 (y) =  R 2

written in the following form:  0  where (3.7) L1 (y) when y < S when y ≥ S when y < S when y ≥ S

From equation 3.5 we get Λ1 = lim [L1 (yn ) · · · L1 (y1 )]1/n n→∞

Λ2 = lim [L2 (yn ) · · · L2 (y1 )]1/n n→∞

By taking logs, and with some manipulation, noticing that the orbits are ergodic in the y-direction (see Farmer [Farmer et al., 1983]), we find that the Lyapunov exponents are:

log Λy = S log

1 1 + (1 − S) log S 1−S 65

(3.8)

3.3 The Baker’s Map: A natural XOR gate

Figure 3.8: Variation of Lyapunov dimension with R and S. log Λx = S log R1 + (1 − S) log R2

(3.9)

In our implementation of the XOR gate, we only require two input parameters, so we shall let R2 = R1 . In Figure 3.8, we show how the Lyapunov (fractal) dimension varies with R and S. Notice that the fractal dimension varies between 1 and 2, as we would expect, and is symmetrical about S = 1/2.

3.3.2

Solving the XOR problem

If we plot the fractal dimension of the Baker’s map for varying values of R and S, it becomes obvious how we can use the map to solve the XOR

66

3.3 The Baker’s Map: A natural XOR gate

Figure 3.9: Variation of Lyapunov dimension with R and with S as a parameter. problem. Firstly, we show how the Lyapunov exponents (Equations 3.8 and 3.9) vary with R and S (see Figure 3.8). Clearly, since the map is contractive in the x-direction, the Lyapunov exponent in that direction is always negative. Conversely, the map is expansive in the y-direction, and therefore that Lyapunov exponent is always positive. In Figure 3.9, we plot DL against R, with S as a parameter. The fractal dimension varies between 1 and 2 and the fractal dimension is symmetrical about S = 1/2. We have chosen slightly asymmetrical values of S to illustrate this. We can choose values of R and S, so that a pair (low R, high S) and another pair (high R, low S) give the same fractal dimension, say DA . (We thus encode the problem using the values of R and S.) This corresponds to a diagonally opposite corner pair in the XOR problem. We

67

3.3 The Baker’s Map: A natural XOR gate

can say therefore that if the fractal dimension DL = DA, then the inputs (R and S) are in class A, and if DL ≠ DA, then the inputs belong to class B. Note that we always limit S to the range [0, 0.5], to ensure a unique fractal dimension for any given (R, S) pair.

R value   S value   Fractal Dimension   Class
0.1       0.5       1.3                 A
0.36      0.1       1.3                 A
0.1       0.1       1.14                B
0.36      0.5       1.68                B

Table 3.1: Parameter values and their corresponding fractal dimensions and classes for Figure 3.9.

Obviously, the points in Table 3.1 do not lie on a perfect square, but that is unimportant. The key idea is that two pairs of diagonally opposing points are mapped to the same class. It is also clear that we are quite restricted in the possible pairs of points which we can map to the same fractal dimension. However, if we choose any four (R, S) pairs of points corresponding roughly to (low, low), (low, high), (high, low) and (high, high), then by drawing a straight line through the (low, high) and (high, low) points and extending it to intersect the y-axis, we can effectively solve the XOR problem for a much larger set of inputs. This is because we no longer require diagonally opposite points to have the same Lyapunov dimension DA as in Figure 3.9. We call the intersection of this line with the y-axis, DM, the (modified) Lyapunov dimension. This is illustrated in Figure 3.10.

Figure 3.10: A more general way of solving the XOR problem: draw a straight line through the two points belonging to Class A and find where the line intersects the y-axis.

Procedure for calculating DM

1. Given four points in the R-S plane, select the two points belonging to the same class: (Ra, Sb) and (Rb, Sa) in Figure 3.10.

2. Calculate the Lyapunov dimensions corresponding to the two points,


called D1 and D2.

3. Calculate the slope, m = (D1 − D2)/(Ra − Rb).

4. The dimension DM = D1 − mRa = D2 − mRb.

By continually calculating DM in this way, we can tell whether the inputs are in class A or not. An algorithm of this form is a rudimentary type of training algorithm, as referred to in the artificial neural network and statistical pattern recognition literature [Ripley, 1996]. The availability of such an algorithm, and its complexity, ultimately determines the applicability of a particular paradigm for a given problem. In our case, given a set of class labels and a set of vectors, the training part of the pattern recognition problem is trivial, involving only the simple calculation of a slope. For an ANN, solving this problem requires repeated calculation of the slope for at least two hyperplanes, and so is more computationally intensive.

The system is easily implemented with a few lines of code. Essentially, we need to simulate the Baker's map, given its input parameters, and then, using its state variables xn and yn, compute the Lyapunov dimension DL of the attractor. The speed of the system depends on the computation of the Lyapunov dimension. The traditional way to do this [Hilborn, 1994] assumes that no detailed information is available about the system, that is to say, only a time-series x0, x1, x2, . . . is available from the system. Given this time series, some value from the sequence is selected, say xi, and then one searches the sequence for another value xj that is close to xi. The sequence of differences is assumed to diverge exponentially, on average:

$$d_0 = |x_j - x_i|, \quad d_1 = |x_{j+1} - x_{i+1}|, \quad \dots, \quad d_n = |x_{j+n} - x_{i+n}|$$

We assume that $d_n = d_0\,e^{\Lambda n}$, which, after taking logarithms, gives

$$\Lambda = \frac{1}{n}\log\frac{d_n}{d_0} \tag{3.10}$$

Since we would like our system to be as fast as possible, this method is computationally expensive¹, as it involves continually searching through some large array of numbers, and then performing additional calculations. As we have the Baker's map data readily available, and since we have its input parameters already, we use a quicker way to compute the Lyapunov numbers. Iterate the Baker's map f(x, y) as normal (call it B1), but also iterate another Baker's map (B2) after each iteration of B1. For each pair (xn, yn) generated by B1, use some nearby pair of numbers (xn + δ, yn + ε) as initial conditions for B2. Then iterate B2 once (we have found that iterating more than once does not improve accuracy, but merely slows things down). Compute the Lyapunov numbers:

$$\Lambda_x = \log\frac{|f(x_n, y_n) - f(x_n + \delta,\, y_n)|}{\delta} \quad \text{(x component)}$$

$$\Lambda_y = \log\frac{|f(x_n, y_n) - f(x_n,\, y_n + \epsilon)|}{\epsilon} \quad \text{(y component)}$$

The numbers thereby computed tend to be noisy, but when averaged, they give the expected theoretical values. Note also that since the map is always contracting in the x-direction, choosing a very small value for δ gives entirely inaccurate results, hence we use δ ≈ 1.

To illustrate the action of the system, we choose four distinct points, more or less arbitrarily (see Figure 3.11 and Table 3.2).

¹Computation of a Lyapunov exponent from a 1-D time-series approximates to an O(n²) calculation, because it is necessary to search through the entire time-series for points near the one under consideration, whereas the method outlined here approximates to an O(2n) calculation.
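Putting the pieces together, a hedged sketch of the whole pipeline follows (ours, in Python), assuming the generalized Baker's map of Equation 3.4 takes the standard form suggested by Figure 3.6; the helper names are illustrative.

```python
import numpy as np

def baker(x, y, R1, R2, S):
    # Assumed standard form of the generalized Baker's map (cf. Figure 3.6):
    # contract in x (factor R1 or R2), expand in y (factor 1/S or 1/(1-S)).
    if y < S:
        return R1 * x, y / S
    return 0.5 + R2 * x, (y - S) / (1 - S)

def estimate_DL(R, S, n_iter=20000, delta=1.0, eps=1e-6):
    """Twin-trajectory estimate: average the noisy per-step stretching
    logs (the 'Lyapunov numbers' above) and form D_L via Equation 3.6."""
    x, y = 0.3, 0.3
    lx = ly = 0.0
    for _ in range(n_iter):
        fx, fy = baker(x, y, R, R, S)
        x1, _ = baker(x + delta, y, R, R, S)                   # B2, x-perturbed
        _, y2 = baker(x, min(y + eps, 1.0 - 1e-12), R, R, S)   # B2, y-perturbed
        lx += np.log(abs(fx - x1) / delta)
        ly += np.log(abs(fy - y2) / eps)
        x, y = fx, fy
    lx, ly = lx / n_iter, ly / n_iter      # averaged log Lambda_x, log Lambda_y
    return 1.0 - ly / lx                   # Equation 3.6

def modified_dimension(pt_a, pt_b):
    """Steps 1-4 of the D_M procedure, for two same-class (R, S) points."""
    (Ra, Sa), (Rb, Sb) = pt_a, pt_b
    D1, D2 = estimate_DL(Ra, Sa), estimate_DL(Rb, Sb)
    m = (D1 - D2) / (Ra - Rb)
    return D1 - m * Ra                     # y-intercept D_M

print(modified_dimension((0.2, 0.5), (0.3, 0.1)))   # ≈ 1.75 (cf. Figure 3.12)
```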


Figure 3.11: Four points selected to illustrate the chaotic XOR system

Point No.   R value   S value   Class
1           0.2       0.5       A
2           0.3       0.1       A
3           0.15      0.2       B
4           0.35      0.4       B

Table 3.2: Parameter values and their corresponding classes for the four points in Figure 3.11.

Figure 3.12: Output from chaotic XOR gate with inputs as in Table 3.2. Contiguous sets of 200 points are averaged. Class A corresponds to a modified Lyapunov dimension DM ≈ 1.75. Note that we cycle through the points (1), (4), (2), (3) and (1) respectively.

We compute the slope of the line between points 1 and 2, belonging to class A, to be m = −1.60667, and the (modified) Lyapunov dimension to be DM = 1.752. In Figure 3.12, we plot the output from the Baker's map system. Here, we have averaged every 200 points to smooth the output. Clearly, there is a tradeoff between speed of pattern classification and accuracy: if we average more points, we get a smoother output, but this introduces a delay into the recognition process. Note also that the relative smoothness depends on how large the value of S is, with S = 0.5 giving a perfectly smooth output. This is because the expansion rates in the y-direction are the same only when S = 0.5 (see Figure 3.6).

It is clear from Figure 3.12 that by merely observing if the output dimension lies in some suitable range about 1.75, we can tell if the input is in class A, or class B.

3.4 Training the Pattern Classifier

3.4.1 Simulated Annealing

Annealing is a process used to toughen steel, so that it may be machined or cold-worked [Amstead, 1997]. It involves heating a solid to a high temperature, and then slowly cooling it in a controlled manner. At high temperatures, the molecules have a lot of energy and are able to move randomly within the solid. They tend to move to positions that lower the energy of the system as a whole, but can also move to positions of higher energy, with a probability $e^{-\Delta E/T}$, where ΔE is the change in energy of the system, and T is the temperature of the system. The annealing process wipes out any traces of previous structure, and relieves internal stresses within the metal, making it less likely to fracture. With an absence of defects, the metal crystal is in a global minimum energy state.

In 1953, Metropolis [Metropolis et al., 1953] proposed a simulation scheme for the evolution of thermodynamic systems to equilibrium. Thirty years later, Kirkpatrick [Kirkpatrick et al., 1983] realized that the Metropolis algorithm could be applied to optimisation problems in general, with a cost function taking the place of energy. Simulated annealing is particularly suited to optimisation problems where the global minimum is located amongst many poor local minima [Press et al., 2002]. In order to apply the Metropolis algorithm, we require the following elements [Keating and Noonan, 1994]:

(i) A description of possible system configurations

(ii) A method of randomly perturbing the system configurations

(iii) A cost function (analog of energy), whose minimisation is the aim of the procedure

(iv) A control parameter T (analog of temperature) which determines the likelihood of an increase in cost being accepted.

The simulated annealing algorithm then follows these basic steps:

1. Initialise the system with some state S, and control parameter T = T0.

2. Perturb the system randomly to a new state SN.

3. Determine the change in cost function, ΔE = E(SN) − E(S), due to the random perturbation.

4. If ΔE < 0, accept the new system state SN; if ΔE > 0, accept SN with probability $e^{-\Delta E/T}$.

5. Repeat steps 2 to 4 for, say, 100 successful reconfigurations.

6. Let T = T · c, where c < 1, and then repeat steps 2 to 5.

7. Stop when T gets small, or E cannot be reduced further.

The annealing schedule, which determines how many perturbations are carried out at a given temperature, and the value of the control parameter c, is usually determined through experimentation. The initial temperature T0 is chosen so that $e^{-\Delta E/T_0} \approx 1$, that is to say, most random perturbations leading to an increase in energy will be accepted. Herein lies the power of simulated annealing: the system is able to escape from local minima. As the temperature is lowered, only small perturbations are accepted, and so the system gradually approaches equilibrium.
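A minimal sketch of steps 1-7 follows (ours, in Python; the cooling constants are illustrative defaults, not values from the thesis). In Section 3.4.2 the state would be the line parameters (m, y0), with one of the cost functions defined there as the energy.

```python
import math, random

def simulated_annealing(energy, perturb, s0, T0=10.0, c=0.9,
                        n_accept=100, T_min=1e-3):
    """Generic Metropolis-style annealing loop following steps 1-7."""
    s, T = s0, T0                                   # step 1
    E = energy(s)
    while T > T_min:                                # step 7: stop when T small
        accepted = attempts = 0
        while accepted < n_accept and attempts < 50 * n_accept:
            attempts += 1
            s_new = perturb(s)                      # step 2
            dE = energy(s_new) - E                  # step 3
            # step 4: accept downhill moves always, uphill with prob e^(-dE/T)
            if dE < 0 or random.random() < math.exp(-dE / T):
                s, E = s_new, E + dE
                accepted += 1                       # step 5: count successes
        T *= c                                      # step 6: cool and repeat
    return s, E
```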

3.4.2 Training and Testing the Classifying Lines

In our system, we wish to position a line so that it will either map two patterns to the same modified Lyapunov dimension DM, or separate patterns so that we can say that values of DM less than a certain value belong to, say, Class A, and all others to class B.

3.4 Training the Pattern Classifier

Illustration 1 Dim

B

DM

Illustration 2

A

A

Dim

B

A B

B

DM

B

R

R

Illustration 3 DM B

B

Dim B

B

R

Figure 3.13: Three possible class configurations (out of a possible 16). Class A, and all others are class B. For illustration purposes, we show three possible situations in Figure 3.13. The three illustrations show different class configurations in the R − DL plane (with S as a parameter). Illustration 1 is simply the XOR problem. Illustration 2 is an AND gate. Illustration 3 is an inverting logic gate (every value of DM leads to a NOT Class A outcome). Obviously, there are 16 possible variations on this theme. We seek a way of training the dotted line in the figures so that it correctly classifies the patterns presented to it. If we consider a line given by ax + by + c = 0, the perpendicular distance from a point x1 , y1 to the line is given by: d=

|ax1 + by1 + c| √ a2 + b2

(3.11)

In the pattern classification literature, it is usual to train the classification scheme using a certain set of patterns, and then test the system with a different set of patterns to see if these are classified correctly. This shows that the system can generalise: that it can classify patterns it hasn't seen before.

Figure 3.14: Distances from each pattern cluster to the separating line. + symbols belong to Class A and o symbols belong to Class B.

In our scheme, we could train two different classifying lines, to take account of the possibility of an XOR-type arrangement of the pattern classes. In the testing process, we would then choose the classifying line that gives the fewer classification errors.

Consider a set of patterns and classes as in Figure 3.14, with a line of slope m and y-intercept y0 lying in the plane. Considering each pattern in turn, if the point lies on the correct side of the line for classification purposes, we say that the distance is negative; otherwise it is positive. In Figure 3.14, the line classifies patterns in three of the clusters correctly. Hence we have d2, d3, d4 < 0, and d1 > 0. We use a cost function based upon the hyperbolic tan function (see Figure 3.15), where we sum over all the patterns in all the classes. Cost Function 1:

$$E = \sum_{\text{Class } A,B} \tanh(d_i)$$

If the pattern classes are in an XOR-type arrangement, we could instead consider just one of the pattern classes, and use simulated annealing to train a line which runs through the centroid of both pattern clusters. The cost function we would then use would have the following form. Cost Function 2:

$$E = \sum_{\text{Class } A} \tanh(d_i^2)$$

When the annealing algorithms for both cost functions have converged to a solution (a slope and an intercept), we test the two solutions using a new set of patterns. The correct classifying line should give a zero or minimal error.

As an illustration of the simulated annealing process, we will demonstrate with a set of classes as in Figure 3.16. The desired line should separate the two classes as shown. We will choose an initial line with slope m = 2 and y-intercept −3. We determine the perpendicular distances from points scattered randomly about the centroids of the individual patterns (0,0), (1,0), (0,1) and (1,1), and determine if the line is orientated to the correct side of each of the patterns.
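A sketch of the signed-distance and cost computation follows (ours; the sign convention for the "correct side" is an assumption made explicit in the comments). A cost of this form, fed to the annealing loop sketched in Section 3.4.1, is the kind of quantity the training runs described below minimise.

```python
import math

def signed_distance(line, point, label):
    """Perpendicular distance (Equation 3.11) from `point` to the line
    ax + by + c = 0, made negative when the point is classified correctly.
    Convention (our assumption): label +1 patterns should satisfy
    ax + by + c > 0, and label -1 patterns the opposite."""
    (a, b, c), (x, y) = line, point
    value = a * x + b * y + c
    d = abs(value) / math.hypot(a, b)
    return -d if value * label > 0 else d

def cost_function_1(line, data):
    """Cost Function 1: E = sum of tanh(d_i) over all labelled patterns."""
    return sum(math.tanh(signed_distance(line, p, s)) for p, s in data)

# Centroids of the AND-gate example of Figure 3.16 (class B at (1,1) -> +1)
data = [((0, 0), -1), ((1, 0), -1), ((0, 1), -1), ((1, 1), +1)]
print(cost_function_1((1, 1, -1.55), data))  # negative: all classified correctly
```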

Figure 3.15: Tanh functions: solid line is tanh(d) and dotted line is tanh(d²).

Figure 3.16: Illustrative example of applying simulated annealing to an AND gate (Class B, corresponding to (1,1), gives a TRUE output): the initial line has slope m = 2 and y-intercept −3. The desired line should have slope m = −1 and y-intercept > 1.

Figure 3.17: Energy landscape for patterns in Figure 3.16. Minimum energy = −3.84, when m = −1 and y0 = 1.55.

In Figure 3.17, we show an energy landscape of the system, where the elevation represents the cost of a particular line. The uplands represent regions where all patterns are classified incorrectly, and the basin represents the desired line, which in this case has a slope of −1 and intercepts the y-axis at 1.55. The simulated annealing algorithm will find the minimum energy point of this landscape. Note that, for simplicity, in this example we only used four patterns, and the minimum value of cost function 1 is, as a consequence, approximately −4. For cases where we have several hundred patterns, the minimum energy would be considerably lower.

In Figure 3.18, we show four different plots of energy during four runs of the annealing process. Clearly, there is quite a lot of variation, because it is a random process; however, the final value of energy is very close to −4 in each case.

Figure 3.18: Plots of energy (cost) as temperature is decreased, for four runs of the simulated annealing algorithm.

It is advantageous to introduce an extra condition into the cost function, to ensure that the values of the slope and intercept remain small:

$$E = \sum_{i=1}^{4} \tanh(d_i) + |m| + |y_0|$$

This condition is relaxed after 250 iterations, and accounts for the small drop in energy at that point. The choice of cost function depends on the type of patterns we wish to classify, as there may be occasions where we need a large slope. A typical final output from the algorithm is: System Energy = −3.93811, Slope = −0.970605, y-intercept = 1.50911.

3.5 Summary and Conclusions

We have shown that it is possible to engineer a rudimentary pattern recognition system using the chaotic Baker's map, by treating the map's parameters as inputs, and the Lyapunov exponent of the map trajectory as the output. We used the map parameters R and S to encode the pattern data, and found that opposite points in the XOR problem gave rise to the same Lyapunov dimension output. We attempted to generalise the system by using the concept of a modified Lyapunov dimension. We also illustrated how the system could be used to identify other simple 2-bit patterns, and we showed that the system could be trained once a suitable cost function was developed. This was illustrated using a simulated annealing algorithm, although other optimisation algorithms could also be used.

There are a number of points to be made about this system which act as severe drawbacks to its potential usefulness. First, there is the issue of speed: the system is slow, in that it typically takes several thousand iterations before the Lyapunov exponent output value settles down. A neural network can do the same task with far fewer iterations. (Clearly, with fast modern computers, the overall computation may still take less than a second.) Another drawback is that the system cannot be easily generalised to anything more complicated than 2-bit patterns. Again, traditional pattern classification techniques cope easily with more complex patterns. A more serious criticism, and one which can be aimed at many applications of chaos, is that our pattern classification system is a Rube Goldberg device, i.e. it is performing a simple task in a very roundabout way. In response to these criticisms, we emphasise that we view our chaotic pattern classification system as a motivating example of what can be accomplished using the diverse properties of chaotic systems, and as a catalyst for the development of a systematic approach to chaos applications.

Chapter 4

Synthesis of Chaotic Maps with Arbitrary Invariant Densities

...although the exquisite fine structure of the chaotic regime is mathematically fascinating, it is irrelevant for most practical purposes. What seems called for is some stochastic description of the dynamics.
Robert May, 1976

4.1 Introduction

In this chapter, we present a new method for synthesizing chaotic maps with arbitrary piecewise-constant invariant densities. This result, and its extension to switched systems and higher dimensional maps, is the main contribution of this thesis, and has been published [Rogers et al., 2004]. We take synthesis to mean the creation of a system with some set of desired properties through a mechanical or algorithmic procedure. In this case, we will be creating chaotic maps with prescribed statistical properties.

Figure 4.1: Smooth and fractal invariant densities of the logistic map: (a) r = 4, (b) r = 3.67.

The invariant density (or invariant measure) describes where iterates end up on average, as the map is iterated. Most maps possess a single physically relevant invariant density ρ. For instance, the tent map has a constant invariant density of ρ(x) = 1, indicating that on average, iterates occur uniformly throughout the interval. Piecewise linear maps always have piecewise-constant invariant densities. In continuous maps, such as the logistic map, ρ is usually a fractal, and so does not have a closed-form description. Special cases do exist, though. For example, the invariant density of the logistic map $x_{n+1} = r x_n(1 - x_n)$ with r = 4 can be shown (see Bai-Lin [Bai-Lin, 1989] for example) to be:

$$\rho(x) = \frac{1}{\pi\sqrt{x(1-x)}} \tag{4.1}$$

Interestingly, equation 4.1 may be obtained from the tent map invariant density through a change of variables.

The main tool used to analyse the statistical properties of chaotic maps is the Frobenius-Perron operator (FPO). The Inverse Frobenius-Perron problem (IFPP) is the technical name given to the synthesis problem: how to find a map that has a pre-specified invariant density. There are several approaches to this problem in the literature, and they will be discussed along with the FPO in the next section. The IFPP is interesting in both theoretical and practical ways. Theoretically, it is interesting that there is a method, indeed several methods, for controlling the invariant density of a map, and this is of practical use in areas such as the modelling of data (see [Boyarsky and Góra, 2002]). There are several key references on the FPO, the most notable being the book by Lasota and Mackey [Lasota and Mackey, 1994]. The background material in the following section is quite well known.

4.2 Background: The Inverse Frobenius-Perron Problem

Normally, when we iterate a chaotic map on computer starting from some initial condition x0, the iterates fall chaotically on some attractor. It can be difficult to see the attractor if we just look at the time-series of the iterates. If we partition the state space into a series of bins, and count the fraction of iterates in each bin, a statistical picture of the chaotic attractor emerges. For almost all initial conditions, the same picture emerges: a unique invariant density ρ(x). It is true that for certain initial conditions (rational numbers, or extreme points) other invariant densities are possible, but for the maps we will be considering, there will be only one physically relevant invariant density. This density is stable if a small amount of noise is added to the system. In order to characterize the density mathematically, we consider an ensemble of initial conditions described by a probability density function ρ0(x) and observe how it changes as the entire ensemble is iterated. Eventually, the invariant density is reached, after, say, n iterates. Further iteration of the ensemble of points just gives the invariant density ρ(x) each time. The collection of initial conditions has reached a stable fixed-point. We now define the Frobenius-Perron operator, P, which is a linear operator acting on distributions of points:

$$\rho_{n+1} = P\rho_n \tag{4.2}$$

4.2.1 The Frobenius-Perron equation

The invariant density is a fixed point of the Frobenius-Perron operator (FPO). More formally, consider the iterates of some one-dimensional map, f(x). An initial condition x0 will map to f(x0), and a delta-function distribution δ(x − x0) will map to δ(x − f(x0)) after one iteration. Now, utilising the sifting property of delta functions,

$$\int f(x)\,\delta(x - x_0)\,dx = f(x_0) \tag{4.3}$$

we get the following relationship:

$$\delta(x - f(x_0)) = \int_0^1 \delta(x - f(y))\,\delta(y - x_0)\,dy \tag{4.4}$$

Now we simply replace the δ(y − x0) with the more general ρn(y), some arbitrary density after n iterations of f(x), to get the Frobenius-Perron equation:

$$\rho_{n+1}(x) = \int_0^1 \rho_n(y)\,\delta(x - f(y))\,dy \tag{4.5}$$

The invariant density ρ(x) is a fixed point of equation 4.5, and so we get:

$$\rho(x) = \int_0^1 \rho(y)\,\delta(x - f(y))\,dy \tag{4.6}$$

The Frobenius-Perron equation governs the time evolution of some arbitrary distribution of initial conditions ρn(x) under some mapping f(x). We shall only be concerned with mappings on the unit interval (hence the limits of integration in the above equations). Equation 4.5 is not particularly useful in itself. We now show how to recast it in a usable form.

4.2.2 The Frobenius-Perron operator in explicit form

Consider some one-dimensional map f acting on the unit interval, and an arbitrary subset of the state-space, A. Let ρn and ρn+1 be densities at time steps n and n + 1 respectively. From conservation of probability, we can write that:

$$\int_A \rho_{n+1}(x)\,dx = \int_{A'} \rho_n(x)\,dx \tag{4.7}$$

The right-hand integral must consist of all those points mapped to A under one iteration of the map. Therefore A′ is the preimage of A under the mapping f. We denote the preimage of A as A′ = f⁻¹(A). Suppose A is an interval contained in [0, 1] of the form A = [a, x]. A may have many preimages under the mapping f. Equation 4.7 can be written as:

$$\int_a^x \rho_{n+1}(y)\,dy = \int_{f^{-1}([a,x])} \rho_n(y)\,dy \tag{4.8}$$

Now take the derivative with respect to x to get:

$$\rho_{n+1}(x) = \frac{d}{dx}\int_{f^{-1}([a,x])} \rho_n(y)\,dy \tag{4.9}$$

Finally, the invariant density will be a fixed point of this equation, so we drop the subscripts to get:

$$P\rho(x) = \frac{d}{dx}\int_{f^{-1}([a,x])} \rho(y)\,dy \tag{4.10}$$

Equation 4.10 is the most common form of the FPO. For example, to find the invariant density of the logistic map f(x) = 4x(1 − x), we first find the preimages of the interval [0, x]. These can easily be found to be

$$f^{-1}([0,x]) = \left[0,\; \tfrac{1}{2} - \tfrac{1}{2}\sqrt{1-x}\right] \cup \left[\tfrac{1}{2} + \tfrac{1}{2}\sqrt{1-x},\; 1\right]$$

Equation 4.10 then becomes:

$$P\rho(x) = \frac{d}{dx}\int_0^{\frac{1}{2}-\frac{1}{2}\sqrt{1-x}} \rho(y)\,dy + \frac{d}{dx}\int_{\frac{1}{2}+\frac{1}{2}\sqrt{1-x}}^{1} \rho(y)\,dy \tag{4.11}$$

Leibniz's rule is used to evaluate the integrals in equation 4.11:

$$\frac{d}{dx}\int_{u(x)}^{v(x)} f(t)\,dt = f(v(x))\frac{dv}{dx} - f(u(x))\frac{du}{dx}$$

A simple calculation then gives us:

$$P\rho(x) = \frac{1}{4\sqrt{1-x}}\left[\rho\!\left(\tfrac{1}{2} - \tfrac{1}{2}\sqrt{1-x}\right) + \rho\!\left(\tfrac{1}{2} + \tfrac{1}{2}\sqrt{1-x}\right)\right] \tag{4.12}$$

Notice that this is a functional equation; it was famously solved by Ulam and von Neumann in 1947 [Ulam and von Neumann, 1947]:

$$\rho(x) = \frac{1}{\pi\sqrt{x(1-x)}}$$

There is no general analytic method for solving such equations, although the invariant densities of some of the other well-known chaotic maps have been determined. There is a useful alternative representation of the FPO for one-dimensional piecewise monotonic functions which is often cited in the literature. If we let $\chi_\sigma(y) = f_\sigma^{-1}(y)$ be the $k$ preimages of $y$ under $f$, with $\sigma = 1 \dots k$, then equation 4.7 can be written as:

$$\int_a^b \rho_{n+1}(x)\,dx = \sum_\sigma \int_{\chi_\sigma(a)}^{\chi_\sigma(b)} \rho_n(x)\,dx \tag{4.13}$$

In the right-hand integral, we make the substitution $x \to \chi_\sigma(y)$. We must find the derivative of $\chi_\sigma(y)$ by using the rule for derivatives of inverse functions:

$$\chi'_\sigma(y) = (f_\sigma^{-1})'(y) = \frac{1}{f'(f_\sigma^{-1}(y))}$$

Equation 4.13 becomes

$$\int_a^b \rho_{n+1}(x)\,dx = \sum_\sigma \int_a^b \rho_n(\chi_\sigma(y))\,\chi'_\sigma(y)\,dy \tag{4.14}$$

$$= \sum_\sigma \int_a^b \frac{\rho_n(\chi_\sigma(y))}{f'(\chi_\sigma(y))}\,dy \tag{4.15}$$

As [a, b] is an arbitrary interval, the integrands on both sides must be equal, allowing us to write:

$$\rho_{n+1}(y) = \sum_\sigma \frac{\rho_n(\chi_\sigma(y))}{|f'(\chi_\sigma(y))|} \tag{4.16}$$

Finally, we rewrite equation 4.16 in operator form, with slightly simpler notation:

$$P\rho(y) = \sum_{x \in f^{-1}(y)} \frac{\rho(x)}{|f'(x)|} \tag{4.17}$$

Equation 4.17 can be used to give functional equations for the invariant densities of piecewise linear maps.
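As a quick check of equation 4.17 (our own worked example), consider the tent map $f(x) = 1 - |1 - 2x|$: every $y \in (0,1)$ has the two preimages $y/2$ and $1 - y/2$, each with $|f'| = 2$, so the equation reads $P\rho(y) = \tfrac{1}{2}\rho(y/2) + \tfrac{1}{2}\rho(1 - y/2)$. The constant density $\rho(x) = 1$ clearly satisfies $P\rho = \rho$, recovering the uniform invariant density of the tent map quoted in Section 4.1.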

4.2.3 The Inverse Frobenius-Perron Problem and the FPO as a Markov operator

We saw in the previous section how the FPO gives rise to functional equations which must be solved for ρ(x). This is a difficult (if not impossible) problem for arbitrary continuous maps. Firstly, the invariant density¹ may be a fractal or Cantor set, in which case the intervals concerned have measure zero²; secondly, it may not be possible to solve the resulting functional equations, even assuming the invariant measure is continuous. So the inverse problem, of choosing an arbitrary invariant measure and finding which map gives rise to it, must seem like quite an impossible task. All is not lost, though. There are two main approaches to this inverse Frobenius-Perron problem (IFPP) in the literature. The first method uses a conjugate function approach; the second is based on approximation of the FPO by a Markov matrix.

The conjugate function approach, which was first described by Grossmann and Thomae [Grossmann and Thomae, 1977], makes use of the following equivalence relation between two mappings: the maps f : I → I and g : J → J on intervals I and J are conjugate if there exists a one-to-one, onto map h : I → J such that

$$g(x) = h(f[h^{-1}(x)]) \tag{4.18}$$

The conjugating function h, assumed to be continuous and sufficiently smooth, establishes a one-to-one correspondence between the iterates of the two maps f and g. The invariant densities of g and f are related as follows:

$$\rho_g(x) = \rho_f[h^{-1}(x)]\,\left|\frac{dh^{-1}(x)}{dx}\right| \tag{4.19}$$

¹We assume that there is a single physically relevant invariant density, or natural invariant density [Ott, 2002], which is stable in the presence of weak random noise. There are infinitely many invariant densities, but they are not physically relevant [Schuster, 1989].

²The concept of sets of measure zero comes from Lebesgue integration: such sets are negligible and can be ignored, as they can be enclosed within an arbitrarily small interval [Strichartz, 2000].

Numerous examples are given in the paper by Grossmann and Thomae. Also, this approach can be used to find the invariant density of the logistic map. It is conjugate to the tent map through the conjugating function $h(x) = \sin^2(\pi x/2)$ (see [Ott, 2002] for details). The invariant density of the tent map is constant and equals one, and so Equation 4.19 reduces to

$$\rho_g(x) = \left|\frac{dh^{-1}(x)}{dx}\right| \tag{4.20}$$

The inverse function h⁻¹ can be shown to be

$$h^{-1}(x) = \frac{2}{\pi}\sin^{-1}\sqrt{x} \tag{4.21}$$

We can find the derivative, and thus the required invariant density, as follows:

$$\rho_g(x) = \frac{d}{dx}\left(\frac{2}{\pi}\sin^{-1}\sqrt{x}\right) = \frac{1}{\pi\sqrt{x(1-x)}} \tag{4.22}$$
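As a quick numerical sanity check of the conjugacy (our own, not part of the original derivation), one can verify that h carries tent-map orbits onto logistic-map orbits point by point:

```python
import numpy as np

h = lambda x: np.sin(np.pi * x / 2) ** 2     # conjugating function
tent = lambda x: 1 - abs(1 - 2 * x)          # tent map
logistic = lambda x: 4 * x * (1 - x)         # logistic map, r = 4

x = 0.3141
for _ in range(10):
    # conjugacy (4.18): g(h(x)) = h(f(x)) with f = tent, g = logistic
    assert np.isclose(logistic(h(x)), h(tent(x)))
    x = tent(x)
```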

Figure 4.2 illustrates the two conjugate maps, the conjugating function $h(x) = \sin^2(\pi x/2)$, and the invariant densities of the logistic and tent maps.

The Markov matrix approach, upon which our new results are based, was first suggested by Ulam [Ulam, 1960]. The FPO is a Markov operator, in the sense that the density at step n + 1 is only a function of the density at step n (see [Luenberger, 1979]). Ulam suggested that the state-space (the unit interval in all of our work) be arbitrarily partitioned into N subintervals, I1, . . . , IN. Then define a probability vector at step n:

$$P_n = \left\{ \int_{I_1} \rho_n(x)\,dx, \;\dots,\; \int_{I_N} \rho_n(x)\,dx \right\} \tag{4.23}$$

Figure 4.2: (i) Logistic and tent maps; (ii) conjugating function; (iii) invariant densities of both maps.

Now introduce an N × N transition matrix, W, which gives the probabilities of iterates moving from any subinterval to any other subinterval in the partition. Ulam hypothesized that the FPO could now be approximated by the following matrix equation:

$$P_{n+1} = W P_n \tag{4.24}$$

It is clear that as N → ∞, equation 4.24 gives a better and better approximation of the FPO. Ulam’s hypothesis was later proved by Li [Li, 1976]. It is remarkable that the statistical properties of chaotic systems can be represented by such a simple linear equation, allowing us to bring many of the results of positive matrix theory to bear on the problem.
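A minimal numerical illustration of Ulam's idea (ours, in Python): estimate W for the logistic map by counting bin-to-bin transitions along a long orbit (a common empirical variant of Ulam's partition construction), then iterate Equation 4.24 to approximate the density of Equation 4.1.

```python
import numpy as np

N = 100                                    # number of subintervals I_1..I_N
W = np.zeros((N, N))
x = 0.1234
for _ in range(200000):
    x_new = 4.0 * x * (1.0 - x)            # logistic map, r = 4
    i = min(int(N * x), N - 1)             # bin containing x
    j = min(int(N * x_new), N - 1)         # bin containing f(x)
    W[j, i] += 1.0                         # count the I_i -> I_j transition
    x = x_new
W /= W.sum(axis=0, keepdims=True)          # make columns sum to 1

p = np.full(N, 1.0 / N)                    # arbitrary initial density
for _ in range(500):
    p = W @ p                              # P_{n+1} = W P_n  (Equation 4.24)
# N * p now approximates rho(x) = 1/(pi*sqrt(x*(1-x))) of Equation 4.1
```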

4.2.4 Other work on the IFPP and applications

There has been a surprisingly large amount of work done on controlling the statistical properties of 1-D maps. Apart from the seminal work of Grossmann and Thomae cited earlier, we mention the paper of Baranovsky and Daems [Baranovsky and Daems, 1995], in which piecewise linear Markov maps are used as references whose statistical properties are known. These maps are then transformed into non-Markov maps and smooth maps, using conjugating functions. They also consider the problem of designing maps with prescribed correlation functions. Pingel, Schmelcher and Diakonos [Pingel et al., 1999] manage to solve the Frobenius-Perron equation exactly for a class of unimodal chaotic maps, whose invariant densities are members of a class of beta distributions. By varying parameters in the maps which control symmetry and pointedness, they can obtain a variety of different invariant densities. Interestingly, the logistic map is a member of their class of maps. This group subsequently developed a Monte Carlo approach, based on their class of parameterized maps, for generating maps with desired invariant densities and correlation functions [Diakonos et al., 1999].

In a series of papers, the group led by Setti have studied the Markov approach to the IFPP with a view to applying it to signal processing tasks (see especially [Setti et al., 2002] and the copious references therein). They consider a variety of piecewise linear maps, including n-way Bernoulli shifts, and develop a matrix-tensor formulation for quantifying high-order correlations of such maps. In [Setti et al., 2002], they also discuss applying chaos to help with EMC (electromagnetic compatibility) issues, and they discuss the use of chaos in spread-spectrum communication schemes.

Another active area of research is the use of chaotic maps to model packet traffic in computer networks. Packet traffic is notoriously bursty, and chaotic maps are ideal for modelling the fractal properties of the traffic. Mondragon neatly summarizes the previous work in this area in [Mondragon, 1999], and introduces some different types of intermittency maps, along with a discussion of the statistical properties of these maps.

4.3 A useful Matrix from TCP congestion control

The internet is only able to function the way it does because certain protocols have been widely accepted, allowing computers to talk to each other. One of the most important of these protocols is the Transmission Control Protocol (TCP), which defines how packets of information should be sent between computers. Ideally, one would like to send information as fast as possible, with no errors or lost packets. In reality, bandwidth is limited, there are delays in propagation due to routers and physical limitations, and sometimes packets are lost. In addition, there is a whole host of other computers all competing for the same bandwidth. The purpose of TCP is to impose some order on this struggle for bandwidth, so that resources are allocated in a well-defined manner. Clearly, the performance of the internet affects everyone nowadays, and so a rigorous analysis of TCP-type protocols could have far-reaching benefits.

Very fortuitously, a recent analysis of TCP in synchronised communication networks [Shorten et al., 2003], [Berman et al., 2004] gave rise to a positive matrix with special properties that allows us to solve the IFPP in a novel and elegant way. We outline that analysis here. More details on TCP can be found in any standard text on computer networks, such as the book by Tanenbaum [Tanenbaum, 2002]. A network in this context consists of data sources and sinks, connected together with links and routers. In their paper, Shorten and co-workers assume that there are n sources competing for some finite bandwidth, and that all these sources are operating the TCP congestion control algorithm in the presence of a drop-tail buffer bottleneck. (A buffer stores a certain finite number of packets; when the buffer is full, any new packets will drop off the end of the buffer, hence the name.) Further, they assume that when the network is congested, each source experiences a packet drop, and that each source has the same round-trip time (RTT). TCP is specifically designed to cope with unreliable networks, and includes in its definition a variable quantity called the congestion window (cwnd), which reflects the number of bytes each sender may transmit. Each source also starts a timer each time a (data) segment is sent.

Figure 4.3: Evolution of the congestion window.

If an acknowledgment is received before the timer elapses, the source will increase the size of its congestion window. In other words, a source will gradually increase the amount of data it sends, if it seems that the network can bear it. However, if a segment goes unacknowledged, then the source will rapidly reduce the amount of data sent, because it assumes the network is congested (the buffer is full). This is known as an Additive-Increase Multiplicative-Decrease (AIMD) algorithm.

Following the notation in [Shorten et al., 2005], we let wi(k) be the congestion window size of source i just before the k-th congestion event. (A congestion event is a point in time at which packets are dropped from the buffer because it is full.) Define αi as the additive-increase amount for source i in the event of a successful acknowledgment, and define βi as the multiplicative-decrease parameter in the event of congestion. Also define the event times ta(k), tb(k) and tc(k) during each congestion epoch as, respectively, the time at which the number of unacknowledged packets equals βi wi(k); the time at which the pipe is full; and the time at which packet drops are detected by the sources.

The typical evolution of a congestion window is shown schematically in Figure 4.3. From the diagram, using simple geometry, we find that

$$w_i(k+1) = \beta_i w_i(k) + \alpha_i\,[t_c(k) - t_a(k)] \tag{4.25}$$

If the maximum possible number of packets in transit in the network is denoted P, then at a congestion event we will have

$$\sum_{i=1}^{n} w_i(k) = P + \sum_{i=1}^{n}\alpha_i = \sum_{i=1}^{n} w_i(k+1) \tag{4.26}$$

Rearranging equation 4.25, summing over the i's, and using equation 4.26, we can show that

$$t_c(k) - t_a(k) = \frac{\sum_{i=1}^{n}(1-\beta_i)\,w_i(k)}{\sum_{i=1}^{n}\alpha_i} \tag{4.27}$$

Substituting equation 4.27 back into equation 4.25 gives our final result, a dynamical update equation in wi:

$$w_i(k+1) = \beta_i w_i(k) + \frac{\alpha_i}{\sum_{j=1}^{n}\alpha_j}\sum_{j=1}^{n}(1-\beta_j)\,w_j(k) \tag{4.28}$$

The dynamics of the network as described by equation 4.28 can be written in matrix form:

$$W(k+1) = A\,W(k) \tag{4.29}$$

where W(k) is just the vector of the wi(k). The matrix A has the form:

$$A = \begin{pmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{pmatrix} + \frac{1}{\sum_{i=1}^{n}\alpha_i}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}\begin{pmatrix} 1-\beta_1 & 1-\beta_2 & \cdots & 1-\beta_n \end{pmatrix} \tag{4.30}$$

The matrix A has many interesting properties, which we outline here:

1. Matrix A is column stochastic (which means each column sums to 1).

2. The matrix is a positive matrix (all entries are positive real numbers).

3. The matrix has a single dominant eigenvalue of value 1.

4. There is a single eigenvector of A in the positive orthant, called the Perron eigenvector, corresponding to the dominant eigenvalue, whose value is given by:

$$x_P^T = \left[\frac{\alpha_1}{1-\beta_1},\; \frac{\alpha_2}{1-\beta_2},\; \dots,\; \frac{\alpha_n}{1-\beta_n}\right] \tag{4.31}$$

5. If the eigenvalues (λi) and the βi are arranged in decreasing order, then the following interlacing scheme holds:

$$1 = \lambda_1 > \beta_1 \ge \lambda_2 \ge \beta_2 \ge \dots \ge \lambda_n \ge \beta_n \tag{4.32}$$

Luenberger [Luenberger, 1979] gives a nice account of Markov operators, positive stochastic matrices and the theorem of Frobenius-Perron. The interlacing result and the form of the Perron eigenvector are given in the paper by Wirth [Wirth et al., 2005], and are based on standard results on the symmetric eigenvalue problem (see for example [Golub and van Loan, 1996], or [Horn and Johnson, 1985]).

4.4 Synthesizing 1-D maps with arbitrary piecewise-constant invariant densities

Synthesis of something implies that the procedure is mechanical and repetitive, ideal for computer implementation. Here we introduce our synthesis procedure for 1-D chaotic maps. We are trying to create maps for which we can specify in advance where the chaotic trajectory will be concentrated in the state-space. The invariant densities that our family of maps will have are piecewise-constant, but this is not a limitation, as we can approximate a continuous function as closely as desired by using a large enough matrix.

The key to the synthesis method lies in the properties of the transition matrix (the A matrix). Firstly, the A matrix is positive and column stochastic, and can therefore represent a Markov process. A Markov matrix describes the transition probabilities from some state to another state. We will be using the A matrix to describe the transition probabilities between intervals in a partition. At the heart of our synthesis approach is the Perron eigenvector of A: that it is parameterized in terms of the αi and βi unlocks the Inverse Frobenius-Perron Problem. Ulam's conjecture was that the principal eigenvector of a Markov process is the invariant density of that process, and that transformations on the interval could be approximated using this matrix approach. We have a way of choosing our invariant density first, and automatically determining the Markov process that gave rise to it. The Markov process can then easily be turned into a 1-D map.

4.4.1 Synthesis Procedure

Suppose that the desired invariant density (Perron eigenvector) xd is:

$$x_d^T = [\delta_1, \delta_2, \dots, \delta_n] \tag{4.33}$$

Choose the βi subject to the constraint 0 < βi < 1. (The βi control how rapidly the map converges on the invariant density.) Often, we find it convenient to keep all the βi equal. Having chosen the βi, determine the αi as follows:

$$\alpha_i = \delta_i(1-\beta_i), \qquad i = 1, \dots, n$$

Now form the matrix A from the αi and βi:

$$A = \begin{pmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{pmatrix} + \frac{1}{\sum_{i=1}^{n}\alpha_i}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}\begin{pmatrix} 1-\beta_1 & 1-\beta_2 & \cdots & 1-\beta_n \end{pmatrix}$$

Next, we let the A matrix represent a 1-D map from the unit interval to itself. We partition the unit interval into n equal subintervals, {I1, . . . , In} (assuming A is an n × n matrix). Note that the partition can also be nonuniform (to be illustrated later). Let entry aji of A denote the probability of a transition from subinterval Ii to Ij. (The order of the subscripts is important here.) To construct the map, place a line segment of slope ±1/aji in the square defined by the subintervals Ii, Ij, as illustrated in Figure 4.4. By controlling the slope of the line segment, we can control how much of the overall subinterval will interact with that portion of the map, which in turn relates to the transition probabilities. The most straightforward way of constructing the map is to start at the origin, and add the line segments end to end. In the figure, it can be seen how the probability of a transition from I1 to Ij is 0.5, corresponding to a line of slope 2 in that region. In other words, 50% of the points in I1 are mapped to Ij. The slope could have been −2, and it could have been positioned anywhere within the square defined by those subintervals. This would affect the actual trajectory of the chaotic iterates, but it would not change the invariant density. A positive slope maps the extremities of an interval to the corresponding extremity (e.g. inf(Ii) → inf(Ij)), while a negative slope maps each extremity to its opposite (e.g. inf(Ii) → sup(Ij)).
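The whole procedure is mechanical, and the short sketch below (ours, in Python rather than the MATLAB used for the thesis figures) builds A from a desired density, constructs the corresponding piecewise-linear map with all slopes taken positive and segments laid end to end, and checks the resulting density empirically.

```python
import numpy as np

def build_A(delta, beta):
    """Form A = diag(beta) + (1/sum(alpha)) * alpha * (1-beta)^T,
    where alpha_i = delta_i * (1 - beta_i) (Section 4.4.1)."""
    delta, beta = np.asarray(delta, float), np.asarray(beta, float)
    alpha = delta * (1 - beta)
    return np.diag(beta) + np.outer(alpha, 1 - beta) / alpha.sum()

def make_map(A):
    """Piecewise-linear map on a uniform partition: within column i,
    a segment occupying a fraction a_ji of I_i maps onto the whole of
    I_j with slope 1/a_ji (all slopes positive in this sketch)."""
    n = A.shape[0]
    def f(x):
        i = min(int(n * x), n - 1)
        t = n * x - i                      # position within I_i, in [0,1)
        cum = 0.0
        for j in range(n):
            if t < cum + A[j, i] or j == n - 1:
                return (j + (t - cum) / A[j, i]) / n   # land inside I_j
            cum += A[j, i]
    return f

# Example (a) below: desired density proportional to [1, 2, 3], beta_i = 0.1
A = build_A([1, 2, 3], [0.1, 0.1, 0.1])
f = make_map(A)
x, counts = 0.1234, np.zeros(3)
for _ in range(60000):
    x = f(x)
    counts[min(int(3 * x), 2)] += 1
print(A)                        # matches Equation 4.34
print(counts / counts.sum())    # ≈ [1/6, 2/6, 3/6], i.e. proportional to [1, 2, 3]
```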

Figure 4.4: Illustration of the construction of a 1-D map from a Markov matrix.

4.4.2 Examples

We now give a mixture of examples of the synthesis procedure in action, using simple MATLAB code. The values of the βi are the same in each example.

(a) Invariant density xd = [1, 2, 3]. We let all the βi = 0.1 for convenience. The values of αi are computed to be [0.9, 1.8, 2.7]. The transition matrix A is then found to be:

$$A = \begin{pmatrix} 0.25 & 0.15 & 0.15 \\ 0.3 & 0.4 & 0.3 \\ 0.45 & 0.45 & 0.55 \end{pmatrix} \tag{4.34}$$

A is clearly column stochastic, and has eigenvalues of [1, 0.1, 0.1], which we could have deduced from the interlacing property mentioned earlier. Figure 4.5 shows the one-dimensional map corresponding to matrix A, constructed in the manner outlined above. Figure 4.6 is the invariant density of the map after 20000 iterations. The y-axis has been scaled to allow ready comparison with xd. A typical chaotic time-series from the map is shown in Figure 4.7.

(b) The invariant density of the synthesized map has the shape of a sine-wave, but with an offset so that the values are all positive (see Figure 4.9). The unit-interval is partitioned into 13 subintervals, giving a 13 × 13 transition matrix.

(c) In this example, we illustrate how a synthesized map could generate random numbers with useful statistics. The invariant density of the synthesized map has the shape of a normal (Gaussian) distribution (see Figure 4.11). The transition matrix used in this example is 121 × 121. The points of the time-series are shown in Figure 4.12, with the maximum density clearly in the mid-point of the unit-interval.
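The eigenstructure of the concrete matrix in Equation 4.34 is easy to confirm numerically (a sketch of ours):

```python
import numpy as np

A = np.array([[0.25, 0.15, 0.15],
              [0.30, 0.40, 0.30],
              [0.45, 0.45, 0.55]])
vals, vecs = np.linalg.eig(A)
print(np.sort(vals.real))              # [0.1, 0.1, 1.0]: interlacing as claimed
v = vecs[:, np.argmax(vals.real)].real
print(v / v.sum())                     # [1/6, 1/3, 1/2], proportional to [1, 2, 3]
```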


Figure 4.5: One-dimensional chaotic map with partition on unit-interval shown


Figure 4.6: Invariant density of map in Figure 4.5

Figure 4.7: Time-series of chaotic map in Figure 4.5


Figure 4.8: One-dimensional chaotic map with sinusoidal invariant density

Figure 4.9: Invariant density of map in Figure 4.8

Figure 4.10: Time-series of chaotic map in Figure 4.8


Figure 4.11: Gaussian-shaped invariant density generated from a 121 × 121 transition matrix


Figure 4.12: Time-series of chaotic map from example (c), corresponding to invariant density in Figure 4.11

Figure 4.13: Illustration of a vector evolving towards the Perron eigenvector.

4.4.3 The role of the β's

In the examples above, the values of the βi parameters were just taken to be some small value, without any further explanation. If matrix A represents a linear system xn+1 = Axn, then the βi control how quickly the map converges to the equilibrium state upon iteration. Say we order the eigenvalues of the A matrix, [1, λ2, λ3, . . . , λn], and let the corresponding normalized eigenvectors be [V1, V2, . . . , Vn]. Any initial condition x0 can be written in terms of the eigenvectors:

$$x_0 = \gamma_1 V_1 + \gamma_2 V_2 + \dots + \gamma_n V_n \tag{4.35}$$

Thus after k iterations we get

$$A^k x_0 = \gamma_1 V_1 + \gamma_2 \lambda_2^k V_2 + \dots + \gamma_n \lambda_n^k V_n \tag{4.36}$$

Since the eigenvalues are interlaced with the βi, and all of the βi < 1, it is clear from Equation 4.36 that all of the terms except the first will rapidly decay to zero. The equilibrium state is thus the Perron eigenvector, as expected. The second largest eigenvalue imposes an upper bound on how fast the system approaches the equilibrium state (see Figure 4.13).

The βi play a different role when matrix A represents a map. Before we start iterating the map, the density is zero, and so the initial condition is at the origin. As the map is iterated, the density evolves along the Perron eigenvector. The invariant density is a statistical concept: we do not expect it to exactly equal the desired value at all times. If iterates spend long times in each subinterval before being mapped to another subinterval, then the invariant density can diverge markedly from the desired value at times. We find that it is the values of the βi that determine the length of time iterates spend in each subinterval. We now explore the mechanism giving rise to this.

1. Large values of β. If we assume that all of the βi are equal and close to 1, then we find that the diagonal values of the A matrix are approximately equal to the βi, and the off-diagonal entries are all very small, due to the 1 − βi factors. In terms of transitions between subintervals, the probability of an iterate staying in the same subinterval is very high. Iterates are mapped to other subintervals very infrequently. Looked at dynamically, an n × n matrix will have n slightly unstable fixed points, or repellors (see Figure 4.15 (a)). Iterates near these fixed points move away very slowly. This is also evident in the state-space plot (see Figure 4.15 (b)).

2. Small values of β. If the βi are small, then the transition matrix is dominated by the αi terms. The precise values of the matrix entries are of course dependent on the desired invariant density, but it would be unusual to encounter a situation where iterates spent most of their time in one subinterval. The values of the slopes in the 1-D map tend to be large, and so the repelling fixed points are usually very unstable.

Figure 4.14: Detailed plot of a 1-D map with large β values: trajectories become trapped in subintervals and transitions occur infrequently.

3. Various values of β. It is possible to engineer a situation where some subintervals contain weakly repelling fixed points, and others have strongly repelling fixed points, just by choosing a large or small value of β for that subinterval.

While the 1-D maps obtained using large values of β still give rise to chaotic trajectories, these trajectories have long laminar regions, reminiscent of intermittency (Figure 4.14). Two initially close points may remain close for a long time.

Figure 4.15: (a), (b) 1-D map and state-space plot with βi = 0.9 (weak repellors); (c), (d) with βi = 0.1 (strong repellors).

4.4.4 Lyapunov Exponents

The Lyapunov exponent of a 1-D map gives the average rate of divergence of trajectories over the attractor. It is particularly straightforward to calculate for piecewise linear maps, because it is related to the slopes of the map segments. For a 1-D map the Lyapunov exponent is defined as follows, using the chain rule in the second equality:

$$\lambda = \lim_{N\to\infty}\frac{1}{N}\ln\left|\frac{df^N(x_0)}{dx_0}\right| = \lim_{N\to\infty}\frac{1}{N}\ln\left|\prod_{i=0}^{N-1} f'(x_i)\right| = \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1}\ln|f'(x_n)| \tag{4.37}$$

Essentially, it is the sum of the logs of the slopes, averaged over the attractor. There are methods for computing the exponent numerically, but in this case we know the slopes of the map and the density of points in each subinterval, so it is possible to derive an analytic expression for the exponent. For the 2 × 2 case, if we assume β1 = β2, then the invariant density is ρ = {α1, α2}, with α1 + α2 = 1. If we denote the entries of the transition matrix aij, then the Lyapunov exponent can be written as:

$$\lambda = \alpha_1 a_{11}\ln\frac{1}{a_{11}} + \alpha_1 a_{21}\ln\frac{1}{a_{21}} + \alpha_2 a_{12}\ln\frac{1}{a_{12}} + \alpha_2 a_{22}\ln\frac{1}{a_{22}} \tag{4.38}$$

This is a direct application of equation 4.37, except that the density of points in each subinterval is known a priori (given by the αi aij terms), so we do not need to take an average over the attractor. Substituting in the values of the aij (here $a_{11} = \beta_1 + \alpha_1(1-\beta_1)$, $a_{21} = \alpha_2(1-\beta_1)$, $a_{12} = \alpha_1(1-\beta_1)$, $a_{22} = \beta_1 + \alpha_2(1-\beta_1)$), we get:

$$\begin{aligned} \lambda = {} & \alpha_1\{\beta_1 + \alpha_1(1-\beta_1)\}\,\ln\frac{1}{\beta_1 + \alpha_1(1-\beta_1)} + \alpha_1\alpha_2(1-\beta_1)\,\ln\frac{1}{\alpha_2(1-\beta_1)} \\ & + \alpha_2\alpha_1(1-\beta_1)\,\ln\frac{1}{\alpha_1(1-\beta_1)} + \alpha_2\{\beta_1 + \alpha_2(1-\beta_1)\}\,\ln\frac{1}{\beta_1 + \alpha_2(1-\beta_1)} \end{aligned} \tag{4.39}$$

If we let β1 → 0 (a reasonable assumption), then we get a rather nice-looking expression for λ:

$$\lambda = \alpha_1^2\ln\frac{1}{\alpha_1} + \alpha_2^2\ln\frac{1}{\alpha_2} + \alpha_1\alpha_2\ln\frac{1}{\alpha_1\alpha_2} \tag{4.40}$$

For a 3 × 3 matrix, the corresponding expression is:

$$\lambda = \alpha_1^2\ln\frac{1}{\alpha_1} + \alpha_2^2\ln\frac{1}{\alpha_2} + \alpha_3^2\ln\frac{1}{\alpha_3} + \alpha_1\alpha_2\ln\frac{1}{\alpha_1\alpha_2} + \alpha_2\alpha_3\ln\frac{1}{\alpha_2\alpha_3} + \alpha_3\alpha_1\ln\frac{1}{\alpha_3\alpha_1} \tag{4.41}$$

Generalising to an n × n matrix, the Lyapunov exponent would be given by:

$$\lambda = \sum_{i=1}^{n}\alpha_i^2\ln\frac{1}{\alpha_i} + \sum_{i<j}\alpha_i\alpha_j\ln\frac{1}{\alpha_i\alpha_j} \tag{4.42}$$

where the second sum runs over unordered pairs, as in equation 4.41.

Figure 4.16 shows how the Lyapunov exponent varies with β for a 2 × 2 map with α1 = α2 = 0.5. As expected, the maximum value of λ occurs as β → 0, and equals ln 2.
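Equation 4.42 is easy to cross-check against a direct orbit average (a sketch of ours, reusing build_A and make_map from the synthesis sketch in Section 4.4.1; note that when Σαi = 1, Equation 4.42 collapses to Σ αi ln(1/αi)):

```python
import numpy as np
# assumes build_A and make_map from the synthesis sketch in Section 4.4.1

alpha = np.array([1/6, 1/3, 1/2])                        # normalised density
lam_analytic = float(np.sum(alpha * np.log(1 / alpha)))  # (4.42), simplified

A = build_A(alpha, [1e-6] * 3)                           # beta -> 0
f = make_map(A)
x, acc, n_iter = 0.1234, 0.0, 100000
for _ in range(n_iter):
    i = min(int(3 * x), 2)
    x = f(x)
    j = min(int(3 * x), 2)
    acc += np.log(1.0 / A[j, i])        # ln|f'| on the segment just used
print(lam_analytic, acc / n_iter)       # both ≈ 1.01
```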

4.4.5 Nonuniform Partitions

The method may be easily modified to generate maps on nonuniform Markov partitions of the unit interval. Indeed, the synthesis method remains unchanged, as the transition matrix concerns only transitions between states. The states may be defined arbitrarily, and for one-dimensional maps, we may define the partition arbitrarily. The only difference is in the construction of the one-dimensional map, as illustrated in Figure 4.17.

Figure 4.16: Variation in Lyapunov exponent with β, for α1 = α2 = 0.5.

The unit interval is partitioned into subintervals of width w1, w2, . . . , wn. Each of these subintervals wi is then further partitioned according to the entries of column i of the transition matrix A. The column entries determine the widths of the piecewise linear elements of the map. For a transition from interval i to interval j, the slope of that element of the map, mji, is given by:

$$m_{ji} = \pm\frac{1}{a_{ji}}\cdot\frac{w_j}{w_i} \tag{4.43}$$

This reduces to $m_{ji} = \pm a_{ji}^{-1}$ when the partition is uniform, as we saw previously.

In Figures 4.18 and 4.19, we show an example of the synthesis of a one-dimensional map with ρ = [1, 2, 3, 2, 1] and with interval widths wi = [0.4, 0.2, 0.1, 0.2, 0.1]. The only noteworthy effect of the nonuniform partition is that the map may no longer be everywhere expanding.
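Only the segment geometry changes for a nonuniform partition (Equation 4.43); a hedged sketch, again reusing build_A from the earlier synthesis sketch. The Perron eigenvector of A still gives the stationary probability of occupying each interval; with unequal widths, the time-average density within interval i is that probability divided by wi.

```python
import numpy as np
# assumes build_A from the synthesis sketch in Section 4.4.1

def make_map_nonuniform(A, w):
    """Piecewise-linear map on a partition with widths w (summing to 1).
    Within interval i, a fraction a_ji of its width maps onto the whole
    of interval j, giving slope (w_j / w_i) / a_ji (Equation 4.43)."""
    edges = np.concatenate(([0.0], np.cumsum(w)))
    n = A.shape[0]
    def f(x):
        i = min(max(np.searchsorted(edges, x, side='right') - 1, 0), n - 1)
        t = (x - edges[i]) / w[i]          # position within interval i
        cum = 0.0
        for j in range(n):
            if t < cum + A[j, i] or j == n - 1:
                return edges[j] + w[j] * (t - cum) / A[j, i]
            cum += A[j, i]
    return f

w = [0.4, 0.2, 0.1, 0.2, 0.1]
A = build_A([1, 2, 3, 2, 1], [0.1] * 5)
f = make_map_nonuniform(A, w)
```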

Figure 4.17: Construction of a 1-D map with a nonuniform partition.

Figure 4.18: Synthesized non-uniform map.

Figure 4.19: Time-series of map, with different interval densities visible.

4.5 Results on switching between maps

An interesting extension of the above work is to consider what happens when we switch randomly between some set of 1-D maps at each iteration of the process. This question has been considered by Boyarsky and Góra, who use such a process to model the famous two-slit experiment in quantum physics [Boyarsky and Góra, 1992], so it is of physical interest. It is also interesting that a whole series of invariant densities may be produced by switching between a set of fixed maps, acting almost like basis functions.

Theorem 4.1 Let A(k) ∈ {A1, A2}, and let pk = p(A(k)) be the probability of A(k) being chosen from the set. Assume the values of the βi to be the same for both maps. Let ρ1 and ρ2 be the invariant densities of the two maps. If we choose either A1 or A2 randomly (i.i.d.) at each step of an iterative process, with fixed probabilities p1 and p2 respectively, then the invariant density of the resultant orbit, ρ, is given by ρ = p1ρ1 + p2ρ2.

Proof:

$$A_1 = \begin{pmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{pmatrix} + \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}\begin{pmatrix} 1-\beta_1 & 1-\beta_2 & \cdots & 1-\beta_n \end{pmatrix}$$

$$A_2 = \begin{pmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{pmatrix} + \begin{pmatrix} \hat\alpha_1 \\ \hat\alpha_2 \\ \vdots \\ \hat\alpha_n \end{pmatrix}\begin{pmatrix} 1-\beta_1 & 1-\beta_2 & \cdots & 1-\beta_n \end{pmatrix}$$

(here the αi and α̂i are taken to be normalised, so that Σαi = Σα̂i = 1 and the leading 1/Σαi factor of equation 4.30 is unity). It is well known that the expected value of the transition matrix which results when switching randomly between two matrices is given by

$$A' = E(\Pi_k) = p_1 A_1 + p_2 A_2$$

4.5 Results on switching between maps

By simple substitution, we can show that

$$A' = \begin{pmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{pmatrix} + \begin{pmatrix} p_1\alpha_1 + p_2\hat\alpha_1 \\ p_1\alpha_2 + p_2\hat\alpha_2 \\ \vdots \\ p_1\alpha_n + p_2\hat\alpha_n \end{pmatrix}\begin{pmatrix} 1-\beta_1 & 1-\beta_2 & \cdots & 1-\beta_n \end{pmatrix}$$

A′ thus has a Perron eigenvector of

$$x_P^T = \left[\frac{p_1\alpha_1 + p_2\hat\alpha_1}{1-\beta_1},\; \frac{p_1\alpha_2 + p_2\hat\alpha_2}{1-\beta_2},\; \dots,\; \frac{p_1\alpha_n + p_2\hat\alpha_n}{1-\beta_n}\right] \tag{4.44}$$

□
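Theorem 4.1 is easy to test empirically; the sketch below (ours, reusing build_A and make_map from Section 4.4.1) switches i.i.d. between two synthesized maps with p1 = p2 = 0.5 and histograms the orbit.

```python
import numpy as np
# assumes build_A and make_map from the synthesis sketch in Section 4.4.1

f1 = make_map(build_A([0.6, 0.3, 0.1], [0.05] * 3))   # invariant density rho_1
f2 = make_map(build_A([0.1, 0.3, 0.6], [0.05] * 3))   # invariant density rho_2

x, counts = 0.1234, np.zeros(3)
for _ in range(200000):
    x = f1(x) if np.random.rand() < 0.5 else f2(x)    # i.i.d. switching
    counts[min(int(3 * x), 2)] += 1
print(counts / counts.sum())   # ≈ 0.5*rho_1 + 0.5*rho_2 = [0.35, 0.3, 0.35]
```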

Theorem 4.1 shows that when the values of the β_i are the same in each matrix, the overall invariant density is just a weighted sum of the invariant densities of the original maps. This result also holds when switching between any number of maps. We now look at a more general case where the values of the β_i are not necessarily the same in each matrix.

Theorem 4.2 Let A(k) ∈ {A_1, A_2, ..., A_m} and let p_k = p(A(k)) be the probability of A(k) being chosen from the set. Let Π_k = A(k)A(k−1)···A(1). If each A(k) represents a chaotic map, then the expected invariant density obtained by switching randomly between the A(k) is given by the Perron eigenvector of the matrix B = p_1 A_1 + p_2 A_2 + ... + p_m A_m.

Proof: Since the matrices are chosen i.i.d., the expected value is E(Π_k) = (p_1 A_1 + p_2 A_2 + ... + p_m A_m)^k = B^k for any k. Given a stochastic matrix P, there exists a unique probability vector p̂ > 0 such that P p̂ = p̂. Let x_0 be some initial condition. We then have that the eigenvector

    p̂ = \lim_{k \to \infty} P^k x_0    (4.45)

If P represents a map, then p̂ is the invariant density of that map (Ulam's conjecture). As long as B is a stochastic matrix (which is easy to show), the result follows.
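The matrix-level content of Theorems 4.1 and 4.2 is easy to check numerically. The following sketch is ours rather than part of the original development; it assumes NumPy, a uniform value of β across subintervals, and the helper names tcp_matrix and perron_density are our own. It builds two TCP-type matrices, forms B = p_1 A_1 + p_2 A_2, and compares the Perron eigenvector of B with the weighted sum of the individual invariant densities.

    import numpy as np

    def tcp_matrix(alpha, beta):
        # Column-stochastic TCP-type matrix: diag(beta) + alpha (1 - beta)^T.
        alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
        return np.diag(beta) + np.outer(alpha, 1.0 - beta)

    def perron_density(A):
        # Perron eigenvector (eigenvalue 1), normalized to sum to one.
        w, v = np.linalg.eig(A)
        x = np.abs(np.real(v[:, np.argmax(np.real(w))]))
        return x / x.sum()

    beta = np.array([0.2, 0.2, 0.2])           # same beta_i for both maps
    A1 = tcp_matrix([0.1, 0.3, 0.6], beta)     # invariant density rho_1
    A2 = tcp_matrix([0.5, 0.3, 0.2], beta)     # invariant density rho_2
    p1, p2 = 0.3, 0.7

    rho1, rho2 = perron_density(A1), perron_density(A2)
    print(perron_density(p1 * A1 + p2 * A2))   # Perron eigenvector of B
    print(p1 * rho1 + p2 * rho2)               # the same, per Theorem 4.1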


□

Unfortunately, Theorem 4.2 does not give us a closed-form expression for the expected invariant density. It is possible, however, to derive explicit expressions when the number and dimension of the matrices are small.

1. Switching between two 2 × 2 matrices

As before, we let the probability of choosing A_1 be p_1 and that of choosing A_2 be p_2, where p_1 + p_2 = 1. The values of the β_i are now allowed to differ between the two matrices. The expected value of the overall transition matrix resulting from the switching is:

    B = p_1 A_1 + p_2 A_2
      = p_1 \left[ \begin{pmatrix} \beta_1 & 0 \\ 0 & \beta_2 \end{pmatrix} + \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1-\beta_1 & 1-\beta_2 \end{pmatrix} \right] + p_2 \left[ \begin{pmatrix} \bar{\beta}_1 & 0 \\ 0 & \bar{\beta}_2 \end{pmatrix} + \begin{pmatrix} \bar{\alpha}_1 \\ \bar{\alpha}_2 \end{pmatrix} \begin{pmatrix} 1-\bar{\beta}_1 & 1-\bar{\beta}_2 \end{pmatrix} \right]

To find the Perron eigenvector (x, y)^T, we solve the following matrix equation, where the terms a, ..., d represent the complicated expressions obtained when the above equation is multiplied out:

    \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}    (4.46)

Solving Equation 4.46, arbitrarily setting y = 1, we find that the Perron eigenvector has the following form:

    \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \dfrac{p_1\alpha_1(1-\beta_2) + p_2\bar{\alpha}_1(1-\bar{\beta}_2)}{p_1\alpha_2(1-\beta_1) + p_2\bar{\alpha}_2(1-\bar{\beta}_1)} \\ 1 \end{pmatrix}    (4.47)

2. Switching between three 2 × 2 matrices

If we have three matrices A_1, A_2, A_3 with associated probabilities p_1, p_2, and p_3, it is straightforward to show that the Perron eigenvector is as follows, again with y = 1:

    \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \dfrac{p_1\alpha_1(1-\beta_2) + p_2\hat{\alpha}_1(1-\hat{\beta}_2) + p_3\bar{\alpha}_1(1-\bar{\beta}_2)}{p_1\alpha_2(1-\beta_1) + p_2\hat{\alpha}_2(1-\hat{\beta}_1) + p_3\bar{\alpha}_2(1-\bar{\beta}_1)} \\ 1 \end{pmatrix}    (4.48)

Comparing this with Equation 4.47, a pattern emerges, allowing us to write down a general expression for switching between N different 2 × 2 matrices:

    \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \dfrac{\sum_{i=1}^{N} p_i\,\alpha_1^{(i)}\bigl(1-\beta_2^{(i)}\bigr)}{\sum_{i=1}^{N} p_i\,\alpha_2^{(i)}\bigl(1-\beta_1^{(i)}\bigr)} \\ 1 \end{pmatrix}    (4.49)

3. Switching between two 3 × 3 matrices

As the matrices get bigger, the calculations become more laborious. For the 3 × 3 case, we find the Perron eigenvector (x, y, z)^T to have the following form:

    x = \dfrac{p_1^2\alpha_1(1-\beta)^2 + p_2^2\hat{\alpha}_1(1-\hat{\beta})^2 + p_1 p_2(1-\beta)(1-\hat{\beta})(\alpha_1 + \hat{\alpha}_1)}{p_1^2\alpha_3(1-\beta)^2 + p_2^2\hat{\alpha}_3(1-\hat{\beta})^2 + p_1 p_2(1-\beta)(1-\hat{\beta})(\alpha_3 + \hat{\alpha}_3)}

    y = \dfrac{p_1^2\alpha_2(1-\beta)^2 + p_2^2\hat{\alpha}_2(1-\hat{\beta})^2 + p_1 p_2(1-\beta)(1-\hat{\beta})(\alpha_2 + \hat{\alpha}_2)}{p_1^2\alpha_3(1-\beta)^2 + p_2^2\hat{\alpha}_3(1-\hat{\beta})^2 + p_1 p_2(1-\beta)(1-\hat{\beta})(\alpha_3 + \hat{\alpha}_3)}

    z = 1    (4.50)

Here, we have assumed z = 1 to fix the vector, and we have also assumed β_1 = β_2 = β_3 = β for matrix A_1, while for matrix A_2 all of the β values are equal to β̂. There is a pleasing symmetry to the expressions, although we have not been able to write them in a simpler form.

Another interesting case is the resulting invariant density when we switch periodically between two of the synthesized chaotic maps. It turns out that it does not matter in what order the maps are iterated: periodic switching and random switching lead to the same result. We will have cause to use this result in Chapter 5, and outline the proof here.

Theorem 4.3 Let A_1 and A_2 be two transition matrices with corresponding chaotic maps f_1 and f_2. Let the transition matrices be rank 1 matrices (or close to rank 1 matrices), i.e. β_i → 0. Let f_1 and f_2 possess invariant densities ρ_1 and ρ_2 respectively. Suppose we iterate, switching periodically between maps f_1 and f_2. If t_1 and t_2 are the fractions of time spent iterating maps f_1 and f_2 respectively, then the resultant invariant density is given by ρ = t_1 ρ_1 + t_2 ρ_2.

Outline Proof: We consider two different situations. First, iterate map f_1 N times, and then iterate map f_2 N times, so that the overall period is 2N. Starting from an initial condition x_0, we have:

    x_0 → f_1(x_0) → f_1^{(2)}(x_0) → ··· → f_1^{(N)}(x_0)

The N iterates will approach the invariant density ρ_1 asymptotically for large N, by definition. Now switch to map f_2 and iterate for N iterations:

    f_1^{(N)}(x_0) = x'_0 → f_2(x'_0) → f_2^{(2)}(x'_0) → ··· → f_2^{(N)}(x'_0)

The N iterates of f_2 will approach the invariant density ρ_2, but considering all 2N iterates together, the invariant density is ρ = (ρ_1 + ρ_2)/2. This is clearly true for large N, but what happens if N is small? Let N = 1, and consider the iterates of f_1 as supplying initial conditions for f_2 and vice-versa. An ensemble of initial conditions, when mapped under f_1, will have an invariant density ρ_1, and similarly f_2 will have an invariant density ρ_2. The initial conditions will not be uniformly distributed across the interval [0, 1], but will be piecewise constant across the subintervals of


the Markov partition. (It is a standard result that piecewise affine Markov maps have piecewise constant invariant densities.) In a rank 1 matrix all of the columns are equal, and thus the slopes in each subinterval are the same. It therefore makes no difference which subinterval the initial conditions are in, so long as they are uniformly distributed in that subinterval (see Figure 4.20). So, switching with N = 1 for, say, M periods will lead to the ensemble of M initial conditions from map f_1 being mapped under f_2, resulting in an invariant density ρ_2, and similarly the other M initial conditions will result in an invariant density ρ_1. Taking all 2M iterates together, the invariant density is, once again, ρ = (ρ_1 + ρ_2)/2. □

Clearly, this result holds for any type of periodic sequence between any number of maps, so long as the corresponding transition matrices are rank 1 (or close to rank 1). The result also implies that periodic switching is just a special case of random switching for these types of maps.

Figure 4.20: Piecewise constant distribution of initial conditions being applied to chaotic map derived from a rank 1 transition matrix

4.6 Comparison with other methods

As was mentioned earlier, there are several approaches to the IFPP described in the literature. They generally fall under three main headings:

1. Integration of the FPO

Examples of this approach can be seen in the work of Pingel and the work of Kohda ([Pingel et al., 1999], [Kohda, 2002]). Usually, the map and the density are assumed to have a certain form, or to belong to a class of functions. For certain cases, such as unimodal maps, this yields closed-form solutions to the Frobenius-Perron equation.

2. Conjugate-function approach

In this approach, which was suggested by Ulam [Ulam, 1960], one



tries to find a known invariant density (belonging to a known map) which can be conjugated (transformed via a simple function) to the desired invariant density. The required map can then be found via the conjugating function. This method is described in detail by Grossmann and Thomae [Grossmann and Thomae, 1977].

3. Matrix-based approach

These methods rely on the Ulam conjecture. Indeed, Bollt [Bollt, 2000] calls this approach to the IFPP the Inverse Ulam Problem (IUP).

The only approach which is directly comparable to ours is the method of Gora and Boyarsky. Their matrix method is outlined in their 1993 paper [Gora and Boyarsky, 1993] and more recently in their book [Boyarsky and Góra, 1997].


4.6.1 3-band matrix solution to the IFPP

In [Gora and Boyarsky, 1993], Gora and Boyarsky introduce a new class of piecewise-linear transformations called semi-Markov transformations, and a special matrix called a 3-band matrix. They go on to prove a number of theorems around these new structures, showing that, given some piecewise constant density on the intervals of a partition, it is always possible to find a semi-Markov transformation that leaves the density invariant. We will show how they generate the 3-band matrix, and then compare their method with our own using some examples.

Definition A semi-Markov piecewise-linear transformation, f, is a 3-band transformation if its transition matrix A = (a_ij) satisfies: for any 1 ≤ i ≤ N, a_ij = 0 if |i − j| > 1.

Theorem 4.4 (Gora and Boyarsky) Let f be a 3-band transformation with transition matrix A = (a_ij). Let ρ be any f-invariant density and ρ_i = ρ|_{R_i}, i = 1, ..., N. Then for 2 ≤ i ≤ N we have:

    a_{i,i−1} · ρ_i = a_{i−1,i} · ρ_{i−1}    (4.51)

A 3-band matrix is one in which all the entries are zero except for the main diagonal entries and the entries adjacent to the main diagonal on either side. In terms of transitions between intervals on a partition, points may only be mapped to adjacent intervals, or stay in the same interval. The transition matrix is not symmetric in general. Also, there exist infinitely many 3-band transformations which preserve a given density function. This is in contrast to our method where the transition matrix is unique, given the αi and βi . Equation 4.51 imposes a condition on the off-diagonal non-zero entries. Once this condition is satisfied, the rest of the entries may be chosen arbitrarily, ensuring that the matrix is column (or row) stochastic, of course.


Example: We will synthesize a chaotic map with the invariant density ρ = (5/16)(1, 8, 4, 2, 1), first using the 3-band approach, and then using our approach. Applying Equation 4.51, we get the following conditions:

    (40/16) a_{21} = (5/16) a_{12} ⟹ a_{12} = 8 a_{21}
    (20/16) a_{32} = (40/16) a_{23} ⟹ a_{32} = 2 a_{23}
    (10/16) a_{43} = (20/16) a_{34} ⟹ a_{43} = 2 a_{34}
    (5/16) a_{54} = (10/16) a_{45} ⟹ a_{54} = 2 a_{45}

Now we arbitrarily choose the entries as follows: a_{21} = 0.1 ⟹ a_{12} = 0.8; a_{32} = 0.4 ⟹ a_{23} = 0.2; a_{43} = 0.4 ⟹ a_{34} = 0.2; a_{54} = 0.8 ⟹ a_{45} = 0.4. The transition matrix now looks like (∗ marks the diagonal entries yet to be chosen):

    A = \begin{pmatrix} * & 0.8 & 0 & 0 & 0 \\ 0.1 & * & 0.2 & 0 & 0 \\ 0 & 0.4 & * & 0.2 & 0 \\ 0 & 0 & 0.4 & * & 0.4 \\ 0 & 0 & 0 & 0.8 & * \end{pmatrix}

Next, we fill in the gaps, ensuring the matrix is row stochastic, to get:

    A = \begin{pmatrix} 0.2 & 0.8 & 0 & 0 & 0 \\ 0.1 & 0.7 & 0.2 & 0 & 0 \\ 0 & 0.4 & 0.4 & 0.2 & 0 \\ 0 & 0 & 0.4 & 0.2 & 0.4 \\ 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}

Matrix A has the following eigenvalues: [1.0, 0.786, 0.31, 0.031, −0.42], and its dominant eigenvector corresponds to the desired invariant density. The chaotic map arising from A is shown in Figure 4.21, and a chaotic map generated using our procedure (using β_i = 0.1) is shown in Figure 4.22. It is extraordinary that such different chaotic maps give rise to the same invariant density. Indeed, there are an infinite number of possible maps, based both on our method and on that of Gora and Boyarsky, that would give the same density.
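As a quick numerical check of this worked example (a sketch of our own, assuming NumPy), the dominant left eigenvector of the row-stochastic matrix A above is indeed proportional to (1, 8, 4, 2, 1):

    import numpy as np

    A = np.array([[0.2, 0.8, 0.0, 0.0, 0.0],
                  [0.1, 0.7, 0.2, 0.0, 0.0],
                  [0.0, 0.4, 0.4, 0.2, 0.0],
                  [0.0, 0.0, 0.4, 0.2, 0.4],
                  [0.0, 0.0, 0.0, 0.8, 0.2]])

    # For a row-stochastic matrix the invariant density is the dominant
    # *left* eigenvector, i.e. rho A = rho, so we work with A transposed.
    w, v = np.linalg.eig(A.T)
    rho = np.abs(np.real(v[:, np.argmax(np.real(w))]))
    print(rho / rho.sum() * 16)    # approximately [1, 8, 4, 2, 1]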



Figure 4.21: Chaotic map synthesized using the matrix method of Gora and Boyarsky



Figure 4.22: Chaotic map with same invariant density as that in Figure 4.21

The main disadvantage of their method is that there are N − 1 independent parameters that must be chosen. In [Gora and Boyarsky, 1993], the authors do mention that additional criteria (such as Lyapunov exponents) need to be used to ensure the map is unique.

4.7 Conclusions

In this chapter, we have presented an original solution to the Inverse Frobenius-Perron problem based on positive matrix theory. The method has many clear advantages over the other methods for solving the IFPP in the literature. First of all, it is very straightforward: at all times, it is transparently obvious how the method works and what role the various parameters play. It is hard to overstate how important this is: engineers and other researchers are naturally drawn to simple and elegant solutions such as this one. Another advantage is that we have full control over the properties of the map: we can alter the mixing properties or the Lyapunov exponent simply by changing the values of the β_i.


Chapter 5

Synthesis of Higher-Dimensional Maps and Parameter-Space Structures, and Potential Applications

Man who says it cannot be done should not interrupt man doing it.
— Old Chinese proverb

5.1 Introduction

We have shown how to synthesize one-dimensional maps using the method based on positive matrix theory, and now wish to extend this method to two-dimensional maps, and possibly to N-dimensional maps. We describe two possible ways of doing this. The first method shown below generates a pseudo two-dimensional chaotic map from a one-dimensional map. A second method is given by Bollt [Bollt, 2000] and is based around affine functions. Both methods are easily extendable to higher dimensions. We also describe a method for generating arbitrary chaotic regions in the parameter space of a chaotic map, which could have potential applications in pattern classification. Additionally, we introduce some possible applications of chaotic maps. Some of the applications are based on the synthesized maps from Chapter 4. Our purpose is to be novel either in the use of chaos, or in the particular application to which we are applying the chaos. In this way, we hope to stimulate others into researching this innovative area.

5.2 Pseudo 2-D map from 1-D map

We start by partitioning the unit square into N² smaller squares, each of side 1/N. We then number the squares consecutively from 1 to N², arrange the squares in a line, and rescale them so that they are all contained in the unit interval. We form a 1-D vector of the desired densities, and use the synthesis method to form a transition matrix, as outlined elsewhere. The transition matrix is transformed into a 1-D chaotic map, producing some sequence of iterates p_n. We generate the 2-D map essentially by transforming this 1-D time series into our required form, mapping each iterate to a point in the unit square. The x_n iterates are produced from the p_n using a simple modulo operation. We generate the y_n iterates by performing a Bernoulli shift on each p_n iterate; this ensures that the sequence of y iterates is also chaotic. The transformations required are:

    x_n = N p_n \bmod 1    (5.1)

    y_n = \frac{k p_n - \lfloor k p_n \rfloor + \lfloor N p_n \rfloor}{N}, \quad k \gg 1    (5.2)
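The two transformations are straightforward to apply to any 1-D chaotic sequence. A minimal sketch, assuming NumPy and using the logistic map merely as a stand-in for the iterates p_n of a synthesized Markov map (the function name pseudo_2d is ours):

    import numpy as np

    def pseudo_2d(p, N, k=10**6):
        # Equations 5.1 and 5.2: fold the 1-D iterates into the unit square.
        p = np.asarray(p)
        x = (N * p) % 1.0                              # Eq. (5.1)
        y = ((k * p) % 1.0 + np.floor(N * p)) / N      # Eq. (5.2), k >> 1
        return x, y

    # Stand-in chaotic sequence p_n (logistic map, r = 4).
    p = np.empty(20000)
    p[0] = 0.1234
    for n in range(len(p) - 1):
        p[n + 1] = 4.0 * p[n] * (1.0 - p[n])

    x, y = pseudo_2d(p, N=4)    # scatter-plot (x, y) to view the 2-D density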

5.2.1 N-Dimensional maps

Clearly this method is easily extendable to N-dimensional maps. For the 3-D case, we partition the unit cube into N³ smaller cubes, and form a one-dimensional vector of desired densities in each little cube. We then work backwards, this time applying two Bernoulli shifts to each chaotic iterate to generate a three-dimensional chaotic trajectory.

Figure 5.1: Synthesized 2-D map with Chequer-board Invariant Density

5.3 Bollt's Affine Function Method

In [Bollt, 2000], Bollt introduces piecewise-affine transformations f_n : Q → Q, Q ⊂ [0, 1]².

5.4 Synthesising Maps with Arbitrary Chaotic Regions in Parameter Space

The tent map (Equation 5.5) has two fixed points. The first, x∗_1, is unstable; the second, x∗_2, is stable for r < 1, and becomes unstable for r > 1. In Figures 5.9, 5.10 and 5.11, we show the stable, chaotic, and unstable regimes that occur at different values of r. The bifurcation diagram shows the same information more concisely: Figure 5.12 shows the bifurcation structure and the skeleton of the bifurcation diagram (red dashed lines). The skeleton of the bifurcation diagram is obtained by applying the map f repeatedly to the critical point of the map, x_c = 1/2.

Figure 5.10: Tent map with r = 1.5 - chaotic region

Figure 5.11: Tent map with r = 2.1 - unstable

Figure 5.12: Bifurcation diagram of the tent map

We obtain successively:

    f(x_c) = 1
    f(f(x_c)) = 1 - \frac{r}{2}
    f^{(3)}(x_c) = 1 + \frac{r}{2} - \frac{r^2}{2}
    f^{(4)}(x_c) = 1 - \frac{r}{2} - \frac{r^2}{2} + \frac{r^3}{2}

This process can be repeated ad infinitum to generate the outlines of the various windows near the bifurcation point. Note that by setting f^{(3)}(x_c) = f^{(4)}(x_c), we can find the location of the main band-merging point to be r = √2.
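Explicitly, setting the third and fourth iterates equal gives the band-merging point in one line:

    1 + \frac{r}{2} - \frac{r^2}{2} = 1 - \frac{r}{2} - \frac{r^2}{2} + \frac{r^3}{2} \;\Longrightarrow\; r = \frac{r^3}{2} \;\Longrightarrow\; r^2 = 2 \;\Longrightarrow\; r = \sqrt{2}

(taking the positive root, since 1 < r ≤ 2 in the chaotic regime).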



Figure 5.13: Parameter-space plot of Equation 5.10 with g(a, b) = a + b

5.4.2 Candidate functions for various parameter-space structures

To create a two-parameter map, f(x; a, b), we need to replace r in Equation 5.5 with some function g(a, b), where a and b are the new parameters. The map will behave as follows for different values of g:

    0 < g(a, b) ≤ 1 → fixed point
    1 < g(a, b) ≤ 2 → chaos
    g(a, b) > 2 → unstable (−∞)

Next, we consider some candidate functions for the role of g. First, we show parameter-space plots for the following functions: (i) g(a, b) = a + b; (ii) g(a, b) = ab; (iii) g(a, b) = a² + b². The overall iterated map is

    x_{n+1} = 1 - g(a, b)\left| x_n - \tfrac{1}{2} \right|    (5.10)
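A plot such as Figure 5.13 follows directly from these thresholds. A minimal sketch, assuming NumPy (the regime codes and the grid resolution are our own choices):

    import numpy as np

    def regime(g):
        # Classify x -> 1 - g|x - 1/2| by the value of g(a, b).
        if 0.0 < g <= 1.0:
            return 0        # fixed point
        if 1.0 < g <= 2.0:
            return 1        # chaos
        return 2            # unstable (orbit escapes to -infinity)

    a = np.linspace(0.0, 2.0, 400)
    b = np.linspace(0.0, 2.0, 400)
    image = np.array([[regime(ai + bj) for ai in a] for bj in b])  # g = a + b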


Figure 5.14: Parameter-space plot of Equation 5.10 with g(a, b) = ab


Figure 5.15: Parameter-space plot of Equation 5.10 with g(a, b) = a² + b²


Figure 5.16: Periodic structure created with the modulo function

Figures 5.13, 5.14, and 5.15 show the parameter-space plots of the map with the three candidate functions. The chaotic region is coloured red, period-2 points are dark blue, and the unstable region is in white. As it stands, the structure of the parameter space is not particularly useful, mainly because of the unstable regions. By introducing the modulo function, we can remove these regions to create more interesting and practical structures. For instance, the map x_{n+1} = 1 − (a + b)|x_n − 1/2| becomes unstable when a + b > 2. If we replace a with a mod 1 and b with b mod 1, we remove the unstable region, but allow the fixed-point/chaotic regions to repeat, as shown in Figure 5.16. A variation on this theme is shown in Figure 5.17, where g(a, b) = (a² + b²) mod 2. The repeating structures can be scaled and positioned in the parameter


Figure 5.17: Concentric chaotic circles in parameter-space


space space quite easily by introducing some scaling factors. The following map shifts the origin to (a_0, b_0) and stretches the parameter-space structure by a factor μ_a in the a-direction, and by μ_b in the b-direction:

    x_{n+1} = 1 - \left[ \left( \frac{a - a_0}{\mu_a} \bmod 1 \right) + \left( \frac{b - b_0}{\mu_b} \bmod 1 \right) \right] \left| x_n - \tfrac{1}{2} \right|    (5.11)

An even more general way to create arbitrary chaotic regions would be to use localized Gaussian functions such as e^{−x²}. One can use functions such as this to place a chaotic region at any point in the space. In Figure 5.18, we show a parameter space in which there are Gaussian functions centred at (0,0), (0,2) and (3,4). The function g(a, b) has the following form:

    g(a, b) = 1.2\,e^{-(a^2 + (b-2)^2)} + 2\,e^{-((a-3)^2 + (b-4)^2)} + 1.5\,e^{-(a^2 + b^2)}

Figure 5.18: Chaotic regions created using Gaussian functions

It is also possible to extend this method to an n-dimensional parameter space, although the application of this is unclear.

5.4.3 The Spiral Problem

The spiral problem - the problem of identifying a spiral - is a very difficult pattern classification task for many types of neural network, although it may be solved using a fuzzy nearest-neighbour classifier [Singh, 2001]. Spirals are highly nonlinear and resistant to shape transformation under standard operations like rotation and scaling. They also occur frequently in nature, e.g. in spiral galaxies and the DNA double helix. As such, they pose an interesting benchmark problem for pattern classifiers. Although we have not implemented a spiral classifier, it may be possible to train a chaotic map to recognise a spiral, in a similar manner to that outlined in Chapter 3. The procedure would be to use training data to adjust the parameters of the map to give the correct size and shape of spiral. The inputs to the classifier would be the parameters of the map; actual spiral data would cause the map to give a chaotic output, while data not on the spiral would give a fixed-point output.

In Figure 5.19, we show a chaotic spiral in parameter space. This is generated using the following algorithm:

    Function Inputs: x_n, a, b, k (constant)
      α = arctan(b/a)                      angle from (0,0) to point (a,b)
      r = √(a² + b²)                       distance to point (a,b)
      p = 0
      loop                                 p is no. of revolutions to (a,b)
        if |r − αk − 2πpk| < πk: break
        increment p
      end loop
      α = α + 2πp                          correcting α to incorporate p
      x1 = kα cos α
      y1 = kα sin α
      d = √((x1 − a)² + (y1 − b)²)         distance from (x1, y1) to (a,b)
      x_{n+1} = 1 − 2e^{−d²}|x_n − 0.5|
    Return

The algorithm takes in a point (a, b) in parameter space, as well as the iterate x_n of the chaotic map, and determines how close (radially) the point is to the nearest point on a pre-defined spiral. The distance d is then used to modulate the exponential function (and thus the tent map), so that we get a chaotic region (coloured red in Figure 5.19) on the spiral.

Figure 5.19: A chaotic spiral in parameter space
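A runnable version of the algorithm might look as follows; this is a sketch assuming NumPy, using arctan2 for the angle so that the revolution search terminates for angles in (−π, π], and the function name spiral_tent is ours:

    import numpy as np

    def spiral_tent(xn, a, b, k):
        # One application of the spiral-modulated tent map.
        alpha = np.arctan2(b, a)          # angle from (0,0) to point (a,b)
        r = np.hypot(a, b)                # distance to point (a,b)
        p = 0                             # number of revolutions to (a,b)
        while abs(r - alpha * k - 2.0 * np.pi * p * k) >= np.pi * k:
            p += 1                        # step outward one turn at a time
        alpha += 2.0 * np.pi * p          # correct alpha to incorporate p
        x1 = k * alpha * np.cos(alpha)    # nearest point on the spiral r = k*alpha
        y1 = k * alpha * np.sin(alpha)
        d = np.hypot(x1 - a, y1 - b)      # radial distance to the spiral
        return 1.0 - 2.0 * np.exp(-d * d) * abs(xn - 0.5)

Scanning a grid of (a, b) values and iterating spiral_tent at each point then reproduces a picture like Figure 5.19: points near the spiral see an effective g ≈ 2 (chaos), points far from it see g ≈ 0 (fixed point).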

5.5 System Identification and Modelling

An important topic in applied physics and control engineering is system identification: the estimation of system parameters from a knowledge of system inputs and outputs. For example, by applying a step input to a linear system with second-order dynamics, it would be possible to estimate the gain, the natural frequency, and the damping ratio of the system. These parameters could then be used to approximate the transfer function of the system. An interesting variation on system identification, which involves a reversal of the synthesis procedure of Chapter 4, is to try to estimate the parameters of a chaotic system from the iterates of the system alone. In other words, we wish to estimate the values of the α_i and β_i just by looking at the chaotic sequence x_n. Clearly, it is quite easy to determine the invariant density and the transition matrix from the time-series if we have prior information, or make some assumptions, about the dimensions of the matrix. Even if there is no information about the matrix dimensions, it would be possible to make a parsimonious estimate of the number of subintervals required to account for the observed variations in density: if the density appears to assume three distinct values, then we would assume that the transition matrix was 3 × 3, not 9 × 9, for instance. However, we shall assume that the dimension of the transition matrix is known. We will also assume that the β_i are all the same, so as to be consistent with the work in Chapter 4.

A useful application of this type of chaotic system identification is in modelling real-world data, such as arises in telecommunications. Of course, the models so derived would be based on Markov processes, and so could well differ from the underlying system dynamics, but one could model the invariant density and the mixing properties of the data as closely as one wished (by choosing a large enough transition matrix).


5.5.1 System Identification Examples

To show how system identification is possible, we shall describe some examples of the procedure in action. The first example takes 15000 iterates from a map based on a 4 × 4 transition matrix. The value of the β_i = 0.2, and the synthesized map has an invariant density of ρ = [0.1, 0.2, 0.3, 0.4]. The transition matrix is as follows:

    A = \begin{pmatrix} 0.28 & 0.08 & 0.08 & 0.08 \\ 0.16 & 0.36 & 0.16 & 0.16 \\ 0.24 & 0.24 & 0.44 & 0.24 \\ 0.32 & 0.32 & 0.32 & 0.52 \end{pmatrix}    (5.12)

The identification procedure is straightforward. The invariant density can be estimated by counting the number of iterates in each of 4 bins. In this case, the estimated invariant density is ρ_est = [0.101, 0.196, 0.309, 0.394]. The transition matrix can also be estimated by counting the transitions between different subintervals, and then normalizing the matrix accordingly. A is estimated to be:

    A_{est} = \begin{pmatrix} 0.2752 & 0.0812 & 0.0821 & 0.0809 \\ 0.1531 & 0.3632 & 0.1518 & 0.1574 \\ 0.2548 & 0.2507 & 0.4521 & 0.2393 \\ 0.3168 & 0.3049 & 0.3140 & 0.5225 \end{pmatrix}    (5.13)

The next step is to determine the value of the β_i, which in this example are all the same. One simple approach is to vary the value of β, generate a transition matrix B using ρ_est as the invariant density, and compare matrix B to matrix A_est using some appropriate metric. The metric used here is the total squared error between the two matrices:

    \text{Total squared error} = \sum_{i=1}^{N} \sum_{j=1}^{N} (A_{ij} - B_{ij})^2    (5.14)
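The whole procedure fits in a few lines. A sketch, assuming NumPy, equal-width subintervals, and the uniform-β matrix form seen in Equation 5.12 (the function names are ours):

    import numpy as np

    def estimate(x, N):
        # Bin the iterates to estimate the density, and count the
        # subinterval-to-subinterval transitions to estimate the matrix.
        s = np.minimum((N * np.asarray(x)).astype(int), N - 1)
        rho = np.bincount(s, minlength=N) / len(s)
        T = np.zeros((N, N))
        for j, i in zip(s[:-1], s[1:]):        # transition j -> i
            T[i, j] += 1.0
        return rho, T / T.sum(axis=0)          # column-normalize (assumes
                                               # every subinterval is visited)

    def identify_beta(A_est, rho_est, betas=np.linspace(0.05, 0.5, 91)):
        # Total-squared-error scan over beta (Equation 5.14), using the
        # uniform-beta form B = beta*I + (1 - beta) * rho * 1^T.
        N = len(rho_est)
        err = [np.sum((A_est - (b * np.eye(N)
                       + (1.0 - b) * np.outer(rho_est, np.ones(N)))) ** 2)
               for b in betas]
        return betas[int(np.argmin(err))]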



Figure 5.20: The minimum of the total squared error corresponds to the correct value of β, 0.2

One then chooses the value of β that gives the minimum total squared error. In Figure 5.20, we plot the variation in the total squared error against β. The minimum corresponds closely to the actual value of β = 0.2 that was used to generate the data.

As a second example, we look at 15000 iterates of a 9 × 9 map with β_i = 0.3 and ρ = [1, 2, 3, 4, 5, 4, 3, 2, 1]/25. In Figure 5.21, we show the variation of the total squared error against the assumed dimensions of the matrix. It is clear from the graph that if the dimensions of the matrix were unknown, it would be possible to deduce that the data was generated either by a 4 × 4 matrix or a 9 × 9 matrix. Indeed, one would expect the total squared error to naturally grow as O(N²). The unexpected drop in the squared error at N = 9 is a strong clue as to the dimension of the generating matrix.



Figure 5.21: Variation of total squared error with matrix dimension

5.5.2 Modelling Time-Series Data

It is possible to synthesize a map when given chaotic data that arose from some process other than the Markov maps used above. However, while the resulting map will be a good approximation in terms of the invariant density, it will not be a good approximation of the state-space dynamics, or indeed of the transition matrix. This is because there is an infinite number of stochastic matrices that have the same dominant eigenvector, as we saw in Chapter 4 with our class of TCP-type matrices and the 3-band matrices of Gora and Boyarsky. Undoubtedly, other classes of matrix exist too. By varying the values of the β_i it is possible to control the Lyapunov exponent of the synthesized map, independently of the invariant density. We have found that if we make the Lyapunov exponent of the synthesized map the same as that of the time-series being modelled, the resulting time-series looks nothing like the one being modelled.



Figure 5.22: Time-series from the logistic map (black) and a model of the logistic map (red) having the same Lyapunov exponent

As an example, consider Figure 5.22, where the black line is the time-series from the logistic map with r = 4 and Lyapunov exponent ln 2, and the red line is a model of the time-series having a similar invariant density and a similar Lyapunov exponent of around 0.7. Clearly, one could never be mistaken for the other. The explanation for this behaviour is that the values of the β_i need to be quite large to reduce the Lyapunov exponent down to 0.7, and so there are long laminar regions where the iterates are trapped near unstable fixed points. When the trajectory does jump, it tends to jump a long distance, leading to a large exponential divergence.


In contrast to this, Figure 5.23 shows the same two time-series except that the synthesized map has β_i = 0.01, and a corresponding Lyapunov exponent of 2.1985. The modelled time-series now looks like a better approximation of the logistic time-series, even though its Lyapunov exponent is much greater. The upshot of this is that we can model individual properties of maps using our approach - the invariant density, the time-series, or the Lyapunov exponent - but we cannot control all of the properties at once to give an exact reproduction of the system. The only system that can exactly reproduce the behaviour of the Logistic map (for instance) is the Logistic map itself. This has echoes of Joseph Ford's dictum that a chaotic system is its own shortest description and its own fastest computer [Ford, 1988]. The matrix approach assumes that the underlying dynamics are Markov, i.e. that each state depends only on the previous state. Also, the Logistic map is not expanding everywhere, whereas our synthesized models are expanding everywhere. This alone will account for much of our inability to exactly model chaotic time-series.

Figure 5.23: A better model of the logistic map (red) which has a Lyapunov exponent of 2.1

5.6 Adaptive control of chaos

The field of chaos control is one of the most active areas of chaos research [Chen and Dong, 1997]. Usually, chaos control refers to stabilising a chaotic trajectory onto some unstable fixed point, as was mentioned in Chapter 2 [Ott et al., 1990]. An interesting possible offshoot of the work in Chapter 4 is to control the invariant density of a map whose parameters are slowly time-varying. This is known as adaptive control. Many industrial processes are time-varying to some extent, whether due to temperature, component wear-and-tear, or failures within the system [Ikonen and Najim, 2002]. The assumption of time-invariance can often lead to sufficiently good control schemes. In the cases where this assumption is unwise, online system identification can be performed to update the system model and controller parameters.



Figure 5.24: Block diagram of an adaptive controller which maintains a constant invariant density for a time-varying chaotic system

In our adaptive chaos control scheme, we will assume that the system is a synthesized chaotic map of the type seen in Chapter 4. A block diagram of the adaptive controller is shown in Figure 5.24. The chaotic map produces some sequence of iterates x_n and has an invariant density ρ_1. Using the system identification procedure outlined in Section 5.5.1, the invariant density of the iterates can be estimated. We will assume that the invariant density of the chaotic map is slowly varying, and that all of the values of β_i are equal and constant. The purpose of the controller is to modify the set of iterates x_n so that the invariant density of the iterates equals the desired invariant density ρ_0. The controller is easily implemented if we consider Theorem 4.3 from Chapter 4. We let the controller be a chaotic map with invariant density ρ_2. The series combination of the chaotic system and controller acts like a switched system, with the actual output invariant density being


ρ_a = (1/2)ρ_1 + (1/2)ρ_2. Rearranging, we find that

    ρ_1 ≈ 2ρ_a − ρ_2    (5.15)

This equation gives us the invariant density of the chaotic map on its own, in terms of the actual invariant density and the invariant density of the controller map. Once ρ_1 is estimated, the controller can be modified so that the actual invariant density approaches the desired invariant density ρ_0:

    ρ_2 ≈ 2ρ_0 − ρ_1    (5.16)

5.6.1 Example

We implemented the adaptive control scheme outlined above for a simple time-varying chaotic map whose (now varying) invariant density was:

    ρ_1 = [0.1 + 0.005 cos(2πn/200), 0.2, 0.3, 0.4 + 0.04 sin(2πn/100)]

Note that ρ_1 was rescaled at each epoch n to ensure it summed to one. An epoch was taken as 20000 iterations of the system. The desired invariant density was:

    ρ_0 = [0.25, 0.25, 0.25, 0.25]

The controlling map invariant density was initialised to be:

    ρ_2 = [0.4, 0.3, 0.2, 0.1]

The system was iterated both with the controlling map held constant and with the adaptive control (Equations 5.15 and 5.16) switched on. The results are shown in Figures 5.25 and 5.26, which show the variation in the controlling map invariant density vector, and the variation in the overall system invariant density. In Figure 5.25, the controlling map is constant, and so the overall invariant density varies because the chaotic map is varying.
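At the level of densities, the control loop of Equations 5.15 and 5.16 can be sketched as follows (assuming NumPy; this skips the iterate-level simulation and the estimation noise visible in the figures, so it settles exactly rather than stochastically):

    import numpy as np

    rho0 = np.full(4, 0.25)                  # desired invariant density
    rho2 = np.array([0.4, 0.3, 0.2, 0.1])    # initial controller density

    for epoch in range(200):
        # Slowly time-varying plant density from the example, renormalized.
        rho1 = np.array([0.1 + 0.005 * np.cos(2 * np.pi * epoch / 200),
                         0.2,
                         0.3,
                         0.4 + 0.04 * np.sin(2 * np.pi * epoch / 100)])
        rho1 /= rho1.sum()

        rho_a = 0.5 * rho1 + 0.5 * rho2      # series combination (Theorem 4.3)
        rho1_est = 2.0 * rho_a - rho2        # identification, Equation 5.15
        rho2 = 2.0 * rho0 - rho1_est         # controller update, Equation 5.16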



Figure 5.25: Controller off: Blue lines are the elements of the controller invariant density (constant). Coloured lines are the elements of the (uncontrolled) invariant density, which are varying because of the time-varying map. The desired invariant density is the black line corresponding to ρ = [0.25, 0.25, 0.25, 0.25].

In Figure 5.26, the controlling map is varied in order to keep the overall invariant density fixed at the desired value ρ_0. The variations around 0.25 are a result of the stochastic nature of estimated invariant densities: the long-run average of the controlled invariant density is equal to ρ_0, but at any particular moment it will not exactly equal ρ_0.

Figure 5.26: Controller on: Blue lines are the elements of the controller invariant density. These are varying and are enabling the overall invariant density to be controlled (coloured lines).

5.7 Encryption using Time-Varying Switched Chaotic Maps

Some recent work on time-varying unsynchronised communication networks, similar to that which inspired Chapter 4, provides exciting new possibilities for information hiding or encryption. Stanojevic et al. [Stanojevic et al., 2005] consider a set of m × m matrices A_1, ..., A_m which are derived from a matrix A_0 by setting all of the β_i except one to 1, where A_0 is a TCP-type matrix of the kind we looked at in Chapter 4. This set of matrices can be used to model unsynchronised TCP networks. In the context of chaotic maps, the set of matrices corresponds to Markov maps which are expanding in only one subinterval. While this may not be particularly useful on its own, it has been shown that if the matrices A_1, ..., A_m are chosen with probabilities p_1, ..., p_m, then the expected value of the Perron eigenvector (and hence the invariant density) is given by:

    E(x_P^T) = E(\rho) = \left\{ \frac{\alpha_1}{p_1(1-\beta_1)}, \ldots, \frac{\alpha_m}{p_m(1-\beta_m)} \right\}    (5.17)

If the probabilities are made time-dependent, so that they depend on the average values of the densities within each subinterval of the partition according to the relation

    p_i(k+1) = \tau_i \rho_i(k)    (5.18)

then we find that the expected value of the p_i is given by:

    E(p_i) \simeq \tau_i \rho_i \simeq \tau_i \frac{\alpha_i}{p_i(1-\beta_i)}    (5.19)

Note that it is assumed that the probabilities evolve only very slowly, as the invariant density must be allowed to settle down. Rearranging, we find that

    p_i^2 \simeq \frac{\tau_i \alpha_i}{1-\beta_i}    (5.20)

and so

    p_i \simeq \sqrt{\tau_i} \sqrt{\frac{\alpha_i}{1-\beta_i}}    (5.21)

Equation 5.21 shows that the invariant density may be controlled using the τ_i, and this suggests the scheme in Figure 5.27 for data hiding or encoding. The sender and receiver both have the same matrices. The sender then chooses a set of τ_i which will encode a message in the invariant density. To


recover the message, the τ_i are passed to the receiving end. The second chaotic system is iterated using the τ_i, and the message is recovered from the invariant density. This is secure in that knowledge of the τ_i is required to decrypt the message. To resist a known-plaintext attack, the message might also need to be encoded in some scrambled form. The τ_i must also reach the second chaotic system in some secure form. If the exact makeup of the systems were unknown to an eavesdropper, then the τ_i could be made public, as they would be meaningless without the system. As an example, we will encode the word 'data' using a 4 × 4 transition matrix. The most straightforward way to encode the word is to let a = 1, b = 2, ..., and then rescale the resulting vector so that it sums to one and can represent an invariant density. Doing this gives us ρ_enc = {0.1538, 0.0385, 0.7692, 0.0385}. We now must determine the values of the τ_i, which in turn determine the probabilities of a particular transition matrix being chosen. From Equations 5.17 and 5.21, one can easily show that the required values of τ_i are given by:

    \tau_i = \frac{\rho_i}{\rho_{\mathrm{enc},i}^2}    (5.22)

where ρ is the invariant density without time-varying probabilities, and ρ_enc is the desired invariant density carrying the encoded data. In this case, ρ = {1, 2, 3, 4}, and τ_i = {4.6973, 149.9222, 0.5634, 299.8445}. The τ_i are then scaled so that they sum to one. Referring to Figure 5.27, System 2 would receive the τ_i key, and iterate accordingly until the invariant density settles down. We iterated System 2 for 50000 iterations to get the graph shown in Figure 5.28. We have scaled the invariant density so that the encoded 'data' is clearly visible.
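The key-generation step is a one-liner per subinterval. A sketch, assuming NumPy (note that Equation 5.22 fixes the τ_i only up to an overall scale, which the final normalisation removes):

    import numpy as np

    word = "data"
    rho_enc = np.array([ord(c) - ord('a') + 1 for c in word], dtype=float)
    rho_enc /= rho_enc.sum()       # {0.1538, 0.0385, 0.7692, 0.0385}

    rho = np.array([1.0, 2.0, 3.0, 4.0])   # density without time-varying p_i
    tau = rho / rho_enc ** 2               # Equation 5.22, up to scale
    tau /= tau.sum()                       # scaled so that it sums to one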



Figure 5.27: Hiding a message using time-varying chaotic maps


Figure 5.28: Recovered message ('data': d=4, a=1, etc.) from a time-varying chaotic map


5.8 Encryption using Symbolic Dynamics

In this section, we present another idea for storing or encoding information, which uses the symbolic sequences associated with a chaotic orbit of the Bernoulli map. We search these sequences for the string which we wish to encode. When found, the number of iterations required to find that string, given some starting point, or initial condition, is recorded. The information can be recovered by iterating the map using the correct initial condition for the correct number of iterations. The map is described by the equation

    x_{n+1} = \begin{cases} x_n/S & \text{if } 0 \le x_n \le S \\ (x_n - S)/(1 - S) & \text{if } S < x_n < 1 \end{cases}    (5.23)

or, if S = 1/2, the map can be written as

    x_{n+1} = 2x_n \bmod 1    (5.24)

The attractor is a line segment. The system is chaotic, and it has been shown [Farmer et al., 1983] that the x-variable is ergodic on the interval [0, 1]. We introduce a coarse-graining, or symbolic dynamics, description of the chaotic trajectory as follows. Partition the state space into two disjoint regions, {E_0, E_1}, where E_0 = {x_i | x_i < 0.5} and E_1 = {x_i | x_i ≥ 0.5}. We then assign a symbol σ(n) = i to each point x_n of the chaotic trajectory, where i = 0 if x_n ∈ E_0 and i = 1 if x_n ∈ E_1. An orbit of the chaotic map can then be represented as a symbolic sequence Σ = {σ(0), σ(1), σ(2), ..., σ(n)}. We shall define a Word to mean any finite symbolic sequence contained in a chaotic symbolic sequence. (In the symbolic dynamics literature, a word usually corresponds to a stable periodic orbit [Bai-Lin, 1989].) When S = 0.5, the symbol sequence of a chaotic orbit will be Bernoulli (corresponding to the tossing of a coin). Therefore, every possible word will appear in the symbolic sequence if we wait long enough. For words of reasonable length (≤ 20 symbols), we only need to iterate the map several thousand times to find any particular word.



Figure 5.29: (Bernoulli) Shift map, with S = 0.5

We will use this result to show two different ways in which information could be encoded using the map. Say we wish to encode a picture. We convert the raw picture data to a binary sequence. After chopping this sequence into convenient smaller binary words (of length 10-20 bits), we search for these words in some chaotic sequence Σ, and record the number of iterations required to find each particular word. To retrieve the binary sequences, or words, we must iterate the map for the recorded number of iterations, using the same initial condition with which we encoded the information. Also, when decoding, the map parameter must be the same as when encoding, or the information is lost. Alternatively, rather than storing the number of iterations required to reproduce some symbolic sequence, we can record the value of the state-variables at the start of each symbolic sequence. These values then serve as initial conditions and allow the symbolic sequence to be recovered straight away. After imposing the symbolic dynamics description on the output of the
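The first scheme reduces to a substring search over the symbol stream. A minimal sketch (the function name find_word is ours; we take S slightly below 1/2, as in the example later in this section, so that finite-precision iterates do not collapse to zero):

    def find_word(word_bits, s=0.49999999999, x0=0.111):
        # Iterate the map of Equation 5.23, emitting one symbol per step,
        # until word_bits appears in the symbolic sequence.
        x, window = x0, ""
        for n in range(1, 10**7):
            x = x / s if x <= s else (x - s) / (1.0 - s)
            window = (window + ("1" if x >= 0.5 else "0"))[-len(word_bits):]
            if window == word_bits:
                return n          # store n; decode by re-iterating n times
        return None

    n = find_word("1011010011")   # typically a few thousand iterations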


chaotic map, with the parameter S = 0.5, the probability of a zero or a one occurring is P(0) = P(1) = 1/2. Thus the probability of finding an n-bit string in the chaotic sequence is P_n = (1/2)^n, and the average number of iterations of the map required to find the n-bit string is simply 2^n. Suppose an image has N pixels and its colour information is stored in c bits; then the number of n-bit binary words W_n required to encode the image is W_n = Nc/n. The average number of iterations of the map required to encode an image is thus:

    \langle t_{enc} \rangle = \frac{Nc}{n} 2^n \text{ iterations}    (5.25)

For n-bit encoding, we clearly need n bits, or n/8 bytes, per iteration number stored. In the ideal case, the average encoded image size is W_n n/8 = Nc/8 bytes. In our scheme, we shall use 3 bytes to store each iteration number, giving an average encoded file length of

    \langle \text{Encoded file} \rangle = \frac{3Nc}{n} \text{ bytes}    (5.26)

It can be seen from Equation 5.26 that, for this example where we use 3 bytes to encode each iteration number, n ≤ 24 bits, and so the average file size will always be ≥ Nc/8 bytes. So if the image has 256 colours and c = 8, then the expected file size will be ≥ N bytes. In the ideal case, the encoded file size will be the same as that of the original image, irrespective of the content of the image. The chaotic map is being used as a type of trapdoor one-way function [Mackay, 2003], with the parameter S and the initial condition forming the trapdoor. Given the number of iterations required to find a particular string in the chaotic sequence, it would be computationally infeasible to find the corresponding binary string without knowing the value of S or the starting value. We illustrate the method by encoding the well-known grayscale image of Lena. The image resolution is 128×128 and it has 64 grayscales. Thus,


Table 5.1: Word Length Data

    Word Length    File Size
    10 bits        28.9 kB
    12 bits        24.2 kB
    14 bits        20.6 kB
    16 bits        18.0 kB
    18 bits        16.1 kB
    19 bits        15.3 kB
    20 bits        14.6 kB

N = 16384 and c = 6 bits. The size of the raw image data is 16384 bytes. The raw image data is converted into a contiguous file consisting of the binary digits representing each grayscale. This file is then chopped into equal-length binary words, and we search the chaotic sequence for these words. When each word is found, we record the number of iterations required to find that particular word. Note that if the original image is not optimally compressed, as in this case, then the encoded image can end up being smaller than the original, for word lengths greater than 17. In Figure 5.30, we show the size of the encoded data file versus the encoded word bit length. The map parameters used are S = 0.49999999999 and initial condition x(0) = 0.111. Figure 5.33 shows the decoded image where the error in the initial x-value is 1 × 10^{−17}. The correctly decoded image is an exact replica of the original; the incorrectly decoded image, however, bears no resemblance to it. If the error is smaller than 1 × 10^{−17}, then we do get back the original image, although this is because of the finite-precision arithmetic used by the computer. Theoretically, one would need to know the initial condition correct to 2^n binary places, as it can be easily shown that in the Bernoulli shift map one bit of information is lost per iteration.



Figure 5.30: Size of encoded data file


Figure 5.31: Time taken to encode data



Figure 5.32: Original Lena Image


Figure 5.33: Decoded Lena Image, with initial condition error of 1 × 10^{−17}

5.9 Conclusions

In this chapter, we have shown how the invariant density synthesis method may be extended to higher dimensions. We used two different schemes to show this: one based on the work of Bollt, and the other based on the idea of a pseudo-chaotic time series. We also showed how two-dimensional synthetic maps could be used to encode and regenerate images, and how a simple modification to the tent map allows the generation of arbitrary chaotic and non-chaotic regions in the parameter space of a map. Following this, we considered various possible applications of chaos and of the synthesis procedure, including modelling, control, and some methods


of encryption and data-hiding. These applications were presented in a proof-of-concept framework, in that the results were based on experimentation; the same can be said of the earlier work in the chapter. It is hoped that the variety of innovative ideas presented here will lead to more in-depth investigations by others. Indeed, the whole thesis is an attempt to promote applications of chaos in non-traditional areas by giving researchers new tools to customize chaotic maps.


Chapter 6

Conclusions

That I am no skilled mathematician I have had little need to confess. I am 'advanced in these enquiries no further than the threshold'; but something of the use and beauty of mathematics I think I am able to understand.
— D'Arcy Thompson, 1917

6.1 Thesis Conclusions

There has been continuing interest in chaos and its applications for many years. Chaos offers many possibilities to researchers in the physical sciences because it is so multi-faceted. Our thesis has been that, for chaos applications to flourish, a systematic method must be developed for custom-designing maps. We do not think that the Logistic map, for example, should be shoehorned into every conceivable application; there must be a better way. In Chapter 4 of this work, a small but important addition to the literature on the synthesis of chaotic maps was made: we showed that there is a simple method, based on positive matrix theory, for synthesizing maps with arbitrary piecewise-constant invariant densities. We also presented a simple method for creating arbitrary chaotic regions in the parameter space of a map. In the literature, there now exists a host of methods for controlling various properties of chaotic systems. If an aspect of chaos is deemed useful for an application, let a map or chaotic system be purposely designed. As we have seen in Chapters 2, 3, and 5, there is no shortage of ways that chaos can be used; the only limitation is the imagination of the researcher.

The major problem that chaos applications have encountered, and will continue to encounter, is that of adoption. Why should a chaotic system be given preferential treatment over an existing system that works perfectly well? In the near future, there is no contest: conventional technology will prevail. But as more time and money are put into chaos research, hopefully some concrete chaos applications will emerge in the 21st century.

6.2 Ideas for the Future

There are several bifurcations in the thesis where we took one route though another was available. Many of these side branches are interesting in their own right, and perhaps merit further attention:

• On the pattern classification scheme, we only considered the generalized Baker's map, though there are many versions of this map in the literature, and many other two-dimensional maps. It would be interesting to see if any has the property of monotonically increasing Lyapunov dimension, so useful for solving the XOR problem. It would also be interesting to see if maps could be synthesized so that the fractal dimension of the attractor varies in a particular way.

• We built a hardware version of the Baker's map using off-the-shelf components during our investigations, though it has not been detailed here. We were unable to determine the Lyapunov exponents of the circuit dynamics in hardware, and so were unable to make a hardware version of the pattern classifier. Could this problem be overcome somehow?

• There are higher-dimensional analogues of the Baker's map in the literature. What properties do these have, and could they be used to extend the dimensionality of the feature space of our system?

• We mentioned the spiral problem in Chapter 5, but did not attempt to train a spiral classifier. One would need to compare how quickly a spiral classifier could be trained compared with other pattern classification schemes. Also, how would one get over the problem of the a priori assumption that the pattern is a spiral? A multiple-model approach might need to be taken.

• There is an overwhelming need for a review article on all of the different ways chaotic maps can be synthesized or controlled. This would need to incorporate the matrix methods presented here and the matrix methods of Gora and Boyarsky, as well as the methods based on conjugating functions, and so on.

• There are clearly many different classes of transition matrix that lead to the same invariant density. What is the relationship between our matrix method and Gora and Boyarsky's 3-band matrix method? What other classes are available? Can other classes be cast in a parameterizable form?

• Is there a way in which our synthesis method could be modified to give a piecewise-linear invariant density, rather than a piecewise-constant one? It would be necessary to change the linear segments in the graph of the chaotic map into some other shape (perhaps quadratic).

• What is the relationship between the values of the β_i in our synthesis method and the autocorrelation of the chaotic time-series? Can the autocorrelation be controlled?

• Can a good application be found for the synthesized parameter-space structures? Can this general scheme be based on maps other than the tent map?

• We found that the adaptive control scheme sometimes becomes unstable if the desired invariant density is unreachable. It would be interesting to characterize this instability and determine its limits. Also, could the adaptive control scheme be used to control any map (Logistic map, Sine map, Cubic map, etc.), not just a Markov-type chaotic map?

Bibliography

[Abarbanel and Linsay, 1993] Abarbanel, H. D. and Linsay, P. S. (1993). Secure communications and unstable periodic orbits of strange attractors. IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, 40:643–645.

[Abel and Schwarz, 2002] Abel, A. and Schwarz, W. (2002). Chaos communications - principles, schemes and system analysis. Proceedings of the IEEE, 90(5):691–710.

[Abraham and Ueda, 2000] Abraham, R. and Ueda, Y., editors (2000). The Chaos Avant-Garde: Memories of the Early Days of Chaos Theory. World Scientific.

[Alvarez et al., 1999] Alvarez, E. et al. (1999). New approach to chaotic encryption. Physics Letters A, 263:373–375.

[Amstead, 1997] Amstead, B. H. (1997). Manufacturing Processes. Wiley.

[Andreyev et al., 1997] Andreyev, Y. V., Dmitriev, A. S., and Starkov, S. O. (1997). Information processing in 1-D systems with chaos. IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, 44(1):21–28.

[Andreyev et al., 1996] Andreyev, Y. V. et al. (1996). Information processing using dynamical chaos: Neural networks implementation. IEEE Transactions on Neural Networks, 7(2):290–299.

[Arrowsmith and Place, 1992] Arrowsmith, D. K. and Place, C. M. (1992). Dynamical Systems: Differential Equations, Maps, and Chaotic Behaviour. Chapman and Hall.

[Bai-Lin, 1988] Bai-Lin, H., editor (1988). Directions in Chaos, Volume 2. World Scientific.

[Bai-Lin, 1989] Bai-Lin, H. (1989). Elementary Symbolic Dynamics. World Scientific.

[Baptista, 1998] Baptista, M. S. (1998). Cryptography with chaos. Physics Letters A, 240:50–54.

[Baranovsky and Daems, 1995] Baranovsky, A. and Daems, D. (1995). Design of one-dimensional chaotic maps with prescribed statistical properties. International Journal of Bifurcation and Chaos, 5(6):1585–1598.

[Barrow-Green, 1997] Barrow-Green, J. (1997). Poincaré and the Three-Body Problem. American Mathematical Society.

[Berman et al., 2004] Berman, A., Shorten, R., and Leith, D. (2004). Positive matrices associated with synchronised communication networks. Linear Algebra and Its Applications, 393:47–54.

[Bishop, 1995] Bishop, C. (1995). Neural Networks for Pattern Recognition. Clarendon Press.

[Bollt, 2000] Bollt, E. M. (2000). Controlling chaos and the inverse Frobenius-Perron problem: Global stabilization of arbitrary invariant measures. International Journal of Bifurcation and Chaos, 10(5):1033–1050.

[Bollt and Meiss, 1995] Bollt, E. M. and Meiss, J. D. (1995).

Target-

ing chaotic orbits to the moon through recurrence. Physics Letters A, 204:373–378. [Boyarksy and G´ora, 1997] Boyarksy, A. and G´ora, P. (1997).

Laws of

Chaos. Birkh¨auser. [Boyarsky and G´ora, 1992] Boyarsky, A. and G´ora, P. (1992). A dynamical system model for interference effects and the two-slit experiment of quantum physics. Physics Letters A, 168:103–112. [Boyarsky and G´ora, 2002] Boyarsky, A. and G´ora, P. (2002). Chaotic maps derived from trajectory data. Chaos, 12(1):42–48. [Buchmann, 2000] Buchmann, J. A. (2000). Introduction to Cryptography. Springer-verlag. [Calude et al., 1998] Calude, C. S., Casti, J., and Dinneen, M. J., editors (1998). Unconventional Models of Computation. Springer-Verlag. [Chen and Dong, 1997] Chen, G. and Dong, X. (1997). From Chaos to Order. World Scientific. [Chen and Ueta, 2002] Chen, G. and Ueta, T., editors (2002). Chaos in Circuits and Systems. World Scientific. [Chua, 1999] Chua, L. O. (1999). Passivity and complexity. IEEE Transactions on Circuits and Systems–I:Fundamental Theory and Applications, 46(1):71–82. [Cvitanovi´c, 1989] Cvitanovi´c, P., editor (1989). Universality in Chaos. Institute of Physics, 2nd edition. [Diakonos et al., 1999] Diakonos, F. et al. (1999). A stochastic approach to the contruction of one-dimensional chaotic maps with prescribed statistical properties. Phyics Letters A, 264(2-3):162–170. 181


[Diffie and Hellman, 1976] Diffie, W. and Hellman, M. E. (1976). New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654.

[Dmitriev et al., 1991] Dmitriev, A. S. et al. (1991). Storing and recognizing information based on stable cycles of one-dimensional maps. Physics Letters A, 155(8,9):494–499.

[Duda et al., 2001] Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern Classification. Wiley-Interscience, 2nd edition.

[Farmer et al., 1983] Farmer, J. D., Ott, E., and Yorke, J. A. (1983). The dimension of chaotic attractors. Physica D, 7:153–180.

[Feigenbaum, 1980] Feigenbaum, M. J. (1980). Universal behaviour in nonlinear systems. Los Alamos Science, 1:4–27.

[Ford, 1988] Ford, J. (1988). Quantum Chaos, Is there any?, pages 128–147. Volume 1 of [Bai-Lin, 1988].

[Freeman, 1991] Freeman, W. J. (1991). The physiology of perception. Scientific American, 264(2).

[Freeman, 1994] Freeman, W. J. (1994). Neural networks and chaos. Journal of Theoretical Biology, 171(1):13–18.

[Freeman, 1995] Freeman, W. J. (1995). Chaos in the brain: Possible roles in biological intelligence. International Journal of Intelligent Systems, 10(1):71–88.

[Gauthier, 2003] Gauthier, D. J. (2003). Controlling chaos. American Journal of Physics, 71(8):750–759.

[Giovanardi and Mazzini, 2001] Giovanardi, A. and Mazzini, G. (2001). Frequency domain chaotic masking. IEEE Symposium on Circuits and Systems, 2:521–524.


[Golub and van Loan, 1996] Golub, G. and van Loan, C. (1996). Matrix Computations. Johns Hopkins University Press.

[Góra and Boyarsky, 1993] Góra, P. and Boyarsky, A. (1993). A matrix solution to the inverse Frobenius-Perron problem. Proceedings of the American Mathematical Society, 118(2):409–414.

[Grassberger, 1983] Grassberger, P. (1983). Generalized dimensions of strange attractors. Physics Letters A, 97(6):227–230.

[Grossmann and Thomae, 1977] Grossmann, S. and Thomae, S. (1977). Invariant distributions and stationary correlation functions of one-dimensional discrete processes. Zeitschrift für Naturforschung A, 32:1353–1363.

[Gupta, 1975] Gupta, M.-S. (1975). Applications of electrical noise. Proceedings of the IEEE, 63(7):996–1010.

[Hastie et al., 2001] Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Series in Statistics. Springer.

[Haykin, 1999] Haykin, S. (1999). Neural Networks. Prentice-Hall, 2nd edition.

[Heffernan, 1985] Heffernan, D. M. (1985). Multistability, intermittency and remerging Feigenbaum trees in an externally pumped ring cavity laser system. Physics Letters A, 108(8):413–422.

[Heffernan et al., 1992] Heffernan, D. M. et al. (1992). Characterization of chaos. International Journal of Theoretical Physics, 31(8).

[Hentschel and Procaccia, 1983] Hentschel, H. and Procaccia, I. (1983). The infinite number of generalized dimensions of fractals and strange attractors. Physica D, 8:435–444.


[Hilborn, 1994] Hilborn, R. C. (1994). Chaos and Nonlinear Dynamics. Oxford University Press.

[Holmes, 1990] Holmes, P. (1990). Poincaré, celestial mechanics, dynamical-systems theory and chaos. Physics Reports, 193(3):137–163.

[Hopfield, 1982] Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79:2554–2558.

[Horn and Johnson, 1985] Horn, R. and Johnson, C. (1985). Matrix Analysis. Cambridge University Press.

[Ikonen and Najim, 2002] Ikonen, E. and Najim, K. (2002). Advanced Process Identification and Control. Marcel Dekker.

[Jakimoski and Kocarev, 2001] Jakimoski, G. and Kocarev, L. (2001). Analysis of some recently proposed chaos-based encryption algorithms. Physics Letters A, 291:381–384.

[Kapitaniak, 2000] Kapitaniak, T. (2000). Chaos for Engineers: Theory, Applications and Control. Springer.

[Kaplan and Yorke, 1979] Kaplan, J. and Yorke, J. (1979). Chaotic behaviour of multidimensional difference equations. In Functional Differential Equations and the Approximation of Fixed Points, volume 730 of Lecture Notes in Mathematics, pages 204–207. Springer.

[Kautz, 1999] Kautz, R. L. (1999). Using chaos to generate white noise. Journal of Applied Physics, 86(10):5794–5800.

[Keating and Noonan, 1994] Keating, J. and Noonan, D. (1994). The structure and performance of trained Boolean networks. In Orchard, G., editor, Neural Computing: Research and Applications II, pages 79–86. Irish Neural Networks Association.


[Kennedy and Chua, 1986] Kennedy, M. P. and Chua, L. O. (1986). Van der Pol and chaos. IEEE Transactions on Circuits and Systems, 33(10):974–980.

[Kennedy and Dedieu, 1993] Kennedy, M. P. and Dedieu, H. (1993). Experimental demonstration of binary chaos shift keying using self-synchronising Chua's circuits. Proceedings of the International Workshop on Nonlinear Dynamics in Electronic Systems (NDES), pages 67–72.

[Kennedy et al., 1998] Kennedy, M. P. et al. (1998). Recent advances in communicating with chaos. Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (ISCAS), 4:461–464.

[Kennedy et al., 2000] Kennedy, M. P., Rovatti, R., and Setti, G., editors (2000). Chaotic Electronics in Telecommunications. CRC Press.

[Kirkpatrick et al., 1983] Kirkpatrick, S. et al. (1983). Optimization by simulated annealing. Science, 220:671–680.

[Kocarev, 2001] Kocarev, L. (2001). Chaos-based cryptography: A brief overview. IEEE Circuits and Systems Magazine, 1(3):6–21.

[Kocarev and Jakimoski, 2001] Kocarev, L. and Jakimoski, G. (2001). Logistic map as a block encryption algorithm. Physics Letters A, 289:199–206.

[Kohda, 2002] Kohda, T. (2002). Information sources using chaotic dynamics. Proceedings of the IEEE, 90(5):641–661.

[Kojima, 1998] Kojima, K. (1998). Dynamical learning of neural networks based on chaotic dynamics. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 4:3674–3679.

[Kojima and Ito, 1999a] Kojima, K. and Ito, K. (1999a). Autonomous learning of novel patterns by utilizing chaotic dynamics. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1:284–299.

[Kojima and Ito, 1999b] Kojima, K. and Ito, K. (1999b). Dynamical distributed memory systems. Proceedings of the 4th International Symposium on Integration of Heterogeneous Systems, pages 374–377.

[Kolumban, 1997] Kolumban, G. (1997). The role of synchronization in digital communications using chaos - Part I: Fundamentals of digital communications. IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, 44(10):927–936.

[Kolumban, 2002] Kolumban, G. (2002). Chaotic communications with correlator receivers: Theory and performance limits. Proceedings of the IEEE, 90(5):711–732.

[Kozma and Freeman, 2001] Kozma, R. and Freeman, W. J. (2001). Chaotic resonance - methods and applications for robust classification of noisy and variable patterns. International Journal of Bifurcation and Chaos, 11(6):1607–1629.

[Lai et al., 1999] Lai, Y.-C., Bollt, E., and Grebogi, C. (1999). Communicating with chaos using two-dimensional symbolic dynamics. Physics Letters A, 255:75–81.

[Lasota and Mackey, 1994] Lasota, A. and Mackey, M. (1994). Chaos, Fractals, and Noise, volume 97 of Applied Mathematical Sciences. Springer-Verlag.

[Lawrence and Mauch, 1988] Lawrence, P. D. and Mauch, K. (1988). Real-Time Microprocessor System Design: An Introduction. McGraw-Hill.

[Devaney, 1989] Devaney, R. L. (1989). An Introduction to Chaotic Dynamical Systems. Perseus Books, 2nd edition.


[Li et al., 2003] Li, S. et al. (2003). Problems with a probabilistic encryption scheme based on chaotic systems. International Journal of Bifurcation and Chaos, 13(10).

[Li, 1976] Li, T.-Y. (1976). Finite approximation for the Frobenius-Perron operator: A solution to Ulam's conjecture. Journal of Approximation Theory, 17:177–186.

[Li and Yorke, 1975] Li, T.-Y. and Yorke, J. (1975). Period three implies chaos. American Mathematical Monthly, 82:985–992.

[Lorenz, 1963] Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20:130–141.

[Lorenz, 1993] Lorenz, E. N. (1993). The Essence of Chaos. University of Washington Press.

[Luenberger, 1979] Luenberger, D. G. (1979). Introduction to Dynamic Systems. Wiley.

[Mackay, 2003] Mackay, D. J. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.

[May, 1976] May, R. M. (1976). Simple mathematical models with very complicated dynamics. Nature, 261:459–467.

[Various Authors, 1995] Various Authors (1995). Special issue on nonlinear phenomena in power systems. Proceedings of the IEEE, 83(11).

[Metropolis et al., 1973] Metropolis, N., Stein, M., and Stein, P. (1973). On finite limit sets for transformations on the unit interval. Journal of Combinatorial Theory, 15:25–44.

[Metropolis et al., 1953] Metropolis, N. et al. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087–1092.


[Mondragon, 1999] Mondragon, R. J. (1999). A model of packet traffic using a random wall model. International Journal of Bifurcation and Chaos, 9(7):1381–1392.

[Moore, 1990] Moore, C. (1990). Unpredictability and undecidability in dynamical systems. Physical Review Letters, 64(20):2354–2357.

[Moore, 1998] Moore, C. (1998). Finite Dimensional Analog Computers: Flows, Maps and Recurrent Neural Networks. Volume 1 of [Calude et al., 1998].

[Narendra, 1996] Narendra, K. (1996). Neural networks for control theory and practice. Proceedings of the IEEE, 84:1385–1406.

[Ott, 2002] Ott, E. (2002). Chaos in Dynamical Systems. Cambridge University Press, 2nd edition.

[Ott et al., 1990] Ott, E., Grebogi, C., and Yorke, J. (1990). Controlling chaos. Physical Review Letters, 64:1196–1199.

[Pecora and Carroll, 1990] Pecora, L. M. and Carroll, T. L. (1990). Synchronization in chaotic systems. Physical Review Letters, 64:821–823.

[Pingel et al., 1999] Pingel, D. et al. (1999). Theory and examples of the inverse Frobenius-Perron problem for complete chaotic maps. Chaos, 9(2):357–366.

[Press et al., 2002] Press, W. H. et al. (2002). Numerical Recipes in C++. Cambridge University Press.

[Ripley, 1996] Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.

[Rogers et al., 2004] Rogers, A., Shorten, R., and Heffernan, D. M. (2004). Synthesizing chaotic maps with prescribed invariant densities. Physics Letters A, 330:435–441.


[Rogers et al., 2003] Rogers, A. R., Keating, J., and Shorten, R. (2003). A novel pattern classification scheme using the Baker's map. Neurocomputing, 55:779–786.

[Rogers et al., 2002] Rogers, A. R., Keating, J. G., Shorten, R., and Heffernan, D. M. (2002). Chaotic maps and pattern recognition - the XOR problem. Chaos, Solitons and Fractals, 14:57–70.

[Rosenblatt, 1962] Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan.

[Roy et al., 1992] Roy, R. et al. (1992). Physical Review Letters, 68:1259.

[Ruelle and Takens, 1971] Ruelle, D. and Takens, F. (1971). On the nature of turbulence. Communications in Mathematical Physics, 20:167.

[Scharinger, 1998] Scharinger, J. (1998). Fast encryption of image data using chaotic Kolmogorov flows. Journal of Electronic Imaging, 7(2):318–325.

[Schuster, 1989] Schuster, H. G. (1989). Deterministic Chaos. VCH.

[Setti et al., 2002] Setti, G. et al. (2002). Statistical modelling of discrete-time chaotic processes - basic finite-dimensional tools and applications. Proceedings of the IEEE, 90(5):662–690.

[Shannon, 1949] Shannon, C. E. (1949). Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715.

[Shinbrot et al., 1993] Shinbrot, T. et al. (1993). Using small perturbations to control chaos. Nature, 363:411–417.

[Shorten et al., 2005] Shorten, R. et al. (2005). Analysis and design of AIMD congestion control algorithms in communication networks. Automatica, 41:725–730.


[Shorten et al., 2003] Shorten, R., Leith, D., Foy, J., and Kilduff, R. (2003). Analysis and design of synchronised communication networks. In Proceedings of the 12th Yale Workshop on Adaptive and Learning Systems.

[Singh, 2001] Singh, S. (2001). Quantifying structural time varying changes in helical data. Neural Computing and Applications, 10(2):148–154.

[Sinha and Ditto, 1998] Sinha, S. and Ditto, W. L. (1998). Dynamics based computation. Physical Review Letters, 81(10):2156–2159.

[Sinha and Ditto, 1999] Sinha, S. and Ditto, W. L. (1999). Computing with distributed chaos. Physical Review E, 60(1):363–377.

[Skarda and Freeman, 1987] Skarda, C. A. and Freeman, W. J. (1987). How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences, 10(2):161–195.

[Sloane and Wyner, 1993] Sloane, N. J. A. and Wyner, A. D., editors (1993). Claude Elwood Shannon - Collected Papers. IEEE Press.

[Stanojevic et al., 2005] Stanojevic, R. et al. (2005). On the design of AQMs to achieve asymptotic fairness between competing TCP flows. Submitted to IEEE Transactions on Networking.

[Stewart, 1990] Stewart, I. (1990). Does God Play Dice? Penguin.

[Strichartz, 2000] Strichartz, R. S. (2000). The Way of Analysis. Jones and Bartlett.

[Tan and Ali, 1998] Tan, Z. and Ali, M. (1998). Pattern recognition in a neural network with chaos. Physical Review E, 58(3).

[Tanenbaum, 2002] Tanenbaum, A. S. (2002). Computer Networks. Prentice-Hall, 4th edition.

[Teplinsky et al., 1999] Teplinsky, A., Feely, O., and Rogers, A. (1999). Phase-jitter dynamics of digital phase-locked loops. IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, 46(5):545–558.

[Theodoridis and Koutroumbas, 1999] Theodoridis, S. and Koutroumbas, K. (1999). Pattern Recognition. Academic Press.

[Tsuda, 1992] Tsuda, I. (1992). Dynamic link of memory - chaotic memory map in nonequilibrium neural networks. Neural Networks, 5:313–326.

[Ulam and von Neumann, 1947] Ulam, S. and von Neumann, J. (1947). On combinations of stochastic and deterministic processes. Bulletin of the American Mathematical Society, 53:1120.

[Ulam, 1960] Ulam, S. M. (1960). A Collection of Mathematical Problems. Interscience.

[Voyatzis and Pitas, 1996] Voyatzis, G. and Pitas, I. (1996). Applications of toral automorphisms in image watermarking. Proceedings of the International Conference on Image Processing, 2:237–240.

[Voyatzis and Pitas, 1999] Voyatzis, G. and Pitas, I. (1999). The use of watermarks in the protection of digital multimedia products. Proceedings of the IEEE, 87(7):1197–1207.

[Wang et al., 1999] Wang, G. et al. (1999). The application of chaotic oscillators to weak signal detection. IEEE Transactions on Industrial Electronics, 46(2):440–444.

[Wang et al., 1994] Wang, H. O. et al. (1994). Bifurcations, chaos and crises in voltage collapse of a model power system. IEEE Transactions on Circuits and Systems–I: Fundamental Theory and Applications, 41:294–302.


[Widrow and Hoff, 1960] Widrow, B. and Hoff, M. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, volume 4, pages 96–104. IRE.

[Wilson, 1971] Wilson, K. G. (1971). The renormalization group and critical phenomena I and II. Physical Review B, 4:3174.

[Wirth et al., 2005] Wirth, F. et al. (2005). Stochastic equilibria of AIMD communication networks. Accepted to SIAM Journal on Matrix Analysis and Applications.
