March 28, 2006
10:4
00308
Fractals, Vol. 14, No. 2 (2006) 87–99 © World Scientific Publishing Company
CHAOS GAME CHARACTERIZATION OF TEMPORAL PRECIPITATION VARIABILITY: APPLICATION TO REGIONALIZATION
J. M. GUTIÉRREZ,∗ A. GALVÁN and A. S. COFIÑO
Departamento de Matemática Aplicada, Universidad de Cantabria
E.T.S.I. Caminos, Canales y Puertos
Avda. de los Castros, s/n 39005 Santander, Spain
∗
[email protected]
C. PRIMO
Instituto de Física de Cantabria
CSIC-Universidad de Cantabria
39005, Santander, España
Received August 25, 2005
Accepted September 10, 2005
Abstract
We present an application of the fractal “chaos game representation” method in climatology for characterizing temporal precipitation aggregation patterns. To this aim, we establish an analogy with linguistic analysis considering precipitation as a discrete variable (e.g. rain, no rain). Each weekly, or monthly, symbolic sequence of observed precipitation is then considered a “word” and the climatological time series observed at a particular gauge defines a “language.” The distribution of different words within the language characterizes the particular precipitation aggregation scheme. In this paper we show that the chaos game representation method provides a graphical representation (a fractal pattern, or fingerprint) of the distribution of words and also gives a quantitative characterization in terms of parameters such as the box-counting dimension and the entropy. We show that different climates exhibit characteristic patterns with different fractal exponents and entropies. As an illustrative application, the method is used for automatic regionalization of a set of gauges in the Iberian peninsula, showing that these new indices outperform standard averaged statistics (monthly means, etc.).
Keywords: Iterated Function Systems; Precipitation Temporal Aggregation; Regionalization; Climate Classification.
1. INTRODUCTION

The identification of homogeneous regions according to their climate conditions is an important problem in climatology with many practical applications. This task is known as regionalization, and it was classically carried out with a great deal of subjective judgment, including geographical convenience. These early works on climate classification were limited by the shortage of available observations. In the last few decades, the widespread availability of large climate databases has led to the development of automatic statistical regionalization techniques (see Oliver1 and references therein), which work with a representative number of stations (gauges or sites) within the area of study. The variables used in the analysis must provide sufficient information to enable proper climatological discrimination in the area. Thus, regionalization is performed by inferring groups of sites from data according to some homogeneity, or similarity, condition. Clustering methods are the simplest and most popular automatic techniques for this task and have been widely applied in regionalization.2,3

Different combinations of variables have been used for characterizing the climatology of local stations for clustering regionalization applications. Some variables refer to geographical data (such as latitude, longitude, and elevation), and others to statistical data (such as monthly mean or extreme temperatures, precipitation accumulations, days with precipitation, moisture ratio, adjusted potential evapotranspiration, etc.). In particular, information related to precipitation is essential when dealing with hydrological, agricultural and ecological problems.4 In these cases, the particular temporal aggregation scheme of precipitation is of great importance. However, different rainfall aggregation schemes can lead to the same average values: rainfall can be intense and temporally localized, or it may rain more often with lower intensity. Most automatic climatic studies do not take this problem into account, since they use averages or accumulations as standard climatic variables to characterize precipitation. Other special variables such as “days with precipitation” provide partial information about this problem, but do not characterize the temporal evolution and aggregation of precipitation.

In this paper we present an application of the “chaos game representation” method for characterizing different rainfall aggregation schemes. This method was originally introduced by Jeffrey5 to analyze symbolic genomic sequences. However, it works with any symbolic sequence and, thus, it can be used in any application domain. To apply this method in climatology we first establish an analogy with linguistic analysis, considering precipitation as a discrete variable (e.g. rain, no rain). Thus, each weekly, or monthly, symbolic sequence of observed precipitation is considered as a “word” (in this case, a binary word) which defines a specific weekly rainfall pattern. Each gauge represented by a climatological time series of precipitation defines a “language” characterized by the observed words. Thus, the distribution of words within the language corresponds to the precipitation aggregation modes existing in the series.

The distribution of words within symbolic languages has recently been analyzed using a fractal method based on Iterated Function Systems (IFS). This method represents a symbolic language graphically, in such a way that complex languages with a high variability of words define complex fractal graphical patterns, or fingerprints.6 In this paper we show that the box-counting dimension and the entropy of the resulting patterns can be used as indices to characterize the complexity of the patterns or, equivalently, to identify different aggregation modes. We also give evidence of the climatological discrimination power of the newly obtained parameters when applied to global data from a worldwide distributed network of stations. Furthermore, the new indices are compared with the standard average parameters in a regionalization problem in the Iberian peninsula, using data from 54 stations.

This paper is organized as follows. In Sec. 2 we describe the data used in this study. Section 3 describes the method used for representing symbolic sequences. First, a brief introduction to IFS is given in Sec. 3.1 and, then, the application to symbolic sequences is described in Sec. 3.2. In Sec. 4 we apply this technique to graphically and quantitatively represent precipitation time series corresponding to different climates. In Sec. 5 we illustrate an application of the proposed methodology, finding homogeneous sets of stations among a set of stations distributed over a geographical area with different co-existing climates (regionalization). Finally, some conclusions and further remarks are given in Sec. 6.
2. DATA

For this work we use two different sources of data. On the one hand, we have compiled a global database of precipitation time series representative of different world climates and subclimates based on the Köppen classification (see, e.g. Lutgens and Tarbuck,7 Chap. 15); these stations are represented in Fig. 1 and further details are given in Table 1. We extracted this information from the European Climate Assessment (www.knmi.nl/samenw/eca) and the Global Climate Observing System (GCOS, www.wmo.ch/web/gcos). The length of the series ranges from 50 to more than 100 years, but some special stations with fewer data have also been considered in those regions with lower density of information.

On the other hand, to illustrate the utility of the proposed method in the particular problem of regionalization, we have compiled a local database of 54 stations in the Iberian Peninsula containing observations of daily rainfall amount from 1970 to 2000, with no missing data. Figure 2 shows the location of these stations and the main hydrological basins which define the geographical hydrological regions within the Iberian peninsula.

Fig. 1 Location of the stations used in this study.

Table 1 Stations classified according to Köppen’s climates. A, humid tropical; B, dry; C, humid middle-latitude with mild winters; D, humid middle-latitude with severe winters; and E, polar. The length of available data (in years) and the longitude and latitude of the stations are also displayed (most of the information about subclimates of cities has been obtained from www.urbanclimate.net).

Climate  Subclimate  Station (Country)            Years  Lat.    Lon.     Fractal Dim.  Entropy
A        Af          Cairns (Australia)           112    16.55S  145.46E  1.6096        0.9053
A        Am          Surigao (Philippines)        50     9.45N   125.30E  1.6234        1.0655
A        Aw          Darwin (Australia)           130    12.28S  130.49E  1.5906        0.8239
B        BWh         Alice Springs (Australia)    120    23.42S  133.53E  1.4420        0.4078
B        BWk         Muren (Mongolia)             37     43.38N  100.10E  1.4367        0.3998
B        BSh         Carnavon (Australia)         93     24.53S  113.40E  1.4249        0.3887
B        BSk         Astrahan (Russia)            120    46.17N  48.03E   1.3969        0.3284
C        Cfa         Milan (Italy)                141    45.27N  9.17E    1.5181        0.7751
C        Cfb         Paris (France)               101    48.49N  2.20E    1.4384        0.7180
C        Csa         Madrid (Spain)               55     40.24N  3.40W    1.4510        0.6782
C        Csb         Porto (Portugal)             60     41.09N  8.37W    1.5553        0.8020
D        Dfa         Des Moines (USA)             52     41.36N  93.36W   1.4895        0.7638
D        Dfb         Moscow (Russia)              53     55.50N  37.37E   1.4750        0.7392
D        Dfc         Aleks-Sahalinskij (Russia)   102    50.09N  142.02E  1.4474        0.6826
D        Dfd         Kamenskoe (Russia)           46     62.05N  166.02E  1.4326        0.6292
D        Dwa         Inchon (Korea)               49     37.29N  126.38E  1.4899        0.7227
D        Dwb         Vladivostok (Russia)         79     43.01N  131.08E  1.4648        0.6959
D        Dwc         Magadan (Russia)             60     59.34N  150.48E  1.4580        0.6469
D        Dwd         Verkhojanks (Russia)         102    67.06N  133.04E  1.4159        0.4303
E        ET          Barrow (Alaska)              49     71.17N  156.47W  1.3992        0.3833
E        EF          Davis (Antarctica)           28     67.36S  62.53E   1.3992        0.3786

Fig. 2 Network of 54 automatic stations in the Iberian peninsula corresponding to seven main hydrological basins: Norte, Duero, Tajo, Guadiana, Guadalquivir-Sur, Mediterraneo, and Ebro, with different climatologies.
3. LINGUISTIC ANALYSIS OF PRECIPITATION SEQUENCES

In recent years, increasing interest has focused on the analysis of symbolic systems described by sequences of formal symbols from a given alphabet. These systems appear in a great variety of domains, from linguistics (formal languages), to biology (genomic sequences), and to nonlinear dynamics (chaotic orbits). In this paper we deal with symbolic sequences resulting from daily precipitation time series. To this aim precipitation is discretized, or quantized, into four different states {0, 1, 2, 3} according to four different intervals [0, 0.1), [0.1, 10), [10, 20), and [20, ∞), respectively, with special climatological meanings (the 0.1 threshold defines a “precipitation day”;8 thus, the symbol 0 means “no rain”). For instance, the sequence of daily rainfall amounts measured in Santander (North basin) since 1979: 3.7, 0.1, 6.8, 13.0, 16.0, 28.1, 15.4, 0.1, 0, 0.2, 0, 0, 1.3, 18.3, . . . , leads to the symbolic sequence: 1 1 1 2 2 3 2 1 0 1 0 0 1 2 . . . , whereas the symbolic sequence corresponding to Almeria (Guadalquivir-Sur) for the same period is: 2 1 0 0 0 0 0 0 0 0 2 2 3 . . . . Note that the two symbolic sequences exhibit different patterns, representative of different climates (or different precipitation modes).

In order to perform a linguistic analysis of the symbolic series, the first step is to define the concept of word. We consider sequences of $l$ consecutive letters within the time series, where $l$ is a free parameter ($l = 7$ for weekly sequences). The observed words are obtained from the precipitation sequence by progressively shifting a window of length $l$ by one day. Thus, a precipitation sequence of length $N$ contains $N - l + 1$ words. In this paper we study weekly precipitation patterns, although the same analysis could be applied to any other temporal scale, such as monthly ($l = 30$), etc. Thus, a 50-year series contains approximately 18,000 words which define the different weekly aggregation patterns existing within the time series (or the aggregation scheme characteristic of the corresponding climate). Note that, since the alphabet has four letters, the number of different words is $4^l = 4^7 = 16{,}384$. The lack of statistical resolution due to overlaps for larger values of $l$ is analyzed in Gutiérrez et al.,9 which introduces a simple modification of the method to deal with these cases.

Once the concept of word is introduced, the language defined by a precipitation time series is simply given by the set of words obtained following the above procedure. This symbolic language characterizes the weekly aggregation patterns of precipitation existing in the time series.
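The discretization and word-extraction steps described above can be illustrated with a minimal Python sketch. The helper names `to_symbol` and `words` are hypothetical and introduced here only for illustration; the thresholds and the sample amounts are the ones quoted in the text for Santander.

```python
# Interval bounds quoted in the text: [0, 0.1), [0.1, 10), [10, 20), [20, inf)
THRESHOLDS = (0.1, 10.0, 20.0)

def to_symbol(amount):
    """Map a daily rainfall amount to one of the states 0, 1, 2, 3
    by counting how many thresholds the amount reaches."""
    return sum(amount >= t for t in THRESHOLDS)

def words(symbols, length=7):
    """Slide a window of `length` days over the sequence, one day at a time."""
    return [tuple(symbols[k:k + length])
            for k in range(len(symbols) - length + 1)]

# First daily amounts of the Santander series quoted in the text.
santander = [3.7, 0.1, 6.8, 13.0, 16.0, 28.1, 15.4, 0.1, 0, 0.2, 0, 0, 1.3, 18.3]
symbols = [to_symbol(a) for a in santander]
weekly = words(symbols)

print(symbols)      # [1, 1, 1, 2, 2, 3, 2, 1, 0, 1, 0, 0, 1, 2]
print(len(weekly))  # 8, i.e. N - l + 1 with N = 14 and l = 7
print(4 ** 7)       # 16384 possible weekly words
```

With only 14 days the sketch yields 8 overlapping weekly words; a multi-decade series processed the same way gives the thousands of words mentioned above.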
Different linguistic analysis techniques have emerged to represent and characterize symbolic languages (note that standard statistical techniques are not appropriate for this problem). Among them, Jeffrey5 introduced a method for graphically representing and quantifying the “complexity” of a particular language using fractal IFSs.6 The so-called chaos game representation method provides a fractal pattern which represents a symbolic sequence (the sequence “fingerprint”); the complexity of the resulting patterns can also be quantified using parameters such as the fractal dimension or the entropy, as we will show later. The original method of Jeffrey was developed to work with genomic sequences, but the same ideas can be applied to any
other symbolic language, including symbolic precipitation sequences, as shown below.
3.1. Iterated Function Systems (IFS)

An IFS is a finite collection, $f_0, \ldots, f_m$, of affine contractive transformations (maps that shrink distances) of the real plane. The contractivity property guarantees the existence of a unique compact set $A$, called the attractor of the IFS, which satisfies the equation of self-similarity:

$$A = \bigcup_{i=0}^{m} f_i(A). \qquad (1)$$

This equation states that the attractor is formed by a number of copies of the same object, but at reduced scales: $f_0(A), \ldots, f_m(A)$ (see Barnsley6 for a detailed introduction to this topic). Depending on the mappings considered, the resulting attractor may have a regular form, or a fractal structure. Regular patterns, such as a square, are obtained with appropriate non-overlapping transformations that fill the square. For instance, Fig. 3a shows four copies of the unit square obtained by rescaling it by a factor 1/2 and translating the reduced copy to the four different corners of the original unit square. Note how the union of the four copies obtained with the corresponding transformations (indicated by numbers in the figure) fills the unit square and, hence, the unit square is the attractor of the IFS formed by the four transformations.

IFS models provide a convenient framework for symbolic analysis, since if we apply the transformations twice to the attractor, we obtain a second set of non-overlapping sub-squares: $f_0(f_0(A))$, $f_1(f_0(A))$, etc. of size $2^{-2}$ which fill the unit square. Therefore, each of these reduced squares can be associated with the symbolic sequence given by the related transformations 00, 01, etc. (see Fig. 3b). If we repeat this process $l$ times we find a correspondence between words $\{\sigma_1 \cdots \sigma_l,\ \sigma_i \in \{0, 1, 2, 3\}\}$ and the subsquares of resolution $2^{-l}$ (called the l-iterators of the attractor). In formal terms, if we use the discrete metric and consider the attractor at a given resolution scale, it is possible to establish a correspondence between the set of finite sequences, or “words” $\sigma_1 \cdots \sigma_l$ (discrete coding space) and a partition of $A$ formed by the l-iterators $f_{\sigma_l}(\ldots(f_{\sigma_1}(A))\ldots)$. Figures 3b and 3c illustrate this process.

The chaos game algorithm is a method that was initially developed for efficiently rendering the attractor of an IFS by attaching nonzero probabilities, $p_0, \ldots, p_m$, to each of the mappings $f_0, \ldots, f_m$. Then, a random orbit generated in the unit square from any initial condition $x_0$ by $x_{n+1} = f_{\sigma_n}(x_n)$, with $P(\sigma_n = i) = p_i$, $i = 0, \ldots, m$, fills the attractor of the IFS. The basic idea of this algorithm is that each l-iterator of the attractor has a positive probability $p_{\sigma_1} p_{\sigma_2} \cdots p_{\sigma_l}$ of occurring in the orbit. This justifies why the orbit $x_n$ comes arbitrarily close to every point in the attractor.
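As a rough illustration of the chaos game just described, the following Python sketch generates a random orbit for the four-map unit-square IFS and checks which sub-squares of size 2^-2 it visits. The assignment of symbols to corners (`CORNERS`), the equal probabilities, the seed, and the helper names are assumptions made for illustration; the text fixes the corner layout only through Fig. 3a.

```python
import random

# Hypothetical corner layout: symbol -> translation of the half-size copy.
CORNERS = {0: (0.0, 0.0), 1: (0.5, 0.0), 2: (0.0, 0.5), 3: (0.5, 0.5)}

def apply_map(symbol, point):
    """f_symbol: shrink the point by 1/2, then translate to the symbol's corner."""
    tx, ty = CORNERS[symbol]
    return (0.5 * point[0] + tx, 0.5 * point[1] + ty)

def chaos_game(n_points, probs=(0.25, 0.25, 0.25, 0.25), x0=(0.5, 0.5), seed=0):
    """Random orbit x_{n+1} = f_{sigma_n}(x_n), with P(sigma_n = i) = probs[i]."""
    rng = random.Random(seed)
    drawn = rng.choices(range(4), weights=probs, k=n_points)
    orbit, x = [], x0
    for s in drawn:
        x = apply_map(s, x)
        orbit.append(x)
    return orbit

def cell(point, level):
    """Index (col, row) of the sub-square of size 2^-level containing the point."""
    n = 2 ** level
    return (int(point[0] * n), int(point[1] * n))

orbit = chaos_game(5000)
visited = {cell(p, 2) for p in orbit}
print(len(visited), "of 16 sub-squares of size 2^-2 visited")
```

Changing `probs` so that one symbol becomes rare leaves the attractor unchanged but makes the orbit visit the corresponding sub-squares less often, which is the property the frequency-based analysis relies on.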
3.2. Representation of Precipitation Sequences Using IFS
Fig. 3 (a) Transformations of the unit square IFS model. (b) and (c) Iteration process to obtain the partition of the unit square into l-iterators (sub-squares of size $2^{-l}$) corresponding to subsequences of length l = 2 and l = 2, . . . , 7, respectively (in this last case, for the sake of clarity, we only show the iterators on the bottom-right corner of the attractor). (d) and (e) Fractal pattern, or fingerprint, associated with weekly precipitation words in Santander and Almeria, respectively.
The above chaos game algorithm can be adapted to visually represent symbolic sequences by driving the orbit not stochastically, but using the symbols in the order in which they appear in the sequence. Thus, given a symbolic sequence $\sigma_1, \sigma_2, \ldots, \sigma_N$, the dynamical system $x_n = f_{\sigma_n}(x_{n-1})$, $n = 1, \ldots, N$, defines an orbit on the attractor of the IFS for an initial condition $x_0$. This orbit characterizes the symbolic sequence in terms of a pattern, or fingerprint, over the attractor which is independent of the initial condition.6 Note that this orbit does not necessarily fill the whole attractor, but only those regions associated with the words appearing in the symbolic sequence. Thus, we can easily identify the lack of particular subsequences in the sequence by looking at the non-filled regions of the attractor (if a given l-iterator is not visited by the orbit, then the corresponding word is not contained in the sequence). This modified method is referred to as the “chaos game representation” method in the literature and was first introduced by Jeffrey.5

For example, Fig. 3d shows the fractal pattern associated with the symbolic sequence of Santander introduced above (1 1 1 2 2 3 2 1 0 1 0 0 1 2 . . .), covering the period 1979–2000. The unit square is shown at a grid of resolution $2^7 \times 2^7$, associated with words of length 7 (a week) or, equivalently, subsquares of size $2^{-7}$ (7-iterators). The orbit, and the corresponding pattern, are obtained as follows: after applying seven symbols to an initial condition $x_0$, we obtain a point $x_7$ located somewhere in the 7-iterator coded by 2322111; in the next step, after applying transformation $\sigma_8 = 1$ we get a point in the 8-iterator 12322111, which, at the resolution of $2^{-7}$, is located somewhere in the 7-iterator coded by the truncated sequence 1232211; repeating this process, we get a sequence of points representative of the weekly words existing in the original symbolic series. The resulting orbit defines a frequency $F_i$, $i = 1, \ldots, 4^7$, over the different 7-iterators forming the attractor at this scale (the frequency is given by the number of times the orbit visits the iterator). Those iterators with frequency $F_i \neq 0$ form the pattern, or fingerprint, associated with the sequence. From Fig. 3d we can see that the attractor is not uniformly filled (as it would be using a random symbolic sequence with equiprobable symbols). Thus, the resulting pattern provides an intuitive graphical representation of the different weekly precipitation aggregation schemes in Santander. Figure 3e shows the fingerprint corresponding to Almeria, which exhibits a sparse pattern, with a small number of characteristic aggregation schemes.
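The construction of the fingerprint can be sketched in a few lines of Python. Because each map halves the square, the 7-iterator reached after a step depends only on the last seven symbols, so the sketch computes the grid cell directly from each weekly word instead of iterating floating-point points. The quadrant layout `QUADRANT` and the function names are hypothetical choices made for illustration (the paper fixes the layout only in Fig. 3a).

```python
from collections import Counter

# Hypothetical layout: symbol -> (x bit, y bit) of the quadrant its map targets.
QUADRANT = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}

def iterator_cell(word):
    """Grid cell (col, row), at resolution 2^-len(word), reached after applying
    the symbols of `word` in order. The most recent symbol sets the coarsest
    bit, the oldest symbol the finest one."""
    col = row = 0
    for k, s in enumerate(word):  # k = 0 is the oldest symbol
        bx, by = QUADRANT[s]
        col |= bx << k
        row |= by << k
    return col, row

def fingerprint(symbols, length=7):
    """Visit counts F_i over the l-iterators, one visit per overlapping word."""
    counts = Counter()
    for start in range(len(symbols) - length + 1):
        counts[iterator_cell(symbols[start:start + length])] += 1
    return counts

santander = [1, 1, 1, 2, 2, 3, 2, 1, 0, 1, 0, 0, 1, 2]
F = fingerprint(santander)
print(sum(F.values()))                    # 8 weekly words in this short excerpt
print(iterator_cell(santander[:7]))       # cell of the first word, coded 2322111
```

Cells that never appear as keys of `F` correspond to weekly words absent from the sequence, which is what the empty regions of the fingerprint show.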
4. APPLICATION TO CLIMATE CATALOGS

In this section we describe the application of the above IFS linguistic analysis tool to the precipitation sequences compiled from the global observation catalogs described in Sec. 2. We consider the set of stations shown in Table 1, representative of different Köppen climates: humid tropical (A), dry (B), humid middle-latitude with mild winters (C), humid middle-latitude with severe winters (D), and polar (E) (see Lutgens and Tarbuck7 for more details). We expect that different climates, with different precipitation aggregation schemes, exhibit different patterns when applying the above chaos game fractal representation. Figure 4 shows the patterns obtained for a representative number of stations (each row shows the patterns for stations of the same climate, from A to E). As can be seen in this figure, stations from the same subclimate (e.g. Majuro and Surigao) exhibit similar patterns. Moreover, some climates are quite homogeneous whereas others exhibit distinct patterns for different subclimates. On the one hand, climates A, B and E are quite homogeneous (all subclimates exhibit similar patterns). Tropical climates A exhibit uniform patterns (corresponding to a high variability of weekly precipitation modes), whereas polar climates E exhibit sparse patterns, indicating low variability of precipitation modes. On the other hand, climates C and D exhibit more diverse patterns for different subclimates. For instance, Csa and Cfb patterns show a triangle-like pattern, indicating that three out of the four symbols dominate the climate, forming repetitive sequences. In contrast, Cfa and Csb subclimates show more homogeneous patterns, with higher variability. These similarities (and differences) between subclimates indicate similar (and different) temporal precipitation aggregation schemes and, thus, provide useful information for climatological applications.

Climates A and E are quite extreme and present no similarities with other climates. However, several similarities are found between subclimates of B, C and D, indicating similar aggregation behaviors. For instance, it can be visually established that subclimates Cfb and Dfc exhibit similar patterns. In order to quantify the similarity of different patterns, we computed the fractal dimension $F_D$ and the entropy $E$ of the resulting patterns (see the labels over the patterns in Fig. 4).
The fractal dimension is computed using the box-counting algorithm10 and measures the irregularity of the pattern, whereas the entropy measures the disorder and is obtained as $E = -\sum_{f_i \neq 0} f_i \log f_i$, where $f_i$ is the relative frequency associated with each of the 7-iterators of the attractor (the number of times the orbit visits the iterator, over the total length of the orbit). These parameters help us to quantify the similarity of different patterns. In the next section we apply this idea to a regionalization problem in a small geographical area (the Iberian peninsula). We show that the vector $v = (F_D, E)$ characterizes the different patterns, discriminating among the different climates and subclimates in a simple way.

Fig. 4 Fractal patterns corresponding to representative stations of different subclimates from A (top row) to E (bottom row); see Table 1 for details. The numbers above the patterns show the box-counting dimension $F_D$ and the entropy $E$.
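The two indices introduced above can be sketched as follows. This is only one plausible reading, under stated assumptions: the entropy uses the natural logarithm, and the box-counting dimension is taken as the least-squares slope of log N(ε) against log(1/ε) over the dyadic box sizes ε = 2^-1, ..., 2^-l. The paper cites the box-counting algorithm without giving these details, so the function names, the choice of scales, and the logarithm base are all hypothetical and need not reproduce the values in Table 1.

```python
import math

def entropy(counts):
    """E = -sum f_i log f_i over visited cells, with f_i = count / total.
    `counts` maps each visited cell to its number of visits."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

def box_counting_dimension(cells, level):
    """Slope of log N(eps) versus log(1/eps) for eps = 2^-1, ..., 2^-level.
    `cells` is the set of visited (col, row) cells at resolution 2^-level;
    `level` must be at least 2 so that a slope can be fitted."""
    xs, ys = [], []
    for k in range(1, level + 1):
        shift = level - k  # coarsen the grid to boxes of size 2^-k
        boxes = {(c >> shift, r >> shift) for c, r in cells}
        xs.append(k * math.log(2))
        ys.append(math.log(len(boxes)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Sanity checks on synthetic patterns rather than on real rainfall data.
full_grid = {(c, r) for c in range(128) for r in range(128)}
triangle = {(c, r) for c in range(128) for r in range(128) if c & r == 0}
print(box_counting_dimension(full_grid, 7))   # close to 2
print(box_counting_dimension(triangle, 7))    # close to log 3 / log 2
```

The `triangle` set is the pattern produced when only three of the four symbols ever occur, which matches the triangle-like fingerprints described for some subclimates.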
5. APPLICATION TO REGIONALIZATION

In this section we focus on a small geographical area (the Iberian peninsula) to illustrate the application of the linguistic analysis to perform automatic regionalization. We consider 54 stations in the Iberian peninsula (see Fig. 2) corresponding to seven main hydrological basins. Figure 5 shows the different fractal patterns obtained for a set of 16 stations representing different climatic regions. From this figure we can see how the fingerprints vary from one region to another, with the patterns of stations with similar climatology being more similar to each other.
Fig. 5 Fractal patterns obtained for 16 different stations on the Iberian peninsula, covering the different climatic regions. The figures are organized keeping the relative distances between stations, from North to South, and from East to West. Panels: (1) S. Compostela, (2) Santander, (3) Fuenterrabia, (4) Vitoria, (5) Salamanca, (6) Burgos, (7) Huesca, (8) Barcelona, (9) Madrid, (10) Caceres, (11) Ciudad Real, (12) Castellon, (13) Huelva, (14) Granada, (15) Almeria, (16) Murcia.
First we briefly describe the standard automatic regionalization procedure used to group similar stations according to statistical parameters derived from the precipitation series.
5.1. Regional Cluster Analysis

Cluster analysis has been widely applied in many fields for partitioning a set of data into a number of homogeneous groups, according to some similarity criterion. In climatology, these techniques have been successfully used for regionalization, especially the so-called SHAN (Sequential, Hierarchical, Agglomerative, and Non-overlapping) methods (see Kalkstein et al.11 for details). Several alternatives are possible, depending on the metric used to define similarity between clusters. On the one hand, the method known as “average linkage” defines the intercluster distance as the average distance between all possible pairs of elements in the two clusters being compared. It has been shown that this method tends to form clusters with similar variances. On the other hand, the “Ward linkage” merges those cluster pairs which minimize the dispersion of the resulting cluster. In this case, the square of the Euclidean distance is considered as a dissimilarity measure. The Ward method has been found to be biased towards producing clusters with a relatively similar number of members, sacrificing cluster distinctiveness.11

Given a set of data $\{v_1, v_2, \ldots, v_N\}$ we want to obtain a number of clusters $C_i$, $i = 1, \ldots, q$, each characterized by a representative prototype or centroid $c_i$. In this paper we shall use the Ward linkage method to compare the discrimination power of different climatological variables. This method decomposes the total variance $V$ into the variance within the current clusters $C_i$ with centroids $c_i$ and weight, or mass, $m_i$ (at a given iteration step) and the variance between clusters:

$$V = \sum_{i=1}^{q} m_i \|c_i - c\|^2 + \sum_{i=1}^{q} \sum_{j \in C_i} m_j \|v_j - c_i\|^2, \qquad (2)$$

where $c$ is the global centroid (mean of the data). If two clusters $C_i$ and $C_j$, with masses $m_i$ and $m_j$ respectively, are joined into a single cluster, $D$, with mass $m_i + m_j$ and centroid

$$d = \frac{m_i c_i + m_j c_j}{m_i + m_j}, \qquad (3)$$

then the variance $V_{ij}$ of $C_i$ and $C_j$ with respect to $D$ can be decomposed as

$$V_{ij} = m_i \|c_i - d\|^2 + m_j \|c_j - d\|^2 + m \|d - c\|^2, \qquad (4)$$

where $m = m_i + m_j$ is the mass of $D$. The last term is the only one that remains constant if we replace $C_i$ and $C_j$ by their center of mass $D$. Then, the reduction of the variance will be:

$$\Delta V_{ij} = m_i \|c_i - d\|^2 + m_j \|c_j - d\|^2. \qquad (5)$$

Using (3), we have:

$$\Delta V_{ij} = m_i \left\| c_i - \frac{m_i c_i + m_j c_j}{m_i + m_j} \right\|^2 + m_j \left\| c_j - \frac{m_i c_i + m_j c_j}{m_i + m_j} \right\|^2 = \frac{m_i m_j}{m_i + m_j} \|c_i - c_j\|^2. \qquad (6)$$

Then, the strategy followed by this method is to join, at each step, the clusters $C_i$ and $C_j$ which minimize $\Delta V_{ij}$ (initially each point is considered as a single cluster). So we can consider $\Delta V_{ij}$ as a dissimilarity measure. Note that the individuals with less weight are the first to be joined together. The clustering process can be graphically represented using a “dendrogram” (a tree representing at different levels the hierarchy of unions of individuals or clusters in different steps).
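The merging strategy above can be sketched as a naive agglomerative loop in Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: every station starts with unit mass, all pairs are rescanned at each step, and the names `ward_cost` and `ward_clusters` as well as the toy points are hypothetical.

```python
def ward_cost(mi, ci, mj, cj):
    """Variance reduction of Eq. (6): m_i m_j / (m_i + m_j) * ||c_i - c_j||^2."""
    dist2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
    return mi * mj / (mi + mj) * dist2

def ward_clusters(points, q):
    """Merge the pair with the smallest cost until q clusters remain.
    Each cluster is stored as (mass, centroid, member indices)."""
    clusters = [(1.0, tuple(p), [k]) for k, p in enumerate(points)]
    while len(clusters) > q:
        _, i, j = min((ward_cost(a[0], a[1], b[0], b[1]), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        (mi, ci, members_i), (mj, cj, members_j) = clusters[i], clusters[j]
        mass = mi + mj
        # Centroid of the merged cluster, Eq. (3).
        centroid = tuple((mi * a + mj * b) / mass for a, b in zip(ci, cj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((mass, centroid, members_i + members_j))
    return [sorted(c[2]) for c in clusters]

# Toy two-dimensional feature vectors (made-up values, not station data).
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
print(sorted(ward_clusters(toy, 3)))   # [[0, 1], [2, 3], [4]]
```

Recording the cost of each merge, in order, gives the heights needed to draw the dendrogram mentioned above.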
5.2. Regionalization in the Iberian Peninsula In order to compare the discrimination power of the parameters derived from the linguistic analysis with other standard precipitation-averaged parameters, we first consider seasonal precipitation averages PS and PW (for summer and winter seasons, respectively) and characterize each station by a vector v = (PS , PW ). The above variables (in conjunction with others) are commonly used as standard variables in regionalization applications. Figure 6 shows the clusters obtained applying the Ward algorithm to the network of stations on the Iberian peninsula shown in Fig. 2a, where each of the stations is represented by the corresponding vector v = (P S, P W ) — note that the purpose of this paper is not presenting an exhaustive cluster analysis of regionalization, but illustrating the application of a new climate parameter in this context. The algorithm is applied until a maximum of seven groups are formed. The reason for this stopping criterion is the seven main basins where the stations are taken from (in Fig. 2b, each of these basins is represented with a different symbol). The symbols in Fig. 6 correspond to the different clusters obtained, and the dendrogram shows the agglomerative process. The list of the stations corresponding to a cluster is given below the corresponding symbol in the dendrogram; the numbers
March 28, 2006
96
10:4
00308
J. M. Guti´errez et al.
44 42 40 38 36 10
1 LA CORUÑA 1 ROZAS 1 PARAYAS 1 SANTANDER 1 SONDICA
1 GIJON 1 OVIEDO 1 ORENSE
5
1 SAN SEBASTIÁN 1 FUENTERRABÍA
0
1 SANTIAGO 1 VIGO
2 NAVACERRADA
5 SEVILLA_A 5 SEVILLA_B 5 JEREZ 5 TARIFA 5 MÁLAGA
5
2 LEÓN 2 BURGOS 2 SORIA 2 SEGOVIA 2 VALLADOLID_A 2 VALLADOLID_B
7 VITORIA 7 PAMPLONA
4 BADAJOZ 4 HINOJOSA 1 PONFERRADA 4 CIUDAD REAL 3 CÁCERES 4 HUELVA
6 MURCIA_A 6 MURCIA_B 6 CARTAGENA 6 MURCIA_C 6 ALICANTE_A 6 ALICANTE_B 7 LOGROÑO 7 DAROCA 7 ZARAGOZA
6 VALENCIA_A 6 VALENCIA_B 6 CASTELLÓN
2 SALAMANCA 2 ÁVILA 2 ZAMORA
7 TORTOSA 7 HUESCA
3 MADRID 3 TOLEDO
3 MADRID
5 ALMERÍA
5 GRANADA
Fig. 6 Cluster analysis by Ward linkage of the 54 stations in the Iberian peninsula characterized by Winter and Summer averaged precipitation. The dendrogram represents the different groups at a certain depth level, in the horizontal axis, and the distances where the individuals are joined in the different hierarchical levels, in the vertical axis.
preceding the names indicate the original basins corresponding to the stations (Norte, Duero, Tajo, Guadiana, Guadalquivir-Sur, Mediterraneo, Ebro). From this figure we can see how the North basin is clearly separated from the rest of the
basins (this is the main climatological division in the Iberian peninsula, which corresponds with “humid oceanic” climate according to K¨ oppen classification). The only exception is “Navacerrada” which is located on a mountain and
March 28, 2006
10:4
00308
Chaos Game Characterization of Temporal Precipitation Variability
presents similar climatological conditions (at least in an averaged sense) to the stations in the North basin. Moreover, four different groups are formed within the North basin, some corresponding to nearby stations, and others with no clear meaning. On the other hand, the cluster labelled “” corresponds to the “semiarid” climatic region. Finally, the remaining two clusters correspond to the “subtropical dry summer” region. In this case, the cluster labelled by “+” focus mainly on the Guadalqivir-Sur basin (with some exceptions), and
97
the cluster labeled “” is spread out over the Iberian peninsula. Now we characterize each station by a vector v = (FD , E) derived from the fractal pattern obtained from the linguistic analysis, and repeat the same regionalization process, comparing the obtained results. To this aim, the box-counting dimensions FD and the entropy E were computed for each of the stations. Figure 7 shows the seven clusters and the dendrogram obtained in this case. As in the standard case (Fig. 6), a group of clusters
Fig. 7 Cluster analysis by Ward linkage of the 54 stations using the fractal dimension and the entropy of the corresponding fractal patterns to characterize each station.
associated with the North basin is obtained; however, in this case all the stations in the North are grouped into two clusters, one with the stations by the sea and the other with the stations in the continental area (including “Navacerrada” and “Pamplona”). Proceeding in order of importance according to the dendrogram, the clusters “◦” and “×” split the Duero basin into two different groups (“◦” joins those stations more similar to the humid oceanic North basin, whereas “×” is more closely related to the continental subtropical dry summer climate). Moreover, cluster “” corresponds to the Southern Mediterraneo area, cluster “+” is related to Guadalquivir and Northern Mediterraneo, and cluster “” is associated with Tajo, Guadiana, and Mediterranean Ebro. From this analysis, we conclude that the linguistic scaling parameters introduced in this paper provide additional information for characterizing the climatology of stations, which leads to a better automatic regionalization.
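The regionalization step described above can be sketched as follows. This is a minimal illustration, not the implementation used for Figs. 6 and 7: the function names `standardize` and `ward_clusters` are hypothetical, the z-score scaling of FD and E is an assumption (the text does not state how the two features were scaled), and the numerical values are invented for the example rather than taken from the stations. The merge rule is Ward's criterion: at each step, join the two groups whose union gives the smallest increase in within-group sum of squares.

```python
from itertools import combinations
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so FD and E have comparable scales (an assumption)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def ward_clusters(rows, k):
    """Agglomerative clustering with Ward's criterion, stopped at k groups."""
    # Each cluster is (member indices, centroid).
    clusters = [([i], tuple(r)) for i, r in enumerate(rows)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            na, nb = len(ma), len(mb)
            d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            cost = na * nb / (na + nb) * d2  # increase in within-group sum of squares
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)

# Invented (FD, E) pairs for six hypothetical stations, for illustration only.
features = [(1.00, 0.50), (1.02, 0.52), (1.50, 0.90),
            (1.52, 0.88), (1.90, 0.20), (1.88, 0.22)]
print(ward_clusters(standardize(features), 3))
```

Stopping the merges at a fixed number of groups corresponds to cutting the dendrogram at one depth level; recording the merge cost at each step would give the heights drawn on the vertical axis.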
6. CONCLUSIONS AND FUTURE WORK
The main conclusion of this work is that scaling linguistic exponents obtained from symbolic rainfall sequences provide useful climatological information. This information is not related to temporal averages or accumulations (as are most standard climatological parameters related to precipitation), but to patterns of temporal variability (aggregation). The resulting exponents can be used (alone or in combination with standard parameters) in a variety of practical applications. As a simple illustrative application of this technique we present a regionalization problem in the Iberian peninsula. When a clustering algorithm was applied to obtain groups of stations automatically, stations with similar climatology were grouped together, forming homogeneous groups similar to some of the known climatological regions of the Iberian peninsula. This work introduces the methods and illustrates them using a simple example; operational applications in several fields are still needed to reveal the utility of this method, but they are beyond the scope of this paper. In this paper fractals are used as a tool for mapping rainfall sequences into irregular fingerprint-like patterns. The box-counting dimension and the entropy of the resulting patterns are used to characterize their variability. However, the pattern obtained by the chaos game algorithm can also be viewed as a multifractal measure, so multifractal techniques can be applied to further characterize rainfall variability and aggregation.9 Some preliminary work in this direction is encouraging, but further analysis is required. Other nonlinear power-law techniques, such as detrended fluctuation analysis,12 have recently been applied to this field, also with encouraging results. Thus, the analysis presented in this paper is part of the general framework of applications of nonlinear and fractal techniques in climatology. Finally, we want to remark that fractals and multifractals are used in this paper as “projection” tools to characterize symbolic sequences. The resulting patterns are not directly related to the fractal character of precipitation fields described in the literature.13
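The two pattern descriptors mentioned above can be illustrated with a short sketch. Several choices here are assumptions made only for the example and may differ from the encoding used in this paper: the binary rain/no-rain sequence is read in non-overlapping pairs, each pair is assigned to one corner of the unit square as in the classical four-corner chaos game,5 the dimension is estimated as a least-squares slope over dyadic grids, and the entropy is the Shannon entropy of the box occupancy frequencies at a single grid depth. All function names are hypothetical, and the input is a synthetic random sequence, not station data.

```python
import math
import random
from collections import Counter

# Assumed corner assignment for the four two-day symbols (rain = 1, no rain = 0).
CORNERS = {(0, 0): (0.0, 0.0), (0, 1): (0.0, 1.0),
           (1, 0): (1.0, 0.0), (1, 1): (1.0, 1.0)}

def chaos_game_points(bits, discard=10):
    """Map a binary sequence to points in the unit square, one per pair of bits."""
    x, y = 0.5, 0.5
    points = []
    for i in range(0, len(bits) - 1, 2):
        cx, cy = CORNERS[(bits[i], bits[i + 1])]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the symbol's corner
        points.append((x, y))
    return points[discard:]  # drop the transient that depends on the start point

def box_counts(points, depth):
    """Count the points falling in each box of a 2**depth by 2**depth grid."""
    n = 2 ** depth
    return Counter((min(int(x * n), n - 1), min(int(y * n), n - 1))
                   for x, y in points)

def box_counting_dimension(points, max_depth=5):
    """Least-squares slope of log N(eps) against log(1/eps), eps = 2**-depth."""
    depths = range(1, max_depth + 1)
    xs = [d * math.log(2) for d in depths]
    ys = [math.log(len(box_counts(points, d))) for d in depths]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

def box_entropy(points, depth=2):
    """Shannon entropy (in bits) of the box occupancy frequencies at one depth."""
    counts = box_counts(points, depth)
    total = sum(counts.values())
    return sum(c / total * math.log2(total / c) for c in counts.values())

rng = random.Random(0)
synthetic = [rng.randint(0, 1) for _ in range(40000)]  # synthetic, not station data
pattern = chaos_game_points(synthetic)
print(box_counting_dimension(pattern), box_entropy(pattern))
```

With this construction, the box containing a point at grid depth d is fixed by the last d symbols, so the occupancy frequencies at depth d are the frequencies of words of length d. A sequence with no preferred words fills the square, while a sequence dominated by a few words concentrates on a sparse subset with a lower dimension and entropy.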
ACKNOWLEDGMENTS
The authors are grateful to the Instituto Nacional de Meteorología (INM) for providing us with partial support and the necessary data for this work. The authors are also grateful to the Comisión Interministerial de Ciencia y Tecnología (CICYT, CGL200402652 grant) for partial support of this work.
REFERENCES
1. J. Oliver, The history, status and future of climatic classification, Phys. Geogr. 12(3) (1991) 231–251.
2. R. G. Fovell and M.-Y. C. Fovell, Climate zones of the conterminous United States defined using cluster analysis, J. Clim. 6 (1993) 2103–2135.
3. A. T. DeGaetano, Delineation of mesoscale climate zones in the Northeastern United States using a novel approach to cluster analysis, J. Clim. 9 (1996) 1765–1782.
4. O. A. Lucero and D. Rozas, Characteristics of aggregation of daily rainfall in a middle-latitudes region during a climate variability in annual rainfall amount, Atmos. Res. 61 (2002) 35–48.
5. H. J. Jeffrey, Chaos game representation of gene structure, Nucleic Acids Res. 18(8) (1990) 2163–2175.
6. M. F. Barnsley, Fractals Everywhere, 2nd edn. (Academic Press, New York, 1990).
7. F. K. Lutgens and E. J. Tarbuck, The Atmosphere: An Introduction to Meteorology (Prentice Hall, New Jersey, 2001).
8. WMO, Guide to Climatological Practices, 2nd edn. (WMO, Geneva, 1983).
9. J. M. Gutiérrez, M. A. Rodríguez and G. Abramson, Multifractal analysis of DNA sequences using a novel chaos-game representation, Physica A 300 (2001) 271–284.
10. K. Falconer, Fractal Geometry (John Wiley and Sons, Chichester, 1993).
11. L. S. Kalkstein, G. Tan and J. A. Skindlov, An evaluation of three clustering procedures for use in synoptic climatological classification, J. Clim. Appl. Meteorol. 26 (1987) 717–730.
12. M. L. Kurnaz, Application of detrended fluctuation analysis to monthly average of the maximum daily temperatures to resolve different climates, Fractals 12(4) (2004) 365–373.
13. R. Deidda, Rainfall downscaling in a space-time multifractal framework, Water Resour. Res. 36(7) (2000) 1779–1794.