
FEBRUARY 1998

KAUFMANN AND WEBER

89

Directional Correlation Coefficient for Channeled Flow and Application to Wind Data over Complex Terrain

PIRMIN KAUFMANN AND RUDOLF O. WEBER

Paul Scherrer Institute, Villigen, Switzerland

(Manuscript received 23 April 1996, in final form 26 May 1997)

ABSTRACT

Analysis of vector quantities or directional data, such as the variables characterizing flow, is of significant interest to geophysical fluid dynamicists. For flows with strong channeling, a new simple correlation coefficient is defined. It is demonstrated by application to a model of channeled flow that the new correlation captures the flow features in the case of channeling better than other correlations taken from the literature. The new correlation coefficient is applied to wind data from a mesoscale network of anemometers in complex terrain. A cluster analysis based on the correlation matrix is used to group observation sites into classes with similar behavior of the channeled flow. Sites within the same class are not necessarily geographically close. A similar behavior of the wind directions indicated by these classes seems to be more closely related to the orographic features and to the altitude of the sites than to the horizontal distance between them.

1. Introduction

Meteorologists and oceanographers have dealt with the problem of correlating vector quantities for at least 80 years (Dietzius 1916; Sverdrup 1917; Breckling 1989; Hanson et al. 1992), and statisticians have also tackled the problem (Fisher 1993). A vector quantity requires both magnitude and direction for its unique characterization. When the vector is represented by its components in a coordinate system like the Cartesian system or spherical coordinates, correlation coefficients can be defined using these components. Many definitions of a single scalar value describing the correlation of vector quantities have appeared in the literature (see the reviews in Breckling 1989; Hanson et al. 1992; Crosby et al. 1993). When the magnitude of the vector is ignored and its direction alone is studied (which is equivalent to considering vectors of unit length), problems arise because the direction is a circular variable (Mardia 1972; Essenwanger 1986; Fisher 1993). Several definitions of correlation coefficients for circular variables have been published (a review is given in Hanson et al. 1992).

In the present paper, the highly channeled near-surface flow of the atmosphere in a region with many valleys is the focus of attention. The determination of a correlation coefficient between the wind measurements at different station locations allows us to compare the flow in the different valleys. The examination of these correlations enables one to see which valleys or areas experience the same forcing of the wind.

A correlation coefficient for wind direction of such channeled flows should have some special properties. When winds in two different valleys are both directed down-valley, the correlation between the winds in the valleys should be high. When the flow in one valley is in the down-valley direction and in the up-valley direction in another valley, the correlation between the winds of the two valleys should be negative. If no simultaneous channeling in the two valleys is observed, the correlation should be close to zero. A simple correlation coefficient satisfying these requirements is proposed here.

In section 2, several definitions of directional and vector correlation coefficients are reviewed. Section 3 introduces a new definition of correlation for channeled flows, which makes use of the specific properties of channeled flows. The different correlation coefficients discussed in sections 2 and 3 are compared in section 4 by application to an idealized situation of two wind vectors showing pronounced channeling. In section 5, the new correlation coefficient for channeled flows is applied to wind observations from a mesoscale field experiment, with the objective of identifying groups of measurement sites with similar behavior of wind directions.

Corresponding author address: Dr. Rudolf O. Weber, Paul Scherrer Institute, CH-5232 Villigen PSI, Switzerland. E-mail: [email protected]

© 1998 American Meteorological Society

2. Review of some directional and vector correlation coefficients

In the analysis of channeled wind, as discussed in section 5, we are only interested in wind direction but not in wind speed or the magnitude of the wind vector.


JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY

Therefore, both directional and vector correlation coefficients (applied to vectors of unit length) are suitable in our case. Among the variety of proposed directional and vector correlation coefficients (Hanson et al. 1992 list 17 definitions) we chose four different definitions (Sverdrup 1917; Fisher and Lee 1983; Breckling 1989; Crosby et al. 1993). Let $W_1 = (u_1, v_1)$ and $W_2 = (u_2, v_2)$ be two two-dimensional vectors representing the horizontal wind vector at two measurement sites. The covariance $s(u_1, u_1)$ and the cross-covariance $s(u_1, u_2)$ are defined in the standard way as

$$s(x, y) = E[xy] - E[x]E[y], \tag{1}$$

where E[x] is the expectation value of the random variable x. To simplify the equations later, the following covariance matrices are introduced, according to the notation of Crosby et al. (1993).

$$S_{11} = \begin{pmatrix} s(u_1, u_1) & s(u_1, v_1) \\ s(v_1, u_1) & s(v_1, v_1) \end{pmatrix}; \qquad S_{12} = \begin{pmatrix} s(u_1, u_2) & s(u_1, v_2) \\ s(v_1, u_2) & s(v_1, v_2) \end{pmatrix}, \ \text{etc.} \tag{2}$$

In the same way the product matrices $P_{11}$, etc., are defined as

$$P_{11} = \begin{pmatrix} p(u_1, u_1) & p(u_1, v_1) \\ p(v_1, u_1) & p(v_1, v_1) \end{pmatrix}, \ \text{etc.}, \tag{3}$$

where the uncentered product moments p (x, y) of two random variables x and y are defined by

$$p(x, y) = E[xy]. \tag{4}$$
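As a minimal sketch of Eqs. (1)-(4), the moments and the 2 × 2 matrices can be estimated from paired samples as shown below. It assumes that the expectations are replaced by sample means over simultaneous observations and that unit vectors are built as (cos, sin) of the direction; the function names and the four sample directions are illustrative and do not come from the paper.

```python
from math import cos, radians, sin
from statistics import fmean


def product_moment(x, y):
    """Uncentered product moment p(x, y) = E[xy], Eq. (4)."""
    return fmean(a * b for a, b in zip(x, y))


def covariance(x, y):
    """Centered covariance s(x, y) = E[xy] - E[x]E[y], Eq. (1)."""
    return product_moment(x, y) - fmean(x) * fmean(y)


def moment_matrix(moment, first, second):
    """2x2 matrix of `moment` between the (u, v) components of two sites."""
    (u_a, v_a), (u_b, v_b) = first, second
    return [[moment(u_a, u_b), moment(u_a, v_b)],
            [moment(v_a, u_b), moment(v_a, v_b)]]


# Hypothetical directions (degrees) observed simultaneously at two sites.
dirs_1 = [80.0, 95.0, 260.0, 275.0]
dirs_2 = [70.0, 100.0, 250.0, 280.0]
w1 = ([cos(radians(d)) for d in dirs_1], [sin(radians(d)) for d in dirs_1])
w2 = ([cos(radians(d)) for d in dirs_2], [sin(radians(d)) for d in dirs_2])

S12 = moment_matrix(covariance, w1, w2)        # cross-covariance matrix, Eq. (2)
S21 = moment_matrix(covariance, w2, w1)        # its transpose
P11 = moment_matrix(product_moment, w1, w1)    # product matrix, Eq. (3)
```

For vectors of unit length the trace of `P11` equals 1, which is a quick sanity check on the input data.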

One of the oldest definitions of a vector correlation was given by Sverdrup (1917). He defined a correlation by

$$r_S = \left\{ \frac{[\mathrm{tr}(P_{12})]^2 + [p(u_1, v_2) - p(v_1, u_2)]^2}{\mathrm{tr}(P_{11})\,\mathrm{tr}(P_{22})} \right\}^{1/2}, \tag{5}$$

where tr(A) denotes the trace of matrix A. Sverdrup (1917) stresses that the uncentered product moments $p(x, y)$ in (4) must be taken and not the centered covariances $s(x, y)$ in (1). Fisher and Lee (1983) developed a directional correlation coefficient ranging from −1 to 1. A representation of their correlation coefficient in terms of the matrices defined above is given in Fisher and Lee (1986) and Breckling (1989):

$$r_{FL} = \frac{\det(P_{12})}{[\det(P_{11})\,\det(P_{22})]^{1/2}}, \tag{6}$$

where det(A) denotes the determinant of matrix A. Breckling (1989) proposed the following correlation coefficient:

$$r_B = \frac{\mathrm{tr}\left([S_{12} S_{21}]^{1/2}\right)}{[\mathrm{tr}(S_{11})\,\mathrm{tr}(S_{22})]^{1/2}}. \tag{7}$$
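Equations (5)-(7) can be sketched in a few lines for paired samples of unit vectors. The sketch assumes sample means for the expectations and takes $[S_{12} S_{21}]^{1/2}$ to be the principal square root of a symmetric positive semidefinite 2 × 2 matrix M, for which tr(M^{1/2}) = [tr(M) + 2 det(M)^{1/2}]^{1/2}. The function names and the test directions are illustrative only.

```python
from math import cos, radians, sin, sqrt
from statistics import fmean


def p(x, y):
    """Uncentered product moment E[xy], Eq. (4), as a sample mean."""
    return fmean(a * b for a, b in zip(x, y))


def s(x, y):
    """Centered covariance E[xy] - E[x]E[y], Eq. (1), as sample means."""
    return p(x, y) - fmean(x) * fmean(y)


def sverdrup(u1, v1, u2, v2):
    """r_S of Eq. (5), built from uncentered product moments."""
    tr12 = p(u1, u2) + p(v1, v2)
    skew = p(u1, v2) - p(v1, u2)
    tr11 = p(u1, u1) + p(v1, v1)
    tr22 = p(u2, u2) + p(v2, v2)
    return sqrt((tr12 ** 2 + skew ** 2) / (tr11 * tr22))


def fisher_lee(u1, v1, u2, v2):
    """r_FL of Eq. (6): det(P12) / [det(P11) det(P22)]^(1/2)."""
    det12 = p(u1, u2) * p(v1, v2) - p(u1, v2) * p(v1, u2)
    det11 = p(u1, u1) * p(v1, v1) - p(u1, v1) ** 2
    det22 = p(u2, u2) * p(v2, v2) - p(u2, v2) ** 2
    return det12 / sqrt(det11 * det22)


def breckling(u1, v1, u2, v2):
    """r_B of Eq. (7), using the 2x2 trace identity stated in the lead-in."""
    a, b, c, d = s(u1, u2), s(u1, v2), s(v1, u2), s(v1, v2)
    tr_m = a * a + b * b + c * c + d * d     # tr(S12 S21)
    sqrt_det_m = abs(a * d - b * c)          # sqrt(det(S12 S21)) = |det(S12)|
    tr11 = s(u1, u1) + s(v1, v1)
    tr22 = s(u2, u2) + s(v2, v2)
    return sqrt(tr_m + 2.0 * sqrt_det_m) / sqrt(tr11 * tr22)


# Hypothetical check: site 2 is site 1 rotated by 30 degrees, or mirrored.
dirs = [0.0, 90.0, 180.0, 270.0]
u1 = [cos(radians(d)) for d in dirs]
v1 = [sin(radians(d)) for d in dirs]
u_rot = [cos(radians(d + 30.0)) for d in dirs]
v_rot = [sin(radians(d + 30.0)) for d in dirs]
v_mir = [-v for v in v1]

rotated = (sverdrup(u1, v1, u_rot, v_rot),
           fisher_lee(u1, v1, u_rot, v_rot),
           breckling(u1, v1, u_rot, v_rot))
mirrored = fisher_lee(u1, v1, u1, v_mir)
```

In this toy case a pure rotation leaves all three coefficients at 1, while a mirror image drives the signed coefficient (6) to −1.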

VOLUME 15

This correlation coefficient takes values from 0 to 1. Hanson et al. (1992) define a variant of this vector correlation coefficient with a sign factor, det(S12 )/|det(S12 )|. If the products p (x, y) in (4) in Sverdrup’s definition (5) are replaced by the centered covariances s (x, y) in (1), the expression (7) of Breckling (1989) is recovered. Crosby et al. (1993) discuss a vector correlation coefficient defined by

$$r_{CBG} = \{\mathrm{tr}([S_{11}]^{-1} S_{12} [S_{22}]^{-1} S_{21})\}^{1/2}, \tag{8}$$

which is essentially the definition given by Hooper (1959) and which was further developed by Jupp and Mardia (1980). The squared correlation coefficient $r_{CBG}^2$ is the sum of squares of the canonical correlations (Crosby et al. 1993) and ranges from 0 to 2. We use in the following a normalized form,

$$r_C = r_{CBG}/\sqrt{2}, \tag{9}$$

which takes values from 0 to 1. Breaker et al. (1994) studied significance tests of this vector correlation. Hanson et al. (1992) summarize and discuss in detail the invariance properties of these and many other correlation coefficients.

3. Directional correlation coefficient for channeled flow

All of the definitions in the last section apply to any type of flow and do not make use of any specific properties of the flow. In contrast, the correlation for channeled flow as defined in this section incorporates the properties of channeled flow into the definition of the correlation coefficient. This specific correlation coefficient is therefore better suited to channeled flows but is not applicable to other types of flow.

The near-surface winds over complex terrain are often channeled by valleys, even showing countercurrents to the geostrophic flow (Wippermann and Gross 1981; Wippermann 1984; Whiteman and Doran 1993). In smaller valleys, thermally induced flows often develop (for a review see Whiteman 1990). To compare wind-direction data from different valleys, it is desirable to have a correlation coefficient that indicates whether up- or down-valley flow prevails in both valleys.

Figure 1 shows the wind rose of a station in the Rhein Valley east of Basel (station E1, see section 5 for more details). Two preferred directions (southeasterly and westerly winds) are evident. This is a good example of a distribution of directions in a flow that is usually referred to as channeled. We term one of them the main wind direction (southeasterly in this case) and the other the secondary wind direction (westerly in this case). It should be noted that these two dominant wind directions are, however, not 180° apart.

FIG. 1. Wind rose of station E1 (see Table 1 and Fig. 9). The length of each 10° sector is proportional to its frequency of occurrence. The circle indicates the 5% level. A uniform angular distribution would have constant lengths of 2.8% for all 36 sectors.

FIG. 2. Wind roses of idealized channeled flows as described in text. The left chart represents the angular distribution of the angle $w_1$ of the independent flow. The right chart shows the distribution of the coupled wind direction $w_2$.

For the definition of a correlation coefficient, each wind direction is assigned to the main or the secondary wind direction, or to the class of all other wind directions; that is, to one of three classes. All wind directions of the main class are assigned the value +1, all wind directions of the secondary class are assigned the value −1, and all other wind directions obtain the value 0. This assignment of values allows the calculation of a standard Pearson cross-correlation coefficient between wind directions at different stations (denoted by i and j) by

$$r^{chf}_{ij} = \frac{E[a_i a_j] - E[a_i]E[a_j]}{[(E[a_i^2] - E[a_i]^2)(E[a_j^2] - E[a_j]^2)]^{1/2}}, \tag{10}$$

where $a_i$ denotes the values −1, 0, +1 assigned to the wind directions at station i.

The choice of the three values −1, 0, +1 seems somewhat arbitrary. However, two of the three values can be chosen arbitrarily without changing the value of the correlation coefficient (10), as in the definition of $r^{chf}$ the variables are standardized. The remaining third value must be fixed, and its choice influences the resulting value of the correlation coefficient. We used a variable quantity C instead of the fixed value 0 for the wind directions not belonging to one of the two dominant classes. The parameter C can then be varied to make $|r^{chf}|$ a maximum. We did this maximization with the data described in section 5 and obtained values of C close to zero (≈0.02). Therefore, C = 0 was chosen throughout this paper, giving values of $|r^{chf}|$ very close to its maximum.

As the three wind direction classes are nominal variables, Pearson's contingency coefficient (Sachs 1982) could be used to describe dependencies between them. However, it has two drawbacks for our intended application. First, it has no sign; thus, it cannot be distinguished whether two stations simultaneously have up-valley winds or one has up-valley and the other down-valley winds. Second, as the contingency coefficient is based on a χ² test of the contingency table (Sachs 1982), it gives all classes the same weight, whereas with our correlation measure the two preferred directions have more weight than the other directions. However, section 5 shows that a cluster analysis based on the correlation matrix gives essentially the same results for our correlation and for the contingency coefficient.

As the correlation coefficient (10) is defined as a standard product-moment correlation, although with discretized variables, Fisher's z-transform (Stuart and Ord 1987) can be used to get an estimate of significance levels. The correlations are calculated from a sample of size N. Let $n_{11}$ be the observed frequency of simultaneous occurrence of the main wind direction at both stations, $n_{22}$ the frequency of the secondary direction at both stations, etc. The 3 × 3 values $n_{kl}$ are distributed following a multinomial distribution with nine classes. For large N (section 5 shows that for our dataset N > 3800) the multinomial distribution is well approximated by a normal distribution (Johnson and Kotz 1969). Hence, for large N a standard test of the correlation coefficient can be used.

4. Test of correlation coefficients with a model for channeled flow

An idealized situation of flow with strong channeling at two different locations is considered. The two wind roses are shown in Fig. 2. The wind direction $w_1$ at the first location has a probability of 42% to be in the main direction (75°, 105°) and of 26% to be in the secondary direction (255°, 285°). These probabilities are similar to the observed ones of station E1, whose wind rose is shown in Fig. 1. The wind direction $w_2$ at the second site is determined from the wind direction $w_1$ at the first site by compressing the wind directions with a northern component ($w_1 < 90°$ or $w_1 \geq 270°$) and stretching the wind directions with a southern component ($90° \leq w_1 < 270°$). The angle g (Fig. 2) indicates how much the angle between the main and secondary directions at the second site deviates from 180°. For g = 0, the transformation is just the identity transformation, and the directions $w_1$ and $w_2$ are perfectly correlated.
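The three-class coding, Eq. (10), and the significance estimate via Fisher's z-transform can be sketched as follows. This is a hedged illustration, not the authors' code: the half-open sector convention, the two-sided test with standard error 1/(N − 3)^{1/2}, the sectors chosen for the second site, and all function names are assumptions. The parameter `other_value` plays the role of C in the text.

```python
from math import sqrt, tanh
from statistics import NormalDist, fmean


def in_sector(direction, sector):
    """True if direction (deg) lies in [start, end), allowing wrap past 360."""
    start, end = sector
    d = direction % 360.0
    return start <= d < end if start <= end else (d >= start or d < end)


def code_direction(direction, main, secondary, other_value=0.0):
    """+1 for the main class, -1 for the secondary class, C otherwise."""
    if in_sector(direction, main):
        return 1.0
    if in_sector(direction, secondary):
        return -1.0
    return other_value


def r_chf(a_i, a_j):
    """Pearson correlation of the coded directions, Eq. (10)."""
    mean_i, mean_j = fmean(a_i), fmean(a_j)
    cov = fmean(x * y for x, y in zip(a_i, a_j)) - mean_i * mean_j
    var_i = fmean(x * x for x in a_i) - mean_i ** 2
    var_j = fmean(y * y for y in a_j) - mean_j ** 2
    return cov / sqrt(var_i * var_j)


def critical_r(n, alpha=0.01):
    """Smallest |r| differing from zero at two-sided level alpha (Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return tanh(z_alpha / sqrt(n - 3))


# Hypothetical simultaneous directions (deg). Site 1 uses the sectors of the
# idealized model; the sectors of site 2 are made up and not 180 deg apart.
site_1 = [80.0, 100.0, 260.0, 270.0, 10.0, 180.0, 90.0]
site_2 = [100.0, 120.0, 240.0, 250.0, 20.0, 180.0, 110.0]
a_1 = [code_direction(d, (75.0, 105.0), (255.0, 285.0)) for d in site_1]
a_2 = [code_direction(d, (95.0, 125.0), (235.0, 265.0)) for d in site_2]

same_class = r_chf(a_1, a_2)                 # classes always coincide
opposite = r_chf(a_1, [-a for a in a_2])     # main and secondary swapped
threshold = critical_r(3800)                 # N quoted for the dataset
```

Because only class membership enters the coding, `same_class` is 1 although the preferred directions at the two sites differ; swapping the main and secondary labels at one site only flips the sign.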
If g ≠ 0, the angles are still correlated in the sense that they simultaneously occur in the main direction class, in the secondary direction class, or in the remaining group. Figure 3 shows the different correlation coefficients discussed in sections 2 and 3 as a function of the angle g. The new correlation coefficient $r^{chf}$ (10) for channeled flow is the only one among the five coefficients that gives a value of 1, independent of the angle g. The other four correlation coefficients, which do not use any specific properties of the flow and may be used for all types of flow, decrease with increasing g. The new coefficient (10), which is only applicable to channeled flow, indicates a perfect correlation even when the main direction and the secondary direction of the wind direction distribution are not 180° apart. This is a desirable property for the analysis of channeled flow, as the class of the direction (for example up- or down-valley) is more important than the direction per se. The coefficient meets our requirement that the correlation be high if both observations are in the same class of directions. We conclude that the new coefficient (10) is best suited for our purpose of characterizing channeled flow.

FIG. 3. Correlation coefficient as a function of the angle g, where 180° − g is the angle between the main and secondary direction for the model described in section 4. Five different correlations are shown: (6) of Fisher and Lee (1986), (7) of Breckling (1989), (5) of Sverdrup (1917), (8) of Crosby et al. (1993), and the correlation (10) for channeled flow (Coeff-chf in the graph).

If noise (white noise with a triangular amplitude distribution was used) is added to the wind direction $w_2$, the two angles become decorrelated to some extent, and a properly defined correlation coefficient should become smaller with increasing noise intensity. In fact, all five correlation coefficients discussed before get smaller with increasing values of noise intensity. Our correlation $r^{chf}$ decays fastest for small and moderate values of noise intensity and thus shows more accurately the presence of an uncorrelated part of the wind directions.

5. Application to wind data from the MISTRAL area and use for a cluster analysis of sites

In 1991–92 the field experiment "Modell für Immissions-Schutz bei Transport und Ausbreitung von Luftfremdstoffen" (model for impact prevention during transport and diffusion of air pollutants; MISTRAL) took place in a region of about 55 km × 55 km around Basel, Switzerland (Kamber and Kaufmann 1992). The MISTRAL experiment was part of an international climatological project called "Regio-Klima-Projekt" (Regio climate project; REKLIP), which takes place in the upper Rhein Valley (Parlow 1992, 1996). The MISTRAL area, shown in Fig. 4, has quite complex topography. The Rhein flows from the east through Basel, where it turns sharply northward, forming the wide upper Rhein Valley bordered by the Schwarzwald (Black Forest) to the east and the Vosges to the west (not shown in Fig. 4). To the south the Jura Mountains separate the Rhein Valley from the Swiss Middleland. Several smaller tributary rivers run through the mountain ranges.

In the MISTRAL area, 50 meteorological stations (triangles in Fig. 4) were operated with the goal of measuring in detail the near-surface winds and finding the typical flow patterns of that region with its complex topography. By means of an automated classification method, 12 typical regional flow patterns could be identified among the 8784 1-h mean wind fields of a 1-yr period (Weber and Kaufmann 1995; Kaufmann and Weber 1996; Kaufmann 1996). For climatological studies of the flow patterns in this region and for applications such as air-pollution control or on-line emergency response planning, it would be desirable to identify the flow patterns from a few stations only, instead of being forced to operate and analyze all 50 stations. We try, therefore, to identify groups of stations with similar behavior of wind direction. Provided such groups exist, it may then be sufficient to select only one station per group and still get all necessary information to identify the regional flow patterns.

A list of the MISTRAL stations is given in Table 1 [more detailed information about the stations can be found in Kaufmann (1996)]. The first column gives the station label used in Fig. 7. The second column of Table 1 indicates the orographic situations in which the stations are located. We distinguish among stations in a valley (V), stations on a valley slope (S), stations in hilly terrain without pronounced orographic features (H), stations on a pass (P), and stations on isolated mountain tops (M).
As can be seen from the sensor heights in Table 1 (fourth column), the anemometers were placed at nonstandard heights, ranging from 6 to 15 m for masts on open space. Stations were also mounted on buildings with sensor heights up to 70 m above ground. Station C (operated by the Swiss Meteorological Institute) is even located on a telecommunication tower 262 m above ground.

FIG. 4. The observation area of the MISTRAL project (55 km × 55 km) around the city of Basel showing the 50 measurement sites (black and white triangles). Contour lines and shading give the height above sea level. White areas are below 300 m MSL, light shaded areas are 300–500 m MSL, medium shaded areas are 500–700 m MSL, dark shaded areas are 700–900 m MSL, and black areas are higher than 900 m MSL. Station locations higher than 700 m MSL are marked with a white triangle. The station labeled by a C is St. Chrischona with its anemometer placed on a tower 262 m above ground.

The classification of the 50 stations into groups is done by a cluster analysis (Anderberg 1973). In a similar way, climatic regions were identified by use of cluster analysis (e.g., Stooksbury and Michaels 1991; Jackson and Weinland 1995). Based on our experience with the classification of wind fields (Weber and Kaufmann 1995; Kaufmann and Weber 1996) we used a hierarchical cluster analysis, the complete linkage method (Anderberg 1973). All hierarchical clustering methods need a measure describing the dissimilarity (or distance) between the objects to be grouped. The complete linkage method is invariant under monotonic transformation of the distance (Jain and Dubes 1988), whereas other clustering methods are sensitive to details of the distance definition. We define a dissimilarity, or distance, measure between two stations i and j by

$$d^{chf}_{ij} = 1 - |r^{chf}_{ij}|, \tag{11}$$

where $r^{chf}_{ij}$ is the correlation coefficient (10) for channeled flow between the wind directions at stations i and j. Because the sign of $r^{chf}$ depends on the arbitrary definition of main and secondary wind direction, we use the absolute value of the correlation coefficient in the distance definition. The definition of the main and secondary wind directions, which are necessary for the calculation of $r^{chf}_{ij}$, is to a certain extent subjective. Figure 5 shows the wind roses of six stations for the selected 1-yr period from 1 September 1991 to 31 August 1992. The main direction is indicated by dark shading, the secondary direction by light shading. Most (34) of the 50 stations have a distinct bimodal distribution of wind direction, indicating a strong channeling of the flow. For some stations, the two preferred directions are about 180° apart (e.g., station V4 in Fig. 5). For many others like S9 (see Fig. 5) the angle between the preferred directions is less than 180°, and for station B7 (Fig. 5), it is only about 110°. Some stations (B1, B2, B4, N1, N2, N3, S4, S5, S6, S7, W1, W6, and W7) have a trimodal distribution of wind direction (B2 and N2 are shown in Fig. 5). In these cases, the third mode is caused by a small tributary valley whose outflow to the main valley gives the third peak in wind direction. Therefore, the wind directions along the main valley axis are taken as main and secondary directions. A few stations have a quite uniform distribution of wind direction (S3 and V5, see station S3 in Fig. 5) or only one evident mode (N7), which makes the assignment of the preferred directions subjective. However, since only three stations are involved, this should not have great influence on the results of the cluster analysis. With the distance measure (11) the matrix of all distances between pairs of stations can be calculated. To see whether there are pairwise correlations significantly


TABLE 1. Location and altitude of the 50 anemometers used in the MISTRAL field experiment. The labels are the ones shown in Fig. 7. The type indicates whether the station is in a valley (V), on the slope of a valley (S), in unspecific hilly terrain (H), on a pass (P), or on a mountain top (M). The third and fourth columns give the altitude of the station above mean sea level (MSL) and the sensor height above ground (AGL). The cluster number in the last column refers to the nine-cluster solution described in section 5.

Label  Type  Altitude (m MSL)  Sensor (m AGL)  Cluster number
B1     V      266               68             4
B2     V      257               64             4
B3     V      270               34             1
B6     V      258               54             1
B7     V      273               43             5
B8     V      282               25             1
B9     V      289               53             1
E1     V      350               13             1
E3     V      297               53             1
N1     V      237               28             4
N2     V      230               30             4
N3     V      294               53             3
N5     V      280               69             1
N7     V      335               13             8
V2     V      293               43             3
V3     V      310               50             3
V4     V      335               48             3
V8     V      422               21             3
W6     V      350               25             6
W8     V      450               14             3
B5     S      350               24             1
S1     S      370               13             1
V1     S      355               13             5
V5     S      470               13             9
V6     S      480                6             2
V7     S      490                7             3
V9     S      475                5             8
B4     H      293               29             6
E2     H      568               13             2
E4     H      557               13             2
E5     H      587               12             2
E6     H      623               13             2
N4     H      460               10             1
N6     H      480               10             1
S3     H      494               15             3
S4     H      598               13             1
S6     H      620               13             2
W1     H      395               13             2
W2     H      440               15             2
W3     H      440               14             5
W4     H      320               54             1
W7     H      300               26             5
S8     P      750               10             7
S9     P      870               13             7
C      M      490              262             2
N8     M      802               10             2
S2     M      712               10             2
S5     M     1175               13             2
S7     M     1001               35             2
W5     M      765               50             4

different from zero, Fisher's z transform was used. At least N = 3800 pairs of valid data enter the calculation of the correlation coefficients. By means of the z transform (Stuart and Ord 1987) it can be estimated that all correlations with |r| > 0.04 are significantly (at the 1% level) different from zero for this N. As the hourly means are autocorrelated in time, the effective degrees of freedom (Bayley and Hammersley 1946) are reduced. Still, even if a reduction by a factor of 10 takes place, at least 380 effective degrees of freedom remain in the worst case, and all correlations with |r| > 0.13 are significantly (at the 1% level) different from zero.

For each station the maximum correlation (in absolute value) to any of the other stations was determined. Considering only the maximum correlation corresponds to a multiple testing of the data (Sneyers 1990). Maintaining the same overall significance level of 0.01, the binomial test described in Sneyers (1990) gives a corrected significance level of 0.0002 for the maximum correlation. The critical value of the correlation is increased for N = 3800 to |r| > 0.06 and for N = 380 to |r| > 0.19. The observed maxima range from 0.28 to 0.84 and are all significantly different from zero.

FIG. 5. Wind roses of six MISTRAL stations for the period from 1 September 1991 through 31 August 1992. The circles indicate the 5% level. The main direction has dark shading, the secondary direction has light shading. The wind roses of V4, S9, and B7 are three of 34 bimodal distributions; B7 has the smallest angle between the preferred directions. The wind roses N2 and B2 are 2 of 13 trimodal wind roses; B2 has the largest third mode relative to the second. Site S3 is one of the two sites with nearly uniform wind roses.

The 50-by-50 correlation matrix is then used for the cluster analysis. The hierarchical cluster analysis successively merges stations into groups (or clusters) until only a single cluster remains that consists of all stations. The number of clusters to be retained can be inferred from a plot of the distance at which clusters are merged versus the number of clusters (Fig. 6). Moving from 50 clusters (each station forming a group of its own) to smaller numbers of clusters, the distance increases, showing abrupt changes at several places. Strong increases can be seen for 30, 27, 9, and 7 clusters in Fig. 6. Selecting 27 or 30 clusters does not make sense, since for these choices fewer than two stations would belong on average to a cluster. We chose nine clusters because the largest increase of distance takes place between 9 and 8 clusters. For a given number of clusters the mean distance within all clusters and the mean distance between all clusters can be calculated. In the case of 9 clusters these distances become 0.25 and 0.56, respectively. Hence, there is a clear separation of the stations into the 9 clusters. The cluster membership of each station is given in the last column of Table 1. The clusters are ordered according to their size: cluster 1 includes 13 stations, cluster 9 only 1 station.

FIG. 6. Distance level at which two clusters are merged in the hierarchical cluster analysis as a function of the number of clusters.

In Fig. 7 the cluster membership of the 50 MISTRAL stations is shown by different symbols in a map of the area. Often, stations that are very close geographically, like B3 and B6 or B8 and S1, belong to the same cluster. However, cluster 3, for example, includes station W8 in the Birs Valley and station N3 in the Wiese Valley, which are not close geographically, but whose orographical location (in a valley, see Table 1) is similar.

Cluster 1 includes stations in the Rhein Valley around Basel, B3, B6, B8, B9, E1, E3, and N5 (all V), but also several stations located in the hilly terrain between the valleys, B5 (S), S1 (S), S4 (H), N4 (H), N6 (H), and W4 (H). [The type of geographical location (V, S, H, P, or M, see Table 1) of the stations is given in parentheses.] Most stations of cluster 1 are strongly influenced by the flow in the Upper Rhein Valley. Only station S4 shows a southerly flow during nighttime, when the Upper Rhein Valley stations show down-valley flow. In contrast to the stations in other valleys, the stations of cluster 1 are located in places that are open to the west, such that strong westerly winds can suppress the valley flows.

The stations of cluster 2 are spread over the whole MISTRAL area. Besides the two stations in the western part, W1 and W2 (both H), the stations at higher altitudes, V6 (S), S2 (M), S5 (M), S6 (H), S7 (M), E2 (H), E4 (H), E5 (H), E6 (H), and N8 (M) and the very exposed location C (M) belong to this cluster. These stations represent larger-scale flow features in contrast to the valley stations of cluster 3, which have locally generated thermal winds.
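The grouping procedure of this section (complete linkage applied to the distance 1 − |r|) can be sketched in plain Python. This is a naive illustration under stated assumptions, not the implementation used for the MISTRAL data: the function name, the tie-breaking by index order, and the four-station correlation matrix are made up for the example.

```python
def complete_linkage(dist, n_clusters):
    """Merge clusters until n_clusters remain; return (clusters, merge levels)."""
    clusters = [[i] for i in range(len(dist))]
    levels = []

    def gap(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        level, p, q = min(
            (gap(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
        levels.append(level)
    return clusters, levels


# Hypothetical correlation matrix for four stations (not MISTRAL data).
r = [[1.0, 0.9, 0.1, -0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, -0.8],
     [-0.1, 0.1, -0.8, 1.0]]
d = [[1.0 - abs(r_ij) for r_ij in row] for row in r]   # distance of Eq. (11)

groups, merge_levels = complete_linkage(d, n_clusters=2)
```

The list `merge_levels` holds the distance at which each merge happens, which is the quantity plotted against the number of clusters in Fig. 6; a large jump between consecutive levels suggests where to stop merging. Stations 2 and 3 end up together despite their negative correlation, because only the absolute value enters the distance.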

FIG. 7. Cluster membership of the stations for the nine clusters as described in text. The area is divided into six regions indicated by a gray letter. Each station has a label consisting of the letter of its region and a digit (see Table 1). The station on the St. Chrischona tower, near the center of the area, is denoted by a C.

Cluster 3 includes stations in the smaller valleys (compared to the Rhein Valley) of the Wiese, N3 (V), the Birs, W8 (V), the Ergolz, V2, V3, and V4 (all V), a smaller tributary river, V7 (S), and V8 (V), and one station, S3 (H) in the hilly terrain near the Ergolz Valley. The main direction at these eight stations was chosen as down-valley and the secondary direction as up-valley. The correlations between the eight stations are all positive, showing that down-valley flow or up-valley flow occurs simultaneously in the smaller valleys. This confirms that these flows are thermally driven (Kaufmann and Weber 1996; Kaufmann 1996). The stations in the Rhein Valley north of Basel, B1, B2, N1, N2 (all V), and W5 (M) form cluster 4. The flow in this part of the Rhein Valley is mainly affected by the channeling through the surrounding mountain ranges of the Schwarzwald and the Vosges (see also Wippermann and Gross 1981; Wippermann 1984). Station W5 south of Basel is located on a mountain and may therefore be influenced by the flow through the Rhein Valley extending to greater height than the local, thermally induced flows in the Birs Valley. Cluster 5 consists of stations in the Birs Valley, B7 (V) and W7 (H), the Ergolz Valley, V1 (S), and station W3 (H) in the hilly terrain between the valleys. These stations are not located directly on the valley axis, but rather on the valley slopes, which may explain why these stations do not belong with the valley stations to cluster 3.


The largest clusters built by a cluster analysis are generally those with small distances (highest correlations) within the clusters. The elements with larger distances are left to build clusters with other outliers. These outliers are not necessarily clustered with the nearest element, but with the nearest element not already part of another cluster. The identification of outliers depends to some extent on the choice of the clustering method (Anderberg 1973).

The two stations B4 (H) and W6 (V) form cluster 6. Both stations have a trimodal wind rose (Fig. 5), and the choice of the preferred directions is not obvious. The correlation between these two stations is relatively low (r_chf = 0.39). Each is slightly more highly correlated with one other site, and both of those sites are members of a larger group. This cluster appears to result from the effect described above.

The two stations S8 and S9 (both P) on passes of the Jura Mountains form cluster 7. They are moderately correlated (r_chf = 0.60). Stations S8 and S9 have strongly channeled flow in south–north and southeast–northwest directions, respectively (Fig. 5). These directions are perpendicular to the Jura Mountain range (Fig. 4). Hence, the wind direction at these pass stations represents the flow from the Swiss Middleland to the Rhein Valley across the Jura Mountains. This flow seems to be governed by mechanisms other than those governing the flow in the MISTRAL area north of the Jura Mountains.

Cluster 8 also consists of only two stations: N7 (V) and V9 (S). Station N7 in the Wehra Valley is special since it has mostly northerly, down-valley winds (see Fig. 5). These two stations are only weakly correlated (r_chf = -0.28) and are presumably outliers that are by chance in the same group. Station N7 actually shows slightly higher correlations with five other stations, all but one of them with north–south channeled flow.

Cluster 9 consists of a single station, V5 (S), located on the south-facing slope of the Ergolz Valley. This station is greatly affected by local influences like sheltering and slope winds and does not follow larger-scale flow patterns (see also Figs. 8 and 9 of Kaufmann and Weber 1996). It is most highly correlated (r_chf = 0.34) with the nearby station V4.

If only seven clusters are chosen, the cluster analysis procedure merges clusters 1 and 7 and clusters 5 and 6. If only five clusters are retained, the procedure additionally merges clusters 8 and 9 and clusters 3 and 4.

For comparison, the same cluster analysis as described above was performed with a correlation matrix based on Pearson’s contingency coefficient (Sachs 1982). In this case 11 clusters emerge. Still, the large clusters 1, 2, 3, and 4 remain essentially the same. The most important difference is that the two pass stations S8 and S9 do not form their own cluster but now belong to cluster 1. Some stations (V6, V9, W4, and N7) form clusters of their own and are not merged into any of the larger clusters.
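The nesting of clusters described above (nine groups, then seven, then five) is the behavior one expects from an agglomerative procedure stopped at different levels. The following minimal sketch illustrates this on an invented 5 × 5 correlation matrix. It assumes the distance d = 1 - r and complete linkage; neither choice is stated in this section, and the function name `complete_linkage` and all numerical values are hypothetical.

```python
from itertools import combinations

# Invented correlation matrix for five hypothetical stations (not MISTRAL data).
corr = [
    [1.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 1.0, 0.1, 0.2, 0.1],
    [0.2, 0.1, 1.0, 0.8, 0.35],
    [0.1, 0.2, 0.8, 1.0, 0.2],
    [0.3, 0.1, 0.35, 0.2, 1.0],
]


def complete_linkage(corr, n_clusters):
    """Merge stations until n_clusters remain (assumed distance: d = 1 - r)."""
    clusters = [[i] for i in range(len(corr))]

    def dist(a, b):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(1.0 - corr[i][j] for i in a for j in b)

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest distance and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


print(complete_linkage(corr, 3))  # [[0, 1], [2, 3], [4]]
print(complete_linkage(corr, 2))  # [[0, 1], [2, 3, 4]]
```

In this toy case station 4 is an outlier that stays alone at three clusters and is absorbed by an existing group only when fewer clusters are requested, which mirrors the treatment of weakly correlated stations discussed above.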


We can compare the station clusters obtained in the present paper (Fig. 7) with the mean winds at the stations for the 12 typical regional flow patterns obtained by the classification of the wind fields (Kaufmann and Weber 1996; Kaufmann 1996). It can be seen from Figs. 8 and 9 of Kaufmann and Weber (1996) that stations belonging to the same cluster show winds consistent with the typical regional flow patterns.

From the clusters obtained, we conclude that the correlation of two stations depends not only on the horizontal distance between the stations but also, and sometimes much more strongly, on the vertical distance between the stations (see also Palomino and Martín 1995) and the orographic features of the station locations. Hence, interpolating winds with weights based only on horizontal distance will not be sufficient in complex terrain. More important than the horizontal distance is whether the stations and the grid points at which winds should be interpolated have similar exposures, orographic features, and altitude. For example, one suitably chosen station in the Birs, Ergolz, or Wiese Valley may well represent the flow in all smaller valleys, as all these flows are thermally forced and the whole MISTRAL area usually experiences about the same incoming solar radiation during a day.

6. Summary and conclusions

For flow patterns with strong channeling, a simple correlation coefficient is defined. It is based on the existence of two preferred directions for a channeled flow. Three classes of directions can then be formed: the main direction class, a secondary direction class, and the remaining directions. Assigning a numeric value to each of these three classes, one can calculate a linear correlation coefficient. In this way a correlation coefficient is defined that takes into account the channeling of the flow. A simple model for channeled flow at two locations was considered.
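The construction summarized above can be sketched in a few lines. The class values (+1 for the main direction class, -1 for the secondary class, 0 for the remaining directions) and the 45° sector half-width are assumptions made for illustration only, since this section does not state them; the function names and the direction series are likewise hypothetical.

```python
import math


def angular_diff(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def direction_class(theta, main, secondary, half_width=45.0):
    """Map a wind direction to an assumed class value: +1, -1, or 0."""
    if angular_diff(theta, main) <= half_width:
        return 1.0   # main direction class (takes precedence if sectors overlap)
    if angular_diff(theta, secondary) <= half_width:
        return -1.0  # secondary direction class
    return 0.0       # remaining directions


def pearson(x, y):
    """Ordinary linear correlation; undefined if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def channeled_correlation(dirs_a, prefs_a, dirs_b, prefs_b):
    """Correlate the class values of two simultaneous direction series."""
    xa = [direction_class(t, *prefs_a) for t in dirs_a]
    xb = [direction_class(t, *prefs_b) for t in dirs_b]
    return pearson(xa, xb)


# Invented simultaneous wind directions (degrees) at two hypothetical sites.
dirs_a = [85.0, 95.0, 270.0, 260.0, 180.0, 90.0, 275.0]  # preferred: 90 / 270
dirs_b = [10.0, 355.0, 120.0, 130.0, 240.0, 5.0, 115.0]  # preferred: 0 / 120

r = channeled_correlation(dirs_a, (90.0, 270.0), dirs_b, (0.0, 120.0))
print(round(r, 6))  # 1.0
```

The second site's preferred directions (0° and 120°) are deliberately not opposite each other; because only class membership enters the calculation, the two series are still perfectly correlated in this toy example.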
The new correlation coefficient, which makes use of the specific properties of channeled flow but is only applicable to such flows, is the only one that gives maximum correlation even if the two preferred directions are not opposite each other and is, therefore, well suited for the analysis of channeled flow. Other correlations taken from the literature, which are applicable to all types of flows and do not make use of any specific flow properties, show a decrease of correlation if the two preferred directions are not opposite each other.

The correlation for channeled flow was applied to a dataset of atmospheric wind measurements. In a mesoscale region over complex terrain, 50 ground stations with anemometers captured the near-surface wind field during the MISTRAL field experiment. The 1-h means over an entire year were used to calculate the correlations between all pairs of stations. Based on this correlation matrix the stations were grouped by a cluster analysis. Nine groups of stations were identified showing similar channeling behavior. The stations within a group are not necessarily close in space, but instead may be located throughout the whole region. More important for similar wind behavior than spatial distance are the topographic location and the altitude of the stations. These results cast some doubt on the widely used interpolation schemes for near-surface wind fields that only use spatial distance for the determination of the weights.

Because many flows in geophysics are influenced by topography, they may experience channeling by these boundaries. Our new correlation coefficient provides a simple tool for the analysis of such channeled flows from a wide range of fields.

Acknowledgments. The project REKLIP/MISTRAL is partly funded by the two cantons of Basel. Additional data were kindly provided by the Swiss Meteorological Institute, Zürich, and the Geographical Institute of the University of Basel.

REFERENCES

Anderberg, M. R., 1973: Cluster Analysis for Applications. Academic Press, 359 pp.
Bayley, G. V., and J. M. Hammersley, 1946: The “effective” number of independent observations in an autocorrelated time series. J. Roy. Stat. Soc., 8 (Suppl.), 184–197.
Breaker, L. C., W. H. Gemmill, and D. S. Crosby, 1994: The application of a technique for vector correlation to problems in meteorology and oceanography. J. Appl. Meteor., 33, 1354–1365.
Breckling, J., 1989: The Analysis of Directional Time Series: Applications to Wind Speed and Direction. Vol. 61, Lecture Notes in Statistics, Springer-Verlag, 238 pp.
Crosby, D. S., L. C. Breaker, and W. H. Gemmill, 1993: A proposed definition for vector correlation in geophysics: Theory and application. J. Atmos. Oceanic Technol., 10, 355–367.
Dietzius, R., 1916: Ausdehnung der Korrelationsmethode und der Methode der kleinsten Quadrate auf Vektoren. Sitzungsber. Akad. Wiss. Wien Math. Naturwiss. Kl. Abt 2a, 125, 3–20.
Essenwanger, O. M., 1986: General Climatology, 1B: Elements of Statistical Analysis. Elsevier, 424 pp.
Fisher, N. I., 1993: Statistical Analysis of Circular Data. Cambridge University Press, 277 pp.
Fisher, N. I., and A. J. Lee, 1983: A correlation coefficient for circular data. Biometrika, 70, 327–332.
Fisher, N. I., and A. J. Lee, 1986: Correlation coefficients for random variables on a unit sphere or hypersphere. Biometrika, 73, 159–164.
Hanson, B., K. Klink, K. Matsuura, S. M. Robeson, and C. J. Willmott, 1992: Vector correlation: Review, exposition, and geographic application. Ann. Assoc. Amer. Geogr., 82, 103–116.


Hooper, J. W., 1959: Simultaneous equations and canonical correlation theory. Econometrica, 27, 245–256.
Jackson, I. J., and H. Weinland, 1995: Classification of tropical rainfall stations: A comparison of clustering techniques. Int. J. Climatol., 15, 985–994.
Jain, A. K., and R. C. Dubes, 1988: Algorithms for Clustering Data. Prentice-Hall, 320 pp.
Johnson, N. L., and S. Kotz, 1969: Distribution in Statistics: Discrete Distributions. John Wiley & Sons, 328 pp.
Jupp, P. E., and K. V. Mardia, 1980: A general correlation coefficient for directional data and related regression problems. Biometrika, 67, 163–173.
Kamber, K., and P. Kaufmann, 1992: Das Mistral-Messnetz, Konzeption, Aufbau und Betrieb. Regio Basiliensis, 33, 107–114.
Kaufmann, P., 1996: Regionale Windfelder über komplexer Topographie. Ph.D. thesis No. 11565, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, 147 pp. [Available from Dr. Rudolf O. Weber, Paul Scherrer Institute, CH-5232 Villigen PSI, Switzerland.]
Kaufmann, P., and R. O. Weber, 1996: Classification of mesoscale wind fields in the MISTRAL field experiment. J. Appl. Meteor., 35, 1963–1979.
Mardia, K. V., 1972: Statistics of Directional Data. Academic Press, 357 pp.
Palomino, I., and F. Martín, 1995: A simple method for spatial interpolation of the wind in complex terrain. J. Appl. Meteor., 34, 1678–1693.
Parlow, E., 1992: REKLIP—Klimaforschung statt Meinungsmache am Oberrhein. Regio Basiliensis, 33, 71–80.
Parlow, E., 1996: The regional climate project REKLIP—An overview. Theor. Appl. Climatol., 53, 3–7.
Sachs, L., 1982: Applied Statistics: A Handbook of Techniques. Springer-Verlag, 706 pp.
Sneyers, R., 1990: On the statistical analysis of series of observations. World Meteorological Organization Tech. Note 143, 192 pp. [Available from WMO, Case Postale 2300, CH-1211 Geneva 2, Switzerland.]
Stooksbury, D. E., and P. J. Michaels, 1991: Cluster analysis of southeastern U.S. climate stations. Theor. Appl. Climatol., 44, 143–150.
Stuart, A., and J. K. Ord, 1987: Kendall’s Advanced Theory of Statistics, Vol. 1: Distribution Theory. Charles Griffin & Company Ltd., 604 pp.
Sverdrup, H. U., 1917: Über die Korrelation zwischen Vektoren mit Anwendungen auf Meteorologische Aufgaben. Meteor. Z., 34, 285–291.
Weber, R. O., and P. Kaufmann, 1995: Automated classification scheme for wind fields. J. Appl. Meteor., 34, 1133–1141.
Whiteman, C. D., 1990: Observations of thermally developed wind systems in mountainous terrain. Atmospheric Processes over Complex Terrain, Meteor. Monogr., No. 45, Amer. Meteor. Soc., 5–42.
Whiteman, C. D., and J. C. Doran, 1993: The relationship between overlying synoptic-scale flows and winds within a valley. J. Appl. Meteor., 32, 1669–1682.
Wippermann, F., 1984: Air flow over and in broad valleys: Channeling and counter-current. Beitr. Phys. Atmos., 57, 92–105.
Wippermann, F., and G. Gross, 1981: On the construction of orographically influenced wind roses for given distributions of the large-scale wind. Beitr. Phys. Atmos., 54, 492–501.