
JOURNAL OF HYDROMETEOROLOGY

VOLUME 15

Application Potential of Four Nontraditional Similarity Metrics in Hydrometeorology

RUPING MO
Environment Canada, Vancouver, British Columbia, Canada

CHENGZHI YE
Hunan Meteorological Service, Changsha, Hunan, China

PAUL H. WHITFIELD
Environment Canada, Vancouver, British Columbia, and Centre for Hydrology, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

(Manuscript received 15 August 2013, in final form 21 April 2014)

ABSTRACT

This paper presents a review and assessment of four nontraditional similarity metrics that can be applied to hydrological and meteorological data. These metrics are 1) the uncentered correlation coefficient, 2) the Hodgkin–Richards index, 3) the Petke index, and 4) the Wang–Bovik index. The first metric has been widely used in hydrometeorology, and the other three have been proposed in other disciplines for similarity analysis. It is demonstrated that these similarity metrics, in their original formulations, either do not actually have the purported advantage over the traditional Pearson correlation coefficient or are not suitable for some hydrometeorological applications. They are reformulated in this study to address these deficiencies. The resulting modified metrics are unitless, bounded, and proportional to the Pearson correlation coefficient, and three of them have the confirmed advantage of explicitly penalizing for differences in the mean and/or in the variance. Two application examples are used to demonstrate the applicability of these similarity metrics in hydrometeorology. A metavalidation model and a graphical tool (Taylor diagram) are used to evaluate the performances of these similarity metrics. In a case study of analog analysis, the Wang–Bovik index stands out as the best metric for simulation of the human perception of similarity between two-dimensional patterns, whereas the modified Petke index and the traditional root-mean-square distance may perform slightly better than the others in the regions with a very large difference between the variances.

1. Introduction

Similarity (or dissimilarity) between two patterns is a fundamental concept in many disciplines, including biology, economics, psychology, geophysics, chemistry, and information technology (Boas 1922; Gressens and Mouzon 1927; Cattell 1949; Goodall 1966; Sepkoski 1974; Geller and Mueller 1980; Boyle et al. 1990; Maggiora et al. 2002; Wang and Bovik 2009; Kufareva and Abagyan 2012). The term "pattern" is used here in its generic sense, applied in both temporal and spatial

Corresponding author address: Dr. R. Mo, National Laboratory for Coastal and Mountain Meteorology, Environment Canada, 201-401 Burrard St., Vancouver, BC V6C 3S5, Canada. E-mail: [email protected]
DOI: 10.1175/JHM-D-13-0140.1
© 2014 American Meteorological Society

dimensions. In many hydrometeorological studies (e.g., forecast verification, analog and predictability analysis, and pattern recognition), the primary method of validating models is by quantifying the overall similarity between modeled and observed patterns using statistical measures (Nash and Sutcliffe 1970; Willmott 1981; Murphy and Winkler 1987; Santer et al. 1993; Legates and McCabe 1999; Taylor 2001; Cannon and Hsieh 2008; Wang and Fan 2009; Fleming and Whitfield 2010; Delle Monache et al. 2011; Wu et al. 2013). The most commonly used similarity metrics are the root-mean-square distance (RMSD) and the Pearson correlation coefficient (PCC; Pearson 1896). In cases where the ordering of the data does not count (e.g., when the time series is viewed as a distribution), other metrics would be used, and the interested reader can consult a recent

OCTOBER 2014

MO ET AL.

paper by Grenier et al. (2013). It is well known that the RMSD is sensitive to scaling and differences in the means, whereas the PCC is not. Since the RMSD and the PCC often provide complementary statistical information about the correspondence between two patterns, using only one of them to "summarize" the degree of similarity in complicated hydrometeorological situations would be incomplete (e.g., Brier and Allen 1951; McCuen and Snyder 1975; Gutzler and Shukla 1984; Yarnal 1984; Murphy 1988; van den Dool 1989; Toth 1991; Taylor 2001; Wang et al. 2011). Generally speaking, similarity needs to consider both pattern association and statistical distribution.

The RMSD is a geometric-distance criterion. This metric retains the unit of the input data, making it directly interpretable as a mean error magnitude (Wilks 2011). Because its value varies from 0 to ∞, with similarity decreasing with increasing values, the RMSD is often considered as a measure of dissimilarity rather than similarity. An inherent disadvantage of the RMSD in similarity analysis is that it is dominated by the error amplitudes and cannot offer clear information about pattern association or fidelity (Gutzler and Shukla 1984; Wang et al. 2004; Wang and Bovik 2009; Kufareva and Abagyan 2012). In addition, the RMSD cannot be used directly to identify antianalogs (i.e., the exact inverse of the signal; van den Dool 1987; Livezey and Barnston 1988).

It is generally more convenient to use a unitless, bounded metric to quantify the degree of similarity, especially when comparing model performances with multiple variables. The PCC is a unitless, bounded metric that measures the linear association between two data series. As a similarity metric, however, it has the disadvantage of not taking into account any systematic differences in the means and variances.
Thus, a high degree of correlation can exist, despite large differences in the pattern means and/or variances, which may indicate a lack of similarity. Therefore, the PCC is often considered as an inexact similarity metric for the purpose of forecast verification (Brier and Allen 1951; McCuen and Snyder 1975; Willmott 1981; Murphy 1995; Legates and McCabe 1999; Wang et al. 2011).

Alternative similarity metrics have been proposed for some specific applications in hydrometeorology. The "S1" score, proposed by Teweles and Wobus (1954), measures the skill in predicting gradients that are meteorologically important. Nash and Sutcliffe (1970) defined the coefficient of efficiency as unity minus the ratio of the mean-square error of a model to the variance of the observed data. This metric can be analytically related to the PCC and model biases (Murphy 1988, 1995). McCuen and Snyder (1975) proposed a modified


correlation index for comparing hydrographs; it is defined as the PCC adjusted by a ratio of the variances of the two compared hydrographs. Toth (1991) compared nine metrics that could be applied in circulation analog analysis. Ehret and Zehe (2011) proposed a series distance to quantify the similarity of two hydrographs on the scale of hydrological events. The present work considers some other nontraditional similarity metrics that might have general applicability in hydrometeorology. It would be practically impossible to examine a large collection of mathematically defined metrics that have been proposed in various fields for similarity analysis (e.g., Cronbach and Gleser 1953; Sepkoski 1974; Boyle et al. 1990; Maggiora et al. 2002; Garrett-Mayer 2006; Brunet et al. 2012). This study focused on four correlation-like metrics: the uncentered correlation coefficient (UCC), the Hodgkin–Richards (HR) index, the Petke index, and the Wang–Bovik (WB) index. These metrics are unitless, bounded, and sensitive to the pattern association (correlation) and differences in the mean and variance (the first two distributional moments). Therefore, they have the potential to overcome the above-mentioned limitations associated with the PCC and the RMSD in similarity analysis. The UCC is similar to the PCC, but it is evaluated without removing the pattern means from the data. A variant of this metric, known as the anomaly correlation coefficient in meteorology, has been widely used to measure the degree of correspondence between forecasts and observations (Miyakoda et al. 1972; Hollingsworth et al. 1980; van den Dool 1987; Stanski et al. 1989; Murphy 1995; Shukla et al. 2000; DelSole and Shukla 2006; Wang and Fan 2009; Wilks 2011). Many studies use the UCC as a similarity metric, but they do not state clearly why the UCC is preferred over the PCC. 
It is generally assumed that, because the mean is not removed from the data, the score from the application of UCC would be penalized for the difference between the means (Garrett-Mayer 2006; Smith et al. 2013); however, this conventional expectation is not always justified (Santer et al. 1993; Potts et al. 1996). Ward and Folland (1991) proposed that a modified version of the UCC can explicitly penalize for the difference in the means. The deficiencies in this modified UCC and its variants are highlighted in this study. The other three similarity metrics (Hodgkin–Richards, Petke, and Wang–Bovik) are widely used in other disciplines such as chemistry and information technology (e.g., Hodgkin and Richards 1987; Petke 1993; Wang and Bovik 2002, 2009), though they are perhaps new to meteorologists and hydrologists. The present work serves to illustrate their potential applicability in


hydrometeorology, with a focus on their sensitivity to three fundamental factors in similarity analysis: difference between the means, difference between the variances, and correlation.

2. Traditional similarity metrics: RMSD versus PCC

Let $x = (x_n;\ n = 1, 2, \ldots, N)$ and $y = (y_n;\ n = 1, 2, \ldots, N)$ represent two comparable patterns measured at N discrete points. The similarity between x and y is commonly measured by either the RMSD $D_{xy}$ or the PCC $R_{xy}$:

$$D_{xy} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(x_n - y_n)^2} \quad \text{with } D_{xy} \in [0, \infty), \tag{1}$$

$$R_{xy} = s_{xy}/(s_x s_y) \quad \text{with } R_{xy} \in [-1, +1], \tag{2}$$

where sample means ($\bar{x}$, $\bar{y}$), standard deviations ($s_x$, $s_y$), and covariance $s_{xy}$ are defined by

$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \bar{y} = \frac{1}{N}\sum_{n=1}^{N} y_n,$$
$$s_x^2 = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2, \qquad s_y^2 = \frac{1}{N}\sum_{n=1}^{N}(y_n - \bar{y})^2,$$
$$s_{xy} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})(y_n - \bar{y}). \tag{3}$$

FIG. 1. Examples showing the limitations of the PCC and RMSD as similarity metrics; t represents time for time series.

If x and y resemble one another, they form a pair of analogs. They are considered as perfect analogs if $y = x$, thereby $D_{xy} = 0$ and $R_{xy} = 1$. They are considered as perfect antianalogs if $y = 2\bar{x} - x$ (so that the two patterns share the same mean and variance), thereby $D_{xy} = 2s_x$ and $R_{xy} = -1$. An ideal similarity metric would have the three properties of 1) uniquely detecting a pair of perfect analogs, 2) giving the degree to which two patterns correspond, and 3) uniquely detecting a pair of perfect antianalogs. For the identification of perfect analogs, $D_{xy} = 0$ is a necessary and sufficient condition, but $R_{xy} = 1$ is just a necessary condition. Either $D_{xy} = 2s_x$ or $R_{xy} = -1$ is only a necessary condition for identifying a pair of perfect antianalogs.

As a dimensioned quantity, the RMSD has the intuitive meaning of proximity: the smaller the average distance, the greater the similarity between x and y. It has a range from zero (perfect analogs) to infinity (total dissimilarity). This metric, however, is not a good measure of phase match or structural alignment (e.g., Gutzler and Shukla 1984; Wang and Bovik 2009; Kufareva and Abagyan 2012). This limitation is illustrated in Fig. 1, where the red line ($x = \sin t$) is a perfect antianalog of the cyan line ($u = -\sin t$) and an imperfect analog of the blue line ($y = 3\sin t$). The phase differences are 180° between x and u, and 0° between x and y. However, the RMSD values of these two combinations are identical ($D_{xu} = D_{xy} = \sqrt{2}$). In other words, the RMSD cannot distinguish an in-phase pattern from an out-of-phase pattern and cannot detect a pair of antianalogs. Note that the values of $D_{xy}$ and $D_{xz}$ are also identical, despite the fact that there is a 60° phase difference between the violet line ($y = \sqrt{2} + \sin t$) and the green line [$z = \sqrt{1.5} + \sin(t + \pi/3)$]. In this example, the functions are continuous rather than discrete time series. Therefore, the summations in (1) and (3) must be replaced by integrals.

The PCC is widely used as a measure of linear association between x and y. It has the desirable properties of being unitless, bounded by a maximum of 1 for $x = y$ and a minimum of −1 for $x = -y$. It is sensitive to the phase shift between the patterns. For example, when two time series are compared, $R_{xy} = 1$ means in phase, $R_{xy} = 0$ corresponds to a phase difference of 90°, and $R_{xy} = -1$ corresponds to a phase difference of 180° (see Fig. 1). However, since the PCC is computed from the deviations of the patterns from their respective means, and the normalization in (2) rescales the patterns so that their variances also match, it is entirely independent of differences in the mean and variance (Brier and Allen 1951; McCuen and Snyder 1975; Garrett-Mayer 2006). For example, substitution of $w = \pm ax + b$ with $a > 0$ into (2) always gives $R_{xw} = \pm 1$, irrespective of the actual values of constants a and b. In Fig. 1, the obvious


difference between the means of $x = \sin t$ (red line) and $y = \sqrt{2} + \sin t$ (violet line) cannot be detected from the PCC ($R_{xy} = 1$, $\bar{y} - \bar{x} = \sqrt{2}$). For a similarity metric, insensitivity to known differences is a disadvantage. Also note in Fig. 1 that, since $R_{xu} = -1$, one can identify the cyan line as a perfect antianalog of the red line; however, where other combinations also produce a PCC of −1, one might be misled into treating them as perfect antianalogs of the red line.

Ward and Folland (1991) and Taylor (2001) showed that (1) can be rearranged to yield

$$D_{xy}^2 = d_{xy}^2 + \hat{D}_{xy}^2, \tag{4}$$

where $d_{xy}$ is the difference between the means (or model bias) and $\hat{D}_{xy}$ is the centered RMSD, defined respectively by

$$d_{xy} = \bar{x} - \bar{y} \tag{5}$$

and

$$\hat{D}_{xy} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left[(x_n - \bar{x}) - (y_n - \bar{y})\right]^2} = \sqrt{s_x^2 + s_y^2 - 2 s_x s_y R_{xy}}. \tag{6}$$

Noticing that the PCC (i.e., $R_{xy}$) and the RMSD (i.e., $D_{xy}$) provide complementary statistical information on pattern similarity, Taylor (2001) devised a diagram based on (6) to provide a graphical summary of how closely a pattern (or a set of patterns) matches a reference pattern in terms of their PCC and centered RMSD. Figure 2 is an example of such a Taylor diagram, in which the red point represents a reference pattern ($x = \sin t$; see Fig. 1) and the blue, cyan, and green points represent three test patterns given by $y = 3\sin t$, $u = -\sin t$, and $z = \sqrt{1.5} + \sin(t + \pi/3)$; if a violet point for $y = \sqrt{2} + \sin t$ were also plotted, it would be at the same location as the red point. The distance between each test point and the reference point on the plot equals the centered RMSD (i.e., $\hat{D}_{xy}$). For the green (cyan, blue) pattern, the correlation with the red pattern is 0.5 (−1.0, 1.0). The standard deviation of each pattern is proportional to the radial distance from the origin. The gray solid contours (with the same magnitude and unit as the standard deviation) indicate the centered RMSD between the test patterns and the reference pattern. Because the Taylor diagram focuses on centered pattern differences and excludes differences in the means, the latter information is presented as "bias" in the legend of Fig. 2. To close this section, note that (4) can be further rearranged into the following form,

FIG. 2. Sample Taylor diagram displaying statistical comparisons of a reference pattern ($x = \sin t$) in red, with three test patterns: $y = 3\sin t$ in blue, $u = -\sin t$ in cyan, and $z = \sqrt{1.5} + \sin(t + \pi/3)$ in green. The model bias (the reference mean minus the test mean) is indicated in the legend. The point position of each pattern on the polar graph is determined by a radial distance equal to its std dev and an azimuthal angle equal to the arc cosine of its correlation with the reference pattern; the dashed radial lines are labeled nonlinearly by the cosine of the angle (i.e., the correlation coefficient). The solid gray lines measure the distance from the reference point and indicate the centered RMSD. See Taylor (2001) for further details.

$$D_{xy}^2 = (\bar{x} - \bar{y})^2 + (s_x - s_y)^2 + 2 s_x s_y (1 - R_{xy}). \tag{7}$$

The above equation indicates that $D_{xy}^2$ is a summary measure of difference between the means, difference between the variances, and the loss of correlation.
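The values quoted for Fig. 1 and the decomposition in (7) can be checked numerically. The following is a minimal Python sketch, assuming that the continuous curves are sampled at N equally spaced points over one full period so that the sums in (1) and (3) stand in for the integrals; the helper names (`mean`, `std`, `rmsd`, `pcc`, `rmsd_sq_from_parts`) and the sample size are illustrative choices and do not come from the paper.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    # Standard deviation with 1/N normalization, as in Eq. (3).
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def rmsd(x, y):
    # Root-mean-square distance, Eq. (1).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def pcc(x, y):
    # Pearson correlation coefficient, Eq. (2).
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return sxy / (std(x) * std(y))

def rmsd_sq_from_parts(x, y):
    # Right-hand side of Eq. (7): mean, variance, and correlation terms.
    sx, sy = std(x), std(y)
    return ((mean(x) - mean(y)) ** 2
            + (sx - sy) ** 2
            + 2 * sx * sy * (1 - pcc(x, y)))

# Assumed sampling: one full period at N equally spaced points.
N = 3600
t = [2 * math.pi * n / N for n in range(N)]
x = [math.sin(v) for v in t]                                  # red line
u = [-math.sin(v) for v in t]                                 # cyan line
y_blue = [3 * math.sin(v) for v in t]                         # blue line
z = [math.sqrt(1.5) + math.sin(v + math.pi / 3) for v in t]   # green line

for name, test in (("u", u), ("y", y_blue), ("z", z)):
    print(name, rmsd(x, test), pcc(x, test), rmsd_sq_from_parts(x, test))
```

In this sketch the three test patterns share essentially the same RMSD from the reference while their correlations differ, which is the limitation of the RMSD discussed above.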

3. Four correlation-like similarity metrics and their variants

As demonstrated above and widely in the literature, the PCC as a similarity metric is insensitive to differences in the mean and variance. Many alternative similarity metrics have been proposed to overcome this limitation; four of them are considered in this section for hydrometeorological applications.

a. The uncentered correlation coefficient

The UCC is also known as the cosine coefficient (Salton and Lesk 1968). It can be defined by

$$C_{xy} = \sum_{n=1}^{N} x_n y_n \Bigg/ \sqrt{\sum_{n=1}^{N} x_n^2 \sum_{n=1}^{N} y_n^2}. \tag{8}$$

This metric is similar to $R_{xy}$ defined in (2), but it is evaluated without removal of the means from $x_n$ and $y_n$ (i.e., not centered). If $x_n$ and $y_n$ are same-sign variables (e.g., precipitation amounts), then $C_{xy}$ is bounded by 0 and 1 and has the value of 1 when two compared objects are identical. When $x_n$ and $y_n$ can have both positive and negative values, $C_{xy}$ varies in the same range as $R_{xy}$, from −1 to 1. As illustrated in appendix A, $C_{xy}$ can be related to $R_{xy}$ by

$$C_{xy} = \left(\frac{s_x s_y}{h_x h_y}\right) R_{xy} + \frac{\bar{x}\,\bar{y}}{h_x h_y}, \tag{9}$$

where $\bar{x}$, $\bar{y}$, $s_x$, and $s_y$ are defined in (3) and

$$h_x = \sqrt{\bar{x}^2 + s_x^2} \quad \text{and} \quad h_y = \sqrt{\bar{y}^2 + s_y^2}. \tag{10}$$

Miyakoda et al. (1972) pioneered the meteorological application of UCC in the context of model verification. In their formula, the climatological average value of the observed field at each data point $c_n$ is first subtracted from both the observation $x_n$ and forecast $y_n$. Then, the resulting anomalies $\tilde{x}_n = x_n - c_n$ and $\tilde{y}_n = y_n - c_n$ are used to replace $x_n$ and $y_n$ in (8). This UCC variant has been widely known as the "anomaly correlation coefficient" in the meteorological community (Stanski et al. 1989; Potts et al. 1996; DelSole and Shukla 2006; Wilks 2011). Note that an integral version of (8) was introduced by Zawadzki (1973) to analyze the statistical properties of precipitation patterns [also see Germann and Zawadzki (2002)]. In modern chemical research, an integral version of (8) proposed by Carbó et al. (1980) has been widely known as the Carbó index (Hodgkin and Richards 1987; Petke 1993; Maggiora et al. 2002).

The usefulness of $C_{xy}$ as an alternative to $R_{xy}$ in similarity analysis depends on the assumption that $C_{xy}$ is capable of penalizing for difference between the means in the sense that $|C_{xy}|$ could be substantially lower if a substantial mean difference exists. However, this conventional expectation is not always justified (Santer et al. 1993; Potts et al. 1996). For example, substitution of $y_n = a x_n$ with $a \neq 0$ into (8) gives $C_{xy} = a^{-1}|a| = \pm 1$, which is independent of the mean difference given by $d_{xy} = (1 - a)\bar{x}$. To obtain a more useful similarity metric, Ward and Folland (1991) proposed a modified UCC as

$$C'_{xy} = C_{x'y'} = \sum_{n=1}^{N} x'_n y'_n \Bigg/ \sqrt{\sum_{n=1}^{N} x'^{\,2}_n \sum_{n=1}^{N} y'^{\,2}_n}, \tag{11}$$

where $x'_n$ and $y'_n$ are linear transformations of $x_n$ and $y_n$, defined by

$$x'_n = x_n - \bar{x} \quad \text{and} \quad y'_n = y_n - \bar{x} = (y_n - \bar{y}) - d_{xy}. \tag{12}$$

As illustrated in appendix A, $C'_{xy}$ can be related to $R_{xy}$ by

$$C'_{xy} = \frac{s_y}{\sqrt{d_{xy}^2 + s_y^2}}\, R_{xy}. \tag{13}$$

For $d_{xy} = 0$, the above equation indicates that $C'_{xy} = R_{xy}$. For $d_{xy} \neq 0$, the inequality $|C'_{xy}| < |R_{xy}|$ always holds. Therefore, $C'_{xy}$ explicitly penalizes for $d_{xy}$ (the mean difference) in the sense that $|C'_{xy}| \to 0$ as $d_{xy}^2 \to \infty$. Note that $C'_{xy}$ is asymmetric under interchange of x and y, that is, $C'_{xy} \neq C'_{yx}$, where

$$C'_{yx} = \frac{s_x}{\sqrt{d_{xy}^2 + s_x^2}}\, R_{xy}. \tag{14}$$

A modified UCC being symmetric under interchange of x and y can be defined as

$$C^{\dagger}_{xy} = \frac{1}{2}\left(C'_{yx} + C'_{xy}\right) = \left(\frac{s_x/2}{\sqrt{d_{xy}^2 + s_x^2}} + \frac{s_y/2}{\sqrt{d_{xy}^2 + s_y^2}}\right) R_{xy}. \tag{15}$$

In the above equations, $C'_{xy}$, $C'_{yx}$, and $C^{\dagger}_{xy}$ are proportional to $R_{xy}$ and are explicitly penalized for $d_{xy}$. These modified versions of UCC are affected by the pattern variances in complex ways depending on the size of $d_{xy}^2$, but they are not necessarily penalized for the difference between the variances. To demonstrate this, consider two patterns with $s_y = b s_x$ and $b \geq 1$, then (15) can be rewritten as

$$C^{\dagger}_{xy} = \left(\frac{s_x/2}{\sqrt{d_{xy}^2 + s_x^2}} + \frac{s_x/2}{\sqrt{b^{-2} d_{xy}^2 + s_x^2}}\right) R_{xy}. \tag{16}$$

Because the difference between the variances grows with b in this case, the above equation indicates that, when $d_{xy} \neq 0$, $|C^{\dagger}_{xy}|$ increases as the variance difference increases. As $b \to \infty$, the second term in the parentheses tends to 1/2, so that $|C^{\dagger}_{xy}|$ approaches $\frac{1}{2}\left[1 + s_x/\sqrt{d_{xy}^2 + s_x^2}\right]|R_{xy}|$, which exceeds its value at $b = 1$. In other words, $C^{\dagger}_{xy}$ is actually depenalized for the variance difference in this case. In similarity analysis, this is obviously undesirable.
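The behavior of the UCC and its modified versions described in this subsection can be reproduced with a short script. The following is a minimal Python sketch, assuming small hand-made patterns; the function names (`ucc`, `ucc_prime`, `ucc_symmetric`) and the sample values are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def ucc(x, y):
    # Uncentered correlation coefficient, Eq. (8).
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def ucc_prime(x, y):
    # Eqs. (11)-(12): subtract the mean of the first pattern from both
    # patterns, then apply Eq. (8) to the transformed data.
    mx = sum(x) / len(x)
    return ucc([a - mx for a in x], [b - mx for b in y])

def ucc_symmetric(x, y):
    # Symmetric modified UCC, Eq. (15).
    return 0.5 * (ucc_prime(y, x) + ucc_prime(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Pure scaling (y = a x) changes the mean, yet the magnitude of C_xy is unchanged.
print(ucc(x, [3 * a for a in x]), ucc(x, [-2 * a for a in x]))

# A pattern with a different mean and variance: the two primed scores differ.
y = [2 * a + 1 for a in x]
print(ucc_prime(x, y), ucc_prime(y, x), ucc_symmetric(x, y))

# Fixed mean offset, growing variance ratio b: the symmetric score rises with b.
xbar = sum(x) / len(x)
offset = 4.0
scores_by_b = [
    ucc_symmetric(x, [xbar + offset + b * (a - xbar) for a in x])
    for b in (1, 2, 10)
]
print(scores_by_b)
```

The last loop keeps the mean difference fixed and stretches only the variance of the second pattern, which is the situation considered in (16).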

b. The Hodgkin–Richards index

Noticing that the Carbó index (an integral UCC) is insensitive to differences between the variances, Hodgkin and Richards (1987) proposed an alternative metric to measure the similarity between two molecules in terms of their electron densities. The discrete version of the HR index can be given as

$$H_{xy} = 2\sum_{n=1}^{N}(x_n y_n) \Bigg/ \sum_{n=1}^{N}(x_n^2 + y_n^2). \tag{17}$$

The above equation can be rearranged to yield

$$H_{xy} = 1 - \sum_{n=1}^{N}(x_n - y_n)^2 \Bigg/ \sum_{n=1}^{N}(x_n^2 + y_n^2), \tag{18}$$

which is essentially a normalized mean-square distance subtracted from unity. As demonstrated in appendix A, $H_{xy}$ can be related to $C_{xy}$ by

$$H_{xy} = \left(\frac{2 h_x h_y}{h_x^2 + h_y^2}\right) C_{xy}. \tag{19}$$

The HR index was proposed as an alternative to the UCC based on the assumption that it can penalize for differences both in the mean and in the variance (Hodgkin and Richards 1987; Petke 1993). However, none of the above three expressions of $H_{xy}$ provides a clear indication of such penalty. In fact, this expectation is not always true. For example, substituting $y_n = a x_n$ into (17) yields $H_{xy} = 2a/(1 + a^2)$, which is not related to the mean difference given by $(\bar{x} - \bar{y}) = (1 - a)\bar{x}$ or to the variance difference given by $(s_x - s_y) = (1 - a)s_x$. On the other hand, the same data transformations given in (12) can be used to define a modified HR index as

$$H^{\dagger}_{xy} = 2\sum_{n=1}^{N}(x'_n y'_n) \Bigg/ \sum_{n=1}^{N}(x'^{\,2}_n + y'^{\,2}_n). \tag{20}$$

Substituting (12) into (20) yields (see appendix A),

$$H^{\dagger}_{xy} = \left[\frac{2 s_x s_y}{d_{xy}^2 + s_x^2 + s_y^2}\right] R_{xy}, \tag{21}$$

which shows that $H^{\dagger}_{xy}$ is symmetric under interchange of x and y and explicitly penalizes for the mean difference ($d_{xy} = \bar{x} - \bar{y}$) and the variance difference ($s_x - s_y$). Furthermore, combining (21) with (7) yields

$$H^{\dagger}_{xy} = 1 - \frac{d_{xy}^2 + (s_x - s_y)^2 + 2 s_x s_y (1 - R_{xy})}{s_x^2 + s_y^2 + d_{xy}^2} = 1 - \frac{D_{xy}^2}{s_x^2 + s_y^2 + d_{xy}^2}. \tag{22}$$

In (22), $H^{\dagger}_{xy}$ is a normalized mean-square distance subtracted from unity. Following the convention of Boas (1922), one can consider the normalized mean-square distance as a measure of dissimilarity and its complement $H^{\dagger}_{xy}$ as a measure of similarity.

c. The Petke index

Petke (1993) proposed an alternative to the HR index:

$$P_{xy} = \frac{\sum_{n=1}^{N}(x_n y_n)}{\max\left(\sum_{n=1}^{N} x_n^2,\ \sum_{n=1}^{N} y_n^2\right)} = \left[\frac{h_x h_y}{\max(h_x^2, h_y^2)}\right] C_{xy}. \tag{23}$$

The derivation of the second expression in (23) is given in appendix A. This metric will be referred to as the Petke index. Maggiora et al. (2002) demonstrated that the following inequalities always hold:

$$|P_{xy}| \leq |H_{xy}| \leq |C_{xy}| \leq 1. \tag{24}$$

FIG. 3. Variation of four similarity metrics $R_{xy}$, $C_{xy}$, $H_{xy}$, and $P_{xy}$ between two series x and y, where $y = ax$. The definitions of these metrics are given in (2), (8), (17), and (23), respectively.

These inequalities establish an ordering of these bounded similarity metrics. Figure 3 illustrates this ordering for $y_n = a x_n$. In this case, $s_y^2 = a^2 s_x^2$ and a difference between the variances exists when $a^2 \neq 1$; however, $C_{xy}$ and $R_{xy}$ are identical, discontinuous at $a = 0$, and insensitive to the magnitude of a (i.e., $C_{xy} = R_{xy} = a^{-1}|a| = \pm 1$ if $a \neq 0$, and $C_{xy} = R_{xy} = 0$ if $a = 0$). The HR index [i.e., $H_{xy} = 2a/(1 + a^2)$] changes continuously and nonlinearly with a. The Petke index is obtained as $P_{xy} = a$ if $|a| \leq 1$ and $P_{xy} = 1/a$ if $|a| > 1$; it is more sensitive to the amplitude than the HR index. For this special case, the difference between the means is given as $d_{xy} = (1 - a)\bar{x}$. If $a \neq 1$ and $\bar{x} \neq 0$, then $d_{xy} \neq 0$, but none of $C_{xy}$, $H_{xy}$, or $P_{xy}$ are sensitive to such differences. As shown in (16) and (20), the modified versions of $C_{xy}$ and $H_{xy}$, that is, $C^{\dagger}_{xy}$ and $H^{\dagger}_{xy}$, are explicitly penalized for $d_{xy}^2$. It is preferable to define

a modified Petke index using the same data transformations given in (12):

$$P^{\dagger}_{xy} = \frac{\sum_{n=1}^{N}(x'_n y'_n)}{\max\left(\sum_{n=1}^{N} x'^{\,2}_n,\ \sum_{n=1}^{N} y'^{\,2}_n\right)} = \left[\frac{s_x s_y}{\max(s_x^2, s_y^2) + d_{xy}^2}\right] R_{xy}. \tag{25}$$

The derivation of the second expression in (25) is given in appendix A. Like $C^{\dagger}_{xy}$ and $H^{\dagger}_{xy}$, $P^{\dagger}_{xy}$ is symmetric under interchange of x and y and is explicitly penalized for the mean difference (i.e., $d_{xy}$). Since $\max(s_x^2, s_y^2) \geq s_x s_y$ with the equality sign only for $s_x = s_y$, $P^{\dagger}_{xy}$ also explicitly penalizes for the variance difference. When $d_{xy} = 0$, $P^{\dagger}_{xy}$ is the symmetric version of the modified correlation index proposed by McCuen and Snyder (1975). As demonstrated in appendix B, the following inequalities always hold:

$$|P^{\dagger}_{xy}| \leq |H^{\dagger}_{xy}| \leq |C^{\dagger}_{xy}| \leq |R_{xy}| \leq 1, \tag{26}$$

with the equality sign only for $s_x = s_y$ when $d_{xy} = \bar{x} - \bar{y} = 0$. These inequalities establish the sensitivity order for these four similarity metrics. Note that $C^{\dagger}_{xy}$, $H^{\dagger}_{xy}$, and $P^{\dagger}_{xy}$ are bounded by −1 and 1 and always have the same sign as $R_{xy}$. Therefore, they can be loosely referred to as modified correlation coefficients. The modifications are introduced with the idea of explicitly penalizing for differences in the mean and/or variance that cannot be detected by $R_{xy}$. Both $H^{\dagger}_{xy}$ and $P^{\dagger}_{xy}$ can achieve the maximum value of 1 only for a pair of perfect analogs (i.e., $y_n = x_n$ for all possible n) and reach the minimum value of −1 for a pair of perfect antianalogs (i.e., if $\bar{x} = \bar{y}$, $s_x = s_y$, and $R_{xy} = -1$).

The inequalities in (26) are illustrated in Fig. 4 for a special case where $y_n = a x_n$, $x_n = n/N$, and $n = 1, 2, \ldots, N$. In this case, $\bar{x} = (N + 1)/(2N)$, $\bar{y} = a\bar{x}$, $s_x^2 = (N^2 - 1)/(12N^2)$, $s_y^2 = a^2 s_x^2$, $d_{xy} = (1 - a)\bar{x}$, and $R_{xy} = a^{-1}|a|$ (with $R_{xy} = 0$ if $a = 0$). Metrics $C^{\dagger}_{xy}$, $H^{\dagger}_{xy}$, and $P^{\dagger}_{xy}$ can be obtained by substituting these statistics into (15), (20), and (25), respectively. As compared to Fig. 3, Fig. 4 shows how these three modified similarity metrics are penalized for both the mean and variance differences. Another similarity metric ($Q^{\dagger}_{xy}$) in Fig. 4 will be introduced next.
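The special cases discussed for Figs. 3 and 4 and the ordering in (26) can be explored with a few lines of code. The following is a minimal Python sketch, assuming that the modified metrics are evaluated from the closed-form expressions in (15), (21), and (25) rather than from the transformed sums; the function names (`moments`, `hr_index`, `petke_index`, `modified_scores`) and the test patterns are illustrative and not part of the paper.

```python
import math

def moments(x, y):
    # Means, standard deviations (1/N normalization), and PCC, Eqs. (2)-(3).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sx, sy, sxy / (sx * sy)

def hr_index(x, y):
    # Original Hodgkin-Richards index, Eq. (17).
    num = 2 * sum(a * b for a, b in zip(x, y))
    return num / sum(a * a + b * b for a, b in zip(x, y))

def petke_index(x, y):
    # Original Petke index, first expression in Eq. (23).
    num = sum(a * b for a, b in zip(x, y))
    return num / max(sum(a * a for a in x), sum(b * b for b in y))

def modified_scores(x, y):
    # Closed forms of the modified metrics: Eqs. (15), (21), and (25).
    mx, my, sx, sy, r = moments(x, y)
    d2 = (mx - my) ** 2
    c = 0.5 * (sx / math.sqrt(d2 + sx ** 2) + sy / math.sqrt(d2 + sy ** 2)) * r
    h = 2 * sx * sy / (d2 + sx ** 2 + sy ** 2) * r
    p = sx * sy / (max(sx ** 2, sy ** 2) + d2) * r
    return {"R": r, "C": c, "H": h, "P": p}

x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Original metrics for y = a x: they depend on a but not on the mean of x.
for a in (0.5, 3.0):
    y = [a * v for v in x]
    print(a, hr_index(x, y), petke_index(x, y))

# Modified metrics for patterns that differ in mean and variance.
for y in ([2 * v + 1 for v in x], [-2 * v for v in x]):
    s = modified_scores(x, y)
    print(s, abs(s["P"]) <= abs(s["H"]) <= abs(s["C"]) <= abs(s["R"]))
```

Computing the modified metrics from the moments keeps the sketch short and makes the penalty terms ($d_{xy}^2$ and the variance mismatch) visible in the code.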

FIG. 4. Variation of five similarity metrics $R_{xy}$, $C^{\dagger}_{xy}$, $H^{\dagger}_{xy}$, $P^{\dagger}_{xy}$, and $Q^{\dagger}_{xy}$ between two series x and y, where $y = ax$ with $x_n = n/N$; $n = 1, 2, \ldots, N$; and $N = 1000$. The definitions of these metrics are given in (2), (15), (20), (25), and (28), respectively.

d. The Wang–Bovik index

To measure the structural difference between two images, Wang and Bovik (2002) proposed a similarity metric that combines a loss of correlation with both the difference between the means and the difference between the variances. This WB index can be defined for nonnegative image signals x and y as

$$Q_{xy} = \underbrace{\left(\frac{2\bar{x}\,\bar{y}}{\bar{x}^2 + \bar{y}^2}\right)}_{m_{xy}} \underbrace{\left(\frac{2 s_x s_y}{s_x^2 + s_y^2}\right)}_{\nu_{xy}} \underbrace{\left(\frac{s_{xy}}{s_x s_y}\right)}_{R_{xy}}. \tag{27}$$

Each of the three components in the above equation takes the value of 1 as both numerator and denominator approach 0. Wang et al. (2004) introduced some small constants in (27) to ensure that near-zero denominators will not lead to computational instability and named the resulting metric the Structural Similarity (SSIM) index. For image processing systems, the small constants in SSIM can also be used to characterize the saturation effects of the visual system at low luminance and contrast regions and may have a role in improving the predictive power of the subjective similarity between signals (Wang et al. 2004; Brunet et al. 2012). In practice, the formula given by (27) usually works quite well (Wang and Bovik 2002, 2009). Here, the original definition given in (27) is adopted.

In (27), both $m_{xy}$ and $\nu_{xy}$ range from 0 to 1, with $m_{xy} = 1$ when $\bar{x} = \bar{y}$ and $\nu_{xy} = 1$ when $s_x = s_y$. Therefore, $Q_{xy}$ can also be considered as a modified correlation coefficient, which has the same sign and range as $R_{xy}$ but is explicitly penalized for difference in the means through $m_{xy}$ and difference in the variances through $\nu_{xy}$.


Note that $m_{xy}$ can be considered as the HR index for the means $\bar{x}$ and $\bar{y}$. Also note that $m_{xy}$ is not affected by the variances ($s_x$ or $s_y$), and $\nu_{xy}$ does not depend on the size of $d_{xy}$. With these unique features, $Q_{xy}$ would be preferable to $C^{\dagger}_{xy}$, $H^{\dagger}_{xy}$, or $P^{\dagger}_{xy}$ in similarity analysis.

The properties stated above concern the applications of $Q_{xy}$ as a similarity metric on the same-sign data only. For the mixed-sign data used in many hydrometeorological applications, $m_{xy}$ defined in (27) cannot serve as an appropriate penalty for the mean distortion. To overcome this limitation, a modified WB index can be defined as

$$Q^{\dagger}_{xy} = \underbrace{\left[\frac{2(\bar{x} - c_{xy})(\bar{y} - c_{xy})}{(\bar{x} - c_{xy})^2 + (\bar{y} - c_{xy})^2}\right]}_{m^{\dagger}_{xy}} \underbrace{\left(\frac{2 s_x s_y}{s_x^2 + s_y^2}\right)}_{\nu_{xy}} \underbrace{\left(\frac{s_{xy}}{s_x s_y}\right)}_{R_{xy}}, \tag{28}$$

where $c_{xy}$ is the minimum value of two compared patterns, that is,

$$c_{xy} = \min(x_n, y_n \mid n = 1, 2, \ldots, N). \tag{29}$$

Note that $Q^{\dagger}_{xy}$ is equal to $Q_{x^{\dagger}y^{\dagger}}$, where the transformed data are given as $(x^{\dagger}, y^{\dagger}) = (x, y) - c_{xy}$; the variance and correlation are not affected by this linear transformation (i.e., $\nu_{xy} = \nu_{x^{\dagger}y^{\dagger}}$ and $R_{xy} = R_{x^{\dagger}y^{\dagger}}$). The sensitivity of $Q^{\dagger}_{xy}$ is compared with the others for a special case in Fig. 4. Apparently, $Q^{\dagger}_{xy}$ does not fit into the orderings described by (26).

To capture local similarities, Wang and Bovik (2002) suggested computing the WB index locally on a B × B sliding window and then averaging the local scores into a single global index. Wang and Li (2011) proposed an information-content weighting scheme to achieve better overall performance of the WB index. This approach deserves future work, but it is out of the scope of this paper.

TABLE 1. A summary of the properties associated with six similarity metrics: the RMSD (i.e., $D_{xy}$), the PCC (i.e., $R_{xy}$), and the modified versions of UCC (i.e., $C^{\dagger}_{xy}$), HR index (i.e., $H^{\dagger}_{xy}$), Petke index (i.e., $P^{\dagger}_{xy}$), and WB index (i.e., $Q^{\dagger}_{xy}$).

Sensitivity            $D_{xy}$   $R_{xy}$    $C^{\dagger}_{xy}$   $H^{\dagger}_{xy}$   $P^{\dagger}_{xy}$   $Q^{\dagger}_{xy}$
Association            No         Yes         Yes                  Yes                  Yes                  Yes
Mean difference        Yes        No          Yes                  Yes                  Yes                  Yes
Variance difference    Yes        No          Yes/No*              Yes                  Yes                  Yes
Antianalog             No         Yes/No**    Yes/No**             Yes                  Yes                  Yes

* The metric $C^{\dagger}_{xy}$ may or may not be penalized by the variance difference.
** A pair of perfect antianalogs must satisfy three conditions: $\bar{x} = \bar{y}$, $s_x = s_y$, and $R_{xy} = -1$; it cannot be uniquely detected by $R_{xy}$ and $C^{\dagger}_{xy}$.
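The difference between (27) and (28) for mixed-sign data can be seen in a small example. The following is a minimal Python sketch, assuming global (not sliding-window) statistics and treating any 0/0 component as 1, in line with the limiting behavior noted for (27); the function names (`wb_components`, `wb_index`, `wb_index_modified`), the `shift` parameter, and the sample data are hypothetical and are not taken from the paper.

```python
import math

def _ratio(num, den):
    # A component is taken as 1 when numerator and denominator both vanish.
    return 1.0 if den == 0 else num / den

def wb_components(x, y, shift=0.0):
    # Mean, variance, and correlation factors of Eq. (27); a nonzero `shift`
    # is subtracted from both means, as required by Eq. (28).
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    mx, my = mean_x - shift, mean_y - shift
    m = _ratio(2 * mx * my, mx ** 2 + my ** 2)
    v = _ratio(2 * sx * sy, sx ** 2 + sy ** 2)
    r = _ratio(sxy, sx * sy)
    return m, v, r

def wb_index(x, y):
    # Original Wang-Bovik index, Eq. (27).
    m, v, r = wb_components(x, y)
    return m * v * r

def wb_index_modified(x, y):
    # Modified index, Eqs. (28)-(29): shift by the smallest value in either pattern.
    c = min(min(x), min(y))
    m, v, r = wb_components(x, y, shift=c)
    return m * v * r

# Same-sign data: both versions give a penalized, positive score.
x_pos = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pos = [3.0, 5.0, 7.0, 9.0, 11.0]
print(wb_index(x_pos, y_pos), wb_index_modified(x_pos, y_pos))

# Mixed-sign data: y is x shifted by 1, yet the original mean factor collapses.
x_mix = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_mix = [-1.0, 0.0, 1.0, 2.0, 3.0]
print(wb_index(x_mix, y_mix), wb_index_modified(x_mix, y_mix))
print(wb_components(x_mix, y_mix, shift=min(min(x_mix), min(y_mix))))
```

Returning the three factors separately, as `wb_components` does, also makes it possible to inspect the mean penalty, the variance penalty, and the correlation individually.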

e. Section summary

The usefulness of a similarity metric has to be assessed when used in applications other than originally intended. In this study, similarity metrics that should be sensitive to pattern association and that penalize for differences in the mean and in the variance were being sought for hydrometeorological applications. Table 1 gives a summary of the properties associated with $D_{xy}$, $R_{xy}$, $C_{xy}^{\dagger}$, $H_{xy}^{\dagger}$, $P_{xy}^{\dagger}$, and $Q_{xy}^{\dagger}$. Only three, that is, $H_{xy}^{\dagger}$, $P_{xy}^{\dagger}$, and $Q_{xy}^{\dagger}$, are desirable similarity metrics in terms of being sensitive to the three prominent attributes of pattern differences and are also capable of detecting an antianalog. In addition, $Q_{xy}^{\dagger}$ can be used as a three-dimensional vector metric for separately analyzing the pattern correlation and penalties imposed by differences in the mean and variance. Either $H_{xy}^{\dagger}$ or $P_{xy}^{\dagger}$ can only be presented as a two-dimensional vector metric, with one dimension for the pattern correlation and another for the combined mean–variance penalty.
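The behavior of the modified metrics summarized above can be explored numerically. The sketch below is an assumption-laden illustration: the closed forms used for $C_{xy}^{\dagger}$, $H_{xy}^{\dagger}$, and $P_{xy}^{\dagger}$ are the ones implied by the ratios (B1), (B2), and (B5) in appendix B rather than copied from (15), (20), and (25), and the function name `modified_metrics` is hypothetical.

```python
import math

def modified_metrics(x, y):
    """Return R and the modified UCC (C), HR (H), and Petke (P) metrics.

    Assumed closed forms, with d2 the squared mean difference and
    vx, vy the population variances:
      C = (R/2) * [sqrt(vx/(d2+vx)) + sqrt(vy/(d2+vy))]
      H = 2*sxy / (d2 + vx + vy)
      P = sxy / (d2 + max(vx, vy))
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    d2 = (mx - my) ** 2
    r = sxy / math.sqrt(vx * vy)
    c = 0.5 * r * (math.sqrt(vx / (d2 + vx)) + math.sqrt(vy / (d2 + vy)))
    h = 2 * sxy / (d2 + vx + vy)
    p = sxy / (d2 + max(vx, vy))
    return {"R": r, "C": c, "H": h, "P": p}

# Illustrative data (not from the paper): the two series differ in
# both mean and variance, so every modified metric falls below R.
m = modified_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0])
print({name: round(value, 3) for name, value in m.items()})
```

With these forms, each modified metric reduces to $R_{xy}$ when the two patterns share the same mean and variance, which is the sense in which they act as penalized correlations.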

4. Hydrometeorological applications

Similarity analysis in hydrometeorology aims to evaluate how closely one geophysical situation resembles another either in space or in time. Common applications include forecast verification, analog analysis, and model comparison. In this section, two hydrometeorological examples are used to compare the performances of the four nontraditional similarity metrics with the two traditional metrics.

a. Precipitation similarity analysis

The similarity of monthly precipitation amounts observed from two weather stations over a 60-yr period from 1953 to 2012 (Fig. 5) provides a simple illustration of the comparison of these metrics. These two stations are in southern China (Fig. 6a): the Nanyue Mountain Meteorological Observatory is a special manned observatory located at an elevation of 1266 m (Chen et al. 2013), and the Hengyang Weather Station is a regular observatory located about 45 km to the south at a lower elevation (98 m). A visual inspection of Fig. 5 reveals that precipitation time series at these two stations are more similar during the cold season from November to April than during the warm season from May to October. This is consistent with the observation that the warm-season precipitation events in China are more convective and localized (Ding 1994).

Figure 6b shows the scores of similarity between the two series measured by five similarity metrics: the PCC (i.e., $R_{xy}$), the UCC (i.e., $C_{xy}$), the HR index (i.e., $H_{xy}$), the Petke index (i.e., $P_{xy}$), and the WB index (i.e., $Q_{xy}$). The first feature to notice is that $C_{xy}$ has the least seasonal variation and that it is always greater than $R_{xy}$. For $C_{xy} > R_{xy}$, (9) indicates that $\bar{x}\bar{y} > (\eta_x\eta_y - \sigma_x\sigma_y)R_{xy}$. This inequality does not reveal any information on how $C_{xy}$ is affected by differences between the means of the two precipitation series. Therefore, it is not clear at this point whether $C_{xy}$ has any real advantage over $R_{xy}$ as a similarity metric. In this regard, there is also no convincing evidence to suggest that $H_{xy}$ or $P_{xy}$ has a real advantage over $R_{xy}$. On the other hand, Fig. 6b confirms that $Q_{xy}$ is explicitly penalized for differences in the mean and variance, leading to $|Q_{xy}| < |R_{xy}|$. The presence of differences in the mean and variance, as indicated in Fig. 6c, makes the two series less similar to each other.

FIG. 5. Time series of monthly precipitation amounts (mm) at the Nanyue Mountain Meteorological Observatory and the Hengyang Weather Station from 1953 to 2012.

The corresponding similarity scores based on the four modified similarity metrics ($C_{xy}^{\dagger}$, $H_{xy}^{\dagger}$, $P_{xy}^{\dagger}$, and $Q_{xy}^{\dagger}$) are compared with the PCC in Fig. 6d. All of the four modified metrics have smaller similarity scores than the corresponding PCC. In other words, they have the confirmed advantage over the PCC, as they explicitly penalize for differences in the mean and/or variance. The modified Petke index receives the most severe penalty from differences in the mean and variance. Note that there is no significant difference between $Q_{xy}$ and $Q_{xy}^{\dagger}$.

A specific advantage of the WB index ($Q_{xy}$ or $Q_{xy}^{\dagger}$) is that it can be decomposed into three components to separate the pattern correlation and the penalties to the correlation due to differences in the mean and variance. As shown in Fig. 6c, all of these three components reach their minimums in September with $(\mu_{xy}, \nu_{xy}, R_{xy}) = (0.56, 0.60, 0.60)$, leading to the minimum of $Q_{xy} = \mu_{xy}\nu_{xy}R_{xy} = 0.20$.

FIG. 6. (a) Topographic map showing locations of the Nanyue Mountain Meteorological Observatory (1266 m; 27.2974°N, 112.6893°E) and the Hengyang Weather Station (98 m; 26.8891°N, 112.5959°E) in Hunan Province of China. (b) Seasonal variation of five similarity metrics $R_{xy}$, $C_{xy}$, $H_{xy}$, $P_{xy}$, and $Q_{xy}$, based on (2), (8), (17), (23), and (27), respectively, for the monthly precipitation amounts between the two stations over 60 years (1953–2012). (c) The WB metric (i.e., $Q_{xy}$) and its three components ($\mu_{xy}$, $\nu_{xy}$, and $R_{xy}$), based on (27). (d) As in (b), but for $C_{xy}^{\dagger}$, $H_{xy}^{\dagger}$, $P_{xy}^{\dagger}$, and $Q_{xy}^{\dagger}$, based on (15), (20), (25), and (28), respectively.

To judge how well the similarity metrics reflect human subjective assessments of similarity, a survey was conducted by asking participants to rank the similarity of the 12 patterns in Fig. 5 from the most similar to the least similar. The survey received valid responses from 31 meteorologists associated with the Hunan Provincial Meteorological Bureau in China, and the results are summarized in Table 2. The mean opinion score (MOS; with $0 < \mathrm{MOS} \le 12$ in this case; see appendix C) from the survey shows that the two precipitation series are more similar during the cold season than during the warm season, with the best similarity in January (MOS = 11.3) and the worst in September (MOS = 1.7).

TABLE 2. The results from a survey asking meteorologists to rank the 12 months based on their judgments of the similarity between the two precipitation series in each month shown in Fig. 5. The survey received 31 responses. Integer entries give the number of judgments $n_k$ given to each rank $k$ for each month; a hyphen means no judgment. The MOS is calculated from (C1) in appendix C, with $K = 12$ and $N = 31$. In this case, $0 < \mathrm{MOS} \le 12$. The last column gives the MOS-based rankings of each month.

| Month | k=1 | k=2 | k=3 | k=4 | k=5 | k=6 | k=7 | k=8 | k=9 | k=10 | k=11 | k=12 | MOS | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan | 15 | 9 | 7 | - | - | - | - | - | - | - | - | - | 11.3 | 1 |
| Feb | 3 | 10 | 17 | 1 | - | - | - | - | - | - | - | - | 10.5 | 3 |
| Mar | - | - | - | 10 | 19 | 2 | - | - | - | - | - | - | 8.3 | 5 |
| Apr | - | - | - | 2 | - | 10 | 12 | 5 | 1 | - | 1 | - | 6.2 | 7 |
| May | - | - | - | - | - | - | 3 | 6 | 7 | 8 | 4 | 3 | 3.6 | 9 |
| Jun | - | - | - | - | - | - | - | 2 | 8 | 14 | 4 | 3 | 3.1 | 10 |
| Jul | - | - | - | - | - | 1 | 4 | 11 | 8 | 3 | 2 | 2 | 4.3 | 8 |
| Aug | - | - | - | - | - | - | - | 5 | 7 | 5 | 11 | 3 | 3.0 | 11 |
| Sep | - | - | - | - | - | 2 | - | - | - | 1 | 8 | 20 | 1.7 | 12 |
| Oct | - | - | - | - | 2 | 16 | 10 | 2 | - | - | 1 | - | 6.5 | 6 |
| Nov | - | - | 1 | 18 | 10 | - | 2 | - | - | - | - | - | 8.5 | 4 |
| Dec | 13 | 12 | 6 | - | - | - | - | - | - | - | - | - | 11.2 | 2 |

To compare the MOS-based rankings given in the last column of Table 2 with the rankings produced by the similarity metrics, two kinds of rank correlation, Spearman's rho (Spearman 1904) and Kendall's tau (Kendall 1938), were computed. These two methods use different techniques for determining the degree of similarity between two rankings, so their values are not normally the same. Both statistics, however, utilize the same amount of information about the association between two ranked variables. The distribution of Kendall's tau under the null hypothesis is simpler, especially when the sample size is small (Kendall 1975).

As shown in Table 3, both rank correlations suggest that the human perception (MOS-based assessment) of similarity between these precipitation time series is best correlated with the assessment by the RMSD (i.e., $D_{xy}$). It is also observed that the rank correlation coefficients associated with the PCC (i.e., $R_{xy}$) are higher than those with the UCC (i.e., $C_{xy}$), the HR index (i.e., $H_{xy}$), and the Petke index (i.e., $P_{xy}$), but lower than those with the WB index (i.e., $Q_{xy}$ or $Q_{xy}^{\dagger}$) and two modified metrics, $C_{xy}^{\dagger}$ and $H_{xy}^{\dagger}$. The relative merit of different similarity metrics from this metavalidation experiment can be summarized in the following order: $D_{xy} \rightarrow (H_{xy}^{\dagger}, C_{xy}^{\dagger}) \rightarrow (Q_{xy}^{\dagger}, Q_{xy}) \rightarrow (R_{xy}, P_{xy}^{\dagger}) \rightarrow P_{xy} \rightarrow H_{xy} \rightarrow C_{xy}$. It should be emphasized that this merit order is based on the comparison with a subjective similarity assessment, which is not necessarily the best assessment. In addition, the MOS order in Table 2 may also be subject to random error, given that some MOS values are very close together.
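The two rank correlations can be sketched for rankings without ties as follows. The MOS-based ranks are taken from the last column of Table 2, whereas the second ranking is hypothetical (it simply swaps the ranks of January and December) and does not correspond to any metric in the paper; the function names are illustrative.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two rankings without ties."""
    n = len(rank_a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

# MOS-based ranks for Jan..Dec from Table 2.
mos_rank = [1, 3, 5, 7, 9, 10, 8, 11, 12, 6, 4, 2]
# Hypothetical metric ranking: Jan and Dec exchanged, all else equal.
hypothetical_rank = [2, 3, 5, 7, 9, 10, 8, 11, 12, 6, 4, 1]

print(round(spearman_rho(mos_rank, hypothetical_rank), 2))
print(round(kendall_tau(mos_rank, hypothetical_rank), 2))
```

A single exchange of two adjacent ranks among 12 items lowers rho only to about 0.99 and tau to about 0.97, which gives a feel for how close a ranking must be to the MOS-based ranking to reach the largest values in Table 3.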

TABLE 3. Rank correlations between the MOS-based rankings of similarity of the 12 pairs of precipitation series (the last column of Table 2) and the corresponding rankings found by 10 similarity metrics: $D_{xy}$, $R_{xy}$, $C_{xy}$, $C_{xy}^{\dagger}$, $H_{xy}$, $H_{xy}^{\dagger}$, $P_{xy}$, $P_{xy}^{\dagger}$, $Q_{xy}$, and $Q_{xy}^{\dagger}$.

| Correlation | $D_{xy}$ | $R_{xy}$ | $C_{xy}$ | $C_{xy}^{\dagger}$ | $H_{xy}$ | $H_{xy}^{\dagger}$ | $P_{xy}$ | $P_{xy}^{\dagger}$ | $Q_{xy}$ | $Q_{xy}^{\dagger}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Spearman's rho | 0.99 | 0.92 | 0.76 | 0.96 | 0.81 | 0.96 | 0.83 | 0.92 | 0.93 | 0.93 |
| Kendall's tau | 0.97 | 0.79 | 0.61 | 0.85 | 0.67 | 0.85 | 0.67 | 0.79 | 0.82 | 0.82 |

b. Analog analysis

Analog analysis is an active area of research in hydrometeorology, often for the purposes of predictability assessment, weather forecasting, and pattern recognition (e.g., Lorenz 1969; Gutzler and Shukla 1984; van den Dool 1987, 1989; Nourani et al. 2007; Ren et al. 2009; Wang and Fan 2009; Delle Monache et al. 2011). Most of these studies relied on the PCC or the RMSD as the similarity metric. Here an example is used to compare the performances of different similarity metrics as they are applied to an analog analysis of El Niño–Southern Oscillation (ENSO), which is associated with the interannual fluctuation of sea surface temperature (SST) anomalies in the equatorial Pacific (Philander 1990; Sarachik and Cane 2010); the events associated with the warm and cold phases of ENSO are referred to as El Niño and La Niña, respectively.

In this example, the six similarity metrics are used to search for analogs for the 2009/10 El Niño event. The PCC and the RMSD are included because of their widespread use, but the focus is on the four modified metrics described previously. Studies have shown that this El Niño event produced specific atmosphere–ocean teleconnection patterns and had a significant impact on the North American weather anomalies (Ratnam et al. 2012; Mo et al. 2014). In general, El Niño events occur at irregular intervals between 2 and 7 years, typically locking their phases with the boreal winter (Philander 1990; Mo et al. 1998; Sarachik and Cane 2010). Figure 7 shows the 3-month-average [November–January (NDJ)] SST anomalies over the tropical Pacific during the mature phase of the 2009/10 El Niño and the 60-yr (from November 1952 to January 2012) NDJ time series of the Niño-3.4 index, which is the average SST anomaly in the region bounded by 5°N–5°S, 170°–120°W. The SST data are provided on a 2° × 2° grid by the National Oceanic and Atmospheric Administration (NOAA)/Office of Oceanic and Atmospheric Research (OAR)/Earth System Research Laboratory (ESRL)/Physical Sciences Division (PSD), Boulder, Colorado [Extended Reconstruction Sea Surface Temperature, version 3b (ERSST.v3b), from www.esrl.noaa.gov/psd/data/gridded/; see Smith et al. 2008]. For comparison, the SST anomaly patterns in NDJ for the other 24 events with a positive Niño-3.4 index (Fig. 7b) are plotted in Fig. 8.

FIG. 7. (a) SST anomalies over the tropical Pacific (20°S–20°N, 120°E–70°W) during the NDJ season in 2009/10. (b) NDJ Niño-3.4 index, which is the average of SST anomalies over the area bounded by the blue lines (5°S–5°N, 170°–120°W) in (a), during a 60-yr period. SST anomalies are relative to their climatological values over the 60-yr period.

Figure 7a shows a typical SST anomaly pattern for a mature El Niño, with a maximum anomaly of 2.3°C at the equator near 150°W. This pattern is compared with the SST anomalies over the same domain in the same season (NDJ) of the other 59 years. A good analog of this pattern would be expected to reproduce the maximum anomaly along the equator with similar strength and location. The best five analogs for each of the six similarity metrics are listed in Table 4. For the best analog, the RMSD (i.e., $D_{xy}$), the modified UCC (i.e., $C_{xy}^{\dagger}$), and the modified HR index (i.e., $H_{xy}^{\dagger}$) select 2002/03; the PCC (i.e., $R_{xy}$) selects 1977/78; the modified Petke index (i.e., $P_{xy}^{\dagger}$) selects 1994/95; and the modified WB index (i.e., $Q_{xy}^{\dagger}$) selects 1991/92. Visually, it appears that the 1991/92 pattern (Fig. 8, first column, sixth row) has the best resemblance to the 2009/10 pattern (Fig. 7a). This pattern is selected as the third best by the other five metrics (Table 4). The 1977/78 pattern (Fig. 8, first column, fourth row) is much weaker than the 2009/10 pattern. This pattern does not enter the top-five lists of any other metrics; however, since $R_{xy}$ does not address differences in mean or variance, this may be the "best shape match." The 2002/03 event has the maximum spread too widely along the equator. This pattern is selected as the second best by $P_{xy}^{\dagger}$, the third best by $Q_{xy}^{\dagger}$, and the fourth best by $R_{xy}$. The 1994/95 pattern is slightly weaker than the 2009/10 pattern, and its maximum is shifted farther to the west. A subjective experiment was performed to obtain the best analogs for the 2009/10 pattern from human observers.


FIG. 8. As in Fig. 7a, but for NDJ seasons in the other 24 El Niño events.

TABLE 4. The top-five analogs of the El Niño event during the NDJ season in 2009/10 based on the similarity between the SST anomaly patterns over the tropical Pacific. The degree of similarity is measured by six metrics: $D_{xy}$, $R_{xy}$, $C_{xy}^{\dagger}$, $H_{xy}^{\dagger}$, $P_{xy}^{\dagger}$, and $Q_{xy}^{\dagger}$. The identified NDJ analogs are indicated by the corresponding years and the values of similarity metrics (in parentheses).

| Rank | $D_{xy}$ (°C) | $R_{xy}$ | $C_{xy}^{\dagger}$ | $H_{xy}^{\dagger}$ | $P_{xy}^{\dagger}$ | $Q_{xy}^{\dagger}$ |
|---|---|---|---|---|---|---|
| 1 | 2002/03 (0.3018) | 1977/78 (0.8602) | 2002/03 (0.8437) | 2002/03 (0.8332) | 1994/95 (0.7287) | 1991/92 (0.8394) |
| 2 | 1994/95 (0.3497) | 1957/58 (0.8583) | 1994/95 (0.8059) | 1994/95 (0.8037) | 2002/03 (0.7192) | 1957/58 (0.8343) |
| 3 | 1991/92 (0.3856) | 1991/92 (0.8577) | 1991/92 (0.8031) | 1991/92 (0.7987) | 1991/92 (0.6973) | 2002/03 (0.8338) |
| 4 | 1987/88 (0.4190) | 2002/03 (0.8445) | 1957/58 (0.7760) | 1957/58 (0.7652) | 1957/58 (0.6220) | 1994/95 (0.8333) |
| 5 | 2004/05 (0.4248) | 1994/95 (0.8431) | 1965/66 (0.7459) | 1965/66 (0.7325) | 1965/66 (0.5807) | 1965/66 (0.8057) |

A survey asking participants to identify the top-five analog list for the 2009/10 pattern in Fig. 7a from the other 24 patterns in Fig. 8 received 48 responses from meteorologists associated with the Pacific Storm Prediction Centre in Canada and the Hunan Provincial Meteorological Bureau in China. The survey results are summarized in Table 5. The MOS-based top-five analogs are 1) 1991/92, 2) 1965/66, 3) 1957/58, 4) 1994/95, and 5) 1986/87. More than half of the 48 participants (29 people) selected the 1991/92 pattern as the best analog. In Table 4, only the modified WB index selects this pattern as the best analog.

TABLE 5. The results from a survey asking participants to identify the top-five analog list for the SST anomaly pattern in Fig. 7a from the 24 patterns in Fig. 8. The survey received 48 responses. Integer entries give the number of judgments (i.e., $n_k$) given to each rank $k$ for each pattern; a hyphen means no judgment, and those patterns without a single vote are not included. The MOS is calculated from (C1) in appendix C, with $K = 5$ and $N = 48$. In this case, $0 \le \mathrm{MOS} \le 5$. The last column shows the MOS-based top-five rankings.

| Event | k=1 | k=2 | k=3 | k=4 | k=5 | MOS | Rank |
|---|---|---|---|---|---|---|---|
| 1957/58 | 7 | 15 | 10 | 4 | 6 | 2.90 | 3 |
| 1963/64 | 4 | - | - | 7 | 7 | 0.85 | - |
| 1965/66 | 3 | 17 | 16 | 6 | 1 | 3.00 | 2 |
| 1968/69 | - | - | 1 | - | - | 0.06 | - |
| 1972/73 | 1 | - | 1 | 1 | 3 | 0.27 | - |
| 1977/78 | - | 1 | 5 | 2 | 2 | 0.52 | - |
| 1982/83 | - | - | - | - | 1 | 0.02 | - |
| 1986/87 | 2 | 3 | 3 | 7 | 7 | 1.08 | 5 |
| 1987/88 | - | 1 | - | 2 | 1 | 0.19 | - |
| 1991/92 | 29 | 6 | 5 | 4 | 2 | 4.04 | 1 |
| 1993/94 | - | - | - | - | 1 | 0.02 | - |
| 1994/95 | - | 4 | 7 | 9 | 7 | 1.29 | 4 |
| 1997/98 | - | - | - | 1 | - | 0.04 | - |
| 2002/03 | 2 | 1 | - | 5 | 10 | 0.71 | - |

To compare the MOS-based top-five list in Table 5 with the lists found by the similarity metrics in Table 4, the order categories were first converted into a numerical scale from 5 to 1 for the top-five patterns in the list and 0 for all the other patterns included in Tables 4 and 5. These transformed data were then used to compute the rank correlation coefficients (Kendall's tau) for metavalidation of different similarity metrics. As shown in Table 6, the similarity assessment by the modified WB index (i.e., $Q_{xy}^{\dagger}$) achieves the highest correlation with the human perception (MOS-based assessment). The rank correlation coefficient for $Q_{xy}^{\dagger}$ of 0.68 is the only one statistically significant at the 0.01 critical level ($p = 0.005$). The assessment by the modified Petke index (i.e., $P_{xy}^{\dagger}$) achieves the second-best correlation. The rank correlations associated with the modified HR index (i.e., $H_{xy}^{\dagger}$) and the modified UCC (i.e., $C_{xy}^{\dagger}$) are tied at the third position. These three are significant at the 0.05 level. The PCC and RMSD result in nonsignificant rank correlation coefficients of 0.350 ($p = 0.160$) and 0.133 ($p = 0.619$), respectively.

TABLE 6. The rank correlation (Kendall's tau) between the top-five analog rankings given in the last column of Table 5 and the corresponding rankings in Table 4 based on six similarity metrics: $D_{xy}$, $R_{xy}$, $C_{xy}^{\dagger}$, $H_{xy}^{\dagger}$, $P_{xy}^{\dagger}$, and $Q_{xy}^{\dagger}$. The corresponding two-tailed probability $p$ is also given, with boldfaced values being significant at the 0.05 level.

| | $D_{xy}$ | $R_{xy}$ | $C_{xy}^{\dagger}$ | $H_{xy}^{\dagger}$ | $P_{xy}^{\dagger}$ | $Q_{xy}^{\dagger}$ |
|---|---|---|---|---|---|---|
| Kendall's tau | 0.133 | 0.350 | 0.550 | 0.550 | 0.583 | 0.683 |
| Two-tailed $p$ | 0.619 | 0.160 | **0.023** | **0.023** | **0.016** | **0.005** |

A further assessment was performed by applying each of the six similarity metrics to search for the best analog of the SST anomaly pattern over the tropical Pacific domain (20°S–20°N, 120°E–70°W) for each of the 60 NDJ seasons. The analog-emulated Niño-3.4 index was computed from the best analog (as the average SST anomaly in the region bounded by 5°N–5°S, 170°–120°W) and then compared to the observed Niño-3.4 index given in Fig. 7b. As shown in Fig. 9, the analog-emulated and observed Niño-3.4 index series are highly correlated. The highest and lowest correlation scores are achieved by the modified Petke index ($r = 0.970$) and the PCC ($r = 0.916$), respectively. The relative merits of these analog-emulated Niño-3.4 index series can be better inferred from the Taylor diagram. In Fig. 10, the points of the analog-emulated series that agree better with the observations lie closer to the point marked "OBS" on the x axis; these series have a relatively small centered RMSD, which requires both a high correlation and a small difference between the variances. All of the model biases (the differences between the observed mean and the analog-emulated mean) are negligibly small. It appears that the points associated with the modified Petke index and the RMSD agree best with observations. The worst performance is given by the PCC, which is not sensitive to differences in the mean and variance during the analog search. The second-worst performance is given by the modified UCC, which may not be sensitive to difference between the pattern variances. The performances of the modified HR index and the modified WB index are only slightly worse than those of the RMSD and the modified Petke index.
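The conversion of the top-five lists into a numerical scale and the resulting Kendall's tau can be sketched as follows. Two assumptions are made here and are not stated explicitly in the text: the comparison set consists of the 15 events that appear in either Table 4 or Table 5, and the tie-corrected form of Kendall's tau (tau-b) is used. The function names are illustrative. Under these assumptions the sketch gives 41/60 for $Q_{xy}^{\dagger}$ and 8/60 for $D_{xy}$, which agree with the values 0.683 and 0.133 in Table 6.

```python
import math

def kendall_tau_b(a, b):
    """Kendall's tau-b, which corrects for ties in either variable."""
    n = len(a)
    concordant = discordant = tied_a = tied_b = 0
    for i in range(n):
        for j in range(i + 1, n):
            da, db = a[i] - a[j], b[i] - b[j]
            if da == 0:
                tied_a += 1
            if db == 0:
                tied_b += 1
            if da * db > 0:
                concordant += 1
            elif da * db < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt(
        (n_pairs - tied_a) * (n_pairs - tied_b))

def to_scores(top_five, events):
    """Score 5..1 for the ranked events and 0 for all the others."""
    return [5 - top_five.index(e) if e in top_five else 0 for e in events]

# The 14 events of Table 5 plus 2004/05, which appears only in Table 4.
events = ["1957/58", "1963/64", "1965/66", "1968/69", "1972/73",
          "1977/78", "1982/83", "1986/87", "1987/88", "1991/92",
          "1993/94", "1994/95", "1997/98", "2002/03", "2004/05"]
mos_top = ["1991/92", "1965/66", "1957/58", "1994/95", "1986/87"]  # Table 5
q_top = ["1991/92", "1957/58", "2002/03", "1994/95", "1965/66"]    # Table 4
d_top = ["2002/03", "1994/95", "1991/92", "1987/88", "2004/05"]    # Table 4

mos = to_scores(mos_top, events)
tau_q = kendall_tau_b(mos, to_scores(q_top, events))
tau_d = kendall_tau_b(mos, to_scores(d_top, events))
print(round(tau_q, 3), round(tau_d, 3))  # 0.683 0.133
```

Because most events receive a score of 0 in both lists, ties dominate the pair counts, which is why a tie-corrected statistic is the natural choice for this kind of top-five comparison.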

5. Concluding remarks

Similarity is often a convenient concept in scientific studies concerning the goodness of match of two patterns (Cattell 1949). If only the strength of association, or the linear dependence, of two patterns is concerned, the PCC would suffice as an adequate similarity metric. In many circumstances, an appropriate measure of sameness is important when the concern is about how well things fit (e.g., observed and modeled series, or in the search for analog cases). If two time series are being compared, say modeled and observed, similarity requires not only a high degree of correlation but also similar means, variances, and other attributes of the time series, such as the lag-1 autocorrelation and the Hurst coefficient (Hurst 1951). As a complementary measure, the RMSD can be used to detect total similarity. It is shown in (7) that the RMSD is indeed capable of taking into account the impact of the three most important factors in similarity analysis: difference between the means, difference between the variances, and correlation. This composite measure is, however, not a good metric to monitor the degree of structural or phase alignment, and it turns out to be a poor indicator when the purpose is to assess signal quality and fidelity (Shukla et al. 2006; Wang and Bovik 2009; Brunet et al. 2012; Kufareva and Abagyan 2012).

FIG. 9. NDJ season time series of the observed (green dots) and analog-emulated (red and blue bars) Niño-3.4 index. The six similarity metrics are used to find the best analog for the SST anomaly pattern over the tropical Pacific (20°S–20°N, 120°E–70°W) in each of the 60 NDJ seasons, and the analog-emulated Niño-3.4 index is computed from the SST anomalies in a subdomain (5°S–5°N, 170°–120°W) of the identified best analog. The correlation coefficient $r$ between the observed and emulated series is given in the panel title.

FIG. 10. The Taylor diagram summarizing the skill of the analog-emulated Niño-3.4 index shown in Fig. 9. The bias (observation minus simulation) of each similarity metric is given in the legend.

In addition to the traditional metrics, such as the PCC and the RMSD, many other similarity metrics have been defined in the literature. Often, these have been constructed to address the insensitivity of correlation to differences in the mean and/or variance between the compared patterns. Four of these metrics were introduced and evaluated in this paper for meteorological and hydrological applications. Some of them (i.e., the UCC, the HR index, and the Petke index) do not always perform with the purported advantages over the PCC. Modified versions of these metrics were proposed to include explicit penalties for differences in the mean and/or variance. These metrics can be collectively referred to as modified correlation coefficients, as they can be generally related to the PCC (i.e., $R_{xy}$) in the form $S_{xy} = M_{xy}R_{xy}$, where $M_{xy}$ ($0 \le M_{xy} \le 1$) is a modification factor that adjusts the value of similarity metric $S_{xy} \in [-1, +1]$ downward (toward 0) when the compared patterns differ in means and/or in variances. It was pointed out that the modification factor of the modified UCC is not explicitly related to difference in the variance and, under certain circumstances, could adjust the similarity metric upward (toward 1 or −1) for larger difference in the variance. The UCC, either in its original or modified formulation, is not recommended for use in similarity analysis because of this inconvenient effect. For the modified HR index and the modified Petke index, $M_{xy}$ represents the combined modification for differences in the mean and variance. For the WB index and its variants, $M_{xy}$ can be decomposed into two components, representing different penalties due to difference between the means and difference between the variances.
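The statement in this section that the RMSD combines the mean difference, the variance difference, and the correlation can be checked numerically. The sketch below assumes the standard identity $D_{xy}^{2} = \delta_{xy}^{2} + \sigma_x^{2} + \sigma_y^{2} - 2\sigma_x\sigma_y R_{xy}$, which is taken here to be the content of (7); the function name `rmsd_terms` and the sample data are hypothetical.

```python
import math

def rmsd_terms(x, y):
    """Return (msd, bias2, centered) using population moments.

    msd      : mean-square difference between x and y (RMSD squared)
    bias2    : squared difference between the means
    centered : sx^2 + sy^2 - 2*sx*sy*r, the squared centered RMSD
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = sxy / (sx * sy)
    msd = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    bias2 = (mx - my) ** 2
    centered = sx ** 2 + sy ** 2 - 2 * sx * sy * r
    return msd, bias2, centered

# Illustrative data (not from the paper).
msd, bias2, centered = rmsd_terms([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0])
print(round(msd, 6), round(bias2 + centered, 6))
```

The centered term is the quantity displayed by a Taylor diagram, so a small bias together with a small centered term implies a small RMSD.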


Two examples were used to illustrate the superiority of the recommended similarity metrics over the PCC in some common hydrometeorological applications. The first example is a simple demonstration of various relationships among the similarity metrics. The analysis of similarity between monthly precipitation time series from two nearby weather stations confirms the inequalities given in (24) and (26) and illustrates how some modified metrics are explicitly penalized for differences between the means and/or variances. The second example illustrates the application of similarity metrics in an analog analysis. It shows that different similarity metrics produce different groups of analogs, raising the practical question of how to choose an appropriate similarity metric for a specific application.

Although the four nontraditional metrics have been demonstrated mathematically to have a confirmed advantage over the traditional PCC, it may be necessary to provide a more critical assessment of these and other similarity metrics, or even a framework that could be used in such an assessment. One possibility is to design a metavalidation model to compare the similarity assessments of various metrics with the human perception of similarity. In this study, two surveys were conducted to compare human subjective rankings of similarity with those determined statistically. These metavalidations against the subjective mean opinion score (MOS) indicate that, for the precipitation time series examined in the first example, the best performance is achieved by the RMSD, with the order of merit given as $D_{xy} \rightarrow (H_{xy}^{\dagger}, C_{xy}^{\dagger}) \rightarrow (Q_{xy}^{\dagger}, Q_{xy}) \rightarrow (R_{xy}, P_{xy}^{\dagger}) \rightarrow P_{xy} \rightarrow H_{xy} \rightarrow C_{xy}$. For the spatial patterns in the analog analysis, the order of merit (with respect to the MOS) is given as $Q_{xy}^{\dagger} \rightarrow P_{xy}^{\dagger} \rightarrow (H_{xy}^{\dagger}, C_{xy}^{\dagger}) \rightarrow R_{xy} \rightarrow D_{xy}$. The contrasting performances of the modified WB index and the RMSD in the second example (spatial similarity analysis) are consistent with previous studies on simulating human perception of image quality (Wang and Bovik 2002, 2009; Wang et al. 2004; Brunet et al. 2012).

The similarity metrics were also objectively assessed based on their performances of predicting or simulating important signals embedded in the patterns. In the second example, the Niño-3.4 index was chosen as the important ENSO signal to validate the analogs of SST anomaly patterns selected by different similarity metrics. Using a Taylor diagram to evaluate the quality of the analog-emulated Niño-3.4 index against the observed Niño-3.4 index, the order of merit was found as $(D_{xy}, P_{xy}^{\dagger}) \rightarrow (H_{xy}^{\dagger}, Q_{xy}^{\dagger}) \rightarrow C_{xy}^{\dagger} \rightarrow R_{xy}$. The slightly better performances of $D_{xy}$ and $P_{xy}^{\dagger}$ than $H_{xy}^{\dagger}$ and $Q_{xy}^{\dagger}$ in this case (Fig. 10) could be attributed to the facts that $D_{xy}$ and $P_{xy}^{\dagger}$ are more sensitive to difference in the variances than $H_{xy}^{\dagger}$ and $Q_{xy}^{\dagger}$, and the Niño-3.4 index is the average of SST anomalies in an equatorial subdomain where the maximum SST variances are located. It should be pointed out that these orders of merit are based on only one example, and there is no guarantee and no reason to think that this result would be general.

Our final argument concerns the decomposability of similarity metrics. In a perfect world, one would be able to use metrics that are specific and robust. In assessing similarity, the interest is in two areas: the pattern association and the statistical distribution. In practice, one should be careful about arguing that a metric that blurs multiple features of time series or spatial patterns is better than others. Ideally, the pattern association (and its multiple properties) and the statistical distribution (and its multiple moments) should be examined separately, and these should be sensitive to real differences. An appealing characteristic of the four modified similarity metrics ($C_{xy}^{\dagger}$, $H_{xy}^{\dagger}$, $P_{xy}^{\dagger}$, and $Q_{xy}$ or $Q_{xy}^{\dagger}$) is that they can be decomposed into two or three components to identify the pattern correlation and the penalties to the correlation in the presence of a difference in the mean and/or variance. Such decomposability allows a scalar index to be presented also as a vector metric, which would be critical to making informed decisions in some applications (Taylor 2001; Crout et al. 2008; Brunet et al. 2012).

Acknowledgments. We thank Chris Doyle, Paul Joe, Sean Fleming, and Prof. Roland Stull for insightful discussions and three anonymous reviewers for their thoughtful comments and suggestions on earlier drafts. The responses to our surveys from 48 Chinese and Canadian meteorologists are highly appreciated. We acknowledge NOAA/OAR/ESRL PSD for making the Extended Reconstructed Sea Surface Temperature data readily available. R.M. would like to thank Alison Dodd for her assistance in bibliographic research. C.Y. wishes to acknowledge the R&D Special Fund for the Public Welfare Industry (Meteorology, GYHY201306016) from the China Meteorological Administration and the research grant from the National Natural Science Foundation of China (Grant 41075034).

APPENDIX A Derivations of (9), (13), (19), (21), (23), and (25) This appendix describes the manipulation steps in the derivations of some equations. The following relations are repeatedly used in these derivations: N

N

n51

n51

å xn yn 5 å [(xn 2 x) 1 x][(yn 2 y) 1 y] 5 N(sxy 1 xy), (A1)

1878

JOURNAL OF HYDROMETEOROLOGY

N

N

å x2n 5 N(s2x 1 x2 ) 5 Nh2x , å y2n 5 Nh2y ,

n51

(A2)

Now from a trivial inequality a2 1 b2 $ 2ab, we have qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 1 s2 and dxy y qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 1 s2 . d2xy 1 sx2 1 sy2 $ 2sy dxy x

n51

N

N

n51

n51

å x0n2 5 Ns2x , å y0n2 5 N(d2xy 1 s2y ),

d2xy 1 sx2 1 sy2 $ 2sx

and

N

å x0n y0n 5 Nsxy ,

(A3)

n51

where x, y, sx , sy , and sxy are defined in (3); hx and hy are defined in (10); x0n and y0n are defined in (12); and dxy is defined in (5). Some expressions in the above equations are obtained from the fact that the summations over (xn 2 x) and (yn 2 y) vanish by definition. Substituting (A1) and (A2) into (8) results in (9). Substituting (A3) into (11) results in (13). Combining (17) with (8) and then using (A2) to rearrange the result we obtain (19). The first expression in (21) is obtained by y substituting (A3) into (20) and then using (2) to relate Hxy with Rxy . The second expression in (23) is obtained by combining the first expression with (8) and then using (A2) to rearrange the result. The second expression in (25) is obtained by substituting (A3) into the first expression and then using (2) to relate Pyxy with Rxy .

APPENDIX B

Proofs of the Inequalities in (26)

The inequalities presented in (26) establish an ordering of the similarity metrics. To prove the first inequality in (26), between $P^{v}_{xy}$ and $H^{v}_{xy}$, combine (20) with (25) and note that $2\max(s_x^2, s_y^2) \ge s_x^2 + s_y^2$, which yields

$$\frac{P^{v}_{xy}}{H^{v}_{xy}} = \frac{d_{xy}^2 + s_x^2 + s_y^2}{2[d_{xy}^2 + \max(s_x^2, s_y^2)]} \le \frac{d_{xy}^2 + s_x^2 + s_y^2}{2d_{xy}^2 + s_x^2 + s_y^2} \le 1, \tag{B1}$$

which leads to $|P^{v}_{xy}| \le |H^{v}_{xy}|$, with the equality sign only for $s_x = s_y$ when $d_{xy} = \bar{x} - \bar{y} = 0$.

To prove the second inequality in (26), between $H^{v}_{xy}$ and $C^{v}_{xy}$, first combine (15) with (20); after some rearrangement, this gives

$$\frac{H^{v}_{xy}}{C^{v}_{xy}} = \frac{4\sqrt{(d_{xy}^2 + s_x^2)(d_{xy}^2 + s_y^2)}}{(d_{xy}^2 + s_x^2 + s_y^2)\left(s_x^{-1}\sqrt{d_{xy}^2 + s_x^2} + s_y^{-1}\sqrt{d_{xy}^2 + s_y^2}\right)}. \tag{B2}$$

Combining (B3) with (B2) leads to the following inequality:

$$\frac{H^{v}_{xy}}{C^{v}_{xy}} \le \frac{4\sqrt{(d_{xy}^2 + s_x^2)(d_{xy}^2 + s_y^2)}}{2(s_x s_x^{-1} + s_y s_y^{-1})\sqrt{(d_{xy}^2 + s_x^2)(d_{xy}^2 + s_y^2)}} = 1, \tag{B4}$$

which leads to $|H^{v}_{xy}| \le |C^{v}_{xy}|$, with the equality sign only for $s_x = s_y$ when $d_{xy} = \bar{x} - \bar{y} = 0$.

The third inequality in (26), between $C^{v}_{xy}$ and $R_{xy}$, results from rearranging (15), that is,

$$\frac{C^{v}_{xy}}{R_{xy}} = \frac{1}{2}\left(\frac{s_x}{\sqrt{d_{xy}^2 + s_x^2}} + \frac{s_y}{\sqrt{d_{xy}^2 + s_y^2}}\right) \le \frac{1}{2}(1 + 1) = 1, \tag{B5}$$

which leads to $|C^{v}_{xy}| \le |R_{xy}|$, with the equality sign only for $s_x = s_y$ when $d_{xy} = \bar{x} - \bar{y} = 0$.

APPENDIX C

Mean Opinion Score

The MOS is a subjective quality measure of an object (voice, image, similarity between two patterns, etc.), as perceived by a large number of people who respond to a test or survey (Williams and Moye 1971; Miyahara et al. 1998; Wang et al. 2004). Let $N$ be the total number of responses obtained from the test or survey, in which people are asked to assign the object quality to one of $K$ categories based on their perceived judgments. Assume that the opinion categories are arranged in order of decreasing quality (i.e., the best object in the first category and the worst in the last category). By converting the $k$th opinion category into a numerical score $A_k = K - k + 1$, one can define the MOS as the arithmetic mean of all the individual scores, that is,

$$\mathrm{MOS} = \frac{1}{N}\sum_{k=1}^{K} n_k A_k = \frac{1}{N}\sum_{k=1}^{K} n_k (K - k + 1), \tag{C1}$$

where $n_k$ is the number of judgments given to the $k$th category.
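As an illustration of (C1), the following minimal Python sketch computes the MOS from a list of category counts. The function name `mean_opinion_score` and the survey counts are hypothetical and are not taken from the paper; the list is assumed to be ordered from the best category ($k = 1$) to the worst ($k = K$).

```python
from typing import Sequence


def mean_opinion_score(counts: Sequence[int]) -> float:
    """Return the MOS as defined in (C1).

    counts[k - 1] holds n_k, the number of judgments given to the kth
    category, with category 1 the best and category K the worst.
    """
    K = len(counts)   # number of opinion categories
    N = sum(counts)   # total number of responses
    if N == 0:
        raise ValueError("at least one response is required")
    # The kth category is converted to the score A_k = K - k + 1.
    return sum(n_k * (K - k + 1) for k, n_k in enumerate(counts, start=1)) / N


# Hypothetical survey with K = 5 categories and N = 20 responses.
example_counts = [4, 8, 5, 2, 1]
print(mean_opinion_score(example_counts))
```

With these assumed counts the weighted sum of scores is 72 over 20 responses, so the MOS lies between the second-best and third-best categories on the 1-to-5 scale.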

REFERENCES

Boas, F., 1922: The measurement of differences between variable quantities. J. Amer. Stat. Assoc., 18, 425–445, doi:10.1080/01621459.1922.10502487.
Boyle, T. P., G. M. Smillie, J. C. Anderson, and D. R. Beeson, 1990: A sensitivity analysis of nine diversity and seven similarity indices. Res. J. Water Pollut. Control Fed., 62, 749–762.
Brier, G. W., and R. A. Allen, 1951: Verification of weather forecasts. Compendium of Meteorology, T. F. Malone, Ed., Amer. Meteor. Soc., 841–848.
Brunet, D., E. R. Vrscay, and Z. Wang, 2012: On the mathematical properties of the structural similarity index. IEEE Trans. Image Process., 21, 1488–1499, doi:10.1109/TIP.2011.2173206.
Cannon, A. J., and W. W. Hsieh, 2008: Robust nonlinear canonical correlation analysis: Application to seasonal climate forecasting. Nonlinear Processes Geophys., 15, 221–232, doi:10.5194/npg-15-221-2008.
Carbó, R., L. Leyda, and M. Arnau, 1980: How similar is a molecule to another? An electron density measure of similarity between two molecular structures. Int. J. Quantum Chem., 17, 1185–1189, doi:10.1002/qua.560170612.
Cattell, R. B., 1949: $r_p$ and other coefficients of pattern similarity. Psychometrika, 14, 279–298, doi:10.1007/BF02289193.
Chen, T., C. Ye, D. Chen, J. Zhang, and C. Luo, 2013: Temperature variation contrast between Nanyue Mountain and low-elevation areas in past 58 years (in Chinese with English abstract). Meteor. Sci. Tech., 41, 713–719.
Cronbach, L. J., and G. C. Gleser, 1953: Assessing similarity between profiles. Psychol. Bull., 50, 456–473, doi:10.1037/h0057173.
Crout, N., and Coauthors, 2008: Good modelling practice. Environmental Modelling, Software and Decision Support, A. J. Jackman et al., Eds., Elsevier, 15–31.
Delle Monache, L., T. Nipen, Y. Liu, G. Roux, and R. Stull, 2011: Kalman filter and analog schemes to postprocess numerical weather predictions. Mon. Wea. Rev., 139, 3554–3570, doi:10.1175/2011MWR3653.1.
DelSole, T., and J. Shukla, 2006: Specification of wintertime North American surface temperature. J. Climate, 19, 2691–2716, doi:10.1175/JCLI3704.1.
Ding, Y., 1994: Monsoons over China. Kluwer Academic, 419 pp.
Ehret, U., and E. Zehe, 2011: Series distance—An intuitive metric to quantify hydrograph similarity in terms of occurrence, amplitude and timing of hydrological events. Hydrol. Earth Syst. Sci., 15, 877–896, doi:10.5194/hess-15-877-2011.
Fleming, S. W., and P. H. Whitfield, 2010: Spatiotemporal mapping of ENSO and PDO surface meteorological signals in British Columbia, Yukon, and southeast Alaska. Atmos.–Ocean, 48, 122–131, doi:10.3137/AO1107.2010.
Garrett-Mayer, E., 2006: Overview of standard clustering approaches for gene microarray data analysis. DNA Microarrays and Related Genomics Techniques: Design, Analysis, and Interpretation of Experiments, D. B. Allison and Coauthors, Eds., Chapman & Hall, 131–158.
Geller, R. J., and C. S. Mueller, 1980: Four similar earthquakes in central California. Geophys. Res. Lett., 7, 821–824, doi:10.1029/GL007i010p00821.
Germann, U., and I. Zawadzki, 2002: Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. Mon. Wea. Rev., 130, 2859–2873, doi:10.1175/1520-0493(2002)130<2859:SDOTPO>2.0.CO;2.
Goodall, D. W., 1966: A new similarity index based on probability. Biometrics, 22, 882–907, doi:10.2307/2528080.

Grenier, P., A.-C. Parent, D. Huard, F. Anctil, and D. Chaumont, 2013: An assessment of six dissimilarity metrics for climate analogs. J. Appl. Meteor. Climatol., 52, 733–752, doi:10.1175/JAMC-D-12-0170.1.
Gressens, O., and E. D. Mouzon, 1927: The validity of correlation in time sequences and a new coefficient of similarity. J. Amer. Stat. Assoc., 22, 483–492, doi:10.1080/01621459.1927.10502977.
Gutzler, D. S., and J. Shukla, 1984: Analogs in the wintertime 500 mb height field. J. Atmos. Sci., 41, 177–189, doi:10.1175/1520-0469(1984)041<0177:AITWMH>2.0.CO;2.
Hodgkin, E. E., and W. G. Richards, 1987: Molecular similarity based on electrostatic potential and electric field. Int. J. Quantum Chem., 32, 105–110, doi:10.1002/qua.560320814.
Hollingsworth, A., K. Arpe, M. Tiedtke, M. Capaldo, and H. Savijaervi, 1980: The performance of a medium range forecast model in winter—Impact of physical parameterizations. Mon. Wea. Rev., 108, 1736–1773, doi:10.1175/1520-0493(1980)108<1736:TPOAMR>2.0.CO;2.
Hurst, H. E., 1951: Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civ. Eng., 116, 770–799.
Kendall, M. G., 1938: A new measure of rank correlation. Biometrika, 30, 81–93, doi:10.1093/biomet/30.1-2.81.
——, 1975: Rank Correlation Methods. 4th ed., Charles Griffith, 212 pp.
Kufareva, I., and R. Abagyan, 2012: Methods of protein structure comparison. Homology Modelling: Methods and Protocols, A. J. W. Orry and R. Abagyan, Eds., Methods in Molecular Biology, Vol. 857, Humana Press, 232–257.
Legates, D. R., and G. J. McCabe, 1999: Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res., 35, 233–241, doi:10.1029/1998WR900018.
Livezey, R. E., and A. G. Barnston, 1988: An operational multifield analog/antianalog prediction system for United States seasonal temperatures: 1. System design and winter experiments. J. Geophys. Res., 93, 10 953–10 974, doi:10.1029/JD093iD09p10953.
Lorenz, E. N., 1969: Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci., 26, 636–646, doi:10.1175/1520-0469(1969)26<636:APARBN>2.0.CO;2.
Maggiora, G. M., J. D. Petke, and J. Mestres, 2002: A general analysis of field-based molecular similarity indices. J. Math. Chem., 31, 251–270, doi:10.1023/A:1020784004649.
McCuen, R. H., and W. M. Snyder, 1975: A proposed index for comparing hydrographs. Water Resour. Res., 11, 1021–1024, doi:10.1029/WR011i006p01021.
Miyahara, M., K. Kotani, and V. R. Algazi, 1998: Objective picture quality scale (PQS) for image coding. IEEE Trans. Commun., 46, 1215–1226, doi:10.1109/26.718563.
Miyakoda, K., G. D. Hembree, R. F. Strickler, and I. Shulman, 1972: Cumulative results of extended forecast experiments: I. Model performance for winter cases. Mon. Wea. Rev., 100, 836–855, doi:10.1175/1520-0493(1972)100<0836:CROEFE>2.3.CO;2.
Mo, R., J. Fyfe, and J. Derome, 1998: Phase-locked and asymmetric correlations of the wintertime atmospheric patterns with the ENSO. Atmos.–Ocean, 36, 213–239, doi:10.1080/07055900.1998.9649612.
——, P. I. Joe, C. Doyle, and P. H. Whitfield, 2014: Verification of an ENSO-based long-range prediction of anomalous weather conditions during the Vancouver 2010 Olympics and Paralympics. Pure Appl. Geophys., 171, 323–336, doi:10.1007/s00024-012-0523-3.
Murphy, A. H., 1988: Skill scores based on the mean square error and their relationships to the correlation coefficient. Mon. Wea. Rev., 116, 2417–2424, doi:10.1175/1520-0493(1988)116<2417:SSBOTM>2.0.CO;2.
——, 1995: The coefficients of correlation and determination as measures of performance in forecast verification. Wea. Forecasting, 10, 681–688, doi:10.1175/1520-0434(1995)010<0681:TCOCAD>2.0.CO;2.
——, and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338, doi:10.1175/1520-0493(1987)115<1330:AGFFFV>2.0.CO;2.
Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models. Part I—A discussion of principles. J. Hydrol., 10, 282–290, doi:10.1016/0022-1694(70)90255-6.
Nourani, V., P. Monadjemi, and V. P. Singh, 2007: Liquid analog model for laboratory simulation of rainfall–runoff process. J. Hydrol. Eng., 12, 246–255, doi:10.1061/(ASCE)1084-0699(2007)12:3(246).
Pearson, K., 1896: Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philos. Trans. Roy. Soc. London, 187A, 253–318, doi:10.1098/rsta.1896.0007.
Petke, J. D., 1993: Cumulative and discrete similarity analysis of electrostatic potentials and fields. J. Comput. Chem., 14, 928–933, doi:10.1002/jcc.540140808.
Philander, S. G., 1990: El Niño, La Niña, and the Southern Oscillation. Academic Press, 293 pp.
Potts, J. M., C. K. Folland, I. T. Jolliffe, and D. Sexton, 1996: Revised “LEPS” scores for assessing climate model simulations and long-range forecasts. J. Climate, 9, 34–53, doi:10.1175/1520-0442(1996)009<0034:RSFACM>2.0.CO;2.
Ratnam, J. V., S. K. Behera, Y. Masumoto, K. Takahashi, and T. Yamagata, 2012: Anomalous climatic conditions associated with the El Niño Modoki during boreal winter of 2009. Climate Dyn., 39, 227–238, doi:10.1007/s00382-011-1108-z.
Ren, H. L., J. F. Chou, J. P. Huang, and P. Q. Zhang, 2009: Theoretical basis and application of an analogue-dynamical model in the Lorenz system. Adv. Atmos. Sci., 26, 67–77, doi:10.1007/s00376-009-0067-3.
Salton, G., and M. E. Lesk, 1968: Computer evaluation of indexing and text processing. J. ACM, 15, 8–36, doi:10.1145/321439.321441.
Santer, B. D., T. M. L. Wigley, and P. D. Jones, 1993: Correlation methods in fingerprint detection studies. Climate Dyn., 8, 265–276, doi:10.1007/BF00209666.
Sarachik, E. S., and M. A. Cane, 2010: The El Niño–Southern Oscillation Phenomenon. Cambridge University Press, 369 pp.
Sepkoski, J. J., Jr., 1974: Quantified coefficients of association and measurement of similarity. Math. Geol., 6, 135–152, doi:10.1007/BF02080152.
Shukla, J., D. A. Paolino, D. M. Straus, D. DeWitt, M. Fennessy, J. L. Kinter, L. Marx, and R. Mo, 2000: Dynamical seasonal predictions with the COLA atmospheric model. Quart. J. Roy. Meteor. Soc., 126, 2265–2291, doi:10.1256/smsqj.56713.
——, T. DelSole, M. Fennessy, J. Kinter, and D. Paolino, 2006: Climate model fidelity and projections of climate change. Geophys. Res. Lett., 33, L07702, doi:10.1029/2005GL025579.
Smith, D. M., R. Eade, and H. Pohlmann, 2013: A comparison of full-field and anomaly initialization for seasonal to decadal climate prediction. Climate Dyn., 41, 3325–3338, doi:10.1007/s00382-013-1683-2.
Smith, T. M., R. W. Reynolds, T. C. Peterson, and J. Lawrimore, 2008: Improvements to NOAA’s historical merged land–ocean surface temperature analysis (1880–2006). J. Climate, 21, 2283–2296, doi:10.1175/2007JCLI2100.1.

Spearman, C., 1904: The proof and measurement of association between two things. Amer. J. Psychol., 15, 72–101, doi:10.2307/1412159.
Stanski, H. R., L. J. Wilson, and W. R. Burrows, 1989: Survey of common verification methods in meteorology. WMO World Weather Watch Tech. Rep. 8, WMO/TD 358, 114 pp.
Taylor, K. E., 2001: Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res., 106, 7183–7192, doi:10.1029/2000JD900719.
Teweles, S., and H. Wobus, 1954: Verification of prognostic charts. Bull. Amer. Meteor. Soc., 35, 455–463.
Toth, Z., 1991: Intercomparison of circulation similarity measures. Mon. Wea. Rev., 119, 55–64, doi:10.1175/1520-0493(1991)119<0055:IOCSM>2.0.CO;2.
van den Dool, H. M., 1987: A bias in skill in forecasts based on analogues and antilogues. J. Climate Appl. Meteor., 26, 1278–1281, doi:10.1175/1520-0450(1987)026<1278:ABISIF>2.0.CO;2.
——, 1989: A new look at weather forecasting through analogues. Mon. Wea. Rev., 117, 2230–2247, doi:10.1175/1520-0493(1989)117<2230:ANLAWF>2.0.CO;2.
Wang, B., H.-J. Kim, K. Kikuchi, and A. Kitoh, 2011: Diagnostic metrics for evaluation of annual and diurnal cycles. Climate Dyn., 37, 941–955, doi:10.1007/s00382-010-0877-0.
Wang, H., and K. Fan, 2009: A new scheme for improving the seasonal prediction of summer precipitation anomalies. Wea. Forecasting, 24, 548–554, doi:10.1175/2008WAF2222171.1.
Wang, Z., and A. C. Bovik, 2002: A universal image quality index. IEEE Signal Process. Lett., 9, 81–84, doi:10.1109/97.995823.
——, and ——, 2009: Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag., 26, 98–117, doi:10.1109/MSP.2008.930649.
——, and Q. Li, 2011: Information content weighting for perceptual image quality assessment. IEEE Trans. Image Process., 20, 1185–1198, doi:10.1109/TIP.2010.2092435.
——, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, 2004: Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process., 13, 600–612, doi:10.1109/TIP.2003.819861.
Ward, M. N., and C. K. Folland, 1991: Prediction of seasonal rainfall in the north nordeste of Brazil using eigenvectors of sea-surface temperature. Int. J. Climatol., 11, 711–743, doi:10.1002/joc.3370110703.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed., Elsevier, 704 pp.
Williams, G., and L. S. Moye, 1971: Subjective evaluation of unsuppressed echo in simulated long-delay telephone communications. Proc. Inst. Electr. Eng., 118, 401–408, doi:10.1049/piee.1971.0074.
Willmott, C. J., 1981: On the validation of models. Phys. Geography, 2, 184–194, doi:10.1080/02723646.1981.10642213.
Wu, M. R., B. J. Snyder, R. Mo, A. J. Cannon, and P. J. Joe, 2013: Classification and conceptual models for heavy snowfall events over East Vancouver Island of British Columbia, Canada. Wea. Forecasting, 28, 1219–1240, doi:10.1175/WAF-D-12-00100.1.
Yarnal, B., 1984: A procedure for the classification of synoptic weather maps from gridded atmospheric pressure surface data. Comput. Geosci., 10, 397–410, doi:10.1016/0098-3004(84)90041-4.
Zawadzki, I. I., 1973: Statistical properties of precipitation patterns. J. Appl. Meteor., 12, 459–472, doi:10.1175/1520-0450(1973)012<0459:SPOPP>2.0.CO;2.
