BAYESIAN STATISTICS 6, pp. 000--000 J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith (Eds.) Oxford University Press, 1998
Bayesian Methods in the Atmospheric Sciences

L. MARK BERLINER
Ohio State University, USA and National Institute of Statistical Sciences, USA

J. ANDREW ROYLE, CHRISTOPHER K. WIKLE and RALPH F. MILLIFF
National Center for Atmospheric Research, USA

SUMMARY

Current issues in the atmospheric and allied sciences are of fundamental interest to both the scientific community and the general population. We briefly review some of these issues and the basic methods by which they are typically analyzed. The goal of this review is to motivate numerous challenges to the Bayesian Statistics community. Two essential aspects amenable to Bayesian analyses are (1) the efficient combination of information (‘‘science’’ and ‘‘data’’), and (2) the quantification and management of uncertainty. In the face of the complexities arising from the presence of massive datasets as well as nonlinear physical models, we suggest Bayesian hierarchical modeling and analysis as a fundamental strategy for statistical approaches to problems of weather and climate science. We present a review of our work in an ongoing project employing physical principles and satellite (scatterometer) data providing information on sea-surface winds.

Keywords:
CLIMATE MODELS; HIERARCHICAL BAYESIAN ANALYSIS; MARKOV CHAIN MONTE CARLO;
NUMERICAL WEATHER FORECASTING; PARAMETERIZATION; PHYSICS; PRIOR ELICITATION; SATELLITE DATA.
1. INTRODUCTION

‘‘Global warming. Those words encapsulate for the public all the complex issues involved in anticipated future climate change associated with observed greenhouse gas increases. ... The only quantitative tools we have for predicting future climate are climate models which include as their central components atmospheric and oceanic General Circulation Models (GCMs).’’ This motivation opens a highly recommended book on climate system modeling (Trenberth, 1992). The issue is hardly simple. Schneider (1992) notes ‘‘The Earth’s climate changes. ... In the future it will surely continue to evolve. In part the evolution will be driven by natural causes, such as fluctuations in the Earth’s orbit. But future climatic change, unlike that of the past, will probably have another source as well: human activities.’’ Though details remain unclear, many scientists now find the notion that our climate has begun to display impacts of anthropogenic effects highly plausible (Houghton et al., 1996). However, quantification of impacts remains elusive.

Weather prediction is also of fundamental societal importance. Major centers throughout the world (e.g., The National Centers for Environmental Prediction (NCEP) in the USA
and the European Centre for Medium-Range Weather Forecasts (ECMWF)) engage in ambitious efforts involving the collection and processing of remarkably vast amounts of data in combination with large-scale numerical models for the temporal evolution of weather. However, there is more to the story than providing information for local weather predictions given in the ‘‘evening news.’’ Predictions of major weather events are extremely challenging and of obvious importance. Local phenomena, such as tornadoes, are known to be difficult to predict, yet carry a devastating impact. Even straight-line high winds can be a severe hazard, particularly in developing countries. Oceanic weather is clearly important to coastal areas as well as ocean vessels; further, optimal routing of transoceanic shipping to follow or avoid major currents (e.g., the Kuroshio in the Pacific and the Gulf Stream in the Atlantic) can have significant economic impacts. Moderate range (both in time and space) prediction of weather behavior is the subject of intense current research. Better predictive ability regarding monsoons throughout Asia, seasonal flooding and droughts, etc., impacts both local and global economics as well as human health and well-being. The Winter of 1998 has made the public, especially residents of California and Florida in the USA and fishermen on the western coast of South America, keenly aware of the impacts of El Niño and the value in its prediction.

The Perceived Role of Statistics. Epstein (1985) wrote ‘‘There are two general sources of uncertainty that are inherent in all the predictions that face the climatologist. One involves the fundamental lack of predictability of the climate system. ... The other ... is the fact that our knowledge is imperfect.’’ He suggested ‘‘... the very best we can ever hope to do in the way of climate prediction will involve uncertainty.’’ While few deny that uncertainty is present in the climate sciences, it is fair to claim that little work has been done that would be viewed as formally Bayesian. A variety of reasons for this circumstance exist. Some of these involve tradition and patterns of practice within the atmospheric science community. One essential issue involves the perceived view of statistics among physical scientists. It appears that statistics is viewed as a discipline separate from physics. Statistics is thought to be useful in estimating certain model features and in confirming or denying particular scientific models, but in some sense that is all. Consider the comments of Schneider (1992): ‘‘We attempt to determine the future behavior of the climate system from knowledge of its past behavior and present state, basically taking two approaches. One, the ‘empirical statistical,’ uses empirical statistical method, such as regression equations with past and present observations, to obtain the most probable extrapolation. The other uses ‘first principles:’ equations believed to represent the physical, chemical, and biological processes governing the climate system for the scales of interest. ... Since the statistical approach depends on historical data, it is obviously limited to predicting climates that have been observed or are not caused by new processes. ... Thus, the more promising approach to climate prediction ... is [the second approach]. Then, the validation of the predictions of such models becomes a chief concern.’’ The view of Schneider is hardly unique; rather, it is prevalent. It seems to suggest that there are two disjoint ways to think about climate prediction: deterministic modeling or stochastic
climate modeling. Though exceptions exist, the discipline is currently dominated by the deterministic view of modeling. The first point of our thesis is that the right view is to actually seek methods which are combinations of these extremes. The second is that hierarchical Bayesian thinking provides a mechanism for achieving effective combined models. We do not mean to suggest that we seek to merely convince physical scientists that bi-polar views of models as statistical versus physical are untenable. We also believe that as a group the statistical community puts too little value in the use of physics in analyses. In general, useful stochastic prediction models must genuinely combine both viewpoints. We believe that aggressive Bayesian modeling is the key. To achieve this, substantial collaboration and learning on the parts of both statisticians and physical scientists are needed.

Before proceeding, note that this article is not intended to give a complete review of all the opportunities for Bayesian contributions in the atmospheric sciences. In addition, while we emphasize use of hierarchical models, we do not provide a review of the literature concerning their development and application.

2. PHYSICAL MODELING OVERVIEW

To lay groundwork for contributing Bayesian thinking, we briefly review selected aspects of physical climate and weather modeling, and associated uncertainties introduced in the process. See Ehrendorfer (1997) for additional discussion. Physical modeling of processes related to weather and climate focuses on the mathematical description of so-called field variables. These are continuous functions of both space and time; examples include temperature and pressure. Mathematical modeling leads to partial differential equations for these variables. However, as described by Holton (1992, p. 2), ‘‘The general set of partial differential equations governing the motions of the atmosphere is extremely complex; no general solutions are known to exist.
To acquire an understanding of the physical role of atmospheric motions in determining the observed weather and climate, it is necessary to develop models based on systematic simplification of the fundamental governing equations.’’

2.1. Global Numerical Models

Houghton (1994, p. 62) explains ‘‘Numerical models of the weather and climate are based on the fundamental mathematical equations which describe the physics and dynamics of the movements and processes taking place in the atmosphere, the ocean, the ice and on the land.’’ Dynamics are developed employing Newton’s laws of motion and conservation laws, all adjusted for the Earth’s rotation (i.e., Coriolis acceleration). Approximate solutions to the partial differential equations are obtained via numerical integration on a discretized spatial grid and a discretization of time. For atmospheric models, grid boxes on the order of 100 km on a side at the surface and roughly 1 km in the vertical are used. Discretization of the laws of physics leads to uncertainty. Features important to the process but not resolved at this grid resolution are parameterized (one view of parameterization is given below). Depending on one’s purpose, separate models for the atmosphere, ocean, etc., are ‘‘coupled’’ via various
procedures.

2.2. Global Numerical Weather Forecasting

Major analysis centers (NCEP and ECMWF) provide forecasts that combine both observational data and numerical models. The process is generally known as data assimilation. (See Daley, 1991.) Steps involve the compilation of vast amounts of observational data, leading to estimates of the ‘‘current state’’ of atmospheric variables. Large-scale statistical analyses are a component process. ‘‘Current state’’ estimates are then used to initialize deterministic GCMs, whose output leads to forecasts. Grid size, integration technique, simplification in models, and role of observational data all vary, depending on the scope and goal of the analyses. It is well recognized that uncertainties arising in boundary/initial condition estimation based on observational data add difficulties. Further, the basic equations used in developing a model are highly nonlinear, leading to concerns regarding limits on predictability and chaos. The notion of ensemble forecasting, in which the numerics are done for an ensemble of boundary/initial conditions, is an attempt at accounting for such uncertainty. The level of success is quite remarkable, though there is substantial room for improvements, including uncertainty management, in the process. We believe that the Bayesian viewpoint has much to contribute.

2.3. Parameterization

Parameterization in deterministic dynamical systems modeling is prevalent. Crucial parameterizations in climate models include the development and evolution of clouds and convective processes (transfer of heat). Further, the choice and validity of a given parameterization is often the source of the greatest debates over a model. Consider a dynamical system, consisting of a very large number of evolving variables. In a preliminary step, the variables are broken into two groups, say x and y. The x variables are crucial in some way; our goal is to ‘‘get rid’’ of y.
The complete system is hypothesized to evolve according to the model:
$$x_{t+1} = f(x_t, y_t), \tag{1}$$

$$y_{t+1} = g(x_t, y_t). \tag{2}$$

We use the following recipe:

1. Suggest a summarizing function, say $T(y)$, which takes the high dimensional y down to a relatively small dimensional summary, T, along with a companion dynamical function, say $\tilde{f}$, hoping that (1) is well approximated by

$$x_{t+1} = \tilde{f}(x_t, T(y_t)). \tag{3}$$

2. Build a parameterization of the form

$$T(y_t) \approx h(x_t; \theta), \tag{4}$$

where $\theta$ is some parameter.

3. The dimension reduction is completed by applying (4) in (3) leading to an approximating system:

$$x_{t+1} = \tilde{f}(x_t, h(x_t; \theta)). \tag{5}$$
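The three-step recipe above can be sketched on a toy system. Everything in this example is an assumption made for illustration: the update rules `f` and `g`, the choice of the mean of y as the summary `T`, the linear form of `h`, and all numerical constants are hypothetical and are not taken from any climate model. In this toy, `f` depends on y only through its mean, so step 1 is exact and all approximation error enters through the regression in step 2.

```python
import math

K = 8  # number of unresolved y variables (hypothetical)

def f(x, y):
    """Equation (1): update x using the full y state."""
    return 0.9 * x + 0.1 * sum(y) / len(y)

def g(x, y):
    """Equation (2): each y component relaxes toward a value tied to x."""
    return [0.5 * yk + 0.5 * math.tanh(x + 0.1 * k) for k, yk in enumerate(y)]

def T(y):
    """Step 1: low-dimensional summary of y (here, its mean)."""
    return sum(y) / len(y)

def f_tilde(x, t_val):
    """Equation (3): companion dynamics that see y only through T(y)."""
    return 0.9 * x + 0.1 * t_val

def fit_h(xs, ts):
    """Step 2: least-squares fit of T(y_t) ~ theta0 + theta1 * x_t, as in (4)."""
    n = len(xs)
    mx, mt = sum(xs) / n, sum(ts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    theta1 = sxt / sxx
    return mt - theta1 * mx, theta1

def training_pairs(starts, steps=50):
    """Run the full system (1)-(2) from several starts; record (x_t, T(y_t))."""
    xs, ts = [], []
    for x in starts:
        y = [0.0] * K
        for _ in range(steps):
            xs.append(x)
            ts.append(T(y))
            x, y = f(x, y), g(x, y)  # both updates use the old (x, y)
    return xs, ts

def reduced_step(x, theta):
    """Step 3, equation (5): autonomous update in x alone."""
    theta0, theta1 = theta
    return f_tilde(x, theta0 + theta1 * x)

xs, ts = training_pairs([-2.0, -1.0, 0.0, 1.0, 2.0])
theta = fit_h(xs, ts)

# Compare the full and the reduced system from the same starting value.
x_full, y_full, x_red = 1.5, [0.0] * K, 1.5
for _ in range(20):
    x_full, y_full = f(x_full, y_full), g(x_full, y_full)
    x_red = reduced_step(x_red, theta)
print("theta:", theta, "full x:", x_full, "reduced x:", x_red)
```

The gap between `x_full` and `x_red` is one concrete instance of the error that, as the text argues, ought to be quantified at each step.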
The key is that (5) is an autonomous dynamical system typically of much lower dimension than the original involving both x and y. Clearly, there is strong potential for statisticians to make significant contributions in the process. For example, immediate opportunities exist in Step 2. This is a regression problem, and is often based on observational data as well as ‘‘physics.’’ Also, quantification of errors is needed at every step, including the initial construction of the dynamical system.

2.4. Scales in Space and Time

The physics deemed most important in a particular analysis depend on the spatial and temporal scale of the phenomenon of interest. Careful scale analysis can provide substantial reductions in the level of complexity of a model; the model is constructed so that unimportant physics are ignored. The spatial scales of interest are broadly categorized as macro-, meso-, and microscales. Macroscale phenomena are those seen on planetary or synoptic distances on the order of $10^6$ m to $10^7$ m. The phenomena include certain forms of planetary wave behavior as well as very large scale cyclonic behavior. Mesoscale phenomena operate at length scales ranging from $10^3$ m to $10^5$ m. These levels include hurricanes, the ‘‘fronts’’ of the weather forecaster, and convective storm systems (e.g., super-cell thunderstorms). Finally, the microscale phenomena of interest include molecular motions on the order of $10^{-7}$ m up to wind gusts operating on 10 m to 100 m. The sort of modeling, parameterizations, etc. needed at these scales vary substantially. Also, the location chosen for modeling determines the crucial model aspects. For example, the tropics are known to have moisture and thermodynamic properties very different from those of the Arctic.
An important relation between such analyses and statistical modeling is the possibility, depending on the spatial and temporal scales of interest, of employing purely stochastic models as a replacement for the complex differential equation models. Some physical scientists view the question as a dichotomy between determinism and randomness, physics versus statistics, von Neumann versus Wiener, etc. (See Lorenz, 1987.) Operationally, the methods involve the replacement of models for the time evolution of a spatial field via differential equations by a simple, though very high dimensional, vector autoregression, usually of order 1. The matrix of regression parameters is estimated via least squares.
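A minimal sketch of the least-squares step just described, for a first-order vector autoregression $x_{t+1} = A x_t + \text{noise}$. The two-dimensional state, the matrix `a_true`, and the noise level are hypothetical choices that keep the example small; an actual gridded field would have a very high dimensional state and would need a proper linear-algebra library. The estimate used is $\hat{A} = (\sum_t x_{t+1} x_t')(\sum_t x_t x_t')^{-1}$.

```python
import random

def simulate(a, x0, steps, noise_sd=0.0, seed=0):
    """Generate x_{t+1} = A x_t + Gaussian noise for a 2-dimensional state."""
    rng = random.Random(seed)
    xs = [list(x0)]
    for _ in range(steps):
        x = xs[-1]
        xs.append([a[i][0] * x[0] + a[i][1] * x[1] + rng.gauss(0.0, noise_sd)
                   for i in range(2)])
    return xs

def fit_var1(series):
    """Least-squares estimate of A from consecutive pairs (x_t, x_{t+1})."""
    s0 = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_t x_t'
    s1 = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_{t+1} x_t'
    for prev, nxt in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                s0[i][j] += prev[i] * prev[j]
                s1[i][j] += nxt[i] * prev[j]
    det = s0[0][0] * s0[1][1] - s0[0][1] * s0[1][0]
    inv = [[s0[1][1] / det, -s0[0][1] / det],
           [-s0[1][0] / det, s0[0][0] / det]]
    # A_hat = S1 * S0^{-1}
    return [[sum(s1[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a_true = [[0.9, -0.2], [0.2, 0.9]]  # hypothetical damped rotation
noisy = simulate(a_true, [1.0, 0.0], steps=500, noise_sd=0.1, seed=1)
print(fit_var1(noisy))
```

With noise-free data the estimate reproduces `a_true` up to rounding; with noise it is only an estimate, and the point of the surrounding discussion is that its uncertainty is usually left unquantified.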
3. HIERARCHICAL BAYESIAN MODELING

Essential challenges to the statistician developing models for applications to climate and weather include: Complexity, Massiveness, and Uncertainty. Our suggestion is to employ Bayesian hierarchical analysis. First, the Bayesian view is arguably the most efficient approach to uncertainty management and combination of information. Indeed, the Bayesian viewpoint is to incorporate all available ‘‘prior’’ information. Further, in the presence of complexity and massiveness, the hierarchical view emerges as a plausible modeling tool. However, hierarchical modeling is not merely necessary. Rather, we suggest that it actually enhances one’s ability to incorporate available understanding of the physical processes at hand. The essence of the hierarchies we suggest is specification of parameterized process models:
$$[\text{process}, \text{parameters}] = [\text{process} \mid \text{parameters}]\,[\text{parameters}]. \tag{6}$$

Bayes’ Theorem of course provides [process, parameters | data] by combining (6) with an observational data model [data | process, parameters].
Two crucial aspects relating to the power of the hierarchical view warrant discussion. The first involves the separation of statistical modeling of observational data from stochastic modeling of the underlying physical processes. Consider a hypothetical example. Suppose the data form a spatially indexed matrix of observations of some process variable (e.g., a discretized field). Our first task is to formulate the conditional distribution of the data, given the true process at the observation time. This view has remarkable by-products in physical modeling. The key is that the specification of [data | process, parameters] ought to be much simpler than the standard non-Bayesian analog. Namely, a non-Bayesian statistical analysis typically focuses on the specification of [data | parameters]. Instead, the key is to formulate the background understanding of the measurement process separated from the sources of variation due to the process itself. For the non-Bayesian, [data | parameters] in principle includes spatial (and temporal) dependence structures arising in the spatial structure of the process. A particularly important advantage for the hierarchical Bayesian involves combining observational data from various sources. For example, a non-Bayesian joint probability model for rain gauge data and satellite derived precipitation data (relevant to an area containing the gauges) must carefully account for the dependence structure of these data. Measurement error models, conditioned on true precipitation, are certainly more readily modeled for each system in turn (gauges, satellites, etc.). Finally, unlike non-Bayesian counterparts, this view enables ready analysis of problems with missing data.

The second point involves the multitude of potential modeling strategies that can build on the physicist’s understanding of the interrelationships of various process variables at differing scales. Namely, rather than specifying the joint process model [process | parameters] directly, the analyst can do so hierarchically. Such hierarchies can involve relationships among weather variables (e.g., winds determined by pressure; precipitation as determined by water vapor and temperature, etc.). Hierarchical thinking can also be used to model spatial and temporal variation of a particular variable. Note that the construction of a hierarchical model generalizes the construction of parameterizations in physical modeling as discussed earlier.
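The rain gauge and satellite example can be made concrete with a deliberately simple sketch. It assumes, purely for illustration, a single true precipitation value with a Gaussian process-stage prior and Gaussian measurement errors that are independent given the truth; the function name `posterior_normal` and every number below are hypothetical. Under those assumptions the posterior precision is the sum of the prior and measurement precisions, and a missing observation is handled by leaving it out of the list.

```python
def posterior_normal(prior_mean, prior_var, observations):
    """Posterior mean and variance of the true value.

    observations: list of (value, error_variance) pairs, one per instrument,
    each modeled as [data | truth] = Normal(truth, error_variance).
    """
    precision = 1.0 / prior_var
    weighted = prior_mean / prior_var
    for value, err_var in observations:
        precision += 1.0 / err_var
        weighted += value / err_var
    return weighted / precision, 1.0 / precision

prior = (5.0, 4.0)        # hypothetical process-stage mean and variance (mm)
gauge = (6.0, 1.0)        # hypothetical gauge reading and error variance
satellite = (8.0, 4.0)    # hypothetical satellite estimate and error variance

print(posterior_normal(*prior, [gauge, satellite]))  # both sources
print(posterior_normal(*prior, [satellite]))         # gauge reading missing
```

No joint dependence model for the two data sources is written down; their dependence arises only through the shared true value, which is the simplification the text describes.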
3.1. Temporal Models

Consider modeling a single variable over time; namely, a time series $X = (X_1, \ldots, X_T)$. ‘‘Markovian’’ modeling begins with the expansion

$$[X \mid \theta_1] = [X_1 \mid \theta_1] \prod_{t=2}^{T} [X_t \mid X_1, \ldots, X_{t-1}, \theta_1]. \tag{7}$$
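As a small sketch of how expansion (7) is used once each factor is restricted to the most recent past value (the first-order Markov simplification discussed in this subsection), the joint log density can be accumulated one conditional at a time. The Gaussian AR(1) form, the stationary choice for the first factor, and the parameter names `phi` and `innov_var` are assumptions made for illustration only.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of a univariate Normal(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_joint_ar1(xs, phi, innov_var):
    """Log of [X | theta] built as [X_1 | theta] * prod_t [X_t | X_{t-1}, theta]."""
    # First factor: stationary marginal of a Gaussian AR(1) (requires |phi| < 1).
    total = log_normal_pdf(xs[0], 0.0, innov_var / (1.0 - phi ** 2))
    # Remaining factors: one conditional per time step, t = 2, ..., T.
    for prev, cur in zip(xs[:-1], xs[1:]):
        total += log_normal_pdf(cur, phi * prev, innov_var)
    return total

series = [0.3, -0.5, 0.1, 0.4]  # hypothetical observations
print(log_joint_ar1(series, phi=0.6, innov_var=0.64))
```

For two time points the result agrees with the log density of the corresponding bivariate Gaussian, which is a convenient check that the factorization has been coded correctly.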
(See Berliner (1996).) Next, one assumes that for each t, $[X_t \mid X_1, \ldots, X_{t-1}, \theta_1]$ actually depends on a recent subset of the past; e.g., a first order Markov model assumes $[X_t \mid X_1, \ldots, X_{t-1}, \theta_1] = [X_t \mid X_{t-1}, \theta_1]$. Suppose two time series $X_1$ and $X_2$ are available. Useful models for $[X_1 \mid X_2, \theta_1]$ can be built based on components like

$$[X_{1t} \mid X_{11}, \ldots, X_{1,t-1}, X_{21}, \ldots, X_{2,t-1}, \theta_1] = [X_{1t} \mid X_{1,t-1}, X_{2,t-1}, \theta_1],$$

for example. In an analysis of runoff ($X_1$) from a lake and rainfall ($X_2$), Lu and Berliner (1998) build a hierarchical Bayesian model in which recent past runoffs and rainfalls conditionally drive current runoff. Separate modeling for rainfall can then be pursued.

3.2. Spatial Models

A variety of spatial models (e.g., Markov random fields, Markov Meshes, etc.) are available to hierarchical modelers (Cressie, 1993). In dealing with multiple spatial variables, hierarchical layers can again be considered as in temporal settings. We present an example in the next section.

3.3. Space-Time

Numerous strategies can be employed in the development of hierarchical space-time models. Wikle, Berliner, and Cressie (1998) provide a description of one general class of space-time models. An example that begins to confront massive geophysical data sets is Wikle, Milliff, Nychka, and Berliner (1998).

4. SEA-SURFACE WINDS

Our application is concerned with prediction of near-surface winds on a grid of points over the Labrador Sea. An idealized two-dimensional grid of N points is constructed over the region of interest (see Figure 1). The component of the wind vector in the ‘‘x-direction’’ (to the approximate East) is denoted by u; the component in the ‘‘y-direction’’ (to the approximate North) is denoted by v. The analysis is to incorporate scatterometer observations. A scatterometer is a satellite-borne device that infers near-surface (10 m) wind estimates by analyzing the backscatter (by ocean surface waves) of returned radar pulses. How this inversion is performed is an interesting problem in its own right, but not our topic here. (See Naderi et al., 1991, and Stoffelen and Anderson, 1997.) The data (in m s$^{-1}$) obtained are of high resolution, but irregular in both space and time. The data arise as averages within small rectangles (‘‘wind vector cells’’) of dimension 50 km by 50 km. It is likely that the measurement error variance is small. However, the fact that observations from a typical ‘‘time-slice’’ come from two swaths separated by 100 minutes does induce error. For this example, the grid locations were chosen to correspond to the ECMWF grid depicted in Figure 1. The scatterometer data for a single day are indicated by wind vectors in Figure 1.

Figure 1. NSCAT Scatterometer Data and ECMWF Grid

A non-Bayesian statistician might well start by considering the spatial dependence structure of the data. Clearly, however, concerns arise due to complexities in the data source as well as issues of spatial extrapolation as opposed to mere spatial interpolation. The Bayesian statistician might begin by separating the measurement problem from the underlying true wind field and its dependence structure. However, complexities endure with this first level hierarchical view. Let U and V each be N-vectors of the u and v components of the wind field at the grid locations. Direct development of a stochastic model for this process is challenging. Our approach was based on seeking to explain ‘‘wind.’’ At an appropriately averaged spatial scale, wind is approximately air moving from high to low pressure, modified by the rotation of the planet. (The appropriate scale has averaged out microscale turbulence, etc.) That is, to some approximation, the wind field ought to roughly follow the gradient of a pressure field. Our suggestion is to divide modeling efforts into (1) a conditional model for winds given pressure information, and (2) a model for pressures. This is
an example of a ‘‘hidden process model,’’ since we used no pressure data. The consensus among our subject-matter experts was that we could indeed develop a viable pressure model based on experience reported in the literature. Our first attempt is described next.

Thiebaux (1985) used the geostrophic relationship (e.g. Holton, 1992, p. 59) to relate geopotential height fields (denoted by Z) and the u and v components of wind. (Geopotential heights will be related to the gradient of pressure later.) The geostrophic relationship assumes a balance between the gradient of geopotential height and the Coriolis acceleration, i.e., the planetary rotation term. The relationship holds fairly well in the extratropics, above the planetary boundary layer (approximately 1 km). However, below the planetary boundary layer, the effects of friction are significant. Thus, we would not expect a model based solely on this relationship to be adequate for surface wind fields in the Labrador Sea. Nevertheless, the relationship does provide some physically based information that can be built into a model in a hierarchical framework. Thiebaux (1985) derived geostrophic covariance relationships under an additional assumption that the Z field is a realization of a second-order (spatial) autoregressive stochastic process. Although a low-order autoregressive model is somewhat simplistic, the covariance model from such a process has been shown to model many atmospheric phenomena adequately, in both the spatial and spectral domains. The stationary spatial autoregressive covariance model for locations $s$ and $s'$ is:
$$\mathrm{Cov}(Z(s), Z(s')) = \left[\cos(\theta_2 \|s - s'\|) + \frac{\theta_1}{\theta_2} \sin(\theta_2 \|s - s'\|)\right] e^{-\theta_1 \|s - s'\|}, \tag{8}$$
where $\|s - s'\|$ denotes the Euclidean distance between the locations. This exponentially damped sinusoid produces good empirical fits to data that exhibit quasi-periodic correlation structure, including negative correlations. Although Thiebaux’s work was concerned with the geostrophic relationship on a constant pressure surface, our interest is with a constant elevation (e.g. 10 m). The geostrophic relationship on such a surface then relates the pressure gradient field to the Coriolis acceleration (e.g. Holton, 1992, p. 40).

4.1. The Model

The scatterometer data, denoted by $D_u$ and $D_v$, are not, in general, observed at grid locations. There are typically 2 or 3 observations within each grid cell (see Figure 1). We relate these observations to the u and v components of wind at grid sites in the measurement stage of the model:
Stage 1:

$$[D_u, D_v \mid U, V, H_u, H_v, \sigma_e^2] \text{ is } \mathrm{Gau}\left( \begin{pmatrix} H_u U \\ H_v V \end{pmatrix}, \begin{pmatrix} \sigma_e^2 I & 0 \\ 0 & \sigma_e^2 I \end{pmatrix} \right). \tag{9}$$

Here, $H_u$ and $H_v$ are matrices which map the u and v components at the grid locations to the data locations. If u and v are always observed together (which they are, in the dataset analyzed here), then we specify $H_u = H_v = H$. A simple form for these matrices, and the one that we
will use here, maps observation sites to the nearest grid location (Wikle, Berliner, Cressie, 1997). That is, it is assumed that observations within the grid box are observations of u and v at the grid node plus independent, mean zero measurement error. Formally, the rows of H are all 0’s with a single 1, indicating the element of U with which the particular observation is associated. One could consider a more complicated H matrix by taking rows to be interpolating weights, mapping the gridded process to data locations. In this example the measurement error variances are assumed constant in space.

Stage 2: We now concentrate on a model for the U and V processes. Let the N-vector P be the pressure field at our N grid locations. We specify the distribution of U and V, conditional on P, as:

$$[U, V \mid P, B_u, B_v, \Sigma] \text{ is } \mathrm{Gau}\left( \begin{pmatrix} B_u P \\ B_v P \end{pmatrix}, \Sigma \right). \tag{10}$$
Some flexibility can be permitted in specification of $\Sigma$. We discuss this in Section 4.2. We must also choose parameterizations for the matrices $B_u$ and $B_v$. These could conceivably be very complicated. However, in this problem a natural choice arises based on physical arguments; see Section 4.3.

Stage 3: Next, we formulate a stochastic model for the pressure field, P. At this stage of the model, we incorporate information derived from the second-order spatial autoregressive model previously discussed. Although P is defined on a lattice, we base its distribution on a model generating a pressure field with continuous spatial index. In particular, we assume that the pressure field is a Gaussian random process with global (prior) mean zero and covariance kernel $\sigma_p^2 r(s, t)$. That is, the covariance of the pressure at arbitrary locations $s, t$ is given by $\sigma_p^2 r(s, t)$. (Note that we are assuming a priori that the variance $\sigma_p^2$ of the pressure is the same at all sites.) This model implies that the gridded values P have a multivariate Gaussian distribution with correlation matrix R, where the entries in R are the appropriate values of r for the gridded sites. That is,

$$[P \mid \mu_p, \sigma_p^2, R_{\theta_1, \theta_2}] \text{ is } \mathrm{Gau}(\mu_p \mathbf{1}, \sigma_p^2 R_{\theta_1, \theta_2}), \tag{11}$$

where $\mu_p$ is set to zero everywhere and $\theta_1$ and $\theta_2$ are the parameters in the covariance function given by (8). It is hoped that by putting (meaningful) covariance structure at this stage, the remaining structure can be explained more parsimoniously in the mean of the stage 2 conditional, thus permitting a simplified parameterization of $\Sigma$. The basic idea of this model for the pressure field is to place very strong spatial structure in P and use this information to produce (primarily geostrophic) predictions of u and v away from the data locations.
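A minimal sketch of how the correlation matrix R in Stage 3 could be assembled from the damped-sinusoid function in (8). The site coordinates, the planar Euclidean distance, and the parameter values are hypothetical; in particular, the sketch treats the function in (8) as having value 1 at zero distance, so that it can serve directly as the correlation function r.

```python
import math

def damped_sinusoid_corr(dist, theta1, theta2):
    """Exponentially damped sinusoid of (8), evaluated at a distance."""
    return ((math.cos(theta2 * dist)
             + (theta1 / theta2) * math.sin(theta2 * dist))
            * math.exp(-theta1 * dist))

def correlation_matrix(sites, theta1, theta2):
    """Matrix R with entries r(s, t) for every pair of gridded sites."""
    return [[damped_sinusoid_corr(math.dist(s, t), theta1, theta2)
             for t in sites]
            for s in sites]

# Hypothetical 2 x 2 grid in arbitrary planar units.
grid = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
R = correlation_matrix(grid, theta1=0.5, theta2=1.0)
for row in R:
    print([round(value, 3) for value in row])
```

For these hypothetical parameter values the function turns negative at larger separations, which is the quasi-periodic behavior that motivates this covariance family.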
4.2. Parameterization of $\Sigma$

The matrix $\Sigma$ is interpreted as the ageostrophic covariance, i.e. that structure not accounted for by P. One of the motivating factors of the hierarchical modeling approach is to simplify the structure of $\Sigma$ from a large and complicated joint variance-covariance matrix to a simpler model, letting the more complicated structure be taken up in the random field model for P. We assume that u and v are correlated at the same site, but not across sites, so that $\Sigma = K_{2 \times 2} \otimes I_{N \times N}$ where

$$K = \begin{pmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix} \tag{12}$$

is the variance-covariance matrix of $(U(s), V(s))$.
4.3. Parameterization of $B_u$ and $B_v$

Recalling the rough argument that the u and v components should be proportional to the gradient of the pressure field, the geostrophic approximation yields:

$$v_g \propto \frac{1}{f} \frac{dP}{dx} \quad \text{and} \quad u_g \propto -\frac{1}{f} \frac{dP}{dy},$$

where x and y are the east-west and north-south directions, respectively, and f is the Coriolis term. Consider the following first-order neighborhood of the point $s_3$:

$$\begin{array}{ccc} & s_2 & \\ s_1 & s_3 & s_5 \\ & s_4 & \end{array} \tag{13}$$

The east-west gradient in P is proportional to

$$\frac{[P(s_5) - P(s_3)] + [P(s_3) - P(s_1)]}{2} = \frac{P(s_5) - P(s_1)}{2}.$$

The north-south gradient in P is proportional to

$$\frac{[P(s_2) - P(s_3)] + [P(s_3) - P(s_4)]}{2} = \frac{P(s_2) - P(s_4)}{2}.$$

Thus, under geostrophy, we might expect that

$$U(s_3) = \beta_1 \frac{P(s_5) - P(s_1)}{2} + \beta_2 \frac{P(s_2) - P(s_4)}{2}.$$

Under geostrophy, we further expect $\beta_1 = 0$ for the u component (Holton, 1992, p. 40). However, for now we retain this as a parameter to be estimated. Similarly, we might expect

$$V(s_3) = \beta_3 \frac{P(s_5) - P(s_1)}{2} + \beta_4 \frac{P(s_2) - P(s_4)}{2},$$

and $\beta_4 = 0$ under geostrophy (Holton, 1992, p. 40). These relationships imply a simple anisotropic nearest-neighbor structure on the matrices $B_u$ and $B_v$.
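The nearest-neighbor structure described in this subsection can be sketched as an explicit matrix. Several choices below are assumptions made for illustration, not details given in the text: grid sites are stored row by row with rows increasing northward and columns increasing eastward, boundary rows of the matrix are simply left at zero, and the coefficient values are hypothetical.

```python
def build_b(nrow, ncol, beta_ew, beta_ns):
    """N x N matrix applying centered pressure differences at interior sites.

    beta_ew multiplies the east-west difference, beta_ns the north-south one.
    Boundary rows stay zero (an assumption of this sketch).
    """
    n = nrow * ncol
    b = [[0.0] * n for _ in range(n)]
    for r in range(1, nrow - 1):
        for c in range(1, ncol - 1):
            i = r * ncol + c
            b[i][i + 1] += beta_ew / 2.0      # east neighbor, s5
            b[i][i - 1] -= beta_ew / 2.0      # west neighbor, s1
            b[i][i + ncol] += beta_ns / 2.0   # north neighbor, s2
            b[i][i - ncol] -= beta_ns / 2.0   # south neighbor, s4
    return b

def matvec(b, p):
    """Plain matrix-vector product B P."""
    return [sum(bij * pj for bij, pj in zip(row, p)) for row in b]

# Hypothetical 3 x 3 pressure field, linear in both directions.
nrow, ncol = 3, 3
pressure = [10.0 + 2.0 * c + 3.0 * r for r in range(nrow) for c in range(ncol)]

b_u = build_b(nrow, ncol, beta_ew=0.0, beta_ns=-1.0)  # geostrophy: beta_1 = 0
b_v = build_b(nrow, ncol, beta_ew=1.0, beta_ns=0.0)   # geostrophy: beta_4 = 0
center = 1 * ncol + 1
print(matvec(b_u, pressure)[center], matvec(b_v, pressure)[center])
```

Each interior row has at most four nonzero entries, which is what makes the conditional mean $B_u P$ cheap to evaluate even on a large grid.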
4.4. Priors and Estimation
Conjugate priors were used for all parameters except those of the covariance function, whose priors were taken to be Gaussian. All priors were proper. ‘‘Science’’ was used in the elicitation process. For example, advertised precisions in data given in the scatterometer literature provided a starting point for the specification of priors on measurement errors. The old saw that geostrophy explains 70% of the variance of winds was converted into priors for the $\beta$-parameters. Posterior estimation was based on a Gibbs sampling algorithm, including a Metropolis-Hastings procedure for the covariance parameters. See Royle, Berliner, Wikle and Milliff (1998) for details.
Figure 2. Estimated posterior mean wind and pressure fields
4.5. Results and Enhancements

Pointwise posterior mean estimates of the wind field and the pressure field are given in Figure 2. (In this article all plots of pressure fields show standardized pressure estimates for that field.) We obtained ECMWF analysis field estimates of pressures for this period. These appear in Figure 3. We were partially pleased at the rough correspondence between these estimated pressures. However, there are differences in the estimated pressure fields that are physically important. First, a little background. The Labrador Sea is a region of vigorous air-sea interaction (exchanges of heat, moisture and momentum) leading to major convection events (large, cyclonic storms, or ‘‘polar lows’’) having local, synoptic, and climatic implications.
Tracking such lows is therefore very important. The day in our example has such a low, indicated in the elliptical region in the upper left in Figure 2 (and 3). (Note that the standardized pressures in this region are negative, indicating a low pressure system.)

Figure 3. ECMWF pressure field
Our analysis locates this storm further west and about a degree north of the location suggested by ECMWF. Also, our estimated storm is a bit tighter in space than the one indicated by ECMWF. Indeed, our reconstructed pressure field is actually more in tune with the expectations of a meteorologist based on the scatterometer wind data (Brown and Levy, 1986). The principal reason why meteorologists would prefer our estimated pressure field is that the center of the low pressure system is in much better agreement with the center of the wind circulation suggested by the scatterometer data. (We superimposed the scatterometer data on the ECMWF pressure to facilitate this comparison.) This is a very encouraging feature of our model. Before our model, the ECMWF pressure fields had been generally viewed as the best, though flawed, pressure estimates available. Our model appears to offer practical improvements.

Mainly motivated as illustration, we considered a possible extension of the model. Rather than simply comparing our pressure field to that of the ECMWF, we can use ECMWF fields as ‘‘data’’ in our analyses. The key is the treatment of someone else’s ‘‘posterior’’ (the estimates from ECMWF) as a likelihood function ([ECMWF estimates of pressure given true pressure]). This strategy offers a way of combining information in a Bayesian approach. In particular, we can incorporate a variety of data sources and a GCM-scale numerical model without running that model ourselves. For our example, let $D_p$ denote the ECMWF pressure field. We assume that the
scatterometer wind data and the ECMWF pressure ‘‘data’’ are conditionally independent given true winds and pressures. Since ECMWF did not use scatterometer data in producing pressure analyses, this seems quite plausible. We simply appended the following model to our Stage 1 data step:

$$[D_p \mid P, \sigma_e^2] \ \text{is} \ \mathrm{Gau}(P, \sigma_e^2 I), \qquad (14)$$

where $\sigma_e^2$ represents a measurement error variance associated with the ECMWF output. Hence, the modified Stage 1 model is the product of (9) and (14).

The presence of actual pressure ‘‘data’’ enables a more general formulation of the prior for pressure. Recall that in (11) the prior mean for the pressure field was set to zero everywhere. This does not really reflect our prior information, but was not considered critical since only differences in values of pressures appear in the wind model, i.e., a common mean value for pressures cancels in the wind model. With pressure data, we believe there may be substantial value in an extension of (11) to include a grand mean $\mu_p$, which itself is endowed with a prior. (Based on coarse climatological estimates, we used a normal prior with prior mean $10^5$ Pa and standard deviation of 1500 Pa, where 1 Pa (Pascal) = 1 kg m$^{-1}$ s$^{-2}$.) The potential value in this change involves rethinking the specification of the covariance in (11). While we believe the spatial correlation structure suggested in $R$ remains meaningful, the variability modeled via $\sigma_p^2$ is different. In particular, we expect this quantity to be smaller than in (11), since it reflects variation around an improved mean model. We modified the prior on $\sigma_p^2$ to reflect this. On the other hand, we do not believe that the simple model in (14) captures the genuine distribution of ECMWF analyses.

This enhanced formulation led to the results depicted in Figure 4. Note that while the pressure field appears more in line with meteorological understanding than does the original ECMWF analysis, we believe that our analysis without incorporating ECMWF is actually superior. This may be due to flaws in the ECMWF techniques and/or weaknesses in the model (14). Our intent was not to resolve such issues, but to exemplify a method by which Bayesians can readily incorporate various sources of data.
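As a reading aid, the way a Gaussian ‘‘data’’ model such as (14) updates a Gaussian pressure prior can be written out explicitly. The sketch below rests on assumptions that are not confirmed by the text: it writes $\sigma_e^2$ for the ECMWF error variance, $\mu_p$ for the grand mean and $\sigma_p^2$ for the pressure variance, it takes the extended form of (11) to be $\mathrm{Gau}(\mu_p \mathbf{1}, \sigma_p^2 R)$, and it ignores the wind stages of the hierarchy. It therefore shows only the contribution of (14), not the full conditional distribution used in the analysis.

```latex
% Assumed prior (a hypothetical form of the extended (11)) and likelihood (14):
%   P   | mu_p, sigma_p^2 ~ Gau(mu_p 1, sigma_p^2 R)
%   D_p | P,    sigma_e^2 ~ Gau(P, sigma_e^2 I)
\begin{align*}
[P \mid D_p, \mu_p, \sigma_p^2, \sigma_e^2]
  &\propto [D_p \mid P, \sigma_e^2] \, [P \mid \mu_p, \sigma_p^2], \\
P \mid D_p, \mu_p, \sigma_p^2, \sigma_e^2
  &\sim \mathrm{Gau}(m, V), \\
V &= \left( \sigma_p^{-2} R^{-1} + \sigma_e^{-2} I \right)^{-1}, \\
m &= V \left( \sigma_p^{-2} R^{-1} \mathbf{1} \, \mu_p + \sigma_e^{-2} D_p \right).
\end{align*}
```

Under these assumptions, $m$ approaches the prior mean $\mu_p \mathbf{1}$ as $\sigma_e^2$ grows and approaches $D_p$ as $\sigma_e^2$ shrinks, so the prior placed on $\sigma_e^2$ governs how much weight the ECMWF field receives.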
We are also pursuing a variety of enhancements of the pressure model and extensions to temporal evolution.

Figure 4. Estimated posterior wind field and pressure field, incorporating ECMWF pressure data

5. DISCUSSION

5.1. Constructing Model Components

In the spirit of employing physics in the construction of models, a full spectrum of statistical models can be entertained. At one extreme, usage of functional evolution equations, including nonlinear dynamics, is possible in principle. Recalling the time series idea of modeling $[X_t \mid X_{t-1}, \theta]$, it is natural to set the conditional expectation of $X_t$ to be a physically derived, nonlinear function of $X_{t-1}$. Extending to the context in which $X_t$ actually represents a spatially distributed variable, this leads to a stochastic, numerical dynamical model. (See Scipione and
Berliner, 1993, for a ‘‘toy’’ example.) Next, flexible classes of statistical models, both nonlinear and linear, can be employed in component distributions. Further, physical reasoning, past data, and even computer model output can all be used to formulate the form of the model selected as well as the required priors for model parameters.

A second direction for model construction involves specification of priors on parameters. Model parameters themselves can be modeled as spatially and/or temporally varying. Examples are discussed in Wikle, Berliner, and Cressie (1997). Incorporating these notions with the idea of ‘‘mixture modeling’’ can produce interesting models intended to capture regimes and blocking. See Lu and Berliner (1998) for an example.

5.2. Scientific Learning

Bayesian hierarchical models offer the possibility of genuine learning across scientific investigations. Component models can be viewed as ‘‘modules.’’ For example, scientists who do not like the pressure model in our Stage 3 can employ a different model within a similar analysis. There is an interesting challenge to the simplicity school in this context. Namely, if one views a particular analysis as useful or transferable in other contexts, the development of the simplest model compatible with a particular data set may not be a robust strategy.
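The suggestion in Section 5.1 of centering the distribution of X_t at a nonlinear function of X_{t-1} can be illustrated with a scalar toy model. Everything specific in this sketch is an assumption: the logistic map stands in for a physically derived evolution rule, the noise is taken to be additive and Gaussian, and the function names and parameter values are hypothetical. It is not the example of Scipione and Berliner (1993).

```python
import math
import random


def step_mean(x, r=3.7):
    """Conditional mean of X_t given X_{t-1} = x.

    The logistic map is used purely as a stand-in for a physically derived
    nonlinear evolution equation.
    """
    return r * x * (1.0 - x)


def simulate(x0, n_steps, noise_sd=0.01, seed=0):
    """Simulate X_t = step_mean(X_{t-1}) + Gaussian noise for n_steps steps."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step_mean(xs[-1]) + rng.gauss(0.0, noise_sd))
    return xs


def log_transition_density(x_t, x_prev, noise_sd=0.01):
    """Log of the Gaussian density [X_t | X_{t-1}] centered at the nonlinear mean."""
    resid = x_t - step_mean(x_prev)
    return -0.5 * math.log(2.0 * math.pi * noise_sd ** 2) - 0.5 * (resid / noise_sd) ** 2


if __name__ == "__main__":
    path = simulate(0.5, 20)
    # Sum of one-step log densities: the log of the joint density of the path
    # given its starting value, which is what a process stage would contribute.
    total = sum(log_transition_density(b, a) for a, b in zip(path, path[1:]))
    print(path[:5])
    print(total)
```

In a hierarchical analysis, a density of this kind would play the role of the process stage, with a data model stacked above it and priors on quantities such as the noise scale below it.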
5.3. Issues

While hierarchical modeling is attractive, it is not without problems.

1. Formulation and Model Assessment. These issues arise with a vengeance in complex problems. First, we suggest that quite generally ‘‘good,’’ proper priors on parameters are necessary to achieve good results. For example, some models may include nonidentifiable parameters. While restriction to proper priors leads to valid Bayesian results, care in both prior specification and interpretation of results may be necessary. Unfortunately, the complexity we seek to manage can also mean ‘‘good’’ priors may be difficult to assess in hierarchies with many stages. Further, sensitivity studies of the impact of prior specification in gigantic models are both needed and difficult.

2. Calculation. The calculational burden in analyzing large hierarchical Bayesian models can be heavy. Further, monitoring MCMC convergence properties (Mengersen, Robert, and Guihenneuc-Jouyaux, 1998) and output analysis of MCMC realizations are difficult issues for huge models. Fortunately, these problems are very active areas of research.

ACKNOWLEDGMENTS

This work was supported by the National Center for Atmospheric Research’s Geophysical Statistics Project, sponsored by the National Science Foundation under Grant #DMS93-12686; the NCAR NSCAT Science Working Team cooperative agreement with NASA JPL; and the National Institute of Statistical Sciences, Research Triangle Park, NC.

REFERENCES

Berliner, L. M. (1996). Hierarchical Bayesian time series models. In Maximum Entropy and Bayesian Methods (K. Hanson and R. Silver, Eds.). Dordrecht: Kluwer, 15-22.
Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory. New York: Wiley.
Brown, R. A. and Levy, G. (1986). Ocean surface pressure fields from satellite sensed winds. Monthly Weather Review 114, 2197-2206.
Cressie, N. A. C. (1993). Statistics for Spatial Data, Revised Edition. New York: Wiley.
Daley, R. (1991). Atmospheric Data Analysis. Cambridge: University Press.
Epstein, E. S. (1985). Statistical Inference and Prediction in Climatology: A Bayesian Approach. Boston: American Meteorological Society.
Ehrendorfer, M. (1997). Predicting the uncertainty of numerical weather forecasts: a review. Meteorol. Zeitschrift, N. F. 6, 147-183.
Freilich, M. H. (1997). Validation of vector magnitude datasets: Effects of random component errors. J. of Atmospheric and Oceanic Technology 14, 695-703.
Freilich, M. H. and Dunbar, R. S. (1993). A preliminary C-band scatterometer model function for the ERS-1 AMI instrument. Proceedings of the First ERS-1 Symposium: Space at the Service of Our Environment, November 1992, Cannes, France, ESA SP-359 volume 1 (March 1993), 79-83.
Holton, J. R. (1992). An Introduction to Dynamic Meteorology, 3rd Ed. San Diego: Academic Press.
Large, W. G., Holland, W. R. and Evans, J. C. (1991). Quasigeostrophic ocean response to real wind forcing: The effects of temporal smoothing. J. Physical Oceanography 21, 998-1017.
Lu, Z.-Q. and Berliner, L. M. (1998). Markov switching time series models with application to a daily runoff series. Water Resources Research. To appear.
Mengersen, K. L., Robert, C. P., and Guihenneuc-Jouyaux, C. (1998). MCMC convergence diagnostics: A ‘‘reviewww.’’ Bayesian Statistics 6 (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.). Oxford: University Press.
Milliff, R. F., Large, W. G., Holland, W. R. and McWilliams, J. C. (1996). The general circulation responses of high-resolution North Atlantic Ocean models to synthetic scatterometer winds. J. Physical Oceanography 26, 1747-1768.
Naderi, F. M., Freilich, M. H. and Long, D. G. (1991). Spaceborne radar measurement of wind velocity over the ocean: an overview of the NSCAT scatterometer system. Proceedings IEEE 79, 850-866.
Royle, J. A., Berliner, L. M., Wikle, C. K., and Milliff, R. (1997). A hierarchical spatial model for constructing wind fields from scatterometer data in the Labrador Sea. Tech. Rep. 97-30, Statistics Department, Ohio State University, Columbus.
Schneider, S. H. (1992). Introduction to climate modeling. In Climate System Modeling (K. E. Trenberth, Ed.). Cambridge: University Press.
Scipione, C. M. and Berliner, L. M. (1993). Bayesian inference in nonlinear dynamical systems. 1993 Proc. of the Section on Bayesian Statist. Sci. Washington: American Statistical Association.
Stoffelen, A., and Anderson, D. (1997). Scatterometer data interpretation: Measurement space and inversion. J. Atmospheric and Oceanic Technology, in press.
Tarantola, A. (1987). Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation. New York: Elsevier.
Thiebaux, H. J. (1985). On approximations to geopotential and wind-field correlation structures. Tellus 37A, 126-131.
Trenberth, K. E. (Ed.) (1992). Climate System Modeling. Cambridge: University Press.
Wikle, C. K., Berliner, L. M., and Cressie, N. (1998). Hierarchical Bayesian space-time models. J. Environ. Ecol. Statist. To appear.
Wikle, C. K., Milliff, R. F., Nychka, D., and Berliner, L. M. (1998). Spatio-temporal hierarchical Bayesian blending of tropical ocean surface wind data. Tech. Rep. GSP98-01, Geophysical Statistics Project, National Center for Atmospheric Research, Boulder, Colorado, USA.