Joint Statistical Meetings - Section on Statistics & the Environment
BAYESIAN NETWORK DESIGNS FOR FIELDS WITH UNKNOWN VARIANCE FUNCTION

Milena Banjevic, Paul Switzer
Statistics Department, Stanford University, Stanford, CA 94305
{milena, ps}@stanford.edu

Key Words: Spatial Statistics, Optimal Sampling, Bayesian Estimate.
Abstract: We consider the problem of designing a network of sampling locations in a spatial domain that will be used to interpolate a spatial field. We focus on the random field model in which the variance is given by an unknown step function of the locations. We express this uncertainty through an appropriate class of prior distributions and introduce a Bayesian sequential sampling algorithm. At each step, posterior parameter values are updated using realizations from previously selected locations. We examine the convergence of the parameter estimates. We discuss the improvement of the Bayesian method over conventional sampling techniques for different prior distributions and cardinalities of the design network.

1. Introduction

Consider the problem of finding optimal sampling designs for dependent spatial data. The motivation is the need to interpolate the observed behavior of a process at unobserved locations, as well as to design a network of observation locations that allows an accurate representation of the process. Such goals are especially important in geophysics, meteorology and the environmental sciences, since it is usually costly, unfeasible or impossible to sample the entire area. Designing an optimal sampling network is a hard problem. Random selection of units can be extremely inefficient, while a systematic grid is inflexible to irregular features of the space, such as stratification, non-homogeneous variances and anisotropy. To find the best sampling design, we would like to model the process by an appropriate spatial random field, incorporate prior knowledge, and select the subset of points of the desired cardinality that best represents the field in question. Most existing methods, such as sequential selection, local searches and simulated annealing, assume that the random field parameters are known and well defined. Assuming wrong parameters can be fatal and lead to highly inefficient solutions. We explore alternative strategies for selecting the optimal sampling set for fields with unknown parameters. Section 2 introduces the general methodology used throughout the paper, as well as the proposed sampling set selection method. In Section 3 we apply this method to a random field whose variance is a step function of the location and present the results of a single simulation. In Section 4 we generalize these results by drawing conclusions from a large number of repeated simulations, for several methods. Section 5 summarizes the results presented, as well as some future directions.

2. Methodology

Random field model: A random process Z, which may be observed at locations x = (x1, ..., xN), is modelled as a Gaussian random field

Z(x) | θ ~ N(0, Σx,θ)  (1)

where θ are the covariance parameters, which might be given or unknown, and Σx,θ is the covariance of Z with elements

[Σx,θ]i,j = σ(xi, θ) σ(xj, θ) ρ(xi, xj)  (2)

We focus on the variance σ2(x, θ) and examine the behavior when the variance is a step, smooth or any other parametric function of the location.

Field interpolation and design evaluation: Let Zs = (Zs1, ..., Zsn) represent the field sampled at locations xs = (xs1, ..., xsn), and let Zu = (Zu1, ..., ZuN−n) represent the field at the locations xu = (xu1, ..., xuN−n) that are not sampled. When the parameter θ is known, the prediction error depends only on the sampling design xs and θ, not on the actual realizations of the field. The field is interpolated with the simple kriging predictor

Ẑu = E(Zu | Zs, θ)  (3)

A sampling design xs is evaluated by the average prediction error at the unsampled locations,

Err(Zu | Zs(xs), θ) = 1/(N−n) tr(V(Zu | Zs, θ))  (4)
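The predictor in Eq. 3 and the criterion in Eq. 4 follow from standard Gaussian conditioning. A minimal numerical sketch (NumPy; function and variable names are illustrative, not from the paper):

```python
import numpy as np

def kriging_predict_and_error(Sigma, s_idx, u_idx, z_s):
    """Simple kriging predictor (Eq. 3) and average prediction error
    (Eq. 4), obtained by Gaussian conditioning on a zero-mean field.

    Sigma : (N, N) covariance matrix of the field at all N locations
    s_idx, u_idx : indices of sampled / unsampled locations
    z_s   : realized values at the sampled locations
    """
    S_ss = Sigma[np.ix_(s_idx, s_idx)]
    S_us = Sigma[np.ix_(u_idx, s_idx)]
    S_uu = Sigma[np.ix_(u_idx, u_idx)]
    W = S_us @ np.linalg.inv(S_ss)       # kriging weights
    z_hat = W @ z_s                      # E(Zu | Zs, theta), Eq. 3
    V = S_uu - W @ S_us.T                # conditional covariance
    err = np.trace(V) / len(u_idx)       # Eq. 4
    return z_hat, err
```

Note that err depends only on Sigma and the index sets, not on z_s, which is exactly why a design can be evaluated before any sampling when θ is known.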
If θ is not known explicitly, we adopt a Bayesian view and represent θ through a suitable prior distribution. All previous knowledge of the parameter θ is incorporated through the choice of the prior distribution and its parameters. The field is interpolated by the posterior mean of Zu given the data,

Ẑu = Ẽ(Zu | Zs) = Eθ[E(Zu | θ, Zs)]  (5)

Then the average prediction error ([2]) is

Ẽrr(Zu | Zs) = Eθ[Err(Zu | Zs, θ)] = 1/(N−n) tr(Ṽ(Zu | Zs))  (6)

If the variance σ2(x, θ) is a linear function of the θ parameters, then Eq. 6 further simplifies to

Ẽrr(Zu | Zs) = Err(Zu | Zs(xs), θ̃)  (7)

i.e. the posterior estimate of the error is just the error calculated using the posterior estimate θ̃ of the θ parameter ([3]). This fact simplifies the process of location selection.

Best design selection: We wish to select the sampling set xs* that minimizes the error over the whole region,

xs* = argmin_xs Ẽrr(Zu | Zs)  (8)

We propose the augmented sequential algorithm for the selection of the sampling set. The parameter is updated at each step of the selection process with information from the recently sampled locations, as in Table 1.

Updated Sequential Algorithm
(a) Given the already selected xs,i−1:
    1. Sample, obtaining zs,i−1.
    2. Update θ to θ̂i, given zs,i−1.
    3. Select the i-th location by
       xi0 = argmin_x Err(Zu | (xs,i−1 ∪ x), θ̂i)  (9)
(b) Repeat until n locations have been selected.

Table 1: Updated sequential algorithm

If the parameter θ is known, the above method simplifies to the usual sequential selection algorithm. If θ is given only through its prior distribution, and σ2(x, θ) is a linear function of the θ parameters, then our method takes advantage of the fact that Ẽrr(Zu | Zs) = Err(Zu | Zs, θ̃), as in Eq. 7. At each iteration, the method uses the most recent estimate θ̃, given the realizations of the field at the locations selected at previous iterations, in order to select further locations. If the variance is a more complex function of θ, Err(Zu | Zs, θ̃) is still a good estimate of Ẽrr(Zu | Zs).

3. Variance as an unknown step function of the location

Suppose that

σ2(x | θ) = θ1 I(x ∈ S1) + θ2 I(x ∈ S2)  (10)

where S = S1 ∪ S2 is the study area, as in [5], and θ = (θ1, θ2) are the variance parameters. The variance is equal to θ1 in the area S1 and θ2 in the area S2 (Fig. 1). Assume that θ1 is known and fixed, while θ2 is unknown and given through the Gamma prior, θ2 ~ Gamma(r, λ), with known moments Eθ2, Vθ2.

Figure 1: Study area S = S1 ∪ S2

Parameter estimates: Suppose θt = (θ1t, θ2t) is the underlying parameter used for simulating the field. The realized value of the parameter for a particular field will vary slightly from θt. We define the realized parameter, θall = (θ1all, θ2all), to be the MLE estimate

θall = argmax_(θ1,θ2) p(Z(x) | θ1, θ2)  (11)

θall cannot usually be obtained in practice, but it can be calculated in a simulation study as a reference for the other estimates. The parameter θ2 is unknown a priori. Given the sampled data Zs, it is updated by its best estimate, assuming fixed θ1. We recommend the estimate θ2bay, the posterior mean (Bayesian) update

θ2bay = ∫ θ2 p(θ2 | θ1, Zs(xs)) dθ2  (12)
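The update loop of Table 1, combined with a parameter update such as the posterior mean of Eq. 12, can be sketched schematically as follows. The error criterion, the sampling oracle and the update rule are supplied by the caller; all names are illustrative rather than taken from the paper:

```python
def updated_sequential_design(candidates, n, err, update_theta, sample, theta0):
    """Schematic version of the updated sequential algorithm (Table 1).

    err(selected, x, theta) -> predicted average error if x is added
        to the already selected locations (Eq. 4 / Eq. 7)
    update_theta(theta, z_obs) -> new parameter estimate given all
        observations so far (e.g. the posterior mean of Eq. 12)
    sample(x) -> realized field value at location x
    """
    selected, z_obs, theta = [], [], theta0
    for _ in range(n):
        if z_obs:                                  # step 2: update theta
            theta = update_theta(theta, z_obs)
        # step 3: greedy choice minimizing the predicted error (Eq. 9)
        best = min(candidates, key=lambda x: err(selected, x, theta))
        candidates = [c for c in candidates if c != best]
        selected.append(best)
        z_obs.append(sample(best))                 # step 1 of the next round
    return selected, theta
```

With a fixed update_theta this reduces to the usual sequential selection algorithm, matching the remark following Table 1.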
We determine the optimality of this choice of update by comparing the performance of this algorithm with similar algorithms that use different estimates of θ2, given the data. The choices considered are: θ2MLE, the maximum likelihood estimate from the sampled data,

θ2MLE = argmax_θ2 p(Zs(xs) | θ1, θ2)  (13)

θ2lm, the mean of the likelihood estimate from the sampled data, equivalent to the posterior mean under a diffuse prior,

θ2lm ∝ ∫ θ2 p(Zs | θ1, θ2) dθ2  (14)

and θ2pr, the prior mean, which does not take the data into consideration. These estimates can be obtained from the sampled data and are meant to estimate the realized parameter θ2all (Table 2).

Parameter   Explanation              Error
Underlying values:
θ1t, θ2t    simulation value         Errt
θall        MLE at Z                 Errall
Estimates from sampled locations:
θ2pr        prior mean               Errpr
θ2bay       posterior mean at Zs     Errbay
θ2MLE       MLE at Zs                ErrMLE
θ2lm        likelihood mean at Zs    Errlm

Table 2: Legend of θ2 estimates

Single simulation results: The study area is the [0, 1] × [0, 1] square, as in Fig. 1. The area is discretized into a 20 × 20 grid of 400 potential sampling locations, from which we select a sampling set of cardinality up to n = 20. We assume a known exponential correlation function ([1]),

ρ(x1, x2) = exp(−4 |x1 − x2|)  (15)

Let θ1t = 5 and θ2 ~ Gamma(r, λ) with E(θ2) = θ2pr = 6 and V(θ2) = 24. The underlying parameter used for simulating the field was θ2t = 1, previously drawn from the above Gamma prior. The posterior mean estimate obtained by the program is θ2bay = 2.4 (Table 3). The realized field is shown in Figure 2.

Pre-sim.:   Eθ1 = 5, Eθ2 = 6, Vθ2 = 24
Post-sim.:  θ2t = 1, θ2all = 1.6
Estimates:  θ2pr = 6, θ2bay = 2.4

Table 3: Single simulation results

The posterior distribution of the parameter is shifted towards θ2t with respect to the prior, reflecting the knowledge gained from the sampled locations (Fig. 3). The estimate θ2bay = 2.4 approximates the empirical value θ2all = 1.6 well, better than θ2pr = 6 does. The starting value of the parameter estimate θ̂2 is the prior mean θ2pr (blue line), while the final estimate is the posterior mean after the last iteration, θ2bay (red line), which is closer to the realized θ2all (green line).

Figure 2: Single realization of the field, with θ1 = 5, θ2 = 1

Figure 3: θ2 in a single simulation (prior and posterior distributions of θ2, with θ2t, θ2all, θ2bay and θ2pr marked)

Figure 4: Sampling designs, (a) PSEQ, (b) BSEQ
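For the step-variance model of Eq. 10 with the exponential correlation of Eq. 15, the posterior mean update of Eq. 12 can be approximated by one-dimensional numerical integration over a grid of θ2 values. The sketch below assumes a Gamma(shape r, scale λ) parameterization of the prior (the paper does not specify the parameterization) and uses illustrative function names:

```python
import numpy as np
from scipy.stats import gamma, multivariate_normal

def step_cov(xs, in_S1, theta1, theta2):
    """Covariance of Eq. 2 with the step variance of Eq. 10 and the
    exponential correlation of Eq. 15:
    [Sigma]_ij = sigma(x_i) sigma(x_j) exp(-4 |x_i - x_j|)."""
    sd = np.sqrt(np.where(in_S1, theta1, theta2))
    d = np.linalg.norm(xs[:, None, :] - xs[None, :, :], axis=-1)
    return np.outer(sd, sd) * np.exp(-4.0 * d)

def theta2_posterior_mean(z_s, xs, in_S1, theta1, r, lam,
                          grid=np.linspace(0.05, 30.0, 400)):
    """Approximate the posterior mean of Eq. 12 on a theta_2 grid,
    with a Gamma(shape=r, scale=lam) prior (assumed parameterization)."""
    log_post = np.array([
        multivariate_normal.logpdf(z_s, cov=step_cov(xs, in_S1, theta1, t2))
        + gamma.logpdf(t2, a=r, scale=lam)
        for t2 in grid])
    w = np.exp(log_post - log_post.max())   # rescale for numerical stability
    return float(np.sum(grid * w) / np.sum(w))
```

For the moments quoted above (E = 6, V = 24) this parameterization would give scale λ = V/E = 4 and shape r = E/λ = 1.5.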
The PSEQ algorithm (using the prior mean θ2pr = 6) selects 7 locations from area S1 and 13 locations from area S2, reflecting the prior belief that θ2 = 6 is slightly larger than θ1 = 5 (Fig. 4(a)). The BSEQ algorithm (using the posterior mean θ2bay = 2.4) selects 14 locations from area S1 and 6 locations from area S2 (Fig. 4(b)). Initially, both algorithms select the same sampling locations, but as more information is gathered the Bayesian algorithm selects fewer points from area S2, reflecting the knowledge gained about θ2t (= 1) and θ2all (= 1.6), which are much lower than previously assumed by the prior mean θ2pr (= 6). The average squared prediction error over all locations was Errpr = 0.87 with the PSEQ method, as opposed to Errbay = 0.55 with the BSEQ method, giving a ratio Errbay/Errpr = 0.63, i.e. 37% less error with the Bayesian algorithm.

4. Repeated Simulations

In order to test the performance of the methods introduced, the simulated example from Section 3 was repeated for different values of the parameter θ2, with multiple realizations of the field for each value.

Overall performance: In simulations we have realizations over the entire field and can therefore calculate the realized prediction error for each simulated field, for both the BSEQ (Errbay) and the PSEQ (Errpr) method. The improvement of the Bayesian method with respect to the prior method is represented by the relative performance Errbay/Errpr, calculated for each simulation. Overall performance is represented by the average relative performance over all the simulated fields.

Figure 5: Relative error ratio Errbay/Errpr vs. cardinality of the sampling set

Figure 5 presents the average relative performance Errbay/Errpr for different cardinalities of the sampling set. One can see that there is more advantage to be gained from the Bayesian algorithm as more sampling locations are selected. At 20 sampling locations the average ratio is 93%, meaning on average a 7% improvement with the Bayesian algorithm (BSEQ) over the usual sequential algorithm (PSEQ), averaged over all the values of θ2 selected from the specified Gamma prior.

Local performance: We examine the performance when the realized parameter θ2all is smaller than, similar to, or larger than the mean of the specified Gamma distribution. Figure 6 presents the relative performance Errbay/Errpr for each simulated field vs. θ2all, the realized value of the parameter θ2. The standard deviation of the error ratio, over all the points, is sd_b/p = 0.15 and is uniform over the range of θ2all. PSEQ and BSEQ perform similarly when the realized parameter θ2all is similar to the prior mean θ2pr, since the new observations then add little information. When θ2all is away from the prior mean θ2pr, the performance improves by up to 20%.

Figure 6: Relative error ratio Errbay/Errpr throughout the range of θ2all

Figure 7: Error ratio Err/Errall, for the different methods, throughout the range of θ2all
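The competing estimates of Table 2 can all be read off a single set of log-likelihood evaluations on a θ2 grid. A small sketch (illustrative names; the grid-based integration approximates Eqs. 12-14):

```python
import numpy as np

def compare_theta2_estimates(loglik, grid, log_prior):
    """Compute the theta_2 estimates of Table 2 from log-likelihood
    values evaluated on a grid of theta_2 candidates:

      'MLE' : argmax of the likelihood                  (Eq. 13)
      'lm'  : likelihood mean, i.e. posterior mean
              under a diffuse prior                     (Eq. 14)
      'bay' : posterior mean under the given log-prior  (Eq. 12)
    """
    ll = np.asarray(loglik, dtype=float)
    lik = np.exp(ll - ll.max())                    # rescaled likelihood
    lp = ll + np.asarray(log_prior, dtype=float)
    post = np.exp(lp - lp.max())                   # rescaled posterior
    return {"MLE": float(grid[np.argmax(ll)]),
            "lm": float(np.sum(grid * lik) / np.sum(lik)),
            "bay": float(np.sum(grid * post) / np.sum(post))}
```

With a flat log-prior, 'bay' coincides with 'lm', which is the sense in which the likelihood mean is the diffuse-prior special case.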
Error comparison: We would like to extend our analysis to include other methods and confirm the
results of Section 3 for multiple simulations. To gain some insight into the absolute performance of the methods, we compare the error of each method with that of the hypothetical method AMSEQ, which assumes complete knowledge of the realized parameter value θ2all, calculated from observations over the entire field, available only in a simulation study. Figure 7 shows the performance of the methods listed in Table 2 by presenting their errors as ratios to Errall. With respect to Errall, Errbay (red line, Fig. 7) has a ratio of 1-1.05, i.e. the error of the Bayesian method is on average at most 5% larger than that of the hypothetical 'know-all' method. Compared to the other methods, the Bayesian method (BSEQ) on average performs best, with the ratio Errbay/Errall closest to 1. The usual method (PSEQ), with error Errpr, is very inefficient when θ2all is away from the prior mean; the MLE method (MSEQ), with error ErrMLE, is inferior almost everywhere; while the likelihood mean method (LSEQ), with error Errlm, is slightly inferior to BSEQ. The Bayesian method also has the least variance, with a standard deviation of the ratio of st.dev_b/al = 0.2, as opposed to st.dev_p/al = 0.23 for the usual method, st.dev_m/al = 0.4 for the MLE method and st.dev_l/al = 0.21 for the likelihood mean method.

5. Conclusions and Extensions

We have presented robust strategies for selecting an optimal sampling set when the random field studied is partially unknown. The method is based on the intuitive idea that any newly available information should be incorporated into the future decision process. Of the several candidate parameter updates, the posterior mean estimate on average performs best: it gives the smallest average prediction error, is robust with respect to the realized parameter value, exhibits the least variation, and produces the least biased, most realistic estimates.

The model used in this study has been greatly simplified. The method can easily be modified to include a non-zero mean field. It can also be extended to a variance given by any parametric function of the location, with any number of parameters. Naturally, the efficiency of the method deteriorates as more parameters must be estimated from scarce sampling locations. In general, the priors on the variance and covariance function can be more complicated, such as the Wishart priors in [4].

The arrangement of the sampling locations obtained by a sequential algorithm can be suboptimal. Each subsequent sampling location is optimally chosen given the previously selected, fixed sampling locations, but as a whole there might exist a better sampling design that could not have been arrived at with the sequential algorithm. One alternative is to select batches of sampling locations sequentially in several stages [7], where at each stage a batch of points is selected simultaneously, as with the Spatial Simulated Annealing Algorithm (SSAA) [6]. In this case the parameter would be updated only after a whole batch is selected and sampled. Even though the arrangement of the locations could potentially be better, we have not considered SSAA here, since the parameter is updated less frequently and less information about the variance is gathered.

References

[1] J. Sacks, S. Schiller. Spatial designs.
[2] P. K. Kitanidis. Parameter uncertainty in estimation of spatial functions. Water Resources Research 22(4), 499-507, April 1986.
[3] K. V. Mardia, J. T. Kent, J. M. Bibby. Multivariate Analysis, p. 63. Academic Press, 1997.
[4] P. J. Brown, N. D. Le, J. V. Zidek. Multivariate Spatial Interpolation and Exposure to Air Pollutants. The Canadian Journal of Statistics 22(4), 489-509, 1994.
[5] G. Arbia, P. Switzer. Spatial sampling designs for stratified correlated units with unequal variances. 1994.
[6] M. Ferri, M. Piccioni. Optimal selection of statistical units. Computational Statistics & Data Analysis 13, 47-61, 1992.
[7] J. van Groenigen. Constrained optimization of spatial sampling. Wageningen Agricultural University and ITC, 1999.